WorldWideScience

Sample records for fault tolerant systems

  1. Fault tolerant computing systems

    Fault tolerance involves the provision of strategies for error detection damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (orig.)

  2. Fault Tolerant Control Systems

    Bøgh, S.A.

    from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in...... faults, but also that the research field still misses a systematic approach to handle realistic problems such as low sampling rate and nonlinear characteristics of the system. The thesis contributed with methods to detect both faults and specifically with a novel algorithm for the actuator fault...... detection that is superior in terms of performance and complexity to the other algorithms in the comparative study....

  3. Soft Computing Approaches To Fault Tolerant Systems

    Neeraj Prakash Srivastava

    2014-05-01

    Full Text Available We present in this paper as an introduction to soft computing techniques for fault tolerant systems and the terminology with different ways of achieving fault tolerance. The paper focuses on the problem of fault tolerance using soft computing techniques. The fundamentals of soft computing approaches and its type with introduction of fault tolerance are discussed. The main objective is to show how to implement soft computing approaches for fault detection, isolation and identification. The paper contains details about soft computing application with an application of wireless sensor network as fault tolerant system.

  4. Synthesis of Fault-Tolerant Embedded Systems

    Eles, Petru; Izosimov, Viacheslav; Pop, Paul; Peng, Zebo

    This work addresses the issue of design optimization for fault- tolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault tolerant schedules are generated based on a conditional...... process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are...

  5. Software fault tolerance in computer operating systems

    Iyer, Ravishankar K.; Lee, Inhwan

    1994-01-01

    This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

  6. Energy-efficient fault-tolerant systems

    Mathew, Jimson; Pradhan, Dhiraj K

    2013-01-01

    This book describes the state-of-the-art in energy efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing and includes discussion of both circuit and system-level approaches. Readers will be enabled to meet the conflicting design objectives of energy efficiency and fault-tolerance for reliability, given the up-to-date techniques presented.

  7. Fault Tolerant Quantum Filtering and Fault Detection for Quantum Systems

    Gao, Qing; Dong, Daoyi; Petersen, Ian R.

    2015-01-01

    This paper aims to determine the fault tolerant quantum filter and fault detection equation for a class of open quantum systems coupled to a laser field that is subject to stochastic faults. In order to analyze this class of open quantum systems, we propose a quantum-classical Bayesian inference method based on the definition of a so-called quantum-classical conditional expectation. It is shown that the proposed Bayesian inference approach provides a convenient tool to simultaneously derive t...

  8. Middleware Fault Tolerance Support for the BOSS Embedded Operating System

    Afonso, Francisco; Carlos A. Silva; Montenegro, Sérgio; Tavares, Adriano

    2006-01-01

    Critical embedded systems need a dependable operating system and application. Despite all efforts to prevent and remove faults in system development, residual software faults usually persist. Therefore, critical systems need some sort of fault tolerance to deal with these faults and also with hardware faults at operation time. This work proposes fault-tolerant support mechanisms for the BOSS embedded operating system, based on the application of proven fault tolerance strategies by middlew...

  9. Fault-tolerant parallel processing system

    Harper, R.E.; Lala, J.H.

    1990-03-06

    This patent describes a fault tolerant processing system for providing processing operations, while tolerating f failures in the execution thereof. It comprises: at least (3f + 1) fault containment regions. Each of the regions includes a plurality of processors; network means connected to the processors and to the network means of the others of the fault containment regions; groups of one or more processors being configured to form redundant processing sites at least one of the groups having (2f + 1) processors, each of the processors of a group being included in a different one of the fault containment regions. Each network means of a fault containment region includes means for providing communication operations between the network means and the network means of the others of the fault containment regions, each of the network means being connected to each other network means by at lest (2f + 1) disjoint communication paths, a minimum of (f + 1) rounds of communication being provided among the network means of the fault containment regions in the execution of a the processing operation; and means for synchronizing the communication operations of the network means with the communications operations of the network means of the other fault containment regions.

  10. Software engineering of fault tolerant systems

    Pelliccione, P; Muccini, Henry

    2007-01-01

    In architecting dependable systems, what is required to improve the overall system robustness is fault tolerance. Many methods have been proposed to this end, the solutions are usually considered late during the design and implementation phases of the software life-cycle (e.g., Java and Windows NT exception handling), thus reducing the effectiveness error and fault handling. Since the system design typically models only normal behaviour of the system while ignoring exceptional ones, the implementation of the system is unable to handle abnormal events. Consequently, the system may fail in unexp

  11. Fault tolerant architecture for artificial olfactory system

    In this paper, to cover and mask the faults that occur in the sensing unit of an artificial olfactory system, a novel architecture is offered. The proposed architecture is able to tolerate failures in the sensors of the array and the faults that occur are masked. The proposed architecture for extracting the correct results from the output of the sensors can provide the quality of service for generated data from the sensor array. The results of various evaluations and analysis proved that the proposed architecture has acceptable performance in comparison with the classic form of the sensor array in gas identification. According to the results, achieving a high odor discrimination based on the suggested architecture is possible. (paper)

  12. Method and system for environmentally adaptive fault tolerant computing

    Copenhaver, Jason L. (Inventor); Jeremy, Ramos (Inventor); Wolfe, Jeffrey M. (Inventor); Brenner, Dean (Inventor)

    2010-01-01

    A method and system for adapting fault tolerant computing. The method includes the steps of measuring an environmental condition representative of an environment. An on-board processing system's sensitivity to the measured environmental condition is measured. It is determined whether to reconfigure a fault tolerance of the on-board processing system based in part on the measured environmental condition. The fault tolerance of the on-board processing system may be reconfigured based in part on the measured environmental condition.

  13. Fault tolerant aggregation for power system services

    Kosek, Anna Magdalena; Gehrke, Oliver; Kullmann, Daniel

    2013-01-01

    Exploiting the flexibility in distributed energy resources (DER) is seen as an important contribution to allow high penetrations of renewable generation in electrical power systems. However, the present control infrastructure in power systems is not well suited for the integration of a very large...... number of small units. A common approach is to aggregate a portfolio of such units together and expose them to the power system as a single large virtual unit. In order to realize the vision of a Smart Grid, concepts for flexible, resilient and reliable aggregation infrastructures are required. This...... paper presents such a concept while focusing on the aspect of resilience and fault tolerance. The proposed concept makes use of a multi-level election algorithm to transparently manage the addition, removal, failure and reorganization of units. It has been implemented and tested as a proof-of-concept on...

  14. Fault-tolerant Actuator System for Electrical Steering of Vehicles

    Sørensen, Jesper Sandberg; Blanke, Mogens

    2006-01-01

    Being critical to the safety of vehicles, the steering system is required to maintain the vehicles ability to steer until it is brought to halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrical powered vehicles, a fault-tolerant architecture is...... needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the...

  15. Fault detection and fault-tolerant control for nonlinear systems

    Li, Linlin

    2016-01-01

    Linlin Li addresses the analysis and design issues of observer-based FD and FTC for nonlinear systems. The author analyses the existence conditions for the nonlinear observer-based FD systems to gain a deeper insight into the construction of FD systems. Aided by the T-S fuzzy technique, she recommends different design schemes, among them the L_inf/L_2 type of FD systems. The derived FD and FTC approaches are verified by two benchmark processes. Contents Overview of FD and FTC Technology Configuration of Nonlinear Observer-Based FD Systems Design of L2 nonlinear Observer-Based FD Systems Design of Weighted Fuzzy Observer-Based FD Systems FTC Configurations for Nonlinear Systems< Application to Benchmark Processes Target Groups Researchers and students in the field of engineering with a focus on fault diagnosis and fault-tolerant control fields The Author Dr. Linlin Li completed her dissertation under the supervision of Prof. Steven X. Ding at the Faculty of Engineering, University of Duisburg-Essen, Germany...

  16. Abstractions for Fault-Tolerant Distributed System Verification

    Pike, Lee S.; Maddalon, Jeffrey M.; Miner, Paul S.; Geser, Alfons

    2004-01-01

    Four kinds of abstraction for the design and analysis of fault tolerant distributed systems are discussed. These abstractions concern system messages, faults, fault masking voting, and communication. The abstractions are formalized in higher order logic, and are intended to facilitate specifying and verifying such systems in higher order theorem provers.

  17. Fault tolerant control of systems with saturations

    Niemann, Hans Henrik

    2013-01-01

    This paper presents framework for fault tolerant controllers (FTC) that includes input saturation. The controller architecture known from FTC is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization is extended to handle input saturation. Applying this controller architecture in connec...

  18. From fault classification to fault tolerance for multi-agent systems

    Potiron, Katia; Taillibert, Patrick

    2013-01-01

    Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

  19. Software fault tolerance

    Kazinov, Tofik Hasanaga; Mostafa, Jalilian Shahrukh

    2009-01-01

    Because of our present inability to produce errorfree software, software fault tolerance is and will contiune to be an important consideration in software system. The root cause of software design errors in the complexity of the systems. This paper surveys various software fault tolerance techniquest and methodologies. They are two gpoups: Single version and Multi version software fault tolerance techniques. It is expected that software fault tolerance research will benefit from this research...

  20. Fault Tolerance in Distributed Systems using Fused State Machines

    Balasubramanian, Bharath; Garg, Vijay K

    2013-01-01

    Replication is a standard technique for fault tolerance in distributed systems modeled as deterministic finite state machines (DFSMs or machines). To correct f crash or f/2 Byzantine faults among n different machines, replication requires nf additional backup machines. We present a solution called fusion that requires just f additional backup machines. First, we build a framework for fault tolerance in DFSMs based on the notion of Hamming distances. We introduce the concept of an (f,m)-fusion...

  1. Sensor Fault Tolerant Generic Model Control for Nonlinear Systems

    2000-01-01

    A modified Strong Tracking Filter (STF) is used to develop a new approach to sensor fault tolerant control. Generic Model Control (GMC) is used to control the nonlinear process while the process runs normally because of its robust control performance. If a fault occurs in the sensor, a sensor bias vector is then introduced to the output equation of the process model. The sensor bias vector is estimated on-line during every control period using the STF. The estimated sensor bias vector is used to develop a fault detection mechanism to supervise the sensors. When a sensor fault occurs, the conventional GMC is switched to a fault tolerant control scheme, which is, in essence, a state estimation and output prediction based GMC. The laboratory experimental results on a three-tank system demonstrate the effectiveness of the proposed Sensor Fault Tolerant Generic Model Control (SFTGMC) approach.

  2. Guaranteed Cost Fault-Tolerant Control for Networked Control Systems with Sensor Faults

    Qixin Zhu; Kaihong Lu; Guangming Xie; Yonghong Zhu

    2015-01-01

    For the large scale and complicated structure of networked control systems, time-varying sensor faults could inevitably occur when the system works in a poor environment. Guaranteed cost fault-tolerant controller for the new networked control systems with time-varying sensor faults is designed in this paper. Based on time delay of the network transmission environment, the networked control systems with sensor faults are modeled as a discrete-time system with uncertain parameters. And the mode...

  3. Validated Fault Tolerant Architectures for Space Station

    Lala, Jaynarayan H.

    1990-01-01

    Viewgraphs on validated fault tolerant architectures for space station are presented. Topics covered include: fault tolerance approach; advanced information processing system (AIPS); and fault tolerant parallel processor (FTPP).

  4. Fault tolerant hypercube computer system architecture

    Madan, Herb S. (Inventor); Chow, Edward (Inventor)

    1989-01-01

    A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node

  5. Fault tolerant filtering and fault detection for quantum systems driven by fields in single photon states

    Gao, Qing; Dong, Daoyi; Petersen, Ian R.; Rabitz, Herschel

    2016-06-01

    The purpose of this paper is to solve the fault tolerant filtering and fault detection problem for a class of open quantum systems driven by a continuous-mode bosonic input field in single photon states when the systems are subject to stochastic faults. Optimal estimates of both the system observables and the fault process are simultaneously calculated and characterized by a set of coupled recursive quantum stochastic differential equations.

  6. Measurement and analysis of operating system fault tolerance

    Lee, I.; Tang, D.; Iyer, R. K.

    1992-01-01

    This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

  7. Fault Tolerant Services for Safe In-Car Embedded Systems

    Navet, Nicolas; Simonot-Lion, Françoise

    2005-01-01

    Due to the increasing criticality of the functions in terms of safety, embedded automotive systems must now respect stringent dependability constraints despite the faults that may occur in a very harsh environment. In a context where critical functions are distributed over the network, the communication system plays a major role. First, we discuss the main services and functionalities that a communication system should offer for easying the design of fault-tolerant applications in the automot...

  8. Fault-tolerant Control Systems-An Introductory Overview

    Jin Jiang

    2005-01-01

    This paper presents an introductory overview on the development of fault-tolerant control systems. For this reason, the paper is written in a tutorial fashion to summarize some of the important results in this subject area deliberately without going into details in any of them. However, key references are provided from which interested readers can obtain more detailed information on a particular subject. It is necessary to mention that, throughout this paper, no efforts were made to provide an exhaustive coverage on the subject matter. In fact, it is far from it. The paper merely represents the view and experience of its author. It can very well be that some important issues or topics were left out unintentionally. If that is the case, the author sincerely apologizes in advance.After a brief account of fault-tolerant control systems, particularly on the original motivations, and the concept of redundancies, the paper reviews the development of fault-tolerant control systems with highlights to several important issues from a historical perspective. The general approaches to fault-tolerant control has been divided into passive, active, and hybrid approaches. The analysis techniques for active fault-tolerant control systems are also discussed. Practical applications of faulttolerant control are highlighted from a practical and industrial perspective. Finally, some critical issues in this area are discussed as open problems for future research/development in this emerging field.

  9. Data-driven design of fault diagnosis and fault-tolerant control systems

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  10. Guaranteed Cost Fault-Tolerant Control for Networked Control Systems with Sensor Faults

    Qixin Zhu

    2015-01-01

    Full Text Available For the large scale and complicated structure of networked control systems, time-varying sensor faults could inevitably occur when the system works in a poor environment. Guaranteed cost fault-tolerant controller for the new networked control systems with time-varying sensor faults is designed in this paper. Based on time delay of the network transmission environment, the networked control systems with sensor faults are modeled as a discrete-time system with uncertain parameters. And the model of networked control systems is related to the boundary values of the sensor faults. Moreover, using Lyapunov stability theory and linear matrix inequalities (LMI approach, the guaranteed cost fault-tolerant controller is verified to render such networked control systems asymptotically stable. Finally, simulations are included to demonstrate the theoretical results.

  11. Design of fault tolerant control system for steam generator using

    Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

    1998-12-31

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. A structure of the proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. A supervisor alternatively checks a controller and a sensor induced performances to identify which part, a controller or a sensor, is faulty. In order to analyze controller induced performance both an error and a change in error of the system output are chosen as fuzzy variables. The fuzzy logic for a sensor induced performance uses two variables : a deviation between two sensor outputs and its frequency. Fuzzy weighting modulator generates an output signal compensated for faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates well water level by suppressing fault effect of either controllers or sensors. Therefore through duplicating sensors and controllers with the proposed fault tolerant scheme, both a reliability of a steam generator control and sensor system and that of a power plant increase even more. 2 refs., 9 figs., 1 tab. (Author)

  12. A Fault-tolerant Development Methodology for Industrial Control Systems

    Izadi-Zamanabadi, Roozbeh; Thybo, C.

    2004-01-01

    Developing advanced detection schemes is not the lone factor for obtaining a successful fault diagnosis performance. Acquiring significant achievements in applying Fault-tolerance in industrial development requires that fault diagnosis and recovery schemes are developed in a consistent and...

  13. Fault-tolerant design of picture archiving and communication systems

    Reliability is perhaps the most important attribute of a PACS. Any downtime of the system may seriously affect patient care. This paper describes fault-tolerant measures employed in the design of a hospital-wide PACS. Six fault-tolerant measures have been implemented: hardware redundance (networks and archives), data-base backups, monitoring routines for local host processes and network status; uninterruptible power supplied, structured software design techniques, and in-service training of all radiology technologists. A PACS consisting of 13 acquisition nodes, two optical archiving nodes, two data-base server nodes, and five workstation nodes has been developed

  14. Development and application of diagnostic systems to achieve fault tolerance

    Much work is currently being done to develop and apply diagnostic systems that are tolerant to faulted conditions in the process being monitored and in the sensors that measure the critical parameters associated with the process. A fault-tolerant diagnostic system based on state-determination, pattern-recognition techniques is currently undergoing testing and evaluation in certain applications at the EBR-II reactor. Testing and operational experience with the system to date has shown a high degree of tolerance to sensor failures, while being sensitive to very slight changes in the plant operational state. This paper briefly mentions related work being done by others, and describes in more detail the pattern-recognition system and the results of the testing and operational experience with the system at EBR-II. 9 refs., 10 figs

  15. Active Fault Tolerant Control of Livestock Stable Ventilation System

    Gholami, Mehdi

    2011-01-01

    of the hybrid model are estimated by a recursive estimation algorithm, the Extended Kalman Filter (EKF), using experimental data which was provided by an equipped laboratory. Two methods for active fault diagnosis are proposed. The AFD methods excite the system by injecting a so-called excitation...... degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems for livestock buildings in three steps: Deriving a model for the climate control system of a pig-stable. Designing a active fault diagnosis (AFD) algorithm for different kinds of...... fault. Designing a fault tolerant control scheme for the climate control system. In the first step, a conceptual multi-zone model for climate control of a live-stock building is derived. The model is a nonlinear hybrid model. Hybrid systems contain both discrete and continuous components. The parameters...

  16. Fault tolerance control for proton exchange membrane fuel cell systems

    Wu, Xiaojuan; Zhou, Boyang

    2016-08-01

    Fault diagnosis and controller design are two important aspects to improve proton exchange membrane fuel cell (PEMFC) system durability. However, the two tasks are often separately performed. For example, many pressure and voltage controllers have been successfully built. However, these controllers are designed based on the normal operation of PEMFC. When PEMFC faces problems such as flooding or membrane drying, a controller with a specific design must be used. This paper proposes a unique scheme that simultaneously performs fault diagnosis and tolerance control for the PEMFC system. The proposed control strategy consists of a fault diagnosis, a reconfiguration mechanism and adjustable controllers. Using a back-propagation neural network, a model-based fault detection method is employed to detect the PEMFC current fault type (flooding, membrane drying or normal). According to the diagnosis results, the reconfiguration mechanism determines which backup controllers to be selected. Three nonlinear controllers based on feedback linearization approaches are respectively built to adjust the voltage and pressure difference in the case of normal, membrane drying and flooding conditions. The simulation results illustrate that the proposed fault tolerance control strategy can track the voltage and keep the pressure difference at desired levels in faulty conditions.

  17. Fault-tolerant clock synchronization validation methodology. [in computer systems

    Butler, Ricky W.; Palumbo, Daniel L.; Johnson, Sally C.

    1987-01-01

    A validation method for the synchronization subsystem of a fault-tolerant computer system is presented. The high reliability requirement of flight-crucial systems precludes the use of most traditional validation methods. The method presented utilizes formal design proof to uncover design and coding errors and experimentation to validate the assumptions of the design proof. The experimental method is described and illustrated by validating the clock synchronization system of the Software Implemented Fault Tolerance computer. The design proof of the algorithm includes a theorem that defines the maximum skew between any two nonfaulty clocks in the system in terms of specific system parameters. Most of these parameters are deterministic. One crucial parameter is the upper bound on the clock read error, which is stochastic. The probability that this upper bound is exceeded is calculated from data obtained by the measurement of system parameters. This probability is then included in a detailed reliability analysis of the system.

  18. Reliable, fault tolerant control systems for nuclear generating stations

    Two operational features of CANDU Nuclear Power Stations provide for high plant availability. First, the plant re-fuels on-line, thereby eliminating the need for periodic and lengthy refuelling 'outages'. Second, the all plants are controlled by real-time computer systems. Later plants are also protected using real-time computer systems. In the past twenty years, the control systems now operating in 21 plants have achieved an availability of 99.8%, making significant contributions to high CANDU plant capacity factors. This paper describes some of the features that ensure the high degree of system fault tolerance and hence high plant availability. The emphasis will be placed on the fault tolerant features of the computer systems included in the latest reactor design - the CANDU 3 (450MWe). (author)

  19. Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System

    P. N. Kumar

    2007-01-01

    Full Text Available This research describes about the testing of virtual reconfigurable circuit (VRC designed and implemented for a fault tolerant system which averages the (three sensor inputs. The circuits that are to be tested are those which are successfully evolved in this system under different situations such as (i all the three sensors are faultless (ii one of the input sensor fails as open (iii sensors fails as short circuit. The objective of this research is to test the desired optimal circuits evolved by decoding the configuration bit streams. The logic simulation tool used to perform fault simulation is AUSIM (Auburn University Simulator.

  20. Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System

    P. N. Kumar; S. Anandhi; M. Elancheralathan; J. R.P. Perinbam

    2007-01-01

    This research describes about the testing of virtual reconfigurable circuit (VRC) designed and implemented for a fault tolerant system which averages the (three) sensor inputs. The circuits that are to be tested are those which are successfully evolved in this system under different situations such as (i) all the three sensors are faultless (ii) one of the input sensor fails as open (iii) sensors fails as short circuit. The objective of this research is to test the desired optimal circuits ev...

  1. Performance-Oriented Fault Tolerance in Computing Systems

    Borodin, D

    2010-01-01

    In this dissertation we address the overhead reduction of fault tolerance (FT) techniques. Due to technology trends such as decreasing feature sizes and lowering voltage levels, FT is becoming increasingly important in modern computing systems. FT techniques are based on some form of redundancy. It can be space redundancy (additional hardware), time redundancy (multiple executions), and/or information redundancy (additional verification information). This redundancy significantly increases th...

  2. Development and Evaluation of Fault-Tolerant Flight Control Systems

    Song, Yong D.; Gupta, Kajal (Technical Monitor)

    2004-01-01

    The research is concerned with developing a new approach to enhancing fault tolerance of flight control systems. The original motivation for fault-tolerant control comes from the need for safe operation of control elements (e.g. actuators) in the event of hardware failures in high reliability systems. One such example is modem space vehicle subjected to actuator/sensor impairments. A major task in flight control is to revise the control policy to balance impairment detectability and to achieve sufficient robustness. This involves careful selection of types and parameters of the controllers and the impairment detecting filters used. It also involves a decision, upon the identification of some failures, on whether and how a control reconfiguration should take place in order to maintain a certain system performance level. In this project new flight dynamic model under uncertain flight conditions is considered, in which the effects of both ramp and jump faults are reflected. Stabilization algorithms based on neural network and adaptive method are derived. The control algorithms are shown to be effective in dealing with uncertain dynamics due to external disturbances and unpredictable faults. The overall strategy is easy to set up and the computation involved is much less as compared with other strategies. Computer simulation software is developed. A serious of simulation studies have been conducted with varying flight conditions.

  3. Diagnosis and Fault-Tolerant Control for Thruster-Assisted Position Mooring System

    Nguyen, Trong Dong; Blanke, Mogens; Sørensen, Asgeir

    2007-01-01

    Development of fault-tolerant control systems is crucial to maintain safe operation of o®shore installations. The objective of this paper is to develop a fault- tolerant control for thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line's pretens......Development of fault-tolerant control systems is crucial to maintain safe operation of o®shore installations. The objective of this paper is to develop a fault- tolerant control for thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line......'s pretension or line breaks will degrade the performance of the positioning of the vessel. Faults will be detected and isolated through a fault diagnosis procedure. When faults are detected, they can be accommodated through the control action in which only parameter of the controlled plant has to be updated to...

  4. Sliding mode based fault detection, reconstruction and fault tolerant control scheme for motor systems.

    Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed

    2015-07-01

    The fault-tolerant control problem belongs to the domain of complex control systems in which inter-control-disciplinary information and expertise are required. This paper proposes an improved faults detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the output of system to track the desired position reference signal in finite-time and obtain a better dynamic response and anti-disturbance performance. But this controller cannot deal directly with total system failures. However an appropriate combination of the adopted SMC and sliding mode observer (SMO), later it is designed to on-line detect and reconstruct the faults and also to give a sensorless control strategy which can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved, using the Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. PMID:25747198

  5. Fault Tolerant Software: a Multi Agent System Solution

    Caponetti, Fabio; Bergantino, Nicola; Longhi, Sauro

    2009-01-01

    Development of high dependable systems remains a labour intensive task. This paper explores recent advances on the adaptation of the software agent architecture for control application while looking to dependability issues. Multiple agent systems theory will be reviewed giving methods to supervise...... it. Software ageing is shown to be the most common problem and rejuvenation its counteract. The paper will show how an agent population can be monitored, faulty agents isolated and reloaded in a healthy state, hence rejuvenated. The aim is to propose an architecture as basis for the design of control...... software able to tolerate faults and residual bugs without the need of maintenance stops....

  6. Fault-Tolerant Relative Navigation System (RNS) for Docking Project

    National Aeronautics and Space Administration — A method is propsed to develop a sensor fusion process for blending GPS/IMU/EO data for fault tolerant rendezvous and docking of spacecraft. The methodology takes...

  7. Diagnostic software and fault tolerant microprocessor based system architectures

    In numerous industrial applications including power generation, the availability of electronic systems to perform the tasks assigned has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost effective microprocessor based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include: dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic and depends greatly on the structure of major components and systems external to the electronic system itself as well as the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed by describing recent examples of fault tolerant microprocessor based system architectures which include the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilize unique internal architectures that address the reliability, availability, and the communications issues while improving maintainability and man-machine interfaces

  8. Fault Tolerant Feedback Control

    Stoustrup, Jakob; Niemann, H.

    2001-01-01

    An architecture for fault tolerant feedback controllers based on the Youla parameterization is suggested. It is shown that the Youla parameterization will give a residual vector directly in connection with the fault diagnosis part of the fault tolerant feedback controller. It turns out that there...... is a separation be-tween the feedback controller and the fault tolerant part. The closed loop feedback properties are handled by the nominal feedback controller and the fault tolerant part is handled by the design of the Youla parameter. The design of the fault tolerant part will not affect the...... design of the nominal feedback con-troller....

  9. Distributed Evaluation Functions for Fault Tolerant Multi-Rover Systems

    Agogino, Adrian; Turner, Kagan

    2005-01-01

    The ability to evolve fault tolerant control strategies for large collections of agents is critical to the successful application of evolutionary strategies to domains where failures are common. Furthermore, while evolutionary algorithms have been highly successful in discovering single-agent control strategies, extending such algorithms to multiagent domains has proven to be difficult. In this paper we present a method for shaping evaluation functions for agents that provide control strategies that both are tolerant to different types of failures and lead to coordinated behavior in a multi-agent setting. This method neither relies of a centralized strategy (susceptible to single point of failures) nor a distributed strategy where each agent uses a system wide evaluation function (severe credit assignment problem). In a multi-rover problem, we show that agents using our agent-specific evaluation perform up to 500% better than agents using the system evaluation. In addition we show that agents are still able to maintain a high level of performance when up to 60% of the agents fail due to actuator, communication or controller faults.

  10. Disturbance observer based fault estimation and dynamic output feedback fault tolerant control for fuzzy systems with local nonlinear models.

    Han, Jian; Zhang, Huaguang; Wang, Yingchun; Liu, Yang

    2015-11-01

    This paper addresses the problems of fault estimation (FE) and fault tolerant control (FTC) for fuzzy systems with local nonlinear models, external disturbances, sensor and actuator faults, simultaneously. Disturbance observer (DO) and FE observer are designed, simultaneously. Compared with the existing results, the proposed observer is with a wider application range. Using the estimation information, a novel fuzzy dynamic output feedback fault tolerant controller (DOFFTC) is designed. The controller can be used for the fuzzy systems with unmeasurable local nonlinear models, mismatched input disturbances, and measurement output affecting by sensor faults and disturbances. At last, the simulation shows the effectiveness of the proposed methods. PMID:26456728

  11. Adaptive fault-tolerant control of linear systems with actuator saturation and L2-disturbances

    Wei GUAN; Guanghong YANG

    2009-01-01

    This paper studies the problem of designing adaptive fault-tolerant H-infinity controllers for linear timeinvariant systems with actuator saturation. The disturbance tolerance ability of the closed-loop system is measured by an optimal index. The notion of an adaptive H-infinity performance index is proposed to describe the disturbance attenuation performances of closed-loop systems. New methods for designing indirect adaptive fault-tolerant controllers via state feedback are presented for actuator fault compensations. Based on the on-line estimation of eventual faults, the adaptive fault-tolerant controller parameters are updated automatically to compensate for the fault effects on systems. The designs are developed in the framework of the linear matrix inequality (LMI) approach, which can guarantee the disturbance tolerance ability and adaptive H-infinity performances of closed-loop systems in the cases of actuator saturation and actuator failures. An example is given to illustrate the efficiency of the design method.

  12. Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification

    Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

    2000-01-01

    In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.

  13. Validation Methods for Fault-Tolerant avionics and control systems, working group meeting 1

    1979-01-01

    The proceedings of the first working group meeting on validation methods for fault tolerant computer design are presented. The state of the art in fault tolerant computer validation was examined in order to provide a framework for future discussions concerning research issues for the validation of fault tolerant avionics and flight control systems. The development of positions concerning critical aspects of the validation process are given.

  14. Fault-tolerant for Electric Vehicles Drive System Sensor Failure

    Zhang Liwei

    2013-10-01

    Full Text Available When EV failure happens, it needs to take some fault-tolerant method to ensure people’s safety. When the current sensor and speed sensor are out of work, the software fault-tolerant control algorithm switching strategy can be used. This paper has done theoretical analysis of the rotor field-oriented vectoe control algorithm into the open loop constant V/F control algorithm, and the phase angle compensation method is used to reduce the shock of current and torque, and simulation is done in MATLAB/Simulink.    

  15. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    Highlights: • Integrated fault coverage is introduced for reflecting characteristics of fault-tolerant techniques in the reliability model of digital protection system in NPPs. • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation process. • With integrated fault coverage, the unavailability of repairable component of DPS can be estimated. • The new developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes of unavailability according to variation of diverse factors. - Abstract: With the improvement of digital technologies, digital protection system (DPS) has more multiple sophisticated fault-tolerant techniques (FTTs), in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is vital factor of FTT in reliability. However, the fault detection coverage is insufficient to reflect the effects of various FTTs in reliability model. To reflect characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of FTT from detection to fail-safe generation process. A model has been developed to estimate the unavailability of repairable component of DPS using the integrated fault coverage. The new developed model can quantify unavailability according to a diversity of conditions. Sensitivity studies are performed to ascertain important variables which affect the integrated fault coverage and unavailability

  16. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    Kim, Bo Gyung, E-mail: bogyungkim@kaist.ac.kr [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Kang, Hyun Gook [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Department of Nuclear Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi (United Arab Emirates); Kim, Hee Eun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Lee, Seung Jun [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 1045, Daedeok-daero, Daejeon 305-353 (Korea, Republic of); Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)

    2013-12-15

    Highlights: • Integrated fault coverage is introduced for reflecting characteristics of fault-tolerant techniques in the reliability model of digital protection system in NPPs. • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation process. • With integrated fault coverage, the unavailability of repairable component of DPS can be estimated. • The new developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes of unavailability according to variation of diverse factors. - Abstract: With the improvement of digital technologies, digital protection system (DPS) has more multiple sophisticated fault-tolerant techniques (FTTs), in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is vital factor of FTT in reliability. However, the fault detection coverage is insufficient to reflect the effects of various FTTs in reliability model. To reflect characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of FTT from detection to fail-safe generation process. A model has been developed to estimate the unavailability of repairable component of DPS using the integrated fault coverage. The new developed model can quantify unavailability according to a diversity of conditions. Sensitivity studies are performed to ascertain important variables which affect the integrated fault coverage and unavailability.

  17. Fault diagnosis and fault-tolerant control strategies for non-linear systems analytical and soft computing approaches

    Witczak, Marcin

    2014-01-01

      This book presents selected fault diagnosis and fault-tolerant control strategies for non-linear systems in a unified framework. In particular, starting from advanced state estimation strategies up to modern soft computing, the discrete-time description of the system is employed Part I of the book presents original research results regarding state estimation and neural networks for robust fault diagnosis. Part II is devoted to the presentation of integrated fault diagnosis and fault-tolerant systems. It starts with a general fault-tolerant control framework, which is then extended by introducing robustness with respect to various uncertainties. Finally, it is shown how to implement the proposed framework for fuzzy systems described by the well-known Takagi–Sugeno models. This research monograph is intended for researchers, engineers, and advanced postgraduate students in control and electrical engineering, computer science,as well as mechanical and chemical engineering.

  18. Evaluation of digital fault-tolerant architectures for nuclear power plant control systems

    Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those not fault tolerant, is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failures modes that may be important in nuclear power plants. 7 refs., 4 tabs

  19. Ship Propulsion System as a Benchmark for Fault-Tolerant Control

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    1998-01-01

    -tolerant control is a fairly new area. The paper presents a ship propulsion system as a benchmark that should be useful as a platform for development of new ideas and comparison of methods. The benchmark has two main elements. One is development of efficient FDI algorithms, the other is analysis and implementation......Fault-tolerant control combines fault detection and isolation techniques with supervisory control to achieve autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade the extension to fault...... of autonomous fault accommodation. A benchmark kit can be obtained from the authors....

  20. Boolean Logic with Fault Tolerant Coding

    Alagoz, B. Baykant

    2009-01-01

    Error detectable and error correctable coding in Hamming space was researched to discover possible fault tolerant coding constellations, which can implement Boolean logic with fault tolerant property. Basic logic operators of the Boolean algebra were developed to apply fault tolerant coding in the logic circuits. It was shown that application of three-bit fault tolerant codes have provided the digital system skill of auto-recovery without need for designing additional-fault tolerance mechanisms.

  1. Industrial Cost-Benefit Assessment for Fault-tolerant Control Systems

    Thybo, C.; Blanke, M.

    1998-01-01

    Economic aspects are decisive for industrial acceptance of research concepts including the promising ideas in fault tolerant control. Fault tolerance is the ability of a system to detect, isolate and accommodate a fault, such that simple faults in a sub-system do not develop into failures at a...... system level. In a design phase for an industrial system, possibilities span from fail safe design where any single point failure is accommodated by hardware, over fault-tolerant design where selected faults are handled without extra hardware, to fault-ignorant design where no extra precaution is taken...... against failure. The paper describes the assessments needed to find the right path for new industrial designs. The economic decisions in the design phase are discussed: cost of different failures, profits associated with available benefits, investments needed for development and life-time support. The...

  2. Software fault tolerance

    Strigini, Lorenzo

    1990-01-01

    Software design faults are a cause of major concern, and their relative importance is growing as techniques for tolerating hardware faults gain wider acceptance. The application of fault tolerance to design faults is both increasing, in particular in some life-critical applications, and controversial, due to the imperfect state of knowledge about it. This paper surveys the existing applications and research results, to help the reader form an initial picture of the existing possibilities, and...

  3. Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis

    Gholami, Mehdi; Cocquempot, vincent; Schiøler, Henrik; Bak, Thomas

    2011-01-01

    In this paper we design a passive fault tolerant controller against actuator faults for discretetime piecewise affine (PWA) systems. By using dissipativity theory and H analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In the cur...... current paper, the PWA system switches not only due to the state but also due to the control input. The method is applied on a large scale livestock ventilation model....

  4. Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time...

  5. Energy-Aware Synthesis of Fault-Tolerant Schedules for Real-Time Distributed Embedded Systems

    Poulsen, Kåre Harbo; Pop, Paul; Izosimov, Viacheslav

    This paper presents a design optimisation tool for distributed embedded real-time systems that 1) decides mapping, fault-tolerance policy and generates a fault-tolerant schedule, 2) is targeted for hard real-time, 3) has hard reliability goal, 4) generates static schedule for processes and messages...

  6. High-Intensity Radiated Field Fault-Injection Experiment for a Fault-Tolerant Distributed Communication System

    Yates, Amy M.; Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Gonzalez, Oscar R.; Gray, W. Steven

    2010-01-01

    Safety-critical distributed flight control systems require robustness in the presence of faults. In general, these systems consist of a number of input/output (I/O) and computation nodes interacting through a fault-tolerant data communication system. The communication system transfers sensor data and control commands and can handle most faults under typical operating conditions. However, the performance of the closed-loop system can be adversely affected as a result of operating in harsh environments. In particular, High-Intensity Radiated Field (HIRF) environments have the potential to cause random fault manifestations in individual avionic components and to generate simultaneous system-wide communication faults that overwhelm existing fault management mechanisms. This paper presents the design of an experiment conducted at the NASA Langley Research Center's HIRF Laboratory to statistically characterize the faults that a HIRF environment can trigger on a single node of a distributed flight control system.

  7. Towards fault-tolerant decision support systems for ship operator guidance

    Nielsen, Ulrik Dam; Lajic, Zoran; Jensen, Jørgen Juncher

    2012-01-01

    Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis, when the ship responses are assumed to be linear in the wave excitation. A novel num...

  8. Fault-Tolerant Control of the Road Wheel Subsystem in a Steer-By-Wire System

    Bing Zheng

    2008-01-01

    Full Text Available This paper describes a fault-tolerant steer-by-wire road wheel control system. With dual motor and dual microcontroller architecture, this system has the capability to tolerate single-point failures without degrading the control system performance. The arbitration bus, mechanical arrangement of motors, and the developed control algorithm allow the system to reconfigure itself automatically in the event of a single-point fault, and assure a smooth reconfiguration process. Both simulation and experimental results illustrate the effectiveness of the proposed fault-tolerant control system.

  9. Advanced information processing system - Status report. [for fault tolerant and damage tolerant data processing for aerospace vehicles

    Brock, L. D.; Lala, J.

    1986-01-01

    The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles. The AIPS architecture also has attributes to enhance system effectiveness such as graceful degradation, growth and change tolerance, integrability, etc. Two key building blocks being developed by the AIPS program are a fault and damage tolerant processor and communication network. A proof-of-concept system is now being built and will be tested to demonstrate the validity and performance of the AIPS concepts.

  10. Efficient Fault Tree Analysis of Complex Fault Tolerant Multiple-Phased Systems

    MO Yuchang; LIU Hongwei; YANG Xiaozong

    2007-01-01

    Fault tolerant multiple phased systems (FTMPS), i.e., systems whose critical components are independently replicated and whose operational life can be partitioned in a set of disjoint periods, are called "phases". Because of their deployment in critical applications, their reliability analysis is a task of primary relevance to validate the designs. Fault tree analysis based on binary decision diagram (BDD) is one of the most commonly used techniques for FTMPS reliability analysis. To utilize the technique the fault tree structure of FTMPS needs to be converted into the corresponding BDD format. Our research work shows that the system BDD generation algorithms presented in the literature are too inefficient to be used for industrial complex FTPMS because of the problems, such as variable ordering and combination of large BDDs. This paper presents a more efficient approach consisting of a flatting pre-processing technique, a proved efficient ordering heuristic and a bottom-up generation algorithm. The approach tries to combine share-variable BDDs by complex combination operation firstly and then combine no-share-variable BDDs using simple combination operation, thus to alvoid the intensive computations caused by large BDD combination operations. An example FTMPS is analyzed to illustrate the advantages of our approach.

  11. Fault-Tolerant Control using Adaptive Time-Frequency Method in Bearing Fault Detection for DFIG Wind Energy System

    Suratsavadee Koonlaboon KORKUA

    2015-02-01

    Full Text Available With the advances in power electronic technology, doubly-fed induction generators (DFIG have increasingly drawn the interest of the wind turbine industry. To ensure the reliable operation and power quality of wind power systems, the fault-tolerant control for DFIG is studied in this paper. The fault-tolerant controller is designed to maintain an acceptable level of performance during bearing fault conditions. Based on measured motor current data, an adaptive statistical time-frequency method is then used to detect the fault occurrence in the system; the controller then compensates for faulty conditions. The feature vectors, including frequency components located in the neighborhood of the characteristic fault frequencies, are first extracted and then used to estimate the next sampling stator side current, in order to better perform the current control. Early fault detection, isolation and successful reconfiguration would be very beneficial in a wind energy conversion system. The feasibility of this fault-tolerant controller has been proven by means of mathematical modeling and digital simulation based on Matlab/Simulink. The simulation results of the generator output show the effectiveness of the proposed fault-tolerant controller.

  12. Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System

    J. Chen

    2008-03-01

    Full Text Available This paper presents an enhanced robust control design structure to realise fault tolerance towards sensor faults suitable for multi-input-multi-output (MIMO systems implementation. The proposed design permits fault detection and controller elements to be designed with considerations to stability and robustness towards uncertainties besides multiple faults environment on a common mathematical platform. This framework can also cater to systems requiring fast responses. A design example is illustrated with a fast, multivariable and unstable system, that is, the double inverted pendulum system. Results indicate the potential of this design framework to handle fast systems with multiple sensor faults.

  13. Fault-Tolerant Design of Spaceborne Mass Memory System

    张宇宁; 常亮; 杨根庆; 李华旺

    2010-01-01

    A fault-tolerant spaceborne mass memory architecture is presented based on entirely commercial-off-theshelf components.The highly modularized and scalable memory kernel supports the hierarchical design and is well suited to redundancy structure.Error correcting code(ECC) and periodical scrubbing are used to deal with bit errors induced by single event upset.For 8-bit wide devices, the parallel Reed Solomon(10, 8) can perform coder/decoder calculations in one clock cycle, achieving a data rate of several Gb/...

  14. A Reliable Fault-Tolerant Scheduling Algorithm for Real Time Embedded Systems

    Arar, Chafik; Kalla, Hamoudi; Kalla, Salim; Riadh, Hocine

    2013-01-01

    In this paper, we propose a fault-tolerant scheduling for realtime embedded systems. Our scheduling algorithm is dedicated to multibus heterogeneous architectures, which take as input a given system description and a given fault hypothesis. It is based on a data fragmentation and passive redundancy, which allow fast fault detection/retransmission and efficient use of buses. Our scheduling approach consist of a list scheduling heuristic based on a Global System Failure Rate (GSFR). In order to...

  15. Fault recovery characteristics of the fault tolerant multi-processor

    Padilla, Peter A.

    1990-01-01

    The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.

  16. Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

    Lumsdaine, Andrew

    2013-03-08

    The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack from the application, numeric libraries, and programming language runtime to other common system components such as jobs schedulers, resource managers, and monitoring tools.

  17. Computationally efficient and numerically stable reliability bounds for repairable fault-tolerant systems

    Carrasco, Juan A.

    2002-01-01

    The transient analysis of large continuous time Markov reliability models of repairable fault-tolerant systems is computationally expensive due to model stiffness. In this paper, we develop and analyze a method to compute bounds for a measure defined on a particular, but quite wide, class of continuous time Markov models, encompassing both exact and bounding continuous time Markov unreliability models of fault-tolerant systems. The method is numerically stable and computes the bounds with wel...

  18. Evaluation of digital fault-tolerant architectures for nuclear power plant control systems

    This paper reports on four fault-tolerant architectures that were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant, both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault-tolerant systems. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants

  19. Fault-tolerant flight control system combining expert system and analytical redundancy concepts

    Handelman, Dave

    1987-01-01

    This research involves the development of a knowledge-based fault-tolerant flight control system. A software architecture is presented that integrates quantitative analytical redundancy techniques and heuristic expert system problem solving concepts for the purpose of in-flight, real-time failure accommodation.

  20. Fault Tolerant Software Architectures

    Saridakis, Titos; Issarny, Valérie

    1998-01-01

    Coping explicitly with failures during the conception and the design of software development complicates significantly the designer's job. The design complexity leads to software descriptions difficult to understand, which have to undergo many simplifications until their first functioning version. To support the systematic development of complex, fault tolerant software, this paper proposes a layered framework for the analysis of the fault tolerance software properties, where the top-most lay...

  1. Active fault tolerant control of piecewise affine systems with reference tracking and input constraints

    Gholami, M.; Cocquempot, V.; Schiøler, H.; Bak, Thomas

    2014-01-01

    An active fault tolerant control (AFTC) method is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. The AFTC framework contains a supervisory scheme, which selects a suitable controller in a set of controllers such that the stability and an acceptable...

  2. Diagnosis and fault-tolerant control

    Blanke, Mogens; Lunze, Jan; Staroswiecki, Marcel

    2016-01-01

    Fault-tolerant control aims at a gradual shutdown response in automated systems when faults occur. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults, which bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. It also introduces design methods suitable for diagnostic systems and fault-tolerant controllers for continuous processes that are described by analytical models of discrete-event systems represented by automata. The book is suitable for engineering students, engineers in industry and researchers who wish to get an overview of the variety of approaches to process diagnosis and fault-tolerant contro...

  3. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in multi-agent system using distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis. PMID:25415998

  4. Fault-tolerant interconnection network and image-processing applications for the PASM parallel processing system

    The demand for very high speed data processing coupled with falling hardware costs has made large-scale parallel and distributed computer systems both desirable and feasible. Two modes of parallel processing are single instruction stream-multiple data stream (SIMD) and multiple instruction stream-multiple data stream (MIMD). PASM, a partitionable SIMD/MIMD system, is a reconfigurable multimicroprocessor system being designed for image processing and pattern recognition. An important component of these systems is the interconnection network, the mechanism for communication among the computation nodes and memories. Assuring high reliability for such complex systems is a significant task. Thus, a crucial practical aspect of an interconnection network is fault tolerance. In answer to this need, the Extra Stage Cube (ESC), a fault-tolerant, multistage cube-type interconnection network, is define. The fault tolerance of the ESC is explored for both single and multiple faults, routing tags are defined, and consideration is given to permuting data and partitioning the ESC in the presence of faults. The ESC is compared with other fault-tolerant multistage networks. Finally, reliability of the ESC and an enhanced version of it are investigated

  5. Validation Methods Research for Fault-Tolerant Avionics and Control Systems: Working Group Meeting, 2

    Gault, J. W. (Editor); Trivedi, K. S. (Editor); Clary, J. B. (Editor)

    1980-01-01

    The validation process comprises the activities required to insure the agreement of system realization with system specification. A preliminary validation methodology for fault tolerant systems documented. A general framework for a validation methodology is presented along with a set of specific tasks intended for the validation of two specimen system, SIFT and FTMP. Two major areas of research are identified. First, are those activities required to support the ongoing development of the validation process itself, and second, are those activities required to support the design, development, and understanding of fault tolerant systems.

  6. Pivotal decomposition for reliability analysis of fault tolerant control systems on unmanned aerial vehicles

    In this paper, we describe a framework to efficiently assess the reliability of fault tolerant control systems on low-cost unmanned aerial vehicles. The analysis is developed for a system consisting of a fixed number of actuators. In addition, the system includes a scheme to detect failures in individual actuators and, as a consequence, switch between different control algorithms for automatic operation of the actuators. Existing dynamic reliability analysis methods are insufficient for this class of systems because the coverage parameters for different actuator failures can be time-varying, correlated, and difficult to obtain in practice. We address these issues by combining new fault detection performance metrics with pivotal decomposition. These new metrics capture the interactions in different fault detection channels, and can be computed from stochastic models of fault detection algorithms. Our approach also decouples the high dimensional analysis problem into low dimensional sub-problems, yielding a computationally efficient analysis. Finally, we demonstrate the proposed method on a numerical example. The analysis results are also verified by Monte Carlo simulations. - Highlights: • We study fault tolerant control (FTC) systems on low-cost unmanned aerial vehicles. • We build a reliability structure model for FTC systems. • New fault detection performance metrics are integrated via pivotal decomposition. • The fault detection metrics capture the interactions in fault detection channels. • Numerical results show that FTC techniques can improve system reliability

  7. The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

    Graham, Richard L [ORNL; Hursey, Joshua J [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Boehm, Swen [ORNL

    2012-01-01

    Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.

  8. Application of Joint Parameter Identification and State Estimation to a Fault-Tolerant Robot System

    Sun, Zhen; Yang, Zhenyu

    The joint parameter identification and state estimation technique is applied to develop a fault-tolerant space robot system. The potential faults in the considered system are abrupt parametric faults, which indicate that some system parameters will immediately deviate from their nominal values if a......, it would further simplify the reconfigurable design task and possibly speed up the system recovery, if the system state information under the new operating circumstance can be available along with faulty parameter information. The joint parameter identification and state estimation using the combined...

  9. Energy-efficient fault tolerance in multiprocessor real-time systems

    Guo, Yifeng

    The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  10. Preface of the special issue on Advances in Control and Fault-Tolerant Systems

    Korbicz, Jozef; Maquin, Didier; THEILLIOL, DIDIER

    2012-01-01

    Today's automatic control systems are of high degrees of integration, complexity, embedding and networking of heterogeneous entities. This trend is driven by the industrial needs for achieving new technical performance and meeting additional performance demands. A most critical and important issue surrounding the design and operation of complex automatic systems is the application of Fault Detection and Isolation and Fault-Tolerant Control (FDI/FTC) technology, aiming at guaranteeing high sys...

  11. Science Letters: Low-cost fault tolerance in evolvable multiprocessor systems: a graceful degradation approach

    Shervin VAKILI; Sied Mehdi FAKHRAIE; Siamak MOHAMMADI; Ali AHMADI

    2009-01-01

    The evolvable multiprocessor (EvoMP), as a novel multiprocessor system-on-chip (MPSoC) machine with evolvable task decomposition and scheduling, claims a major feature of low-cost and efficient fault tolerance. Non-centralized control and adaptive distribution of the program among the available processors are two major capabilities of this platform, which remarkably help to achieve an efficient fault tolerance scheme. This letter presents the operational as well as architectural details of this fault tolerance scheme. In this method, when a processor becomes faulty, it will be eliminated of contribution in program execution in remaining run-time. This method also utilizes dynamic rescheduling capability of the system to achieve the maximum possible efficiency after processor reduction. The results confirm the efficiency and remarkable advantages of the proposed approach over common redundancy based techniques in similar systems.

  12. Novel fault tolerant modular system architecture for I and C applications

    Novel fault tolerant 3U modular system architecture has been developed for safety related and safety critical I and C systems of the reactor. Design innovatively utilizes simplest multi-drop serial bus called Inter-Integrated Circuits (I2C) Bus for system operation with simplicity, fault tolerance and online maintainability (hot swap). I2C bus failure modes analysis was done and system design was hardened for possible failure modes. System backplane uses only passive components, dual redundant I2C buses, data consistency checks and geographical addressing scheme to tackle bus lock ups/stuck buses and bit flips in data transactions. Dual CPU active/standby redundancy architecture with hot swap implements tolerance for CPU software stuck up conditions and hardware faults. System cards implement hot swap for online maintainability, power supply fault containment, communication buses fault containment and I/O channel to channel isolation and independency. Typical applications for pure hardwired (without real time software) Core Temperature Monitoring System for FBRs, as a Universal Signal Conditioning System for safety related I and C systems and as a complete control system for non nuclear safety systems have also been discussed. (author)

  13. Design of fault tolerant control system for steam generator using fuzzy logic

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. A structure of the proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. A supervisor alternatively checks a controller and a sensor induced performances to identify which part, a controller or a sensor, is faulty. In order to analyze controller induced performance both an error and a change in error of the system output are chosen as fuzzy variables. The fuzzy logic for a sensor induced performance uses two variables : a deviation between two sensor outputs and its frequency. Fuzzy weighting modulator generates an output signal compensated for faulty input signal. Simulations show that the proposed fault tolerant control scheme for a stem generator regulates well water level by suppressing fault effect of either controllers or sensors. Therefore through duplicating sensors and controllers with the proposed fault tolerant scheme, both a reliability of a steam generator control and sensor system and that of a power plant increase even more

  14. A real-time fault-tolerant scheduling algorithm with low dependability cost in on-board computer system

    WANG Pei-dong; WEI Zhen-hua

    2008-01-01

    To make the on-board computer system more dependable and real-time in a satellite, an algorithm of the fault-tolerant scheduling in the on-board computer system with high priority recovery is proposed in this paper. This algorithm can schedule the on-board fault-tolerant tasks in real time. Due to the use of dependability cost, the overhead of scheduling the fault-tolerant tasks can be reduced. The mechanism of the high priority recovery will improve the response to recovery tasks. The fault-tolerant scheduling model is presented simulation results validate the correctness and feasibility of the proposed algorithm.

  15. Fault tolerance control of phase current in permanent magnet synchronous motor control system

    Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

    2014-08-01

    As the Photoelectric tracking system develops from earth based platform to all kinds of moving platform such as plane based, ship based, car based, satellite based and missile based, the fault tolerance control system of phase current sensor is studied in order to detect and control of failure of phase current sensor on a moving platform. By using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, the failure detection and fault control of three phase current sensor is achieved. Under such conditions as one failure, two failures and three failures, fault tolerance is able to be controlled. The reason why under the method, there exists error between fault tolerance control and actual phase current, is analyzed, and solution to weaken the error is provided. The experiment based on permanent magnet synchronous motor system is conducted, and the method is proven to be capable of detecting the failure of phase current sensor effectively and precisely, and controlling the fault tolerance simultaneously. With this method, even though all the three phase current sensors malfunction, the moving platform can still work by reconstructing the phase current of the motor.

  16. Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    2012-01-01

    In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the faultoccurrence information to reduce the overhead due to fault tolerance. Processes and...

  17. The fault-tolerant multiprocessor computer

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.; Butler, R.W.

    1986-01-01

    This book presents studies of two fault-tolerant computer systems designed to meet the extreme reliability requirements for safety- critical functions in advanced NASA vehicles , plus a study of potential architectures for future flight control fault-tolerant systems, which might succeed the current generation of computers. While it is understood that these studies were done for NASA, they also have practical commercial applicability. The fault-tolerant multiprocessor (FTMP) architecture is a high reliability computer concept. The basic organization of the FTMP is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and l/O) bus. Replication and tight synchronization of all elements and hardware voting are employed to detect and correct any single fault. Reconfiguration is then employed to ''repair'' a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.

  18. Fault tolerant distributed real time computer systems for I and C of prototype fast breeder reactor

    Highlights: • Architecture of distributed real time computer system (DRTCS) used in I and C of PFBR is explained. • Fault tolerant (hot standby) architecture, fault detection and switch over are detailed. • Scaled down model was used to study functional and performance requirements of DRTCS. • Quality of service parameters for scaled down model was critically studied. - Abstract: Prototype fast breeder reactor (PFBR) is in the advanced stage of construction at Kalpakkam, India. Three-tier architecture is adopted for instrumentation and control (I and C) of PFBR wherein bottom tier consists of real time computer (RTC) systems, middle tier consists of process computers and top tier constitutes of display stations. These RTC systems are geographically distributed and networked together with process computers and display stations. Hot standby architecture comprising of dual redundant RTC systems with switch over logic system is deployed in order to achieve fault tolerance. Fault tolerant dual redundant network connectivity is provided in each RTC system and TCP/IP protocol is selected for network communication. In order to assess the performance of distributed RTC systems, scaled down model was developed with 9 representative systems and nearly 15% of I and C signals of PFBR were connected and monitored. Functional and performance testing were carried out for each RTC system and the fault tolerant characteristics were studied by creating various faults into the system and observed the performance. Various quality of service parameters like connection establishment delay, priority parameter, transit delay, throughput, residual error ratio, etc., are critically studied for the network

  19. Fault Injection and Monitoring Capability for a Fault-Tolerant Distributed Computation System

    Torres-Pomales, Wilfredo; Yates, Amy M.; Malekpour, Mahyar R.

    2010-01-01

    The Configurable Fault-Injection and Monitoring System (CFIMS) is intended for the experimental characterization of effects caused by a variety of adverse conditions on a distributed computation system running flight control applications. A product of research collaboration between NASA Langley Research Center and Old Dominion University, the CFIMS is the main research tool for generating actual fault response data with which to develop and validate analytical performance models and design methodologies for the mitigation of fault effects in distributed flight control systems. Rather than a fixed design solution, the CFIMS is a flexible system that enables the systematic exploration of the problem space and can be adapted to meet the evolving needs of the research. The CFIMS has the capabilities of system-under-test (SUT) functional stimulus generation, fault injection and state monitoring, all of which are supported by a configuration capability for setting up the system as desired for a particular experiment. This report summarizes the work accomplished so far in the development of the CFIMS concept and documents the first design realization.

  20. Fault Tolerant Computer Architecture

    Sorin, Daniel

    2009-01-01

    For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

  1. Cost and benefits design optimization model for fault tolerant flight control systems

    Rose, J.

    1982-01-01

    Requirements and specifications for a method of optimizing the design of fault-tolerant flight control systems are provided. Algorithms that could be used for developing new and modifying existing computer programs are also provided, with recommendations for follow-on work.

  2. Safety verification of a fault tolerant reconfigurable autonomous goal-based robotic control system

    Braman, Julia M. B.; Murray, Richard M.; Wagner, David A.

    2007-01-01

    Fault tolerance and safety verification of control systems are essential for the success of autonomous robotic systems. A control architecture called Mission Data System (MDS), developed at the Jet Propulsion Laboratory, takes a goal-based control approach. In this paper, a method for converting goal network control programs into linear hybrid systems is developed. The linear hybrid system can then be verified for safety in the presence of failures using existing symbo...

  3. Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems

    Choi, Uimin; Lee, Kyo Beum; Blaabjerg, Frede

    2014-01-01

    -tolerant strategy is explained by dividing into two cases: the faulty condition of half-bridge switches and the neutral-point switches. The performance of the T-type inverter system improves considerably by the proposed fault tolerant algorithm when a switch fails. The roposed method does not require additional......This paper proposes a new diagnosis method of an open-switch fault and fault-tolerant control strategy for T-type three-level inverter systems. The location of faulty switch can be identified by the average of normalized phase current and the change of the neutral-point voltage. The proposed fault...

  4. Adaptive Fault-Tolerant Control of Uncertain Nonlinear Large-Scale Systems With Unknown Dead Zone.

    Chen, Mou; Tao, Gang

    2016-08-01

    In this paper, an adaptive neural fault-tolerant control scheme is proposed and analyzed for a class of uncertain nonlinear large-scale systems with unknown dead zone and external disturbances. To tackle the unknown nonlinear interaction functions in the large-scale system, the radial basis function neural network (RBFNN) is employed to approximate them. To further handle the unknown approximation errors and the effects of the unknown dead zone and external disturbances, integrated as the compounded disturbances, the corresponding disturbance observers are developed for their estimations. Based on the outputs of the RBFNN and the disturbance observer, the adaptive neural fault-tolerant control scheme is designed for uncertain nonlinear large-scale systems by using a decentralized backstepping technique. The closed-loop stability of the adaptive control system is rigorously proved via Lyapunov analysis and the satisfactory tracking performance is achieved under the integrated effects of unknown dead zone, actuator fault, and unknown external disturbances. Simulation results of a mass-spring-damper system are given to illustrate the effectiveness of the proposed adaptive neural fault-tolerant control scheme for uncertain nonlinear large-scale systems. PMID:26340792

  5. Fault-Tolerant Heat Exchanger

    Izenson, Michael G.; Crowley, Christopher J.

    2005-01-01

    A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

  6. Robust fault-tolerant H∞ control of active suspension systems with finite-frequency constraint

    Wang, Rongrong; Jing, Hui; Karimi, Hamid Reza; Chen, Nan

    2015-10-01

    In this paper, the robust fault-tolerant (FT) H∞ control problem of active suspension systems with finite-frequency constraint is investigated. A full-car model is employed in the controller design such that the heave, pitch and roll motions can be simultaneously controlled. Both the actuator faults and external disturbances are considered in the controller synthesis. As the human body is more sensitive to the vertical vibration in 4-8 Hz, robust H∞ control with this finite-frequency constraint is designed. Other performances such as suspension deflection and actuator saturation are also considered. As some of the states such as the sprung mass pitch and roll angles are hard to measure, a robust H∞ dynamic output-feedback controller with fault tolerant ability is proposed. Simulation results show the performance of the proposed controller.

  7. Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network

    Ahmad Rostami

    2010-09-01

    Full Text Available Mobile agents are computer programs that act autonomously on behalf of a user or its owner and travel through a network of heterogeneous machines. Fault tolerance is important in their itinerary. In this paper, existent methods of fault tolerance in mobile agents are described which they are considered in linear network topology. In the methods three agents are used to fault tolerance by cooperating to each others for detecting and recovering server and agent failure. Three types of agents are: actual agent which performs programs for its owner, witness agent which monitors the actual agent and the witness agent after itself, probe which is sent for recovery the actual agent or the witness agent on the side of the witness agent. Communication mechanism in the methods is message passing between these agents. The methods are considered in linear network. We introduce our witness agent approach for fault tolerance mobile agent systems in Two Dimensional Mesh (2D-Mesh Network. Indeed Our approach minimizes Witness-Dependency in this network and then represents its algorithm.

  8. A Systematic Approach to Sensitivity Analysis of Fault Tolerant Systems in NMR Architecture

    Kourosh Aslansefat

    2015-01-01

    Full Text Available A fault tree illustrates the ways through which a system fails. It states different ways in which combination of faulty components result in an undesired event in the system. Being used in phases such as designing and exploiting industrial systems, and the designers able to evaluate the dependability attributes such as reliability, MTTF and sensitivity. In addition, in the mentioned ability, the fault tree is a systematic method for finding systems bottlenecks and weakness point. In spite of its extensive use in evaluating the reliability of systems, fault tree is rarely used in calculating sensitivity. In the last decade, few researches has been conducted in this field, however these methods are not applicable to large scale systems and are not systematic. This paper provides a systematic method for evaluating system sensitivity through fault tree. Then, it introduces sensitivity of NMR architecture as one of the common structures of fault tolerance which is used for enhancing systems’ reliability, safety and availability in industry. This article presents a comprehensive and parameterized formula for NMR structure's sensitivity. The presented method can be a great help for designing and exploiting reliable systems engineers in systematic and instant calculation of sensitivity by means of fault tree.

  9. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Streichert Thilo

    2006-01-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Beside the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  10. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Jürgen Teich

    2006-06-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Beside the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  11. Problems related to the integration of fault tolerant aircraft electronic systems

    Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.

    1982-01-01

    Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.

  12. To err is robotic, to tolerate immunological: fault detection in multirobot systems.

    Tarapore, Danesh; Lima, Pedro U; Carneiro, Jorge; Christensen, Anders Lyhne

    2015-01-01

    Fault detection and fault tolerance represent two of the most important and largely unsolved issues in the field of multirobot systems (MRS). Efficient, long-term operation requires an accurate, timely detection, and accommodation of abnormally behaving robots. Most existing approaches to fault-tolerance prescribe a characterization of normal robot behaviours, and train a model to recognize these behaviours. Behaviours unrecognized by the model are consequently labelled abnormal or faulty. MRS employing these models do not transition well to scenarios involving temporal variations in behaviour (e.g., online learning of new behaviours, or in response to environment perturbations). The vertebrate immune system is a complex distributed system capable of learning to tolerate the organism's tissues even when they change during puberty or metamorphosis, and to mount specific responses to invading pathogens, all without the need of a genetically hardwired characterization of normality. We present a generic abnormality detection approach based on a model of the adaptive immune system, and evaluate the approach in a swarm of robots. Our results reveal the robust detection of abnormal robots simulating common electro-mechanical and software faults, irrespective of temporal changes in swarm behaviour. Abnormality detection is shown to be scalable in terms of the number of robots in the swarm, and in terms of the size of the behaviour classification space. PMID:25642825

  13. Observer-based fault-tolerant control for a class of nonlinear networked control systems

    Mahmoud, M. S.; Memon, A. M.; Shi, Peng

    2014-08-01

    This paper presents a fault-tolerant control (FTC) scheme for nonlinear systems which are connected in a networked control system. The nonlinear system is first transformed into two subsystems such that the unobservable part is affected by a fault and the observable part is unaffected. An observer is then designed which gives state estimates using a Luenberger observer and also estimates unknown parameter of the system; this helps in fault estimation. The FTC is applied in the presence of sampling due to the presence of a network in the loop. The controller gain is obtained using linear-quadratic regulator technique. The methodology is applied on a mechatronic system and the results show satisfactory performance.

  14. Fault detection and fault-tolerant control using sliding modes

    Alwi, Halim; Tan, Chee Pin

    2011-01-01

    ""Fault Detection and Fault-tolerant Control Using Sliding Modes"" is the first text dedicated to showing the latest developments in the use of sliding-mode concepts for fault detection and isolation (FDI) and fault-tolerant control in dynamical engineering systems. It begins with an introduction to the basic concepts of sliding modes to provide a background to the field. This is followed by chapters that describe the use and design of sliding-mode observers for FDI using robust fault reconstruction. The development of a class of sliding-mode observers is described from first principles throug

  15. Fault-tolerant multiprocessor computer

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.L.

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systms are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, aceptance-test/diagnostic, applications, and support software; FTM reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  16. A method for the computation of reliability bounds for non-repairable fault-tolerant systems

    Suñé, Víctor; Carrasco, Juan A.

    1997-01-01

    A realistic modeling of fault-tolerant systems requires to take into account phenomena such as the dependence of component failure rates and coverage parameters on the operational configuration of the system, which cannot be properly captured using combinatorial techniques. Such dependencies can be modeled with detail using continuous-time Markov chains (CTMC’s). However, the use of CTMC models is limited by the well-known state space explosion problem. In this paper we develop a method for t...

  17. Architectures for fault-tolerant spacecraft computers

    Rennels, D. A.

    1978-01-01

    This paper summarizes the results of a long-term research program in fault-tolerant computing for spacecraft on-board processing. In response to changing device technology this program has progressed from the design of a fault-tolerant uniprocessor to the development of fault-tolerant distributed computer systems. The unusual requirements of spacecraft computing are described along with the resulting real-time computer architectures. The following aspects of these designs are discussed: (1) architectural features to minimize complexity in the distributed computer system, (2) fault-detection and recovery, (3) techniques to enhance reliability and testability, and (4) design approaches for LSI implementation.

  18. An architecture for fault tolerant controllers

    Niemann, Hans Henrik; Stoustrup, Jakob

    2005-01-01

    A general architecture for fault tolerant control is proposed. The architecture is based on the (primary) YJBK parameterization of all stabilizing compensators and uses the dual YJBK parameterization to quantify the performance of the fault tolerant system. The approach suggested can be applied f...

  19. Feasibility analysis and design of a fault tolerant computing system: a TMR microprocessor system design of 64-Bit COTS microprocessors

    Eken, Huseyin Baha

    2001-01-01

    The purpose of this thesis is to analyze and determine the feasibility of implementing a fault tolerant computing system that is able to function in the presence of radiation induced Single Event Upsets (SEU) by using the Triple Modular Redundancy (TMR) technique with 64-bit Commercial-Off-The- Shelf (COTS) microprocessors. Due to the radiation environment in space, electronic devices must be designed to tolerate the radiation effects. While there are radiation-hardened devices that can toler...

  20. Fault-tolerant Supervisory Control

    Izadi-Zamanabadi, Roozbeh

    systematic analysis of reconfiguration and design of supervisor logic. In addition, useful experience is obtained through implementation of a fault-tolerant control scheme against a simulated ship and its propulsion system. A development methodology, which was suggested in the Control Engineering Department...... realized on the basis of an extension to the traditional state-event machines. Due to parallelity (inherent modularity) the supervisor logic is more easily modified, updated, maintained, and tested. A salient feature is that a change in one task only necessitates redesign of essentially one corresponding...... state-event machine (SEM). A heuristic guideline is provided for designing the logic in form of SEMs. A ship propulsion system benchmark has been designed and used as a case study. This includes experimentation with the above methodologies and implementation of a fault-tolerant control against the...

  1. Economic modeling of fault tolerant flight control systems in commercial applications

    Finelli, G. B.

    1982-01-01

    This paper describes the current development of a comprehensive model which will supply the assessment and analysis capability to investigate the economic viability of Fault Tolerant Flight Control Systems (FTFCS) for commercial aircraft of the 1990's and beyond. An introduction to the unique attributes of fault tolerance and how they will influence aircraft operations and consequent airline costs and benefits is presented. Specific modeling issues and elements necessary for accurate assessment of all costs affected by ownership and operation of FTFCS are delineated. Trade-off factors are presented, aimed at exposing economically optimal realizations of system implementations, resource allocation, and operating policies. A trade-off example is furnished to graphically display some of the analysis capabilities of the comprehensive simulation model now being developed.

  2. A Decoding Approach to Fault Tolerant Control of Linear Systems with Quantized Disturbance Input

    Fosson, Sophie M

    2010-01-01

    The aim of this paper is to propose an alternative method to solve a Fault Tolerant Control problem. The model is a linear system affected by a disturbance term: this represents a large class of technological faulty processes. The goal is to make the system able to tolerate the undesired perturbation, i.e., to remove or at least reduce its negative effects; such a task is performed in three steps: the detection of the fault, its identification and the consequent process recovery. When the disturbance function is known to be \\emph{quantized} over a finite number of levels, the detection can be successfully executed by a recursive \\emph{decoding} algorithm, arising from Information and Coding Theory and suitably adapted to the control framework. This technique is analyzed and tested in a flight control issue; both theoretical considerations and simulations are reported.

  3. Fault-tolerant supervisory control of VAV air-conditioning systems

    Liu, X.-F.; Dexter, A. [Department of Engineering Science, University of Oxford, Oxford (United Kingdom)

    2001-07-01

    The paper describes a supervisory control scheme that adapts to the presence of degradation faults and minimises any resulting increase in energy consumption or deterioration in occupant comfort. Since there is a high degree of uncertainty associated with the results of any fault identification scheme in information-poor systems of this type, the supervisory control scheme uses fuzzy models to predict the control performance and a computationally undemanding optimisation scheme to determine the most appropriate set-points. The fault-tolerant control scheme is developed and evaluated using a detailed computer simulation of a multi-zone, variable-air-volume (VAV), air-conditioning system. The fuzzy models relate the performance of the terminal-boxes, the air-handling unit and the chiller to fuzzy descriptions of the cooling load, the supply air and chilled water temperature set-points, and the amount of air-side and water-side fouling. Results are presented that demonstrate the ability of the fuzzy models to predict the performance and show how the power consumption of the air-conditioning system varies with set-point changes and the presence of both water-side and air-side fouling. The main factors that determine the suitability of a particular air-conditioning system for fault-tolerant control are also discussed. (author)

  4. Fuxi: A fault-tolerant resource management and job scheduling system at internet scale

    Zhang, Z; C. Li; Tao, Y; Yang, R; H. Tang; Xu, J.

    2014-01-01

    Scalability and fault-tolerance are two fundamental challenges for all distributed computing at Internet scale. Despite many recent advances from both academia and industry, these two problems are still far from settled. In this paper, we present Fuxi, a resource management and job scheduling system that is capable of handling the kind of workload at Alibaba where hundreds of terabytes of data are generated and analyzed everyday to help optimize the company's business operations and user expe...

  5. A metaobject architecture for fault-tolerant distributed systems : the FRIENDS approach

    Fabre, Jean-Charles; Pérennou, Tanguy

    1998-01-01

    The FRIENDS system developed at LAAS-CNRS is a metalevel architecture providing libraries of metaobjects for fault tolerance, secure communication, and group-based distributed applications. The use of metaobjects provides a nice separation of concerns between mechanisms and applications. Metaobjects can be used transparently by applications and can be composed according to the needs of a given application, a given architecture, and its underlying properties. In FRIENDS, metaobjects are use...

  6. Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems

    Gan, Junhe; Gruian, Flavius; Pop, Paul;

    2011-01-01

    This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each...... with limited energy and hardware resources. We evaluated the algorithm proposed using several synthetic and reallife benchmarks....

  7. Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview

    Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

    1994-01-01

    The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of large amounts of situational data that occurs during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical necessitating an ultra-reliable/high throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is hard real-time, Byzantine, fault-tolerant parallel processor which is programmed in the ADA language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.

  8. Fault-Tolerant Onboard Monitoring and Decision Support Systems

    Lajic, Zoran

    The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been...

  9. Fault-Tolerant Scheduling for Real-Time Embedded Control Systems

    Chun-Hua Yang; Geert Deconinck; Wei-Hua Gui

    2004-01-01

    With the increasing complexity of industrial application, an embedded control system (ECS) requires processing a number of hard real-time tasks and needs fault-tolerance to assure high reliability. Considering the characteristics of real-time tasks in ECS, an integrated algorithm is proposed to schedule real-time tasks and to guarantee that all real-time tasks are completed before their deadlines even in the presence of faults. Based on the nonpreemptive critical-section protocol (NCSP), this paper analyzes the blocking time introduced by resource conflicts of relevancy tasks in fault-tolerant multiprocessor systems. An extended schedulability condition is presented to check the assignment feasibility of a given task to a processor. A primary/backup approach and on-line replacement of failed processors are used to tolerate processor failures. The analysis reveals that the integrated algorithm bounds the blocking time, requires limited overhead on the number of processors, and still assures good processor utilization. This is also demonstrated by simulation results. Both analysis and simulation show the effectiveness of the proposed algorithm in ECS.

  10. Fault-tolerant Agreement in Synchronous Message-passing Systems

    Raynal, Michel

    2010-01-01

    The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

  11. Optimal maintenance center inventories for fault-tolerant repairable systems

    Lawrence, S. H.; Schaefer, M. K.

    1984-01-01

    A probabilistic approach is taken to determine the optimal repairable parts inventory for a maintenance center, servicing machines which contain several m-out-of-n systems of different parts, with a constraint on the total inventory investment. A model, based on the discrete Markov process, accounts for a typical ultrareliable avionics system, such as one presently being developed by NASA. The dynamic programming algorithm for minimizing the stockout and holding costs is applied to an exemplary maintenance center, and solutions for single-item and multi-item cases are given. The computational burden is noted to be reasonable and a computer program is used to generate optimal solutions.

  12. A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices

    Chen, Der-Fa; Nguyen-Duy, Khiem; Liu, Tian-Hua; Andersen, Michael A. E.

    This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier...

  13. A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices

    Chen, Der-Fa; Nguyen-Duy, Khiem; Liu, Tian-Hua;

    2012-01-01

    This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier-s...

  14. A Ship Propulsion System Model for Fault-tolerant Control

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    . The propulsion system model is presented in two versions: the first one consists of one engine and one propeller, and the othe one consists of two engines and their corresponding propellers placed in parallel in the ship. The corresponding programs are developed and are available....

  15. Fault tolerant PLC system using CPU and I/O redundancy with switch over logic system for nuclear instrumentation

    Nuclear instrumentation in power plants and fuel reprocessing plants demand very high reliable fault tolerant programmable logic controllers (PLC) since it is directly related to hazardous operation involving safety of plant, operator and in turn public at large. Components of control systems can fail depending on the circumstances and level of preparedness in plants, leading to a minor or major disaster. Utilizing existing technology and configuring system architecture, a fault tolerant PLC can provide superior solution to meet some of the challenges like high reliability, integrity and availability. This paper presents the background, concepts and implementation of fault tolerant PLC architecture using CPU and I/O redundancy with switch over logic system for nuclear instrumentation. (author)

  16. Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

    Saraswat, Prabhat Kumar; Pop, Paul; Madsen, Jan

    2009-01-01

    In this paper we are interested in mixed-criticality embedded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have...

  17. Fault tolerant computer control for a Maglev transportation system

    Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George

    1994-05-01

    Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements on the maglev. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.

  18. Plan for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

    2008-01-01

    This report presents the plan for the characterization of the effects of high intensity radiated fields on a prototype implementation of a fault-tolerant data communication system. Various configurations of the communication system will be tested. The prototype system is implemented using off-the-shelf devices. The system will be tested in a closed-loop configuration with extensive real-time monitoring. This test is intended to generate data suitable for the design of avionics health management systems, as well as redundancy management mechanisms and policies for robust distributed processing architectures.

  19. Fault Tolerant Neural Network for ECG Signal Classification Systems

    MERAH, M.

    2011-08-01

    Full Text Available The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN for ECG classification systems. This ANN includes a penalization criterion which makes the performances in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT ? BIH ECG signals, have shown that significant robustness improvements are obtained regarding potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performances, compared to the standard back-propagation algorithm.

  20. Synchronization of fault-tolerant parallel processing systems

    Harper, R.E.; Lala, J.H.

    1990-06-26

    This patent describes a system for synchronizing the operation of redundant processors forming a group thereof wherein a frame of operation is defined for each processor as a time period during which a selected number of specified processing events occurs. It comprises: means for performing a final one of the specified processing events for identifying the end of a current frame of operation; means responsive to the performing of the final one of the specified processing events by the last-to-perform of the processors for synchronizing the operations of the processors so that all of the processors can subsequently start their next frame of operation at substantially the same time; and each the processor further includes means responsive to the synchronizing means for performing a first one of the specified processing events to identify the start of the next frame of operation.

  1. Reliability model of fault-tolerant data processing system with primary and backup nodes

    Rahman, P. A.; Bobkova, E. Yu

    2016-04-01

    This paper deals with the fault-tolerant data processing systems, which are widely used in modern world of information technologies and have acceptable overhead expenses in hardware implementation. A simplified reliability model for duplex systems and the offered by authors advanced model for data processing systems with primary and backup nodes based on a three-state model of recoverable elements, which takes into consideration different failure rates of passive and active nodes and finite time of node activation, are also given. A calculation formula for the availability factor of the dual-node data processing system with primary and backup nodes and calculation examples are also provided.

  2. Rectifier Fault Diagnosis and Fault Tolerance of a Doubly Fed Brushless Starter Generator

    Liwei Shi; Zhou Bo

    2015-01-01

    This paper presents a rectifier fault diagnosis method with wavelet packet analysis to improve the fault tolerant four-phase doubly fed brushless starter generator (DFBLSG) system reliability. The system components and fault tolerant principle of the high reliable DFBLSG are given. And the common fault of the rectifier is analyzed. The process of wavelet packet transforms fault detection/identification algorithm is introduced in detail. The fault tolerant performance and output voltage experi...

  3. Fault-tolerant control of delta operator systems with actuator saturation and effectiveness loss

    Yang, Hongjiu; Zhang, Luyang; Zhao, Ling; Yuan, Yuan

    2016-07-01

    This paper studies the problem of robust fault-tolerant control against the actuator effectiveness loss for delta operator systems with actuator saturation. Ellipsoids are used to estimate the domain of attraction for the delta operator systems with actuator saturation and effectiveness loss. Some invariance set conditions used for enlarging the domain of attraction are expressed by linear matrix inequalities. Discussions on system performance optimisation are presented in this paper, including reduction on computational complexity, expansion of the domain of attraction and disturbance rejection. Two numerical examples are given to illustrate the effectiveness of the developed techniques.

  4. (m,n-Semirings and a Generalized Fault-Tolerance Algebra of Systems

    Syed Eqbal Alam

    2013-01-01

    Full Text Available We propose a new class of mathematical structures called (m,n-semirings (which generalize the usual semirings and describe their basic properties. We define partial ordering and generalize the concepts of congruence, homomorphism, and so forth, for (m,n-semirings. Following earlier work by Rao (2008, we consider systems made up of several components whose failures may cause them to fail and represent the set of such systems algebraically as an (m,n-semiring. Based on the characteristics of these components, we present a formalism to compare the fault-tolerance behavior of two systems using our framework of a partially ordered (m,n-semiring.

  5. MICROTHREAD BASED (MTB) COARSE GRAINED FAULT TOLERANCE SUPERSCALAR PROCESSOR ARCHITECTURE

    2006-01-01

    Fault tolerance in microprocessor systems has become a popular topic of architecture research.Much work has been done at different levels to accomplish reliability against soft errors, and some fault tolerance architectures have been proposed. But little attention is paid to the thread level superscalar fault tolerance.This letter introduces microthread concept into superscalar processor fault tolerance domain, and puts forward a novel fault tolerance architecture, namely, MicroThread Based (MTB) coarse grained transient fault tolerance superscalar processor architecture, then discusses some detailed implementations.

  6. Model prototype utilization in the analysis of fault tolerant control and data processing systems

    Kovalev, I. V.; Tsarev, R. Yu; Gruzenkin, D. V.; Prokopenko, A. V.; Knyazkov, A. N.; Laptenok, V. D.

    2016-04-01

    The procedure assessing the profit of control and data processing system implementation is presented in the paper. The reasonability of model prototype creation and analysis results from the implementing of the approach of fault tolerance provision through the inclusion of structural and software assessment redundancy. The developed procedure allows finding the best ratio between the development cost and the analysis of model prototype and earnings from the results of this utilization and information produced. The suggested approach has been illustrated by the model example of profit assessment and analysis of control and data processing system.

  7. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  8. Fault-tolerant parallel processor

    Harper, R.E.; Lala, J.H. (Charles Stark Draper Laboratory, Inc., Cambridge, MA (USA))

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which aslo provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.

  9. Fault Tolerant Approach for Data Encryption and Digital Signature Based on ECC System

    ZENG Yong; MA Jian-feng

    2005-01-01

    An integrated fault tolerant approach for data encryption and digital signature based on elliptic curve cryptography is proposed. This approach allows the receiver to verify the sender's identity and can simultaneously deal with error detection and data correction. Up to three errors in our approach can be detected and corrected. This approach has at least the same security as that based on RSA system, but smaller keys to achieve the same level of security. Our approach is more efficient than the known ones and more suited for limited environments like personal digital assistants (PDAs), mobile phones and smart cards without RSA co- processors.

  10. Fault Tolerant Wind Farm Control

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2013-01-01

    with best at a wind turbine control level. However, some faults are better dealt with at the wind farm control level, if the wind turbine is located in a wind farm. In this paper a benchmark model for fault detection and isolation, and fault tolerant control of wind turbines implemented at the wind...... farm control level is presented. The benchmark model includes a small wind farm of nine wind turbines, based on simple models of the wind turbines as well as the wind and interactions between wind turbines in the wind farm. The model includes wind and power references scenarios as well as three...... relevant fault scenarios. This benchmark model is used in an international competition dealing with Wind Farm fault detection and isolation and fault tolerant control....

  11. A Concept for fault tolerant controllers

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2009-01-01

    This paper describe a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture will allow to change the controller on-line in the case of faults in the system. In the described FTC concept, a safe mode...

  12. Reliability and coverage analysis of non-repairable fault-tolerant memory systems

    Cox, G. W.; Carroll, B. D.

    1976-01-01

    A method was developed for the construction of probabilistic state-space models for nonrepairable systems. Models were developed for several systems which achieved reliability improvement by means of error-coding, modularized sparing, massive replication and other fault-tolerant techniques. From the models developed, sets of reliability and coverage equations for the systems were developed. Comparative analyses of the systems were performed using these equation sets. In addition, the effects of varying subunit reliabilities on system reliability and coverage were described. The results of these analyses indicated that a significant gain in system reliability may be achieved by use of combinations of modularized sparing, error coding, and software error control. For sufficiently reliable system subunits, this gain may far exceed the reliability gain achieved by use of massive replication techniques, yet result in a considerable saving in system cost.

  13. Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    Malekpour, Mahyar R.

    2007-01-01

    This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

  14. State of the art on fault-tolerant real time distributed systems

    The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies more and more stringent requirements as to communication system reliability. For if an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. It is for this reason that, in the context of preparation for the future PWR 2000 standardized nuclear plants, it would seem appropriate to take a look at fault-tolerant communication systems. Since C and I type applications (in the control room) are divided between several computers and are required to contend with extremely severe time constraints, EDF has undertaken investigation of fault-tolerant, real time distributed systems. This paper summarized the state of the art in the field as it appears from discussion with computer manufacturers, academics and research workers on related projects. The results obtained were then used to determine trends as to ''promising'' solutions. The paper concludes with recommended study programs for the PCC department of EDF/R and DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

  15. Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems

    Choux, Martin

    Fluid power systems have been in use since 1795 with the rst hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are uid power systems controlled in closed-loop. They transform reference input signals into a set of movements in...... detected early and handled. Moreover, the task of controlling electro hydraulic systems for high performance operations is challenging due to the highly nonlinear behaviour of such systems and the large amount of uncertainties present in their models. This thesis focuses on nonlinear adaptive fault......-tolerant control for a representative electro hydraulic servo controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identies the model parameters using experimental data from a...

  16. Design Approach for Fault Recoverable ALU with Improved Fault Tolerance

    Ankit K V

    2015-08-01

    Full Text Available A new design for fault tolerant and fault recoverable ALU System has been proposed in this paper. Reliability is one of the most critical factors that have to be considered during the designing phase of any IC. In critical applications like Medical equipment & Military applications this reliability factor plays a very critical role in determining the acceptance of product. Insertion of special modules in the main design for reliability enhancement will give considerable amount of area & power penalty. So, a novel approach to this problem is to find ways for reusing the already available components in digital system in efficient way to implement recoverable methodologies. Triple Modular Redundancy (TMR has traditionally used for protecting digital logic from the SEUs (single event upset by triplicating the critical components of the system to give fault tolerance to system. ScTMR- Scan chain-based error recovery TMR technique provides recovery for all internal faults. ScTMR uses a roll-forward approach and employs the scan chain implemented in the circuits for testability purposes to recover the system to fault-free state. The proposed design will incorporate a ScTMR controller over TMR system of ALU and will make the system fault tolerant and fault recoverable. Hence, proposed design will be more efficient & reliable to use in critical applications, than any other design present till today.

  17. Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems

    Mário J. G. C. Mendes

    2009-04-01

    Full Text Available Industrial distributed networked control systems use different communication networks to exchange different critical levels of information. Real-time control, fault diagnosis (FDI and Fault Tolerant Networked Control (FTNC systems demand one of the more stringent data exchange in the communication networks of these networked control systems (NCS. When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of sensors and actuators spatially distributed and network connected. To solve this issue, a FTNC platform and toolbox are presented in this paper using simple and verifiable principles coming mainly from a decentralized design based on causal modelling partitioning of the NCS and distributed computing using multi-agent systems paradigm, allowing the use of agents with well established FTC methodologies or new ones developed taking into account the NCS specificities. The multi-agent platform and toolbox for FTNC systems have been built in Matlab/Simulink environment, which is in our days the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

  18. Fault Tolerant External Memory Algorithms

    Jørgensen, Allan Grønlund; Brodal, Gerth Stølting; Mølhave, Thomas

    2009-01-01

    Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano....... However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable. We show a lower...... bound on the number of I/Os required for any deterministic dictionary that is resilient to memory faults. We design a static and a dynamic deterministic dictionary with optimal query performance as well as an optimal sorting algorithm and an optimal priority queue. Finally, we consider scenarios where...

  19. A Fault-Tolerant Emergency-Aware Access Control Scheme for Cyber-Physical Systems

    Wu, Guowei; Xia, Feng; Yao, Lin

    2012-01-01

    Access control is an issue of paramount importance in cyber-physical systems (CPS). In this paper, an access control scheme, namely FEAC, is presented for CPS. FEAC can not only provide the ability to control access to data in normal situations, but also adaptively assign emergency-role and permissions to specific subjects and inform subjects without explicit access requests to handle emergency situations in a proactive manner. In FEAC, emergency-group and emergency-dependency are introduced. Emergencies are processed in sequence within the group and in parallel among groups. A priority and dependency model called PD-AGM is used to select optimal response-action execution path aiming to eliminate all emergencies that occurred within the system. Fault-tolerant access control polices are used to address failure in emergency management. A case study of the hospital medical care application shows the effectiveness of FEAC.

  20. GRID COMPUTING AND FAULT TOLERANCE APPROACH

    Pankaj Gupta,

    2011-10-01

    Full Text Available Grid computing is a means of allocating the computational power of alarge number of computers to complex difficult computation orproblem. Grid computing is a distributed computing paradigm thatdiffers from traditional distributed computing in that it is aimed toward large scale systems that even span organizational boundaries. This paper proposes a method to achieve maximum fault tolerance in the Grid environment system by using Reliability consideration by using Replication approach and Check-point approach. Fault tolerance is an important property for large scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve high level of reliability and availability, the grid infrastructure should be a foolproof fault tolerant. Since the failure of resources affects job execution fatally, fault tolerance service is essential to satisfy QOS requirement in grid computing. Commonly utilized techniques for providing fault tolerance are job check pointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The latter largely depends on the length of check pointing interval and the chosen number of replicas, respectively. In case of complex scientific workflows where tasks can execute in well defined order reliability is another biggest challenge because of the unreliable nature of the grid resources.

  1. Design Approach for Fault Recoverable ALU with Improved Fault Tolerance

    Ankit K V; S Murali Narasimham; Viajya Prakash A M

    2015-01-01

    A new design for fault tolerant and fault recoverable ALU System has been proposed in this paper. Reliability is one of the most critical factors that have to be considered during the designing phase of any IC. In critical applications like Medical equipment & Military applications this reliability factor plays a very critical role in determining the acceptance of product. Insertion of special modules in the main design for reliability enhancement will give considerable amount of ...

  2. Methods and apparatuses for self-generating fault-tolerant keys in spread-spectrum systems

    Moradi, Hussein; Farhang, Behrouz; Subramanian, Vijayarangam

    2015-12-15

    Self-generating fault-tolerant keys for use in spread-spectrum systems are disclosed. At a communication device, beacon signals are received from another communication device and impulse responses are determined from the beacon signals. The impulse responses are circularly shifted to place a largest sample at a predefined position. The impulse responses are converted to a set of frequency responses in a frequency domain. The frequency responses are shuffled with a predetermined shuffle scheme to develop a set of shuffled frequency responses. A set of phase differences is determined as a difference between an angle of the frequency response and an angle of the shuffled frequency response at each element of the corresponding sets. Each phase difference is quantized to develop a set of secret-key quantized phases and a set of spreading codes is developed wherein each spreading code includes a corresponding phase of the set of secret-key quantized phases.

  3. Fault tolerant operation of switched reluctance machine

    Wang, Wei

    The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in industry sector, advancement in the efficiency of the electric drive system is of vital importance. Adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronics drives. Industry has witnessed tremendous grow in ASDS applications not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance especially for aerospace, automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, SRM is not free of fault. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, the maximum torque per ampere excitation are conceptualized and validated through theoretical analysis and

  4. A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems

    Malekpour, Mahyar R.

    2006-01-01

    Embedded distributed systems have become an integral part of safety-critical computing applications, necessitating system designs that incorporate fault tolerant clock synchronization in order to achieve ultra-reliable assurance levels. Many efficient clock synchronization protocols do not, however, address Byzantine failures, and most protocols that do tolerate Byzantine failures do not self-stabilize. Of the Byzantine self-stabilizing clock synchronization algorithms that exist in the literature, they are based on either unjustifiably strong assumptions about initial synchrony of the nodes or on the existence of a common pulse at the nodes. The Byzantine self-stabilizing clock synchronization protocol presented here does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The proposed protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period. Proofs of the correctness of the protocol as well as the results of formal verification efforts are reported.

  5. Design of fault tolerant control system for individual blade control helicopters

    Tamayo, Sergio

    This dissertation presents the development of a fault tolerant control scheme for helicopters fitted with individually controlled blades. This novel approach attempts to improve fault tolerant capabilities of helicopter control system by increasing control redundancy using additional actuators for individual blade input and software re-mixing to obtain nominal or close to nominal conditions under failure. An advanced interactive simulation environment has been developed including modeling of sensor failure, swashplate actuator failure, individual blade actuator failure, and blade delamination to support the design, testing, and evaluation of the control laws. This simulation environment is based on the blade element theory for the calculation of forces and moments generated by the main rotor. This discretized model allows for individual blade analysis, which in turn allows measuring the consequences of a stuck blade, or loss of the surface area of the blade itself, with respect to the dynamics of the whole helicopter. The control laws are based on non-linear dynamic inversion and artificial neural network augmentation, which is a mix of linear and nonlinear methods that compensates for model inaccuracies due to linearization or failure. A stability analysis based on the Lyapunov function approach has shown that bounded tracking error is guaranteed, and under specific circumstances, global stability is guaranteed as well. An analysis over the degrees of freedom of the mechanical system and its impact over the helicopter handling qualities is also performed to measure the degree of redundancy achieved with the addition of individual blade actuators as compared to a classic swashplate helicopter configuration. Mathematical analysis and numerical simulation, using reconfiguration of the individual blade control under failure have shown that this control architecture can potentially improve the survivability of the aircraft and reduce pilot workload under failure

  6. Diagnosis and Fault-tolerant Control, 2nd edition

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Starosweicki, Marcel

    Fault-tolerant control aims at a graceful degradation of the behaviour of automated systems in case of faults. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults that bring about sudden shutdowns and loss of availability. The book...... presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault throught the process, to test the fault detectability and to find the redundancies in the process that can be used...... to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Five case studies on pilot processes show the applicability of...

  7. A Fault-tolerance Estimating Method for Ionosphere Corrections in Satellite Navigation System

    GAO Shuliang; LI Rui; HUANG Zhigang

    2011-01-01

    Aiming to the reliable estimates of the ionosphere differential corrections for the satellite navigation system in the presence of the ionosphere anomaly,a fault-tolerance estimating method,which is based on the distributed Kalman filtering,is proposed.The method utilizes the parallel sub-filters for estimating the ionosphere differential corrections.Meanwhile,an infinite norm (IN) method is proposed for the detection of the ionosphere irregularity in the filter processing.Once the anomaly is detected,the sub-filter contaminated by the anomaly measurements will be excluded to ensure the reliability of the estimates.The simulation is conducted to validate the method and the results indicate that the anomaly can be found timely due to the novel fault detection method based on the infinite norm.Because of the parallel sub-filter architecture,the measurements are classified by the spatial distribution so that the ionosphere anomaly can be positioned and excluded more easily.Thus,the method can provide the robust and accurate ionosphere differential corrections.

  8. A novel N-input voting algorithm for X-by-wire fault-tolerant systems.

    Karimi, Abbas; Zarafshan, Faraneh; Al-Haddad, S A R; Ramli, Abdul Rahman

    2014-01-01

    Voting is an important operation in multichannel computation paradigm and realization of ultrareliable and real-time control systems that arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR) hardware systems and diversely designed software systems based on N-version programming (NVP). Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Then, its behavior has been software implemented in different scenarios of error-injection on the system inputs. The results of analyzed evaluations through plots and statistical computations have demonstrated that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively. PMID:25386613

  9. A Novel N-Input Voting Algorithm for X-by-Wire Fault-Tolerant Systems

    Abbas Karimi

    2014-01-01

    Full Text Available Voting is an important operation in multichannel computation paradigm and realization of ultrareliable and real-time control systems that arbitrates among the results of N redundant variants. These systems include N-modular redundant (NMR hardware systems and diversely designed software systems based on N-version programming (NVP. Depending on the characteristics of the application and the type of selected voter, the voting algorithms can be implemented for either hardware or software systems. In this paper, a novel voting algorithm is introduced for real-time fault-tolerant control systems, appropriate for applications in which N is large. Then, its behavior has been software implemented in different scenarios of error-injection on the system inputs. The results of analyzed evaluations through plots and statistical computations have demonstrated that this novel algorithm does not have the limitations of some popular voting algorithms such as median and weighted; moreover, it is able to significantly increase the reliability and availability of the system in the best case to 2489.7% and 626.74%, respectively, and in the worst case to 3.84% and 1.55%, respectively.

  10. Coordinated Fault Tolerance for High-Performance Computing

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  11. Modular, Fault-Tolerant Electronics Supporting Space Exploration Project

    National Aeronautics and Space Administration — Modern electronic systems tolerate only as many point failures as there are redundant system copies, using mere macro-scale redundancy. Fault Tolerant Electronics...

  12. Fault Tolerant Control: A Simultaneous Stabilization Result

    Stoustrup, Jakob; Blondel, V.D.

    2004-01-01

    This paper discusses the problem of designing fault tolerant compensators that stabilize a given system both in the nominal situation, as well as in the situation where one of the sensors or one of the actuators has failed. It is shown that such compensators always exist, provided that the system...

  13. Task Mapping and Bandwidth Reservation for Mixed Hard/Soft Fault-Tolerant Embedded Systems

    Saraswat, Prabhat Kumar; Pop, Paul; Madsen, Jan

    reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation, such that soft task overruns do not affect the timing guarantees of hard tasks. Transient faults in hard tasks are tolerated using checkpointing with rollback recovery. We have proposed a Tabu...... Search-based approach for task mapping and CBS bandwidth reservation, such that the deadlines for the hard tasks are satisfied, even in the case of transient faults, and the QoS for the soft tasks is maximized. Researchers have used fixed execution time models, such as the worst-case execution times for...

  14. A design fix to supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks

    Devaraj, Rajesh; Sarkar, Arnab; Biswas, Santosh

    2015-11-01

    In the article 'Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks', Park and Cho presented a systematic way of computing a largest fault-tolerant and schedulable language that provides information on whether the scheduler (i.e., supervisor) should accept or reject a newly arrived aperiodic task. The computation of such a language is mainly dependent on the task execution model presented in their paper. However, the task execution model is unable to capture the situation when the fault of a processor occurs even before the task has arrived. Consequently, a task execution model that does not capture this fact may possibly be assigned for execution on a faulty processor. This problem has been illustrated with an appropriate example. Then, the task execution model of Park and Cho has been modified to strengthen the requirement that none of the tasks are assigned for execution on a faulty processor.

  15. Towards fault-tolerant optimal control

    Chizeck, H. J.; Willsky, A. S.

    1979-01-01

    The paper considers the design of fault-tolerant controllers that may endow systems with dynamic reliability. Results for jump linear quadratic Gaussian control problems are extended to include random jump costs, trajectory discontinuities, and a simple case of non-Markovian mode transitions.

  16. Parallel fault-tolerant robot control

    Hamilton, D. L.; Bennett, J. K.; Walker, I. D.

    1992-01-01

    A shared memory multiprocessor architecture is used to develop a parallel fault-tolerant robot controller. Several versions of the robot controller are developed and compared. A robot simulation is also developed for control observation. Comparison of a serial version of the controller and a parallel version without fault tolerance showed the speedup possible with the coarse-grained parallelism currently employed. The performance degradation due to the addition of processor fault tolerance was demonstrated by comparison of these controllers with their fault-tolerant versions. Comparison of the more fault-tolerant controller with the lower-level fault-tolerant controller showed how varying the amount of redundant data affects performance. The results demonstrate the trade-off between speed performance and processor fault tolerance.

  17. Fault tolerance issues in nanoelectronics

    Spagocci, S.

    2008-01-01

    The astonishing success story of microelectronics cannot go on indefinitely. In fact, once devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected to impair their behaviour. Fault tolerant techniques will then be required. The aim of this thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient error rates for a selection of nanoelectronic gates, based upon quantum cellular automata and single electron devi...

  18. Low-Cost Fault Tolerant Methodology for Real Time MPSoC Based Embedded System

    2014-01-01

    We are proposing a design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance. We have implemented three different FT methodologies on MPSoCs and compared them against the defined constraints. The comparison of these FT methodologies is carried out by modelling their architectures in VHDL-RTL, on Spartan 3 FPGA. The results obtained through simulations helped us to identify the most relevant scheme in terms ...

  19. Architecture for Intrusion Detection System with Fault Tolerance Using Mobile Agent

    Chintan Bhatt; Asha Koshti; Hemant Agrawal; Zakiya Malek; Bhushan Trivedi

    2011-01-01

    This paper is a survey of the work, done for making an IDS fault tolerant.Architecture of IDS that usesmobile Agent provides higher scalability. Mobile Agent uses Platform for detecting Intrusions using filterAgent, co-relater agent, Interpreter agent and rule database. When server (IDS Monitor) goes down,other hosts based on priority takes Ownership. This architecture uses decentralized collection andanalysis for identifying Intrusion. Rule sets are fed based on user-behaviour or application...

  20. Fault-Tolerant, Real-Time, Multi-Core Computer System

    Gostelow, Kim P.

    2012-01-01

    A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.

  1. Mission reliability analysis of fault-tolerant multiple-phased systems

    Fault-tolerant multiple-phased systems (FTMPS) are defined as systems whose critical components are independently replicated and whose operational life can be partitioned into a set of disjoint periods, called 'phases'. Because of their deployment in critical applications, their mission reliability analysis is a task of primary relevance to validate the designs. This paper is focused on the reliability analysis of FTMPS with random phase durations, non-exponentially distributed repair activities and different repair policies. For self-repairable FTMPS with a component-level reconfiguration architecture, we derive several efficient formulations from the underlying structure characteristics for their intraphase behavior analysis. We also present a uniform solution framework of the mission reliability for FTMPS with generally distributed phase durations. Compared with existing methods based on deterministic and stochastic Petri nets or Markov regenerative stochastic Petri nets, our approach is more simple in concept and powerful in computation. Two examples of FTMPS are analyzed to illustrate the advantages of our approach

  2. Mission reliability analysis of fault-tolerant multiple-phased systems

    Mo Yuchang [Harbin Institute of Technology, Harbin, Heilongjiang 150001 (China)], E-mail: myc@ftcl.hit.edu.cn; Siewiorek, Daniel [Department of Computer Science and Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 (United States); Yang Xiaozong [Harbin Institute of Technology, Harbin, Heilongjiang 150001 (China)

    2008-07-15

    Fault-tolerant multiple-phased systems (FTMPS) are defined as systems whose critical components are independently replicated and whose operational life can be partitioned into a set of disjoint periods, called 'phases'. Because of their deployment in critical applications, their mission reliability analysis is a task of primary relevance to validate the designs. This paper is focused on the reliability analysis of FTMPS with random phase durations, non-exponentially distributed repair activities and different repair policies. For self-repairable FTMPS with a component-level reconfiguration architecture, we derive several efficient formulations from the underlying structure characteristics for their intraphase behavior analysis. We also present a uniform solution framework of the mission reliability for FTMPS with generally distributed phase durations. Compared with existing methods based on deterministic and stochastic Petri nets or Markov regenerative stochastic Petri nets, our approach is more simple in concept and powerful in computation. Two examples of FTMPS are analyzed to illustrate the advantages of our approach.

  3. Fault tolerant control schemes using integral sliding modes

    Hamayun, Mirza Tariq; Alwi, Halim

    2016-01-01

    The key attribute of a Fault Tolerant Control (FTC) system is its ability to maintain overall system stability and acceptable performance in the face of faults and failures within the feedback system. In this book Integral Sliding Mode (ISM) Control Allocation (CA) schemes for FTC are described, which have the potential to maintain close to nominal fault-free performance (for the entire system response), in the face of actuator faults and even complete failures of certain actuators. Broadly an ISM controller based around a model of the plant with the aim of creating a nonlinear fault tolerant feedback controller whose closed-loop performance is established during the design process. The second approach involves retro-fitting an ISM scheme to an existing feedback controller to introduce fault tolerance. This may be advantageous from an industrial perspective, because fault tolerance can be introduced without changing the existing control loops. A high fidelity benchmark model of a large transport aircraft is u...

  4. Synthesis of Fault-Tolerant Schedules with Transparency/Performance Trade-offs for Distributed Embedded Systems

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    application. We propose a novel algorithm for the synthesis of fault-tolerant schedules that can handle the transparency/performance trade-offs imposed by the designer, and makes use of the fault-occurrence information to reduce the overhead due to fault tolerance. We model the application as a conditional...... process graph, where the fault occurrence information is represented as conditional edges and the transparent recovery is captured using synchronization nodes....

  5. Validation Methods Research for Fault-Tolerant Avionics and Control Systems Sub-Working Group Meeting. CARE 3 peer review

    Trivedi, K. S. (Editor); Clary, J. B. (Editor)

    1980-01-01

    A computer aided reliability estimation procedure (CARE 3), developed to model the behavior of ultrareliable systems required by flight-critical avionics and control systems, is evaluated. The mathematical models, numerical method, and fault-tolerant architecture modeling requirements are examined, and the testing and characterization procedures are discussed. Recommendations aimed at enhancing CARE 3 are presented; in particular, the need for a better exposition of the method and the user interface is emphasized.

  6. Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

    Padilla, Peter A.

    1991-03-01

    An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.

  7. Focused fault injection testing of software implemented fault tolerance mechanisms of Voltan TMR nodes

    Tao, S.; Ezhilchelvan, P. D.; Shrivastava, S. K.

    1995-03-01

    One way of gaining confidence in the adequacy of fault tolerance mechanisms of a system is to test the system by injecting faults and see how the system performs under faulty conditions. This paper presents an application of the focused fault injection method that has been developed for testing software implemented fault tolerance mechanisms of distributed systems. The method exploits the object oriented approach of software implementation to support the injection of specific classes of faults. With the focused fault injection method, the system tester is able to inject specific classes of faults (including malicious ones) such that the fault tolerance mechanisms of a target system can be tested adequately. The method has been applied to test the design and implementation of voting, clock synchronization, and ordering modules of the Voltan TMR (triple modular redundant) node. The tests performed uncovered three flaws in the system software.

  8. System Wide Joint Position Sensor Fault Tolerance in Robot Systems Using Cartesian Accelerometers

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Joint position sensors are necessary for most robot control systems. A single position sensor failure in a normal robot system can greatly degrade performance. This paper presents a method to obtain position information from Cartesian accelerometers without integration. Depending on the number and location of the accelerometers. the proposed system can tolerate the loss of multiple position sensors. A solution technique suitable for real-time implementation is presented. Simulations were conducted using 5 triaxial accelerometers to recover from the loss of up to 4 joint position sensors on a 7 degree of freedom robot moving in general three dimensional space. The simulations show good estimation performance using non-ideal accelerometer measurements.

  9. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Tochukwu Chiagunye; Eze Aru Okereke; Ilo Somtoochukwu

    2015-01-01

    ABSTRACT The design applied Passive fault tolerance to a microcontroller based data acquisition system to achieve the stated considerations where redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industry. Microsoft visual basic was used to develop a data mining tool which implemented an underlying artificial neural network model for forecasting pollutant concent...

  10. Thermoelectric-Driven Sustainable Sensing and Actuation Systems for Fault-Tolerant Nuclear Incidents

    Longtin, Jon [Stony Brook Univ., NY (United States)

    2016-02-08

    safety systems, etc. Such an approach is intrinsically fault tolerant: in the event that system temperatures increase, the amount of available energy will increase, which will make more power available for applications. The system can also be used during normal conditions to provide enhanced monitoring of key system components.

  11. Thermoelectric-Driven Sustainable Sensing and Actuation Systems for Fault-Tolerant Nuclear Incidents

    safety systems, etc. Such an approach is intrinsically fault tolerant: in the event that system temperatures increase, the amount of available energy will increase, which will make more power available for applications. The system can also be used during normal conditions to provide enhanced monitoring of key system components.

  12. Robust Fault-Tolerant Control for Uncertain Networked Control Systems with State-Delay and Random Data Packet Dropout

    Xiaomei Qi

    2012-01-01

    Full Text Available A robust fault-tolerant controller design problem for networked control system (NCS with random packet dropout in both sensor-to-controller link and controller-to-actuator link is investigated. A novel stochastic NCS model with state-delay, model uncertainty, disturbance, probabilistic sensor failure, and actuator failure is proposed. The random packet dropout, sensor failures, and actuator failures are characterized by a binary random variable. The sufficient condition for asymptotical mean-square stability of NCS is derived and the closed-loop NCS satisfies H∞ performance constraints caused by the random packet dropout and disturbance. The fault-tolerant controller is designed by solving a linear matrix inequality. A numerical example is presented to illustrate the effectiveness of the proposed method.

  13. Low-Cost Fault Tolerant Methodology for Real Time MPSoC Based Embedded System

    Mohsin Amin

    2014-01-01

    Full Text Available We are proposing a design methodology for a fault tolerant homogeneous MPSoC having additional design objectives that include low hardware overhead and performance. We have implemented three different FT methodologies on MPSoCs and compared them against the defined constraints. The comparison of these FT methodologies is carried out by modelling their architectures in VHDL-RTL, on Spartan 3 FPGA. The results obtained through simulations helped us to identify the most relevant scheme in terms of the given design constraints.

  14. Improving Information Security by Implementing Fault Tolerance Concepts

    Aurel Serb

    2014-01-01

    Security issues are complex, and the risks of cyber crime are often difficult to stipulate, even for experts. The issues presented in this article try to be formed in a contribution to the consolidation of problems in the field of computer arhitecture cyber security. Fault tolerance is the best guarantee that high-confidence systems will not succumb to physical, design, or human-machine interaction faults. A fault tolerant system is one that can continue to operate reliably by producing accep...

  15. Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems

    Leos Bohac

    2013-01-01

    Full Text Available The paper analyses and experimentally verifies deployment of Ethernet based network technology to enable fault tolerant and timely exchange of data among a number of high voltage protective relays that use proprietary serial communication line to exchange data in real time on a state of its high voltage circuitry facilitating a fast protection switching in case of critical failures. The digital serial signal is first fetched into PCM multiplexer where it is mapped to the corresponding E1 (2 Mbit/s time division multiplexed signal. Subsequently, the resulting E1 frames are then packetized and sent through Ethernet control LAN to the opposite PCM demultiplexer where the same but reverse processing is done finally sending a signal into the opposite protective relay. The challenge of this setup is to assure very timely delivery of the control information between protective relays even in the cases of potential failures of Ethernet network itself. The tolerance of Ethernet network to faults is assured using widespread per VLAN Rapid Spanning Tree Protocol potentially extended by 1+1 PCM protection as a valuable option.

  16. Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems

    Bronevetsky, G; Meneses, E; Kale, L V

    2011-02-25

    The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, mean time between failures will range from a few minutes to few tens of minutes, making the crash of a processor the common case, instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors are rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.

  17. A Piecewise Affine Hybrid Systems Approach to Fault Tolerant Satellite Formation Control

    Grunnet, Jacob Deleuran; Larsen, Jesper Abildgaard; Bak, Thomas;

    2008-01-01

    In this paper a procedure for modelling satellite formations   including failure dynamics as a piecewise-affine hybrid system is   shown. The formulation enables recently developed methods and tools   for control and analysis of piecewise-affine systems to be applied   leading to synthesis of fault...

  18. Measures of Fault Tolerance in Distributed Simulated Annealing

    Prakash, Aaditya

    2012-01-01

    In this paper, we examine the different measures of Fault Tolerance in a Distributed Simulated Annealing process. Optimization by Simulated Annealing on a distributed system is prone to various sources of failure. We analyse simulated annealing algorithm, its architecture in distributed platform and potential sources of failures. We examine the behaviour of tolerant distributed system for optimization task. We present possible methods to overcome the failures and achieve fault tolerance for t...

  19. Fault-Tolerant Process Control Methods and Applications

    Mhaskar, Prashant; Christofides, Panagiotis D

    2013-01-01

    Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure an efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes which explicitly deal with actuator/controller failures and sensor faults and data losses. Specifically, the book puts forward: ·         a framework for  detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; ·         controller reconfiguration and safe-parking-based fault-handling methodologies; ·         integrated-data- and model-based fault-detection and isolation and fault-tolerant control methods; ·         methods for handling sensor faults and data losses; and ·      ...

  20. Enhancement of Fault Tolerance in Cloud Computing

    Pushpanjali Gupta; Rasmi Ranjan Patra

    2014-01-01

    In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations ar...

  1. Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

    Fault-tolerant real-time computer (FT-RTC) systems are widely used to perform safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Design requirements for such systems need high reliability, availability, computational ability for measurement via sensors, control action via actuators, data communication and human interface via keyboard or display. All these attributes of FT-RTC systems are required to be implemented using best known methods such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure and diagnostic features to validate system operation. In this context, the system designer must select efficient as well as highly reliable diversified bus architecture in order to realize fault-tolerant system design. This paper presents a comparative study between CompactPCI bus and Versa Module Eurocard (VME) bus architecture for designing FT-RTC systems with switch over logic system (SOLS) for NPP. (author)

  2. A Dynamic Slack Management Technique for Real-Time Distributed Embedded System with Enhanced Fault Tolerance and Resource Constraints

    Santhi Baskaran,

    2011-01-01

    Full Text Available This project work aims to develop a dynamic slack management technique, for real-time distributed embedded systems to reduce the total energy consumption in addition to timing, precedence and resource constraints. The Slack Distribution Technique proposed considers a modified Feedback Control Scheduling (FCS algorithm. This algorithm schedules dependent tasks effectively with precedence and resource constraints. It further minimizes the schedule length and utilizes the available slack to increase the energy efficiency. A fault tolerant mechanism uses a deferred-active-backup scheme increases the schedulability and provides reliability to the system.

  3. Fault-tolerant architectures for superconducting qubits

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'-there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  4. Massive Sensor Array Fault Tolerance: Tolerance Mechanism and Fault Injection for Validation

    Dugan Um

    2010-01-01

    Full Text Available As today's machines become increasingly complex in order to handle intricate tasks, the number of sensors must increase for intelligent operations. Given the large number of sensors, detecting, isolating, and then tolerating faulty sensors is especially important. In this paper, we propose fault tolerance architecture suitable for a massive sensor array often found in highly advanced systems such as autonomous robots. One example is the sensitive skin, a type of massive sensor array. The objective of the sensitive skin is autonomous guidance of machines in unknown environments, requiring elongated operations in a remote site. The entirety of such a system needs to be able to work remotely without human attendance for an extended period of time. To that end, we propose a fault-tolerant architecture whereby component and analytical redundancies are integrated cohesively for effective failure tolerance of a massive array type sensor or sensor system. In addition, we discuss the evaluation results of the proposed tolerance scheme by means of fault injection and validation analysis as a measure of system reliability and performance.

  5. Multi-Fault Tolerance ofAircraft Control Surface Based on Fault-Tolerant Multi-Agent System%基于容错多智能体系统的飞机舵面多故障容错

    袁侃; 胡寿松

    2011-01-01

    To solve the problem of multi-fault diagnosis of complex systems, the basic concept of MultiAgent Systems (MAS) is expanded to establish the concept of Fault-Tolerant Multi-Agent System (FATMAS). Based on which the definitions of various functions, rules and flows for realizing multi-fault tolerance in FATMAS are given. Then the multiple faults can be detected by calling different functions of the system according to some definite flows, and the system can recover from the faults by copying the tasks of non-critical agents. The results of simulation in the F-16 aircraft multi-fault diagnosis and self-repairing illustrated that this method can effectively reduce the number of replicated agents through clear inter-agent message passing mechanism, and it is relatively close to the process of dealing with failures in actual system,therefore it can be used in complex system for multi-fault diagnosis and tolerance.%为解决复杂系统的多故障容错问题,首先将多智能体系统(MAS)的基本概念进行了扩充,定义了容错多智能体系统(FATMAS)的相关概念,并在此基础上提出了在FATMAS中实现多故障容错的各类函数定义、规则与流程.然后按照一定的流程调用不同函数就可对系统中的多故障进行检测,并可通过复制非关键智能体的任务使系统从多故障中恢复运行.F-16飞机多故障诊断与自修复的仿真结果说明,该方法能够通过清晰的智能体间消息传递机制有效地减少系统中复制智能体的数目,而且比较接近于实际系统中的故障处理过程,可用于复杂系统的多故障诊断与容错.

  6. Guaranteed Cost Active Fault-tolerant Control of Networked Control System with Packet Dropout and Transmission Delay

    Xiao-Yuan Luo; Mei-Jie Shang; Cai-Lian Chen; Xin-Ping Guan

    2010-01-01

    The problem of guaranteed cost active fault-tolerant controller (AFTC) design for networked control systems (NCSs)with both packet dropout and transmission delay is studied in this paper.Considering the packet dropout and transmission delay,a piecewise constant controller is adopted.With a guaranteed cost function,optimal controllers whose number is equal to the number of actuators are designed,and the design process is formulated as a convex optimal problem that can be solved by existing software.The control strategy is proposed as follows:when actuator failures appear,the fault detection and isolation unit sends out the information to the controller choosing strategy,and then the optimal stabilizing controller with the smallest guaranteed cost value is chosen.Two illustrative examples are given to demonstrate the effectiveness of the proposed approach.By comparing with the existing methods,it can be seen that our method has a better performance.

  7. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Tochukwu Chiagunye

    2015-08-01

    Full Text Available ABSTRACT The design applied Passive fault tolerance to a microcontroller based data acquisition system to achieve the stated considerations where redundant sensors and microcontrollers with associated circuitry were designed and implemented to enable measurement of pollutant concentration information from chimney vents in two industry. Microsoft visual basic was used to develop a data mining tool which implemented an underlying artificial neural network model for forecasting pollutant concentrations for future time periods. The feed forward back propagation method was used to train the ANN model with a training data set while a decision tree algorithm was used to select an optimal output result for the model from its two output neurons.

  8. Fault Tolerance in Control Architectures for Mobile Robots: Fantasy or Reality?

    Crestani, Didier; Godary-Dejean, Karen

    2012-01-01

    Due to the future development of robotic autonomous systems in human environment, the fault tolerance paradigm will be a central issue in robotics. This article presents a survey of fault tolerance concepts, means and implementations in robotic architectures.

  9. Fault tolerance and reliability in integrated ship control

    Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh; Schiøler, Henrik

    2002-01-01

    Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding...

  10. Fault-Tolerant Precision Formation Guidance for Interferometry Project

    National Aeronautics and Space Administration — A methodology is to be developed that will allow the development and implementation of fault-tolerant control system for distributed collaborative spacecraft. The...

  11. Fault-tolerant logics for FPGA linux

    The increasing use of SRAM-based reconfigurable architectures at important areas of research and development (like particle accelerators and space applications) brings new, currently partially unattended effects on top. An already well known, but nevertheless important problem of such systems is its susceptibility to radiation which increases in conjunction with particle flux and energy. Regarding to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware by the use of spacial and temporal redundancy features. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, starting with the FPGA bit layer and ending up in the software application layer to get a maximum of radiation tolerance in systems running FPGA Linux in radiation susceptible environments. Only a collaboration of all these layers is able to create an adequate amount of data security and process integrity.

  12. Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

    Goldstein, David

    1991-01-01

    Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

  13. A SAFE approach towards early design space exploration of Fault-tolerant multimedia MPSoCs

    P. van Stralen; A. Pimentel

    2012-01-01

    With the reduction in feature size, transient errors start to play an important role in modern embedded systems. It is therefore important to make fault-tolerance a first-class citizen in embedded system design. Fault-tolerance patterns are techniques to make an application fault-tolerant. Not only

  14. Diagnosis and Fault-tolerant Control

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Staroswiecki, Marcel

    The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process...... applicability of the presented methods. The theoretical results are illustrated by two running examples which are used throughout the book. The book addresses engineering students, engineers in industry and researchers who wish to get a survey over the variety of approaches to process diagnosis and fault...

  15. Software reliability through fault-avoidance and fault-tolerance

    Vouk, Mladen A.; Mcallister, David F.

    1993-01-01

    Strategies and tools for the testing, risk assessment and risk control of dependable software-based systems were developed. Part of this project consists of studies to enable the transfer of technology to industry, for example the risk management techniques for safety-concious systems. Theoretical investigations of Boolean and Relational Operator (BRO) testing strategy were conducted for condition-based testing. The Basic Graph Generation and Analysis tool (BGG) was extended to fully incorporate several variants of the BRO metric. Single- and multi-phase risk, coverage and time-based models are being developed to provide additional theoretical and empirical basis for estimation of the reliability and availability of large, highly dependable software. A model for software process and risk management was developed. The use of cause-effect graphing for software specification and validation was investigated. Lastly, advanced software fault-tolerance models were studied to provide alternatives and improvements in situations where simple software fault-tolerance strategies break down.

  16. Electrical Steering of Vehicles - Fault-tolerant Analysis and Design

    Blanke, Mogens; Thomsen, Jesper Sandberg

    2006-01-01

    The topic of this paper is systems that need be designed such that no single fault can cause failure at the overall level. A methodology is presented for analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high cost triple redundancy...

  17. A Proactive Fault Tolerant Strategy for Desktop Grid

    Geeta Arora; Dr. Shaveta Rani; Dr. Paramjit Singh

    2015-01-01

    Desktop Grid resources are volatile, heterogeneous and geographically distributed in nature, so scheduling and fault tolerance become the important challenges for Desktop grid systems. In this paper a proactive fault tolerant strategy is developed which also considers the unwanted delays in association with speed and load of resources in the Grid while scheduling jobs on Grid resources. Simulation experiments are conducted by using GridSim toolkit 5.2. The experimental results obtained from a...

  18. Incorporating Fault Tolerance Tactics in Software Architecture Patterns

    Harrison, Neil B.; Avgeriou, Paris

    2008-01-01

    One important way that an architecture impacts fault tolerance is by making it easy or hard to implement measures that improve fault tolerance. Many such measures are described as fault tolerance tactics. We studied how various fault tolerance tactics can be implemented in the best-known architectur

  19. Object Replication and CORBA Fault-Tolerant Object Service

    2001-01-01

    CORBA (Common Object Request Broker Arc hitecture) provides 16Common Object Services for distributed application develo pment, but none of them are fault-tolerance related services. In this paper, we propose a replicated object based Fault-Tolerant Object Service (FTOS) for COR BA environment. Two fault-tolerant mechanisms are provided in FTOS including dy namic voting mechanism and object replication mechanism. The dynamic voting mech anism uses majority-voting strategy to ensure object state consistency in failu re situations. The object replication mechanism can help system administrators t o replicate and start-up objects easily. Our implementation provides a library according to the style of COSS. With this library, programmers can develop distr ibuted applications with fault-tolerance capability very easily.

  20. Byzantine-fault tolerant self-stabilizing protocol for distributed clock synchronization systems

    Malekpour, Mahyar R. (Inventor)

    2010-01-01

    A rapid Byzantine self-stabilizing clock synchronization protocol that self-stabilizes from any state, tolerates bursts of transient failures, and deterministically converges within a linear convergence time with respect to the self-stabilization period. Upon self-stabilization, all good clocks proceed synchronously. The Byzantine self-stabilizing clock synchronization protocol does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period.

  1. The New Fault Tolerant Onboard Computer for Microsatellite Missions

    2006-01-01

    This paper describes an onboard computer with dual processing modules. Each processing module is composed of 32 bit ARM reduced instruction set computer processor and other commercial-off-the-shelf devices. A set of fault handling mechanisms is implemented in the computer system, which enables the system to tolerate a single fault. The onboard software is organized around a set of processes that communicate among each other through a routing process. Meeting an extremely tight set of constraints that include mass, volume, power consumption and space environmental conditions, the fault-tolerant onboard computer has excellent data processing capability that can meet the erquirements of micro-satellite missions.

  2. A failure-distance dased method to bound the reliability of non-repairable fault-tolerant systems without the knowledge of minimal cuts

    Suñé, Víctor; Carrasco, Juan A.

    2001-01-01

    CTMC (continuous-time Markov chains) are a commonly used formalism for modeling fault-tolerant systems. One of the major drawbacks of CTMC is the well-known state-space explosion problem. This work develops and analyzes a method (SC-BM) to compute bounds for the reliability of non-repairable fault-tolerant systems in which only a portion of the state space of the CTMC is generated. SC-BM uses the failure distance concept as the method described in [1] but, unlike that method, w...

  3. Incorporating Fault Tolerance Tactics in Software Architecture Patterns

    Harrison, Neil B.; Avgeriou, Paris

    2008-01-01

    One important way that an architecture impacts fault tolerance is by making it easy or hard to implement measures that improve fault tolerance. Many such measures are described as fault tolerance tactics. We studied how various fault tolerance tactics can be implemented in the best-known architecture patterns. This shows that certain patterns are better suited to implementing fault tolerance tactics than others, and that certain alternate tactics are better matches than others for a given pat...

  4. USAGE OF STANDARD PERSONAL COMPUTER PORTS FOR DESIGNING OF THE DOUBLE REDUNDANT FAULT-TOLERANT COMPUTER CONTROL SYSTEMS

    Rafig SAMEDOV

    2005-01-01

    Full Text Available In this study, for designing of the fault-tolerant control systems by using standard personal computers, the ports have been investigated, different structure versions have been designed and the method for choosing of an optimal structure has been suggested. In this scope, first of all, the ÇİFTYAK system has been defined and its work principle has been determined. Then, data transmission ports of the standard personal computers have been classified and analyzed. After that, the structure versions have been designed and evaluated according to the used data transmission methods, the numbers of ports and the criterions of reliability, performance, truth, control and cost. Finally, the method for choosing of the most optimal structure version has been suggested.

  5. Enhancement of Fault Tolerance in Cloud Computing

    Pushpanjali Gupta

    2014-08-01

    Full Text Available In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations are also opting for using Hybrid Cloud instead of private Cloud. Thus, in this paper we propose an approach of using a new framework in Cloud so as to use Cloud for scientific applications as well makes the public Cloud trustworthy platform. There is a progressive approach introduced to provide an effective way to achieve high fault tolerance in Clouds by enabling a new workflow planning method to balance performance, reliability and cost for critical scientific applications and focus mainly on use of distributed resources for workflow execution mainly in serial and concurrent manner.

  6. ENHANCEMENT OF FAULT TOLERANCE IN CLOUD COMPUTING

    Pushpanjali Gupta

    2015-10-01

    Full Text Available In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations are also opting for using Hybrid Cloud instead of private Cloud. Thus, in this paper we propose an approach of using a new framework in Cloud so as to use Cloud for scientific applications as well makes the public Cloud trustworthy platform. There is a progressive approach introduced to provide an effective way to achieve high fault tolerance in Clouds by enabling a new workflow planning method to balance performance, reliability and cost for critical scientific applications and focus mainly on use of distributed resources for workflow execution mainly in serial and concurrent manner.

  7. Simulation modeling based method for choosing an effective set of fault tolerance mechanisms for real-time avionics systems

    Bakhmurov, A. G.; Balashov, V. V.; Glonina, A. B.; Pashkov, V. N.; Smeliansky, R. L.; Volkanov, D. Yu.

    2013-12-01

    In this paper, the reliability allocation problem (RAP) for real-time avionics systems (RTAS) is considered. The proposed method for solving this problem consists of two steps: (i) creation of an RTAS simulation model at the necessary level of abstraction and (ii) application of metaheuristic algorithm to find an optimal solution (i. e., to choose an optimal set of fault tolerance techniques). When during the algorithm execution it is necessary to measure the execution time of some software components, the simulation modeling is applied. The procedure of simulation modeling also consists of the following steps: automatic construction of simulation model of the RTAS configuration and running this model in a simulation environment to measure the required time. This method was implemented as an experimental software tool. The tool works in cooperation with DYANA simulation environment. The results of experiments with the implemented method are presented. Finally, future plans for development of the presented method and tool are briefly described.

  8. DESIGN AND IMPLEMENTATION OF PROCESS FAULT-TOLERANT SYSTEM FOR HIGH-PERFORMANCE FAULT-TOLERANT COMPUTER%面向高端容错计算机的进程容错系统设计与实现

    吴楠; 张东; 刘璧怡

    2013-01-01

    High-end fault-tolerant computers are mainly used in key sectors such as banking and telecommunications, and are extremely sensitive to failure, so it is extremely important to guarantee the availability of their key processes. Common mechanism of fault-tolerant is mainly realised based on static structural redundancy principle, but the redundancy in hardware layer costs high and is complex in execution, while the redundancy in application layer is of low versatile. This paper proposes a fault-tolerant mechanism and policy based on process redundancy, which constructs dual-modular redundancy or multi-modular redundancy on key application processes. The method employs the means of interprocess synchronisation to ensure the operation of redundancy processes based on the same execution logic, supervises the system and makes corresponding error handing on different faults. Compared with traditional fault-tolerant way, the process fault-tolerant management system has the characteristics of high versatility and low cost, can effectively ensure high reliability of the system with less performance lost and avoid the complexity in hardware customisation at the same time, while it keeps the transparent to applications and users as well.%高端容错计算机主要应用于银行、电信等关键领域中,对于系统失效极其敏感,保证系统关键进程的可靠性至关重要.常见的容错机制主要依据静态结构冗余原理实现,然而硬件层的冗余成本很高且实现复杂,应用软件层的冗余则不具有通用性.提出一种基于进程冗余的容错机制和策略,对关键进程构造双模冗余或多模冗余,采用进程间同步等手段确保冗余进程按照同样的执行逻辑运行,监控系统并对不同的错误进行相应的错误处理.与传统的容错方式相比,进程容错管理系统具有通用性高、成本低等特点,能在较小的性能损耗下有效地保证系统的高可靠性,同时避

  9. Fault-tolerant building-block computer study

    Rennels, D. A.

    1978-01-01

    Ultra-reliable core computers are required for improving the reliability of complex military systems. Such computers can provide reliable fault diagnosis, failure circumvention, and, in some cases serve as an automated repairman for their host systems. A small set of building-block circuits which can be implemented as single very large integration devices, and which can be used with off-the-shelf microprocessors and memories to build self checking computer modules (SCCM) is described. Each SCCM is a microcomputer which is capable of detecting its own faults during normal operation and is described to communicate with other identical modules over one or more Mil Standard 1553A buses. Several SCCMs can be connected into a network with backup spares to provide fault-tolerant operation, i.e. automated recovery from faults. Alternative fault-tolerant SCCM configurations are discussed along with the cost and reliability associated with their implementation.

  10. Design and Verification of Fault-Tolerant Components

    Zhang, Miaomiao; Liu, Zhiming; Ravn, Anders Peter; Morisset, Charles

    2009-01-01

    We present a systematic approach to design and verification of fault-tolerant components with real-time properties as found in embedded systems. A state machine model of the correct component is augmented with internal transitions that represent hypothesized faults. Also, constraints on the...... occurrence or timing of faults are included in this model. This model of a faulty component is then extended with fault detection and recovery mechanisms, again in the form of state machines. Desired properties of the component are model checked for each of the successive models. The models can be made...... relatively detailed such that they can serve directly as blueprints for engineering, and yet be amenable to exhaustive verication. The approach is illustrated with a design of a triple modular fault-tolerant system that is a real case we received from our collaborators in the aerospace field. We use UPPAAL...

  11. Scheduling and Voltage Scaling for Energy/Reliability Trade-offs in Fault-Tolerant Time-Triggered Embedded Systems

    Pop, Paul; Poulsen, Kåre Harbo; Izosimov, Viacheslav; Eles, Petru

    -execution and dynamic voltage scaling-based low-power techniques are competing for the slack in the schedules. Our approach decides the voltage levels and start times of processes and the transmission times of messages, such that the transient faults are tolerated, the timing constraints of the application are...

  12. Computer aided reliability, availability, and safety modeling for fault-tolerant computer systems with commentary on the HARP program

    Shooman, Martin L.

    1991-01-01

    Many of the most challenging reliability problems of our present decade involve complex distributed systems such as interconnected telephone switching computers, air traffic control centers, aircraft and space vehicles, and local area and wide area computer networks. In addition to the challenge of complexity, modern fault-tolerant computer systems require very high levels of reliability, e.g., avionic computers with MTTF goals of one billion hours. Most analysts find that it is too difficult to model such complex systems without computer aided design programs. In response to this need, NASA has developed a suite of computer aided reliability modeling programs beginning with CARE 3 and including a group of new programs such as: HARP, HARP-PC, Reliability Analysts Workbench (Combination of model solvers SURE, STEM, PAWS, and common front-end model ASSIST), and the Fault Tree Compiler. The HARP program is studied and how well the user can model systems using this program is investigated. One of the important objectives will be to study how user friendly this program is, e.g., how easy it is to model the system, provide the input information, and interpret the results. The experiences of the author and his graduate students who used HARP in two graduate courses are described. Some brief comparisons were made with the ARIES program which the students also used. Theoretical studies of the modeling techniques used in HARP are also included. Of course no answer can be any more accurate than the fidelity of the model, thus an Appendix is included which discusses modeling accuracy. A broad viewpoint is taken and all problems which occurred in the use of HARP are discussed. Such problems include: computer system problems, installation manual problems, user manual problems, program inconsistencies, program limitations, confusing notation, long run times, accuracy problems, etc.

  13. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  14. Fault Tolerant Parallel Filters Based On Bch Codes

    K.Mohana Krishna

    2015-04-01

    Full Text Available Digital filters are used in signal processing and communication systems. In some cases, the reliability of those systems is critical, and fault tolerant filter implementations are needed. Over the years, many techniques that exploit the filters’ structure and properties to achieve fault tolerance have been proposed. As technology scales, it enables more complex systems that incorporate many filters. In those complex systems, it is common that some of the filters operate in parallel, for example, by applying the same filter to different input signals. Recently, a simple technique that exploits the presence of parallel filters to achieve multiple fault tolerance has been presented. In this brief, that idea is generalized to show that parallel filters can be protected using Bose– Chaudhuri–Hocquenghem codes (BCH in which each filter is the equivalent of a bit in a traditional ECC. This new scheme allows more efficient protection when the number of parallel filters is large.

  15. Fault-Tolerant Mechanism of the Distributed Cluster Computers"

    SHANG Yizi; JIN Yang; WU Baosheng

    2007-01-01

    The distributed system with high performance and stability is commonly adopted in large scale scientific and engineering computing. In this paper, we discuss a fault-tolerant mechanism under Linux circumstance to improve the fault-tolerant ability of the system, namely a scheme and frame to form the stable computing platform. In terms of the structure and function of the distributed system, active list and file invocation strategies are employed in the task management. System multilevel fault-tolerance can be achieved by repeated processes in a single node and task migration on multi-nodes. Manager node agent introduced in this paper administrates the nodes using the list, disposes of the tasks according to the nodes'performance, and hence, to be able to make full use of the cluster resources. An evaluation method is proposed to appraise the performance. The analyzed results show the usefulness of the scheme proposed except for some additional overhead of memory consumption.

  16. Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems

    Gan, Junhe; Gruian, Flavius; Pop, Paul; Madsen, Jan

    2011-01-01

    reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We presented a Tabu Search-based approach which uses an energy/reliability trade-off model to find reliable and schedulable implementations...

  17. A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems

    Márcio das Chagas Moura

    2008-08-01

    Full Text Available In this work it is proposed a model for the assessment of availability measure of fault tolerant systems based on the integration of continuous time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. It is also proposed a numerical procedure for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.Neste trabalho, é proposto um modelo baseado na integração entre processos semi-Markovianos e redes Bayesianas para avaliação da disponibilidade de sistemas tolerantes à falha. Esta integração resulta em um modelo estocástico híbrido o qual é capaz de representar as características dinâmicas de um sistema assim como tratar as relações de causa e efeito entre fatores externos tais como condições ambientais e operacionais. Além disso, o modelo híbrido permite avaliar a propagação de incerteza sobre a disponibilidade do sistema. É também proposto um procedimento numérico para a solução das equações de probabilidade de estado de processos semi-Markovianos descritos por taxas de transição. Tal procedimento numérico é baseado na aplicação de transformadas de Laplace que são invertidas pelo método de quadratura Gaussiana conhecido como Gauss Legendre. O modelo híbrido e procedimento numérico são ilustrados por meio de um exemplo de aplicação no contexto de

  18. Electronic Power Switch for Fault-Tolerant Networks

    Volp, J.

    1987-01-01

    Power field-effect transistors reduce energy waste and simplify interconnections. Current switch containing power field-effect transistor (PFET) placed in series with each load in fault-tolerant power-distribution system. If system includes several loads and supplies, switches placed in series with adjacent loads and supplies. System of switches protects against overloads and losses of individual power sources.

  19. SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers

    Forman, P.; Moses, K.

    1979-01-01

    A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer with emphasis on implementation is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level reliability and fault tolerance is achieved by the replication of computing tasks among processing units.

  20. Fault Tolerant Heterogeneous Limited Duplication Scheduling algorithm for Decentralized Grid

    DR. NITIN

    2013-04-01

    Full Text Available Fault tolerance is one of the most desirable property in decentralized grid computing systems, where computational resources are geographically distributed. These resources collaborate in order to execute workflow applications as fast as possible. In workflow applications, tasks are dependent on each other, so it becomes extremely vital that scheduling techniques should also have some decentralized fault tolerant mechanism. In this paper, we have proposed a decentralized fault tolerant mechanism which utilize the checkpoint concept; for Heterogeneous Limited Duplication (HLD algorithm. HLD is based on task duplication scheduling in heterogeneous environment. There are two fold benefits firstly; if node failure occurs then rest of grid nodes sustain the execution of application. Secondly, less makespan of application is obtained using checkpoint concept. Therefore, application scheduled over decentralized grid systems (which are known for their unreliable behavior will yield results fast utilizing algorithm proposed in this paper.

  1. Learning Fault-tolerant Speech Parsing with SCREEN

    Wermter, S; Wermter, Stefan; Weber, Volker

    1994-01-01

    This paper describes a new approach and a system SCREEN for fault-tolerant speech parsing. SCREEEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech related errors. The goal for this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. This approach is examined in a system SCREEN using various hybrid connectionist techniques. Hybrid connectionist techniques are examined because of their promising properties of inherent fault tolerance, learning, gradedness and parallel constraint integration. The input for SCREEN is hypotheses about recognized words of a spoken utterance potentially analyzed by a spe...

  2. Adding Fault Tolerance to NPB Benchmarks Using ULFM

    Parchman, Zachary W [Tennessee Technological University (TTU); Vallee, Geoffroy R [ORNL; Naughton III, Thomas J [ORNL; Engelmann, Christian [ORNL; Bernholdt, David E [ORNL; Scott, Stephen L [Tennessee Technological University (TTU)

    2016-01-01

    In the world of high-performance computing, fault tolerance and application resilience are becoming some of the primary concerns because of increasing hardware failures and memory corruptions. While the research community has been investigating various options, from system-level solutions to application-level solutions, standards such as the Message Passing Interface (MPI) are also starting to include such capabilities. The current proposal for MPI fault tolerant is centered around the User-Level Failure Mitigation (ULFM) concept, which provides means for fault detection and recovery of the MPI layer. This approach does not address application-level recovery, which is currently left to application developers. In this work, we present a mod- ification of some of the benchmarks of the NAS parallel benchmark (NPB) to include support of the ULFM capabilities as well as application-level strategies and mechanisms for application-level failure recovery. As such, we present: (i) an application-level library to checkpoint and restore data, (ii) extensions of NPB benchmarks for fault tolerance based on different strategies, (iii) a fault injection tool, and (iv) some preliminary results that show the impact of such fault tolerant strategies on the application execution.

  3. Analysis of a hardware and software fault tolerant processor for critical applications

    Dugan, Joanne B.

    1993-01-01

    Computer systems for critical applications must be designed to tolerate software faults as well as hardware faults. A unified approach to tolerating hardware and software faults is characterized by classifying faults in terms of duration (transient or permanent) rather than source (hardware or software). Errors arising from transient faults can be handled through masking or voting, but errors arising from permanent faults require system reconfiguration to bypass the failed component. Most errors which are caused by software faults can be considered transient, in that they are input-dependent. Software faults are triggered by a particular set of inputs. Quantitative dependability analysis of systems which exhibit a unified approach to fault tolerance can be performed by a hierarchical combination of fault tree and Markov models. A methodology for analyzing hardware and software fault tolerant systems is applied to the analysis of a hypothetical system, loosely based on the Fault Tolerant Parallel Processor. The models consider both transient and permanent faults, hardware and software faults, independent and related software faults, automatic recovery, and reconfiguration.

  4. Design Approach for Fault Tolerance in FPGA Architecture

    Ms. Shweta S. Meshram

    2011-03-01

    Full Text Available Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise tosignificant challenges for IC testing. In recent years the application space of reconfigurable devices hasgrown to include many platforms with a strong need for fault tolerance. While these systems frequentlycontain hardware redundancy to allow for continued operation in the presence of operational faults, theneed to recover faulty hardware and return it to full functionality quickly and efficiently is great. Inaddition to providing functional density, FPGAs provide a level of fault tolerance generally not found inmask-programmable devices by including the capability to reconfigure around operational faults in thefield. Reliability and process variability are serious issues for FPGAs in the future. With advancement inprocess technology, the feature size is decreasing which leads to higher defect densities, moresophisticated techniques at increased costs are required to avoid defects. If nano-technology fabricationare applied the yield may go down to zero as avoiding defect during fabrication will not be a feasibleoption Hence, feature architecture have to be defect tolerant. In regular structure like FPGA, redundancyis commonly used for fault tolerance. In this work we present a solution in which configuration bit-streamof FPGA is modified by a hardware controller that is present on the chip itself. The technique usesredundant device for replacing faulty device and increases the yield.

  5. Design Approach for Fault Tolerance in FPGA Architecture

    Ms. Shweta S. Meshram

    2011-03-01

    Full Text Available Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise to significant challenges for IC testing. In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing which leads to higher defect densities, more sophisticated techniques at increased costs are required to avoid defects. If nano-technology fabrication are applied the yield may go down to zero as avoiding defect during fabrication will not be a feasible option Hence, feature architecture have to be defect tolerant. In regular structure like FPGA, redundancy is commonly used for fault tolerance. In this work we present a solution in which configuration bit-stream of FPGA is modified by a hardware controller that is present on the chip itself. The technique uses redundant device for replacing faulty device and increases the yield.

  6. Fault Tolerant Control for Civil Structures Based on LMI Approach

    Chunxu Qu

    2013-01-01

    Full Text Available The control system may lose the performance to suppress the structural vibration due to the faults in sensors or actuators. This paper designs the filter to perform the fault detection and isolation (FDI and then reforms the control strategy to achieve the fault tolerant control (FTC. The dynamic equation of the structure with active mass damper (AMD is first formulated. Then, an estimated system is built to transform the FDI filter design problem to the static gain optimization problem. The gain is designed to minimize the gap between the estimated system and the practical system, which can be calculated by linear matrix inequality (LMI approach. The FDI filter is finally used to isolate the sensor faults and reform the FTC strategy. The efficiency of FDI and FTC is validated by the numerical simulation of a three-story structure with AMD system with the consideration of sensor faults. The results show that the proposed FDI filter can detect the sensor faults and FTC controller can effectively tolerate the faults and suppress the structural vibration.

  7. False alarms in fault-tolerant dominating sets in graphs

    Mateusz Nikodem

    2012-01-01

    Full Text Available We develop the problem of fault-tolerant dominating sets (liar's dominating sets in graphs. Namely, we consider a new kind of fault - a false alarm. Characterization of such fault-tolerant dominating sets in three different cases (dependent on the classification of the types of the faults are presented.

  8. Fault Tolerant System for Sparse Traffic Grooming in Optical WDM Mesh Networks Using Combiner Queue

    Sandip R. Shinde

    2016-03-01

    Full Text Available Queuing theory is an important concept in current internet technology. As the requirement of bandwidth goes on increasing it is necessary to use optical communication for transfer of data. Optical communication at backbone network requires various devices for traffic grooming. The cost of these devices is very high which leads to increase in the cost of network. One of the solutions to this problem is to have sparse traffic grooming in optical WDM mesh network. Sparse traffic grooming allows only few nodes in the network as grooming node (G-node. These G-nodes has the grooming capability and other nodes are simple nodes where traffic grooming is not possible. The grooming nodes are the special nodes which has high cost. The possibility of faults at such a node, or link failure is high. Resolving such faults and providing efficient network is very important. So we have importance of such survivable sparse traffic grooming network. Queuing theory helps to improve the result of network and groom the traffic in the network. The paper focuses on the improvement in performance of the backbone network and reduction in blocking probability. To achieve the goals of the work we have simulated the model. The main contribution is to use survivability on the sparse grooming network and use of combiner queues at each node. It has observed that Combiner queuing alone does the job of minimizing blocking probability and balancing the load over the network. The model is not only cost effective but also increases the performance of network and minimizes the call blocking probability

  9. Rule-based fault-tolerant flight control

    Handelman, Dave

    1988-01-01

    Fault tolerance has always been a desirable characteristic of aircraft. The ability to withstand unexpected changes in aircraft configuration has a direct impact on the ability to complete a mission effectively and safely. The possible synergistic effects of combining techniques of modern control theory, statistical hypothesis testing, and artificial intelligence in the attempt to provide failure accommodation for aircraft are investigated. This effort has resulted in the definition of a theory for rule based control and a system for development of such a rule based controller. Although presented here in response to the goal of aircraft fault tolerance, the rule based control technique is applicable to a wide range of complex control problems.

  10. Fault-tolerance for MPI Codes on Computational Clusters

    Hagen, Knut Imar

    2007-01-01

    This thesis focuses on fault-tolerance for MPI codes on computational clusters. When an application runs on a very large cluster with thousands of processors, there is likely that a process crashes due to a hardware or software failure. Fault-tolerance is the ability of a system to respond gracefully to an unexpected hardware or software failure. A test application which is meant to run for several weeks on several nodes is used in this thesis. The application is a seismic MPI application, w...

  11. Fault-tolerant search algorithms reliable computation with unreliable information

    Cicalese, Ferdinando

    2013-01-01

    Why a book on fault-tolerant search algorithms? Searching is one of the fundamental problems in computer science. Time and again algorithmic and combinatorial issues originally studied in the context of search find application in the most diverse areas of computer science and discrete mathematics. On the other hand, fault-tolerance is a necessary ingredient of computing. Due to their inherent complexity, information systems are naturally prone to errors, which may appear at any level - as imprecisions in the data, bugs in the software, or transient or permanent hardware failures. This book pr

  12. Fault Tolerant Control of Wind Turbines

    Odgaard, Peter Fogh; Stoustrup, Jakob; Kinnaert, Michel

    2013-01-01

    This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and...... the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a...

  13. Implementation of middleware fault tolerance support for real-time embedded applications

    Afonso, Francisco; Carlos A. Silva; Montenegro, Sérgio; Tavares, Adriano

    2006-01-01

    Critical real-time embedded systems need to apply fault tolerance strategies to deal with operation time errors, either in hardware or software. In this paper we present the ongoing work to provide application fault tolerance by means of implementing middleware transparent support over the BOSS embedded operating system. The middleware uses a publishersubscriber protocol and enables the execution of several fault tolerance strategies with minimum burden to the application level software

  14. Control switching in high performance and fault tolerant control

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2010-01-01

    The problem of reliability in high performance control and in fault tolerant control is considered in this paper. A feedback controller architecture for high performance and fault tolerance is considered. The architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. By using...... the nominal controller in the architecture as a simple and robust controller, it is possible to use the YJBK transfer function for optimization of the closed-loop performance. This can be done both in connections with normal operation of the system as well as in connection with faults in the system....... The architecture will also allow changing the applied sensors and/or actuators when switching between different controllers. This switchingget particular simple for open-loop stable systems....

  15. Design methodology for fault-tolerant control of advanced driver assistance systems

    Gietelink, O.J.; Ploeg, J.; Schutter, B. de; Verhaegen, M.H.G.

    2003-01-01

    The objective of this project is to develop a methodology for the design, testing, evaluation and implementation of control systems for Advanced Driver Assistance Systems (ADAS). Examples of ADAS are collision avoidance systems, lane departure warning systems, pre-crash sensing, and adaptive cruise

  16. Fault Detection for Shipboard Monitoring and Decision Support Systems

    Lajic, Zoran; Nielsen, Ulrik Dam

    2009-01-01

    In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be p...

  17. Design of Test Articles and Monitoring System for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

    2008-01-01

    This report describes the design of the test articles and monitoring systems developed to characterize the response of a fault-tolerant computer communication system when stressed beyond the theoretical limits for guaranteed correct performance. A high-intensity radiated electromagnetic field (HIRF) environment was selected as the means of injecting faults, as such environments are known to have the potential to cause arbitrary and coincident common-mode fault manifestations that can overwhelm redundancy management mechanisms. The monitors generate stimuli for the systems-under-test (SUTs) and collect data in real-time on the internal state and the response at the external interfaces. A real-time health assessment capability was developed to support the automation of the test. A detailed description of the nature and structure of the collected data is included. The goal of the report is to provide insight into the design and operation of these systems, and to serve as a reference document for use in post-test analyses.

  18. Steps toward fault-tolerant quantum chemistry.

    Taube, Andrew Garvin

    2010-05-01

    Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use a task-based model that uses a data-centric approach to allocate work to different processes as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program - matrix multiplication. Also, we discuss the extensions that would required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead one must recognize that these systems will be hierarchical in structure, prone to constant faults, and global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication, reducing the network load, does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determine the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources. The complexity of these algorithms means that

  19. Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems

    Choux, Martin; Blanke, Mogens; Hovland, Geir

    2011-01-01

    Fluid power systems have been in use since 1795 with the rst hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are uid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by the means of hydraulic uid under pressure. With the development of computing power and control techniques during the last few decades, they are used increasi...

  20. Implementations of a four-level mechanical architecture for fault-tolerant robots

    This paper describes a fault tolerant mechanical architecture with four levels devised and implemented in concert with NASA (Tesar, D. and Sreevijayan, D., Four-level fault tolerance in manipulator design for space operations. In First Int. Symp. Measurement and Control in Robotics (ISMCR '90), Houston, Texas, 20-22 June 1990.) Subsequent work has clarified and revised the architecture. The four levels proceed from fault tolerance at the actuator level, to fault tolerance via in-parallel chains, to fault tolerance using serial kinematic redundancy, and finally to the fault tolerance multiple arm systems provide. This is a subsumptive architecture because each successive layer can incorporate the fault tolerance provided by all layers beneath. For instance a serially-redundant robot can incorporate dual fault-tolerant actuators. Redundant systems provide the fault tolerance, but the guiding principle of this architecture is that functional redundancies actively increase the performance of the system. Redundancies do not simply remain dormant until needed. This paper includes specific examples of hardware and/or software implementation at all four levels

  1. System-Level Development of Fault-Tolerant Distributed Aero-Engine Control Architecture Project

    National Aeronautics and Space Administration — NASA's vision for an "intelligent engine" will be realized with the development of a truly distributed control system and reliable smart transducer node components;...

  2. Assessing the reliability of diverse fault-tolerant software-based systems

    Littlewood, B.; Popov, P. T.; L. Strigini

    2002-01-01

    We discuss a problem in the safety assessment of automatic control and protection systems. There is an increasing dependence on software for performing safety-critical functions, like the safety shut-down of dangerous plants. Software brings increased risk of design defects and thus systematic failures; redundancy with diversity between redundant channels is a possible defence. While diversity techniques can improve the dependability of software-based systems, they do not alleviate the diffic...

  3. Dependability modelling of a fault tolerant duplex system using AADL and GSPNs

    Rugina, Ana-Elena; Kanoun, Karama; Kaâniche, Mohamed; Guiochet, Jérémie

    2005-01-01

    This research report is intended to explore the possibilities of deriving Generalised Stochastic Petri Nets (GSPNs) dependability models from AADL dependability models in order to estimate dependability measures for computer-based systems.The AADL dependability models are composed of i) AADL architecture models including the various components of the system and ii) their associated AADL error models, as described in Section 3 of this report. Our reference document for describing error models ...

  4. 14 CFR Special Federal Aviation... - Fuel Tank System Fault Tolerance Evaluation Requirements

    2010-01-01

    ... above, if the application was filed before June 6, 2001, the effective date of this SFAR, and the... ignition source within the fuel tank system throughout the operational life of the airplane. (d) The... factors that provide an equivalent level of safety. (e) Each type certificate holder must comply no...

  5. A Structural Analysis Method Formulation for Fault-tolerant Control System Design

    Izadi-Zamanabadi, Roozbeh; Staroswiecki, M

    An analysis of structural model representation has been used to extract available inherent redundant information in the system. The paper presents a refined structured model representation based on bipartite directed graph definition and the necessary condition for sensor fusion based on the...

  6. Self Fault-Tolerance of Protocols: A Case Study

    2000-01-01

    The prerequisite for the existing protocols' correctness is that protocols can be normally operated under the normal conditions, rather than dealing with abnormal conditions.In other words, protocols with the fault-tolerance can not be provided when some fault occurs. This paper discusses the self fault-tolerance of protocols. It describes some concepts and methods for achieving self fault tolerance of protocols. Meanwhile, it provides a case study, investigates a typical protocol that does not satisfy the self fault-tolerance, and gives a new redesign version of this existing protocol using the proposed approach.

  7. Load management in a distributed multimedia streaming environment using a fault-tolerant hierarchical system

    AYBAY, HADİ IŞIK; SHAH, MOHAMMAD AHMED

    2015-01-01

    In contrast to text-only forms of communications, multimedia uses a combination of audiovisual means alongside textual modes of communication. Streaming multimedia is such multimedia that is constantly delivered by a provider of the multimedia to a client. In streaming multimedia the streamed content is continually presented to and received by the end user. Distributed multimedia systems (DMSs) deliver multimedia content to end-users by means of distributed multimedia databases and distribute...

  8. Fault Tolerant Ancillary Function of Power Converters in Distributed Generation Power System within a Microgrid Structure

    Di Tommaso, Antonino O.; Fabio Genduso; Rosario Miceli; Giuseppe Ricco Galluzzo

    2013-01-01

    Distributed generation (DG) is deeply changing the existing distribution networks which become very sophisticated and complex incorporating both active and passive equipment. The simplification of their management can be obtained assuming a structure with small networks, namely, microgrids, reproducing, in a smaller scale, the structure of large networks including production, transmission, and distribution of the electrical energy. Power converters in distributed generation systems carry on s...

  9. Fault tolerant attitude control for small unmanned aircraft systems equipped with an airflow sensor array

    Inspired by sensing strategies observed in birds and bats, a new attitude control concept of directly using real-time pressure and shear stresses has recently been studied. It was shown that with an array of onboard airflow sensors, small unmanned aircraft systems can promptly respond to airflow changes and improve flight performances. In this paper, a mapping function is proposed to compute aerodynamic moments from the real-time pressure and shear data in a practical and computationally tractable formulation. Since many microscale airflow sensors are embedded on the small unmanned aircraft system surface, it is highly possible that certain sensors may fail. Here, an adaptive control system is developed that is robust to sensor failure as well as other numerical mismatches in calculating real-time aerodynamic moments. The advantages of the proposed method are shown in the following simulation cases: (i) feedback pressure and wall shear data from a distributed array of 45 airflow sensors; (ii) 50% failure of the symmetrically distributed airflow sensor array; and (iii) failure of all the airflow sensors on one wing. It is shown that even if 50% of the airflow sensors have failures, the aircraft is still stable and able to track the attitude commands. (paper)

  10. Sliding mode fault detection and fault-tolerant control of smart dampers in semi-active control of building structures

    Yeganeh Fallah, Arash; Taghikhany, Touraj

    2015-12-01

    Recent decades have witnessed much interest in the application of active and semi-active control strategies for seismic protection of civil infrastructures. However, the reliability of these systems is still in doubt as there remains the possibility of malfunctioning of their critical components (i.e. actuators and sensors) during an earthquake. This paper focuses on the application of the sliding mode method due to the inherent robustness of its fault detection observer and fault-tolerant control. The robust sliding mode observer estimates the state of the system and reconstructs the actuators’ faults which are used for calculating a fault distribution matrix. Then the fault-tolerant sliding mode controller reconfigures itself by the fault distribution matrix and accommodates the fault effect on the system. Numerical simulation of a three-story structure with magneto-rheological dampers demonstrates the effectiveness of the proposed fault-tolerant control system. It was shown that the fault-tolerant control system maintains the performance of the structure at an acceptable level in the post-fault case.

  11. Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

    Smith, T. B., Jr.; Lala, J. H.

    1983-01-01

    The basic organization of the fault tolerant multiprocessor, (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting is employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.

  12. Superior model for fault tolerance computation in designing nano-sized circuit systems

    Singh, N. S. S., E-mail: narinderjit@petronas.com.my; Muthuvalu, M. S., E-mail: msmuthuvalu@gmail.com [Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak (Malaysia); Asirvadam, V. S., E-mail: vijanth-sagayan@petronas.com.my [Electrical and Electronics Engineering Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak (Malaysia)

    2014-10-24

    As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity build ups with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed-up the task of reliability analysis for very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores into a comparative study involving reliability computation and evaluation by PGM and, BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lesser reliability measure by BDEC is well explained in this paper using distribution of different signal input patterns overtime for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but it also depends on circuit topology, probability of input signals being one or zero and also probability of error on signal lines.

  13. Superior model for fault tolerance computation in designing nano-sized circuit systems

    As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity build ups with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed-up the task of reliability analysis for very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores into a comparative study involving reliability computation and evaluation by PGM and, BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lesser reliability measure by BDEC is well explained in this paper using distribution of different signal input patterns overtime for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but it also depends on circuit topology, probability of input signals being one or zero and also probability of error on signal lines

  14. Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing

    Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.

    2011-01-01

    Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware s environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of Point-to-Point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.

  15. Fault tolerant microcomputer based alarm annunciator for Dhruva reactor

    The Dhruva alarm annunciator displays the status of 624 alarm points on an array of display windows using the standard ringback sequence. Recognizing the need for a very high availability, the system is implemented as a fault tolerant configuration. The annunciator is partitioned into three identical units; each unit is implemented using two microcomputers wired in a hot standby mode. In the event of one computer malfunctioning, the standby computer takes over control in a bouncefree transfer. The use of microprocessors has helped built-in flexibility in the system. The system also provides built-in capability to resolve the sequence of occurrence of events and conveys this information to another system for display on a CRT. This report describes the system features, fault tolerant organisation used and the hardware and software developed for the annunciation function. (author). 8 figs

  16. Algorithm-dependent fault tolerance for distributed computing

    P. D. Hough; M. e. Goldsby; E. J. Walsh

    2000-02-01

    Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and diversity of parts, these systems are prone to failures. Applications that are being run on these systems have not been equipped to efficiently deal with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers make use of checkpoints to allow for restarting of their applications, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways in which to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, thus allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments that are comprised of clusters of commodity computers such as CPlant and SMP clusters.

  17. Diagnosis and Fault-tolerant Control, 3rd Edition

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Staroswiecki, Marcel

    The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process...

  18. H∞ Fault Tolerant Control of WECS Based on the PWA Model

    Yun-Tao Shi

    2014-03-01

    Full Text Available The main contribution of this paper is the development of H∞ fault tolerant control for a wind energy conversion system (WECS based on the stochastic piecewise affine (PWA model. In this paper the normal and fault stochastic PWA models for WECS including multiple working points at different wind speeds are established. A reliable piecewise linear quadratic regulator state feedback is designed for the fault tolerant actuator and sensor. A sufficient condition for the existence of the passive fault tolerant controller is derived based on some linear matrix inequalities (LMIs. It is shown that the H∞ fault tolerant controller of WECS can control the wind turbine exposed to multiple simultaneous sensor faults or actuator faults; that is, the reliability of wind turbines can be improved.

  19. COMPREHENSIVE EVALUATION OF FAULT-TOLERANT PROPERTIES OF REDUNDANT ROBOTS

    ZHAO Jing; FENG Dengdian

    2008-01-01

    When a redundant robot performs a fault-tolerant operation for locked joint failures, its fault tolerant properties should include dexterity and sudden change of joint velocity at the moment of locking failed joints and the dexterity during the post-failure. Firstly three fault-tolerant indexes, reduced condition number, sudden change of relative joint velocity and centrality are proposed, which can comprehensively evaluate the kinematical performance of a redundant robot during its entire fault-tolerant operations. Then, the influence of the initial postures of robot's end-effector on these fault-tolerant indexes is analyzed with a planar robot and a spatial robot. Simulation results show that for a given task the joint trajectory with the best comprehensive effect of fault tolerance can be determined by optimizing the initial posture of a robot.

  20. Real-time and fault tolerance in distributed control software

    Orlic, B.; Broenink, J. F.

    2003-01-01

    Closed loop control systems typically contain multitude of spatially distributed sensors and actuators operated simultaneously. So those systems are parallel and distributed in their essence. But mapping this parallelism onto the given distributed hardware architecture, brings in some additional requirements: safe multithreading, optimal process allocation, real-time scheduling of bus and network resources. Nowadays, fault tolerance methods and fast even online reconfiguration are becoming in...

  1. Fault tolerance in Hadoop MapReduce implementation

    Cogorno, Matías; Rey, Javier; Nesmachnow, Sergio

    2013-01-01

    This document reports the advances on exploring and understanding the fault tolerance mechanisms in Hadoop MapReduce. A description of the current fault tolerance features existing in Hadoop is provided, along with a review of related works on the topic. Finally, the document describes some relevant proposals about fault tolerance worth considering to implement in Hadoop within the PERMARE project in order to provide support for pervasive computing environments.

  2. Design study of Software-Implemented Fault-Tolerance (SIFT) computer

    Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

    1982-01-01

    Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.

  3. Model Prediction-Based Approach to Fault Tolerant Control with Applications

    Mahmoud, Professor Magdi S.; Khalid, Dr. Haris M.

    2013-01-01

    Abstract— Fault-tolerant control (FTC) is an integral component in industrial processes as it enables the system to continue robust operation under some conditions. In this paper, an FTC scheme is proposed for interconnected systems within an integrated design framework to yield a timely monitoring and detection of fault and reconfiguring the controller according to those faults. The unscented Kalman filter (UKF)-based fault detection and diagnosis system is initially run on the main plant an...

  4. A Blueprint for a Topologically Fault-tolerant Quantum Computer

    Bonderson, Parsa; Freedman, Michael; Nayak, Chetan

    2010-01-01

    The advancement of information processing into the realm of quantum mechanics promises a transcendence in computational power that will enable problems to be solved which are completely beyond the known abilities of any "classical" computer, including any potential non-quantum technologies the future may bring. However, the fragility of quantum states poses a challenging obstacle for realization of a fault-tolerant quantum computer. The topological approach to quantum computation proposes to surmount this obstacle by using special physical systems -- non-Abelian topologically ordered phases of matter -- that would provide intrinsic fault-tolerance at the hardware level. The so-called "Ising-type" non-Abelian topological order is likely to be physically realized in a number of systems, but it can only provide a universal gate set (a requisite for quantum computation) if one has the ability to perform certain dynamical topology-changing operations on the system. Until now, practical methods of implementing thes...

  5. Robust and Fault-Tolerant Linear Parameter-Varying Control of Wind Turbines

    Sloth, Christoffer; Esbensen, Thomas; Stoustrup, Jakob

    2011-01-01

    , designed using a proposed method that allows the inclusion of both faults and uncertainties in the LPV controller design. We specifically consider a 4.8 MW, variable-speed, variable-pitch wind turbine model with a fault in the pitch system. We propose the design of a nominal controller (NC), handling the...... parameter variations along the nominal operating trajectory caused by nonlinear aerodynamics. To accommodate the fault in the pitch system, an active fault-tolerant controller (AFTC) and a passive fault-tolerant controller (PFTC) are designed. In addition to the nominal LPV controller, we also propose a...

  6. Fault-tolerant distributed mass storage for LHC computing

    Wiebalck, A; Lindenstruth, V; Stinbeck, T M

    2003-01-01

    In this paper we present the concept and first prototyping results of a modular fault-tolerant distributed mass storage architecture for large Linux PC clusters as they are deployed by the upcoming particle physics experiments. The device masquerading technique using an Enhanced Network Block Device (ENBD) enables local RAID over remote disks as the key concept of the ClusterRAID system. The block level interface to remote files, partitions or disks provided by the ENBD makes it possible to use the standard Linux software RAID to add fault-tolerance to the system. Preliminary performance measurements indicate that the latency is comparable to a local hard drive. With four disks throughput rates of up to 55MB/s were achieved with first prototypes for a RAIDO setup, and about 40M/s for a RAID5 setup. (29 refs).

  7. Beam dynamics calculations for fault-tolerance

    The European Transmutation Demonstration requires a high-power proton accelerator operating in CW mode. This accelerator is also expected to have a very limited number of unexpected beam interruptions per year. To reach such an ambitious goal, it is clear that reliability-oriented design practices need to be followed from the early stage of components design and fault-tolerance capabilities have to be introduced to the maximum extent. The goal of this document is precisely to investigate in more details the fault-tolerance capability of the XT-ADS linac. From previous analysis, it appears that if nothing is done, a cavity's failure leads in nearly all the cases to a complete beam loss, due to the non-relativistic varying velocity of the particles. To avoid such a total beam loss, it is clear that some kind of retuning has to be performed to compensate the lack of acceleration due to the faulty cavity. We have to identify and develop fast failure recovery scenarios to ensure that such retuning can be performed in less than 1 second. 2 ways are investigated. The first way is to stop the beam to achieve the retuning (Scenario 1). The other way is to try to perform the retuning without stopping the beam (Scenario 2). The present analysis demonstrates on the beam dynamics point of view that a fast retuning procedure can be envisaged without stopping the beam (Scenario 2). Nevertheless, this Scenario 2 implies stringent specifications, especially on: - the fault detection time, that has to be extremely short (order of magnitude: 100 μs) and - the margins required on the accelerating field and RF power point of view, that are higher than in Scenario 1

  8. Learning Fault-tolerant Speech Parsing with SCREEN

    Wermter, Stefan; Weber, Volker

    1994-01-01

    This paper describes a new approach and a system SCREEN for fault-tolerant speech parsing. SCREEEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech related errors. T...

  9. Optimal Configuration of Fault-Tolerance Parameters for Distributed Server Access

    Daidone, Alessandro; Renier, Thibault; Bondavalli, Andrea;

    2013-01-01

    Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as e...... replicated server architectures. In order to obtain insight into the system behaviour, a set of relevant environment parameters and controllable fault-tolerance parameters are chosen and the dependability/performance trade-off is evaluated....

  10. H∞ Fault Tolerant Control of WECS Based on the PWA Model

    Yun-Tao Shi; Qi Kou; De-Hui Sun; Zheng-Xi Li; Shu-Juan Qiao; Yan-Jiao Hou

    2014-01-01

    The main contribution of this paper is the development of H∞ fault tolerant control for a wind energy conversion system (WECS) based on the stochastic piecewise affine (PWA) model. In this paper the normal and fault stochastic PWA models for WECS including multiple working points at different wind speeds are established. A reliable piecewise linear quadratic regulator state feedback is designed for the fault tolerant actuator and sensor. A sufficient condition for the existence of the passiv...

  11. On the Transition Improvement of EV or HEV Induction Motor Propulsion Sensor Fault-Tolerant Controller

    Tabbache, Bekheira; Benbouzid, Mohamed; Kheloui, Abdelaziz

    2010-01-01

    International audience This technical paper deals with the transition performance improvement of a sensor fault-tolerant controller devoted to Electric (EV) or Hybrid Electric Vehicles (HEV). Indeed, improvements are brought over a previously developed technique that exhibit abrupt changes in the torque if a sensor fault is detected and after a transition from a control technique to another one [1]. The Fault-Tolerant Control (FTC) system firstly concerns the sliding mode control technique...

  12. Fault tolerant control with torque limitation based on fault mode for ten-phase permanent magnet synchronous motor

    Guo Hong

    2015-10-01

    Full Text Available This paper proposes a novel fault tolerant control with torque limitation based on the fault mode for the ten-phase permanent magnet synchronous motor (PMSM under various open-circuit and short-circuit fault conditions, which includes the optimal torque control and the torque limitation control based on the fault mode. The optimal torque control is adopted to guarantee the ripple-free electromagnetic torque operation for the ten-phase motor system under the post-fault condition. Furthermore, we systematically analyze the load capacity of the ten-phase motor system under different fault modes. And a torque limitation control approach based on the fault mode is proposed, which was not available earlier. This approach is able to ensure the safety operation of the faulted motor system in long operating time without causing the overheat fault. The simulation result confirms that the proposed fault tolerant control for the ten-phase motor system is able to guarantee the ripple-free electromagnetic torque and the safety operation in long operating time under the normal and fault conditions.

  13. FAIL-MPI: How fault-tolerant is fault-tolerant MPI ?

    Hérault, Thomas; Hoarau, William; Lemarinier, Pierre; Rodriguez, Eric; Tixeuil, Sébastien

    2006-01-01

    One of the topics of paramount importance in the development of Cluster and Grid middleware is the impact of faults since their occurrence probability in a Grid infrastructure and in large-scale distributed system is actually very high. MPI (Message Passing Interface) is a popular abstraction for programming distributed computation applications. FAIL is an abstract language for fault occurrence description capable of expressing complex and realistic fault scenarios. In this paper, we investig...

  14. Fault-tolerant quantum computation and communication on a distributed 2D array of small local systems

    Fujii, K.; Yamamoto, T.; Imoto, N. [Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531 (Japan); Koashi, M. [Photon Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8656 (Japan)

    2014-12-04

    We propose a scheme for distributed quantum computation with small local systems connected via noisy quantum channels. We show that the proposed scheme tolerates errors with probabilities ∼30% and ∼ 0.1% in quantum channels and local operations, respectively, both of which are improved substantially compared to the previous works.

  15. Fault Tolerance in ZigBee Wireless Sensor Networks

    Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

    2011-01-01

    Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

  16. Fault-tolerance techniques for SRAM-based FPGAs

    Kastensmidt, Fernanda Lima; Reis, Ricardo

    2006-01-01

    Fault-tolerance in integrated circuits is no longer the exclusive concern of space designers or highly-reliable applications engineers. Today, designers of many next-generation products must cope with reduced margin noises. The continuous evolution of fabrication technology of semiconductor components – shrinking transistor geometry, power supply, speed, and logic density – has significantly reduced the reliability of very deep submicron integrated circuits, in face of various internal and external sources of noise. Field Programmable Gate Arrays (FPGAs), customizable by SRAM cells, are the latest advance in the integrated circuit evolution: millions of memory cells to implement the logic, embedded memories, routing, and embedded microprocessors cores. These re-programmable systems-on-chip platforms must be fault-tolerant to cope with current requirements.

  17. Runtime Instrumentation of SystemC/TLM2 Interfaces for Fault Tolerance Requirements Verification in Software Cosimulation

    Antonio da Silva

    2014-01-01

    Full Text Available This paper presents the design of a SystemC transaction level modelling wrapping library that can be used for the assertion of system properties, protocol compliance, or fault injection. The library uses C++ virtual table hooks as a dynamic binary instrumentation technique to inline wrappers in the TLM2 transaction path. This technique can be applied after the elaboration phase and needs neither source code modifications nor recompilation of the top level SystemC modules. The proposed technique has been successfully applied to the robustness verification of the on-board boot software of the Instrument Control Unit of the Solar Orbiter’s Energetic Particle Detector.

  18. Fault-tolerant and Diagnostic Methods for Navigation

    Blanke, Mogens

    2003-01-01

    Precise and reliable navigation is crucial, and for reasons of safety, essential navigation instruments are often duplicated. Hardware redundancy is mostly used to manually switch between instruments should faults occur. In contrast, diagnostic methods are available that can use analytic redundancy...... to diagnose faults and autonomously provide valid navigation data, disregarding any faulty sensor data and use sensor fusion to obtain a best estimate for users. This paper discusses how diagnostic and fault-tolerant methods are applicable in marine systems. An example chosen is sensor fusion for...... navigation. Diagnosis design is based on parity relations and statistical hypothesis tests. Sensor fusion on healthy signals is made using a Kalman filter with inverse covariance updating to deal with asynchronous or missing data from instruments. The paper is presented at a tutorial level....

  19. FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

    Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

    1978-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.

  20. Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor

    Boukhnifer, Moussa

    2012-07-01

    Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronics devices working on extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for a fault tolerant control using Youla parameterization for an ultrasonic piezoelectric motor. The distinguished feature of proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. A fault tolerant control architecture includes two parts: one part for performance and the other part for robustness. The controller design works in such a way that the feedback control system will be solely controlled by the proportional plus double-integral PI2 performance controller for a nominal model without disturbances and H∞ robustification controller will only be activated in the presence of the uncertainties or an external disturbances. The simulation results demonstrate the effectiveness of the proposed fault tolerant control architecture.

  1. Analysis of a cascaded multilevel inverter with fault-tolerant control

    Jesús Aguayo Alquicira

    2011-08-01

    Full Text Available Cascaded multilevel inverters are widely used in industry for speed control of induction motors and, even when the converters’ operation is highly reliable, several faults can occur, leading to poor engine performance or even causing the whole system to stop. It is desirable to keep the system operational when a failure occurs, even when degraded, and implementing fault-tolerant systems are thus a good choice. This paper presents a general strategy for fault-tolerant control in a 7-level cascaded multilevel inverter (the faults are in semiconductor devices; the paper includes simulation and experimental results to validate the method.

  2. Fault-tolerant and efficient parallel computation. Doctoral thesis

    Shvartsman, A.A.

    1992-05-01

    Recent advances in computer technology made parallel machines a reality. Massively parallel systems use many general-purpose, inexpensive processing elements to attain computation speed-ups comparable to or better than those achieved by expensive, specialized machines with a small number of fast processors. In such setting, however, one would expect to see an increased number of processor failures attributable to hardware or software. This may eliminate the potential advantage of parallel computation. We believe that this presents a reliability bottleneck that is among fundamental problems in parallel computation. We investigate algorithmic ways of introducing fault-tolerance in multiprocessors under the constraint of preserving efficiency. This research demonstrates how in certain models of parallel computation it is possible to combine efficiency and fault-tolerance. We show that in the models we study, it is possible to develop efficient parallel algorithms without concern for fault-tolerance, and then correctly and efficiently execute these algorithms on parallel machines whose processors are subject to arbitrary dynamic failstop errors. By ensuring efficient executions for any patterns of failures, the efficiency is also maintained when failures are infrequent, or when the expected number of failures is small.

  3. On the Practicality of `Practical' Byzantine Fault Tolerance

    Chondros, Nikos; Roussopoulos, Mema

    2011-01-01

    Byzantine Fault Tolerant (BFT) systems are considered by the systems research community to be state of the art with regards to providing reliability in distributed systems. BFT systems provide safety and liveness guarantees with reasonable assumptions, amongst a set of nodes where at most f nodes display arbitrarily incorrect behaviors, known as Byzantine faults. Despite this, BFT systems are still rarely used in practice. In this paper we describe our experience, from an application developer's perspective, trying to leverage the publicly available and highly-tuned PBFT middleware (by Castro and Liskov), to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. We describe several obstacles we encountered and drawbacks we identified in the PBFT approach. These include some that we tackled, such as lack of support for dynamic client management and leaving state management completely up to the application. Others still remaining include the lack of...

  4. Fault diagnosis and fault-tolerant control and guidance for aerospace vehicles from theory to application

    Zolghadri, Ali; Cieslak, Jerome; Efimov, Denis; Goupil, Philippe

    2014-01-01

    Fault Diagnosis and Fault-Tolerant Control and Guidance for Aerospace demonstrates the attractive potential of recent developments in control for resolving such issues as improved flight performance, self-protection and extended life of structures. Importantly, the text deals with a number of practically significant considerations: tuning, complexity of design, real-time capability, evaluation of worst-case performance, robustness in harsh environments, and extensibility when development or adaptation is required. Coverage of such issues helps to draw the advanced concepts arising from academic research back towards the technological concerns of industry. Initial coverage of basic definitions and ideas and a literature review gives way to a treatment of important electrical flight control system failures: the oscillatory failure case, runaway, and jamming. Advanced fault detection and diagnosis for linear and nonlinear systems are described. Lastly recovery strategies appropriate to remaining acuator/sensor/c...

  5. Guaranteed Cost Fault-tolerant Control of Networked Control Systems with Short Output Delay and Short Control Delay Based on State Observer

    Xiaomao Huang

    2013-04-01

    Full Text Available Supposing that the sensor and controller nodes were time-driven and the actuator node was event-driven, the problem of integrity against sensor failures for the networked control systems with short output delay and short control delay was discussed based on observer. The state observer of the system according to the time-delay compensation strategy was designed. Then, considering possible sensor failures, an augmented mathematic model for the networked control systems based on observer was developed. In terms of the given quadratic performance index function, the integrity condition of the system was given and the designs for guaranteed cost fault-tolerant controller and observer were presented respectively by using the cooperative design approach of the controller and observer and the approach of bilinear matrix inequalities. Finally, a numerical simulation example demonstrated the conclusions are feasible and effective. The proposed control method meets the requirements in industrial networked control systems.

  6. Architecting Fault Tolerance with Exception Handling: Verification and Validation

    Patrick H. S. Brito; Rogério de Lemos; Cecília M. F. Rubira; Eliane Martins

    2009-01-01

    When building dependable systems by integrating untrusted software components that were not originally designed to interact with each other, it is likely the occurrence of architectural mismatches related to assumptions in their failure behaviour. These mismatches, if not prevented during system design, have to be tolerated during runtime. This paper presents an architectural abstraction based on exception handling for structuring fault-tolerant software systems.This abstraction comprises several components and connectors that promote an existing untrusted software element into an idealised fault-tolerant architectural element. Moreover, it is considered in the context of a rigorous software development approach based on formal methods for representing the structure and behaviour of the software architecture. The proposed approach relies on a formal specification and verification for analysing exception propagation, and verifying important dependability properties, such as deadlock freedom, and scenarios of architectural reconfiguration. The formal models are automatically generated using model transformation from UML diagrams: component diagram representing the system structure, and sequence diagrams representing the system behaviour. Finally, the formal models are also used for generating unit and integration test cases that are used for assessing the correctness of the source code. The feasibility of the proposed architectural approach was evaluated on an embedded critical case study.

  7. Observer-based Fault Detection and Isolation for Nonlinear Systems

    Lootsma, T.F.

    -tolerance can be applied to ordinary industrial processes that are not categorized as high risk applications, but where high availability is desirable. The quality of fault-tolerant control is totally dependent on the quality of the underlying algorithms. They detect possible faults, and later reconfigure......With the rise in automation the increase in fault detectionand isolation & reconfiguration is inevitable. Interest in fault detection and isolation (FDI) for nonlinear systems has grown significantly in recent years. The design of FDI is motivated by the need for knowledge about occurring faults in...... fault-tolerant control systems (FTC systems). The idea of FTC systems is to detect, isolate, and handle faults in such a way that the systems can still perform in a required manner. One prefers reduced performance after occurrence of a fault to the shut down of (sub-) systems. Hence, the idea of fault...

  8. Observer-based Fault Detection and Isolation for Nonlinear Systems

    Lootsma, T.F.

    2001-01-01

    With the rise in automation the increase in fault detectionand isolation & reconfiguration is inevitable. Interest in fault detection and isolation (FDI) for nonlinear systems has grown significantly in recent years. The design of FDI is motivated by the need for knowledge about occurring faults in fault-tolerant control systems (FTC systems). The idea of FTC systems is to detect, isolate, and handle faults in such a way that the systems can still perform in a required manner. One prefers...

  9. Hybrid system modeling and fault diagnosis of fault tolerant inverter%新型容错逆变器的混杂系统建模与故障诊断

    李宁; 李颖晖; 朱喜华; 雷洪利; 俞佳

    2012-01-01

    考虑到传统开关函数模型只对逆变电路的控制变迁进行分析而无法描述电路的条件变迁,容易丢失电路在条件变迁过程中所表现出的故障信息,从而影响故障诊断的实时性和准确性.电力电子电路是典型的混杂系统,文章建立了一种新型容错逆变器的混杂系统模型,在此基础上提出了故障事件辨识向量的概念,并将其应用于逆变器故障的分析,仅通过故障事件辨识向量对逆变电路变迁过程中的故障事件进行辨识来完成故障诊断,对于电路变迁过程中的正常事件辨识不予考虑,减少了控制系统的数据处理量,提高了故障诊断的实时性和可靠性,实验验证了所提方法的可行性和有效性.%The traditional switch function model of inverters could analyze the control change of inverters, but could not be used to describe conditions change. The fault information contained in conditions change may be lost, and the real-time performance and accuracy of fault diagnosis will be reduced,Power electronic circuits were typical hybrid system. The paper established the hybrid system model for a fault-tolerant inverter, and then put forward the concept of incidents identification vector,which was applied to analyze the fault of inverter. It finished the fault diagnosis only through the identification of fault events by incidents identification vector, which could reduce the data capacity of control system, and the real time performance and reliability of fault diagnosis were improved- The simulation and experiment results show the feasibility and validity of the proposed method.

  10. Fault Detection and Isolation and Fault Tolerant Control of Wind Turbines Using Set-Valued Observers

    Casau, Pedro; Rosa, Paulo Andre Nobre; Tabatabaeipour, Seyed Mojtaba;

    2012-01-01

    account process disturbances, uncertainty and sensor noise. The FTC strategy takes advantage of the proposed FDI algorithm, enabling the controller reconfiguration shortly after fault events. Additionally, a robust controller is designed so as to increase the wind turbine's performance during low severity......Research on wind turbine Operations & Maintenance (O&M) procedures is critical to the expansion of Wind Energy Conversion systems (WEC). In order to reduce O&M costs and increase the lifespan of the turbine, we study the application of Set-Valued Observers (SVO) to the problem of Fault Detection...... and Isolation (FDI) and Fault Tolerant Control (FTC) of wind turbines, by taking advantage of the recent advances in SVO theory for model invalidation. A simple wind turbine model is presented along with possible faulty scenarios. The FDI algorithm is built on top of the described model, taking into...

  11. Multiple Fault Isolation in Redundant Systems

    Shakeri, M.; Pattipati, Krishna R.; Raghavan, V.; Patterson-Hine, Ann; Iverson, David L.

    1997-01-01

    We consider the problem of sequencing tests to isolate multiple faults in redundant (fault-tolerant) systems with minimum expected testing cost (time). It can be shown that single faults and minimal faults, i.e., minimum number of failures with a failure signature different from the union of failure signatures of individual failures, together with their failure signatures, constitute the necessary information for fault diagnosis in redundant systems. In this paper, we develop an algorithm to find all the minimal faults and their failure signatures. Then, we extend the Sure diagnostic strategies [1] of our previous work to diagnose multiple faults in redundant systems. The proposed algorithms and strategies are illustrated using several examples.

  12. A Fault-Tolerant Scheme of Holonomic Quantum Computation on Stabilizer Codes with Robustness to Low-weight Thermal Noise

    Zheng, Yi-Cong; Brun, Todd A.

    2013-01-01

    We show an equivalence relation between fault-tolerant circuits for a stabilizer code and fault-tolerant adiabatic processes for holonomic quantum computation (HQC), in the case where quantum information is encoded in the degenerated ground space of the system Hamiltonian. By this equivalence, we can systematically construct a fault-tolerant HQC scheme, which can geometrically implement a universal set of encoded quantum gates by adiabatically deforming the system Hamiltonian. During this pro...

  13. Hybrid fault tolerance techniques to detect transient faults in embedded processors

    Azambuja, José Rodrigo; Becker, Jürgen

    2014-01-01

    This book describes fault tolerance techniques based on software and hardware to create hybrid techniques. They are able to reduce overall performance degradation and increase error detection when associated with applications implemented in embedded processors. Coverage begins with an extensive discussion of the current state-of-the-art in fault tolerance techniques. The authors then discuss the best trade-off between software-based and hardware-based techniques and introduce novel hybrid techniques. Proposed techniques increase existing fault detection rates up to 100%, while maintaining low performance overheads in area and application execution time. • Discusses the effects of radiation on modern integrated circuits; • Provides a comprehensive overview of state-of-the art fault tolerance techniques based on software, hardware, and hybrid techniques; • Introduces novel hybrid fault tolerance techniques for reconfigurable FPGAs and ASICs; • Performs fault injection campaigns by simulation, bitstream ...

  14. Fusion of Built in Test (BIT) Technologies with Embeddable Fault Tolerant Techniques for Power System and Drives in Space Exploration Project

    National Aeronautics and Space Administration — Impact Technologies has proposed development of an effective prognostic and fault accommodation system for critical DC power systems including PV systems. Overall...

  15. A New Fault-tolerant Switched Reluctance Motor with reliable fault detection capability

    Lu, Kaiyuan

    2014-01-01

    For reliable fault detection, often, search coils are used in many fault-tolerant drives. The search coils occupy extra slot space. They are normally open-circuited and are not used for torque production. This degrades the motor performance, increases the cost and manufacture complexity. A new...... Fault-Tolerant Switched Reluctance (FTSR) motor is proposed in this paper. A unique feature of this special design is that it allows use of the unexcited phase coils as search coils for fault detection. Therefore this new motor has all the advantages of using search coils for reliable fault detection...

  16. Adaptive Fault Tolerant Execution of Multi-Robot Missions using Behavior Trees

    Colledanchise, Michele; Marzinotto, Alejandro; Dimarogonas, Dimos V.; Ögren, Petter

    2015-01-01

    Multi-robot teams offer possibilities of improved performance and fault tolerance, compared to single robot solutions. In this paper, we show how to realize those possibilities when starting from a single robot system controlled by a Behavior Tree (BT). By extending the single robot BT to a multi-robot BT, we are able to combine the fault tolerant properties of the BT, in terms of built-in fallbacks, with the fault tolerance inherent in multi-robot approaches, in terms of a faulty robot being...

  17. Design of Fault Tolerant Network Interfaces for NoCs

    Fiorin, Leandro; Micconi, Laura; Sami, Mariagiovanna

    2011-01-01

    Networks-on-Chip (NoCs) appeared as a strategy to deal with the communication requirements of complex IP-based System-on-Chips. As the complexity of designs increases and the technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the NoC components...... increases. This paper focuses on the study and evaluation of techniques for increasing reliability and resilience of Network Interfaces (NIs). NIs act as interfaces between IP cores and the communication infrastructure; a faulty behavior in them could affect therefore the overall system. In this work, we...... propose a functional fault model for the NI components, and we present a two-level fault tolerant solution that can be employed for mitigating the effects of both single-event upset soft errors and hard errors on the NI. Experiments show that with a limited overhead we can obtain a significant reliability...

  18. A Framework-Based Approach for Fault-Tolerant Service Robots

    Heejune Ahn

    2012-11-01

    Full Text Available Recently the component‐based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component‐based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with the fault tolerance function, we argue that the framework‐based fault tolerance approach is feasible and even has many benefits, including that: 1 the system integrators can build fault tolerance applications from non‐fault‐aware components; 2 the constraints of the components and the operating environment can be considered at the time of integration, which ‐ cannot be anticipated eaily at the time of component development; 3 consistency in system reliability can be obtained even in spite of diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault‐tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real‐time performance of the framework‐based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

  19. Local fault-tolerant quantum computation

    We analyze and study the effects of locality on the fault-tolerance threshold for quantum computation. We analytically estimate how the threshold will depend on a scale parameter r which characterizes the scale-up in the size of the circuit due to encoding. We carry out a detailed seminumerical threshold analysis for concatenated coding using the seven-qubit CSS code in the local and the 'nonlocal' setting. First, we find that the threshold in the local model for the [7,1,3] code has a 1/r dependence, which is in correspondence with our analytical estimate. Second, the threshold, beyond the 1/r dependence, does not depend too strongly on the noise levels for transporting qubits. Beyond these results, we find that it is important to look at more than one level of concatenation in order to estimate the threshold and that it may be beneficial in certain places, like in the transportation of qubits, to do error correction only infrequently

  20. A novel fault tolerant permanent magnet synchronous motor with improved optimal torque control for aerospace application

    Guo Hong

    2015-04-01

    Full Text Available Improving fault tolerant performance of permanent magnet synchronous motor has always been the central issue of the electrically supplied actuator for aerospace application. In this paper, a novel fault tolerant permanent magnet synchronous motor is proposed, which is characterized by two stators and two rotors on the same shaft with a circumferential displacement of mechanical angle of 4.5°. It helps to reduce the cogging torque. Each segment of the stator and the rotor can be considered as an 8-pole/10-slot five-phase permanent magnet synchronous motor with concentrated, single-layer and alternate teeth wound winding, which enhance the fault isolation capacity of the motor. Furthermore, the motor has high phase inductance to restrain the short-circuit current. In addition, an improved optimal torque control strategy is proposed to make the motor work well under the open-circuit fault and short-circuit fault conditions. Simulation and experiment results show that the proposed fault tolerant motor system has excellent fault tolerant capacity, which is able to operate continuously under the third open-circuit fault and second short-circuit fault condition without system performance degradation, which was not available earlier.

  1. Reliability and fault tolerance in the European ADS project

    Biarrotte, Jean-Luc

    2013-01-01

    After an introduction to the theory of reliability, this paper focuses on a description of the linear proton accelerator proposed for the European ADS demonstration project. Design issues are discussed and examples of cases of fault tolerance are given.

  2. Planning the deployment of fault-tolerant wireless sensor networks

    Sitanayah, Lanny

    2013-01-01

    Since Wireless Sensor Networks (WSNs) are subject to failures, fault-tolerance becomes an important requirement for many WSN applications. Fault-tolerance can be enabled in different areas of WSN design and operation, including the Medium Access Control (MAC) layer and the initial topology design. To be robust to failures, a MAC protocol must be able to adapt to traffic fluctuations and topology dynamics. We design ER-MAC that can switch from energy-efficient operation in norma...

  3. Design of passive fault-tolerant controllers of a quadrotor based on sliding mode theory

    Merheb Abdel-Razzak

    2015-09-01

    Full Text Available Abstract In this paper, sliding mode control is used to develop two passive fault tolerant controllers for an AscTec Pelican UAV quadrotor. In the first approach, a regular sliding mode controller (SMC augmented with an integrator uses the robustness property of variable structure control to tolerate partial actuator faults. The second approach is a cascaded sliding mode controller with an inner and outer SMC loops. In this configuration, faults are tolerated in the fast inner loop controlling the velocity system. Tuning the controllers to find the optimal values of the sliding mode controller gains is made using the ecological systems algorithm (ESA, a biologically inspired stochastic search algorithm based on the natural equilibrium of animal species. The controllers are tested using SIMULINK in the presence of two different types of actuator faults, partial loss of motor power affecting all the motors at once, and partial loss of motor speed. Results of the quadrotor following a continuous path demonstrated the effectiveness of the controllers, which are able to tolerate a significant number of actuator faults despite the lack of hardware redundancy in the quadrotor system. Tuning the controller using a faulty system improves further its ability to afford more severe faults. Simulation results show that passive schemes reserve their important role in fault tolerant control and are complementary to active techniques

  4. Passive Fault tolerant Control of an Inverted Double Pendulum

    Niemann, H.; Stoustrup, Jakob

    2003-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the Youla parameterization, which requires the nominal controller to be...

  5. Fault-tolerant Sensor Fusion for Marine Navigation

    Blanke, Mogens

    2006-01-01

    Reliability of navigation data are critical for steering and manoeuvring control, and in particular so at high speed or in critical phases of a mission. Should faults occur, faulty instruments need be autonomously isolated and faulty information discarded. This paper designs a navigation solution...... events where the fault-tolerant sensor fusion provided uninterrupted navigation data despite temporal instrument defects...

  6. Optimal torque control of fault-tolerant permanent magnet brushloss machines

    Wang, J B; Atallah, K.; Howe, D

    2003-01-01

    Describes a novel optimal torque control strategy for fault-tolerant permanent magnet brushless ac drives operating in both constant torque and constant power modes. The proposed control strategy enables ripple-free torque operation to be achieved while minimizing the copper loss under voltage and current constraints. The utility of the proposed strategy is demonstrated by computer simulations on a five-phase fault-tolerant drive system.

  7. SEIF: Secure and Efficient Intrusion Fault tolerant protocol for Wireless Sensor Networks

    Ouadjaout, Abdelraouf; Challal, Yacine; Lasla, Noureddine; Bagaa, Mouloud

    2008-01-01

    In wireless sensor networks, reliability represents a design goal of a primary concern. To build a comprehensive reliable system, it is essential to consider node failures and intruder attacks as unavoidable phenomena. In this paper, we present a new intrusion-fault tolerant routing scheme offering a high level of reliability through a secure multi-path communication topology. Unlike existing intrusion-fault tolerant solutions, our protocol is based on a distributed and in-network verificatio...

  8. A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites

    This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device

  9. A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites

    Sandness, G.A.; Bennett, D.W. [Pacific Northwest Lab., Richland, WA (United States); Martinson, L. [Westinghouse Idaho Nuclear Co., Inc., Idaho Falls, ID (United States); Bingham, D.N.; Anderson, A.A. [EG and G Idaho, Inc., Idaho Falls, ID (United States)

    1992-08-01

    This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device.

  10. Robust Adaptive Fault-Tolerant Tracking Control of Three-Phase Induction Motor

    Hossein Tohidi; Koksal Erenturk

    2014-01-01

    This paper deals with the problem of induction motor tracking control against actuator faults and external disturbances using the linear matrix inequalities (LMIs) method and the adaptive method. A direct adaptive fault-tolerant tracking controller design method is developed based on Lyapunov stability theory and a constructive algorithm based on linear matrix inequalities for online tuning of adaptive and state feedback gains to stabilize the closed-loop system in order to reduce the fault e...

  11. Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements

    Zhao, Bo; Blanke, Mogens; Skjetne, Roger

    2012-01-01

    This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modied p...

  12. Design of passive fault-tolerant flight controller against actuator failures

    Yu Xiang

    2015-02-01

    Full Text Available The problem of designing passive fault-tolerant flight controller is addressed when the normal and faulty cases are prescribed. First of all, the considered fault and fault-free cases are formed by polytopes. As considering that the safety of a post-fault system is directly related to the maximum values of physical variables in the system, peak-to-peak gain is selected to represent the relationships among the amplitudes of actuator outputs, system outputs, and reference commands. Based on the parameter dependent Lyapunov and slack methods, the passive fault-tolerant flight controllers in the absence/presence of system uncertainty for actuator failure cases are designed, respectively. Case studies of an airplane under actuator failures are carried out to validate the effectiveness of the proposed approach.

  13. Compilation and Synthesis for Fault-Tolerant Digital Microfluidic Biochips

    Alistar, Mirela

    decides the introduction of the redundancy required for fault-tolerance. We consider both time redundancy, i.e., re-executing erroneous operations, and space redundancy, i.e., creating redundant droplets for fault-tolerance. Error recovery is performed such that the number of transient faults tolerated is...... routing of the operations in the application. During the execution of a bioassay, operations could experience transient faults, thus impacting negatively the correctness of the application. We have proposed both offline (design time) and online (runtime) recovery strategies. The online recovery strategy......-regular application-specific architectures are common in practice. Hence, we have proposed an approach to the synthesis of application-specific architectures, such that the cost is minimized and the timing constraints of the application are satisfied. We propose an algorithm to build a library of non-regular modules...

  14. Dynamic Output Feedback Based Active Decentralized Fault-Tolerant Control for Reconfigurable Manipulator with Concurrent Failures

    Yuanchun Li

    2015-01-01

    Full Text Available The goal of this paper is to describe an active decentralized fault-tolerant control (ADFTC strategy based on dynamic output feedback for reconfigurable manipulators with concurrent actuator and sensor failures. Consider each joint module of the reconfigurable manipulator as a subsystem, and treat the fault as the unknown input of the subsystem. Firstly, by virtue of linear matrix inequality (LMI technique, the decentralized proportional-integral observer (DPIO is designed to estimate and compensate the sensor fault online; hereafter, the compensated system model could be derived. Then, the actuator fault is estimated similarly by another DPIO using LMI as well, and the sufficient condition of the existence of H∞ fault-tolerant controller in the dynamic output feedback is presented for the compensated system model. Furthermore, the dynamic output feedback controller is presented based on the estimation of actuator fault to realize active fault-tolerant control. Finally, two 3-DOF reconfigurable manipulators with different configurations are employed to verify the effectiveness of the proposed scheme in simulation. The main advantages of the proposed scheme lie in that it can handle the concurrent faults act on the actuator and sensor on the same joint module, as well as there is no requirement of fault detection and isolation process; moreover, it is more feasible to the modularity of the reconfigurable manipulator.

  15. Fault tolerant wind speed estimator used in wind turbine controllers

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2012-01-01

    Advanced control schemes can be used to optimize energy production and cost of energy in modern wind turbines. These control schemes most often rely on wind speed estimations. These designs of wind speed estimators are, however, not designed to be fault tolerant towards faults in the used sensors...... applying the proposed wind speed estimator to a simulation model of a wind turbine. Notice that since the faults are accommodated in the observer scheme the actual controller do not need to be adjusted or reconfigured to accommodate the sensor faults....

  16. Hardware and software fault tolerance - A unified architectural approach

    Lala, Jaynarayan H.; Alger, Linda S.

    1988-01-01

    The loss of hardware fault tolerance which often arises when design diversity is used to improve the fault tolerance of computer software is considered analytically, and a unified design approach is proposed to avoid the problem. The fundamental theory of fault-tolerant (FT) architectures is reviewed; the current status of design-diversity software development is surveyed; and the FT-processor/attached-processor (FTP/AP) architecture developed by Lala et al. (1986) is described in detail and illustrated with diagrams. FTP/AP is shown to permit efficient implementation of N-version FT software while still tolerating random hardware failures with very high coverage; the reliability is found to be significantly higher than that of conventional majority-vote N-version software.

  17. Improvement of Matrix Converter Drive Reliability by Online Fault Detection and a Fault-Tolerant Switching Strategy

    Nguyen-Duy, Khiem; Liu, Tian-Hua; Chen, Der-Fa

    2011-01-01

    The matrix converter system is becoming a very promising candidate to replace the conventional two-stage ac/dc/ac converter, but system reliability remains an open issue. The most common reliability problem is that a bidirectional switch has an open-switch fault during operation. In this paper, a...... matrix converter driving a speed-controlled permanent-magnet synchronous motor is examined under a single open-switch fault. First, a new fault-detection method is proposed using only the motor currents. Second, a novel fault-tolerant switching strategy is presented. By treating the matrix converter as a...... two-stage rectifier/inverter, existing modulation techniques for the inverter stage can be reused, whereas the rectifier stage is modified by control to counteract the fault. However, the proposed techniques require no additional hardware devices or circuit modifications to the matrix converter...

  18. Design of neuro fuzzy fault tolerant control using an adaptive observer

    New methodologies and concepts are developed in the control theory to meet the ever-increasing demands in industrial applications. Fault detection and diagnosis of technical processes have become important in the course of progressive automation in the operation of groups of electric drives. When a group of electric drives is under operation, fault tolerant control becomes complicated. For multiple motors in operation, fault detection and diagnosis might prove to be difficult. Estimation of all states and parameters of all drives is necessary to analyze the actuator and sensor faults. To maintain system reliability, detection and isolation of failures should be performed quickly and accurately, and hardware should be properly integrated. Luenberger full order observer can be used for estimation of the entire states in the system for the detection of actuator and sensor failures. Due to the insensitivity of the Luenberger observer to the system parameter variations, state estimation becomes inaccurate under the varying parameter conditions of the drives. Consequently, the estimation performance deteriorates, resulting in ordinary state observers unsuitable for fault detection technique. Therefore an adaptive observe, which can estimate the system states and parameter and detect the faults simultaneously, is designed in our paper. For a Group of D C drives, there may be parameter variations for some of the drives, and for other drives, there may not be parameter variations depending on load torque, friction, etc. So, estimation of all states and parameters of all drives is carried out using an adaptive observer. If there is any deviation with the estimated values, it is understood that fault has occurred and the nature of the fault, whether sensor fault or actuator fault, is determined by neural fuzzy network, and fault tolerant control is reconfigured. Experimental results with neuro fuzzy system using adaptive observer-based fault tolerant control are good, so as

  19. Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

    Panda, Dhabaleswar Kumar [The Ohio State University; Beckman, Pete

    2011-07-28

    With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of

  20. Adaptive Fault Tolerance for Many-Core Based Space-Borne Computing

    James, Mark; Springer, Paul; Zima, Hans

    2010-01-01

    This paper describes an approach to providing software fault tolerance for future deep-space robotic NASA missions, which will require a high degree of autonomy supported by an enhanced on-board computational capability. Such systems have become possible as a result of the emerging many-core technology, which is expected to offer 1024-core chips by 2015. We discuss the challenges and opportunities of this new technology, focusing on introspection-based adaptive fault tolerance that takes into account the specific requirements of applications, guided by a fault model. Introspection supports runtime monitoring of the program execution with the goal of identifying, locating, and analyzing errors. Fault tolerance assertions for the introspection system can be provided by the user, domain-specific knowledge, or via the results of static or dynamic program analysis. This work is part of an on-going project at the Jet Propulsion Laboratory in Pasadena, California.

  1. Fault-Tolerant Control Strategy for Steering Failures in Wheeled Planetary Rovers

    Alexandre Carvalho Leite

    2012-01-01

    Full Text Available Fault-tolerant control design of wheeled planetary rovers is described. This paper covers all steps of the design process, from modeling/simulation to experimentation. A simplified contact model is used with a multibody simulation model and tuned to fit the experimental data. The nominal mode controller is designed to be stable and has its parameters optimized to improve tracking performance and cope with physical boundaries and actuator saturations. This controller was implemented in the real rover and validated experimentally. An impact analysis defines the repertory of faults to be handled. Failures in steering joints are chosen as fault modes; they combined six fault modes and a total of 63 possible configurations of these faults. The fault-tolerant controller is designed as a two-step procedure to provide alternative steering and reuse the nominal controller in a way that resembles a crab-like driving mode. Three fault modes are injected (one, two, and three failed steering joints in the real rover to evaluate the response of the nonreconfigured and reconfigured control systems in face of these faults. The experimental results justify our proposed fault-tolerant controller very satisfactorily. Additional concluding comments and an outlook summarize the lessons learned during the whole design process and foresee the next steps of the research.

  2. Superconducting generator field winding design for high fault tolerance

    Development of rotating electrical machines with superconducting field windings is proceeding at numerous sites worldwide. The primary emphasis is on large turbine generators for application to power systems. The EPRI/Westinghouse 300 MVA superconducting generator program is directed towards demonstration of the technology in an actual utility environment for a long period of time. The concept of stability, in the case of superconducting generators, includes the traditional concepts of stability with respect to the electromechanical interactions and oscillations of the machine with the power system as well as the thermohydraulic stability of the cryogenic rotor and its helium supply system. Power system disturbances, such as faults, produce flow and pressure transients in the rotor cooling system. Depending upon the severity and time history of the disturbances, these transients may occasion normalization of the superconductor and destabilize the generator output through loss of field excitation. This paper addresses the question of designing the superconducting winding and its cryogenic cooling system for stability in the presence of large disturbances, a capability which has been called high fault tolerance

  3. Fault detection and isolation in systems with parametric faults

    Stoustrup, Jakob; Niemann, Hans Henrik

    1999-01-01

    The problem of fault detection and isolation of parametric faults is considered in this paper. A fault detection problem based on parametric faults are associated with internal parameter variations in the dynamical system. A fault detection and isolation method for parametric faults is formulated...

  4. Fault-tolerant computer architecture based on INMOS transputer processor

    Ortiz, Jorge L.

    1987-01-01

    Redundant processing was used for several years in mission flight systems. In these systems, more than one processor performs the same task at the same time but only one processor is actually in real use. A fault-tolerance computer architecture based on the features provided by INMOS Transputers is presented. The Transputer architecture provides several communication links that allow data and command communication with other Transputers without the use of a bus. Additionally the Transputer allows the use of parallel processing to increase the system speed considerably. The processor architecture consists of three processors working in parallel keeping all the processors at the same operational level but only one processor is in real control of the process. The design allows each Transputer to perform a test to the other two Transputers and report the operating condition of the neighboring processors. A graphic display was developed to facilitate the identification of any problem by the user.

  5. MCNP load balancing and fault tolerance with PVM

    Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP developed by Los Alamos National Laboratory supports distributed-memory multiprocessing through the parallel virtual machine (PVM) software package, version 3.1.4. Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceed 80% on dedicated workstation clusters; however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance, which are shown to dramatically increase multiuser system efficiency and reliability

  6. MCNP load balancing and fault tolerance with PVM

    Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceeded 80% on dedicated workstation clusters, however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance which are shown to dramatically increase multiuser system efficiency and reliability

  7. Fault Tolerance Assistant (FTA): An Exception Handling Programming Model for MPI Applications

    Fang, Aiman [Univ. of Chicago, IL (United States). Dept. of Computer Science; Laguna, Ignacio [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Sato, Kento [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Islam, Tanzima [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Mohror, Kathryn [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-05-23

    Future high-performance computing systems may face frequent failures with their rapid increase in scale and complexity. Resilience to faults has become a major challenge for large-scale applications running on supercomputers, which demands fault tolerance support for prevalent MPI applications. Among failure scenarios, process failures are one of the most severe issues as they usually lead to termination of applications. However, the widely used MPI implementations do not provide mechanisms for fault tolerance. We propose FTA-MPI (Fault Tolerance Assistant MPI), a programming model that provides support for failure detection, failure notification and recovery. Specifically, FTA-MPI exploits a try/catch model that enables failure localization and transparent recovery of process failures in MPI applications. We demonstrate FTA-MPI with synthetic applications and a molecular dynamics code CoMD, and show that FTA-MPI provides high programmability for users and enables convenient and flexible recovery of process failures.

  8. Fault tolerant strategies for automated operation of nuclear reactors

    This paper introduces an automatic control system incorporating a number of verification, validation, and command generation tasks with-in a fault-tolerant architecture. The integrated system utilizes recent methods of artificial intelligence such as neural networks and fuzzy logic control. Furthermore, advanced signal processing and nonlinear control methods are also included in the design. The primary goal is to create an on-line capability to validate signals, analyze plant performance, and verify the consistency of commands before control decisions are finalized. The application of this approach to the automated startup of the Experimental Breeder Reactor-II (EBR-II) is performed using a validated nonlinear model. The simulation results show that the advanced concepts have the potential to improve plant availability andsafety

  9. Fault tolerant vector control of induction motor drive

    Odnokopylov, G.; Bragin, A.

    2014-10-01

    For electric composed of technical objects hazardous industries, such as nuclear, military, chemical, etc. an urgent task is to increase their resiliency and survivability. The construction principle of vector control system fault-tolerant asynchronous electric. Displaying recovery efficiency three-phase induction motor drive in emergency mode using two-phase vector control system. The process of formation of a simulation model of the asynchronous electric unbalance in emergency mode. When modeling used coordinate transformation, providing emergency operation electric unbalance work. The results of modeling transient phase loss motor stator. During a power failure phase induction motor cannot save circular rotating field in the air gap of the motor and ensure the restoration of its efficiency at rated torque and speed.

  10. Buffered coscheduling for parallel programming and enhanced fault tolerance

    Petrini, Fabrizio; Feng, Wu-chun

    2006-01-31

    A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval is accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors

  11. Fault tolerance in space-based digital signal processing and switching systems: Protecting up-link processing resources, demultiplexer, demodulator, and decoder

    Redinbo, Robert

    1994-01-01

    Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit.

  12. Fault tolerance in space-based digital signal processing and switching systems: Protecting up-link processing resources, demultiplexer, demodulator, and decoder

    Redinbo, Robert

    1994-09-01

    Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit.

  13. The Kaleidoscope switch-a new concept for implementation of a large and fault tolerant ATM switch system

    Dittmann, Lars

    This paper describes a new concept for implementing a large switch network based on smaller modules. The concept is based an an alternative self-routing structure that due to a point symmetry allows the bit in the routing tag to be processed in a random order. Among others this property provides ...... inherent fault protection and allows a simple implementation of broadcast and multicast. The concept has been implemented as a small prototype, that currently is used in a national experimental ATM network in Denmark...

  14. Secure & Fault Tolerance Handoff in Vanet Using Special Mobile Agent

    Shivani Jain; HimanshuTyagi; Charu Gupta

    2013-01-01

    Vehicular Traffic poses an emerging issue nowadays. The critical factors for the data communication are speed and time tradeoffs. For data communication, gathering and retrieving information many cost-effective and tested techniques are required in VANET. Client server architectures being coercive are commonly used in spite of having drawbacks of fault and time in-effectiveness. This paper elaborates a proposed method in VANET for fault tolerance information retrieval based on ...

  15. H Fault Tolerant Control of WECS Based on the PWA Model

    Yun-Tao Shi; Qi Kou; De-Hui Sun; Zheng-Xi Li; Shu-Juan Qiao; Yan-Jiao Hou

    2014-01-01

    The main contribution of this paper is the development of H∞ fault tolerant control for a wind energy conversion system (WECS) based on the stochastic piecewise affine (PWA) model. In this paper the normal and fault stochastic PWA models for WECS including multiple working points at different wind speeds are established. A reliable piecewise linear quadratic regulator state feedback is designed for the fault tolerant actuator and sensor. A sufficient condition for the existence of the passive...

  16. Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting

    Dunham, J. R. (Editor); Knight, J. C. (Editor)

    1982-01-01

    The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in format verification or proof methods was recommended. Formal specification and software reliability modeling, were recommended as topics for both short and long term research.

  17. Fault-tolerant three-level inverter

    Edwards, John; Xu, Longya; Bhargava, Brij B.

    2006-12-05

    A method for driving a neutral point clamped three-level inverter is provided. In one exemplary embodiment, DC current is received at a neutral point-clamped three-level inverter. The inverter has a plurality of nodes including first, second and third output nodes. The inverter also has a plurality of switches. Faults are checked for in the inverter and predetermined switches are automatically activated responsive to a detected fault such that three-phase electrical power is provided at the output nodes.

  18. A Fault Tolerant Resource Allocation Architecture for Mobile Grid

    P. T. Vanathi

    2012-01-01

    Full Text Available Problem statement: In order to achieve high level of reliability and availability, the grid infrastructure should be fault tolerant. Since the failure of resources affects job execution fatally, fault tolerance service is essential to satisfy QoS requirement in grid computing with respect to mobile nodes. Approach: We propose a fault tolerant technique for improving reliability in mobile grid environment considering the node mobility. The Cluster head and monitoring agent was designed in such a way it addresses both resource and network failure and present recovery techniques for overcoming the faults. Results: The proposed model achieves a identifiable performance when compared to the previous model (HRAA. By simulation results, we analyze the node and link failures on parameters such as delivery ratio, throughput and delay against the rate of success. Conclusion: The proposed fault tolerant approach checks for availability of the nodes with least work load for transferring the executed job to cluster head providing an alternate path in case of failure thereby enhancing the reliability of the grid environment.

  19. Single-Shot Fault-Tolerant Quantum Error Correction

    Bombín, Héctor

    2015-07-01

    Conventional quantum error correcting codes require multiple rounds of measurements to detect errors with enough confidence in fault-tolerant scenarios. Here, I show that for suitable topological codes, a single round of local measurements is enough. This feature is generic and is related to self-correction and confinement phenomena in the corresponding quantum Hamiltonian model. Three-dimensional gauge color codes exhibit this single-shot feature, which also applies to initialization and gauge fixing. Assuming the time for efficient classical computations to be negligible, this yields a topological fault-tolerant quantum computing scheme where all elementary logical operations can be performed in constant time.

  20. A Novel Nanometric Fault Tolerant Reversible Subtractor Circuit

    Mozhgan Shiri

    2012-11-01

    Full Text Available Reversibility plays an important role when energy efficient computations are considered. Reversible logic circuits have received significant attention in quantum computing, low power CMOS design, optical information processing and nanotechnology in the recent years. This study proposes a new fault tolerant reversible half-subtractor and a new fault tolerant reversible full-subtractor circuit with nanometric scales. Also in this paper we demonstrate how the well-known and important, PERES gate and TR gate can be synthesized from parity preserving reversible gates. All the designs have nanometric scales.

  1. Fault Tolerant, Radiation Hard DSP Project

    National Aeronautics and Space Administration — We propose to develop a radiation tolerant/hardened signal processing node, which effectively utilizes state-of-the-art commercial semiconductors plus our...

  2. Energy Bounds for Fault-Tolerant Nanoscale Designs

    Marculescu, Diana

    2011-01-01

    The problem of determining lower bounds for the energy cost of a given nanoscale design is addressed via a complexity theory-based approach. This paper provides a theoretical framework that is able to assess the trade-offs existing in nanoscale designs between the amount of redundancy needed for a given level of resilience to errors and the associated energy cost. Circuit size, logic depth and error resilience are analyzed and brought together in a theoretical framework that can be seamlessly integrated with automated synthesis tools and can guide the design process of nanoscale systems comprised of failure prone devices. The impact of redundancy addition on the switching energy and its relationship with leakage energy is modeled in detail. Results show that 99% error resilience is possible for fault-tolerant designs, but at the expense of at least 40% more energy if individual gates fail independently with probability of 1%.

  3. The SIFT computer and its development. [Software Implemented Fault Tolerance for aircraft control

    Goldberg, J.

    1981-01-01

    Software Implemented Fault Tolerance (SIFT) is an aircraft control computer designed to allow failure probability of less than 10 to the -10th/hour. The system is based on advanced fault-tolerance computing and validation methodology. Since confirmation of reliability by observation is essentially impossible, system reliability is estimated by a Markov model. A mathematical proof is used to justify the validity of the Markov model. System design is represented by a hierarchy of abstract models, and the design proof comprises mathematical proofs that each model is, in fact, an elaboration of the next more abstract model.

  4. Analysis of GPS Abnormal Conditions within Fault Tolerant Control Laws

    Al-Sinbol, Gahssan

    The Global Position System (GPS) is a critical element for the functionality of autonomous flying vehicles. The GPS operation at normal and abnormal conditions directly impacts the trajectory tracking performance of the autonomous Unmanned Aerial Vehicles (UAVs) controllers. The effects of GPS parameter variation must be well understood and user-friendly computational tools must be developed to facilitate the design and evaluation of fault tolerant control laws. This thesis presents the development of a simplified GPS error model in Matlab/Simulink and its use performing a sensitivity analysis of GPS parameters effect under system normal and abnormal operation on different UAV trajectory tracking controllers. The model statistically generates position and velocity errors, simulates the effect of GPS satellite configuration on the position and velocity measurement accuracy, and implements a set of failures to the GPS readings. The model and its graphical user interface was integrated within the WVU UAV simulation environment as a masked Simulink block. The effects on the controllers' trajectory tracking performance of the following GPS parameters were investigated within normal operation ranges and outside: time delay, update rate, error standard deviation, bias, and major position and velocity failures. Several sets of control laws with fixed and adaptive parameters and of different levels of complexity have been used in this investigation. A complex performance index formulated in terms of tracking errors and control activity was used for control laws performance evaluation. The composition of various metrics within the performance index was performed using fixed and variable weights depending on the local characteristics of the commanded trajectory. This study has revealed that GPS error parameters have a significant impact on control laws performance. The proposed GPS model has proved to be a valuable, flexible tool for testing and evaluation of the fault

  5. Active fault-tolerant control strategy of large civil aircraft under elevator failures

    Wang Xingjian

    2015-12-01

    Full Text Available Aircraft longitudinal control is the most important actuation system and its failures would lead to catastrophic accident of aircraft. This paper proposes an active fault-tolerant control (AFTC strategy for civil aircraft with different numbers of faulty elevators. In order to improve the fault-tolerant flight control system performance and effective utilization of the control surface, trimmable horizontal stabilizer (THS is considered to generate the extra pitch moment. A suitable switching mechanism with performance improvement coefficient is proposed to determine when it is worthwhile to utilize THS. Furthermore, AFTC strategy is detailed by using model following technique and the proposed THS switching mechanism. The basic fault-tolerant controller is designed to guarantee longitudinal control system stability and acceptable performance degradation under partial elevators failure. The proposed AFTC is applied to Boeing 747-200 numerical model and simulation results validate the effectiveness of the proposed AFTC approach.

  6. FAULT TOLERANCE USING CREDENTIALS MANAGEMENT IN ONLINE TRANSACTION APPLICATION

    L. Javid Ali

    2014-07-01

    Full Text Available Web applications play a vital role in the IT field for satisfying the web customer. The customer always depends on the online transaction processing system. The web application has various forms which gives a complete service to the customer. These various forms have options that are used to satisfy the customer’s needs because of the attraction over web sites existing in the global market. The traditional web pages will be closed from the current session whenever the customer selects an improper option because of single sign-on property. Selection of wrong option that is not suitable for the current session will lead to reliability problem. If the same user needs the same service, again he has to navigate from home page to the required page, thus adding up extra burden on customer. The customer session should be maintained properly, so that the customer’s satisfaction is retained over the online web application. The existing system classifies the user with their access level and also their fault level. The main objective of the proposed work is to manage the credential in all levels in order to keep the valuable customer for a long time of access in the current session. The credential management and session management are used to manage a multilevel credential from web client to web resource level and vice versa. The options selected by the customer can be classified based on the fault and type of access. The credential management also performs the maintenance process for fixing the fault tolerance level to the web user. A complete log is recorded to trace the overall process in the online transaction processing.

  7. Study on Software Fault Injection Based on Onboard System

    PENGJunjie; HONGBingrong; YUANChengjun; LIAiguo; WEIZhenhua; QIAOYongqiang

    2005-01-01

    Fault injection techniques are the effective methods to evaluate the dependability and validate the fault tolerance mechanisms of computer systems. Among the different fault injection techniques, software implemented fault injection technique is regarded as one of the most promising technique for evaluation of the dependability of computer systems. In this paper, combined the advantages of software fault injection and the particularity of onboard system, a new software fault injection model, which can be used to evaluate the dependability and validate the fault tolerance mechanisms of the onboard system, is put forward. To evaluate the dependability of on boardsystem effectively, the application algorithm on how to use the model is presented. The experimental results show that using the fault injection model and algorithm put forward in this paper, not only most of low-level faults such as processor register faults, memory faults and so on can be injected, but also some high-level faults such as code faults, branch faults etc. can be injected, which can be used to evaluate the dependability of the onboard systems.

  8. A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism

    Yao, Erlin; Wang, Rui; Zhang, Wenli; Tan, Guangming

    2011-01-01

    Fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. HPC applications typically tolerate fail-stop failures by checkpointing. Another promising method is in the algorithm level, called algorithmic recovery. These two methods can achieve high efficiency when the system scale is not very large, but will both lose their effectiveness when systems approach the scale of Exaflops, where the number of processors including in system is expected to achieve one million. This paper develops a new and efficient algorithm-based fault tolerance scheme for HPC applications. When failure occurs during the execution, we do not stop to wait for the recovery of corrupted data, but replace them with the corresponding redundant data and continue the execution. A background accelerated recovery method is also proposed to rebuild redundancy to tolerate multiple times of failures during the execution. To demonstrate the feasibility ...

  9. Microprocessor-based fault-tolerant nuclear turbine governor

    A new microprocessor-based fault-tolerant nuclear turbine governor has been developed. Hierarchically distributed configuration and asynchronous triplicated architecture with middle value voting logic maximizes the plant availability. Problem-oriented language is provided for design ease and program maintainability. The turbine governor with these features is described with test results

  10. Critique of Fault-Tolerant Quantum Information Processing

    Alicki, Robert

    2013-01-01

    This is a chapter in a book \\emph{Quantum Error Correction} edited by D. A. Lidar and T. A. Brun, and published by Cambridge University Press (2013)\\\\ (http://www.cambridge.org/us/academic/subjects/physics/quantum-physics-quantum-information-and-quantum-computation/quantum-error-correction)\\\\ presenting the author's view on feasibility of fault-tolerant quantum information processing.

  11. Fault tolerant computer for nuclear power plant applications

    A quadruply redundant synchronous fault tolerant processor (FTP) is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. The hardware architecture of this processor is described and certain issues unique to quadruply redundant computers are discussed

  12. Fault-tolerant quantum computing with color codes

    Landahl, Andrew J; Rice, Patrick R

    2011-01-01

    We present and analyze protocols for fault-tolerant quantum computing using color codes. We present circuit-level schemes for extracting the error syndrome of these codes fault-tolerantly. We further present an integer-program-based decoding algorithm for identifying the most likely error given the syndrome. We simulated our syndrome extraction and decoding algorithms against three physically-motivated noise models using Monte Carlo methods, and used the simulations to estimate the corresponding accuracy thresholds for fault-tolerant quantum error correction. We also used a self-avoiding walk analysis to lower-bound the accuracy threshold for two of these noise models. We present and analyze two architectures for fault-tolerantly computing with these codes: one with 2D arrays of qubits are stacked atop each other and one in a single 2D substrate. Our analysis demonstrates that color codes perform slightly better than Kitaev's surface codes when circuit details are ignored. When these details are considered, w...

  13. Nonlinear and fault-tolerant flight control using multivariate splines

    Tol, H.J.; De Visser, C.C.; Van Kampen, E.J.; Chu, Q.P.

    2015-01-01

    This paper presents a study on fault tolerant flight control of a high performance aircraft using multivariate splines. The controller is implemented by making use of spline model based adaptive nonlinear dynamic inversion (NDI). This method, indicated as SANDI, combines NDI control with nonlinear c

  14. Fault Tolerant Congestion based Algorithms in OBS Network

    Hardeep Singh

    2011-12-01

    Full Text Available In Optical Burst Switched networks, each light path carry huge amount of traffic, path failures may damage the user application. Hence fault-tolerance becomes an important issue on these networks. Blocking probability is a key index of quality of service in Optical Burst Switched (OBS network. The Erlang formula has been used extensively in the traffic engineering of optical communication to calculate the blocking probability. The paper revisits burst contention resolution problems in OBS networks. When the network is overloaded, no contention resolution scheme would effectively avoid the collision and cause blocking. It is important to first decide, a good routing algorithm and then to choose a wavelength assignment scheme. In this paper we have developed two algorithms, Fault Tolerant Optimized Blocking Algorithm (FTOBA and Fault Tolerant Least Congestion Algorithm (FTLCA and then compare the performance of these algorithms on the basis of blocking probability. These algorithms are based upon the congestion on path in OBS network and based on the simulation results, we shows that the reliable and fault tolerant routing algorithms reduces the blocking probability.

  15. Fault-model development for fault-tolerant VLSI design. Final technical report, May 1986-May 1987

    Hartmann, C.R.; Lala, P.K.; Ali, A.M.; Visweswaran, G.S.; Ganguly, S.

    1988-05-01

    Fault models provide systematic and precise representations of physical defects in microcircuits in a form suitable for simulation and test generation. The current difficulty in testing VLSI circuits can be attributed to the tremendous increase in design complexity and the inappropriateness of traditional stuck-at fault models. This report develops fault models for three different types of common defects that are not accurately represented by the stuck-at fault model. The faults examined in this report are: bridging faults, transistor stuck-open faults, and transient faults caused by alpha-particle radiation. A generalized fault model could not be developed for the three fault types. However, microcircuit behavior and fault-detection strategies are described for the bridging, transistor stuck-open, and transient (alpha-particle strike) faults. The results of this study can be applied to the simulation and analysis of faults in fault-tolerant VLSI circuits.

  16. Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder

    Islam, Md Saiful; 10.3329/jbas.v32i2.2431

    2010-01-01

    Reversible logic is emerging as an important research area having its application in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It allows any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used to design other arithmetic logic circuits for which it is used as the fundamental building block. It has also been demonstrated that the proposed design offers less hardware complexity and is efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.

  17. Final Project Report. Scalable fault tolerance runtime technology for petascale computers

    Krishnamoorthy, Sriram [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Sadayappan, P [Ohio State Univ., Columbus, OH (United States)

    2015-06-16

    With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work in fault tolerant MPI with a number of strategies and extensions for fault tolerance proposed, virtually none of advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to petascale level is needed. This project has evaluated range of runtime techniques for fault tolerance for advanced programming models.

  18. Multi-fault Tolerance for Cartesian Data Distributions

    Ali, Nawab; Krishnamoorthy, Sriram; Halappanavar, Mahantesh; Daily, Jeffrey A.

    2013-06-01

    Faults are expected to play an increasingly important role in how algorithms and applications are designed to run on future extreme-scale sys- tems. Algorithm-based fault tolerance (ABFT) is a promising approach that involves modications to the algorithm to recover from faults with lower over- heads than replicated storage and a signicant reduction in lost work compared to checkpoint-restart techniques. Fault-tolerant linear algebra (FTLA) algo- rithms employ additional processors that store parities along the dimensions of a matrix to tolerate multiple, simultaneous faults. Existing approaches as- sume regular data distributions (blocked or block-cyclic) with the failures of each data block being independent. To match the characteristics of failures on parallel computers, we extend these approaches to mapping parity blocks in several important ways. First, we handle parity computation for generalized Cartesian data distributions with each processor holding arbitrary subsets of blocks in a Cartesian-distributed array. Second, techniques to handle corre- lated failures, i.e., multiple processors that can be expected to fail together, are presented. Third, we handle the colocation of parity blocks with the data blocks and do not require them to be on additional processors. Several al- ternative approaches, based on graph matching, are presented that attempt to balance the memory overhead on processors while guaranteeing the same fault tolerance properties as existing approaches that assume independent fail- ures on regular blocked data distributions. The evaluation of these algorithms demonstrates that the additional desirable properties are provided by the pro- posed approach with minimal overhead.

  19. Data-based fault-tolerant model predictive controller an application to a complex dearomatization process

    Kettunen, Markus

    2010-01-01

    The tightening global competition during the last few decades has been the driving force for the optimisation of industrial plant operations through the use of advanced control methods, such as model predictive control (MPC). As the occurrence of faults in the process measurements and actuators has become more common due to the increase in the complexity of the control systems, the need for fault-tolerant control (FTC) to prevent the degradation of the controller performance, and therefore th...

  20. Combining dynamical decoupling with fault-tolerant quantum computation

    Ng, Hui Khoon; Preskill, John

    2009-01-01

    We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise, and have a lower overhead cost, than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer, and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

  1. Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation

    Dolev, Danny; Lenzen, Christoph; Schmid, Ulrich

    2011-01-01

    Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt striving to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will allow to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse syn...

  2. Beam Dynamics Studies for the Fault Tolerance Assessment of the PDS-XADS Linac Design

    In order to meet the high availability/reliability required by the PDS-XADS design, the accelerator needs to implement to the maximum possible extent a fault tolerance strategy that would allow beam operation in the presence of most of the envisaged faults that could occur in its beam line components. In this work, we report the results of beam dynamics simulations performed to characterize the effects of the faults of the main linac components (cavities and focusing magnets) on the beam parameters. The outcome of this activity is the definition of the possible corrective actions that could be conceived (and implemented in the system) in order to guarantee the fault tolerance characteristics of the accelerator. This work has been supported by the PDS-XADS program, funded by the EU 5th Framework Program under contract FIKW-CT-2001-00179

  3. Fault detection, isolation and reconfiguration in FTMP Methods and experimental results. [fault tolerant multiprocessor

    Lala, J. H.

    1983-01-01

    The Fault-Tolerant Multiprocessor (FTMP) is a highly reliable computer designed to meet a goal of 10 to the -10th failures per hour and built with the objective of flying an active-control transport aircraft. Fault detection, identification, and recovery software is described, and experimental results obtained by injecting faults in the pin level in the FTMP are presented. Over 21,000 faults were injected in the CPU, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one LRU of the multiprocessor. Detection, isolation, and reconfiguration times were recorded for each fault, and the results were found to agree well with earlier assumptions made in reliability modeling.

  4. Passive fault tolerant control of a double inverted pendulum - a case study

    Niemann, Hans Henrik; Stoustrup, Jakob

    2005-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the YJBK parameterization, which requires the nominal controller to be impl...... implemented in observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop....

  5. Active fault tolerant control research for nuclear power plant based on BP neural network

    In view of the sensor fault of nuclear power plant, the sensor was trained by adopting improved back propagation (BP) neural network method, and the dynamic model bank in different states was set up. The system was detected by using BP neural network in real time. When the sensor goes wrong, it will be controlled by reconstruction. Taking pressurizer as the case, a simulation experiment was performed on the nuclear power plant simulator. The results show that the proposed method is valid for the fault tolerant control of sensor faults in nuclear power plant. (authors)

  6. Active Fault Isolation in MIMO Systems

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2014-01-01

    isolation is based directly on the input/output s ignals applied for the fault detection. It is guaranteed that the fault group includes the fault that had occurred in the system. The second step is individual fault isolation in the fault group . Both types of isolation are obtained by applying dedicated......Active fault isolation of parametric faults in closed-loop MIMO system s are considered in this paper. The fault isolation consists of two steps. T he first step is group- wise fault isolation. Here, a group of faults is isolated from other pos sible faults in the system. The group-wise fault...

  7. Short Survey on Design for Fault Tolerance of Computer Systems%计算机系统容错设计简述

    鄢贵海; 李晓维

    2013-01-01

    Highly reliable computer systems are the foundation of QoS (Quality of Services) of IT services. Since the birth of ENIAC, the ifrst electronic computer in history, reliability has become one of the major challenges in computer design. Fault tolerance serves as a major approach to high reliability. It is also a systematic science crossing multiple logical layers of the classical computing stacks. The design opportunity comes from the bottom device layer to the much higher application layer. Each logical layer faces speciifc design challenges. Following a bottom-up style, we brielfy survey these classical approaches in design for reliability.%高可靠计算机系统是是保证信息服务质量的基石。从第一台计算机ENIAC诞生起,可靠性就是计算机系统面临的主要挑战之一,容错设计是实现可靠性的有效途径,也是一项典型的跨计算机多个设计层次的系统科学。从底层的器件到顶层的应用程序,都存在优化可靠性的设计空间,每个层次的设计面向特定的可靠性设计挑战。文章将遵循自底向上的逻辑层次简述这些经典的设计方法。

  8. Fuzzy fault diagnostic system based on fault tree analysis

    Yang, Zong Xiao; Suzuki, Kazuhiko; Shimada, Yukiyasu; Sayama, Hayatoshi

    1995-01-01

    A method is presented for process fault diagnosis using information from fault tree analysis and uncertainty/imprecision of data. Fault tree analysis, which has been used as a method of system reliability/safety analysis, provides a procedure for identifying failures within a process. A fuzzy fault diagnostic system is constructed which uses the fuzzy fault tree analysis to represent a knowledge of the causal relationships in process operation and control system. The proposed method is applie...

  9. A fault-tolerant voltage measurement method for series connected battery packs

    Xia, Bing; Mi, Chris

    2016-03-01

    This paper proposes a fault-tolerant voltage measurement method for battery management systems. Instead of measuring the voltage of individual cells, the proposed method measures the voltage sum of multiple battery cells without additional voltage sensors. A matrix interpretation is developed to demonstrate the viability of the proposed sensor topology to distinguish between sensor faults and cell faults. A methodology is introduced to isolate sensor and cell faults by locating abnormal signals. A measurement electronic circuit is proposed to implement the design concept. Simulation and experiment results support the mathematical analysis and validate the feasibility and robustness of the proposed method. In addition, the measurement problem is generalized and the condition for valid sensor topology is discovered. The tuning of design parameters are analyzed based on fault detection reliability and noise levels.

  10. Fault Tolerant Control Using Proportional-Integral-Derivative Controller Tuned by Genetic Algorithm

    S. Kanthalakshmi

    2011-01-01

    Full Text Available Problem statement: The growing demand for reliability, maintainability and survivability in industrial processes has drawn significant research in fault detection and fault tolerant control domain. A fault is usually defined as an unexpected change in a system, such as component malfunction and variations in operating condition, which tends to degrade the overall system performance. The purpose of fault detection is to detect these malfunctions to take proper action in order to prevent faults from developing into a total system failure. Approach: In this study an effective integrated fault detection and fault tolerant control scheme was developed for a class of LTI system. The scheme was based on a Kalman filter for simultaneous state and fault parameter estimation, statistical decisions for fault detection and activation of controller reconfiguration. Proportional-Integral-Derivative (PID control schemes continue to provide the simplest and yet effective solutions to most of the control engineering applications today. Determination or tuning of the PID parameters continues to be important as these parameters have a great influence on the stability and performance of the control system. In this study GA was proposed to tune the PID controller. Results: The results reflect that proposed scheme improves the performance of the process in terms of time domain specifications, robustness to parametric changes and optimum stability. Also, A comparison with the conventional Ziegler-Nichols method proves the superiority of GA based system. Conclusion: This study demonstrates the effectiveness of genetic algorithm in tuning of a PID controller with optimum parameters. It is, moreover, proved to be robust to the variations in plant dynamic characteristics and disturbances assuring a parameter-insensitive operation of the process.

  11. High Performance Modeling of Intelligent Pattern Recognition with Enhanced Fault-Tolerance in Real Time

    Renukaradhya P.C

    2014-03-01

    Full Text Available Designing an ANN which could recognize the learned patterns even if there is variation in applied test patterns from learned patterns. A mechanism has been developed which provided the recognition facility intelligently. Recognition of patterns can be broadly categorized into two classes. When precision of recognition is not defined, term name “Forced recognition” given to the process. When precision of recognition is properly defined termed “Custom recognition” given to process. Analysis of fault tolerant property of feed forward architecture will be given training with back propagation method. Under this, analysis of effect of initially selected random weights and what should be the nature of random weights so that to maximize the fault tolerance capability of system has done. Analysis can be done with two different distribution namely Gaussian distribution and Uniform distribution. Effect of faults at output is also a function of fault position in ANN system like Hidden layer weight, Output layer weights, with processing elements at hidden layer. Analysis capability of back propagation algorithm itself is to tolerate the fault by learning process. A development of test mechanism to check faulty system in coming future is ANN system in hardware world i.e. on the VLSI chip. Once the architecture implemented it is required a mechanism to check the functioning. Analysis of internal parameters of ANN is completely research work with behavior of internal parameters, which will provide all responsible factors behind success of an ANN.

  12. Lightweight storage and overlay networks for fault tolerance.

    Oldfield, Ron A.

    2010-01-01

    The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors, In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provide direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has potential to signifcantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

  13. Fault Tolerance for Industrial Actuators in Absence of Accurate Models and Hardware Redundancy

    Papageorgiou, Dimitrios; Blanke, Mogens; Niemann, Hans Henrik;

    2015-01-01

    This paper investigates Fault-Tolerant Control for closed-loop systems where only coarse models are available and there is lack of actuator and sensor redundancies. The problem is approached in the form of a typical servomotor in closed-loop. A linear model is extracted from input/output data...

  14. Modular Multilevel Converter Control Strategy with Fault Tolerance

    Teodorescu, Remus; Eni, Emanuel-Petre; Mathe, Laszlo;

    2013-01-01

    The Modular Multilevel Converter (MMC) technology has recently emerged in VSC-HVDC applications where it demonstrated higher efficiency and fault tolerance compared to the classical 2-level topology. Due to the ability of MMC to connect to HV levels, MMC can be also used in transformerless STATCOM...... and large wind turbines. In this paper, a control and communication strategy have been developed to accommodate tolerant module failure and capacitor voltage unbalance. A downscaled prototype converter has been built in order to validate and investigate the control strategy, and also test the proposed...... communication infrastructure based on Industrial Ethernet....

  15. A Decentralized Adaptive Approach to Fault Tolerant Flight Control

    Wu, N. Eva; Nikulin, Vladimir; Heimes, Felix; Shormin, Victor

    2000-01-01

    This paper briefly reports some results of our study on the application of a decentralized adaptive control approach to a 6 DOF nonlinear aircraft model. The simulation results showed the potential of using this approach to achieve fault tolerant control. Based on this observation and some analysis, the paper proposes a multiple channel adaptive control scheme that makes use of the functionally redundant actuating and sensing capabilities in the model, and explains how to implement the scheme to tolerate actuator and sensor failures. The conditions, under which the scheme is applicable, are stated in the paper.

  16. Fault-tolerance performance evaluation of fieldbus for NPCS network of KNGR

    In contrast with conventional fieldbus researches which are focused merely on real time performance, this study aims to evaluate the real-time performance of the communication system including fault-tolerant mechanisms. Maintaining performance in presence of recoverable faults is very important because the communication network will be applied to next generation NPP(Nuclear Power Plant). In order to guarantee the performance of NPP communication network, the time characteristics of the target system in presence of recoverable fault should be investigated. If the time characteristics meet the requirements of the system, the faults will be recovered by fieldbus recovery mechanisms and the system will be safe. If the time characteristics can not meet the requirements, the faults in the fieldbus can propagate to system failure. In this study, for the purpose of investigating the time characteristics of fieldbus, the recoverable faults are classified and then the formulas which represent delays including recovery mechanisms and the simulation model are developed. In order to validate the proposed approach, the simulation model is applied to the Korea Next Generation Reactor (KNGR) NSSS Process Control System (NPCS). The results of the simulation provide reasonable delay characteristics of the fault cases with recovery mechanisms. Using the outcome of the simulation and the system requirements, we also can calculate the failure propagation probability from fieldbus to outer system

  17. Fault management for data systems

    Boyd, Mark A.; Iverson, David L.; Patterson-Hine, F. Ann

    1993-01-01

    Issues related to automating the process of fault management (fault diagnosis and response) for data management systems are considered. Substantial benefits are to be gained by successful automation of this process, particularly for large, complex systems. The use of graph-based models to develop a computer assisted fault management system is advocated. The general problem is described and the motivation behind choosing graph-based models over other approaches for developing fault diagnosis computer programs is outlined. Some existing work in the area of graph-based fault diagnosis is reviewed, and a new fault management method which was developed from existing methods is offered. Our method is applied to an automatic telescope system intended as a prototype for future lunar telescope programs. Finally, an application of our method to general data management systems is described.

  18. Fault Tolerant Characteristics of Artificial Neural Network Electronic Hardware

    Zee, Frank

    1995-01-01

    The fault tolerant characteristics of analog-VLSI artificial neural network (with 32 neurons and 532 synapses) chips are studied by exposing them to high energy electrons, high energy protons, and gamma ionizing radiations under biased and unbiased conditions. The biased chips became nonfunctional after receiving a cumulative dose of less than 20 krads, while the unbiased chips only started to show degradation with a cumulative dose of over 100 krads. As the total radiation dose increased, all the components demonstrated graceful degradation. The analog sigmoidal function of the neuron became steeper (increase in gain), current leakage from the synapses progressively shifted the sigmoidal curve, and the digital memory of the synapses and the memory addressing circuits began to gradually fail. From these radiation experiments, we can learn how to modify certain designs of the neural network electronic hardware without using radiation-hardening techniques to increase its reliability and fault tolerance.

  19. Topological fault-tolerance in cluster state quantum computation

    We describe a fault-tolerant version of the one-way quantum computer using a cluster state in three spatial dimensions. Topologically protected quantum gates are realized by choosing appropriate boundary conditions on the cluster. We provide equivalence transformations for these boundary conditions that can be used to simplify fault-tolerant circuits and to derive circuit identities in a topological manner. The spatial dimensionality of the scheme can be reduced to two by converting one spatial axis of the cluster into time. The error threshold is 0.75% for each source in an error model with preparation, gate, storage and measurement errors. The operational overhead is poly-logarithmic in the circuit size

  20. Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments

    Qian Liu

    2010-02-01

    Full Text Available A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers neural network information fusion model is used to realize the fault identification of the thruster. The fault accommodation unit is based on direct calculations of moment and the result of fault identification is used to find the solution of the control allocation problem. The approach resolves the continuous faulty identification of the UV. Results from the experiment are provided to illustrate the performance of the proposed method in uncertain continuous faulty situation.

  1. BFTDT: Byzantine Fault Tolerance tryout for Dependable Transactions in Cloud

    Gayathri S

    2012-11-01

    Full Text Available Cloud Web Services (CWS is the technology used for business collaboration and integration among the web users. The Web Services Atomic Transactions (WS-AT have been used for the trusted distributed transaction processing over the web. The WS-AT in the distributed sense has byzantine faults to overcome that Byzantine Faults Techniques (BFT is used. The reliable coordinator provides the services that are Coordination services, Activation services, Registration Services and Completion services which make the transaction effective and reliable. In the trusted environment, to evade congestion of the resources, fair share bandwidth allocation scheme is used to allocate separate bandwidth for each web users and the transaction is processed Coordinator server and the Transaction Processing Monitor (TPM. The WS-AT for business applications analysis shows the high degree of dependability, security, trust, fault tolerance and fairness of the resources in the trusted environment.

  2. TRSTR: A Fault- Tolerant Microprocessor Architecture Based on SMT

    YANG Hua; CUI Gang; YANG Xiao-zong

    2005-01-01

    Based on Simultaneous Multithreading (SMT),we propose a fault-tolerant scheme called Tri-modular Redundantly and Simultaneously Threaded processor with Recovery (TRSTR). TRSTR features as following: First, we introduce an arbitrator context into the conventional SRT (Simultaneous and Redundantly Threaded), which acts as an arbitrator when results from the other two contexts disagree, or acts as an ordinary thread generally, thus making full use of SMT' s parallelism. Second, we append reconfigurable feature to sphere of replication in SRT, making it more flexible for changing demands and situations. Third, TRSTR has two working modes: Tri-Simultaneous with Voting (TSV) and Dual-Simultaneous with Arbitrator (DSA), which can switch at will. Finally, in addition to transient-fault coverage,TRSTR has on-line self-checking and self-recovering abilities,so as to shield off some permanent faults and reconfigure itself without stopping the crucial job, improving its reliability and availability.

  3. Advanced Information Processing System - Fault detection and error handling

    Lala, J. H.

    1985-01-01

    The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles, including tactical and transport aircraft, and manned and autonomous spacecraft. A proof-of-concept (POC) system is now in the detailed design and fabrication phase. This paper gives an overview of a preliminary fault detection and error handling philosophy in AIPS.

  4. Fault-Tolerant Visual Secret Sharing Schemes without Pixel Expansion

    Justie Su-Tzu Juan; Yung-Chang Chen; Song Guo

    2016-01-01

    Visual cryptography encrypts a secret image into two meaningless random images, called shares, such that it can be decrypted by human vision without any calculations. However, there would be problems in alignment when these two shares are staked by hand in practice. Therefore, this paper presents the fault-tolerant schemes of stacking two shares that are acquired from secret image encryption without pixel expansion. The main idea of these schemes is to combine several pixels as a unit and the...

  5. Faster Quantum Chemistry Simulation on Fault-Tolerant Quantum Computers

    Jones, N. Cody; Whitfield, James D.; McMahon, Peter L.; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alan; Yamamoto, Yoshihisa

    2012-01-01

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource...

  6. Formal verification of fault-tolerant software design: the CSP approach

    Yeung, WL; Schneider, SA

    2005-01-01

    Software design techniques for tolerating both hardware and software faults have been developed over the past few decades. Paradoxically, it is essential that fault-tolerant software is designed with the highest possible rigour to prevent faults in itself. Such rigour is provided by formal methods and aided by model checking. We illustrate an approach to fault-tolerant software design based on communicating sequential processes through a running example.

  7. Faster quantum chemistry simulation on fault-tolerant quantum computers

    Cody Jones, N.; Whitfield, James D.; McMahon, Peter L.; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

    2012-11-01

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource. To improve execution time, we examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm (Dawson and Nielsen 2006 Quantum Inform. Comput. 6 81). For a given approximation error ɛ, arbitrary single-qubit gates can be produced fault-tolerantly and using a restricted set of gates in time which is O(log ɛ) or O(log log ɛ) with sufficient parallel preparation of ancillas, constant average depth is possible using a method we call programmable ancilla rotations. Moreover, we construct and analyze efficient implementations of first- and second-quantized simulation algorithms using the fault-tolerant arbitrary gates and other techniques, such as implementing various subroutines in constant time. A specific example we analyze is the ground-state energy calculation for lithium hydride.

  8. Unconstrained and Constrained Fault-Tolerant Resource Allocation

    Liao, Kewen

    2011-01-01

    First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \\cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objective is to allocate facilities from sites to satisfy $\\mathcal{R}$ at a minimum total cost. Next, we introduce the Constrained Fault-Tolerant Resource Allocation (CFTRA) problem. It differs from UFTRA in that the number of resources available at each site $i$ is limited by $R_{i}$. Both problems are practical extensions of the classical Fault-Tolerant Facility Location (FTFL) problem \\cite{Jain00FTFL}. For instance, their solutions provide optimal resource allocation (w.r.t. enterprises) and leasing (w.r.t. clients) strategies for the cont...

  9. Advanced information processing system: Fault injection study and results

    Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.

    1992-01-01

    The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.

  10. Adaptive Fault-Tolerant Routing in 2D Mesh with Cracky Rectangular Model

    Yi Yang

    2014-01-01

    Full Text Available This paper mainly focuses on routing in two-dimensional mesh networks. We propose a novel faulty block model, which is cracky rectangular block, for fault-tolerant adaptive routing. All the faulty nodes and faulty links are surrounded in this type of block, which is a convex structure, in order to avoid routing livelock. Additionally, the model constructs the interior spanning forest for each block in order to keep in touch with the nodes inside of each block. The procedure for block construction is dynamically and totally distributed. The construction algorithm is simple and ease of implementation. And this is a fully adaptive block which will dynamically adjust its scale in accordance with the situation of networks, either the fault emergence or the fault recovery, without shutdown of the system. Based on this model, we also develop a distributed fault-tolerant routing algorithm. Then we give the formal proof for this algorithm to guarantee that messages will always reach their destinations if and only if the destination nodes keep connecting with these mesh networks. So the new model and routing algorithm maximize the availability of the nodes in networks. This is a noticeable overall improvement of fault tolerability of the system.

  11. Algorithm-Based Fault Tolerance for Numerical Subroutines

    Tumon, Michael; Granat, Robert; Lou, John

    2007-01-01

    A software library implements a new methodology of detecting faults in numerical subroutines, thus enabling application programs that contain the subroutines to recover transparently from single-event upsets. The software library in question is fault-detecting middleware that is wrapped around the numericalsubroutines. Conventional serial versions (based on LAPACK and FFTW) and a parallel version (based on ScaLAPACK) exist. The source code of the application program that contains the numerical subroutines is not modified, and the middleware is transparent to the user. The methodology used is a type of algorithm- based fault tolerance (ABFT). In ABFT, a checksum is computed before a computation and compared with the checksum of the computational result; an error is declared if the difference between the checksums exceeds some threshold. Novel normalization methods are used in the checksum comparison to ensure correct fault detections independent of algorithm inputs. In tests of this software reported in the peer-reviewed literature, this library was shown to enable detection of 99.9 percent of significant faults while generating no false alarms.

  12. Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization

    Minupriya A

    2012-06-01

    Full Text Available In this paper an Active Fault Tolerant Control (FTC scheme is proposed for Linear Time Invariant (LTI systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps; Fault detection followed by Fault isolation. Fault detection filter use the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of fault. The fault accommodation scheme is carried out using the Generalized Internal Model Control (GIMC architecture, also known as modified Youla parameterization. In addition, performance indices are also evaluated to indicate that the resulting fault tolerant scheme can detect, identify and accommodate actuator and sensor faults under additive faults. The DC motor example is considered for the demonstration of the proposed scheme.

  13. A Modular and Fault-Tolerant Data Transport Framework

    Steinbeck, Timm M

    2009-01-01

    The High Level Trigger (HLT) of the future ALICE heavy-ion experiment has to reduce its input data rate of up to 25 GB/s to at most 1.25 GB/s for output before the data is written to permanent storage. To cope with these data rates a large PC cluster system is being designed to scale to several 1000 nodes, connected by a fast network. For the software that will run on these nodes a flexible data transport and distribution software framework, described in this thesis, has been developed. The framework consists of a set of separate components, that can be connected via a common interface. This allows to construct different configurations for the HLT, that are even changeable at runtime. To ensure a fault-tolerant operation of the HLT, the framework includes a basic fail-over mechanism that allows to replace whole nodes after a failure. The mechanism will be further expanded in the future, utilizing the runtime reconnection feature of the framework's component interface. To connect cluster nodes a communication ...

  14. A Modular and Fault-Tolerant Data Transport Framework

    Steinbeck, T M; Steinbeck, Timm M

    2004-01-01

    The High Level Trigger (HLT) of the future ALICE heavy-ion experiment has to reduce its input data rate of up to 25 GB/s to at most 1.25 GB/s for output before the data is written to permanent storage. To cope with these data rates a large PC cluster system is being designed to scale to several 1000 nodes, connected by a fast network. For the software that will run on these nodes a flexible data transport and distribution software framework, described in this thesis, has been developed. The framework consists of a set of separate components, that can be connected via a common interface. This allows to construct different configurations for the HLT, that are even changeable at runtime. To ensure a fault-tolerant operation of the HLT, the framework includes a basic fail-over mechanism that allows to replace whole nodes after a failure. The mechanism will be further expanded in the future, utilizing the runtime reconnection feature of the framework's component interface. To connect cluster nodes a communication ...

  15. Fault Tolerant Techniques for Spacecraft Data Recorders

    Anderson, Scott

    1990-01-01

    This paper presents the techniques for improving system reliability which SEAKR Engineering employs in the design of their spacecraft solid state data recorders. Briefly, these techniques include Hamming code error correction, periodic memory scrubbing, latch-up protection, excessive capacity, redundant power supplies/control/bus circuits, microcode protection, and shielding.

  16. Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.

  17. Effect analysis of faults in digital I and C systems of nuclear power plants

    A reliability analysis of digital instrumentation and control (I and C) systems in nuclear power plants has been introduced as one of the important elements of a probabilistic safety assessment because of the unique characteristics of digital I and C systems. Digital I and C systems have various features distinguishable from those of analog I and C systems such as software and fault-tolerant techniques. In this work, the faults in a digital I and C system were analyzed and a model for representing the effects of the faults was developed. First, the effects of the faults in a system were analyzed using fault injection experiments. A software-implemented fault injection technique in which faults can be injected into the memory was used based on the assumption that all faults in a system are reflected in the faults in the memory. In the experiments, the effect of a fault on the system output was observed. In addition, the success or failure in detecting the fault by fault-tolerant functions included in the system was identified. Second, a fault tree model for representing that a fault is propagated to the system output was developed. With the model, it can be identified how a fault is propagated to the output or why a fault is not detected by fault-tolerant techniques. Based on the analysis results of the proposed method, it is possible to not only evaluate the system reliability but also identify weak points of fault-tolerant techniques by identifying undetected faults. The results can be reflected in the designs to improve the capability of fault-tolerant techniques. (author)

  18. Data Structures: Sequence Problems, Range Queries, and Fault Tolerance

    Jørgensen, Allan Grønlund

    terms of hardware performance and money in the design of todays high speed memory technologies. Hardware, power failures, and environmental conditions such as cosmic rays and alpha particles can all alter the memory in unpredictable ways. In applications where large memory capacities are needed at low...... etc. We consider data structures for two classic statistics functions, namely median and mode. Finally, Part III investigates fault tolerant algorithms and data structures. This deals with the trend of avoiding elaborate error checking and correction circuitry that would impose non-negligible costs in...... cost, it makes sense to assume that the algorithms themselves are in charge for dealing with memory faults. We investigate searching, sorting and counting algorithms and data structures that provably returns sensible information in spite of memory corruptions....

  19. Structural Fault Tolerance of Scale-Free Networks

    HAO Jingbo; YIN Jianping; ZHANG Boyun

    2007-01-01

    The fault tolerance of scale-free networks is examined in this paper. Through the simulation on the changes of the average path length and network fragmentation of the Barabasi-Albert model when faults happen, it can be observed that generic scale-free networks are quite robust to random failures, but are very vulnerable to targeted attacks at the same time. Therefore, an existing optimization strategy for the robustness of scale-free networks to failures and attacks is also introduced. The simulation similar with the above proved that the so-called (1, 0) network has potentially interconnectedness closer to that of a scale-free network and robustness to targeted attacks closer to that of an exponential network. Furthermore, its resistance to random failures is better than that of either of them.

  20. Fault-Tolerant Tree-Based Multicasting in Mesh Multicomputers

    WU Jie; CHEN Xiao

    2001-01-01

    We propose a fault-tolerant tree-based multicast algorithm for 2-dimensional (2-D) meshes based on the concept of the extended safety level which is a vector associated with each node to capture fault information in the neighborhood. In this approach each destination is reached through a minimum number of hops. In order to minimize the total number of traffic steps, three heuristic strategies are proposed. This approach can be easily implemented by pipelined circuit switching (PCS). A simulation study is conducted to measure the total number of traffic steps under different strategies. Our approach is the first attempt to address the faulttolerant tree-based multicast problem in 2-D meshes based on limited global information with a simple model and succinct information.

  1. Review of fault diagnosis and fault-tolerant control for modular multilevel converter of HVDC

    Liu, Hui; Loh, Poh Chiang; Blaabjerg, Frede

    This review focuses on faults in Modular Multilevel Converter (MMC) for use in high voltage direct current (HVDC) systems by analyzing the vulnerable spots and failure mechanism from device to system and illustrating the control & protection methods under failure condition. At the beginning......, several typical topologies of MMC-HVDC systems are presented. Then fault types such as capacitor voltage unbalance, unbalance between upper and lower arm voltage are analyzed and the corresponding fault detection and diagnosis approaches are explained. In addition, more attention is dedicated to control...

  2. The optimization of global fault tolerant trajectory for redundant manipulator based on self-motion

    Zhang Jian

    2015-01-01

    Full Text Available The redundancy feature of manipulators provides the possibility for the fault tolerant trajectory planning. Aiming at the completion of the specific task, an algorithm of global fault tolerant trajectory optimization for redundant manipulator based on the self-motion is proposed in this paper. Firstly, inverse kinematics equation of single redundancy manipulator based on self-motion variable and null-space velocity array of Jacobian are analyzed. Secondly, the mathematical description of fault tolerance criteria of the configuration of manipulator is established and the fault tolerance configuration group of manipulator is obtained by using iteration traversal under the fault tolerance criteria. Then, considering the joint limits and minimum the energy consumption as the optimization target, the global fault tolerant joint trajectory is achieved. Finally, simulation for 7 degree of freedom (DOF manipulator is performed, by which the effectiveness of the algorithm is validated.

  3. Fault tolerant control of electric pitch control system based on single current detection%基于单电流检测的电动变桨系统容错控制

    李宏伟; 付勃; 董海鹰; 杨立霞; 王睿敏

    2016-01-01

    针对电动变桨系统中常见的电流传感器故障,提出一种基于单电流检测的电动变桨系统变论域模糊容错控制方法。当变桨系统发生单个或两个电流传感器故障时,该方法利用直流母线电流传感器对所缺失的电流信息进行重构,保证三相电流能在任意两个相邻采样周期内得到及时更新,确保闭环系统稳定,并通过自适应阈值故障判断法完成故障相电流传感器的切换及容错。针对调制法引起的重构信号误差及电动变桨系统的主要控制目标,将变论域模糊控制方法应用于速度环,以改善系统抗负载扰动能力,提高容错系统鲁棒性。结果表明,该容错控制方法使得变桨系统在传感器故障情况下,牺牲部分系统性能后依然具有较理想的控制特性,并且该方法的正确性也得到了验证。%In view of the current sensors failure in electric pitch system,a variable universe fuzzy fault tolerant control method of electric pitch control system based on single current detection is proposed.When there is single or two-current sensor fault occurs,based on the proposed method the missing current information can be reconstructed by using direct current (DC)bus current sensor and the three-phase current can be updated in time within any two adjacent sampling periods,so as to ensure sta-bility of the closed-loop system.And then the switchover and fault tolerant control of fault current sensor would be accom-plished by fault diagnosis method based on adaptive threshold judgment.For the reconstructed signal error caused by the modu-lation method and the main control target of electric pitch system,a variable universe fuzzy control method is used in the speed loop,which can improve the anti-disturbance ability to load variation,and the robustness of fault tolerance system.The results show that the fault tolerant control method makes the variable pitch control system still has ideal

  4. Fault-Tolerant Visual Secret Sharing Schemes without Pixel Expansion

    Justie Su-Tzu Juan

    2016-01-01

    Full Text Available Visual cryptography encrypts a secret image into two meaningless random images, called shares, such that it can be decrypted by human vision without any calculations. However, there would be problems in alignment when these two shares are staked by hand in practice. Therefore, this paper presents the fault-tolerant schemes of stacking two shares that are acquired from secret image encryption without pixel expansion. The main idea of these schemes is to combine several pixels as a unit and then to encrypt each unit into a specific combination of pixels. Both theoretical analysis and simulation results demonstrate the effectiveness and practicality of the proposed schemes.

  5. Data center networks topologies, architectures and fault-tolerance characteristics

    Liu, Yang; Veeraraghavan, Malathi; Lin, Dong; Hamdi, Mounir

    2013-01-01

    This SpringerBrief presents a survey of data center network designs and topologies and compares several properties in order to highlight their advantages and disadvantages. The brief also explores several routing protocols designed for these topologies and compares the basic algorithms to establish connections, the techniques used to gain better performance, and the mechanisms for fault-tolerance. Readers will be equipped to understand how current research on data center networks enables the design of future architectures that can improve performance and dependability of data centers. This con

  6. Implementing fault tolerance in a superconducting quantum circuit

    Barends, Rami

    2015-03-01

    The surface code error correction scheme is appealing for superconducting circuits as the fundamental operations have been demonstrated at the fault-tolerant threshold. Here, we present experimental results on the repetition code, a one-dimensional primitive of the surface code which can detect bit-flip errors, implemented on a device consisting of nine Xmon transmon qubits. We discuss the basic mechanics of error detection, show preservation of a Greenberger-Horne-Zeilinger state, and show suppression of environmentally-induced error.

  7. Fault Tolerant Control Design for the Longitudinal Aircraft Dynamics using Quantitative Feedback Theory

    Ossmann, Daniel

    2015-01-01

    Flight control laws of modern aircraft are scheduled with respect to flight point parameters. The loss of the air data measurement system implies inevitably the loss of relevant scheduling information. A strategy to design a fault tolerant longitudinal flight control system is proposed which can accommodate the total loss of the angle of attack and the calibrated airspeed measurements. In this scenario the described robust longitudinal control law is employed ensuring a control performance ...

  8. Performance modeling of fault-tolerant circuit-switched communication networks

    Safaei, F.; Khonsari, A.; Fathy, M.; Alzeidi, N.; Ould-Khaoua, M.

    2006-01-01

    Circuit switching (CS) has been suggested as an efficient switching method for supporting simultaneous communications (such as data, voice, and images) across parallel systems due to its ability to preserve both communication performance and fault-tolerant demands in such systems. In this paper we present an efficient scheme to capture the mean message latency in 2D torus with CS in the presence of faulty components. We have also conducted extensive simulation experiments, the results of whic...

  9. A Vascular-Network-Based Nonuniform Hierarchical Fault-Tolerant Routing Algorithm for Wireless Sensor Networks

    Hongbing Li; Peng Gao; Qingyu Xiong; Weiren Shi; Qiang Chen

    2012-01-01

    Fault tolerance is the key technology in wireless sensor networks which attracts many research interests. Aiming at the issue that the nodes' failures affect the network's stability and service quality, a vascular-network-based fault-tolerant routing algorithm is presented by nonuniform hierarchical clustering. According to the distribution characteristics of the vascular network and inspirations to the fault tolerance for wireless sensor networks, a mathematical model and network topology ar...

  10. Selecting Fault Tolerant Styles for Third-Party Components with Model Checking Support

    Li, Junguo; Chen, Xiangping; Huang, Gang; HONG, MEI; Chauvel, Franck

    2009-01-01

    To build highly available or reliable applications out of unreliable third-party components, some software-implemented fault-tolerant mechanisms are introduced to gracefully deal with failures in the components. In this paper, we address an important issue in the approach: how to select the most suitable fault-tolerant mechanisms for a given application in a specific context. To alleviate the difficulty in the selection, these mechanisms are abstracted as Fault-tolerant styles (FTSs) at first...

  11. Prognostics Enhancemend Fault-Tolerant Control with an Application to a Hovercraft Project

    National Aeronautics and Space Administration — Fault-Tolerant Control (FTC) is an emerging area of engineering and scientific research that integrates prognostics, health management concepts and intelligent...

  12. Fault injection system for automatic testing system

    王胜文; 洪炳熔

    2003-01-01

    Considering the deficiency of the means for confirming the attribution of fault redundancy in the re-search of Automatic Testing System(ATS) , a fault-injection system has been proposed to study fault redundancyof automatic testing system through compurison. By means of a fault-imbeded environmental simulation, thefaults injected at the input level of the software are under test. These faults may induce inherent failure mode,thus bringing about unexpected output, and the anticipated goal of the test is attained. The fault injection con-sists of voltage signal generator, current signal generator and rear drive circuit which are specially developed,and the ATS can work regularly by means of software simulation. The experimental results indicate that the faultinjection system can find the deficiency of the automatic testing software, and identify the preference of fault re-dundancy. On the other hand, some soft deficiency never exposed before can be identified by analyzing the tes-ting results.

  13. Subaru FATS (fault tracking system)

    Winegar, Tom W.; Noumaru, Junichi

    2000-07-01

    The Subaru Telescope requires a fault tracking system to record the problems and questions that staff experience during their work, and the solutions provided by technical experts to these problems and questions. The system records each fault and routes it to a pre-selected 'solution-provider' for each type of fault. The solution provider analyzes the fault and writes a solution that is routed back to the fault reporter and recorded in a 'knowledge-base' for future reference. The specifications of our fault tracking system were unique. (1) Dual language capacity -- Our staff speak both English and Japanese. Our contractors speak Japanese. (2) Heterogeneous computers -- Our computer workstations are a mixture of SPARCstations, Macintosh and Windows computers. (3) Integration with prime contractors -- Mitsubishi and Fujitsu are primary contractors in the construction of the telescope. In many cases, our 'experts' are our contractors. (4) Operator scheduling -- Our operators spend 50% of their work-month operating the telescope, the other 50% is spent working day shift at the base facility in Hilo, or day shift at the summit. We plan for 8 operators, with a frequent rotation. We need to keep all operators informed on the current status of all faults, no matter the operator's location.

  14. Performance and economy of a fault-tolerant multiprocessor

    Lala, J. H.; Smith, C. J.

    1979-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.

  15. Fault detection in photovoltaic systems

    Nilsson, David

    2014-01-01

    This master’s thesis concerns three different areas in the field of fault detection in photovoltaic systems.Previous studies have concerned homogeneous systems with a large set of parameters being observed,while this study is focused on a more restrictive case. The first problem is to discover immediate faults occurring in solar panels. A new online algorithm is developed based on similarity measures with in a single installation. It performs reliably and is able to detect all significant fau...

  16. Reliability Improvement of a T-Type Three-Level Inverter With Fault-Tolerant Control Strategy

    Choi, Uimin; Blaabjerg, Frede; Lee, Kyo-Beum

    2015-01-01

    This paper proposes a fault-tolerant control strategy for a T-type three-level inverter when an open-circuit fault occurs. The proposed method is explained by dividing fault into two cases: the faulty condition of half-bridge switches and neutral-point switches. In case of the open-circuit fault in...... a neutral-point switch, two methods will be proposed and compared based on thermal analysis and neutral-point voltage oscillation. The reliability of T-type inverter systems is improved considerably by the proposed algorithm when a switch fails. The proposed method does not require any additional...

  17. Fault tolerant wind turbine production operation and shutdown (Sustainable Control)

    Van Engelen, T.; Schuurmans, J.; Kanev, S.; Dong, J.; Verhaegen, M.H.G.; Hayashi, Y.

    2011-01-01

    Extreme environmental conditions as well as system failure are real-life phenomena. Especially offshore, extreme environmental conditions and system faults are to be dealt with in an effective way. The project Sustainable Control, a new approach to operate wind turbines (Agentschap NL, grant EOSLT02

  18. 2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.

    Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

    2009-02-01

    This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that cause an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don

  19. Communications protocols for a fault tolerant, integrated local area network for Space Station applications

    Meredith, B. D.

    1984-01-01

    The evolutionary growth of the Space Station and the diverse activities onboard are expected to require a hierarchy of integrated,local area networks capable of supporting data, voice and video communications. In addition, fault tolerant network operation is necessary to protect communications between critical systems attached to the net and to relieve the valuable human resources onboard Space Station of day-to-day data system repair tasks. An experimental, local area network is being developed which will serve as a testbed for investigating candidate algorithms and technologies for a fault tolerant, integrated network. The establishment of a set of rules or protocols which govern communications on the net is essential to obtain orderly and reliable operation. A hierarchy of protocols for the experimental network is presented and procedures for data and control communications are described.

  20. Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework

    Yang Liu

    2013-09-01

    Full Text Available MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data which has been widely applied in data-intensive systems. In cloud environment, node and task failure is no longer accidental but a common feature of large-scale systems. In MapReduce framework, although the rescheduling based fault-tolerant method is simple to implement, it failed to fully consider the location of distributed data, the computation and storage overhead. Thus, a single node failure will increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT is proposed, which takes both task and node failure into consideration. Preliminary experiments show that with less storage and network overhead. CROFT will significantly reduce the completion time at failure time, and the overall performance of MapReduce can be improved at least over 30% than original mechanism in Hadoop.  

  1. The Design and Reliability Analysis of the Self-organizing Fault-tolerant System%自组织容错系统的设计及其可靠性分析

    巨政权; 满梦华; 原亮

    2012-01-01

    以生物神经系统的容错机理为基础,借鉴传统的人工神经网络的构建方式,结合电子电路的构造特点,提出了一种自组织神经网络模型,构建了自组织容错系统,并对构建的容错系统建立可靠性模型.通过可靠性分析发现,对于同功能不同拓扑的承载电路,当其占用神经元越少、在各层上分布越均匀时,其可靠性越高.%The self-organizing neural networks model was proposed based on the fault-tolerant mechanism of the biological nervous system,the structure characteristic of the traditional artificial neural network and the electronic circuit,and then the self-organizing Fault-tolerant System and its reliability model were constructed.Through dependability analyzed,we found that the fewer the cell in the circuit,and the more evenly to distribute on every floor,the higher of its dependability.

  2. Wiring systems and fault finding

    Scaddan, Brian

    1905-01-01

    This book deals with an area of practice which many students and non-electricians find particularly challenging. It explains how to interpret circuit diagrams, wiring systems and the principles and practice of testing and fault diagnosis. It will give the reader confidence to understand the principles of testing and to apply this knowledge to fault finding in electrical circuits.It is a handy reference for anybody who needs to be able to trace faults in circuits, whether in domestic, commercial or industrial settings. It will be a time-saver for all electricians, plumbers, heating engineers, t

  3. Design and implementation of an SPB converter for fault tolerant PMSynRel motor control

    Apostolopoulos, Nikolaos

    2015-01-01

    The stacked polyphase bridges (SPB) converter topology is investigated in the presentthesis as a fault-tolerant choice for permanent-magnet synchronous reluctance (PMSyn-Rel) motor control. Integrated motor drive systems are studied as they offer great benefitsfor propulsion applications. Moreover, the importance of a modular topology, like theSPB, for an electric powertrain is discussed. The latter consists of a number of seriesconnected, 3-phase 2-level inverter submodules that supply separ...

  4. Decoherence-Free Subspaces for Multiple-Qubit Errors (II) Universal, Fault-Tolerant Quantum Computation

    Lidar, D A; Kempe, J; Whaley, K B; Lidar, Daniel A.; Bacon, David; Kempe, Julia

    2001-01-01

    Decoherence-free subspaces (DFSs) shield quantum information from errors induced by the interaction with an uncontrollable environment. Here we study a model of correlated errors forming an Abelian subgroup (stabilizer) of the Pauli group (the group of tensor products of Pauli matrices). Unlike previous studies of DFSs, this type of errors does not involve any spatial symmetry assumptions on the system-environment interaction. We solve the problem of universal, fault-tolerant quantum computation on the associated class of DFSs.

  5. A Hypermedia Distributed Application for Monitoring and Fault-Injection in Embedded Fault-tolerant Parallel Programs

    De Florio, Vincenzo; Deconinck, G.; Truyens, M.; Rosseel, W.; Lauwereins, Rudy

    2014-01-01

    We describe a distributed, multimedia application which is being developed in the framework of the ESPRIT-IV Project 21012 EFTOS (Embedded Fault-Tolerant Supercomputing). The application dynamically sets up a hierarchy of HTML pages reflecting the current status of an EFTOS-compliant dependable application running on a Parsytec CC system. These pages are fed to a World-Wide Web browser playing the role of a hypermedia monitor. The adopted approach allows the user to concentrate on the high-le...

  6. A Simple Fault-Tolerant Adaptive and Minimal Routing Approach in 3-D Meshes

    WU Jie(吴杰)

    2003-01-01

    In this paper we propose a sufficient condition for minimal routing in 3-dimensional (3-D) meshes with faulty nodes. It is based on an early work of the author on minimal routing in 2-dimensional (2-D) meshes. Unlike many traditional models that assume all the nodes know global fault distribution or just adjacent fault information, our approach is based on the concept of limited global fault information. First, we propose a fault model called faulty cube in which all faulty nodes in the system are contained in a set of faulty cubes. Fault information is then distributed to limited number of nodes while it is still sufficient to support minimal routing. The limited fault information collected at each node is represented by a vector called extended safety level. The extended safety level associated with a node can be used to determine the existence of a minimal path from this node to a given destination. Specifically, we study the existence of minimal paths at a given source node, limited distribution of fault information, minimal routing, and deadlock-free and livelock-free routing. Our results show that any minimal routing that is partially adaptive can be applied in our model as long as the destination node meets a certain condition. We also propose a dynamic planar-adaptive routing scheme that offers better fault tolerance and adaptivity than the planar-adaptive routing scheme in 3-D meshes. Our approach is the first attempt to address adaptive and minimal routing in 3-D meshes with faulty nodes using limited fault information.

  7. Fault-tolerant permanent-magnet synchronous machine drives: fault detection and isolation, control reconfiguration and design considerations

    MEINGUET, Fabien

    2012-01-01

    The need for efficiency, reliability and continuous operation has lead over the years to the development of fault-tolerant electrical drives for various industrial purposes and for transport applications. Permanent-magnet synchronous machines have also been gaining interest due to their high torque-to-mass ratio and high efficiency, which make them a very good candidate to reduce the weight and volume of the equipment.In this work, a multidisciplinary approach for the design of fault-tolerant...

  8. A learning system for fault finding

    Tunevi, Anders

    1989-01-01

    A learning system for fault finding has been constructed. This system contains many different types of knowledge, three ways to find faults and four ways to learn fault finding. The constructed learning system works for a class of fault finding problems. This class has been described in the paper. The developed system could be viewed as an architecture of a general learning system for fault finding. The system could also be used as a testbench of learning mechanisms. The experiences from this...

  9. Diagnosis and Fault-tolerant Control for Ship Station Keeping

    Blanke, Mogens

    2005-01-01

    design for systems of high complexity, and also analyse the cases of cascaded or multiple faults. The paper takes as example a ship with two CP propellers, rudders and a bow thruster as actuators, and instrumentation with a suite of global position sensors, inertial navigation units and conventional gyro...

  10. A DYNAMIC FAULT TOLERANT ALGORITHM FOR IMPROVISING PERFORMANCE OF MULTIMEDIA SERVICES

    2005-01-01

    Multimedia Services has drawn much attention from both industrial and academic researchers due to the emerging consumer market, how to provide High-Availability service is one of most important issues to take into account. In this paper, a dynamic fault tolerant algorithm is presented for highly available distributed multimedia service, then by introducing SLB(server load balancing) into fault tolerance and switching servers in different ways according to their functions, the proposed schema can preserve reliability and real-time of the system .The analysis and experiments indicate that resuming server's faulty by this method is smooth and transparent to the client The proposed algorithm is effectively improving the reliability of the multimedia service.

  11. Systematic Fault Tolerant Control Based on Adaptive Thau Observer Estimation for Quadrotor Uavs

    Cen Zhaohui

    2015-03-01

    Full Text Available A systematic fault tolerant control (FTC scheme based on fault estimation for a quadrotor actuator, which integrates normal control, active and passive FTC and fault parking is proposed in this paper. Firstly, an adaptive Thau observer (ATO is presented to estimate the quadrotor rotor fault magnitudes, and then faults with different magnitudes and time-varying natures are rated into corresponding fault severity levels based on the pre-defined fault-tolerant boundaries. Secondly, a systematic FTC strategy which can coordinate various FTC methods is designed to compensate for failures depending on the fault types and severity levels. Unlike former stand-alone passive FTC or active FTC, our proposed FTC scheme can compensate for faults in a way of condition-based maintenance (CBM, and especially consider the fatal failures that traditional FTC techniques cannot accommodate to avoid the crashing of UAVs. Finally, various simulations are carried out to show the performance and effectiveness of the proposed method.

  12. Fault tolerant position mooring control based on structural reliability

    Fang, Shaoji

    2012-01-01

    Safety of position mooring system is a prime concern in the marine industry and regulations are in force to prevent faults in equipment from causing failure of the whole system. There are many reasons for failures in elements of a mooring system, e.g. material damage, overload, fatigue, brittle fracture, corrosion, abrasion, extreme environment. For mooring cables, one of the typical reasons for failure is overload due to frequent replacement and inspections. Although a design based on knowle...

  13. Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing

    Jayadivya S K

    2012-06-01

    Full Text Available The aim of workflow scheduling system is to schedule the workflows within the user given deadline to achieve a good success rate. Workflow is a set of tasks processed in a predefined order based on its data and control dependency. Scheduling these workflows in a computing environment, like cloud environment, is an NP-Complete problem and it becomes more challenging when failures of tasks areconsidered. To overcome these failures, the workflow scheduling system should be fault tolerant. In this paper, the proposed Fault Tolerant Workflow Scheduling algorithm (FTWS provides fault tolerance by using replication and resubmission of tasks based on priority of the tasks. The replication of tasks depends on a heuristic metric which is calculated by finding the tradeoff between the replication factor and resubmission factor. The heuristic metric is considered because replication alone may lead to resource wastage and resubmission alone may increase makespan. Tasks are prioritized based on the criticality of the task which is calculated by using parameters like out degree, earliest deadline and high resubmission impact. Priority helps in meeting the deadline of a task and thereby reducing wastage of resources. FTWS schedules workflows within a deadline even in the presence of failures without using any history of information. The experiments were conducted in a simulated cloud environment by scheduling workflows in the presence of failures which are generated randomly. The experimental results of the proposed work demonstrate the effective success rate in-spite of various failures.

  14. Fault Tolerance Design of Frequency Conversion and Velocity Modulation Multi-CPU Control Computer System%变频调速多CPU控制计算机系统的容错设计

    刘成印; 马鹏祥; 徐红春

    2001-01-01

    According to the control characteristics of cycloconverter, this paper proposes a new multi-CPU computer control system. This architecture consists of self-diagnosis, fault-tolerance and multi-CPU computer control technology. Not only is reliability of computer control system improved, but also more complicated system control strategy can be realized.%根据交-交变频器的控制特性,提出了较为新颖的多CPU计算机控制系统结构。该结构集自诊断、容错、多CPU控制技术于一体,不但提高了计算机控制系统的可靠性,而且可实现较为复杂的系统控制策略。

  15. Fault-tolerant topology in the wireless sensor networks for energy depletion and random failure

    Nodes in the wireless sensor networks (WSNs) are prone to failure due to energy depletion and poor environment, which could have a negative impact on the normal operation of the network. In order to solve this problem, in this paper, we build a fault-tolerant topology which can effectively tolerate energy depletion and random failure. Firstly, a comprehensive failure model about energy depletion and random failure is established. Then an improved evolution model is presented to generate a fault-tolerant topology, and the degree distribution of the topology can be adjusted. Finally, the relation between the degree distribution and the topological fault tolerance is analyzed, and the optimal value of evolution model parameter is obtained. Then the target fault-tolerant topology which can effectively tolerate energy depletion and random failure is obtained. The performances of the new fault tolerant topology are verified by simulation experiments. The results show that the new fault tolerant topology effectively prolongs the network lifetime and has strong fault tolerance. (general)

  16. Structural Design of Systems with Safe Behavior under Single and Multiple Faults

    Blanke, Mogens; Staroswiecki, Marcel

    Handling of multiple simultaneous faults is a complex issue in fault-tolerant control. The design task is particularly made difficult by to the numerous different cases that need be analyzed. Aiming at safe fault-handling, this paper shows how structural analysis can be applied to find the...... structural analysis to disclose which faults could be isolated from a structural point of view using active fault isolation. Results from application on a marine control system illustrate the concepts....

  17. An Evaluation of Fault Tolerant Wind Turbine Control Schemes applied to a Benchmark Model

    Odgaard, Peter Fogh; Stoustrup, Jakob

    keeping the power grids stable. Advanced Fault Tolerant Control is one of the potential tools to increase reliability of modern wind turbines. A benchmark model for wind turbine fault detection and isolation and fault tolerant control has previously been proposed, and based on this benchmark an...... benchmark and is especially good accommodating sensors faults. The two other evaluated solutions do also well accommodating sensors faults, but have some issues which should be worked on, before they can be considered as a full solution to the benchmark problem....

  18. Ultrafast and fault-tolerant quantum communication across long distances.

    Muralidharan, Sreraman; Kim, Jungsang; Lütkenhaus, Norbert; Lukin, Mikhail D; Jiang, Liang

    2014-06-27

    Quantum repeaters (QRs) provide a way of enabling long distance quantum communication by establishing entangled qubits between remote locations. In this Letter, we investigate a new approach to QRs in which quantum information can be faithfully transmitted via a noisy channel without the use of long distance teleportation, thus eliminating the need to establish remote entangled links. Our approach makes use of small encoding blocks to fault-tolerantly correct both operational and photon loss errors. We describe a way to optimize the resource requirement for these QRs with the aim of the generation of a secure key. Numerical calculations indicate that the number of quantum memory bits at each repeater station required for the generation of one secure key has favorable polylogarithmic scaling with the distance across which the communication is desired. PMID:25014798

  19. Certifying qubit operations below the fault tolerance threshold

    Blume-Kohout, Robin; Nielsen, Erik; Rudinger, Kenneth; Mizrahi, Jonathan; Fortier, Kevin; Maunz, Peter

    2016-01-01

    Quantum information processors promise fast algorithms for problems inaccessible to classical computers. But since qubits are noisy and error-prone, they will depend on fault-tolerant quantum error correction (FTQEC) to compute reliably. Quantum error correction can protect against general noise if -- and only if -- the error in each physical qubit operation is smaller than a certain threshold. The threshold for general errors is quantified by their diamond norm. Until now, qubits have been assessed primarily by randomized benchmarking (RB), which reports a different "error rate" that is not sensitive to all errors, cannot be compared directly to diamond norm thresholds, and cannot efficiently certify a qubit for FTQEC. We use gate set tomography (GST) to completely characterize the performance of a trapped-Yb$^+$-ion qubit and certify it rigorously as suitable for FTQEC by establishing that its diamond norm error rate is less than $6.7\\times10^{-4}$ with $95\\%$ confidence.

  20. Fault-Tolerant Energy-Efficient Tree in Dynamic WSNs

    Tarek Moulahi

    2013-04-01

    Full Text Available Broadcasting has a main importance in Wireless Sens or Networks (WSNs. Effectively, the sink node has to collect periodically, data from the environment supervised by sensors. To perform this operation, i t sends requests to all nodes. Furthermore, WSNs have a dynamic behaviour due to their evolution. At any time, a node can be retrieved from the network due to an exhausting energy or a node problem. In fac t, WSNs are prone to failure such as software o r hardware malfunctioning, exhaustion of energy, wireless interference and environmental hazards. Thus, an appropriate broadcasting method should take into consideration this aspect and uses the le ss possible amount of energy to accomplish the task . In this paper, a robust tree-based scheme is proposed which is called Robust Tree Broadcasting (RTB. The new scheme has a load-balanced behaviour which indu ces an efficient use of energy. In addition, RTB has a high-quality fault tolerant performance.

  1. Incorporating Fault Tolerance in LEACH Protocol for Wireless Sensor Networks

    Rudranath Mitra

    2012-06-01

    Full Text Available Routing protocols have been a challenging issue in wireless sensor networks. WSN is one of the focussed are of research because of its multi-aspect applications. These networks are self-organized using clustering algorithms to conserve energy. LEACH (Low-Energy Adaptive Clustering Hierarchy protocol[1] is one of the significant protocols for routing in WSN. In LEACH, sensor nodes are organized in several small clusters where there are cluster heads in each cluster. These CHs gather data from their local clusters aggregate them & send them to the base station. On the LEACH many new schemes have been proposed to enhance its activity like its efficiency, security etc. In this paper the fault tolerance issue is being incorporated.

  2. Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers

    Chang N. Zhang

    2010-07-01

    Full Text Available To enhance the security and reliability of the widely-used stream ciphers, a 2-D and a 3-D mesh-knightAlgorithm Based Fault Tolerant (ABFT schemes for stream ciphers are developed which can beuniversally applied to RC4 and other stream ciphers. Based on the ready-made arithmetic unit in streamciphers, the proposed 2-D ABFT scheme is able to detect and correct any simple error, and the 3-D meshknightABFT scheme is capable of detecting and correcting up to three errors in an n2-data matrix with linercomputation and bandwidth overhead. The proposed schemes provide one-to-one mapping between dataindex and check sum group so that error can be located and recovered by easier logic and simpleoperations.

  3. Fault Tolerance Mechanism in Chip Many-Core Processors

    ZHANG Lei; HAN Yinhe; LI Huawei; LI Xiaowei

    2007-01-01

    As semiconductor technology advances, there will be billions of transistors on a single chip. Chip many-core processors are emerging to take advantage of these greater transistor densities to deliver greater performance. Effective fault tolerance techniques are essential to improve the yield of such complex chips. In this paper, a core-level redundancy scheme called N+M is proposed to improve N-core processors'yield by providing M spare cores. In such architecture, topology is an important factor because it greatly affects the processors'performance. The concept of logical topology and a topology reconfiguration problem are introduced, which is able to transparently provide target topology with lowest performance degradation as the presence of faulty cores on-chip. A row rippling and column stealing (RRCS) algorithm is also proposed. Results show that PRCS can give solutions with average 13.8% degradation with negligible computing time.

  4. A Semantics-Based Approachfor Achieving Self Fault-Tolerance of Protocols

    李腊元; 李春林

    2000-01-01

    The cooperation of different processes may be lost by mistake when a protocol is executed. The protocol cannot be normally operated under this condition. In this paper, the self fault-tolerance of protocols is discussed, and a semanticsbased approach for achieving self fault-tolerance of protocols is presented. Some main characteristics of self fault-tolerance of protocols concerning liveness, nontermination and infinity are also presented. Meanwhile, the sufficient and necessary conditions for achieving self fault-tolerance of protocols are given. Finally, a typical protocol that does not satisfy the self fault-tolerance is investigated, and a new redesign version of this existing protocol using the proposed approach is given.

  5. Design of Parity Preserving Logic Based Fault Tolerant Reversible Arithmetic Logic Unit

    Rakshith Saligram1

    2013-06-01

    Full Text Available Reversible Logic is gaining significant consideration as the potential logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on physical entropy .Fault Tolerant reversible logic is one class of reversible logic that maintain the parity of the input and the outputs. Significant contributions have been made in the literature towards the design of fault tolerant reversible logic gate structures and arithmetic units, however, there are not many efforts directed towards the design of fault tolerant reversible ALUs. Arithmetic Logic Unit (ALU is the prime performing unit in any computing device and it has to be made fault tolerant. In this paper we aim to design one such fault tolerant reversible ALU that is constructed using parity preserving reversible logic gates. The designed ALU can generate up to seven Arithmetic operations and four logical operations

  6. Observer-Based Fault Estimation and Accomodation for Dynamic Systems

    Zhang, Ke; Shi, Peng

    2013-01-01

    Due to the increasing security and reliability demand of actual industrial process control systems, the study on fault diagnosis and fault tolerant control of dynamic systems has received considerable attention. Fault accommodation (FA) is one of effective methods that can be used to enhance system stability and reliability, so it has been widely and in-depth investigated and become a hot topic in recent years. Fault detection is used to monitor whether a fault occurs, which is the first step in FA. On the basis of fault detection, fault estimation (FE) is utilized to determine online the magnitude of the fault, which is a very important step because the additional controller is designed using the fault estimate. Compared with fault detection, the design difficulties of FE would increase a lot, so research on FE and accommodation is very challenging. Although there have been advancements reported on FE and accommodation for dynamic systems, the common methods at the present stage have design difficulties, whi...

  7. GEARSHIFT: Guaranteeing availability requirements in SLAs using hybrid fault tolerance

    Gonzalez, Andres Javier; Helvik, Bjarne Emil; Tiwari, Prakriti; Denis, Becker; Wittner, Otto Jonassen

    2015-01-01

    The dependability of ICT systems is vital for today's society. However, operational systems are not fault free. Providers and customers have to define clear availability requirements and penalties on the delivered services by using SLAs. Fulfilling the stipulated availability may be expensive. The lack of mechanisms that allow a fine control of the SLA risk may lead to over-dimension the provided resources. Therefore, a relevant question for ICT service providers is: How to guarantee the SLA ...

  8. Remedial brushless AC operation of fault-tolerant doubly salient permanent-magnet motor drives

    Zhao, W.; Chau, KT; Cheng, M.; Ji, J.; Zhu, X.

    2010-01-01

    The doubly salient permanent-magnet (DSPM) machine is a new class of stator-PM brushless machines, which inherently offers the fault-tolerant feature. In this paper, a new operation strategy is proposed and implemented for fault-tolerant DSPM motor drives. The key is to operate the DSPM motor drive in a remedial brushless ac (BLAC) mode under the open-circuit fault condition, while operating in the conventional brushless dc mode under normal condition. Both cosimulation and experimental resul...

  9. A TESTING FRAMEWORK FOR FAULT TOLERANT COMPOSITION OF TRANSACTIONAL WEB SERVICES

    Deepali Diwase; Pujashree Vidap

    2012-01-01

    Software testers have great challenges in testing of web services therefore testing technique must be developed for testing of web services. Web service composition is an active research area over last few years. This paper proposes a framework for testing of fault tolerant composition of web services. It will tolerate faults whilecomposition of web services. Exception handling and transaction techniques are used as fault handling mechanisms. After composition web services are deployed on WS-...

  10. Fault-tolerant digital microfluidic biochips compilation and synthesis

    Pop, Paul; Stuart, Elena; Madsen, Jan

    2016-01-01

    This book describes for researchers in the fields of compiler technology, design and test, and electronic design automation the new area of digital microfluidic biochips (DMBs), and thus offers a new application area for their methods.  The authors present a routing-based model of operation execution, along with several associated compilation approaches, which progressively relax the assumption that operations execute inside fixed rectangular modules.  Since operations can experience transient faults during the execution of a bioassay, the authors show how to use both offline (design time) and online (runtime) recovery strategies. The book also presents methods for the synthesis of fault-tolerant application-specific DMB architectures. ·         Presents the current models used for the research on compilation and synthesis techniques of DMBs in a tutorial fashion; ·         Includes a set of “benchmarks”, which are presented in great detail and includes the source code of most of the t...

  11. Second-order sliding mode fault-tolerant control of heat recovery steam generator boiler in combined cycle power plants

    Power generation plants are intrinsically complex systems due to their numerous internal components. Higher energy efficiency in power plants is now achieved through employing combined cycles. In this article, an adaptive robust Sliding Mode Controller (SMC) is designed to overcome the faults in Heat Recovery Steam Generator boilers (HRSG boilers) as one of the main parts of a combined cycle plant. On condition that a fault occurs in the HRSG boiler, the control system must be able to reconfigure its parameters to maintain the admissible thresholds in dynamic variables such as drum pressure, steam temperature, and drum water level. To achieve good performance for the boiler, the proposed adaptive robust SMC shall conquer the effects of faults and uncertainties by estimating their upper bounds adaptively, and force the outputs of the multivariable boiler to track the outputs of a desired multivariable reference model. Manipulating a suitable control input and using second-order sliding mode control strategy, the output tracking error slides to zero on a PID sliding surface. Besides tracking, the controlled boiler tolerates faults in system matrix, faults in input matrix, and external disturbance signal. Numerical simulations confirm the effectiveness of the proposed FTC (Fault-Tolerant Control) system for an uncertain non-minimum phase HRSG boiler. Highlights: ► This paper proposes a PID-based adaptive second-order sliding mode controller (SMC). ► SMC is robust to actuator and sensor faults and tracks outputs of a reference system. ► SMC is used in fault tolerant control of a heat recovery steam generator boilers. ► Boiler and reference system have different number of states and inputs. ► Performance of SMC is investigated with different faults scenarios in simulations.

  12. Fault Diagnosis for Electrical Distribution Systems using Structural Analysis

    Knüppel, Thyge; Blanke, Mogens; Østergaard, Jacob

    Fault-tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the...... structure graph. This paper shows how three-phase networks are modelled and analysed using structural methods, and it extends earlier results by showing how physical faults can be identified such that adequate remedial actions can be taken. The paper illustrates a feasible modelling technique for structural...... analysis of power systems, it demonstrates detection and isolation of failures in a network, and shows how typical faults are diagnosed. Nonlinear fault simulations illustrate the results....

  13. A performance evaluation of the software-implemented fault-tolerance computer

    Palumbo, D. L.; Butler, R. W.

    1986-01-01

    The results of a performance evaluation of the Software-Implemented Fault-Tolerance (SIFT) computer system conducted in the NASA Avionics Integration Research Laboratory are presented. The essential system functions are described and compared to both earlier design proposals and subsequent design improvements. Using SIFT's specimen task load, the executive tasks, such as reconfiguration, clock synchronization, and interactive consistency, are found to consume significant computing resources. Together with other system overhead (e.g., voting and scheduling), the operating system overhead is in excess of 60 percent. The authors propose specific design changes that reduce this overhead burden significantly.

  14. An online controller redesign based fault-tolerant strategy for thermal comfort in a multi-zone building

    Yamé, Joseph Julien; Jain, Tushar; Sauter, Dominique

    2015-01-01

    Variable air volume (VAV) boxes are used in multi-zone buildings, which varies the volume of constant temperature air separately supplied by an air-handling unit to meet challenging thermal load demands of individual zones. A fault appearing in the VAV damper severely affects the control performance and consequently, the energy efficiency of the overall system. To address this issue, we present a novel online redesign based fault-tolerant control strategy in this paper for a VAV thermal syste...

  15. Fault tolerant model predictive control of open channels

    Horváth, Klaudia; Blesa Izquierdo, Joaquim; Duviella, Eric; Chuquet, Karine

    2014-01-01

    Automated control of water systems (irrigation canals, navigation canals, rivers etc.) relies on the measured data. The control action is calculated, in case of feedback controller, directly from the on-line measured data. If the measured data is corrupted, the calculated control action will have a different effect than it is desired. Therefore, it is crucial that the feedback controller receives good quality measurement data. On-line fault detection techniques can be applied in order to dete...

  16. A New Adaptive Kalman Estimator for Detection and Isolation of Multiple Faults Integrated in a Fault Tolerant Control

    H. Jamouli

    2010-01-01

    Full Text Available For sequential jumps detection, isolation, and estimation in discrete-time stochastic linear systems, Willsky and Jones (1976 have developed the Generalized Likelihood Ratio (GLR test. After each detection and isolation of one jump, the treatment of another possible jump is obtained by a direct state estimate and covariance incrementation of the Kalman filter originally designed on the jump-free system. This paper proposes to extend this approach from a state estimator designed on a reference model directly sensitive to system changes. We will show that the obtained passive GLR test can be easily integrated in a Fault Tolerant Control System (FTCS via a control law designed in order to asymptotically reject the effect of sequential jumps.

  17. Comparing fault susceptibility of multiple ISAs and operating systems

    Chyłek, Sławomir

    2015-09-01

    This paper presents a research that aims to compare effects of faults on different configurations of computer systems. The study covers comparison of susceptibility to faults of x86, AMD64, ARM, PowerPC, MIPS architectures and Linux, FreeBSD, Minix operating systems. An emulation based software implemented fault injection technique was used to perform experiments. The problem of choosing an adequate number of tests in experiments is followed by report with collected results where multiple aspects of test runs were analyzed: providing correct computation result, availability of the system under test and error messages. The research allows to determine characteristics of susceptibility to faults of each platform and is a first step towards designing new fault tolerance solutions and assessing their effectiveness.

  18. SIFT - A preliminary evaluation. [Software Implemented Fault Tolerant computer for aircraft control

    Palumbo, D. L.; Butler, R. W.

    1983-01-01

    This paper presents the results of a performance evaluation of the SIFT computer system conducted in the NASA AIRLAB facility. The essential system functions are described and compared to both earlier design proposals and subsequent design improvements. The functions supporting fault tolerance are found to consume significant computing resources. With SIFT's specimen task load, scheduled at a 30-Hz rate, the executive tasks such as reconfiguration, clock synchronization and interactive consistency, require 55 percent of the available task slots. Other system overhead (e.g., voting and scheduling) use an average of 50 percent of each remaining task slot.

  19. AVR microcontroller simulator for software implemented hardware fault tolerance algorithms research

    Piotrowski, Adam; Tarnowski, Szymon; Napieralski, Andrzej

    2008-01-01

    Reliability of new, advanced electronic systems becomes a serious problem especially in places like accelerators and synchrotrons, where sophisticated digital devices operate closely to radiation sources. One of the possible solutions to harden the microprocessor-based system is a strict programming approach known as the Software Implemented Hardware Fault Tolerance. Unfortunately, in real environments it is not possible to perform precise and accurate tests of the new algorithms due to hardware limitation. This paper highlights the AVR-family microcontroller simulator project equipped with an appropriate monitoring and the SEU injection systems.

  20. Transient Faults in Computer Systems

    Masson, Gerald M.

    1993-01-01

    A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed.

  1. LQCD workflow execution framework: Models, provenance and fault-tolerance

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompass workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automatons that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload, and increases cluster-productivity.

  2. LQCD workflow execution framework: Models, provenance and fault-tolerance

    Piccoli, Luciano; Dubey, Abhishek; Simone, James N.; Kowalkowlski, James B.

    2010-04-01

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompass workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automatons that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload, and increases cluster-productivity.

  3. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.

  4. Design and Analysis of Software fault-Tolerant techniques for Softcore processors in reliable SRAM based FPGA

    Vatsya Tiwari

    2011-11-01

    Full Text Available This paper discusses high level techniques for designing fault tolerant systems in SRAM-based FPGAs, without modification in the FPGA architecture. Triple Modular Redundancy (TMR has been successfully applied in FPGAs to mitigate transient faults, which are likely to occur in space applications. However, TMR comes with high area and power dissipation penalties. The new technique proposed in this paper was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments in an emulation board. We present some fault coverage results and a comparison with the TMR approach

  5. Dual-quaternion based fault-tolerant control for spacecraft formation flying with finite-time convergence.

    Dong, Hongyang; Hu, Qinglei; Ma, Guangfu

    2016-03-01

    Study results of developing control system for spacecraft formation proximity operations between a target and a chaser are presented. In particular, a coupled model using dual quaternion is employed to describe the proximity problem of spacecraft formation, and a nonlinear adaptive fault-tolerant feedback control law is developed to enable the chaser spacecraft to track the position and attitude of the target even though its actuator occurs fault. Multiple-task capability of the proposed control system is further demonstrated in the presence of disturbances and parametric uncertainties as well. In addition, the practical finite-time stability feature of the closed-loop system is guaranteed theoretically under the designed control law. Numerical simulation of the proposed method is presented to demonstrate the advantages with respect to interference suppression, fast tracking, fault tolerant and practical finite-time stability. PMID:26775087

  6. Fault-tolerant quantum blind signature protocols against collective noise

    Zhang, Ming-Hui; Li, Hui-Fang

    2016-07-01

    This work proposes two fault-tolerant quantum blind signature protocols based on the entanglement swapping of logical Bell states, which are robust against two kinds of collective noises: the collective-dephasing noise and the collective-rotation noise, respectively. Both of the quantum blind signature protocols are constructed from four-qubit decoherence-free (DF) states, i.e., logical Bell qubits. The initial message is encoded on the logical Bell qubits with logical unitary operations, which will not destroy the anti-noise trait of the logical Bell qubits. Based on the fundamental property of quantum entanglement swapping, the receiver simply performs two Bell-state measurements (rather than four-qubit joint measurements) on the logical Bell qubits to verify the signature, which makes the protocols more convenient in a practical application. Different from the existing quantum signature protocols, our protocols can offer the high fidelity of quantum communication with the employment of logical qubits. Moreover, we hereinafter prove the security of the protocols against some individual eavesdropping attacks, and we show that our protocols have the characteristics of unforgeability, undeniability and blindness.

  7. Probabilistic analysis on fault tolerance of 3-Dimensional mesh networks

    王高才; 陈建二; 王国军; 陈松乔

    2003-01-01

    The probability model is used to analyze the fault tolerance of mesh. To simplify its analysis, it is as-sumed that the failure probability of each node is independent. A 3-D mesh is partitioned into smaller submeshes,and then the probability with which each submesh satisfies the defined condition is computed. If each submesh satis-fies the condition, then the whole mesh is connected. Consequently, the probability that a 3-D mesh is connected iscomputed assuming each node has a failure probability. Mathematical methods are used to derive a relationship be-tween network node failure probability and network connectivity probability. The calculated results show that the 3-D mesh networks can remain connected with very high probability in practice. It is formally proved that when thenetwork node failure probability is boutded by 0.45 %, the 3-D mesh networks of more than three hundred thousandnodes remain connected with probability larger than 99 %. The theoretical results show that the method is a power-ful technique to calculate the lower bound of the connectivity probability of mesh networks.

  8. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    Parker, L.E.

    1995-02-01

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot`s own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

  9. Fault-tolerant authenticated quantum dialogue using logical Bell states

    Ye, Tian-Yu

    2015-09-01

    Two fault-tolerant authenticated quantum dialogue protocols are proposed in this paper by employing logical Bell states as the quantum resource, which combat the collective-dephasing noise and the collective-rotation noise, respectively. The two proposed protocols each can accomplish the mutual identity authentication and the dialogue between two participants simultaneously and securely over one kind of collective noise channels. In each of two proposed protocols, the information transmitted through the classical channel is assumed to be eavesdroppable and modifiable. The key for choosing the measurement bases of sample logical qubits is pre-shared privately between two participants. The Bell state measurements rather than the four-qubit joint measurements are adopted for decoding. The two participants share the initial states of message logical Bell states with resort to the direct transmission of auxiliary logical Bell states so that the information leakage problem is avoided. The impersonation attack, the man-in-the-middle attack, the modification attack and the Trojan horse attacks from Eve all are detectable.

  10. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup

  11. Fault Tolerance Approach in Mobile Agents for Information Retrieval Applications Using Check Points

    Rahul Hans

    2012-06-01

    Full Text Available Mobile agents have emerged as major programming paradigm for distributed applications. Mobile agents are the intelligent programs that act autonomously on behalf of a user and can migrate from one host to another host in a network in order to satisfy the requests made by their clients. A prerequisite for their use, however, is that they should be executed reliably independent of failures. Improving the survivability of mobile agents in presence of agent server failures is an important issue in order to guarantee continuous execution of mobile agents. Thus it is very important to make mobile agents fault tolerant. In this paper, we propose fault tolerance mechanism for the scenarios where the agent stops its execution due to fault on any server in the itinerary. Our approach makes use of check pointing, partial results or data retrieved and the address of last host visited is saved prior before the agent visits the next host in the itinerary .The proposed mechanism has been implemented on the Aglets mobile agent system and evaluated in terms of parameters such as round trip time, Reliable migration time, Check point time. The results show the improvement in reliability and performance, especially for mobile agents in Internet application.

  12. Fault Tolerant Architecture For A Fly-By-Light Flight Control Computer

    Thompson, Kevin; Stipanovich, John; Smith, Brian; Reddy, Mahesh C.

    1990-02-01

    The next generation of flight control computers will utilize fiber optic technology to produce a fly-by-light flight control system. Optical transducers and optical fibers will take the place of electrical position transducers and wires, torsion bars, bell cranks, and cables. Applications for this fly-by-light technology include space launch vehicles, upperstages, space-craft, and commercial/military aircraft. Optical fibers are lighter than mechanical transmission media and unlike conven-tional wire transmissions are not susceptible to electromagnetic interference (EMI) and high energy emission sources. This paper will give an overview of a fault tolerant In-Line Monitored optical flight control system being developed at Boeing Aerospace & Electronics in Seattle, Washington. This system uses passive transducers with fiber optic interconnections which hold promises to virtually eliminate EMI threats to flight control system performance and flight safety and also provide significant weight savings. The main emphasis of this paper will be the In-Line Monitored architecture of the optical transducer system required for use in a fault tolerant flight control system.

  13. Fault Diagnosis and Fault-Tolerant Control of Wind Turbines via a Discrete Time Controller with a Disturbance Compensator

    Yolanda Vidal

    2015-05-01

    Full Text Available This paper develops a fault diagnosis (FD and fault-tolerant control (FTC of pitch actuators in wind turbines. This is accomplished by combining a disturbance compensator with a controller, both of which are formulated in the discrete time domain. The disturbance compensator has a dual purpose: to estimate the actuator fault (which is used by the FD algorithm and to design the discrete time controller to obtain an FTC. That is, the pitch actuator faults are estimated, and then, the pitch control laws are appropriately modified to achieve an FTC with a comparable behavior to the fault-free case. The performance of the FD and FTC schemes is tested in simulations with the aero-elastic code FAST.

  14. Universal Fault Tolerant Quantum Computation on a Class of Decoherence-Free Subspaces Without Spatial Symmetry

    Lidar, D A; Kempe, J; Whaley, K B; Lidar, Daniel A.; Bacon, David; Kempe, Julia

    2001-01-01

    Decoherence-free subspaces (DFSs) are constructed without the assumption of spatially symmetric system-bath coupling. Instead the underlying assumption is that subgroups of the full Pauli group of errors are responsible for the decoherence. The corresponding decoherence-free states can protect quantum information in the presence of multiple-qubit errors, and are stabilizer codes. It is shown how to perform universal fault tolerant quantum computation on this class of DFSs. This is the first demonstration that it is possible to use only one- and two-body quantum gates to perform full-blown quantum computation on a class of DFSs, with a finite number of measurements.

  15. A novel fault tolerant permanent magnet synchronous motor with improved optimal torque control for aerospace application

    Guo Hong; Xu Jinquan; Kuang Xiaolin

    2015-01-01

    Improving fault tolerant performance of permanent magnet synchronous motor has always been the central issue of the electrically supplied actuator for aerospace application. In this paper, a novel fault tolerant permanent magnet synchronous motor is proposed, which is characterized by two stators and two rotors on the same shaft with a circumferential displacement of mechanical angle of 4.5°. It helps to reduce the cogging torque. Each segment of the stator and the rotor can be considered as ...

  16. A Byzantine resilient fault tolerant computer for nuclear power plant applications

    A quadruply redundant synchronous fault tolerant processor, capable of tolerating Byzantine faults, is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. This paper describes the hardware architecture of this processor and discusses certain issues unique to quadruply redundant computers

  17. Minimum Fault-Tolerant, local and strong metric dimension of graphs

    Salman, Muhammad; Javaid, Imran; Chaudhry, Muhammad Anwar

    2014-01-01

    In this paper, we consider three similar optimization problems: the fault-tolerant metric dimension problem, the local metric dimension problem and the strong metric dimension problem. These problems have applications in many diverse areas, including network discovery and verification, robot navigation and chemistry, etc. We give integer linear programming formulations of the fault-tolerant metric dimension problem and the local metric dimension problem. Also, we study local metric dimension ...

  18. Assessing server fault tolerance and disaster recovery implementation in thin client architectures

    Slaydon, Samuel L.

    2007-01-01

    This thesis will focus on assessing server fault tolerance and disaster recovery procedures for thin-clients being implemented in smart classrooms and computer laboratories aboard the Naval Postgraduate School campus. The successful discovery of fault tolerance limits and a disaster recovery plan not only benefits the Naval Postgraduate School (NPS), but also provides the same for other commands that have implemented or plan to employ thin clients as part of their Information Technology ...

  19. Distributed Fault-Tolerant Event Region Detection of Wireless Sensor Networks

    Dyi-Rong Duh; Ssu-Pei Li; Victor W. Cheng

    2013-01-01

    This work provides a distributed fault-tolerant event region detection algorithm for wireless sensor networks. The proposed algorithm can identify faulty and fault-free sensors and ignore the abnormal readings to avoid false alarm. Moreover, every event region can also be detected and identified. Simulation results show that fault detection accuracy (FDA) is greater than 92%, false alarm rate (FAR) is near 0%, and event detection accuracy (EDA) is greater than 99% under uniform distribution. ...

  20. Verification of a Byzantine-Fault-Tolerant Self-stabilizing Protocol for Clock Synchronization

    Malekpour, Mahyar R.

    2008-01-01

    This paper presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system except for the presence of sufficient good nodes, thus making the weakest possible assumptions and producing the strongest results. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV). The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space.