WorldWideScience
 
 
1

Fault tolerant computing systems  

CERN Document Server

Fault tolerance involves the provision of strategies for error detection, damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (15 refs).

Randell, B

1981-01-01

2

Fault tolerant computing systems  

International Nuclear Information System (INIS)

Fault tolerance involves the provision of strategies for error detection, damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (orig.)

1981-03-12

3

Fault-Tolerant UAV Flight Control System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main focus of this master's thesis is fault-tolerant control systems (FTCSs) for unmanned aerial vehicles (UAVs). The goals are to develop an automatic-flight control system (AFCS) with fault detection and isolation (FDI) and a reconfiguration mechanism for accommodation of faults. The literature study reviews methods for fault-tolerant control and also discusses important faults and failures related to UAVs. The FTCS is implemented in MATLAB Simulink with a nonlinear model of the Ces...

Dybsjord, Kerrin Andre

2013-01-01

4

Fault Tolerance in Real Time Distributed System  

Directory of Open Access Journals (Sweden)

In this paper we investigate the different fault tolerance techniques used in many real-time distributed systems. The main focus is on the types of fault occurring in the system, the fault detection techniques, and the recovery techniques used. A fault caused by link failure, resource failure, or any other reason must be tolerated for the system to work smoothly and accurately. These faults can be detected and recovered from by many techniques, applied accordingly. An appropriate fault detector can avoid loss due to a system crash, and a reliable fault tolerance technique can save the system from failure. This paper shows how these methods are applied to detect and tolerate faults in various real-time distributed systems.

Arvind Kumar

2011-02-01

5

Software fault tolerance in computer operating systems  

Science.gov (United States)

This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

Iyer, Ravishankar K.; Lee, Inhwan

1994-01-01

6

Energy-efficient fault-tolerant systems  

CERN Document Server

This book describes the state-of-the-art in energy efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing and includes discussion of both circuit and system-level approaches. Readers will be enabled to meet the conflicting design objectives of energy efficiency and fault-tolerance for reliability, given the up-to-date techniques presented.

Mathew, Jimson; Pradhan, Dhiraj K

2013-01-01

7

Fault tolerant control of systems with saturations  

DEFF Research Database (Denmark)

This paper presents a framework for fault tolerant controllers (FTC) that includes input saturation. The controller architecture known from FTC, which is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization, is extended to handle input saturation. Applying this controller architecture to faulty systems with input saturation gives an additional YJBK transfer function related to the input saturation. In the fault-free case, this additional YJBK transfer function can be applied directly for optimizing the feedback loop around the input saturation. In the faulty case, the design problem is a mixed design problem involving both parametric faults and input saturation.

Niemann, Hans Henrik

2013-01-01

8

Software engineering of fault tolerant systems  

CERN Document Server

In architecting dependable systems, what is required to improve the overall system robustness is fault tolerance. Many methods have been proposed to this end, but the solutions are usually considered late, during the design and implementation phases of the software life-cycle (e.g., Java and Windows NT exception handling), thus reducing the effectiveness of error and fault handling. Since the system design typically models only the normal behaviour of the system while ignoring exceptional behaviour, the implementation of the system is unable to handle abnormal events. Consequently, the system may fail in unexp

Pelliccione, P; Muccini, Henry

2007-01-01

9

The Low Latency Fault Tolerance System  

CERN Document Server

The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique. The LLFT system provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. The LLFT system comprises a Low Latency Messaging Protocol, a Leader-Determined Membership Protocol, and a Virtual Determinizer Framework. The Low Latency Messaging Protocol provides reliable, totally ordered message delivery by employing a direct group-to-group multicast, where the ordering is determined by the primary replica in the group. The Leader-Determined Membership Protocol provides reconfiguration and recovery when a replica becomes faulty and when a replica joins or leaves a group, where the membership is determined by the primary replica. The Virtual Determinizer Framework captures the ordering information at the primary replica in the group and enforces the same ordering at the backup...

Zhao, Wenbing; Moser, L E

2010-01-01
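
The leader-determined ordering described in this abstract can be illustrated with a minimal sketch. It is not the LLFT protocol itself: the class names `Primary` and `Backup` and the use of plain integer sequence numbers are assumptions made for illustration. The primary alone assigns the order, and a backup delivers messages strictly in that order, buffering any that arrive early.

```python
class Primary:
    """Hypothetical primary replica: the only party that decides message order."""

    def __init__(self):
        self.next_seq = 0

    def order(self, message):
        # Stamp the message with the next sequence number.
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Backup:
    """Hypothetical backup replica: enforces the order chosen by the primary."""

    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.pending = {}      # messages that arrived ahead of their turn
        self.delivered = []    # messages delivered so far, in order

    def receive(self, seq, message):
        self.pending[seq] = message
        # Deliver every message whose turn has come.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


primary = Primary()
stamped = [primary.order(m) for m in ("a", "b", "c")]

backup = Backup()
# The network may reorder messages; the backup still delivers a, b, c.
for seq, message in (stamped[2], stamped[0], stamped[1]):
    backup.receive(seq, message)
```

Because every replica applies the same sequence, replicas that start from the same state stay consistent, which is the property the abstract calls strong replica consistency.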

10

Interface For Fault-Tolerant Control System  

Science.gov (United States)

Interface unit and controller emulator developed for research on electronic helicopter-flight-control systems equipped with artificial intelligence. Interface unit interrupt-driven system designed to link microprocessor-based, quadruply-redundant, asynchronous, ultra-reliable, fault-tolerant control system (controller) with electronic servocontrol unit that controls set of hydraulic actuators. Receives digital feedforward messages from, and transmits digital feedback messages to, controller through differential signal lines or fiber-optic cables (thus far only differential signal lines have been used). Analog signals transmitted to and from servocontrol unit via coaxial cables.

Shaver, Charles; Williamson, Michael

1989-01-01

11

A Fault-tolerant Development Methodology for Industrial Control Systems  

DEFF Research Database (Denmark)

Developing advanced detection schemes is not the only factor in obtaining successful fault diagnosis performance. Achieving significant results in applying fault tolerance in industrial development requires that fault diagnosis and recovery schemes be developed in a consistent and logically sound manner. This paper presents the employed fault-tolerant development methodology and highlights the steps that have been essential for achieving complete and consistent monitoring capabilities. Fault diagnosis for a commercial refrigeration system is treated as a case study.

Izadi-Zamanabadi, Roozbeh; Thybo, C.

2004-01-01

12

Fault-tolerant Actuator System for Electrical Steering of Vehicles  

DEFF Research Database (Denmark)

Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

Sørensen, Jesper Sandberg; Blanke, Mogens

2006-01-01

13

Fault-tolerant actuator system for electrical steering of vehicles  

DEFF Research Database (Denmark)

Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

Thomsen, Jesper Sandberg; Blanke, Mogens

2006-01-01

14

Fault tolerant hypercube computer system architecture  

Energy Technology Data Exchange (ETDEWEB)

This patent describes a fault-tolerant multi-processor computer system of the hypercube type. It comprises: a plurality of first computing nodes; a first network of message conducting path means for interconnecting the first computing nodes as a hypercube. The first network providing a path for message transfer between the first computing nodes; a first watch dog node; and, a second network of message conducting path means for directly connecting each of the first computing nodes to the first watch dog node independent from the first network. The second network providing an independent path for test message and reconfiguration affecting transfers between respective ones of the first computing nodes and the first watch dog node.

Madan, H.S.; Chow, E.

1989-09-19
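
The two-network arrangement claimed in this patent abstract can be pictured with a small sketch. The function names, the integer node labels, and the watchdog label "W" are assumptions for illustration only. In a hypercube of dimension d, each node is linked to the d nodes whose binary label differs from its own in exactly one bit; the second network is modelled here as a direct link from every computing node to the watchdog.

```python
def hypercube_neighbors(node, dimension):
    """Nodes whose label differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimension)]


def build_networks(dimension, watchdog="W"):
    """Return (first_network, second_network) as adjacency maps.

    first_network:  hypercube links between computing nodes.
    second_network: independent direct link from each node to the watchdog
                    (hypothetical label), used for test and reconfiguration traffic.
    """
    nodes = range(2 ** dimension)
    first_network = {n: hypercube_neighbors(n, dimension) for n in nodes}
    second_network = {n: watchdog for n in nodes}
    return first_network, second_network


first_network, second_network = build_networks(3)
```

Keeping the watchdog links out of the hypercube means a failed hypercube link or node cannot block the test and reconfiguration messages addressed to the other nodes.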

15

PAV: Parallel Average Voting Algorithm for Fault-Tolerant Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerant systems are systems that can continue operating even in the presence of faults. Redundancy, one of the main techniques for implementing fault-tolerant control systems, uses voting algorithms to choose the most appropriate value among multiple redundant and possibly faulty results. The average (mean) voter is one of the most common voting methods and is suitable for decision making in highly available, long-mission applications in which the availability and speed of...

2011-01-01
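
A plain average (mean) voter of the kind this abstract refers to can be sketched in a few lines. This is the basic sequential voter, not the parallel PAV algorithm, whose details are cut off in the abstract; the function name and the sample readings are assumptions for illustration.

```python
from statistics import mean


def average_vote(readings):
    """Return the arithmetic mean of the outputs of redundant modules."""
    if not readings:
        raise ValueError("at least one reading is required")
    return mean(readings)


# Three redundant channels, one of them drifting high.
voted = average_vote([10.0, 10.2, 13.0])
```

An average voter always produces an output, which suits high-availability use, but a single faulty channel pulls the result away from the correct value, as the drifting third reading does here.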

16

Intelligent System for Parallel Fault-Tolerant Diagnostic Tests Construction  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This investigation deals with an intelligent system for the parallel construction of fault-tolerant diagnostic tests. A modified parallel algorithm for fault-tolerant diagnostic test construction is proposed. The algorithm makes it possible to optimize the processing time of test construction. A matrix model ...

Anna Yankovskaya; Sergei Kitler

2013-01-01

17

Fault-Tolerant Onboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been established. The basic idea is to convert the given system into a fault-tolerant system and to improve multi-sensor data fusion for the particular system. The background of the project is the SeaSense system, which has been installed on several container ships and navy vessels. The SeaSense system provides a crude and simple estimation of the actual sea state (Hs and Tz), information about the longitudinal hull girder loading, seakeeping performance of the ship, and decision support on how to operate the ship within acceptable limits. The system is able to identify critical forthcoming events and to give advice regarding speed and course changes to decrease the wave-induced loads. The SeaSense system is based on the combined use of a mathematical model and measurements from a set of sensors. The overall dependability of a shipboard monitoring and decision support system such as the SeaSense system can be improved using fault-tolerant techniques (Fault Diagnosis and System Re-design) and a Sensor Fusion Quality (SFQ) test. Fault diagnosis means detecting the presence of faults in the system. When sea state estimation is conducted by a ship-wave buoy analogy, the best solution is achieved when a set of three different ship responses is used. Faulty signals should be discarded from the procedure for sea state estimation if possible; if not, the fault should be estimated. Fault diagnosis can be divided into three steps: fault detection, fault isolation and fault estimation. Fault detection means deciding whether or not a fault has occurred. This step determines the time at which the system is subjected to the given fault. Fault isolation finds the component in which a fault has occurred. This step determines the location of the fault. Fault estimation provides an estimate of the magnitude of a fault. A supervisory function determines the severity of the fault once its origin has been isolated and its magnitude estimated. Fault-tolerant Sensor Fusion means that the monitoring and decision support system can accommodate faults so that the overall system continues to satisfy its goal, while in the absence of a fault the system should be able to provide the most accurate information using the SFQ test.

Lajic, Zoran

2010-01-01
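
The three diagnosis steps named in this abstract (detection, isolation, estimation) can be sketched on three redundant estimates of one quantity. Everything specific here is an assumption for illustration: the function name, the choice of the median as reference, the fixed threshold, and the sample wave-height values are not taken from the SeaSense system.

```python
from statistics import median


def diagnose(estimates, threshold):
    """Detect, isolate and estimate a fault among redundant estimates.

    estimates: mapping from signal name to its estimate of the same quantity.
    threshold: largest deviation from the median treated as fault-free.
    """
    reference = median(estimates.values())
    residuals = {name: value - reference for name, value in estimates.items()}
    # Isolation candidate: the signal that deviates most from the reference.
    worst = max(residuals, key=lambda name: abs(residuals[name]))
    # Detection: has any signal left the fault-free band?
    if abs(residuals[worst]) <= threshold:
        return {"detected": False, "isolated": None, "magnitude": 0.0}
    # Estimation: the size of the deviation is the fault magnitude.
    return {"detected": True, "isolated": worst, "magnitude": residuals[worst]}


# Hypothetical significant-wave-height estimates (metres) from three ship responses.
result = diagnose({"heave": 3.0, "pitch": 3.1, "roll": 5.1}, threshold=0.5)
healthy = diagnose({"heave": 3.0, "pitch": 3.1, "roll": 3.2}, threshold=0.5)
```

With three signals the median is unaffected by a single outlier, which is one reason a set of three responses is convenient: the isolated signal can be discarded and the remaining two still agree.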

18

Active fault tolerant control design for switched hybrid systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, an active Fault Tolerant Control (FTC) strategy is developed for Switched Hybrid Systems. The main contribution concerns the design of a linear Output Feedback dedicated to Switched Hybrid Systems. Based on an available Fault Detection and Isolation (FDI) scheme, the controller redesign is performed on-line through LMI both in fault-free and faulty cases in order to preserve the system closed-loop stability despite actuator failures. The effectiveness and performances of the pro...

Rodrigues, Mickael; Theilliol, Didier; Sauter, Dominique

2006-01-01

19

Fault tolerant oxygen control of a diesel engine air system  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper is devoted to the fault tolerant control problem of a Diesel engine air system having a jammed Exhaust Gas Recirculation (EGR) valve. The fault tolerant control is based on replanning the trajectory in order to track a new controlled variable, the oxygen concentration in the intake manifold, instead of the fresh air mass flow. The trajectory planning is based on an inverse model approach, utilizing the fundamental thermodynamic relations of the air system.

Nitsche, Rainer; Bitzer, Matthias; El Khaldi, Mahmoud; Bloch, Gérard

2010-01-01

20

Fault tolerant tracking control for continuous Takagi-Sugeno systems with time varying faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper deals with Fault Tolerant Control design for continuous nonlinear Takagi-Sugeno faulty systems. The goal is to ensure both state and fault estimation and the state reference tracking even if faults occur. In this study, the faults affecting the system behavior are considered as time varying functions modeled by exponential functions or first order polynomials. Based on descriptor redundancy property, solutions are proposed for both cases, exponential and polynomial faults, in ter...

Bouarar, Tahar; Marx, Benoît; Maquin, Didier; Ragot, José

2011-01-01

 
 
 
 
21

H infinity Integrated Fault Estimation and Fault Tolerant Control of Discrete-time Piecewise Linear Systems  

DEFF Research Database (Denmark)

In this paper we consider the problem of fault estimation and accommodation for discrete time piecewise linear systems. A robust fault estimator is designed to estimate the fault such that the estimation error converges to zero and H∞ performance of the fault estimation is minimized. Then, the estimate of fault is used to compensate for the effect of the fault. Hence, using the estimate of fault, a fault tolerant controller using a piecewise linear static output feedback is designed such that it stabilizes the system and provides an upper bound on the H∞ performance of the faulty system. Sufficient conditions for the existence of robust fault estimator and fault tolerant controller are derived in terms of linear matrix inequalities. Upper bounds on the H∞ performance can be minimized by solving convex optimization problems with linear matrix inequality constraints. The efficiency of the method is demonstrated by means of a numerical example.

Tabatabaeipour, Seyed Mojtaba; Bak, Thomas

2012-01-01

22

Safety Reliability Enhancement in Fault tolerant Automotive Embedded System  

Directory of Open Access Journals (Sweden)

Reliability is the control and prevention of failures to reduce failure and improve operations by enhancing performance; system-level analysis and modelling are needed not only for predictability and comparability when partitioning end-to-end functions at design time levels of reliability. Reliability numbers by themselves will not motivate improvements; the paper considers the performance of two fault tolerant mechanisms dealing with repairable and non-repairable components that have failed. The improvement in the reliability and safety of a system with repairable components, with respect to the fault tolerant systems under study, corresponds to a flexible arrangement of fault tolerant units (FTUs). SFAS (Safety Fault tolerant Automotive Systems) and ECU are compared to achieve effective results. Reliability principles are discussed which assist system improvement for reducing the high unreliability. CAN controllers are used in automotive applications for fault tolerant embedded systems. The existing reliability enhancement models emphasize various redundancy techniques in both hardware and software without focusing on a formal way of minimizing the recovery time from the affected or degraded states in automotive systems.

Balachandra Pattanaik

2013-01-01

23

Passive Fault-tolerant Control of Discrete-time Piecewise Affine Systems against Actuator Faults  

DEFF Research Database (Denmark)

In this paper, we propose a new method for passive fault-tolerant control of discrete time piecewise affine systems. Actuator faults are considered. A reliable piecewise linear quadratic regulator (LQR) state feedback is designed such that it can tolerate actuator faults. A sufficient condition for the existence of a passive fault-tolerant controller is derived and formulated as the feasibility of a set of linear matrix inequalities (LMIs). The upper bound on the performance cost can be minimized using a convex optimization problem with LMI constraints which can be solved efficiently. The approach is illustrated on a numerical example and a two degree of freedom helicopter.

Tabatabaeipour, Seyed Mojtaba; Izadi-Zamanabadi, Roozbeh

2012-01-01

24

Passive fault-tolerant control of discrete time piecewise affine systems against actuator faults  

DEFF Research Database (Denmark)

In this article, we propose a new method for passive fault-tolerant control of discrete time piecewise affine systems. Actuator faults are considered. A reliable piecewise linear quadratic regulator state feedback is designed such that it can tolerate actuator faults. A sufficient condition for the existence of a passive fault-tolerant controller is derived and formulated as the feasibility of a set of linear matrix inequalities (LMIs). The upper bound on the performance cost can be minimised using a convex optimisation problem with LMI constraints which can be solved efficiently. The approach is illustrated on a numerical example and a two degree of freedom helicopter. © 2012 Taylor & Francis Group, LLC.

Tabatabaeipour, Mojtaba; Izadi-Zamanabadi, Roozbeh

2012-01-01

25

Fault tolerant digital control systems for boiling water reactors  

International Nuclear Information System (INIS)

In a Boiling Water Reactor nuclear power plant, the power generation control function is divided into several systems, each system controlling only a part of the total plant. Presently, each system is controlled by conventional analog or digital logic circuits with little interaction for coordinated control. The advent of microprocessors has allowed the development of distributed fault-tolerant digital controls. The objective is to replace these conventional controls with fault-tolerant digital controls connected together with digital communication links to form a fully integrated nuclear power plant control system.

1986-09-01

26

SIFT - Design and analysis of a fault-tolerant computer for aircraft control. [Software Implemented Fault Tolerant systems]  

Science.gov (United States)

SIFT (Software Implemented Fault Tolerance) is an ultrareliable computer for critical aircraft control applications that achieves fault tolerance by the replication of tasks among processing units. The main processing units are off-the-shelf minicomputers, with standard microcomputers serving as the interface to the I/O system. Fault isolation is achieved by using a specially designed redundant bus system to interconnect the processing units. Error detection and analysis and system reconfiguration are performed by software. Iterative tasks are redundantly executed, and the results of each iteration are voted upon before being used. Thus, any single failure in a processing unit or bus can be tolerated with triplication of tasks, and subsequent failures can be tolerated after reconfiguration. Independent execution by separate processors means that the processors need only be loosely synchronized, and a novel fault-tolerant synchronization method is described.

Wensley, J. H.; Lamport, L.; Goldberg, J.; Green, M. W.; Levitt, K. N.; Melliar-Smith, P. M.; Shostak, R. E.; Weinstock, C. B.

1978-01-01
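
The voting step described in this abstract, where results of redundantly executed tasks are voted on before use, can be sketched as a strict-majority voter. This is a generic illustration rather than the SIFT software: the function name, the unit labels, and the idea of returning the disagreeing units as candidates for reconfiguration are assumptions.

```python
from collections import Counter


def vote(results):
    """Strict-majority vote over the results of replicated task executions.

    results: mapping from processing-unit name to the value it computed.
    Returns (value, suspects). `value` is None when no strict majority exists;
    `suspects` lists the units that did not report the majority value.
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results.values()).most_common(1)[0]
    if count <= len(results) // 2:
        # No strict majority: every unit is suspect.
        return None, sorted(results)
    suspects = sorted(unit for unit, r in results.items() if r != value)
    return value, suspects


# Triplicated task with one faulty processing unit.
value, suspects = vote({"A": 5, "B": 5, "C": 9})
```

With three replicas a single wrong result is outvoted, which matches the abstract's statement that any single failure can be tolerated with triplication; the suspect list is what a reconfiguration step would act on.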

27

OPTIMAL CHOICE WITHIN A FAULT TOLERANT FLIGHT CONTROL SYSTEM  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Aircraft safety during flight is one of the most important problems concerning all of aviation. Failures or faults in the main elements of the automatic control system and damage to the external contour of the aircraft by foreign objects always lead to a change in the characteristics of the aircraft, to direct and indirect economic costs, and sometimes to injury or death of passengers and crew. A real-time active fault tolerant control system makes it possible to warn of or prevent emergency situations a...

2013-01-01

28

Data-driven design of fault diagnosis and fault-tolerant control systems  

CERN Multimedia

Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

Ding, Steven X

2014-01-01

29

Design of fault tolerant control system for steam generator using  

Energy Technology Data Exchange (ETDEWEB)

A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks controller-induced and sensor-induced performance to identify which part, a controller or a sensor, is faulty. To analyze controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. The fuzzy weighting modulator generates an output signal compensated for the faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effect of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases further. 2 refs., 9 figs., 1 tab. (Author)

Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

1998-12-31

30

Parallel language for programming dynamic fault tolerant computer systems  

Energy Technology Data Exchange (ETDEWEB)

Fault-tolerant computers usually involve parallel architectures where the computation of a particular task is duplicated and a consensus result is taken. More recently it has been realized that not all tasks in a schedule require the full fault tolerance provided by the parallel redundancy, and as a consequence architectures have been developed that dynamically reconfigure themselves to improve the throughput of less sensitive tasks by utilizing the parallelism. A new language is presented for programming this type of system. It has properties similar to those of OCCAM and Pascal-M and is suitable for real-time use. 27 references.

Shafibegly, A.; Gillies, D.

1983-01-01

31

Fault Tolerance Middleware for a Multi-Core System  

Science.gov (United States)

Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. 
This provides additional information to the introspection module that it can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.

Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

2012-01-01
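
The introspection loop described in this abstract (error notification, error object with a severity, recommended response, feedback from the responder) can be sketched as follows. None of the names come from the FTM code: the error kinds, the severity levels, the knowledge-base table, and the response ladder are all hypothetical, and node swapping is left out to keep the sketch short.

```python
from dataclasses import dataclass

# Hypothetical response ladder, weakest to strongest.
LADDER = ["ignore", "log", "rollback", "restart"]
# Hypothetical knowledge base: severity level -> first recommended response.
KNOWLEDGE_BASE = {1: "log", 2: "rollback", 3: "restart"}
# Hypothetical error mapper: error kind -> severity level.
SEVERITY = {"transient": 1, "corrupt_state": 2, "node_dead": 3}


@dataclass
class ErrorReport:
    node: int
    kind: str
    severity: int


class Introspection:
    def __init__(self):
        self.last_response = None  # filled in by responder feedback

    def create_error(self, node, kind):
        """Error factory: unknown kinds get the highest severity."""
        return ErrorReport(node, kind, SEVERITY.get(kind, 3))

    def recommend(self, error):
        """Recommend a response, escalating if the last one did not help."""
        response = KNOWLEDGE_BASE[error.severity]
        if self.last_response is not None:
            tried = LADDER.index(self.last_response)
            if tried >= LADDER.index(response):
                # Errors persist after an equal or stronger response: go one step up.
                response = LADDER[min(tried + 1, len(LADDER) - 1)]
        return response

    def notify(self, response):
        """Responder feedback: remember which response was invoked."""
        self.last_response = response


introspection = Introspection()
error = introspection.create_error(node=2, kind="corrupt_state")
first = introspection.recommend(error)    # rollback
introspection.notify(first)
second = introspection.recommend(error)   # errors persist, so restart
```

The second call reproduces the abstract's example of moving from a rollback to a restart when errors continue. A fuller version would presumably clear the feedback after a quiet period so that old faults do not escalate new ones.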

32

Development and application of diagnostic systems to achieve fault tolerance  

International Nuclear Information System (INIS)

Much work is currently being done to develop and apply diagnostic systems that are tolerant to faulted conditions in the process being monitored and in the sensors that measure the critical parameters associated with the process. A fault-tolerant diagnostic system based on state-determination, pattern-recognition techniques is currently undergoing testing and evaluation in certain applications at the EBR-II reactor. Testing and operational experience with the system to date has shown a high degree of tolerance to sensor failures, while being sensitive to very slight changes in the plant operational state. This paper briefly mentions related work being done by others, and describes in more detail the pattern-recognition system and the results of the testing and operational experience with the system at EBR-II. 9 refs., 10 figs

1989-05-15

33

Fault tolerant multimicroprocessor operating system modeled with locality Petri Nets  

Energy Technology Data Exchange (ETDEWEB)

A symmetric multitasking operating system for a fault tolerant multimicroprocessor is outlined. It is intended for applications with strict real-time constraints and high safety requirements. A petri-net based modeling method suitable for representing systems with multiple operation localities is described. The proposed formalism is an extension of petri nets and retains their analytic properties and most of their graphic nature. The formalism is used to model parts of the operating system and application program execution. 10 references.

Aspelund, J.; Linturi, R.

1981-01-01

34

Reliable, fault tolerant control systems for nuclear generating stations  

International Nuclear Information System (INIS)

Two operational features of CANDU Nuclear Power Stations provide for high plant availability. First, the plant re-fuels on-line, thereby eliminating the need for periodic and lengthy refuelling 'outages'. Second, all plants are controlled by real-time computer systems. Later plants are also protected using real-time computer systems. In the past twenty years, the control systems now operating in 21 plants have achieved an availability of 99.8%, making significant contributions to high CANDU plant capacity factors. This paper describes some of the features that ensure the high degree of system fault tolerance and hence high plant availability. The emphasis will be placed on the fault tolerant features of the computer systems included in the latest reactor design - the CANDU 3 (450MWe). (author)

1990-01-01

35

Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System  

Directory of Open Access Journals (Sweden)

Full Text Available This research describes the testing of a virtual reconfigurable circuit (VRC) designed and implemented for a fault tolerant system which averages three sensor inputs. The circuits to be tested are those successfully evolved in this system under different situations: (i) all three sensors are faultless, (ii) one of the input sensors fails open, and (iii) sensors fail as a short circuit. The objective of this research is to test the desired optimal circuits evolved by decoding the configuration bit streams. The logic simulation tool used to perform fault simulation is AUSIM (Auburn University Simulator).

P. N. Kumar

2007-01-01

36

OPTIMAL CHOICE WITHIN A FAULT TOLERANT FLIGHT CONTROL SYSTEM  

Directory of Open Access Journals (Sweden)

Full Text Available Safety of aircraft during flight is one of the most important problems concerning all of aviation. Failures or faults of the main elements of the automatic control system and damage to the external contour of the aircraft by foreign objects always lead to a change in the characteristics of the aircraft, to direct and indirect economic costs, and sometimes to injury or death of passengers and crew. A real-time active fault tolerant control system makes it possible to warn of or prevent emergency situations and thus improve safety.

Dmitriy Shevchuk

2013-04-01

37

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Problem statement: The method proposed in the present study deals with fault tolerant control systems by using so-called decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the knowledge of the author, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of sliding surface, which should be in piecewise slidi...

Rafi Youssef; Hui Peng

2010-01-01

38

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper we present an approach to the design optimization of fault-tolerant embedded systems for safety-critical applications. Processes are statically scheduled and communications are performed using the time-triggered protocol. We use process re-execution and replication for tolerating transient faults. Our design optimization approach decides the mapping of processes to processors and the assignment of fault-tolerant policies to processes such that transient faults are tolerated and ...

Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

2005-01-01

39

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper we present an approach to the design optimization of fault-tolerant embedded systems for safety-critical applications. Processes are statically scheduled and communications are performed using the time-triggered protocol. We use process re-execution and replication for tolerating transient faults. Our design optimization approach decides the mapping of processes to processors and the assignment of fault-tolerant policies to processes such that transient faults are tolerated and ...

Izosimov, Viacheslav; Pop, Paul

2009-01-01

40

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

Directory of Open Access Journals (Sweden)

Full Text Available An electric vehicle drive system is a multi-variable system operating in a complex and changeable environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to the location at which each fault occurs, and the causes and possible consequences of each failure are analyzed. Taking hardware limitations into account while preserving system performance as far as possible, a passive software-redundancy fault-tolerant strategy is put forward, and an example is given to analyze the pros and cons of this method.

Zhang Liwei

2013-09-01

41

A Fault Tolerant Mobile Agent Information Retrieval System  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: Most of the information retrieval systems used only client-server architectures. The client-server model though powerful, had some limitations. In mobile computing environment which has both wired network and wireless networks with limited communication capabilities, the performance of the system was very low. Approach: Mobile agents are considered a suitable technology to develop applications such as information retrieval system for mobile computing environment. Mobile agents are autonomous and dynamic entities that can migrate between various nodes in the network. They offer many advantages over traditional design methodologies like: reduction in network load, overcoming network latency and disconnected operations. Since the mobile agents do not need continuous communication with the mobile host, they are not affected by the sudden disconnection of wireless network and the situation of turning mobile host off for power saving. In order to get the complete benefit of mobile agent system, the system must be fault tolerant. In the context of mobile agents, fault-tolerance prevents a partial or complete loss of the agent. Results: Our system in mobile computing environment ensured that the agent arrived at its destination with result and performance of the system improved by the way of reduction in the response time. And also, the system allowed sending more requests by the way of creating many mobile agents without affecting the performance. Conclusion: Our research compared the performance of client-server architecture and fault tolerant mobile agent information retrieval system and proved that our system solved the limitations faced by the client server architecture. The system can also be extended to adhoc networks.

R. Punithavathi

2010-01-01

42

System Diagnosis and Fault Tolerance for Distributed Computing System: A Review  

Directory of Open Access Journals (Sweden)

Full Text Available This paper reviews adaptive system diagnosis and fault tolerance methods for distributed systems. The system comprises a network of N nodes, where N is an integer greater than or equal to 3, and each node is able to execute an algorithm to communicate with the network. A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. Because a computer network is a collection of hardware components, faults often arise in either the hardware or the software of the network. To deal with such hardware or software faults, fault diagnosis and fault tolerance mechanisms must be implemented for the proper functioning of the system. This paper discusses such fault detection and fault tolerance mechanisms: what kinds of faults occur, how they occur, and what solutions are suitable. Various fault detection mechanisms and fault tolerance methodologies are studied, and the main goal of the study is to identify automatic fault detection and fault tolerance techniques.

Nilotpal Baruah

2013-10-01

43

Evaluation of fault-tolerant system performance by approximate techniques  

Science.gov (United States)

An approximate method for calculating the statistics of the performance of a fault-tolerant system is developed. An approximate method is necessary because the statistical model of the system behavior is large-scale and the time horizon of interest encompasses many cycles of the Redundancy Management logic. In the development, a compact representation of the necessary information called the v-transform is introduced and discussed. Based upon this representation, an approximation that leads to a very efficient computational procedure is suggested and numerically analyzed. A very brief discussion of other related work is also presented.

Walker, B. K.; Gerber, D. K.

1985-01-01

44

Fault Tolerant Operation in Aero Engine Using Distributed Computation System  

Directory of Open Access Journals (Sweden)

Full Text Available The paper presents fault tolerant operation in an aero engine based on real-time systems, which are built for a very small set of mission-critical applications such as spacecraft, avionics and other distributed control systems. Modern software deals with external interfaces and has to consider various timing implications. The platform is based on C and developed using the Keil MDK tool, with a targeted deadline of 100 milliseconds at a baud rate of 500 kbps. The CAN interface handles transportation and communication: an interface cable is used for serial communication between the Digital Electronic Control Unit (DECU) and the host to transfer data to the pilot Online Monitoring System, which is based on Laboratory Virtual Instrument Engineering Workbench (LabVIEW 7.1). Fault diagnosis typically assumes a sufficiently large fault signature and enough time for a reliable decision to be reached. However, for a class of safety critical faults on commercial aircraft engines, prompt detection within a millisecond range is paramount to allow accommodation to avert undesired engine behavior. At the same time, false positives must be avoided to prevent inappropriate control action.

Neela A G

2014-04-01

45

Survey on fault-tolerant vehicle design  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerant vehicle design is an emerging inter-disciplinary research domain, which is of increased importance due to the electrification of automotive systems. The goal of fault-tolerant systems is to handle occurring faults under operational condition and enable the driver to get to a safe stop. This paper presents results from an extended survey on fault-tolerant vehicle design. It aims to provide a holistic view on the fault-tolerant aspects of a vehicular system. An overview of fault-toler...

2012-01-01

46

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis  

Science.gov (United States)

Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

47

Application-Transparent Fault Tolerance in Distributed Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a new software architecture in which all concepts necessary to achieve fault tolerance can be added to an application automatically without any source code changes. As a case study, we consider the problem of providing a reliable service despite node failures by executing a group of replicated servers. Replica creation and management as well as failure detection and recovery are performed automatically by a separate fault tolerance layer (ft-layer) which is inserted between...

1999-01-01

48

Diagnostic software and fault tolerant microprocessor based system architectures  

International Nuclear Information System (INIS)

In numerous industrial applications including power generation, the availability of electronic systems to perform the tasks assigned has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost effective microprocessor based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include: dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic and depends greatly on the structure of major components and systems external to the electronic system itself as well as the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed by describing recent examples of fault tolerant microprocessor based system architectures which include the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilizes unique internal architectures that address the reliability, availability, and the communications issues while improving maintainability and man-machine interfaces

1986-09-01

49

Fault-Tolerant Static Scheduling for Real-Time Distributed Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper investigates fault-tolerance issues in real-time distributed embedded systems. Our goal is to propose solutions to automatically produce distributed and fault-tolerant code. We first characterize the systems considered by giving the main assumptions about the physical and logical architecture of these systems. In particular, we consider only processor failures, with a fail-stop behavior. Then, we give a state of the art of the techniques used for fault-tolerance. We also briefly pr...

Girault, Alain; Lavarenne, Christophe; Sighireanu, Mihaela; Sorel, Yves

2000-01-01

50

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: The method proposed in the present study deals with fault tolerant control systems by using so-called decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the knowledge of the author, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of the sliding surface, which should be partitioned into piecewise sliding surfaces in order to apply the PwLTool, whose purpose in our case is to delimit the regions where sliding mode occurs. Results: We obtain a simple linearized model, selected in those regions, which can depict the complex system. Conclusion: With a three-tank water level system as an example we implement this new design scenario, and since we are interested in networked control systems we believe that this kind of controller implementation will not be affected by network delays.

Rafi Youssef

2010-01-01

51

Trace-Based Compositional Proof Theory for Fault Tolerant Distributed Systems.  

Science.gov (United States)

We present a compositional network proof theory to specify and verify safety properties of fault tolerant distributed systems. In this proof theory we abstract from the precise nature and occurrence of faults, but model their effect on the externally visi...

H. Schepers; J. Hooman

1993-01-01

52

Design and analysis of reliable and fault-tolerant computer systems  

CERN Document Server

Covering both the theoretical and practical aspects of fault-tolerant mobile systems, and fault tolerance and analysis, this book tackles the current issues of reliability-based optimization of computer networks, fault-tolerant mobile systems, and fault tolerance and reliability of high speed and hierarchical networks. The book is divided into six parts to facilitate coverage of the material by course instructors and computer systems professionals. The sequence of chapters in each part ensures the gradual coverage of issues from the basics to the most recent developments. A useful set of refere

Abd-El-Barr, Mostafa

2006-01-01

53

Coverage modeling for dependability analysis of fault-tolerant systems  

Science.gov (United States)

Several different models for predicting coverage in a fault-tolerant system, including models for permanent, intermittent, and transient errors, are discussed. Markov, semi-Markov, nonhomogeneous Markov, and extended stochastic Petri net models for computing coverage are developed. Two types of events that interfere with recovery are examined; and methods for modeling such events, whether they are deterministic or random, are given. The sensitivity of system reliability/availability to the coverage parameter and the sensitivity of the coverage parameter to various error-handling strategies are investigated. It is found that a policy of attempting transient recovery upon detection of an error can actually increase the unreliability of the system. This result is true if the error detectability is not nearly perfect, so that the risk of producing an undetectable error is greater than the benefit gained by not discarding the component.

Dugan, Joanne Bechta; Trivedi, Kishor S.

1989-01-01
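The sensitivity of reliability to the coverage parameter mentioned in the abstract above can be illustrated with a textbook case. The sketch below is not one of the paper's models: it assumes a two-unit parallel system with a constant per-unit failure rate `lam` and a coverage probability `c` that the first failure is detected and handled, which gives the closed forms R(t) = 2c·exp(-λt) + (1 - 2c)·exp(-2λt) and MTTF = (1 + 2c)/(2λ). The function names and the numbers in the usage note are illustrative assumptions.

```python
import math


def duplex_reliability(t, lam, c):
    """Reliability at time t of a two-unit parallel system (assumed model).

    lam: constant failure rate of each unit.
    c:   coverage, i.e. probability that the first failure is handled
         so the system keeps running on the surviving unit.
    """
    return 2.0 * c * math.exp(-lam * t) + (1.0 - 2.0 * c) * math.exp(-2.0 * lam * t)


def duplex_mttf(lam, c):
    """Mean time to failure, obtained by integrating duplex_reliability over t."""
    return (1.0 + 2.0 * c) / (2.0 * lam)
```

With `c = 1` the result reduces to the ideal parallel system, and with `c = 0` the system fails at the first unit failure. For example, comparing `duplex_mttf(1e-3, 0.99)` with `duplex_mttf(1e-3, 0.90)` shows how a modest loss of coverage shortens the expected lifetime under this simplified model.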

54

Using Ada for a distributed, fault tolerant system  

Science.gov (United States)

It is pointed out that advanced avionics applications increasingly require underlying machine architectures which are damage and fault tolerant, and which provide access to distributed sensors, effectors and high-throughput computational resources. The Advanced Information Processing System (AIPS), sponsored by NASA, is to provide an architecture which can meet the considered requirements. Ada was selected for implementing the AIPS system software. Advantages of Ada are related to its provisions for real-time programming, error detection, modularity and separate compilation, and standardization and portability. Chief drawbacks of this language are currently limited availability and maturity of language implementations, and limited experience in applying the language to real-time applications. The present investigation is concerned with current plans for employing Ada in the design of the software for AIPS. Attention is given to an overview of AIPS, AIPS software services, and representative design issues in each of four major software categories.

Dewolf, J. B.; Sodano, N. M.; Whittredge, R. S.

1984-01-01

55

Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time-triggered protocol. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that t...

Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

2006-01-01
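The trade-off behind checkpoint placement in the abstract above can be sketched with a deliberately simplified timing model. This is an assumption made for illustration, not the paper's formulation: a process with fault-free execution time `c` is split into `n` equal segments, each segment costs a fixed checkpointing `overhead`, and each of up to `k` transient faults forces one segment to be re-executed after a recovery delay `mu`. All names and parameters are hypothetical.

```python
import math


def worst_case_length(c, n, k, overhead, mu):
    """Worst-case execution length under the assumed model.

    c + n*overhead   : fault-free run with n checkpointed segments
    k*(c/n + mu)     : k faults, each re-executing one segment after recovery
    """
    return c + n * overhead + k * (c / n + mu)


def best_checkpoint_count(c, k, overhead, mu):
    """Integer n >= 1 that minimizes worst_case_length.

    The continuous minimum is at sqrt(k*c/overhead); because the cost is
    convex in n, only the two neighbouring integers need to be compared.
    """
    n_real = math.sqrt(k * c / overhead)
    candidates = {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}
    return min(candidates, key=lambda n: worst_case_length(c, n, k, overhead, mu))
```

For instance, with `c=60`, `k=2`, `overhead=5` and `mu=1`, `best_checkpoint_count` returns 5, and the worst-case length falls from 187 with a single segment to 111. More checkpoints shorten each re-execution but add overhead to every fault-free run, which is why an optimum exists.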

56

Fault diagnosis and fault-tolerant control strategies for non-linear systems analytical and soft computing approaches  

CERN Document Server

This book presents selected fault diagnosis and fault-tolerant control strategies for non-linear systems in a unified framework. In particular, starting from advanced state estimation strategies up to modern soft computing, the discrete-time description of the system is employed. Part I of the book presents original research results regarding state estimation and neural networks for robust fault diagnosis. Part II is devoted to the presentation of integrated fault diagnosis and fault-tolerant systems. It starts with a general fault-tolerant control framework, which is then extended by introducing robustness with respect to various uncertainties. Finally, it is shown how to implement the proposed framework for fuzzy systems described by the well-known Takagi–Sugeno models. This research monograph is intended for researchers, engineers, and advanced postgraduate students in control and electrical engineering, computer science, as well as mechanical and chemical engineering.

Witczak, Marcin

2014-01-01

57

Diagnosis and Fault-Tolerant Control for Thruster-Assisted Position Mooring System  

DEFF Research Database (Denmark)

Development of fault-tolerant control systems is crucial to maintain safe operation of offshore installations. The objective of this paper is to develop a fault-tolerant control for thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line's pretension or line breaks will degrade the performance of the positioning of the vessel. Faults will be detected and isolated through a fault diagnosis procedure. When faults are detected, they can be accommodated through the control action in which only parameter of the controlled plant has to be updated to cope with the faulty condition. Simulations will be carried out to verify the advantages of the fault-tolerant control strategy for the PM system.

Blanke, Mogens

2007-01-01

58

Evaluation of digital fault-tolerant architectures for nuclear power plant control systems  

Energy Technology Data Exchange (ETDEWEB)

Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those not fault tolerant is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants. 7 refs., 4 tabs.

Battle, R.E.

1990-01-28
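As a reminder of what triple-modular redundancy provides, and of why the report above singles out common cause failures and single points of failure, the following is a minimal sketch of a majority voter together with the standard idealized TMR reliability formula 3R² - 2R³. Both function names are illustrative, and the formula assumes a perfect voter and statistically independent channel failures, which is exactly the assumption that common cause failures violate.

```python
from collections import Counter


def majority_vote(a, b, c):
    """Return the value reported by at least two of the three channels.

    Raises ValueError when all three channels disagree, since no single
    fault hypothesis can then explain the readings.
    """
    value, count = Counter((a, b, c)).most_common(1)[0]
    if count >= 2:
        return value
    raise ValueError("no majority: all three channels disagree")


def tmr_reliability(r):
    """Idealized TMR reliability for per-channel reliability r.

    Assumes a perfect voter and independent channel failures; the system
    works when at least two of the three channels work.
    """
    return 3 * r**2 - 2 * r**3
```

A single faulty channel is masked, as in `majority_vote(5, 7, 5)`, which returns 5. The voter itself, shared software, and any fault that hits two channels at once are outside what this idealized model covers.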

59

Evaluation of digital fault-tolerant architectures for nuclear power plant control systems  

International Nuclear Information System (INIS)

Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those not fault tolerant is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants. 7 refs., 4 tabs

1990-06-10

60

Software fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Software design faults are a cause of major concern, and their relative importance is growing as techniques for tolerating hardware faults gain wider acceptance. The application of fault tolerance to design faults is both increasing, in particular in some life-critical applications, and controversial, due to the imperfect state of knowledge about it. This paper surveys the existing applications and research results, to help the reader form an initial picture of the existing possibilities, and...

1990-01-01

61

Reliability performance of fault-tolerant digital control systems  

Energy Technology Data Exchange (ETDEWEB)

This paper presents the results of a generic reliability analysis of fault-tolerant digital control systems (F-T DCS). This analysis differs from previous efforts at estimating the reliability performance of F-T DCS in the sense that this analysis relies extensively on actual experience with redundant computer systems rather than on theoretical evaluations. The dominant contributors to the frequency of failure of F-T DCS are (1) failures within common or shared equipment, (2) software failures, and (3) inadvertent operator actions. Other contributors include loss of electric power, spurious signals that originate from within the DCS, lack of coverage, common cause failure (CCF) of redundant hardware, CCF of instrument channels, and physical damage from externally initiated events (e.g., high temperature). Much variation is expected in the reliability performance of F-T DCSs. Although some systems may operate for 10 or 15 years without experiencing system failures, other systems may fail several times during the same time interval. This variation is expected among systems of different architectures as well as among systems of the same architecture. Because most failures of DCSs can be traced to some kind of CCF, particularly software failures and inadvertent operator actions, CCFs should probably receive more attention than they are presently given when selecting an F-T DCS.

Paula, H.M.; Roberts, M.W. (JBF Associates, Inc., Knoxville, TN (USA)); Battle, R.E. (Oak Ridge National Lab., TN (USA))

1991-04-01

62

An Algebra of Fault Tolerance  

CERN Multimedia

Every system of any significant size is created by composition from smaller sub-systems or components. It is thus fruitful to analyze the fault-tolerance of a system as a function of its composition. In this paper, two basic types of system composition are described, and an algebra to describe fault tolerance of composed systems is derived. The set of systems forms monoids under the two composition operators, and a semiring when both are concerned. A partial ordering relation between systems is used to compare their fault-tolerance behaviors.

Rao, Shrisha

2009-01-01

63

Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis  

DEFF Research Database (Denmark)

In this paper we design a passive fault tolerant controller against actuator faults for discrete-time piecewise affine (PWA) systems. By using dissipativity theory and H∞ analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In the current paper, the PWA system switches not only due to the state but also due to the control input. The method is applied on a large scale livestock ventilation model.

Gholami, Mehdi; Cocquempot, Vincent

2011-01-01

64

New Fault Tolerance Approach using Antecedence Graphs in Multi Agent Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Mobile agents are distributed programs which can move autonomously in a network to perform tasks on behalf of a user. They are susceptible to failures due to faults in communication channels, processors or malicious programs. In order to gain a solid foundation at the heart of today's e-society, mobile agent technology must address the issue of fault tolerance. Checkpointing has been a widely used technique for providing fault tolerance in mobile agent systems. But the traditional message passi...

Kaur, Ramandeep; Krishna Challa, Rama; Singh, Rajwinder

2010-01-01

65

A Novel Fault Tolerant Reversible Gate For Nanotechnology Based Systems  

Directory of Open Access Journals (Sweden)

Full Text Available This paper proposes a novel reversible logic gate, NFT. It is a parity preserving reversible logic gate, that is, the parity of the outputs matches that of the inputs. We demonstrate that the NFT gate can implement all Boolean functions. It renders a wide class of circuit faults readily detectable at the circuit's outputs. The proposed parity preserving reversible gate allows any fault that affects no more than a single signal to be detectable at the circuit's primary outputs. The NFT gate can be used to make fault tolerant reversible logic circuits. We demonstrate how the well-known, and very useful, Toffoli gate can be synthesized from only two parity-preserving reversible gates. We show that our proposed parity-preserving Toffoli gate is much better in terms of number of reversible gates, number of garbage outputs and hardware complexity compared to the existing counterpart.

Majid Haghparast

2008-01-01
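
The parity-preserving property described in the abstract above can be checked exhaustively for any small gate. The sketch below is illustrative only: the abstract does not give the NFT gate's truth table, so the well-known Fredkin (controlled-swap) gate stands in as a parity-preserving reversible gate, the plain Toffoli gate is included for contrast, and all helper names are hypothetical.

```python
from itertools import product


def fredkin(a, b, c):
    """Controlled swap: when a is 1, b and c are exchanged (stand-in, not the NFT gate)."""
    return (a, c, b) if a else (a, b, c)


def toffoli(a, b, c):
    """Plain Toffoli gate: c is flipped when both a and b are 1."""
    return (a, b, c ^ (a & b))


def parity(bits):
    """Parity (sum modulo 2) of a tuple of bits."""
    return sum(bits) % 2


def is_reversible(gate, n):
    """A gate is reversible if all 2**n inputs map to distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n


def is_parity_preserving(gate, n):
    """True if input parity equals output parity for every input."""
    return all(parity(bits) == parity(gate(*bits))
               for bits in product((0, 1), repeat=n))


def single_faults_detected(gate, n):
    """True if flipping any one output bit always breaks the parity match."""
    for bits in product((0, 1), repeat=n):
        for i in range(n):
            out = list(gate(*bits))
            out[i] ^= 1  # inject a single-signal fault on output i
            if parity(out) == parity(bits):
                return False
    return True
```

Under these assumptions, the Fredkin gate passes all three checks, while the plain Toffoli gate is reversible but not parity preserving, which is why a parity-preserving realization is of interest.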

66

Software engineering for fault-tolerant systems. Final technical report, Jan 89-Aug 90  

Energy Technology Data Exchange (ETDEWEB)

The objectives of this study are to (1) assess the current state of the art of fault tolerant software schemes, (2) evaluate the status of various software engineering issues in this context, (3) identify critical gaps in the currently available technology, and (4) provide recommendations for research and development efforts to enhance the technological base of fault tolerant software engineering. Towards these objectives, the authors have discussed several software fault tolerance schemes, studied the available experimental and analytical evidence about their usefulness and assessed the current status of fault tolerant software engineering for sequential and parallel computers. Based on the studies reported here, they feel that the current state of the art of fault tolerant software is mature enough to tolerate design faults in specific circumstances with appropriate provisions of redundancy and allied supporting mechanisms. However, no known fault tolerance technique can guarantee failure-free system operation. Further, it is questionable whether the current approaches are cost-effective in achieving the desired gain in operational software reliability. They feel that what is needed is a systematic, cost-effective approach to software development which explicitly addresses the fault tolerance issues throughout the development life-cycle.

Goel, A.L.; Mansour, N.

1991-03-01

67

Analysis and optimization of fault-tolerant embedded systems with hardened processors  

DEFF Research Database (Denmark)

In this paper we propose an approach to the design optimization of fault-tolerant hard real-time embedded systems, which combines hardware and software fault tolerance techniques. We trade off selective hardening in hardware against process re-execution in software to provide the required levels of fault tolerance against transient faults with the lowest possible system costs. We propose a system failure probability (SFP) analysis that connects the hardening level with the maximum number of re-executions in software. We present design optimization heuristics to select the fault-tolerant architecture and decide process mapping such that the system cost is minimized, deadlines are satisfied, and the reliability requirements are fulfilled.

Pop, Paul

2009-01-01

68

Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication  

DEFF Research Database (Denmark)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time-triggered protocol. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several synthesis algorithms which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

Izosimov, Viacheslav; Pop, Paul

2006-01-01
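
Checkpointing with rollback recovery, as named in the abstract above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' synthesis approach: the state is a plain dictionary, `TransientFault` and `run_with_checkpoints` are hypothetical names, and a fault is modelled as an exception raised by a segment.

```python
import copy


class TransientFault(Exception):
    """Hypothetical signal that a transient fault hit the running segment."""


def run_with_checkpoints(state, segments, max_retries=3):
    """Run segments in order, checkpointing before each and rolling back on a fault."""
    for segment in segments:
        checkpoint = copy.deepcopy(state)  # save the state before the segment starts
        for _attempt in range(max_retries + 1):
            try:
                segment(state)
                break  # segment completed, continue with the next one
            except TransientFault:
                # Rollback: discard partial updates and restore the saved state.
                state.clear()
                state.update(copy.deepcopy(checkpoint))
        else:
            raise RuntimeError("segment still failing after all retries")
    return state
```

The retry bound plays the role of the maximum number of tolerated faults per segment; where checkpoints are best placed is exactly the optimization question the abstract addresses and is not modelled here.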

69

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach to the design optimization of fault-tolerant embedded systems for safety-critical applications. Processes are statically scheduled and communications are performed using the time-triggered protocol. We use process re-execution and replication for tolerating transient faults. Our design optimization approach decides the mapping of processes to processors and the assignment of fault-tolerant policies to processes such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several heuristics which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

Izosimov, Viacheslav; Pop, Paul

2005-01-01

70

Implementing Fault-Tolerance in Real-Time Systems by Program Transformations  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a formal approach to implement fault-tolerance in real-time embedded systems. The initial fault-intolerant system consists of a set of independent periodic tasks scheduled onto a set of fail-silent processors connected by a reliable communication network. We transform the tasks such that, assuming the availability of an additional spare processor, the system tolerates one failure at a time (transient or permanent). Failure detection is implemented using heartbeating, and failure ma...

2006-01-01
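
Heartbeat-based failure detection, mentioned in the abstract above, can be illustrated as follows. This is a hedged sketch rather than the transformation described in the paper: `HeartbeatMonitor` and its methods are hypothetical names, and time is passed in explicitly so that the behaviour is deterministic.

```python
class HeartbeatMonitor:
    """Suspects a fail-silent processor once its heartbeat is overdue."""

    def __init__(self, timeout):
        self.timeout = timeout  # maximum allowed silence, in the caller's time unit
        self.last_seen = {}     # processor name -> time of the last heartbeat

    def beat(self, processor, now):
        """Record a heartbeat received from a processor at time `now`."""
        self.last_seen[processor] = now

    def suspected(self, now):
        """Return the processors whose last heartbeat is older than the timeout."""
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout)
```

A processor that stops sending heartbeats is reported once the timeout elapses, at which point its tasks could be moved to a spare processor.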

71

Algorithm Based Fault Tolerant and Check Pointing for High Performance Computing Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a new approach to fault tolerance for High Performance Computing systems. An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) has been suggested as a cost-effective concurrent error detection scheme...

Hodjatollah Hamidi; Vafaei, A.; Monadjemi, A. H.

2009-01-01
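
The ABFT idea named in the abstract above is commonly illustrated with checksum-augmented matrix multiplication. The sketch below shows that classic scheme under simplifying assumptions (integer matrices, at most one corrupted data element); it is not taken from the paper, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def with_column_checksum(a):
    """Append a row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]


def find_error(c_full):
    """Locate a single corrupted data element of a full-checksum matrix, or return None."""
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return (bad_rows[0], bad_cols[0])
    raise ValueError("checksum pattern is not a single data-element error")


def correct(c_full, pos):
    """Rebuild the element at pos from its row checksum."""
    i, j = pos
    m = len(c_full[0]) - 1
    others = sum(v for k, v in enumerate(c_full[i][:m]) if k != j)
    c_full[i][j] = c_full[i][m] - others
```

Multiplying the column-checksum form of A by the row-checksum form of B yields a product that carries both checksums, so a single wrong element shows up as exactly one inconsistent row and one inconsistent column. With floating-point data the equality tests would need a tolerance.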

72

Transient Fault Tolerance and System Safety Enhancement Based on System Theory  

Directory of Open Access Journals (Sweden)

Full Text Available Transient faults are hard to detect and locate because of their unpredictable nature and short duration, and they are the dominant cause of system failures, which makes it necessary to consider transient fault-tolerant design in the development of modern safety-critical industrial systems. In this paper an approach based on system theory is proposed to tolerate the transient faults in tunnel construction wireless monitoring and control systems (TCWMCS), in which the effects of transient faults are expressed as dysfunctional interactions among software applications. After analyzing the dysfunctional interactions of the system with the operational process model and educing the causes of dysfunction in the functional control diagram, a safety enhancement method is proposed for designers, in which effective safety constraints are set up to tolerate the transient faults. The experimental evaluation indicated that the effects of transient faults could be exposed by the causal factors of dysfunctional interactions and that system safety could be enhanced by the enforcement of appropriate constraints.

Mingyue Yang

2011-10-01

73

FATOMAS - A Fault-Tolerant Mobile Agent System Based on the Agent-Dependent Approach  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault tolerance is fundamental to the further development of mobile agent applications. In the context of mobile agents, fault tolerance prevents a partial or complete loss of the agent, i.e., it ensures that the agent arrives at its destination. In this paper, we present FATOMAS, a Java-based fault-tolerant mobile agent system based on an algorithm presented in an earlier paper. In contrast to the standard "place-dependent" architectural approach, FATOMAS uses the novel "agent-d...

Pleisch, Stefan; Schiper, André

2001-01-01

74

Fault recovery characteristics of the fault tolerant multi-processor  

Science.gov (United States)

The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.

Padilla, Peter A.

1990-01-01

75

Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report  

Energy Technology Data Exchange (ETDEWEB)

The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a system-wide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis, making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack, from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.

Lumsdaine, Andrew

2013-03-08

76

Quorums Systems as a Method to Enhance Collaboration for Achieving Fault Tolerance in Distributed System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A system that implements the Byzantine agreement algorithm is supposed to be very reliable and robust because of its fault-tolerating feature. For very realistic environments, Byzantine agreement protocols become inadequate, because they are based on the assumption that failures are controlled and they have unlimited severity. The Byzantine agreement model works with a bounded number of failures that have to be tolerated. It is never concerned with identifying these failures or excluding them fr...

2009-01-01

77

Quorums Systems as a Method to Enhance Collaboration for Achieving Fault Tolerance in Distributed System  

Directory of Open Access Journals (Sweden)

Full Text Available A system that implements the Byzantine agreement algorithm is supposed to be very reliable and robust because of its fault-tolerating feature. For very realistic environments, Byzantine agreement protocols become inadequate, because they are based on the assumption that failures are controlled and they have unlimited severity. The Byzantine agreement model works with a bounded number of failures that have to be tolerated. It is never concerned with identifying these failures or excluding them from the system. In this paper, we tackle quorum systems, a particular sort of distributed system in which some storage or computations are replicated on various machines on the expectation that some of them work correctly to produce a reliable output at a given moment. Thus, through majority-voting collaboration with quorums, one can achieve fault tolerance in distributed systems. Further, we argue that an algorithm to identify faulty-behaving machines is useful for identifying purposeful malicious behaviors.

Ioan PETRI

2009-01-01
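
The majority-voting idea in the abstract above can be sketched as follows. This is an illustrative assumption about how replies might be combined, not the paper's algorithm: `majority_vote` is a hypothetical helper that takes one reply per replica, returns the value held by a strict majority, and lists the replicas that disagreed as candidates for further inspection.

```python
from collections import Counter


def majority_vote(replies):
    """Return (value, dissenters) when a strict majority of replicas agree.

    `replies` maps a replica name to the (hashable) value it reported.
    """
    counts = Counter(replies.values())
    value, votes = counts.most_common(1)[0]
    if votes * 2 <= len(replies):
        raise ValueError("no strict majority among the replies")
    dissenters = sorted(name for name, v in replies.items() if v != value)
    return value, dissenters
```

With three replicas, one faulty reply is outvoted and the machine that produced it is reported; a tie yields no result rather than an arbitrary choice.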

78

Towards fault-tolerant decision support systems for ship operator guidance  

DEFF Research Database (Denmark)

Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis, when the ship responses are assumed to be linear in the wave excitation. A novel numerical procedure is described for the calculation of residuals using the ship's transfer functions which correlate the wave excitation and the ship responses. As tests, multiplicative faults have been artificially imposed on full-scale motion measurements and it is shown that the developed model is able to detect and isolate all faults.

Nielsen, Ulrik Dam; Jensen, Jørgen Juncher

2012-01-01

79

Towards fault-tolerant decision support systems for ship operator guidance  

International Nuclear Information System (INIS)

Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis, when the ship responses are assumed to be linear in the wave excitation. A novel numerical procedure is described for the calculation of residuals using the ship's transfer functions which correlate the wave excitation and the ship responses. As tests, multiplicative faults have been artificially imposed on full-scale motion measurements and it is shown that the developed model is able to detect and isolate all faults.

2012-08-01

80

Application-driven co-design of fault-tolerant industrial systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper presents a novel methodology for the HW/SW co-design of fault tolerant embedded systems that pursues the mitigation of radiation-induced upset events (which are a class of Single Event Effects - SEEs) on critical industrial applications. The proposal combines the flexibility and low cost of Software Implemented Hardware Fault Tolerance (SIHFT) techniques with the high reliability of selective hardware replication. The co-design flow is supported by a hardening platform that compris...

Restrepo Calle, Felipe; Martínez Álvarez, Antonio; Guzmán Miranda, Hipólito; Palomo Pinto, Francisco Rogelio; Cuenca Asensi, Sergio

2010-01-01

81

SFT: Scalable Fault Tolerance  

Energy Technology Data Exchange (ETDEWEB)

In this paper we will present a new technology that we are currently developing within the SFT: Scalable Fault Tolerance FastOS project, which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continuing execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency, requiring no changes to user applications. Our technology is based on a global coordination mechanism that enforces transparent recovery lines in the system, and on TICK, a lightweight, incremental checkpointing software architecture implemented as a Linux kernel module. TICK is completely user-transparent and does not require any changes to user code or system libraries; it is highly responsive: an interrupt, such as a timer interrupt, can trigger a checkpoint in as little as 2.5 μs; and it supports incremental and full checkpoints with minimal overhead, less than 6% with full checkpointing to disk performed as frequently as once per minute.

Petrini, Fabrizio; Nieplocha, Jarek; Tipparaju, Vinod

2006-04-15

82

Mapping of Fault-Tolerant Applications with Transparency on Distributed Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach for the mapping optimization of fault-tolerant embedded systems for safety-critical applications. Processes and messages are statically scheduled. Process re-execution is used for recovering from multiple transient faults. We call process recovery transparent if it does not affect operation of other processes. Transparent recovery has the advantage of fault containment, improved debuggability and less memory needed to store the fault-tolerant schedules. However, it will introduce additional delays that can lead to violations of the timing constraints of the application. We propose an algorithm for the mapping of fault-tolerant applications with transparency. The algorithm decides a mapping of processes on computation nodes such that the application is schedulable and the transparency properties imposed by the designer are satisfied. The mapping algorithm is driven by a heuristic that is able to estimate the worst-case schedule length and indicate whether a certain mapping alternative is schedulable.

Izosimov, Viacheslav; Pop, Paul

2006-01-01

83

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

DEFF Research Database (Denmark)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety system makes a close-down of the process. A general development methodology is given in the thesis that carried the control system designer through the steps necessary to consider fault handling in an early design phase. It was shown how an existing control loop with interface to the plant-wide control system could be extended with three additional modules to obtain fault tolerance: fault detection and isolation, remedial action decision, and reconfiguration. The integration of these modules in software was considered. The general methodology covered the analysis, design, and implementation of fault tolerant control systems on an overall level. Two detailed studies were presented, one on fault detection and isolation design and one on design of the decision logic. Two application case studies were used to emphasize practical aspects of both the development methodology and the detailed studies. One was an electro-mechanical actuator in a position control loop for a diesel engine speed governor where the purpose was to avoid a total close-down in case of the most likely faults. The second was a fault tolerant attitude control system for a micro satellite where the operation of the system is mission critical. The purpose was to avoid hazardous effects from faults and maintain operation if possible.
A method was introduced that, after a systematic examination of possible component failures, enables analysis of the relationship between failures and their consequences for the system's operation. This fault propagation analysis is based on coarse models of the subsystems describing the reaction to faults, as for example a variable being zero, low or high. Examples were given that illustrate how such models can be established by simple means, and yet provide important information when combined into a complete system. A special achievement was a method to determine how control loops behave in case of faults. This is not straightforward as the system behaviour depends on the character of the feedback. One of the detailed studies was the design of the decision logic in fault handling, realized as state-event machines. Guidelines for the design were provided, based on experience from the two case studies. Methods for verifying correct operation of the decision logic were described, where a completeness check against the fault propagation analysis is able to guarantee coverage of all considered faults. The usage of software tools to support the development process was illustrated with an off-the-shelf product for constraint logic solving and state-event machine analysis. The coarse system models and the decision logic were analyzed with the tool-box and it was shown how an easy analysis could be performed to verify correctness and completeness of the fault handling design. Experience from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in the speed governor actuator setup. One was a position sensor fault and the second was an actuator current fault.
The sensor fault detection was trivial, whereas the actuator fault was more challenging. The study demonstrated that many existing methods have a potential to detect and isolate the two faults, but also that the research field still lacks a systematic approach to handle realistic problems such as low sampling rate and nonlinear characteristics of the system.

Bøgh, S.A.

1997-01-01

84

Documentation of the Current Fault Detection, Isolation and Reconfiguration Software of the AIPS (Advanced Information Processing System) Fault-Tolerant Processor.  

Science.gov (United States)

Documentation is presented of the December 1986 version of the Ada code for the fault detection, isolation, and reconfiguration (FDIR) functions of the Advanced Information Processing System (AIPS) Fault-Tolerant Processor (FTP). Because the FTP is still ...

D. T. Lanning; A. W. Shepard; S. C. Johnson

1987-01-01

85

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview  

Science.gov (United States)

Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

86

Design and development of algorithms for fault-tolerant distributed systems  

Energy Technology Data Exchange (ETDEWEB)

This thesis describes the design and development of algorithms for fault tolerant distributed systems. The development of such algorithms requires making assumptions about the types of component faults for which tolerance is to be provided. Such assumptions must be specified accurately. To this end, this thesis develops a classification of faults in systems. This fault classification identifies a range of fault types from the most restricted to the least restricted. For each fault type, an algorithm for reaching distributed agreement in the presence of a bounded number of faulty processors is developed, and thus a family of agreement algorithms is presented. The influence of the various fault types on the complexities of these algorithms is discussed. Early stopping algorithms are also developed for selected fault types and the influence of fault types on the early stopping conditions of the respective algorithms is analyzed. The problem of evaluating the performance of distributed replicated systems which will require agreement algorithms is considered next. As a first step in the direction of meeting this challenging task, a pipeline triple modular redundant system is considered and analytical methods are derived to evaluate the performance of such a system. Finally, the accuracy of these methods is examined using computer simulations.

Ezhilchelvan, P.D.

1989-01-01

87

A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI  

Energy Technology Data Exchange (ETDEWEB)

The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.

Hursey, Joshua J [ORNL; Naughton, III, Thomas J [ORNL; Vallee, Geoffroy R [ORNL; Graham, Richard L [ORNL

2011-01-01

88

From Fault-tolerance to Attack Tolerance.  

Science.gov (United States)

Means to build fault-tolerant services have been at hand for some time. Defense against attacks remains a difficult problem, though. The problem becomes ever more urgent with the increasing use of networked computing systems in our society's critical infr...

F. B. Schneider

2011-01-01

89

Fault detection and fault tolerant control of a smart base isolation system with magneto-rheological damper  

International Nuclear Information System (INIS)

Fault detection and isolation (FDI) in real-time systems can provide early warnings for faulty sensors and actuator signals to prevent events that lead to catastrophic failures. The main objective of this paper is to develop FDI and fault tolerant control techniques for base isolation systems with magneto-rheological (MR) dampers. Thus, this paper presents a fixed-order FDI filter design procedure based on linear matrix inequalities (LMI). The necessary and sufficient conditions for the existence of a solution for detecting and isolating faults using the H∞ formulation are provided in the proposed filter design. Furthermore, an FDI-filter-based fuzzy fault tolerant controller (FFTC) for a base isolation structure model was designed to preserve the pre-specified performance of the system in the presence of various unknown faults. Simulation and experimental results demonstrated that the designed filter can successfully detect and isolate faults from displacement sensors and accelerometers while maintaining excellent performance of the base isolation technology under faulty conditions.

2011-08-01

90

Preface of the special issue on Advances in Control and Fault-Tolerant Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Today's automatic control systems are of high degrees of integration, complexity, embedding and networking of heterogeneous entities. This trend is driven by the industrial needs for achieving new technical performance and meeting additional performance demands. A most critical and important issue surrounding the design and operation of complex automatic systems is the application of Fault Detection and Isolation and Fault-Tolerant Control (FDI/FTC) technology, aiming at guaranteeing high sys...

Korbicz, Jozef; Maquin, Didier; Theilliol, Didier

2012-01-01

91

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety ...

2005-01-01

92

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety ...

1997-01-01

93

Active Fault Tolerant Control-FTC-Design for Takagi-Sugeno Fuzzy Systems with Weighting Functions Depending on the FTC  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper the problem of active fault tolerant control design for noisy systems described by Takagi-Sugeno fuzzy models is studied. The proposed control strategy is based on knowledge of the estimated fault and of the error between the faulty system state and a reference system state. The considered systems are affected by actuator and sensor faults and have weighting functions that depend on the fault tolerant control. A mathematical transformation is used to construct an augmented system in which all the faults affecting the initial system appear as actuator faults. Then, an adaptive proportional integral observer is used to estimate the state and the faults. The design of the proportional integral observer and of the fault tolerant control strategy is formulated as linear matrix inequalities, which can be solved easily. To illustrate the proposed method, it is applied to the three-tank system.

Atef Khedher

2011-05-01

94

A Piecewise Affine Hybrid Systems Approach to Fault Tolerant Satellite Formation Control  

DEFF Research Database (Denmark)

In this paper a procedure for modelling satellite formations including failure dynamics as a piecewise-affine hybrid system is shown. The formulation enables recently developed methods and tools for control and analysis of piecewise-affine systems to be applied, leading to synthesis of fault tolerant controllers and analysis of the system behaviour given possible faults. The method is illustrated using a simple example involving two satellites trying to reach a specific formation despite actuator faults occurring.

Grunnet, Jacob Deleuran; Larsen, Jesper Abildgaard

2008-01-01

95

Active fault tolerant control of piecewise affine systems with reference tracking and input constraints  

DEFF Research Database (Denmark)

An active fault tolerant control (AFTC) method is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. The AFTC framework contains a supervisory scheme, which selects a suitable controller from a set of controllers such that stability and an acceptable performance of the faulty system are maintained. The design of the supervisory scheme is not considered here. The set of controllers is composed of a normal controller for the fault-free case, an active fault detection and isolation controller for isolation and identification of the faults, and a set of passive fault tolerant controller (PFTC) modules designed to be robust against a set of actuator faults. In this research, the piecewise nonlinear model is approximated by a PWA system. The PFTCs are state feedback laws. Each one is robust against a fixed set of actuator faults and is able to track the reference signal while the control inputs are bounded. The PFTC problem is transformed into a feasibility problem of a set of LMIs. The method is applied to a large-scale livestock ventilation model.

Gholami, M.; Cocquempot, V.

2013-01-01

96

An Overview of Checkpointing Techniques for Fault Tolerance in Distributed Computing Systems  

Directory of Open Access Journals (Sweden)

Full Text Available Checkpointing is an important feature in distributed computing systems. It gives fault tolerance without requiring additional effort from the programmer [1]. To provide fault tolerance for distributed systems, the checkpointing technique has been widely used, and much research has been performed to reduce the overhead of checkpointing coordination. A checkpoint is a snapshot of the current state of a process. It saves enough information in non-volatile stable storage such that, if the contents of the volatile storage are lost due to process failure, one can reconstruct the process state from the information saved in the non-volatile stable storage [1].

Jagdish Makhijani; Dr. Anil Rajput

2012-02-01
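As a minimal sketch of the checkpoint idea in the abstract above (a snapshot of process state written to stable storage and reloaded after a failure), the following Python fragment saves a small state dictionary to a file and restores it. The function names `save_checkpoint`, `restore_checkpoint` and `run`, the summing workload, and the use of `pickle` are illustrative assumptions and are not taken from the paper.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write `state` to `path` atomically so a crash never leaves a torn file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to stable storage
    os.replace(tmp_path, path)  # atomic rename over the previous checkpoint


def restore_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, steps, crash_at=None):
    """Sum 0..steps-1, checkpointing after every step (hypothetical workload)."""
    state = restore_checkpoint(path, {"i": 0, "total": 0})
    while state["i"] < steps:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated process failure")
        state["total"] += state["i"]
        state["i"] += 1
        save_checkpoint(state, path)
    return state["total"]
```

Calling `run` again with the same path after a simulated failure resumes from the last saved step instead of from the beginning. Checkpointing after every step is the simplest policy; a real system would balance checkpoint overhead against the amount of work lost on failure.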

97

Fault Tolerance in a Multi-Layered DRE System: A Case Study  

Directory of Open Access Journals (Sweden)

Full Text Available

Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we encountered, some due to the fault tolerance requirements we needed to meet and others due to characteristics of the resource management software. The challenges include the need for extremely rapid recovery; supporting the characteristics of component middleware, including peer-to-peer communication and multi-tiered calling semantics; supporting multiple languages; and the co-existence of replicated and non-replicated elements. Making our multi-layer dynamic resource manager fault-tolerant required simultaneously overcoming all of these challenges, presenting a significant fault tolerance research challenge.

Matthew Gillen

2006-09-01

98

Interactive Animation of Fault Tolerant Parallel Algorithms.  

Science.gov (United States)

Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault...

S. W. Apgar

1992-01-01

99

Closed-loop fault-tolerant control for uncertain nonlinear systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We are designing, perhaps for the first time, closed-loop fault-tolerant control for uncertain nonlinear systems. Our solution is based on a new algebraic estimation technique of the derivatives of a time signal, which • yields good estimates of the unknown parameters and of the residuals, i.e., of the fault indicators, • is easily implementable in real time, • is robust with respect to a large variety of noises, without any necessity of knowing their statistical properties. Convincing ...

Fliess, Michel; Join, Cédric; Sira-Ramírez, Hebertt

2005-01-01

100

Fault tolerant control of outdoor air and AHU supply air temperature in VAV air conditioning systems using PCA method  

Energy Technology Data Exchange (ETDEWEB)

This paper presents a fault tolerant control method for the outdoor air ventilation and the AHU supply air temperature, which affect indoor air quality and humidity, respectively, so as to satisfy the ASHRAE Standard in VAV systems. The principal component analysis method, the joint angle method, and compensatory reconstruction are used to detect, isolate, and reconstruct the fault, respectively, for fault tolerant control. They are tested and evaluated in a simulation environment with fixed-bias faults in the temperature and flow sensors. (author)

Jin, Xinqiao; Du, Zhimin [Department of Mechanical Engineering, Shanghai Jiao Tong University (China)

2006-08-15
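The detection and reconstruction steps named in the abstract above can be illustrated with a deliberately small sketch: a two-sensor PCA model fitted on fault-free data, a squared prediction error (SPE) test for detection, and a reconstruction of the second sensor from the first. Everything here is an assumption for illustration: the two-sensor setup, the synthetic data, the threshold rule, and the function names. The joint angle isolation step is not shown, and this is not the authors' implementation.

```python
import math


def fit_pca_2d(samples):
    """Return (mean, unit direction of the first principal component)."""
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    # Closed-form principal axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def spe(sample, mean, direction):
    """Squared distance from the sample to the principal-component line."""
    dx, dy = sample[0] - mean[0], sample[1] - mean[1]
    score = dx * direction[0] + dy * direction[1]
    rx, ry = dx - score * direction[0], dy - score * direction[1]
    return rx * rx + ry * ry


def reconstruct_second(sample, mean, direction):
    """Estimate sensor 2 from sensor 1 along the principal component.

    Assumes the principal direction is not vertical (direction[0] != 0).
    """
    return mean[1] + (sample[0] - mean[0]) * direction[1] / direction[0]


# Synthetic fault-free data: sensor 2 is roughly twice sensor 1.
training = [(float(t), 2.0 * t + 0.05 * (-1) ** t) for t in range(20)]
mean, direction = fit_pca_2d(training)
# Hypothetical threshold: ten times the largest fault-free SPE.
threshold = 10 * max(spe(s, mean, direction) for s in training)

faulty = (10.0, 23.0)  # true value 20.0 plus a fixed bias of 3.0
fault_detected = spe(faulty, mean, direction) > threshold
repaired = reconstruct_second(faulty, mean, direction)
```

In this toy case the biased reading falls far from the principal-component line, so the SPE test flags it, and the reconstructed value lies close to the unbiased reading.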

 
 
 
 
101

Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems  

DEFF Research Database (Denmark)

This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each task, such that transient faults are tolerated, the timing constraints of the application are satisfied, and the energy consumed is minimized. Tasks are scheduled using fixed-priority preemptive scheduling, while replication is used for recovery from multiple transient faults. Addressing energy and reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We present a Tabu Search-based approach which uses an energy/reliability trade-off model to find reliable and schedulable implementations with limited energy and hardware resources. We evaluate the proposed algorithm using several synthetic and real-life benchmarks.

Gan, Junhe; Gruian, Flavius

2011-01-01

102

Fault Tolerant Computer Architecture  

CERN Document Server

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

Sorin, Daniel

2009-01-01

103

BYZANTINE FAULT TOLERANCE MODEL FOR SOAP FAULTS  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The proposed model configures a Byzantine Fault Tolerance mechanism for every SOAP fault message that is transmitted. Reliability and availability are major requirements of Web services, since they operate in a distributed environment. One of the reliability issues is handling faults. Faults occur in all phases of Service Oriented Architecture, i.e. during publishing, discovery, composition, binding, and execution. These faults may lead to service downtime, behaves abnormally, an...

2012-01-01

104

Tolerance of design faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The idea that diverse or dissimilar computations could be used to detect errors can be traced back to Dionysius Lardner's analysis of Babbage's mechanical computers in the early 19th century. In the modern era of electronic computers, diverse redundancy techniques were pioneered in the 1970s by Elmendorf, Randell, Avižienis and Chen. Since then, the tolerance of design faults has been a very active research topic, which has had practical impact on real critical applications. In this paper, ...

Powell, David; Arlat, Jean; Deswarte, Yves; Kanoun, Karama

2011-01-01
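The idea of using diverse computations to detect errors, mentioned in the abstract above, is often illustrated with a majority voter over independently written versions of the same function. The sketch below is a generic illustration of that idea and is not taken from the paper; the names `vote` and `run_versions` and the three example versions are hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Return (value, agreed); `agreed` is True only on a strict majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value, count > len(outputs) / 2


def run_versions(versions, x):
    """Run every diverse version on x; a crash counts as a distinct bad output."""
    results = []
    for index, version in enumerate(versions):
        try:
            results.append(version(x))
        except Exception:
            results.append(("failed", index))
    return vote(results)


# Three hypothetical implementations of "square an integer".
def square_by_multiplication(x):
    return x * x


def square_by_power(x):
    return x ** 2


def square_with_design_fault(x):
    # Deliberate design fault: wrong result for inputs above 10.
    return x * x + 1 if x > 10 else x * x


versions = [square_by_multiplication, square_by_power, square_with_design_fault]
```

With these versions, `run_versions(versions, 12)` masks the faulty third version because the other two agree. The voter only helps when the versions fail independently, which is the central assumption behind design diversity.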

105

Fault Injection and Monitoring Capability for a Fault-Tolerant Distributed Computation System  

Science.gov (United States)

The Configurable Fault-Injection and Monitoring System (CFIMS) is intended for the experimental characterization of effects caused by a variety of adverse conditions on a distributed computation system running flight control applications. A product of research collaboration between NASA Langley Research Center and Old Dominion University, the CFIMS is the main research tool for generating actual fault response data with which to develop and validate analytical performance models and design methodologies for the mitigation of fault effects in distributed flight control systems. Rather than a fixed design solution, the CFIMS is a flexible system that enables the systematic exploration of the problem space and can be adapted to meet the evolving needs of the research. The CFIMS has the capabilities of system-under-test (SUT) functional stimulus generation, fault injection and state monitoring, all of which are supported by a configuration capability for setting up the system as desired for a particular experiment. This report summarizes the work accomplished so far in the development of the CFIMS concept and documents the first design realization.

Torres-Pomales, Wilfredo; Yates, Amy M.; Malekpour, Mahyar R.

2010-01-01

106

An Introduction to Software Engineering and Fault Tolerance  

CERN Document Server

This book consists of chapters describing novel approaches to integrating fault tolerance into the software development process. They cover a wide range of topics focusing on fault tolerance during the different phases of software development, software engineering techniques for verification and validation of fault tolerance means, and languages for supporting fault tolerance specification and implementation. Accordingly, the book is structured into the following three parts: Part A: Fault tolerance engineering: from requirements to code; Part B: Verification and validation of fault tolerant systems; Part C: Languages and Tools for engineering fault tolerant systems.

Pelliccione, Patrizio; Guelfi, Nicolas; Romanovsky, Alexander

2010-01-01

107

A Fault Tolerant Colored Petri Net Model for Flexible Manufacturing Systems  

Scientific Electronic Library Online (English)

Full Text Available This paper introduces an approach based on Colored Petri Nets (CPN) to systematically introduce fault tolerance in the design of a supervisor for a Flexible Manufacturing System (FMS). The system is modeled by means of Place/Transition nets and is then structurally reduced, resulting in a CPN that is independent of a specific production route. The introduction of fault tolerance in the design of such a supervisor considers both forward recovery and backward recovery. For forward recovery we anticipate faults in resources in a production route and reschedule the production routes for production orders before the faulty resource is reached. Backward recovery is considered at the level of a resource, in such a way that when a faulty resource is fixed, the operation restarts on the last consistent operation executed

Barros, Tomaz C.; Figueiredo, Jorge C.A. de; Perkusich, Angelo

108

A Review of Checkpointing Based Fault Tolerance Techniques in Mobile Distributed Systems  

Directory of Open Access Journals (Sweden)

Full Text Available Fault tolerance techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. In case of a fault in a distributed system, checkpointing enables the execution of a program to be resumed from a previous consistent global state rather than from the beginning. In this way, the amount of useful processing lost because of the fault is significantly reduced. Checkpointing is an effective fault tolerance technique in distributed systems, as it avoids the domino effect and requires minimal storage. Most of the earlier coordinated checkpointing algorithms either block computation during checkpointing while forcing only a minimum number of processes to checkpoint, or are non-blocking but force checkpoints even though many of them may not be necessary, or are non-blocking and minimum-process but take useless checkpoints, or reduce useless checkpoints at the cost of higher synchronization message overhead or a long checkpoint request propagation time. In this paper, we discuss various issues related to checkpointing for distributed systems and mobile computing environments. We also present a survey of some checkpointing algorithms for distributed systems.

Rachit Garg

2010-07-01

109

Fault-Tolerant Heat Exchanger  

Science.gov (United States)

A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

Izenson, Michael G.; Crowley, Christopher J.

2005-01-01

110

Stochastic Models for Fault Tolerance  

CERN Multimedia

As modern society relies on the fault-free operation of complex computing systems, system fault-tolerance has become an indispensable requirement. Therefore, we need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware elements. Redundancy patterns are commonly used, for either redundancy in space or redundancy in time. Wolter's book details methods of redundancy in time that need to be issued at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right ti

Wolter, Katinka M

2010-01-01

111

STUDIES ON CONFIGURATION AND RECOVERY TECHNIQUES FOR FAULT-TOLERANT COMPUTING SYSTEMS  

Digital Repository Infrastructure Vision for European Research (DRIVER)

It is of great importance to operate a computer system with high reliability. Several techniques for achieving high reliability have been proposed and implemented in real computer systems. This dissertation discusses configuration and recovery techniques for fault-tolerant computing systems, for which stochastic models are presented to evaluate performance and/or reliability. Chapter 1 gives an introduction to configuration and recovery techniques based on the concept ...

??, ?.; ????, ???; Fukumoto, Satoshi

1992-01-01

112

Gyro autonomy and fault tolerance  

Science.gov (United States)

The paper deals with studies on autonomous systems making attitude control system hardware more redundant and fault-tolerant, with little or no hardware modification. Two different approaches were taken, one to assess how such capabilities could be enhanced without use of advanced computer science techniques such as artificial intelligence, and the other based on artificial intelligence expert systems. The goal of the investigation was to develop a reasoning process in order to detect operations failure of a satellite having redundant gyroscopes.

Abel, S.; Lachman, M. E.

1984-08-01

113

Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network  

Directory of Open Access Journals (Sweden)

Full Text Available Mobile agents are computer programs that act autonomously on behalf of a user or owner and travel through a network of heterogeneous machines. Fault tolerance is important along their itinerary. In this paper, existing methods of fault tolerance for mobile agents, which assume a linear network topology, are described. In these methods three agents cooperate to provide fault tolerance by detecting and recovering from server and agent failures. The three types of agents are: the actual agent, which performs programs for its owner; the witness agent, which monitors the actual agent and the witness agent after itself; and the probe, which is sent by the witness agent to recover the actual agent or the witness agent. The communication mechanism in these methods is message passing between the agents. We introduce our witness agent approach for fault tolerant mobile agent systems in a Two Dimensional Mesh (2D-Mesh) network. Our approach minimizes witness dependency in this network, and we then present its algorithm.

Ahmad Rostami

2010-09-01

114

Integrity-Enhancing Replica Coordination for Byzantine Fault Tolerant Systems  

CERN Document Server

Strong replica consistency is often achieved by writing deterministic applications, or by using a variety of mechanisms to render replicas deterministic. There exists a large body of work on how to render replicas deterministic under the benign fault model. However, when replicas can be subject to malicious faults, most of the previous work is no longer effective. Furthermore, the determinism of the replicas is often considered harmful from the security perspective, and for many applications, their integrity strongly depends on the randomness of some of their internal operations. This calls for new approaches towards achieving replica consistency while preserving the replica randomness. In this paper, we present two such approaches. One is based on Byzantine agreement and the other on threshold coin-tossing. Each approach has its strengths and weaknesses. We compare the performance of the two approaches and outline their respective best use scenarios.

Zhao, Wenbing

2008-01-01

115

Design of an active fault tolerant control for nonlinear systems described by a multi-model representation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, a new active Fault Tolerant Control (FTC) strategy is developed for nonlinear systems described by multiple linear models, to prevent system deterioration through the synthesis of adapted controllers. When a fault is detected by the fault detection and diagnosis scheme, the reconfigurable controller is designed automatically using a robust gain scheduling strategy. The main contribution concerns the design of state feedback gains through LMI both in fault-free and faulty cases in...

Rodrigues, Mickael; Theilliol, Didier; Sauter, Dominique

2005-01-01

116

Fault tolerant control for unstable systems: A linear time varying approach  

DEFF Research Database (Denmark)

In (passive) fault tolerant control design, the objective is to find a fixed compensator which will maintain a suitable performance, or at least stability, in the event that a fault should occur. A major theoretical obstacle to achieving this objective is that even if the system models corresponding to the occurrence of various faults are simultaneously stabilizable by a linear, time-invariant compensator, this compensator might have to be of very high order, as shown in a recent publication. In this paper, we propose a design procedure for a time-varying compensator, which overcomes the obstacle for any finite number of faults with a controller order of no more than the plant order. The performance of this compensator might be poor, but a heuristic procedure for improving the performance is also shown, and an example demonstrates that this improvement can be truly significant.

Niemann, Hans Henrik

2004-01-01

117

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems with Checkpointing and Replication  

DEFF Research Database (Denmark)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing constraints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

Pop, Paul

2009-01-01
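The trade-off behind "optimal placement of checkpoints" in the abstract above can be illustrated with a simple worst-case cost model, stated here as an assumption rather than as the paper's own formulation. A process with worst-case execution time `c` is split into `n` equal segments. Each checkpoint costs `overhead`, and each of `k` transient faults forces re-execution of one segment (`c / n`) plus a fixed `recovery` cost. Minimizing `c + n * overhead + k * (c / n + recovery)` over real `n` gives `n = sqrt(k * c / overhead)`. The sketch below rounds that value to the better neighbouring integer.

```python
import math


def worst_case_time(c, n, k, overhead, recovery):
    """Worst-case completion time with n checkpoints and k faults (assumed model)."""
    return c + n * overhead + k * (c / n + recovery)


def best_checkpoint_count(c, k, overhead, recovery):
    """Integer n >= 1 minimizing worst_case_time; requires overhead > 0."""
    real_opt = math.sqrt(k * c / overhead)
    # The cost is convex in n, so the integer optimum is floor or ceil of real_opt.
    candidates = {max(1, math.floor(real_opt)), max(1, math.ceil(real_opt))}
    return min(
        candidates,
        key=lambda n: (worst_case_time(c, n, k, overhead, recovery), n),
    )
```

For example, with `c = 100`, `k = 2`, `overhead = 2` and `recovery = 1` (all hypothetical values in the same time unit), the model favours ten checkpoints. More checkpoints shorten each re-execution but add overhead on every run, which is why the optimum grows only with the square root of `k * c`.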

118

New fault tolerant matrix converter  

Energy Technology Data Exchange (ETDEWEB)

The matrix converter (MC) presents a promising topology that will have to overcome certain barriers (protection systems, durability, the development of converters for real applications, etc.) in order to gain a foothold in the industry. In some applications, where continuous operation must be ensured in the case of a system failure, improved reliability of the converter is of particular importance. In this sense, this article focuses on the study of a fault tolerant MC. The fault tolerance of a converter is characterized by its total or partial response in the case of a breakage of any of its components. Taking into consideration that virtually no work has been done on fault tolerant MCs, this paper describes the most important studies in this area. Moreover, a new method is proposed for detecting the breakage of MC semiconductors. Likewise, a new variation of SVM modulation with failure tolerance capacity is presented. This guarantees the continuous operation of the converter and the pseudo-optimum control of a PMSM. This paper also proposes a novel MC topology, which allows the flexible reconfiguration of this converter when one or several of its semiconductors are damaged. The MC can thus continue operating at 100% of its performance without having to double its resources. The solution described in this article therefore represents a step forward towards the development of reliable matrix converters for real applications. (author)

Ibarra, Edorta; Andreu, Jon; Kortabarria, Inigo; Ormaetxea, Enekoitz; Alegria, Inigo Martinez de; Martin, Jose Luis [Department of Electronics and Telecommunications, University of the Basque Country, Alameda de Urquijo s/n, E-48013 Bilbao (Spain); Ibanez, Pedro [TECNALIA, Energy Unit, Parque Tecnologico de Zamudio, E-48170 Bizkaia (Spain)

2011-02-15

119

Fault-tolerant control design for trajectory tracking in driver assistance systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The paper proposes a control system with the brake and the steering for developing a driver assistance system. The purpose is to design a cruise control method to track the road geometry with a predefined velocity and guarantee the road stability of the vehicle simultaneously. An actuator selection method is developed in the control design, in which the actuator limits, energy requirements and vehicle operations are taken into consideration. The method is extended with a fault-tolerant featur...

Németh, Balázs; Gaspar, Peter; Bokor, Jozsef; Sename, Olivier; Dugard, Luc

2012-01-01

120

Dynamic modelling and simulation of fault-tolerant systems based on stochastic activity networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Dependability analysis is crucial to control the risks resulting from failures in modern industrial systems. This paper proposes a modeling approach that constructs dynamic models of fault-tolerant (FT) systems based on Stochastic Activity Networks (SANs). This approach allows the systematic inclusion of the diagnosis performances in the dependability analysis. This SAN-model is used jointly with the Monte Carlo simulation to assess the impact of the diagnosis performance on the availability ...

Maza, Samia

2012-01-01

 
 
 
 
121

Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Mobile agents are computer programs that act autonomously on behalf of a user or owner and travel through a network of heterogeneous machines. Fault tolerance is important along their itinerary. In this paper, existing methods of fault tolerance for mobile agents, which assume a linear network topology, are described. In these methods three agents cooperate to provide fault tolerance by detecting and recovering from server and agent failures. Three types of agents ar...

Ahmad Rostami; Hassan Rashidi; Majidreza Shams Zahraie

2010-01-01

122

Fault-Tolerant Identification in Wireless Sensor Networks for Maximizing System Lifetime  

Directory of Open Access Journals (Sweden)

Full Text Available Wireless Sensor Networks (WSNs) are used by many applications such as security, command and control, and surveillance monitoring. In all such applications, the main use of the WSN is sensing and retrieving data. Many WSN systems are query based: they respond within a stipulated time based on the user's query. However, a WSN is prone to sensor faults, which make it unreliable and drain the network energy, reducing the lifetime of the network. To overcome this, fault tolerance mechanisms can be used to improve reliability by finding failed nodes, which are then recovered by cluster heads. This paper presents an algorithm that can effectively increase the lifetime of a WSN while satisfying the QoS requirements of the application. The algorithm is adaptive and fault-tolerant. It uses path and source redundancy and is based on hop-by-hop data delivery. Empirical simulation results revealed that the proposed system is feasible. The system also provides authentication of all kinds of identified faults and delivers its services with quality. It increases the data flow and reduces the faults.

Middela Shailaja

2012-09-01
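To make the effect of path and source redundancy with hop-by-hop delivery concrete, the following sketch computes a delivery probability under strong simplifying assumptions that are not from the paper: every hop succeeds independently with the same probability, all paths have the same number of hops, and all paths and sources fail independently. The function name and parameters are hypothetical.

```python
def delivery_probability(hop_success, hops, paths, sources):
    """Probability that at least one reading reaches the sink (assumed model)."""
    # A path delivers only if every hop on it succeeds.
    path_ok = hop_success ** hops
    # A source succeeds if at least one of its redundant paths delivers.
    source_ok = 1 - (1 - path_ok) ** paths
    # The query succeeds if at least one redundant source succeeds.
    return 1 - (1 - source_ok) ** sources
```

With a per-hop success probability of 0.9 over two hops, one path delivers with probability 0.81, and a second independent path raises this to about 0.964. Each extra path or source also costs energy, which is the lifetime trade-off the abstract refers to.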

123

Fault Tolerant Control: A Simultaneous Stabilization Result  

DEFF Research Database (Denmark)

This paper discusses the problem of designing fault tolerant compensators that stabilize a given system both in the nominal situation and in the situation where one of the sensors or one of the actuators has failed. It is shown that such compensators always exist, provided that the system is detectable from each output and that it is stabilizable. The proof of this result is constructive, and a worked example shows how to design a fault tolerant compensator for a simple, yet challenging system. A family of second order systems is described that requires fault tolerant compensators of arbitrarily high order. Publication date: February

Stoustrup, Jakob; Blondel, V.D.

2004-01-01

124

Advanced information processing system: The Army Fault-Tolerant Architecture detailed design overview  

Science.gov (United States)

The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of the large amounts of situational data that occur during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as a key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical, necessitating an ultra-reliable, high-throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is a hard real-time, Byzantine fault-tolerant parallel processor which is programmed in the Ada language. This document describes the results of the Detailed Design (Phase 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements and analytical models.

Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven

1994-01-01

125

Lightweight Adaptive fault-tolerant data storage system (AFTSYS)  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The ARCOS research group of Universidad Carlos III de Madrid (Spain) has been working on flexible and adaptive data storage systems for several years. The storage systems developed are characterized by software governance, which makes them portable across different hardware storage resources, and by their dynamic adaptivity to the different circumstances of computer systems, following the autonomic system paradigm. They also allow getting high performance storage by using data distribution or striping across ...

2008-01-01

126

Design of multi-level fault tolerant systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The literature on reliable systems is composed of a very broad range of specific problems and solutions. Very few designs of reliable systems are reported in which an integrated methodology is taken into account as one of the most important design goals. This makes it difficult, in general, to provide good readability and the possibility of evaluating the proposed solutions for enhancing the reliability of these systems. The aim of this paper is to provide a structured methodology and sev...

Ciuffoletti, A.; Simoncini, L.

1981-01-01

127

A Fault Tolerant Mobile Agent Information Retrieval System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Problem statement: Most of the information retrieval systems used only client-server architectures. The client-server model though powerful, had some limitations. In mobile computing environment which has both wired network and wireless networks with limited communication capabilities, the performance of the system was very low. Approach: Mobile agents are considered a suitable technology to develop applications such as information retrieval system for mobile computing environme...

2010-01-01

128

Fault tolerance improvement for queuing systems under stress load  

International Nuclear Information System (INIS)

Various kinds of queuing information systems (exchange auction systems, web servers, SCADA) face unpredictable situations during operation, when the information flow that must be analyzed and processed rises sharply. Such stress-load situations often require human (dispatcher's or administrator's) intervention, which is why the time of the first denial of service is extremely important. A common queuing system architecture is described. Existing approaches to computing resource management are considered. A new late-first-denial-of-service resource management approach is proposed.

2009-01-01

129

Application of Joint Parameter Identification and State Estimation to a Fault-Tolerant Robot System  

DEFF Research Database (Denmark)

The joint parameter identification and state estimation technique is applied to develop a fault-tolerant space robot system. The potential faults in the considered system are abrupt parametric faults, which indicate that some system parameters will immediately deviate from their nominal values if a fault happens. The concerned system parameters consist of deterministic parts as well as those describing the stochastic features in the system. Due to the purpose for design of reconfigurable control, these deviated system parameters need to be identified as precisely and quickly as possible. Meanwhile, it would further simplify the reconfigurable design task and possibly speed up the system recovery, if the system state information under the new operating circumstance can be available along with faulty parameter information. The joint parameter identification and state estimation using the combined Kalman Filter and Maximum Likelihood (KF-ML) techniques is discussed and applied in this study. The simulation results on a space robot system showed that the proposed method is quite promising in providing both faulty parameter information and state estimation in a quick, accurate and robust manner.

Sun, Zhen; Yang, Zhenyu

2011-01-01

130

Active Fault Tolerant Control of Livestock Stable Ventilation System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems of livestock buildings in three steps: • Deriving a model for the climate c...

Gholami, Mehdi

2012-01-01

131

Active Fault Tolerant Control of Livestock Stable Ventilation System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems of livestock buildings in three steps: • Deriving a model for the climate c...

Gholami, Mehdi

2011-01-01

132

Fault-tolerant Agreement in Synchronous Message-passing Systems  

CERN Document Server

The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

Raynal, Michel

2010-01-01

133

Fault Tolerant Neural Network for ECG Signal Classification Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN) for ECG classification systems. This ANN includes a penalization criterion which makes the performances in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT-BIH ECG signals have shown that significant robustness improvements are obtained regarding potential hardware artificial neuron failures. Moreover, we ...

Merah, M.; Ouamri, A.; Nait-ali, A.; Keche, M.

2011-01-01

134

Fault Tolerant Neural Network for ECG Signal Classification Systems  

Directory of Open Access Journals (Sweden)

The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN) for ECG classification systems. This ANN includes a penalization criterion which makes the performances in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT-BIH ECG signals have shown that significant robustness improvements are obtained regarding potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performances, compared to the standard back-propagation algorithm.

MERAH, M.

2011-08-01

135

Microprocessor-based fault-tolerant reactor control and information system  

International Nuclear Information System (INIS)

The Reactor Manual Control System (RMCS) and the Rod Position Information System (RPIS), applied to boiling water reactor (BWR) power plants, are among the most important systems in controlling the output of nuclear reactor power. Improvement in the reliability of the RMCS and RPIS is thus highly important for availability during plant operation. This paper presents a highly reliable RMCS and RPIS employing microprocessors. The developed equipment has been made fault-tolerant by adopting redundancy in each system. The amount of cabling normally required has been reduced by multiplexing transmission via fiber-optic cable. The size of the control panel has been reduced and maintainability improved.

1990-01-01

136

Survey On Fault Tolerance In Grid Computing  

Directory of Open Access Journals (Sweden)

Grid computing is defined as a hardware and software infrastructure that enables coordinated resource sharing within dynamic organizations. In grid computing, the probability of a failure is much greater than in traditional parallel computing. Therefore, fault tolerance is an important property in order to achieve reliability, availability and QoS. In this paper, we give a survey of various fault tolerance techniques, fault management in different systems and related issues. A fault tolerance service deals with various types of resource failures, which include process failure, processor failure and network failures. This survey presents related research results on fault tolerance in distinct functional areas of grid infrastructure, gives future directions for fault tolerance techniques, and is a good reference for researchers.

P. Latchoumy

2011-12-01

137

A Survey on Software Fault tolerance in Parallel Computing  

Directory of Open Access Journals (Sweden)

Software almost inevitably contains defects. Do everything possible to reduce the fault rate; use fault tolerance techniques to deal with software faults. Fault tolerance is the ability of a system to perform its function correctly even in the presence of internal faults. Most ordinary systems lack a fault-tolerant software fix. This paper surveys various software fault tolerance techniques and methodologies. The conventional fault-tolerant approaches, viz. Recovery Block (RB), N-Version Programming (NVP), etc., are too costly to apply in an ordinary low-cost application system because both RB and NVP rely on multiple (at least three) versions of both software and computing machines.
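The two conventional approaches named in this abstract can be illustrated with a minimal Python sketch. It is not taken from the surveyed paper: the function names `n_version_vote` and `recovery_block` are hypothetical, each "version" is assumed to be a plain callable returning a hashable result, and the acceptance test is supplied by the caller.

```python
from collections import Counter

def n_version_vote(versions, x):
    """N-version programming: run every version, return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashed version casts no vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    value, count = Counter(results).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just survivors.
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value

def recovery_block(alternates, acceptance_test, x):
    """Recovery block: try alternates in order until one passes the acceptance test."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue
        if acceptance_test(x, result):
            return result
    raise RuntimeError("every alternate was rejected")

# Hypothetical usage: three versions of "square", one of them faulty.
VERSIONS = [lambda x: x * x, lambda x: x ** 2, lambda x: x + 1]
VOTED = n_version_vote(VERSIONS, 3)
RECOVERED = recovery_block([lambda x: -1, lambda x: x * x],
                           lambda x, r: r >= 0, 3)
```

With three versions a single faulty one is outvoted, which is consistent with the abstract's remark that these schemes need at least three versions and are therefore costly.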

Jashan Deep

2013-08-01

138

(m,n)-Semirings and a Generalized Fault Tolerance Algebra of Systems  

CERN Document Server

We propose a new class of mathematical structures called (m,n)-semirings (which generalize the usual semirings), and describe their basic properties. We also define partial ordering, and generalize the concepts of congruence, homomorphism, ideals, etc., for (m,n)-semirings. Following earlier work by Rao, we consider a system as made up of several components whose failures may cause it to fail, and represent the set of systems algebraically as an (m,n)-semiring. Based on the characteristics of these components we present a formalism to compare the fault tolerance behaviour of two systems using our framework of a partially ordered (m,n)-semiring.

Alam, Syed Eqbal; Davvaz, Bijan

2010-01-01

139

An integrated methodology for the dynamic performance and reliability evaluation of fault-tolerant systems  

International Nuclear Information System (INIS)

We propose an integrated methodology for the reliability and dynamic performance analysis of fault-tolerant systems. This methodology uses a behavioral model of the system dynamics, similar to the ones used by control engineers to design the control system, but also incorporates artifacts to model the failure behavior of each component. These artifacts include component failure modes (and associated failure rates) and how those failure modes affect the dynamic behavior of the component. The methodology bases the system evaluation on the analysis of the dynamics of the different configurations the system can reach after component failures occur. For each of the possible system configurations, a performance evaluation of its dynamic behavior is carried out to check whether its properties, e.g., accuracy, overshoot, or settling time, which are called performance metrics, meet system requirements. Markov chains are used to model the stochastic process associated with the different configurations that a system can adopt when failures occur. This methodology not only enables an integrated framework for evaluating dynamic performance and reliability of fault-tolerant systems, but also enables a method for guiding the system design process, and further optimization. To illustrate the methodology, we present a case-study of a lateral-directional flight control system for a fighter aircraft
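As a rough illustration of how Markov chains can tie configurations to reliability, the sketch below treats each system configuration as a state of a continuous-time Markov chain and defines reliability as the probability of being in a configuration whose performance metrics are acceptable. Everything in it is an assumption made for illustration: the duplex system, the failure rate `LAM`, the set `ACCEPTABLE`, and the simple forward-Euler integration are not from the abstract.

```python
import math

def transient_probabilities(rates, p0, t, steps=10000):
    """Integrate dp/dt = p Q by forward Euler; rates[i][j] is the i -> j rate."""
    n = len(p0)
    dt = t / steps
    p = list(p0)
    for _ in range(steps):
        flow = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and rates[i][j] > 0.0:
                    moved = p[i] * rates[i][j] * dt
                    flow[i] -= moved   # probability leaving configuration i
                    flow[j] += moved   # ... arrives in configuration j
        p = [p[k] + flow[k] for k in range(n)]
    return p

def reliability(rates, p0, acceptable, t):
    """Probability of being in a configuration that meets requirements at time t."""
    p = transient_probabilities(rates, p0, t)
    return sum(p[s] for s in acceptable)

# Assumed example: duplex system, no repair, per-component failure rate LAM.
# State 0: both components work; 1: one has failed; 2: both have failed.
LAM = 1e-3
RATES = [[0.0, 2 * LAM, 0.0],
         [0.0, 0.0, LAM],
         [0.0, 0.0, 0.0]]
ACCEPTABLE = {0, 1}  # assumed: degraded configuration 1 still meets the metrics
T = 1000.0
R_NUMERIC = reliability(RATES, [1.0, 0.0, 0.0], ACCEPTABLE, T)
# Closed form for this particular chain, used only as a cross-check.
R_EXACT = 2 * math.exp(-LAM * T) - math.exp(-2 * LAM * T)
```

If a performance evaluation showed that the degraded configuration violated, say, a settling-time requirement, it would simply be dropped from `ACCEPTABLE`, and the computed reliability would fall accordingly.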

2008-11-01

140

Fault-Tolerant Process Control Methods and Applications  

CERN Document Server

Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure an efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes which explicitly deal with actuator/controller failures and sensor faults and data losses. Specifically, the book puts forward: · a framework for detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; · controller reconfiguration and safe-parking-based fault-handling methodologies; · integrated-data- and model-based fault-detection and isolation and fault-tolerant control methods; · methods for handling sensor faults and data losses; and · ...

Mhaskar, Prashant; Christofides, Panagiotis D

2013-01-01

 
 
 
 
141

A Study on Fault-Tolerant Software Architecture for COTS-Based Dependable System  

International Nuclear Information System (INIS)

Recently, with the rapid development of digital computers and information processing technologies, nuclear instrumentation and control (I&C) systems which need safety-critical functions have adopted digital technologies. Also, the use of commercial off-the-shelf (COTS) software in safety-critical systems has increased for several reasons, such as economic efficiency and technical problems. However, it requires a considerable integration effort and raises software quality and safety issues. COTS software is usually provided as a black box that cannot be modified. The biggest problem when we integrate such a product into dependable systems is the reliability of the COTS software. There is no guarantee that the software will perform its function correctly. It may have bugs or unidentified components. Recently, software verification and validation (V&V) has been accepted as a way to assure the dependability of newly developed safety-critical nuclear I&C software. But, because of the limitations of COTS software, software V&V cannot be applied as rigorously as to newly developed software. Considerable attention has been given to describing software architectures with respect to their dependability properties. In this paper, we present a fault-tolerant software architecture using the C2 architectural style. The remainder of the paper is organized as follows: Section 2 discusses background work on COTS software in nuclear I&C, software fault tolerance and the C2 architectural style. Section 3 describes the architecture for fault-tolerant COTS-based software. Finally, we discuss the conclusion and future work.

2005-10-27

142

Analysing Fault Tolerance for Erlang Applications  

Digital Repository Infrastructure Vision for European Research (DRIVER)

ERLANG is a concurrent functional language, well suited for distributed, highly concurrent and fault-tolerant software. An important part of Erlang is its support for failure recovery. Fault tolerance is provided by organising the processes of an ERLANG application into tree structures. In these structures, parent processes monitor failures of their children and are responsible for their restart. Libraries support the creation of such structures during system initialisation. A technique to aut...

Nyström, Jan Henry

2009-01-01

143

Synthesis of Fault-Tolerant Schedules with Transparency/Performance Trade-offs for Distributed Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach to the scheduling of fault-tolerant embedded systems for safety-critical applications. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. If process recovery is performed such that the operation of other processes is not affected, we call it transparent recovery. Although transparent recovery has the advantages of fault containment, improved debuggability and less memory needed to store the fault-tolerant schedules, it will introduce delays that can violate the timing constraints of the application. We propose a novel algorithm for the synthesis of fault-tolerant schedules that can handle the transparency/performance trade-offs imposed by the designer, and makes use of the fault-occurrence information to reduce the overhead due to fault tolerance. We model the application as a conditional process graph, where the fault-occurrence information is represented as conditional edges and the transparent recovery is captured using synchronization nodes.
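The delay cost of transparent recovery can be made concrete with a deliberately simplified worst-case calculation. The model below is an assumption for illustration only and is not the paper's conditional-process-graph algorithm: a chain of processes runs on one processor, at most `k` transient faults occur per cycle, each re-execution costs the process's worst-case execution time plus a recovery overhead `mu`, and the function names are hypothetical.

```python
def schedule_length_transparent(wcets, k, mu):
    """Fully transparent recovery: every process reserves its own slack for
    k re-executions, so no other process's start time ever shifts."""
    return sum(c + k * (c + mu) for c in wcets)

def schedule_length_shared_slack(wcets, k, mu):
    """Non-transparent recovery: one slack pool is shared by all processes.
    The worst case is all k faults hitting the longest process."""
    return sum(wcets) + k * (max(wcets) + mu)

# Assumed example: three processes, up to two transient faults, overhead 5.
WCETS = [10, 20, 30]
TRANSPARENT = schedule_length_transparent(WCETS, k=2, mu=5)
SHARED = schedule_length_shared_slack(WCETS, k=2, mu=5)
```

Under these assumptions the transparent schedule is noticeably longer than the shared-slack one, which is the kind of gap a designer trades against fault containment and debuggability when choosing which processes to make transparent.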

Izosimov, Viacheslav; Pop, Paul

2006-01-01

144

Fault tolerant synchronization of chaotic heavy symmetric gyroscope systems versus external disturbances via Lyapunov rule-based fuzzy control.  

Science.gov (United States)

In this paper, fault tolerant synchronization of chaotic gyroscope systems versus external disturbances via Lyapunov rule-based fuzzy control is investigated. Taking the general nature of faults in the slave system into account, a new synchronization scheme, namely, fault tolerant synchronization, is proposed, by which the synchronization can be achieved no matter whether the faults and disturbances occur or not. By making use of a slave observer and a Lyapunov rule-based fuzzy control, fault tolerant synchronization can be achieved. Two techniques are considered as control methods: classic Lyapunov-based control and Lyapunov rule-based fuzzy control. On the basis of Lyapunov stability theory and fuzzy rules, the nonlinear controller and some generic sufficient conditions for global asymptotic synchronization are obtained. The fuzzy rules are directly constructed subject to a common Lyapunov function such that the error dynamics of two identical chaotic motions of symmetric gyros satisfy stability in the Lyapunov sense. Two proposed methods are compared. The Lyapunov rule-based fuzzy control can compensate for the actuator faults and disturbances occurring in the slave system. Numerical simulation results demonstrate the validity and feasibility of the proposed method for fault tolerant synchronization. PMID:21868010

Farivar, Faezeh; Shoorehdeli, Mahdi Aliyari

2012-01-01

145

State of the art on fault-tolerant real time distributed systems  

International Nuclear Information System (INIS)

The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies more and more stringent requirements as to communication system reliability. For if an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. It is for this reason that, in the context of preparation for the future PWR 2000 standardized nuclear plants, it would seem appropriate to take a look at fault-tolerant communication systems. Since C&I type applications (in the control room) are divided between several computers and are required to contend with extremely severe time constraints, EDF has undertaken an investigation of fault-tolerant, real time distributed systems. This paper summarizes the state of the art in the field as it appears from discussion with computer manufacturers, academics and research workers on related projects. The results obtained were then used to determine trends as to "promising" solutions. The paper concludes with recommended study programs for the PCC department of EDF/R&DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

1992-01-01

146

Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems  

Science.gov (United States)

This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

Malekpour, Mahyar R.

2007-01-01

147

A multi-layer robust adaptive fault tolerant control system for high performance aircraft  

Science.gov (United States)

Modern high-performance aircraft demand advanced fault-tolerant flight control strategies. Not only control effector failures but also aerodynamic failures such as wing-body damage often result in substantially deteriorated performance because of low available redundancy. As a result, the remaining control actuators may yield substantially lower maneuvering capabilities which do not permit the accomplishment of the aircraft's original specified mission. The problem is to solve the control reconfiguration on the available control redundancies when a mission modification is needed to save the aircraft. The proposed robust adaptive fault-tolerant control (RAFTC) system consists of a multi-layer reconfigurable flight controller architecture. It contains three layers accounting for different types and levels of failures, including sensor, actuator, and fuselage damage. In case of nominal operation with possible minor failure(s), a standard adaptive controller achieves the control allocation. This is referred to as the first layer, the controller layer. The performance adjustment is accounted for in the second layer, the reference layer, whose role is to adjust the reference model in the controller design with a degraded transient performance. The uppermost adjustment, that of the mission, takes place in the third layer, the mission layer, when the original mission is not feasible with greatly restricted control capabilities. The modified mission is achieved through the optimization of the command signal, which guarantees the boundedness of the closed-loop signals. The main distinguishing feature of this layer is the mission decision property based on the currently available resources. The contribution of the research is the multi-layer fault-tolerant architecture that can address the complete failure scenarios and their accommodation in practice.
Moreover, the emphasis is on the mission design capabilities which may guarantee the stability of the aircraft with restricted post-failure control capabilities. The implementation issues of the architecture are also addressed, with possible realizations and the feasibility analysis.

Huo, Ying

148

Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems  

Energy Technology Data Exchange (ETDEWEB)

Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures.
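The contrast with checkpointing can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' particle simulation: the state is a single particle with integer position and velocity (integers keep the reversal exact), each event carries a hypothetical `dt` and `dv`, and only the small event log is retained instead of full state snapshots.

```python
def step_forward(state, event):
    """Apply one event: move with the current velocity, then change velocity."""
    state["pos"] += state["vel"] * event["dt"]
    state["vel"] += event["dv"]

def step_reverse(state, event):
    """Undo step_forward by inverting its two updates in the opposite order."""
    state["vel"] -= event["dv"]
    state["pos"] -= state["vel"] * event["dt"]

def run(state, events, log):
    """Execute events forward, logging each event (not the state) for rollback."""
    for event in events:
        step_forward(state, event)
        log.append(event)

def rollback(state, log, n):
    """Roll the state back by n events using reverse computation only."""
    for _ in range(n):
        step_reverse(state, log.pop())

# Assumed example: three events, then a fault forces a two-event rollback.
STATE = {"pos": 0, "vel": 1}
LOG = []
run(STATE, [{"dt": 2, "dv": 1}, {"dt": 1, "dv": -3}, {"dt": 4, "dv": 2}], LOG)
rollback(STATE, LOG, 2)
```

A checkpointing scheme would instead copy the whole state before each event; here the memory cost per event is just the event itself, which hints at why the approach is attractive when memory and file-system bandwidth are scarce. Operations that destroy information (for example overwriting a value) would need the lost bits saved explicitly, which this sketch does not cover.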

Perumalla, Kalyan S [ORNL; Park, Alfred J [ORNL

2013-01-01

149

Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs  

DEFF Research Database (Denmark)

In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. We propose a fine-grained transparent recovery, where the property of transparency can be selectively applied to processes and messages. Transparency hides the recovery actions in a selected part of the application so that they do not affect the schedule of other processes and messages. While leading to longer schedules, transparent recovery has the advantage of both improved debuggability and less memory needed to store the fault-tolerant schedules.

Izosimov, Viacheslav; Pop, Paul

2012-01-01

150

Fault-tolerant adaptive control for load-following in static space nuclear power systems  

International Nuclear Information System (INIS)

In this paper the possible use of a dual-loop, model-based adaptive control system for load-following in static space nuclear power systems is investigated. The objective of the fault-tolerant, autonomous control system is to deliver the demanded electric power at the desired voltage level by appropriately manipulating the neutron power through the control drums. As a result, sufficient thermal power is produced to meet the required demand in the presence of dynamically changing system operating conditions and potential sensor failures. The designed controller is proposed for use in combination with the currently considered shunt regulators, or as a back-up controller when other means of power system control, including some of the sensors, fail.

1992-08-03

151

Design of Fault-Tolerant Control for Trajectory Tracking  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The paper proposes a fault-tolerant integrated control system with the brake and the steering for developing a driver assistance system. The purpose is to design a fault-tolerant control which is able to guarantee the trajectory tracking and lateral stability of the vehicle against actuator fault scenarios. Since both actuators affect the lateral dynamics of the vehicle, in the control design a balance and priority between them must be achieved. The method is extended with a fault-tolerant fe...

Németh, Balazs; Gaspar, Peter; Bokor, Jozsef; Sename, Olivier; Dugard, Luc

2012-01-01

152

Advances in Database Technology: F1-Fault Tolerant RDBMS, C-Block and Q system  

Directory of Open Access Journals (Sweden)

In this paper, we discuss the latest database technologies that address the critical challenges currently faced by organizations, for which managing data effectively has become a major need. In particular, we discuss F1, a fault-tolerant distributed RDBMS that is a hybrid database combining the scalability of Bigtable and the functionality of SQL. We then discuss the C-Block system, which addresses the challenge of identifying duplicates in large datasets for better efficiency, and the Q system for efficient data integration, which performs automatic data integration for incoming datasets. Finally, we examine the integration of all these technologies in a system that would address the issues pertaining to data management.

Y. Sailaja, M. Nalini Sri

2013-04-01

153

A Constraint Logic Programming Framework for the Synthesis of Fault-Tolerant Schedules for Distributed Embedded Systems  

DEFF Research Database (Denmark)

We present a constraint logic programming (CLP) approach for synthesis of fault-tolerant hard real-time applications on distributed heterogeneous architectures. We address time-triggered systems, where processes and messages are statically scheduled based on schedule tables. We use process re-execution for recovering from multiple transient faults. We propose three scheduling approaches, (i) full transparency, (ii) slack sharing and (iii) conditional, each of which presents a trade-off between schedule simplicity and performance and provides a different degree of transparency. We have developed a CLP framework that produces the fault-tolerant schedules, guaranteeing schedulability in the presence of transient faults. We show how the framework can be used to tackle design optimization problems. The proposed approach has been evaluated using extensive experiments.

Poulsen, Kåre Harbo; Pop, Paul

2007-01-01

154

Fault Tolerant Control of Induction Motor  

Directory of Open Access Journals (Sweden)

The principle of vector control of electrical machines is to control both the magnitude and the phase of each phase current and voltage. A MATLAB/Simulink simulation has been performed to assess the operating features of the proposed scheme. A Proportional Integral (PI) speed controller is designed in this paper. The test response of the developed variable speed drive, along with the simulated response, is given and discussed in detail for torque and speed. The fault-tolerance principle is applied to the system when it is subject to a fault. Two faults are investigated in this paper: a stator winding short circuit and a broken rotor bar. The induction motor operates with acceptable performance in both speed and torque. The induction motor modeling, along with the vector control fault-tolerant scheme, is investigated to show the optimal response of the control system.

Khalaf Salloum Gaeid

2011-08-01

155

Design of fault tolerant control system for individual blade control helicopters  

Science.gov (United States)

This dissertation presents the development of a fault tolerant control scheme for helicopters fitted with individually controlled blades. This novel approach attempts to improve fault tolerant capabilities of helicopter control system by increasing control redundancy using additional actuators for individual blade input and software re-mixing to obtain nominal or close to nominal conditions under failure. An advanced interactive simulation environment has been developed including modeling of sensor failure, swashplate actuator failure, individual blade actuator failure, and blade delamination to support the design, testing, and evaluation of the control laws. This simulation environment is based on the blade element theory for the calculation of forces and moments generated by the main rotor. This discretized model allows for individual blade analysis, which in turn allows measuring the consequences of a stuck blade, or loss of the surface area of the blade itself, with respect to the dynamics of the whole helicopter. The control laws are based on non-linear dynamic inversion and artificial neural network augmentation, which is a mix of linear and nonlinear methods that compensates for model inaccuracies due to linearization or failure. A stability analysis based on the Lyapunov function approach has shown that bounded tracking error is guaranteed, and under specific circumstances, global stability is guaranteed as well. An analysis over the degrees of freedom of the mechanical system and its impact over the helicopter handling qualities is also performed to measure the degree of redundancy achieved with the addition of individual blade actuators as compared to a classic swashplate helicopter configuration. 
Mathematical analysis and numerical simulation using reconfiguration of the individual blade control under failure have shown that this control architecture can potentially improve the survivability of the aircraft and reduce pilot workload under failure conditions.

Tamayo, Sergio

156

Fault Tolerant Homopolar Magnetic Bearings  

Science.gov (United States)

Magnetic suspensions (MS) satisfy the long life and low loss conditions demanded by satellite and ISS based flywheels used for Energy Storage and Attitude Control (ACESE) service. This paper summarizes the development of a novel MS that improves reliability via fault tolerant operation. Specifically, flux coupling between poles of a homopolar magnetic bearing is shown to deliver desired forces even after termination of coil currents to a subset of failed poles. Linear, coordinate decoupled force-voltage relations are also maintained before and after failure by bias linearization. Current distribution matrices (CDMs) which adjust the currents and fluxes following a pole set failure are determined for many faulted pole combinations. The CDMs and the system responses are obtained utilizing 1D magnetic circuit models with fringe and leakage factors derived from detailed, 3D, finite element field models. Reliability results are presented vs. detection/correction delay time and individual power amplifier reliability for 4, 6, and 7 pole configurations. Reliability is shown for two success criteria, i.e. (a) no catcher bearing contact following pole failures and (b) re-levitation off of the catcher bearings following pole failures. An advantage of the method presented over other redundant operation approaches is a significantly reduced requirement for backup hardware such as additional actuators or power amplifiers.

Li, Ming-Hsiu; Palazzolo, Alan; Kenny, Andrew; Provenza, Andrew; Beach, Raymond; Kascak, Albert

2003-01-01

157

Coordinated Fault Tolerance for High-Performance Computing  

Energy Technology Data Exchange (ETDEWEB)

Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

Dongarra, Jack; Bosilca, George; et al.

2013-04-08

158

A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices  

DEFF Research Database (Denmark)

This paper studies the double open-switch fault occurring within a conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution based on a revised modulation strategy. In this switching strategy, the rectifier-stage modulation is adjusted based on knowledge of the switching logic of the inverter stage and the operating input voltage sectors. The proposed fault-tolerant method does not rely on any redundant power devices or on any reconfiguration of the matrix converter circuit through redundant physical connections. It is shown that different locations of the double open switch affect the availability of the revised modulation. The steady-state absolute speed error achieved with the proposed method is 4% of the nominal speed. Experiments are performed to demonstrate the efficacy of the proposed method.

Chen, Der-Fa; Nguyen-Duy, Khiem

2012-01-01

159

Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems  

Directory of Open Access Journals (Sweden)

Industrial distributed networked control systems use different communication networks to exchange information of different criticality levels. Real-time control, fault diagnosis (FDI) and Fault Tolerant Networked Control (FTNC) systems place some of the more stringent demands on data exchange in the communication networks of these networked control systems (NCS). When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of spatially distributed, network-connected sensors and actuators. To solve this issue, an FTNC platform and toolbox are presented in this paper using simple and verifiable principles coming mainly from a decentralized design based on causal modelling partitioning of the NCS and distributed computing using the multi-agent systems paradigm, allowing the use of agents with well established FTC methodologies or new ones developed taking into account the NCS specificities. The multi-agent platform and toolbox for FTNC systems have been built in the Matlab/Simulink environment, which is nowadays the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

José Sá da Costa

2009-04-01

160

Network fault tolerance in LA-MPI  

Energy Technology Data Exchange (ETDEWEB)

LA-MPI is a high-performance, network-fault-tolerant implementation of MPI designed for terascale clusters that are inherently unreliable due to their very large number of system components and to trade-offs between cost and performance. This paper reviews the architectural design of LA-MPI, focusing on our approach to guaranteeing data integrity. We discuss our network data path abstraction that makes LA-MPI highly portable, gives high performance through message striping, and most importantly provides the basis for network fault tolerance. Finally we include some performance numbers for the Quadrics and UDP network paths.

Aulwes, R. T. (Robbie T.); Daniel, D. J. (David J.); Desai, N. N. (Nehal N.); Graham, R. L. (Richard L.); Risinger, L. D. (Larrd Dean); Sukalski, M. W. (Mitchel W.); Taylor, M. A. (Mark)

2003-01-01

161

Fault Tolerant Analysis For Holonic Manufacturing Systems Based On Collaborative Petri Nets  

Directory of Open Access Journals (Sweden)

Uncertainties are significant characteristics of today's manufacturing systems. Holonic manufacturing systems are new paradigms to handle uncertainties and changes in manufacturing environments. Among many sources of uncertainties, failure prone machines are one of the most important ones. This paper focuses on handling machine failures in holonic manufacturing systems. Machine failure will reduce the number of available resources. Feasibility analysis needs to be conducted to check whether the works in process can be completed. To facilitate feasibility analysis, we characterize feasible conditions for systems with failure prone machines. This paper combines the flexibility and robustness of multi-agent theory with the modeling and analytical power of Petri nets to adaptively synthesize Petri net agents to control holonic manufacturing systems. The main results include: (1) a collaborative Petri net (CPN) agent model for holonic manufacturing systems, (2) a feasible condition to test whether a certain type of machine failure is allowed based on collaborative Petri net agents and (3) fault tolerant analysis of the proposed method.

Fu-Shiung Hsieh

2003-04-01

162

Fault-tolerance characteristics of neural networks  

Energy Technology Data Exchange (ETDEWEB)

A methodology for measuring and improving the fault tolerance characteristics of neural networks is presented. Sensitivity analysis and headroom analysis programs have been developed using fault models more realistic and appropriate for emulating hardware failures than those used previously. The potential mode of failure is simulated as a corruption of stored weight and threshold values. These analysis tools enable the fault tolerance characteristics of neural networks to be evaluated. It is demonstrated how functionally identical neural networks can have significantly different reliability characteristics should they be subjected to hardware platform failures. Criteria for selection of globally optimal architecture and trained state are discussed using results provided by the sensitivity and headroom analysis programs. These criteria, combined with empirical results, lead to implied design rules which can be adopted by engineers of neural networks for improving and maximizing the fault tolerance characteristics of the system. A novel modification to the backward error propagation training algorithm is discussed and evaluated on its effectiveness in improving the robustness of the trained network. The modification involves the deliberate injection of a small amount of random white noise on the network's weights and thresholds to expedite and increase the likelihood of a more optimal convergence state occurring for the network from the aspect of fault tolerance. The methodology is demonstrated using an iterative design scenario and is shown to be effective under certain circumstances.

Chun, R.K.L.

1989-01-01
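
The weight-corruption idea in entry 162 can be illustrated with a minimal sketch. This is not the author's sensitivity or headroom analysis program: the single threshold unit, the multiplicative fault model (`scale`) and all function names are assumptions chosen for illustration. It shows how two functionally identical units (both compute logical AND) can react very differently to the same corruption of one stored value.

```python
from itertools import product

def neuron(weights, threshold, inputs):
    """Single threshold unit: fires when the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

def truth_table(weights, threshold):
    """Outputs of the unit for every binary input pattern."""
    return [neuron(weights, threshold, x)
            for x in product((0, 1), repeat=len(weights))]

def fault_sensitivity(weights, threshold, scale):
    """Fraction of single-parameter faults that change the truth table.

    Each fault multiplies one stored value (a weight or the threshold) by
    `scale`, a crude, assumed stand-in for a corrupted memory cell.
    """
    reference = truth_table(weights, threshold)
    n_faults = len(weights) + 1
    broken = 0
    for i in range(n_faults):
        w = list(weights)
        t = threshold
        if i < len(weights):
            w[i] *= scale      # corrupt one weight
        else:
            t *= scale         # corrupt the threshold
        if truth_table(w, t) != reference:
            broken += 1
    return broken / n_faults

# Two AND gates with identical behavior but different headroom.
tight = ((1, 1), 2)      # weighted sum sits exactly on the threshold
roomy = ((1, 1), 1.5)    # threshold placed midway between the two classes
```

With `scale=0.8`, the `tight` unit loses its function for two of the three possible single faults, while the `roomy` unit survives all three, which is the kind of difference a headroom analysis is meant to expose.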

163

Fault-tolerant Control of Discrete-time LPV systems using Virtual Actuators and Sensors  

DEFF Research Database (Denmark)

This paper proposes a new fault-tolerant control (FTC) method for discrete-time linear parameter varying (LPV) systems using a reconfiguration block. The basic idea of the method is to achieve the FTC goal without re-designing the nominal controller by inserting a reconfiguration block between the plant and the nominal controller. The reconfiguration block is realized by an LPV virtual actuator and an LPV virtual sensor. Its goal is to transform the signals from the faulty system such that its behavior is similar to that of the nominal system from the viewpoint of the controller. Furthermore, it transforms the output of the controller for the faulty system such that the stability and performance goals are preserved. Input-to-state stabilizing LPV gains of the virtual actuator and sensor are obtained by solving linear matrix inequalities (LMIs). We show that separate design of these gains guarantees the input-to-state stability (ISS) of the closed-loop reconfigured system. Moreover, we obtain performances in terms of the ISS gains for the virtual actuator, the virtual sensor and their interconnection. Minimizing these performances is formulated as convex optimization problems subject to LMI constraints. Finally, the effectiveness of the method is demonstrated via a numerical example and stator current control of an induction motor.

Tabatabaeipour, Mojtaba; Stoustrup, Jakob

2014-01-01

164

Fault Tolerance-Challenges, Techniques and Implementation in Cloud Computing  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault tolerance is a major concern to guarantee availability and reliability of critical services as well as application execution. In order to minimize failure impact on the system and application execution, failures should be anticipated and proactively handled. Fault tolerance techniques are used to predict these failures and take an appropriate action before failures actually occur. This paper discusses the existing fault tolerance techniques in cloud computing based on their policies, to...

2012-01-01

165

Scalable distributed consensus to support MPI fault tolerance.  

Energy Technology Data Exchange (ETDEWEB)

As system sizes increase, the amount of time in which an application can run without experiencing a failure decreases. Exascale applications will need to address fault tolerance. In order to support algorithm-based fault tolerance, communication libraries will need to provide fault-tolerance features to the application. One important fault-tolerance operation is distributed consensus. This is used, for example, to collectively decide on a set of failed processes. This paper describes a scalable, distributed consensus algorithm that is used to support new MPI fault-tolerance features proposed by the MPI 3 Forum's fault-tolerance working group. The algorithm was implemented and evaluated on a 4,096-core Blue Gene/P. The implementation was able to perform a full-scale distributed consensus in 305 μs and scaled logarithmically.

Buntinas, D. (Mathematics and Computer Science)

2011-01-01
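
To make the logarithmic scaling mentioned in entry 165 concrete, here is a hedged sketch that simulates agreeing on a set of failed processes with a tree-shaped reduce followed by a broadcast. It is not the algorithm from the paper: the function name, the pairing pattern and the assumption that the tree itself stays intact during the operation are all simplifications made for illustration. A real implementation has to route around the failed processes it is trying to agree on.

```python
def agree_on_failed(local_suspects):
    """Simulate a tree reduce plus broadcast over per-rank suspect sets.

    local_suspects[r] is the set of ranks that rank r believes have failed.
    Returns (agreed_set, rounds), where rounds counts communication steps
    for the reduce phase and the mirrored broadcast phase.
    """
    n = len(local_suspects)
    merged = [set(s) for s in local_suspects]
    rounds = 0
    step = 1
    # Reduce phase: in each round, rank r absorbs the set held by rank r + step.
    while step < n:
        for r in range(0, n, 2 * step):
            partner = r + step
            if partner < n:
                merged[r] |= merged[partner]
        step *= 2
        rounds += 1
    agreed = frozenset(merged[0])
    # Broadcast phase: the root sends the result back down the same tree,
    # which takes as many rounds as the reduce.
    rounds *= 2
    return agreed, rounds
```

For 4,096 ranks this sketch needs 12 reduce rounds and 12 broadcast rounds, so doubling the number of ranks adds only two rounds.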

166

Heap Base Coordinator Finding with Fault Tolerant Method in Distributed Systems  

Directory of Open Access Journals (Sweden)

Coordinator finding in wireless networks is a very important problem, and it is solved by suitable algorithms. The main goals of coordinator finding are synchronizing the processes and making optimal use of the resources. Many different algorithms have been presented for coordinator finding. The most important leader election algorithms are the Bully and Ring algorithms. In this paper we analyze and compare these algorithms with each other, and we propose a new heap-based approach with fault tolerant mechanisms for coordinator finding in wireless environments. Our algorithm's running time and message complexity compare favorably with existing algorithms. Our work involves substantial modifications of an existing algorithm and its proof, and we adapt the existing algorithms to the noisy environment based on fault tolerant mechanisms.

Mehdi EffatParvar

2011-07-01
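
The abstract of entry 166 does not spell out the heap-based method, so the following is only a sketch of why a heap is a natural fit for coordinator finding. It assumes the Bully-style selection rule (the live process with the highest id becomes coordinator) and keeps the heap in one place, whereas the paper targets a distributed wireless setting. The class and method names are hypothetical.

```python
import heapq

class CoordinatorHeap:
    """Max-heap of process ids; the coordinator is the highest id still alive."""

    def __init__(self, process_ids):
        # heapq implements a min-heap, so ids are stored negated.
        self._heap = [-pid for pid in process_ids]
        heapq.heapify(self._heap)
        self._failed = set()

    def mark_failed(self, pid):
        """Record a failure; removal from the heap is deferred."""
        self._failed.add(pid)

    def coordinator(self):
        """Return the current coordinator id, or None if no process is alive."""
        # Lazily discard failed processes sitting at the top of the heap.
        while self._heap and -self._heap[0] in self._failed:
            heapq.heappop(self._heap)
        return -self._heap[0] if self._heap else None
```

When the coordinator fails, the successor is found by popping the heap, which costs O(log n) per removed process instead of the message exchange among all higher-numbered processes that the Bully algorithm performs.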

167

Advanced Information Processing System (AIPS)-based fault tolerant avionics architecture for launch vehicles  

Science.gov (United States)

An avionics architecture for the advanced launch system (ALS) that uses validated hardware and software building blocks developed under the advanced information processing system program is presented. The AIPS for ALS architecture defined is preliminary, and reliability requirements can be met by the AIPS hardware and software building blocks that are built using the state-of-the-art technology available in the 1992-93 time frame. The level of detail in the architecture definition reflects the level of detail available in the ALS requirements. As the avionics requirements are refined, the architecture can also be refined and defined in greater detail with the help of analysis and simulation tools. A useful methodology is demonstrated for investigating the impact of the avionics suite to the recurring cost of the ALS. It is shown that allowing the vehicle to launch with selected detected failures can potentially reduce the recurring launch costs. A comparative analysis shows that validated fault-tolerant avionics built out of Class B parts can result in lower life-cycle-cost in comparison to simplex avionics built out of Class S parts or other redundant architectures.

Lala, Jaynarayan H.; Harper, Richard E.; Jaskowiak, Kenneth R.; Rosch, Gene; Alger, Linda S.; Schor, Andrei L.

1990-01-01

169

Fault tolerant sequential control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Due to an increasing number of functions and process steps modern mechatronic assembly lines become more and more complex. Especially high precision systems face the conflict between extended availability requirements and system complexity, bearing in mind the related economic pressure. Besides the increasing demands in productivity there are also raising demands in availability and reliability. Hence a systematic approach of manufacturing and assembly plants is necessary. To meet this challe...

Neugebauer, Reimund; Barthel, S.; Richter, M.

2010-01-01

170

Validation Methods Research for Fault-Tolerant Avionics and Control Systems Sub-Working Group Meeting. CARE 3 peer review  

Science.gov (United States)

A computer aided reliability estimation procedure (CARE 3), developed to model the behavior of ultrareliable systems required by flight-critical avionics and control systems, is evaluated. The mathematical models, numerical method, and fault-tolerant architecture modeling requirements are examined, and the testing and characterization procedures are discussed. Recommendations aimed at enhancing CARE 3 are presented; in particular, the need for a better exposition of the method and the user interface is emphasized.

Trivedi, K. S. (editor); Clary, J. B. (ed)

1980-01-01

171

Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology  

Science.gov (United States)

An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.

Padilla, Peter A.

1991-03-01

172

FAULT TOLERANCE IN FPGA THROUGH KING SHIFTING  

Directory of Open Access Journals (Sweden)

A wide range of fault tolerance methods for FPGAs have been proposed. Approaches range from simple architectural redundancy to fully on-line adaptive implementations. The homogeneous structure of field programmable gate arrays (FPGAs) suggests that defect tolerance can be achieved by shifting the configuration data inside the FPGA. All methods and schemes are qualitatively compared and some particularly promising approaches are highlighted. The applications of these methods also differ; some are used only for manufacturing yield enhancement, while others can be used in-system. This survey attempts to provide an overview of the current state of the art for fault tolerance in FPGAs. In this paper we have discussed the king shifting allocation method.

S. Sharma

2012-05-01

173

Fault-tolerance experiments with the JPL STAR computer.  

Science.gov (United States)

Results of fault-tolerance experiments performed using an experimental computer with dynamic (standby) redundancy, including replaceable subsystems and a 'program rollback' provision to eliminate transient-caused errors. After a brief review of the specification of fault-tolerance with respect to transient faults, including a description of the method of injection of transient faults in software and system tests, fault-tolerance experiments carried out with this computer with regard to the determination of fault classes, software verification, system verification, and recovery stability are summarized. A test and repair processor is described which constitutes a special monitor unit of the computer and is used to obtain information for fault detection in the other subsystems of the computer and to ensure that proper recovery occurs when a fault is detected.

Avizienis, A.; Rennels, D. A.

1972-01-01

174

Comment on "Fault Tolerant analysis for stochastic systems using switching diffusion processes" by Yang, Jiang and Cocquempot  

DEFF Research Database (Denmark)

Results are given in Yang, Jiang and Cocquempot (Yang, H., Jiang, B., and Cocquempot, V. (2009), "Fault Tolerance Analysis for Stochastic Systems using Switching Diffusion Processes", International Journal of Control, 82, 1516-1525) regarding the overall stability of switched diffusion processes based on stability properties of separate processes combined through stochastic switching. This article argues two main results to be empty, in that the presented hypotheses are logically inconsistent.

Schiøler, Henrik; Leth, John-Josef

2011-01-01

175

Verifying Safety of Fault-Tolerant Distributed Components -- Extended Version  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We show how to ensure correctness and fault-tolerance of distributed components by behavioural specification. We specify a system combining a simple distributed component application and a fault-tolerance mechanism. We choose to encode the most general and the most demanding kind of faults, Byzantine failures, but only for some of the components of our system. With Byzantine failures a faulty process can have any behaviour, thus replication is the only convenient classical solution; this grea...

2011-01-01

176

Role of reliability modeling in the design of a fault-tolerant feedwater control system  

International Nuclear Information System (INIS)

The use of reliability models in the design and development of a digital feedwater control system for a boiling water reactor (BWR) is described. Fault tree models, data bases, and their impact on system design processes are discussed. The cut sets for the system model have provided a direct means of defining and evaluating potential design changes that can significantly improve the reliability of the system. These changes have included reconfiguration of the system power supplies, additional signal redundancy, additional positional feedback for control actuators, new fault detection schemes, and operator interfaces that are less likely to cause system failures. 3 references, 3 figures

1985-07-01

177

Fault-tolerant computing system, on-board computing and software for engineering test satellite-VI (ETS-VI) attitude control subsystem  

Science.gov (United States)

The fault-tolerant computing system for the attitude-control system of the ETS-VI, a two-ton class, large-size, synchronous, three-axis satellite, is described. A system overview, including specifications, is given. Detailed descriptions are presented of the fault-tolerant functions executed by hardware and software. The results of a hardware and software development model are reported.

Ichikawa, Shin-Ichiro; Kawada, Yasuhiro; Mine, Masaya; Ishige, Yasuo; Itsukaichi, Atsushi

178

Broadcasting Messages in Fault-Tolerant Distributed Systems: the benefit of handling input-triggered and output-triggered suspicions differently  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The paper investigates the two main and seemingly antagonistic approaches to broadcasting messages reliably in fault-tolerant distributed systems: the approach based on Reliable Broadcast, and the one based on View Synchronous Communication (or VSC for short). While VSC does ...

Charron-Bost, B.; Défago, X.; Schiper, A.

2002-01-01

179

Strategies for Fault Tolerance in Multicomponent Applications  

Energy Technology Data Exchange (ETDEWEB)

This paper discusses on-going work with the Integrated Plasma Simulator (IPS), a framework for coupled multiphysics simulations of plasmas, to allow simulations to run through the loss of nodes on which the simulation is executing. While many different techniques are available to improve the fault tolerance of computational science applications on high-performance computer systems, checkpoint/restart (C/R) remains virtually the only one that sees widespread use in practice. Our focus here is to augment the traditional C/R approach with additional techniques that can provide a more localized and tailored response to faults based on the ability to restart failed tasks on an individual basis, and the use of information external to the application itself in order to guide decision-making, in many cases avoiding the need to stop and restart the entire simulation. This capability involves several features within the IPS framework, and leverages the Fault Tolerance Backplane, a publish/subscribe event service to disseminate fault-related information throughout HPC systems, to obtain information from the Reliability, Availability and Serviceability (RAS) subsystem of the HPC system. This work is described in the context of Cray XT-series computer systems for concreteness, but is applicable to other environments as well. As part of the analysis of this work, we discuss the requirements to generalize this approach to other complex simulation applications beyond the Integrated Plasma Simulator.

Shet, Aniruddha G [ORNL; Elwasif, Wael R [ORNL; Foley, Samantha S [ORNL; Park, Byung H [ORNL; Bernholdt, David E [ORNL; Bramley, Randall B [ORNL

2011-01-01

180

A Concept for fault tolerant controllers  

DEFF Research Database (Denmark)

This paper describes a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture allows the controller to be changed on-line in the case of faults in the system. In the described FTC concept, a safe mode controller is applied as the basic feedback controller. A controller for normal operation with high performance is obtained by including certain YJBK parameters (transfer functions) in the controller. This allows a fast switch from normal operation to safe mode operation in case of critical faults in the system. The described FTC architecture allows the different feedback controllers to use different sets of sensors and actuators.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01
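
For readers unfamiliar with the YJBK parameterization named in entry 180, its standard textbook form is sketched below. The symbols are conventional notation and are not taken from the abstract: G is the plant, K_0 the safe-mode controller, both written as right coprime factorizations over stable transfer matrices, with the sign convention u = K y assumed.

```latex
G = N M^{-1}, \qquad K_0 = U V^{-1}, \\
K(Q) = (U + M Q)(V + N Q)^{-1}, \qquad Q \in \mathcal{RH}_\infty
```

Under these assumptions, Q = 0 recovers the safe-mode controller K_0, and every stable Q yields another stabilizing controller. In the concept described above, normal operation would correspond to a nonzero Q chosen for high performance, and falling back to safe mode amounts to switching Q to zero.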

181

Efficient Fault-Tolerant Strategy Selection Algorithm in Cloud Computing  

Directory of Open Access Journals (Sweden)

Cloud computing is becoming a mainstream feature of information technology. Enterprises increasingly deploy their software systems in the cloud environment. Cloud applications are usually large scale and contain many distributed cloud components. Building highly reliable cloud applications is a challenging and critical research issue. The growing reliance on information processing systems has increased the significance of their correct and continuous operation even in the presence of faulty components. To address this issue, this paper proposes a cloud framework for building fault-tolerant cloud applications. We first propose fault detection algorithms to identify significant components from the huge number of cloud components. Then, we present an efficient fault-tolerance strategy selection algorithm to determine the most suitable fault-tolerance strategy for each significant component. Software fault tolerance is widely adopted to increase the overall system reliability in critical applications. System reliability can be enhanced by employing functionally equivalent components to tolerate component failures. Three well-known fault-tolerance strategies are introduced, together with formulas for calculating the failure probabilities of the fault-tolerant modules. Our work will mainly be driven toward the implementation of the framework to measure the strength of the fault tolerance service and to make an in-depth analysis of the cost benefits among all the stakeholders. An algorithm is proposed to automatically determine an efficient fault-tolerance strategy for the significant cloud components. Using real failure traces and a model, we evaluate the proposed resource provisioning policies to determine their performance, cost and cost efficiency. The experimental results show that by tolerating faults of a small part of the most important components, the reliability of cloud applications can be greatly improved.

P.Priyanka

2014-02-01
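
Entry 181 mentions formulas for the failure probability of fault-tolerant modules without naming the strategies or giving the formulas. The sketch below therefore rests on assumptions: it uses two commonly cited redundancy schemes (sequential or parallel alternates, and N-version programming with majority voting) and assumes that component failures are statistically independent. The function names are hypothetical.

```python
from math import comb, prod

def all_fail_probability(failure_probs):
    """Module built from alternates (tried in sequence or invoked in parallel).

    The module fails only if every alternate fails, so under independence
    the failure probability is the product of the individual ones.
    """
    return prod(failure_probs)

def majority_vote_failure(p, n):
    """N-version programming with majority voting over n versions (n odd).

    Each version fails independently with probability p; the module fails
    when a majority of the versions fail.
    """
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(majority, n + 1))
```

With three components that each fail with probability 0.1, the alternates scheme fails with probability about 0.001 and the voting scheme with probability about 0.028, which illustrates why strategy selection per component can matter.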

182

Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems  

Directory of Open Access Journals (Sweden)

The paper analyses and experimentally verifies the deployment of Ethernet based network technology to enable fault tolerant and timely exchange of data among a number of high voltage protective relays that use a proprietary serial communication line to exchange real-time data on the state of their high voltage circuitry, facilitating fast protection switching in case of critical failures. The digital serial signal is first fed into a PCM multiplexer, where it is mapped to the corresponding E1 (2 Mbit/s) time division multiplexed signal. The resulting E1 frames are then packetized and sent through the Ethernet control LAN to the opposite PCM demultiplexer, where the same processing is done in reverse, finally sending the signal into the opposite protective relay. The challenge of this setup is to assure very timely delivery of the control information between protective relays even in the case of potential failures of the Ethernet network itself. The tolerance of the Ethernet network to faults is assured using the widespread per-VLAN Rapid Spanning Tree Protocol, potentially extended by 1+1 PCM protection as a valuable option.

Leos Bohac

2013-01-01

183

Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems  

Energy Technology Data Exchange (ETDEWEB)

The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, mean time between failures will range from a few minutes to few tens of minutes, making the crash of a processor the common case, instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors are rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.

Bronevetsky, G; Meneses, E; Kale, L V

2011-02-25

184

Using Server Clusterization to Establish Fault-Tolerant Internet Connectivity  

Directory of Open Access Journals (Sweden)

Full Text Available This study discusses the issue of providing tolerance to hardware and software faults in Internet systems, as well as issues related to the clusterization of servers. A replication scheme is presented, and a detailed dependability analysis of this scheme is performed. The proposed model was designed mainly for fault-tolerant Internet systems where many unrelated applications could compete for hardware and software resources, thereby exhibiting highly varying and dynamic system characteristics. A major feature of the model under consideration is that it attempts the adaptive execution of redundant components for a required level of fault tolerance.

O.O. Adeosun

2010-11-01

185

Electrical Steering of Vehicles - Fault-tolerant Analysis and Design  

DEFF Research Database (Denmark)

The topic of this paper is systems that need to be designed such that no single fault can cause failure at the overall level. A methodology is presented for the analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high-cost triple redundancy solutions and still meet strict requirements for functional safety. The paper applies graph-based analysis of functional system structure to find a novel fault-tolerant architecture for electrical steering, where a dedicated AC-motor design and cheap voltage measurements ensure the ability to detect all relevant faults. The paper shows how active control reconfiguration can accommodate all critical faults, and the fault-tolerant abilities are demonstrated on warehouse truck hardware.

Blanke, Mogens; Thomsen, Jesper Sandberg

2006-01-01

186

Fault Tolerant Design for Attitude Orbit Control System (AOCS) of ADEOS-II (Advanced Earth Observing Satellite-II)  

Science.gov (United States)

Fault tolerance of the spacecraft Attitude and Orbit Control Subsystem (AOCS) is extremely important because an AOCS failure can result in the total loss of the spacecraft. More specifically, the FDIR (Fault Detection, Isolation and Recovery) function has been applied to many satellites, resulting in successful mission operations. This paper presents the FDIR function for the AOCS of ADEOS-II, with emphasis on a newly developed FDIR for hybrid navigation. Hybrid navigation, which is the ADEOS-II nominal operational mode, processes data from the GPS Receiver (GPSR), Fine Sun Sensor Head (FSSH), Inertial Reference Unit (IRU) and Earth Sensor Assembly (ESA) for attitude determination and achieves the specified attitude control accuracy. Regarding the fault tolerance of hybrid navigation, automatic recovery from failure modes by means of the FDIR system has been proven in real orbit operations, and a highly robust AOCS was developed. The features of the new FDIR are as follows. (a) The FDIR not only monitors anomalies of AOCS components but also evaluates the attitude determination of hybrid navigation by comparing it with the attitude determination of normal navigation, which operates independently. (b) In case of an anomaly, the FDIR switches to the normal navigation mode and maintains near-continuity of the Earth observation mission. The approach and algorithm developed in this FDIR design can be applied to future Earth-observing satellites that require precise attitude control.

Kojima, Yasushi; Tanamachi, Takehiko; Ohkami, Yoshiaki

187

Fault Tolerant Magnetic Bearing for Turbomachinery  

Science.gov (United States)

NASA Glenn Research Center (GRC) has developed a Fault-Tolerant Magnetic Bearing Suspension rig to enhance bearing system safety. It successfully demonstrated that using only two active poles out of eight redundant poles in each radial bearing (that is, 12 out of 16 poles dead) levitated the rotor and spun it without losing stability or the desired position up to the maximum allowable speed of 20,000 rpm. In this paper, it is demonstrated that as long as the sum of the force vectors of the attracting poles and the rotor weight is zero, a fault-tolerant magnetic bearing system maintains the rotor at the desired position without losing stability, even at the maximum rotor speed. A proportional-integral-derivative (PID) controller generated autonomous corrective actions for the fault situations, without operator input and without losing load capacity in terms of rotor position. This paper also deals with a centralized modal controller to better control the dynamic behavior over system modes.

Choi, Benjamin; Provenza, Andrew

2001-01-01

188

Design and failure analysis of logic-compatible multilevel gain-cell-based DRAM for fault-tolerant VLSI systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper considers the problem of increasing the storage density in fault-tolerant VLSI systems which require only limited data retention times. To this end, the concept of storing many bits per memory cell is applied to area-efficient and fully logic-compatible gain-cell-based dynamic memories. A memory macro in 90-nm CMOS technology including multilevel write and read circuits is proposed and analyzed with respect to its read failure probability due to within-die process variations by mea...

2011-01-01

189

Diagnosis and Fault-tolerant Control, 2nd edition.  

DEFF Research Database (Denmark)

Fault-tolerant control aims at a graceful degradation of the behaviour of automated systems in case of faults. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults that bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Five case studies on pilot processes show the applicability of the presented methods. The theoretical results are illustrated by two running examples used throughout the book. The second edition includes new material about reconfigurable control, diagnosis of nonlinear systems, and remote diagnosis. The application examples are extended by a steering-by-wire system and the air path of a diesel engine, both of which include experimental results. The bibliographical notes at the end of all chapters have been updated. The chapters end with exercises to be used in lectures.

Blanke, Mogens

2006-01-01

190

A Dynamic Effective Fault Tolerance System in Robotic Manipulator using a Hybrid Neural Network based Controller  

Directory of Open Access Journals (Sweden)

Full Text Available Robot manipulators play an important role in the automobile industry, where they are mainly used in gas welding and in the manufacturing and assembly of motor parts. In a complex trajectory, the speed of the robot manipulator is affected at each joint. For that reason, it is necessary to analyze the noise and vibration of the robot's joints in order to predict faults and to improve the control precision of the robotic manipulator. In this study we propose a new fault detection system for robot manipulators. The proposed hybrid fault detection system is based on a fuzzy support vector machine (SVM) and Artificial Neural Networks (ANNs). In this system, the decoupled joints are identified and corrected using the fuzzy SVM, with non-linear signals used for the complete process and treatment. The ANNs are used to detect free-swinging and locked joints of the robot, and two types of neural predictors are also employed in the proposed adaptive neural network structure. The simulation results of the hybrid controller demonstrate the feasibility and performance of the methodology.

G. Jiji

2014-04-01

191

Fault tolerant control - a residual based set-up  

DEFF Research Database (Denmark)

A new set-up for fault tolerant control (FTC) for stable systems is presented in this paper. The new set-up is based on a simple implementation of the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. This implementation of the YJBK parameterization allows a direct and simple reconfiguration of the feedback controller. Another central part of fault tolerant control is fault diagnosis. The controller implementation can be applied directly in connection with both passive fault diagnosis (PFD) and active fault diagnosis (AFD). The presented FTC set-up is investigated with respect to sensor reconfiguration. Actuator reconfiguration can be dealt with in a similar way.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01

192

Application layer techniques for hardware and software fault tolerance  

Energy Technology Data Exchange (ETDEWEB)

Application layer techniques (ALTs) have been suggested as add-on techniques that improve the overall fault tolerance of a system on top of the fault tolerance provided by hardware-level and operating-system-level techniques. Compared to the techniques in the other two layers, ALTs have the advantage that they are flexible and less expensive. In this paper we discuss three varieties of ALTs, namely control-flow checking using assertions (CCA), algorithm-based fault tolerance (ABFT), and multi-version objects (MVOs). The three approaches are relatively orthogonal in the sense that applying any combination of the techniques improves the fault tolerance of the system in a complementary fashion. Illustrative examples are provided for each technique.

Nair, V.S.S. [Southern Methodist Univ., Dallas, TX (United States)

1996-12-31

193

Enhanced Maritime Safety through Diagnosis and Fault Tolerant Control  

DEFF Research Database (Denmark)

Faults in steering, navigation instruments or propulsion machinery are serious on a marine vessel, since the consequence could be loss of maneuvering ability, implying risk of damage to vessel, personnel or environment. Early diagnosis and accommodation of faults could enhance safety. Fault-tolerant control is a methodology that helps prevent faults from developing into failures. The means include on-line fault diagnosis, automatic condition assessment and calculation of remedial action to avoid hazards. This paper gives an overview of methods to obtain fault tolerance: fault diagnosis, analysis of the properties of a faulty system, and means to determine remedial actions. The paper illustrates the techniques with two marine examples: sensor fusion for automatic steering and control of the main engine.

Blanke, Mogens

2001-01-01

194

Fault Tolerant Architecture for Telecom Wireless CORBA  

Directory of Open Access Journals (Sweden)

Full Text Available In order for a non-mobile ORB to interoperate with CORBA objects and clients running on a mobile terminal, the OMG has specified Wireless Access and Terminal Mobility in CORBA. Fault tolerance has been specified in the common core of the CORBA specification, but it is intended for wired networks. This study proposes a fault-tolerant architecture for Telecom wireless CORBA based on replication and checkpointing of objects. The storage available at the Access Bridge is employed to log messages and entity states of objects on behalf of mobile terminals. The logging and recovery infrastructures are designed on each Access Bridge to implement fault tolerance for Telecom wireless CORBA. The Logging Mechanism records each message in a log, from which the Recovery Mechanism can retrieve the message during recovery. The performance analysis shows that the proposed fault-tolerant architecture ensures a low loss of computation when the server object fails. The proposed fault tolerance architecture is a graceful extension of the original wired Fault Tolerant CORBA and is able to cooperate seamlessly with the published CORBA specifications.

Zhenpeng Xu

2013-01-01

195

Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning  

Science.gov (United States)

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

Goldstein, David

1991-01-01

196

Formal and Fault Tolerant Design  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Software quality and reliability were verified for a long time at the post-implementation level (test, fault scenario ...). The design of embedded systems and digital circuits is more and more complex because of integration density, heterogeneity. Now almost ¾ of the digital circuits contain at least one processor, that is, can execute software code. In other words, co-design is the most usual case and traditional verification by simulation is no more practical. Moreover, the increase in ...

Aljer, Ammar; Devienne, Philippe

2012-01-01

197

Bounded-set approach to the evaluation of the reliability of fault-tolerant systems. Part 1: methodology, powered spares  

Energy Technology Data Exchange (ETDEWEB)

In the paper, we present a new method for fault-tolerant reliability calculations. This method is developed to analyse closed non-Markov systems, i.e. systems containing modules having arbitrary failure-time distributions. Such systems arise for a number of reasons, such as when redundancy is employed at various levels, or when considering the effect of transient faults. The analysis of such systems by conventional techniques is very difficult. As the bounded-set approach is new to the area of reliability modeling, a step-by-step development of the approach is presented in the paper. The model described is suitable for closed Markov systems with powered spares. Such systems can also be modelled using the more established Markov state transition approach. Comparisons of the results obtained using the bounded-set approach with those obtained using the Markov state transition approach are given. In the companion paper, further extensions of the bounded-set approach are given to enable the modeling of closed Markov systems with unpowered spares, non-Markov systems and systems whose components cannot be readily aggregated into independent subsystems.

Yak, Y.W.; Dillon, T.S.; Forward, K.E.

1985-07-01

198

Fault tolerant control of a three-phase three-wire shunt active filter system based on reliability analysis  

Energy Technology Data Exchange (ETDEWEB)

This paper deals with fault tolerant three-phase three-wire shunt active filter topologies, for which reliability is very important in industry applications. The determination of the optimal reconfiguration structure among various ones, with or without redundant components, is discussed based on reliability criteria. First, the reconfiguration of the inverter is detailed and a fast fault diagnosis method for power semiconductor or driver fault detection and compensation is presented. This method avoids false fault detection due to power semiconductor switching. The control architecture and algorithm are studied and a fault tolerant control strategy is considered. Simulation results in open and short circuit cases validate the theoretical study. Finally, the reliability of the studied three-phase three-wire shunt active filter topologies is analyzed to determine the optimal one. (author)

Poure, P. [Laboratoire d' Instrumentation Electronique de Nancy LIEN, EA 3440, Nancy-Universite, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France); Weber, P.; Theilliol, D. [Centre de Recherche en Automatique de Nancy UMR 7039, Nancy-Universite, CNRS, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France); Saadate, S. [Groupe de Recherches en Electrotechnique et Electronique de Nancy UMR 7037, Nancy-Universite, CNRS, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France)

2009-02-15

199

Fault-Tolerant Logic Gates Using Neuromorphic CMOS Circuits  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerant design methods for VLSI circuits, which have traditionally been addressed at system level, will not be adequate for future very-deep submicron CMOS devices where serious degradation of reliability is expected. Therefore, a new design approach has been considered at a low level of abstraction in order to implement robustness and fault tolerance in these devices. Moreover, fault tolerant properties of multi-layer feed-forward artificial neural networks have...

Joye, Neil; Schmid, Alexandre; Leblebici, Yusuf; Asai, Tetsuya; Amemiya, Yoshihito

2007-01-01

200

SEU fault tolerance in artificial neural networks  

Energy Technology Data Exchange (ETDEWEB)

In this paper the authors investigate the robustness of Artificial Neural Networks when encountering transient modification of information bits related to the network operation. These kinds of faults are likely to occur as a consequence of interaction with radiation. Results of tests performed to evaluate the fault tolerance properties of two different digital neural circuits are presented.

Velazco, R.; Assoum, A.; Radi, N.E. [Lab. de Genie Informatique, Grenoble (France); Ecoffet, R. [Centre National d'Etudes Spatiales, Toulouse (France); Botey, X. [Univ. Politecnica de Catalunya, Barcelona (Spain)

1995-12-01

 
 
 
 
201

Analysing an Integrated Technique for the Software Requirements of a Safety Critical System based on Software Inspection, Requirements Traceability and Fault Tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Requirement analysis is important for developing and implementing safety critical systems. The system identified is intended to reduce the risks and to fulfil previously identified safety requirements. A single error during requirements gathering can generate serious software faults. Software inspection is done for the verification and validation process. Traceability of requirements is carried out to avoid later risk factors. Fault tolerant architectures are used for the efficient behaviour of saf...

2011-01-01

202

AN EFFICIENT FAULT TOLERANT SCHEDULING APPROACH FOR COMPUTATIONAL GRID  

Directory of Open Access Journals (Sweden)

Full Text Available Grid computing serves as an important technology to facilitate distributed computation. Computational grids solve large-scale scientific problems using heterogeneous, geographically distributed resources. Problems like the dispatching and scheduling of tasks are considered major issues in a computational grid environment. The Grid Scheduler must select proper resources for executing the tasks with low response time. Execution can fail for various reasons, such as network failure, overloaded resource conditions, or non-availability of required software components. Thus, fault-tolerant systems should be able to identify and handle failures and support reliable execution in the presence of failures. Hence the integration of fault tolerance measures and communication time with scheduling gains much importance. In this study, a new fault-tolerance-based scheduling approach, Fault Tolerant Min-Min (FTMM), is proposed for scheduling statically available meta-tasks, wherein the failure rate and the fitness value are calculated. The performance of the fault tolerant scheduling policy is compared with the min-min scheduling policy using GridSim, and the results show that the proposed policy performs better, with a lower makespan in the presence of failures. The number of tasks successfully completed is also higher than with the non-fault-tolerant min-min scheduling policy. Thus the proposed FTMM algorithm achieves not only a better hit rate but also an improved makespan.

P. Keerthika

2012-01-01
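
For orientation, the min-min baseline that the abstract above compares against can be sketched in a few lines. This is a minimal sketch, not the authors' FTMM: the abstract does not define its failure rate or fitness value, so the optional `failure_rate` argument and the way it inflates the expected completion time are purely hypothetical, as are all names used here.

```python
def min_min(etc, failure_rate=None):
    """Greedy min-min scheduling.

    etc[t][r] is the expected time to compute task t on resource r.
    failure_rate[r] (optional, HYPOTHETICAL) is a per-resource failure
    probability in [0, 1); when given, completion times are inflated by
    1 / (1 - failure_rate[r]) before tasks are compared. This penalty is
    an assumption for illustration, not the fitness value of the paper.
    Returns (list of (task, resource) in scheduling order, makespan).
    """
    n_res = len(etc[0])
    ready = [0.0] * n_res              # time at which each resource is free
    unscheduled = set(range(len(etc)))
    schedule = []
    while unscheduled:
        best = None                    # (cost, task, resource, finish time)
        for t in sorted(unscheduled):
            for r in range(n_res):
                finish = ready[r] + etc[t][r]
                cost = finish / (1.0 - failure_rate[r]) if failure_rate else finish
                if best is None or cost < best[0]:
                    best = (cost, t, r, finish)
        _, t, r, finish = best
        ready[r] = finish              # the chosen resource is busy until then
        unscheduled.remove(t)
        schedule.append((t, r))
    return schedule, max(ready)


etc = [[4, 6], [3, 5], [10, 2]]
plain_schedule, plain_makespan = min_min(etc)
aware_schedule, aware_makespan = min_min(etc, failure_rate=[0.5, 0.0])
```

In this toy input, the failure-aware variant moves task 0 away from the unreliable resource 0 at the price of a longer nominal makespan, which is the kind of trade-off a fault tolerant scheduler has to weigh.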

203

Algorithmic Based Fault Tolerance Applied to High Performance Computing  

CERN Multimedia

We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit-flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault tolerant matrix-matrix multiplication subroutine and we propose some models to predict its running time. Our parallel fault-tolerant matrix-matrix multiplication scores 1.4 TFLOPS on 484 processors (cluster jacquard.nersc.gov) and returns a correct result even when one process failure has occurred. This represents 65% of the machine peak efficiency and less than 12% overhead with respect to the fastest failure-free implementation. We predict (and have observed) that, as we increase the processor count, the overhead of the fault tolerance drops significantly.

Bosilca, George; Dongarra, Jack; Langou, Julien

2008-01-01
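
The checksum idea of Huang and Abraham that the abstract above builds on can be illustrated on a single node. The following is a minimal sketch under stated assumptions, not the authors' parallel subroutine: it uses small integer matrices so that checksums compare exactly (floating-point data would need a tolerance), it handles a single corrupted entry only, and all function names are hypothetical.

```python
def column_checksum(a):
    """Return a with one extra row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def row_checksum(b):
    """Return b with one extra column holding the sum of each row."""
    return [row + [sum(row)] for row in b]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def locate_and_correct(c_full):
    """Check a full-checksum product and repair a single wrong entry.

    c_full is the (n+1) x (m+1) product of a column-checksum matrix and a
    row-checksum matrix. Returns (data part, error location or None).
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    data = [row[:m] for row in c_full[:n]]
    bad_rows = [i for i in range(n) if sum(data[i]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(data[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return data, None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the entry from the row checksum and the other row entries.
        data[i][j] = c_full[i][m] - sum(data[i][k] for k in range(m) if k != j)
        return data, (i, j)
    raise ValueError("more than one error; cannot correct")


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(column_checksum(a), row_checksum(b))
c_full[0][1] += 7                      # simulate a corrupted result entry
corrected, location = locate_and_correct(c_full)
```

The product of the two checksum-augmented matrices carries its own row and column sums, so a single wrong entry shows up as exactly one inconsistent row and one inconsistent column, which both locates and repairs it.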

204

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Directory of Open Access Journals (Sweden)

Full Text Available In this work, a model is proposed for the assessment of the availability measure of fault tolerant systems, based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.

Márcio das Chagas Moura

2008-08-01

205

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Scientific Electronic Library Online (English)

Full Text Available In this work, a model is proposed for the assessment of the availability measure of fault tolerant systems, based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.

Moura, Márcio das Chagas; Droguett, Enrique López.

206

SABRE: a bio-inspired fault-tolerant electronic architecture.  

Science.gov (United States)

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance. PMID:23302298

Bremner, P; Liu, Y; Samie, M; Dragffy, G; Pipe, A G; Tempesti, G; Timmis, J; Tyrrell, A M

2013-03-01

207

On the Fault Tolerance and Hamiltonicity of the Optical Transpose Interconnection System of Non-Hamiltonian Base Graphs  

CERN Multimedia

Hamiltonicity is an important property in parallel and distributed computation. The existence of a Hamiltonian cycle allows efficient emulation of distributed algorithms on a network wherever such an algorithm exists for the linear array and ring, and can ensure deadlock freedom in some routing algorithms in hierarchical interconnection networks. Hamiltonicity can also be used for the construction of independent spanning trees, which leads to the design of fault tolerant protocols. The Optical Transpose Interconnection System, or OTIS (also referred to as the two-level swapped network), is a widely studied interconnection network topology which is popular due to its high degree of scalability, regularity, modularity and packageability. Surprisingly, to our knowledge, only one strong result is known regarding the Hamiltonicity of OTIS, showing that OTIS graphs built from Hamiltonian base graphs are Hamiltonian. In this work we consider the Hamiltonicity of OTIS networks built on non-Hamiltonian base graphs and answer some important questions. First, we prove tha...

Ghosh, Esha; Rangan, C Pandu

2011-01-01

208

Application of a fault-tolerant microprocessor-based core-surveillance system in a German fast breeder reactor  

International Nuclear Information System (INIS)

For the fast breeder reactor KNK II at Karlsruhe, Germany, a microprocessor-based safety shut-down system is being built. Analogous to the triple modular instrumentation, it consists of TMR hardware. Functionally it is split into four blocks which operate in cascade-like fashion. The main functions are mean value calculation, current limit control, trend control, and final evaluation. In order to secure correctness, several constructive and analytical methods are applied for fault avoidance, such as formal specification languages, programming guidelines, a software quality assurance plan, validation, verification, and testing. Since additional means for correct and safe operation are still necessary, fault-tolerance and error-detection techniques are applied. These include self-checking programs, plausibility checks, control data, information exchange and control between the redundancies, and especially diversity. This diversity refers to different teams for the different development phases as well as to different tools and environments, such as different programming languages for the application software. Three separate but functionally identical programs will be implemented in Iftran, Pascal and PL/M. These will be used not only during the extensive testing period, but also during final operation.

1986-09-01

209

A Primer on Architectural Level Fault Tolerance  

Science.gov (United States)

This paper introduces the fundamental concepts of fault tolerant computing. Key topics covered are voting, fault detection, clock synchronization, Byzantine Agreement, diagnosis, and reliability analysis. Low level mechanisms such as Hamming codes or low level communications protocols are not covered. The paper is tutorial in nature and does not cover any topic in detail. The focus is on rationale and approach rather than detailed exposition.

Butler, Ricky W.

2008-01-01
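
Voting, the first topic listed in the abstract above, is commonly illustrated with triple modular redundancy (TMR). The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the reliability formula assumes three independent modules of equal reliability r and a perfect voter.

```python
from collections import Counter


def majority_vote(values):
    """Return the value reported by a strict majority of redundant modules.

    Raises ValueError when no value is held by more than half of them.
    """
    winner, count = Counter(values).most_common(1)[0]
    if 2 * count <= len(values):
        raise ValueError("no majority among module outputs")
    return winner


def tmr_reliability(r):
    """Reliability of a TMR system that survives if at least 2 of 3
    independent modules (each with reliability r) work, assuming a perfect
    voter: 3r^2(1 - r) + r^3 = 3r^2 - 2r^3.
    """
    return 3 * r ** 2 - 2 * r ** 3


masked = majority_vote([42, 42, 17])   # one faulty module is outvoted
system_r = tmr_reliability(0.9)
```

Under these assumptions TMR only helps when r is above 0.5; below that value the redundant system is less reliable than a single module.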

210

Automating the Addition of Fault Tolerance with Discrete Controller Synthesis  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Discrete controller synthesis (DCS) is a formal approach, based on the same state-space exploration algorithms as model-checking. Its interest lies in the ability to obtain automatically systems satisfying by construction formal properties specified a priori. In this paper, our aim is to demonstrate the feasibility of this approach for fault tolerance. We start with a fault intolerant program, modeled as the synchronous parallel composition of finite labeled transition systems; we specify for...

2009-01-01

211

Design of Fault Tolerant Reversible Multiplier  

Directory of Open Access Journals (Sweden)

In recent years, reversible logic has emerged as a promising technology with applications in low power CMOS, quantum computing, nanotechnology, and optical computing. The classical set of gates such as AND, OR, and EXOR are not reversible. This paper proposes a novel 4x4 bit reversible fault tolerant multiplier circuit which can multiply two 4-bit numbers. It is faster and has lower hardware complexity compared to the existing designs. In addition, the proposed reversible multiplier is better than its existing counterparts in terms of delay and power. It is based on two concepts. The partial products are generated in parallel using Fredkin gates, and thereafter the addition is done using a reversible parallel adder designed from IG gates. Thus, this paper provides a starting point for building more complex systems which can execute more complicated operations using reversible logic.

H. P. Sinha

2012-01-01

212

A tool for automatic formal analysis of fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The use of computer-based systems is rapidly increasing and such systems can now be found in a wide range of applications, including safety-critical applications such as cars and aircraft. To make the development of such systems more efficient, there is a need for tools for automatic safety analysis, such as analysis of fault tolerance. In this thesis, a tool for automatic formal analysis of fault tolerance was developed. The tool is built on top of the existing development environment for t...

Nilsson, Markus

2005-01-01

213

Garbage collection: an exercise in distributed, fault-tolerant programming  

Energy Technology Data Exchange (ETDEWEB)

Two garbage-collection algorithms are presented to reclaim unused storage in object-oriented systems implemented on local area networks. The algorithms are fault-tolerant and allow parallel, incremental collection in an object address space distributed throughout the system. The two approaches allow multiple collectors, so some unused storage can be reclaimed in partitioned networks. The first method makes use of fault-tolerant reference counts together with an algorithm to collect cycles of objects that would otherwise remain unclaimed. The second method adapts a parallel collector so that it can be used to collect subspaces of the entire network address space. Throughout this work the concern is with a methodology for developing distributed, parallel, fault-tolerant programs. There is also concern with the suitability of object-oriented systems for such applications.

Vestal, S.C.

1987-01-01

214

Analysing an Integrated Technique for the Software Requirements of a Safety Critical System based on Software Inspection, Requirements Traceability and Fault Tolerance  

Directory of Open Access Journals (Sweden)

Requirement analysis is important for developing and implementing safety critical systems. The system identified is intended to reduce the risks and fulfil earlier discovered safety requirements. A single error during requirements gathering can generate serious software faults. Software inspection is done for the verification and validation process. Traceability of requirements is carried out to avoid later risk factors. Fault tolerant architectures are used for the efficient behaviour of safety critical systems. This integrated environment is proposed for developing the safety critical system. A computer aided tool is used for the integrated environment.

V.Sree Dharinya

2011-06-01

215

Reconfigurable Fault Tolerance for FPGAs  

Science.gov (United States)

The invention allows a field-programmable gate array (FPGA) or similar device to be efficiently reconfigured in whole or in part to provide higher capacity, non-redundant operation. The redundant device consists of functional units such as adders or multipliers, configuration memory for the functional units, a programmable routing method, configuration memory for the routing method, and various other features such as block RAM, I/O (random access memory, input/output) capability, dedicated carry logic, etc. The redundant device has three identical sets of functional units and routing resources and majority voters that correct errors. The configuration memory may or may not be redundant, depending on need. For example, SRAM-based FPGAs will need some type of radiation-tolerant configuration memory, or they will need triple-redundant configuration memory. Flash or anti-fuse devices will generally not need redundant configuration memory. Some means of loading and verifying the configuration memory is also required. These are all components of the pre-existing redundant FPGA. This innovation modifies the voter to accept a MODE input, which specifies whether ordinary voting is to occur, or if redundancy is to be split. Generally, additional routing resources will also be required to pass data between sections of the device created by splitting the redundancy. In redundancy mode, the voters produce an output corresponding to the two inputs that agree, in the usual fashion. In the split mode, the voters select just one input and convey this to the output, ignoring the other inputs. In a dual-redundant system (as opposed to triple-redundant), instead of a voter, there is some means to latch or gate a state update only when both inputs agree. In this case, the invention would require modification of the latch or gate so that it would operate normally in redundant mode, and would separately latch or gate the inputs in non-redundant mode.
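The voter behaviour described in this abstract can be illustrated with a minimal Python sketch. This is a software model written for illustration only, not the patented hardware design: the names `Mode`, `vote`, and the `select` parameter are assumptions that do not appear in the source.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    REDUNDANT = "redundant"  # ordinary majority voting
    SPLIT = "split"          # redundancy is split; one input is passed through


def vote(inputs, mode, select=0):
    """Model a single voter that has a MODE input.

    inputs: the three replica outputs feeding this voter.
    select: hypothetical index of the input forwarded in split mode.
    """
    if mode is Mode.SPLIT:
        # Split mode: convey one input to the output, ignore the others.
        return inputs[select]
    # Redundant mode: output the value on which at least two inputs agree.
    value, count = Counter(inputs).most_common(1)[0]
    if count < 2:
        raise ValueError("no two inputs agree")
    return value
```

In redundant mode, `vote((1, 0, 1), Mode.REDUNDANT)` masks the single disagreeing replica. In split mode the same three inputs are treated as independent data paths, which is where the higher non-redundant capacity comes from.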

Shuler, Robert, Jr.

2010-01-01

216

Fault Tolerant Heterogeneous Limited Duplication Scheduling algorithm for Decentralized Grid  

Directory of Open Access Journals (Sweden)

Fault tolerance is one of the most desirable properties in decentralized grid computing systems, where computational resources are geographically distributed. These resources collaborate in order to execute workflow applications as fast as possible. In workflow applications, tasks are dependent on each other, so it becomes vital that scheduling techniques also have a decentralized fault tolerant mechanism. In this paper, we propose a decentralized fault tolerant mechanism which utilizes the checkpoint concept for the Heterogeneous Limited Duplication (HLD) algorithm. HLD is based on task duplication scheduling in a heterogeneous environment. The benefits are twofold. Firstly, if a node failure occurs, the rest of the grid nodes sustain the execution of the application. Secondly, a smaller makespan of the application is obtained using the checkpoint concept. Therefore, an application scheduled over decentralized grid systems (which are known for their unreliable behavior) will yield results fast using the algorithm proposed in this paper.

DR. NITIN

2013-04-01

217

Measures of Fault Tolerance in Distributed Simulated Annealing  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, we examine the different measures of Fault Tolerance in a Distributed Simulated Annealing process. Optimization by Simulated Annealing on a distributed system is prone to various sources of failure. We analyse simulated annealing algorithm, its architecture in distributed platform and potential sources of failures. We examine the behaviour of tolerant distributed system for optimization task. We present possible methods to overcome the failures and achieve fau...

Prakash, Aaditya

2012-01-01

218

Interactive animation of fault-tolerant parallel algorithms  

Energy Technology Data Exchange (ETDEWEB)

Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user-interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.

Apgar, S.W.

1992-02-01

219

Design Approach for Fault Tolerance in FPGA Architecture  

Directory of Open Access Journals (Sweden)

Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise to significant challenges for IC testing. In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing, which leads to higher defect densities; more sophisticated techniques at increased costs are required to avoid defects. If nano-technology fabrication is applied, the yield may go down to zero, as avoiding defects during fabrication will not be a feasible option. Hence, future architectures have to be defect tolerant. In regular structures like FPGAs, redundancy is commonly used for fault tolerance. In this work we present a solution in which the configuration bit-stream of the FPGA is modified by a hardware controller that is present on the chip itself. The technique uses a redundant device to replace a faulty device and increases the yield.

Ms. Shweta S. Meshram

2011-03-01

220

Fault-Tolerant Partial Replication in Large-Scale Database Systems  

CERN Document Server

We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely and/or compute a total order of transactions. In contrast, ours applies update values, and orders only conflicting transactions. As a result, transactions execute faster, and distributed databases commit in small committees. Both effects contribute to preserving scalability as the number of databases and transactions increases. Our algorithm ensures serializability, and is live and safe in spite of faults.

Sutra, Pierre

2008-01-01

 
 
 
 
221

An Active Fault-Tolerant PWM Tracker for Unknown Nonlinear Stochastic Hybrid Systems: NARMAX Model and OKID-Based State-Space Self-Tuning Control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

An active fault-tolerant pulse-width-modulated tracker using the nonlinear autoregressive moving average with exogenous inputs model-based state-space self-tuning control is proposed for continuous-time multivariable nonlinear stochastic systems with unknown system parameters, plant noises, measurement noises, and inaccessible system states. Through observer/Kalman filter identification method, a good initial guess of the unknown parameters of the chosen model is obtained so as to reduce the ...

Chu-Tong Wang; Tsai, Jason S. H.; Chia-Wei Chen; You Lin; Shu-Mei Guo; Leang-San Shieh

2010-01-01

222

Using Peer Support to Reduce Fault-Tolerant Overhead in Distributed Shared Memories  

Canada Institute for Scientific and Technical Information (Canada)

We present a peer logging system for reducing performance overhead in fault tolerant distributed shared memory systems. Our system provides fault tolerant shared memory using individual checkpointing and rollback. Peer logging logs DSM modification messages to remote nodes instead of to local disks. We present results for implementations of our fault tolerant technique using simulations of both TreadMarks, a software only DSM, and Cashmere, a DSM using memory mapped hardware. We compare simulations with no fault tolerance to simulations with local disk logging and peer logging. We present results showing that fault tolerant TreadMarks can be achieved with an average of 17 percent overhead for peer logging. We also present results showing that while almost any DSM protocol can be made fault tolerant, systems with localized DSM page meta-data have much lower overheads.

1996-01-01

223

A modified NARMAX model-based self-tuner with fault tolerance for unknown nonlinear stochastic hybrid systems with an input-output direct feed-through term.  

Science.gov (United States)

A modified nonlinear autoregressive moving average with exogenous inputs (NARMAX) model-based state-space self-tuner with fault tolerance is proposed in this paper for the unknown nonlinear stochastic hybrid system with a direct transmission matrix from input to output. Through the off-line observer/Kalman filter identification method, one has a good initial guess of the modified NARMAX model to reduce the on-line system identification process time. Then, based on the modified NARMAX-based system identification, a corresponding adaptive digital control scheme is presented for the unknown continuous-time nonlinear system, with an input-output direct transmission term, which also has measurement and system noises and inaccessible system states. Besides, an effective state space self-tuner with fault tolerance scheme is presented for the unknown multivariable stochastic system. A quantitative criterion is suggested by comparing the innovation process error estimated by the Kalman filter estimation algorithm, so that a weighting matrix resetting technique by adjusting and resetting the covariance matrices of parameter estimate obtained by the Kalman filter estimation algorithm is utilized to achieve the parameter estimation for faulty system recovery. Consequently, the proposed method can effectively cope with partially abrupt and/or gradual system faults and input failures by the fault detection. PMID:24012389

Tsai, Jason S-H; Hsu, Wen-Teng; Lin, Long-Guei; Guo, Shu-Mei; Tann, Joseph W

2014-01-01

224

Fault-tolerant search algorithms reliable computation with unreliable information  

CERN Document Server

Why a book on fault-tolerant search algorithms? Searching is one of the fundamental problems in computer science. Time and again algorithmic and combinatorial issues originally studied in the context of search find application in the most diverse areas of computer science and discrete mathematics. On the other hand, fault-tolerance is a necessary ingredient of computing. Due to their inherent complexity, information systems are naturally prone to errors, which may appear at any level - as imprecisions in the data, bugs in the software, or transient or permanent hardware failures. This book pr

Cicalese, Ferdinando

2013-01-01

225

A High-Throughput Byzantine Fault-Tolerant Protocol  

Digital Repository Infrastructure Vision for European Research (DRIVER)

State-machine replication (SMR) is a software technique for tolerating failures and for providing high availability in large-scale systems, through the use of commodity hardware. A replicated state-machine comprises a number of replicas, each of which runs an agreement protocol, with the goal of ensuring a consistent state across all of the replicas. In hostile environments, such as the Internet, Byzantine fault tolerant state-machine replication (BFT...

2012-01-01

226

Broadcasting Messages in Fault-Tolerant Distributed Systems: the benefit of handling input-triggered and output-triggered suspicions differently  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper investigates the two main and seemingly antagonistic approaches to broadcasting messages in fault-tolerant distributed systems: the approach based on Reliable Broadcast, and the one based on View Synchronous Communication (or VSC for short). We discuss both communication primitives in a system model with fair-lossy channel, which leads us to introduce the "time-bounded buffering" problem: VSC addresses this problem, but not Reliable Broadcast. Moreover, we show that VSC solves Reli...

Charron-Bost, Bernadette; Défago, Xavier; Schiper, André

2002-01-01

227

A New Checkpoint Approach for Fault Tolerance in Grid Computing  

Directory of Open Access Journals (Sweden)

Computational and service grids are used to solve large-scale scientific applications using grid resources. The main focus is on fault identification and fault rectification (fault tolerance) using checkpoint approaches. In order to achieve fault tolerance, a checkpoint approach can be used. Job checkpointing is one of the most commonly used techniques for providing fault tolerance in computational grids. The effectiveness of checkpointing depends on the choice of the checkpoint interval. A common technique for fault tolerance is dynamically adapting the checkpoint, in which all the failure information is maintained in the Grid Information Server. This requires a separate server for storage purposes, which increases the execution time. The main goal of the checkpoint approach is to minimize the overall execution time in the grid system. In this work, fault tolerant scheduling is achieved using kernel-level checkpointing. In case of resource failure, the Fault Index Based Rescheduling (FIBR) algorithm is used to reschedule the jobs to other available resources. This ensures that the job is executed with minimized execution time.

Gokuldev S

2013-06-01

228

The Analysis of Multi-Layer Fault-Tolerance Methodology for Applying COTS in Deep Space Missions  

Science.gov (United States)

Fault-tolerant systems are traditionally divided into fault containment regions and custom logic is added to ensure the effects of a fault within a containment region would not propagate to the other regions.

Chau, S.; Alkalai, L.; Tai, A.

2000-01-01

229

Control switching in high performance and fault tolerant control  

DEFF Research Database (Denmark)

The problem of reliability in high performance control and in fault tolerant control is considered in this paper. A feedback controller architecture for high performance and fault tolerance is considered. The architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. By using the nominal controller in the architecture as a simple and robust controller, it is possible to use the YJBK transfer function for optimization of the closed-loop performance. This can be done both in connection with normal operation of the system and in connection with faults in the system. The architecture will also allow changing the applied sensors and/or actuators when switching between different controllers. This switching becomes particularly simple for open-loop stable systems.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2010-01-01

230

A Unified Fault-Tolerance Protocol  

Science.gov (United States)

Davies and Wakerly show that Byzantine fault tolerance can be achieved by a cascade of broadcasts and middle value select functions. We present an extension of the Davies and Wakerly protocol, the unified protocol, and its proof of correctness. We prove that it satisfies validity and agreement properties for communication of exact values. We then introduce bounded communication error into the model. Inexact communication is inherent for clock synchronization protocols. We prove that validity and agreement properties hold for inexact communication, and that exact communication is a special case. As a running example, we illustrate the unified protocol using the SPIDER family of fault-tolerant architectures. In particular we demonstrate that the SPIDER interactive consistency, distributed diagnosis, and clock synchronization protocols are instances of the unified protocol.
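The middle value select function mentioned in this abstract can be illustrated with a minimal Python sketch. It models a single selection stage only and is not the unified protocol or any SPIDER protocol; the helper name `receive_round` and the three-source setup (two good sources, one arbitrary faulty source) are assumptions made for illustration.

```python
def middle_value_select(values):
    """Return the median of an odd-length collection of received values."""
    if len(values) % 2 == 0:
        raise ValueError("expected an odd number of values")
    return sorted(values)[len(values) // 2]


def receive_round(good_values, faulty_value):
    """Hypothetical receiver: two good sources plus one arbitrary faulty one."""
    return middle_value_select(list(good_values) + [faulty_value])


# Exact communication: a faulty source may send different values to different
# receivers, yet both receivers still select the value sent by the good sources.
receiver_a = receive_round([10, 10], 999)
receiver_b = receive_round([10, 10], -999)

# Inexact communication: the selected value stays within the range spanned by
# the good values, whatever the faulty source sends.
receiver_c = receive_round([10.0, 10.2], 1e9)
```

With two good values and one faulty value, the middle of the sorted triple is always bounded by the two good values, which is the intuition behind the validity property stated in the abstract.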

Miner, Paul; Geser, Alfons; Pike, Lee; Maddalon, Jeffrey

2004-01-01

231

A distributed fault tolerant architecture for nuclear reactor control and safety functions  

International Nuclear Information System (INIS)

This paper reports on a fault tolerant architecture, under development, that provides tolerance to a broad scope of hardware, software, and communications faults. This architecture relies on widely commercially available operating systems, local area networks, and software standards. Thus, development time is significantly shortened, and modularity allows for continuous and inexpensive system enhancement throughout the expected 20-year life. The fault containment and parallel processing capabilities of computer networks are being exploited to provide a high performance, high availability network capable of tolerating a broad scope of hardware, software, and operating system faults. The system can tolerate all but one known (and avoidable) single fault, two known and avoidable dual faults, and will detect all higher order fault sequences and provide diagnostics to allow for rapid manual recovery.

1989-12-05

232

Proactive Fault Tolerance Using Preemptive Migration  

Energy Technology Data Exchange (ETDEWEB)

Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail. This paper provides a foundation for proactive FT by defining its architecture and classifying implementation options. This paper further relates prior work to the presented architecture and classification, and discusses the challenges ahead for needed supporting technologies.

Engelmann, Christian [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Scott, Stephen L [ORNL

2009-01-01

233

Fault Tolerant Control of Induction Motor  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The principle of vector control of electrical machines is to control both the magnitude and the phase of each phase current and voltage. A MATLAB/Simulink simulation has been performed to assess the operating features of the proposed scheme. A Proportional Integral (PI) speed controller is designed in this paper. The test response of the developed variable speed drive, along with the simulated response, is given and discussed in detail for torque and speed. Fault tolerant fundamental is applied to t...

Khalaf Salloum Gaeid

2011-01-01

234

Rapid recovery from transient faults in the fault-tolerant processor with fault-tolerant shared memory  

Science.gov (United States)

The Draper fault-tolerant processor with fault-tolerant shared memory (FTP/FTSM), which is designed to allow application tasks to continue execution during the memory alignment process, is described. Processor performance is not affected by memory alignment. In addition, the FTP/FTSM incorporates a hardware scrubber device to perform the memory alignment quickly during unused memory access cycles. The FTP/FTSM architecture is described, followed by an estimate of the time required for channel reintegration.

Harper, Richard E.; Butler, Bryan P.

1990-01-01

235

Steps toward fault-tolerant quantum chemistry.  

Energy Technology Data Exchange (ETDEWEB)

Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use of a task-based model that uses a data-centric approach to allocate work to different processes as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program - matrix multiplication. Also, we discuss the extensions that would be required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead one must recognize that these systems will be hierarchical in structure, prone to constant faults, and global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication, reducing the network load, does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determines the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources.
The complexity of these algorithms means that MPI alone is insufficient to achieve parallel scaling; QC developers have been forced to use alternative approaches to achieve scalability and would be receptive to radical shifts in the programming paradigm. Initial work in adapting the simplest QC method, Hartree-Fock, to the new programming model indicates that the approach is beneficial for QC applications. However, the advantages to being able to scale to exascale computers are greatest for the computationally most expensive algorithms; within QC these are the high-accuracy coupled-cluster (CC) methods. Parallel coupled-cluster programs are available; however, they are based on the conventional MPI paradigm. Much of the effort is spent handling the complicated data dependencies between the various processors, especially as the size of the problem becomes large. The current paradigm will not survive the move to exascale computers. Here we discuss the initial steps toward designing and implementing a CC method within this model. First, we introduce the general concepts behind a CC method, focusing on the aspects that make these methods difficult to parallelize with conventional techniques. Then we outline the computational core of the CC method - a matrix multiply - within the task-based approach that the FAST-OS project is designed to take advantage of. Finally we outline the general setup to implement the simplest CC method in this model, linearized CC doubles (LinCC).

Taube, Andrew Garvin

2010-05-01

236

Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this study, we presented a method for distributing data storage by using a residue number system for mobile systems and wireless networks based on the peer to peer paradigm. Generally, a redundant residue number system is capable of error detection and correction. In the proposed method, we made a new system by mixing Redundant Residue Number System (RRNS), Multi Level Residue Number System (ML RNS) and Multiple Valued Logic (MVL RNS) which was perfect for parallel, carry free, high speed arith...

Barati, A.; Dehghan, M.; Movaghar, A.; Barati, H.

2008-01-01

237

An Approach to Build Software Based on Fault Tolerance Computing Using Uncertainty Factor  

Directory of Open Access Journals (Sweden)

In this work, we start with an overview of fault tolerance based systems. In the case of design diversity based software fault tolerance systems, we observed that uncertainty remains an important factor. Keeping this factor in mind, we discuss implementing Bayes’ theorem and a probabilistic mathematical model to handle the uncertainty factor. We assume that, once developed, the complete model will give us better efficiency. The rest of this paper deals with other types of fault tolerance systems and their approaches. This part is a kind of literature review, which includes fault tolerant computing schemes that rely on the single-design as well as on the multiple-design. Further, in single-design, we discuss the recovery block, N-version programming, and N self-checking programming schemes. Lastly, focusing on multiple-design, we discuss software engineering aspects, error detection mechanisms and fault tolerance by fault injection. The paper ends with a general conclusion.

Mrityunjay Brahma

2013-12-01

238

Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System  

Directory of Open Access Journals (Sweden)

In this study, we presented a method for distributing data storage by using a residue number system for mobile systems and wireless networks based on the peer to peer paradigm. Generally, a redundant residue number system is capable of error detection and correction. In the proposed method, we made a new system by mixing Redundant Residue Number System (RRNS), Multi Level Residue Number System (ML RNS) and Multiple Valued Logic (MVL RNS), which was well suited to parallel, carry free, high speed arithmetic, and the system supports secure data communication. In addition it had the ability of error detection and correction. In comparison to other number systems, it had many improvements in data security, error detection and correction, and speed of storage and retrieval.
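The error detection and correction capability of a redundant residue number system can be illustrated with a minimal Python sketch. It covers only a plain RRNS, not the paper's combined RRNS/ML RNS/MVL RNS scheme; the moduli (3, 5, 7 as information moduli and 11, 13 as redundant moduli) and all function names are assumptions chosen for illustration. It needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod

INFO_MODULI = (3, 5, 7)       # hypothetical information moduli
REDUNDANT_MODULI = (11, 13)   # hypothetical redundant moduli
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)  # valid values are 0 .. 104


def encode(x):
    """Represent x by its residues modulo every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in MODULI]


def crt(residues, moduli):
    """Chinese remainder reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)
    return x % total


def decode(residues):
    """Return (value, ok); ok is False when an error is detected."""
    x = crt(residues, MODULI)
    return x, x < LEGIT_RANGE


def correct_single_error(residues):
    """Drop each residue in turn; a consistent remainder gives the value."""
    for skip in range(len(MODULI)):
        rest_r = [r for i, r in enumerate(residues) if i != skip]
        rest_m = [m for i, m in enumerate(MODULI) if i != skip]
        x = crt(rest_r, rest_m)
        if x < LEGIT_RANGE:
            return x
    return None  # more errors than this sketch can handle
```

A corrupted residue pushes the reconstructed value outside the legitimate range, which is how the error is detected. With two redundant moduli larger than the information moduli, discarding the faulty residue leaves a consistent set from which the original value is recovered.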

A. Barati

2008-01-01

239

Fault Tolerant Electrical Power System. Phase II. Analysis and Preliminary Design.  

Science.gov (United States)

The primary purpose of the program is to develop an electrical power generation and distribution system that can supply electrical power to the various critical systems on the aircraft with a reliability and power quality level commensurate with the requi...

M. W. Dige; P. J. Leong; D. L. Sommer

1986-01-01

240

Fault tolerance for manufacturing components  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The more the information technologies begin to be incorporated into the industrial productive fabric, the more complex it becomes to organise them. It is vital to implant proactive, self-managed systems that ensure continuous operation and, therefore, business continuity. This article proposes a regeneration system for industrial production elements that transfers the concept of high availability to the manufacturing levels of the organisation, acting automatically under open protocols when t...

Marcos Jorquera, Diego; Maciá Pérez, Francisco; Gilart Iglesias, Virgilio; Capella D Alton, Alfonso

2006-01-01

 
 
 
 
241

On the computational aspects of performability models of fault-tolerant computer systems  

Energy Technology Data Exchange (ETDEWEB)

This paper shows that the conditional moments of performability in Markov models are the states of a cascaded, linear, continuous-time dynamic system with identical system matrices in each stage. This interpretation leads to a simple method of computing the first moment for nonhomogeneous Markov models with finite mission time. In addition, the cascaded system representation leads to the derivation of a set of two stable algorithms for propagating the conditional moments of performability in homogeneous Markov models.
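The cascaded structure described in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used moment recursion for a homogeneous Markov reward model, dm_k/dt = Q m_k + k R m_{k-1} with m_0 = 1, where Q is the generator matrix, R is the diagonal matrix of reward rates, and m_k(t) holds the k-th moments of accumulated reward conditioned on the initial state. That this matches the paper's exact formulation is an assumption, and the forward Euler integrator below is a plain stand-in, not one of the stable algorithms the paper derives. All names are hypothetical.

```python
def performability_moments(Q, rates, t_end, order=2, steps=10000):
    """Integrate dm_k/dt = Q m_k + k * diag(rates) m_{k-1}, with m_0 = 1.

    Returns a list m where m[k][i] approximates the k-th moment of the
    reward accumulated over [0, t_end], given initial state i.
    """
    n = len(Q)
    h = t_end / steps
    # Stage 0 is the constant vector of ones; higher stages start at zero.
    m = [[1.0] * n] + [[0.0] * n for _ in range(order)]
    for _ in range(steps):
        new = [m[0][:]]
        for k in range(1, order + 1):
            stage = []
            for i in range(n):
                # Same system matrix Q in every stage of the cascade.
                drift = sum(Q[i][j] * m[k][j] for j in range(n))
                # Each stage is driven by the output of the previous stage.
                drive = k * rates[i] * m[k - 1][i]
                stage.append(m[k][i] + h * (drift + drive))
            new.append(stage)
        m = new
    return m


# Two-state example: state 0 is "up" (reward rate 1) and fails at rate 0.5
# into state 1, which is "down" (reward rate 0) with no repair.
Q = [[-0.5, 0.5], [0.0, 0.0]]
rates = [1.0, 0.0]
moments = performability_moments(Q, rates, t_end=2.0)
```

For this example the first moment starting in the up state is the expected up time over the mission, which has the closed form (1 - exp(-0.5 * 2)) / 0.5, so the numerical result can be checked against it.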

Pattipati, K.R.; Shah, S.A. (Dept. of Electrical and Systems Engineering, Univ. of Connecticut, Storrs, CT (US))

1990-06-01

242

The Design of a Fault-Tolerant COTS-Based Bus Architecture for Space Applications  

Science.gov (United States)

The high-performance, scalability and miniaturization requirements together with the power, mass and cost constraints mandate the use of commercial-off-the-shelf (COTS) components and standards in the X2000 avionics system architecture for deep-space missions. In this paper, we report our experiences and findings on the design of an IEEE 1394 compliant fault-tolerant COTS-based bus architecture. While the COTS standard IEEE 1394 adequately supports power management, high performance and scalability, its topological criteria impose restrictions on fault tolerance realization. To circumvent the difficulties, we derive a "stack-tree" topology that not only complies with the IEEE 1394 standard but also facilitates fault tolerance realization in a spaceborne system with limited dedicated resource redundancies. Moreover, by exploiting pertinent standard features of the 1394 interface which are not purposely designed for fault tolerance, we devise a comprehensive set of fault detection mechanisms to support the fault-tolerant bus architecture.

Chau, Savio N.; Alkalai, Leon; Tai, Ann T.

2000-01-01

243

Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing  

Science.gov (United States)

Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware's environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of Point-to-Point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.

Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.

2011-01-01

244

Distributed state estimation and model predictive control of linear interconnected system: Application to fault tolerant control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, a distributed and networked control system architecture based on independent Model Predictive Control/Kalman-Filter (MPC/KF) architectures, is proposed. Interconnected subsystems, possibly located at different sites, exchange information through the digital communication network. For the partial local state measurement, the key component for realistic Distributed Model Predictive Control (DMPC) formulation is the state estimations. These state estimations are generated by Kalma...

Menighed, Kamel; Aubrun, Christophe; Yamé, Joseph Julien

2009-01-01

245

Fault tolerant microcomputer based alarm annunciator for Dhruva reactor  

International Nuclear Information System (INIS)

The Dhruva alarm annunciator displays the status of 624 alarm points on an array of display windows using the standard ringback sequence. Recognizing the need for very high availability, the system is implemented as a fault tolerant configuration. The annunciator is partitioned into three identical units; each unit is implemented using two microcomputers wired in a hot standby mode. In the event of one computer malfunctioning, the standby computer takes over control in a bouncefree transfer. The use of microprocessors has helped build flexibility into the system. The system also provides a built-in capability to resolve the sequence of occurrence of events and conveys this information to another system for display on a CRT. This report describes the system features, the fault tolerant organisation used, and the hardware and software developed for the annunciation function. (author). 8 figs

1988-01-01

246

A lightweight fault-tolerant middleware for a Subaru Telescope second generation observation control system  

Science.gov (United States)

Subaru Telescope is developing a second-generation Observation Control System that specifically addresses some of the deficiencies of the current Subaru OCS. Two areas of concern are complexity and failure handling. The current system has over 1000 dedicated OCS processes spread across a dozen hosts and provides nothing in the way of automated failover. Furthermore, manual failover is so fraught with difficulty that it is rarely attempted. Our Generation 2 OCS is written almost entirely in Python and builds upon a Subaru-developed middleware based on the XML-RPC protocol. This framework offers the following benefits:
- has very few dependences outside of standard Python
- provides a nearly seamless remote proxy object-oriented interface
- provides optional user/password authentication and/or SSL encryption
- is extremely simple to use from client applications
- is connectionless, and assists transparent failover of communications and services on a cluster of hosts
- has reasonable performance for a wide range of needs
- allows multiple language bindings
- for dynamic languages, requires no interface stub files
The "back end" (service side) of the OCS is nearing completion, and has already been used successfully during two separate OCS engineering runs. It is comprised of only a couple dozen processes, and provides automated failover capabilities on a rack of commodity x86 Linux servers. We provide an overview of the middleware design and its failover capabilities. Some data on the performance of communications using the middleware protocol is included.

Jeschke, Eric; Bon, Bruce; Inagaki, Takeshi; Streeper, Sam

2008-08-01

247

Adaptive Execution Assistance for Multiplexed Fault-Tolerant Chip Multiprocessors  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Relentless scaling of CMOS fabrication technology has made contemporary integrated circuits increasingly susceptible to transient faults, wearout-related permanent faults, intermittent faults and process variations. Therefore, mechanisms to mitigate the effects of decreased reliability are expected to become essential components of future general-purpose microprocessors. In this paper, we introduce a new throughput-efficient architecture for multiplexed fault-tolerant chip multiproce...

Subramanyan, Pramod; Singh, Virendra; Saluja, Kewal; Larsson, Erik

2011-01-01

248

H∞ Fault Tolerant Control of WECS Based on the PWA Model  

Directory of Open Access Journals (Sweden)

Full Text Available The main contribution of this paper is the development of H∞ fault tolerant control for a wind energy conversion system (WECS) based on the stochastic piecewise affine (PWA) model. In this paper the normal and fault stochastic PWA models for WECS including multiple working points at different wind speeds are established. A reliable piecewise linear quadratic regulator state feedback is designed for the fault tolerant actuator and sensor. A sufficient condition for the existence of the passive fault tolerant controller is derived based on some linear matrix inequalities (LMIs). It is shown that the H∞ fault tolerant controller of WECS can control the wind turbine exposed to multiple simultaneous sensor faults or actuator faults; that is, the reliability of wind turbines can be improved.

Yun-Tao Shi

2014-03-01

249

Synthesis of Fault Tolerant Reversible Logic Circuits  

CERN Document Server

Reversible logic is emerging as an important research area having its application in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 universal reversible logic gate, IG. It is a parity preserving reversible logic gate, that is, the parity of the inputs matches the parity of the outputs. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. Finally, it is shown how a fault tolerant reversible full adder circuit can be realized using only two IGs. It has also been demonstrated that the proposed design has lower hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.

Islam, Md Saiful; Begum, Zerina; Hafiz, Mohd Zulfiquar; Mahmud, Abdullah Al; 10.1109/CAS-ICTD.2009.4960883

2010-01-01
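
To illustrate the parity-preserving property described in record 249, here is a minimal Python sketch. The abstract does not give the truth table of the IG gate, so the sketch does not attempt to reproduce it; it uses the well-known Fredkin (controlled-swap) gate as a stand-in for a parity-preserving reversible gate and the Toffoli gate as a reversible gate that does not preserve parity. The helper names `is_reversible` and `is_parity_preserving` are hypothetical.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: if c is 1, swap a and b. Stand-in gate, not the IG gate."""
    return (c, b, a) if c else (c, a, b)


def toffoli(a, b, c):
    """Controlled-controlled NOT: flips c when a and b are both 1."""
    return (a, b, c ^ (a & b))


def is_reversible(gate, n_inputs):
    """A gate is reversible if distinct inputs always give distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs


def is_parity_preserving(gate, n_inputs):
    """Check that input parity equals output parity for every input pattern."""
    for bits in product((0, 1), repeat=n_inputs):
        if sum(bits) % 2 != sum(gate(*bits)) % 2:
            return False
    return True


fredkin_ok = is_reversible(fredkin, 3) and is_parity_preserving(fredkin, 3)
toffoli_parity = is_parity_preserving(toffoli, 3)
```

With a parity-preserving gate, a fault that flips a single signal changes the output parity, so comparing input and output parity is enough to detect it.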

250

Fault-Tolerant Routing in Butterfly Networks  

Directory of Open Access Journals (Sweden)

Full Text Available This research shows that Butterfly networks can be fault-tolerant using Masked Interval Routing Scheme (MIRS). The MIRS was introduced with the aim of compressing the routing tables in a network. It was shown that MIRS could drastically reduce interval information stored in networks such as globe and hypercube graphs, compared to the classical Interval Routing Scheme (IRS). In Butterfly graphs of O(N) vertices the number of intervals per edge goes down from ? in IRS to O(log N) in MIRS. This research shows that MIRS may be advantageously used in Butterfly networks, proving that optimal routing with one interval per edge is still possible with a harmless subset of faulty vertices. This research gives an optimal algorithm to reconfigure the intervals in the presence of faults.

Mohammed H. Mahafzah

2010-01-01

251

An Efficient Byzantine Fault Tolerant Agreement  

Science.gov (United States)

Most distributed transaction protocols rely on atomic commitment. Dealing effectively with arbitrary failures is a major operational challenge. In the contemporary literature, protocols such as BFTDC [3] and PBFT [5] run an effective agreement protocol. The drawback of these protocols is that they add message overhead as well as latency to the protocol execution. The paper presents an efficient Byzantine fault tolerant distributed commit protocol. The proposed agreement protocol helps in achieving faster execution results using an effective view change mechanism. The protocol is computationally more efficient. It has lower message complexity and reduces the time overhead incurred in transaction processing.

Saini, Poonam; Singh, Awadhesh Kumar

2010-11-01

252

Fault tolerant quantum computation with nondeterministic gates.  

Science.gov (United States)

In certain approaches to quantum computing the operations between qubits are nondeterministic and likely to fail. For example, a distributed quantum processor would achieve scalability by networking together many small components; operations between components should be assumed to be failure prone. In the ultimate limit of this architecture each component contains only one qubit. Here we derive thresholds for fault-tolerant quantum computation under this extreme paradigm. We find that computation is supported for remarkably high failure rates (exceeding 90%) provided that failures are heralded; meanwhile the rate of unknown errors should not exceed 2 in 10^4 operations. PMID:21231569

Li, Ying; Barrett, Sean D; Stace, Thomas M; Benjamin, Simon C

2010-12-17

253

Fault Tolerant Control in a Semi-active Suspension  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A Fault Tolerant Control System (FTCS) in a Quarter of Vehicle (QoV) model is proposed. The control law is time-varying using a Linear Parameter-Varying (LPV) based controller, which includes two scheduling parameters. One parameter for monitoring the nonlinear behavior of the damper, and another for fault accommodation using a reference model obtained by a state observer of the normal operating regime. The QoV model represents a semi-active suspension, including an experimental magneto-rhe...

Tudón-Martínez, Juan C.; Morales-Menéndez, Rubén; Ramirez-Mendoza, Ricardo; Sename, Olivier; Dugard, Luc

2012-01-01

254

Design of fault-tolerant inductive position sensor  

Energy Technology Data Exchange (ETDEWEB)

It is desirable for the position sensors used in a magnetic bearing system to provide some degree of fault tolerance, because the rotor position is needed for the feedback control that overcomes the open-loop instability. In this paper, we propose an inductive position sensor that can cope with a partial fault in the sensor. The sensor has multiple poles which can be combined to sense the in-plane motion of the rotor. When a high-frequency voltage signal drives each pole of the sensor, the resulting current in the sensor coil contains information regarding the rotor position. The signal processing circuit of the sensor extracts this position information. In this paper, we used the magnetic circuit model of the sensor that shows the analytical relationship between the sensor output and the rotor motion. The multi-polar structure of the sensor makes it possible to introduce redundancy which can be exploited for fault-tolerant operation. The proposed sensor is applied to a magnetically levitated turbo-molecular vacuum pump. Experimental results validate the fault-tolerance algorithm.

Paek, Sung Kuk; Noh, Myoung Gyu [Chungnam National University, Daejeon (Korea, Republic of); Park, Byeong Cheol [Korea Electric Power Research Institute, Daejeon (Korea, Republic of)

2008-03-15

255

Fault tolerant high-performance PACS network design and implementation  

Science.gov (United States)

The Wake Forest University School of Medicine and the Wake Forest University/Baptist Medical Center (WFUBMC) are implementing a second generation PACS. The first generation PACS provided helpful information about the functional and temporal requirements of the system. It highlighted the importance of image retrieval speed, system availability, RIS/HIS integration, the ability to rapidly view images on any PACS workstation, network bandwidth, equipment redundancy, and the ability for the system to evolve using standards-based components. This paper deals with the network design and implementation of the PACS. The physical layout of the hospital areas served by the PACS, the choice of network equipment and installation issues encountered are addressed. Efforts to optimize fault tolerance are discussed. The PACS network is a gigabit, mixed-media network based on LAN emulation over ATM (LANE) with a rapid migration from LANE to Multiple Protocols Over ATM (MPOA) planned. Two fault-tolerant backbone ATM switches serve to distribute network accesses with two load-balancing 622 megabit per second (Mbps) OC-12 interconnections. The switch was sized to be upgradable to provide a 2.54 Gbps OC-48 interconnection with an OC-12 interconnection as a load-balancing backup. Modalities connect with legacy network interface cards to a switched-ethernet device. This device has two 155 Mbps OC-3 load-balancing uplinks to each of the backbone ATM switches of the PACS. This provides a fault-tolerant logical connection to the modality servers which pass verified DICOM images to the PACS servers and proper PACS diagnostic workstations. Where fiber pulls were prohibitively expensive, edge ATM switches were installed with an OC-12 uplink to a backbone ATM switch. The PACS and data base servers are fault-tolerant, hot-swappable Sun Enterprise Servers with an OC-12 connection to a backbone ATM switch and a fast-ethernet connection to a back-up network. The workstations come with 10/100BASE-T autosense cards. A redundant switched-ethernet network will be installed to provide yet another degree of network fault-tolerance. The switched-ethernet devices are connected to each of the backbone ATM switches with two load-balancing OC-3 connections to provide fault-tolerant connectivity in the event of a primary network failure.

Chimiak, William J.; Boehme, Johannes M.

1998-07-01

256

Fault-tolerant distributed mass storage for LHC computing  

CERN Document Server

In this paper we present the concept and first prototyping results of a modular fault-tolerant distributed mass storage architecture for large Linux PC clusters as they are deployed by the upcoming particle physics experiments. The device masquerading technique using an Enhanced Network Block Device (ENBD) enables local RAID over remote disks as the key concept of the ClusterRAID system. The block level interface to remote files, partitions or disks provided by the ENBD makes it possible to use the standard Linux software RAID to add fault-tolerance to the system. Preliminary performance measurements indicate that the latency is comparable to a local hard drive. With four disks, throughput rates of up to 55 MB/s were achieved with first prototypes for a RAID0 setup, and about 40 MB/s for a RAID5 setup. (29 refs).

Wiebalck, A; Lindenstruth, V; Stinbeck, T M

2003-01-01
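
The RAID5 setup mentioned in record 256 relies on XOR parity to survive the loss of one disk. The following Python sketch shows only that parity principle under simplifying assumptions: the blocks are short in-memory byte strings, parity is not rotated across disks as a real RAID5 array would do, and the function names `xor_blocks` and `rebuild_missing` are hypothetical. It is not the ClusterRAID or Linux software RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one lost block from the surviving data blocks plus parity."""
    return xor_blocks(surviving_blocks)


# Three data blocks, as if stored on three remote disks (illustrative values).
data = [b"disk", b"over", b"enbd"]
parity = xor_blocks(data)

# Simulate losing the second disk and rebuilding it from the rest.
recovered = rebuild_missing([data[0], data[2], parity])
```

Because XOR is its own inverse, the same routine computes the parity and rebuilds a lost block, which is why a single failed disk can be tolerated.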

257

On Reliability Analysis of Fault-tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

Full Text Available The design of a suitable interconnection network for inter-processor communication is one of the key issues of the system performance. The reliability of these networks and their ability to continue operating despite failures are major concerns in determining the overall system performance. In this paper a new irregular network, IABN, is proposed by modifying the existing ABN network. ABN is a regular multipath network with limited fault tolerance. The reliabilities of the IABN and ABN multi-stage interconnection networks have been calculated and compared in terms of the upper and lower bounds of mean time to failure (MTTF). The IABN provides much better fault tolerance, by offering three times more paths between any source-destination pair, and better reliability at the expense of slightly higher cost than ABN.

Rinkle Aggarwal

2008-11-01

258

Checkpoint-based Intelligent Fault tolerance For Cloud Service Providers  

Directory of Open Access Journals (Sweden)

Full Text Available With the increasing demand and benefits of cloud computing infrastructure, real time computing can be performed on cloud infrastructure. A real time system can take advantage of intensive computing capabilities and scalable virtualized environment of cloud computing to execute real time tasks. In most of the real time cloud applications, processing is done on remote cloud computing nodes. So there are more chances of errors, due to the undetermined latency and loose control over computing node. On the other side, most of the real time systems are also safety critical and should be highly reliable. So there is an increased requirement for fault tolerance to achieve reliability for the real time computing on cloud infrastructure. This paper proposes a smart checkpoint infrastructure for virtualized service providers and a fault tolerance model for real time cloud computing. The checkpoints are stored in a Hadoop Distributed File System. This allows resuming a task execution faster after a node crash and increasing the fault tolerance of the system, since checkpoints are distributed and replicated in all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure. One advantage of cloud computing is the dynamicity of resource provisioning. Our architecture makes use of this advantage by enabling dynamic run-time modifications of replication groups

Rejin Paul

2012-12-01

259

Design of Reliable Adaptive Filter with Fault Tolerance Using DSP  

Energy Technology Data Exchange (ETDEWEB)

The LSM algorithm has been used for plant identification and noise cancellation. This algorithm has been studied to enhance filtering performance. The design and development of reliable systems has become a key issue in industry because reliability is an important factor in whether a system performs its function successfully. Computing with reliability and fault tolerance is an important factor in aviation, system communication, and nuclear plants. This paper presents the design of a reliable adaptive filter with fault tolerance. Generally, redundancy is used for reliability, which requires computation or circuitry for a voting mechanism, fault detection, or switching. The filter presented here needs no computation for a voting mechanism or fault detection. It therefore has simple computation and is practical to apply. The reliability of the adaptive filter is also analyzed. The effectiveness of the proposed adaptive filter is demonstrated in case studies of plant identification and noise cancellation using a DSP. (author). 9 refs., 18 figs.

Ryoo, D. W.; Lee, J. W. [Electronics and Telecommunications Research Institute, Taejon (Korea); Seo, B. H. [Kyungbok National University, Taegu (Korea)

2001-01-01

260

Data Driven Fault Tolerant Control: A Subspace Approach:  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main stream research on fault detection and fault tolerant control has been focused on model based methods. As far as a model is concerned, changes therein due to faults have to be extracted from measured data. Generally speaking, existing approaches process measured inputs and outputs either by a filter designed based on a known model (e.g. for additive faults), or by an identification scheme to estimate the changed model parameters (e.g. due to multiplicative faults). Since the classica...

Dong, J.

2009-01-01

 
 
 
 
261

On the Transition Improvement of EV or HEV Induction Motor Propulsion Sensor Fault-Tolerant Controller  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This technical paper deals with the transition performance improvement of a sensor fault-tolerant controller devoted to Electric (EV) or Hybrid Electric Vehicles (HEV). Indeed, improvements are brought over a previously developed technique that exhibits abrupt changes in the torque if a sensor fault is detected and after a transition from a control technique to another one [1]. The Fault-Tolerant Control (FTC) system firstly concerns the sliding mode control technique since better performances...

Tabbache, Bekheira; Benbouzid, Mohamed; Kheloui, Abdelaziz

2010-01-01

262

Proactive Fault Tolerance for HPC with Xen Virtualization  

Energy Technology Data Exchange (ETDEWEB)

with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current techniques to tolerate faults focus on reactive schemes to recover from faults and generally rely on a checkpoint/restart mechanism. Yet, in today's systems, node failures can often be anticipated by detecting a deteriorating health status. Instead of a reactive scheme for fault tolerance (FT), we are promoting a proactive one where processes automatically migrate from "unhealthy" nodes to healthy ones. Our approach relies on operating system virtualization techniques exemplified by but not limited to Xen. This paper contributes an automatic and transparent mechanism for proactive FT for arbitrary MPI applications. It leverages virtualization techniques combined with health monitoring and load-based migration. We exploit Xen's live migration mechanism for a guest operating system (OS) to migrate an MPI task from a health-deteriorating node to a healthy one without stopping the MPI task during most of the migration. Our proactive FT daemon orchestrates the tasks of health monitoring, load determination and initiation of guest OS migration. Experimental results demonstrate that live migration hides migration costs and limits the overhead to only a few seconds, making it an attractive approach to realize FT in HPC systems. Overall, our enhancements make proactive FT a valuable asset for long-running MPI applications that is complementary to reactive FT using full checkpoint/restart schemes since checkpoint frequencies can be reduced as fewer unanticipated failures are encountered. In the context of OS virtualization, we believe that this is the first comprehensive study of proactive fault tolerance where live migration is actually triggered by health monitoring.

Nagarajan, Arun Babu [North Carolina State University; Mueller, Frank [North Carolina State University; Engelmann, Christian [ORNL; Scott, Stephen L [ORNL

2007-01-01

263

Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM  

Science.gov (United States)

Hall sensors are widely used for estimating the rotor phase of a permanent magnet synchronous motor (PMSM). Rotor position is an essential parameter of the PMSM control algorithm, hence it is very dangerous if Hall sensor faults occur. But there is scarcely any research focusing on fault diagnosis and fault-tolerant control of Hall sensors used in PMSM. From this standpoint, the Hall sensor faults which may occur during PMSM operation are theoretically analyzed. According to the analysis results, a fault diagnosis algorithm for Hall sensors, which is based on three rules, is proposed to classify the fault phenomena accurately. The rotor phase estimation algorithms, based on one or two Hall sensor(s), are initialized to engender the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total, and all detections can be fulfilled within 1/138 of a rotor rotation period. The fault-tolerant control algorithm can achieve a smooth torque production, which means the same control effect as normal control mode (with three Hall sensors). Finally, the PMSM bench test verifies the accuracy and rapidity of the fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm can detect all Hall sensor faults promptly and the fault-tolerant control algorithm allows the PMSM to face failure conditions of one or two Hall sensor(s). In addition, the transitions between health-control and fault-tolerant control conditions are smooth without any additional noise and harshness. The proposed algorithms can deal with the Hall sensor faults of PMSM in real applications, and can be used to realize the fault diagnosis and fault-tolerant control of PMSM.

Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin

2013-07-01

264

Fault Detection for Shipboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.

Lajic, Zoran; Nielsen, Ulrik Dam

2009-01-01
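
Record 264 mentions detecting faults from residuals with the generalized likelihood ratio (GLR) algorithm. As a hedged illustration only, the Python sketch below computes the textbook GLR statistic for a jump in the mean of Gaussian residuals with known standard deviation. The abstract does not specify the authors' residual model, so the function name `glr_statistic`, the sample values, `SIGMA` and `THRESHOLD` are all assumptions made for this example.

```python
def glr_statistic(residuals, sigma):
    """GLR statistic for a mean jump of unknown size at an unknown time.

    For each candidate change time k, the log-likelihood ratio maximised
    over the jump size is (sum of residuals from k on)^2 / (2 sigma^2 count).
    The statistic is the maximum over all k.
    """
    best = 0.0
    tail_sum = 0.0
    n = len(residuals)
    for k in range(n - 1, -1, -1):
        tail_sum += residuals[k]
        count = n - k
        best = max(best, tail_sum ** 2 / (2.0 * sigma ** 2 * count))
    return best


SIGMA = 0.2       # assumed residual noise level
THRESHOLD = 5.0   # assumed alarm threshold

healthy = [0.1, -0.2, 0.05, -0.1, 0.15, 0.0]
faulty = healthy + [1.1, 0.9, 1.2, 1.0]   # residual mean jumps after a fault

healthy_alarm = glr_statistic(healthy, SIGMA) > THRESHOLD
faulty_alarm = glr_statistic(faulty, SIGMA) > THRESHOLD
```

In practice the threshold trades false alarms against detection delay, and the statistic is usually evaluated over a sliding window of recent residuals.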

265

Fault Detection for Shipboard Monitoring and Decision Support Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.

2008-01-01

266

Fault Tolerance in ZigBee Wireless Sensor Networks  

Science.gov (United States)

Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

2011-01-01

267

Fault-tolerance techniques for SRAM-based FPGAs  

CERN Document Server

Fault-tolerance in integrated circuits is no longer the exclusive concern of space designers or highly-reliable applications engineers. Today, designers of many next-generation products must cope with reduced noise margins. The continuous evolution of fabrication technology of semiconductor components – shrinking transistor geometry, power supply, speed, and logic density – has significantly reduced the reliability of very deep submicron integrated circuits, in face of various internal and external sources of noise. Field Programmable Gate Arrays (FPGAs), customizable by SRAM cells, are the latest advance in the integrated circuit evolution: millions of memory cells to implement the logic, embedded memories, routing, and embedded microprocessor cores. These re-programmable systems-on-chip platforms must be fault-tolerant to cope with current requirements.

Kastensmidt, Fernanda Lima; Reis, Ricardo

2006-01-01

268

Design and Analysis of a Fault Tolerant Microprocessor Based on Triple Modular Redundancy Using VHDL  

Directory of Open Access Journals (Sweden)

Full Text Available There are numerous real time and operation critical systems in which failure of the system is unacceptable at any stage of processing. Examples of such systems include ATMs, satellites, and spacecraft. In this paper a fault tolerant microprocessor is developed by using checker units with a fault secure ALU; the fault secure ALU is built using parity prediction logic and the two rail checker method. Finally, triple modular redundancy is applied to develop a fault tolerant processor. The proposed method was validated using the VHDL test environment and the results showed that the reliability of the system increased with a little area overhead.

Deepti Shinghal

2011-03-01
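
The triple modular redundancy (TMR) applied in record 268 can be sketched in a few lines. The paper itself works in VHDL; the Python below is only a behavioural illustration of a bitwise 2-of-3 majority voter, and the 8-bit adders `good_alu` and `faulty_alu` (with its injected single-bit error) are hypothetical stand-ins for the redundant processor modules.

```python
def majority_vote(a, b, c):
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_execute(modules, *operands):
    """Run the same operation on three modules and vote on the results."""
    results = [module(*operands) for module in modules]
    return majority_vote(*results)


def good_alu(x, y):
    """Hypothetical 8-bit adder standing in for a healthy module."""
    return (x + y) & 0xFF


def faulty_alu(x, y):
    """Same adder with an injected stuck bit flip in bit 4 of the result."""
    return good_alu(x, y) ^ 0x10


voted = tmr_execute([good_alu, faulty_alu, good_alu], 3, 4)
```

The voter masks any fault confined to one module; it cannot help if two modules fail in the same bit position, which is why TMR is often combined with checker logic such as the parity prediction mentioned in the abstract.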

269

Documentation of the current fault detection, isolation and reconfiguration software of the AIPS fault-tolerant processor  

Science.gov (United States)

Documentation is presented of the December 1986 version of the Ada code for the fault detection, isolation, and reconfiguration (FDIR) functions of the Advanced Information Processing System (AIPS) Fault-Tolerant Processor (FTP). Because the FTP is still under development and the software is constantly undergoing changes, this should not be considered final documentation of the FDIR software of the FTP.

Lanning, David T.; Shepard, Allen W.; Johnson, Sally C.

1987-01-01

270

Redundant finite rings for fault-tolerant signal processors  

Science.gov (United States)

Redundant Residue Number Systems (RRNS) have been proposed as suitable candidates for fault tolerance in compute intensive applications. The redundancy is based on multiple projections to moduli sub-sets and conducting a search for results that lie in a so-called illegitimate range. This paper presents RRNS fault tolerant procedures for a recently introduced finite polynomial ring mapping procedure (modulus replication RNS). The mapping technique dispenses with the need for many relatively prime ring moduli, which is a major drawback with conventional RRNS systems. Although double, triple, and quadruple modular redundancy can be implemented in the polynomial mapping structure, polynomial coefficient circuitry, or the independent direct product ring computational channels, for error detection and/or correction, this paper discusses the implementation of redundant rings which are generated by (1) redundant residues, (2) spare general computational channels, or (3) a combination of the two. The first architecture is suitable for RNS embedding in the MRRNS, and the second for single moduli mappings. The combination architecture allows a trade-off between the two extremes. The application area is in fault tolerant compute intensive DSP arrays.

Jullien, Graham A.; Bizzan, S. S.; Wigley, Neil M.; Miller, W. C.

1994-10-01
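
The "illegitimate range" check that record 270 attributes to conventional RRNS can be shown with a small Python sketch. It covers only the conventional scheme with relatively prime integer moduli, not the paper's modulus replication (polynomial ring) mapping. The moduli (3, 5, 7) plus redundant (11, 13) and the function names are assumptions chosen for illustration; `pow(x, -1, m)` requires Python 3.8 or later.

```python
from math import prod

INFO_MODULI = (3, 5, 7)          # carry the data; legitimate range is 0..104
REDUNDANT_MODULI = (11, 13)      # extra residues used only for checking
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def encode(x):
    """Represent x by its residues with respect to every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return tuple(x % m for m in MODULI)


def decode(residues):
    """Chinese remainder reconstruction over all moduli."""
    total = prod(MODULI)
    value = 0
    for r, m in zip(residues, MODULI):
        partial = total // m
        value += r * partial * pow(partial, -1, m)
    return value % total


def has_error(residues):
    """A decoded value in the illegitimate range signals a corrupted residue."""
    return decode(residues) >= LEGIT_RANGE


word = encode(42)
corrupted = list(word)
corrupted[1] = (corrupted[1] + 1) % MODULI[1]   # inject a single-residue fault
```

With these moduli, corrupting any single residue moves the decoded value by a nonzero multiple of at least 1155, which always lands outside 0..104, so single-residue errors are detected.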

271

Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor  

Science.gov (United States)

Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronics devices working under extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for fault tolerant control using Youla parameterization for an ultrasonic piezoelectric motor. The distinguishing feature of the proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately, which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. The fault tolerant control architecture includes two parts: one part for performance and the other part for robustness. The controller design works in such a way that the feedback control system will be solely controlled by the proportional plus double-integral PI2 performance controller for a nominal model without disturbances, and the H∞ robustification controller will only be activated in the presence of uncertainties or external disturbances. The simulation results demonstrate the effectiveness of the proposed fault tolerant control architecture.

Boukhnifer, Moussa

2012-07-01

272

A Framework-Based Approach for Fault-Tolerant Service Robots  

Directory of Open Access Journals (Sweden)

Recently the component-based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support a fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with the fault tolerance function, we argue that the framework-based fault tolerance approach is feasible and even has many benefits, including that: (1) the system integrators can build fault tolerant applications from non-fault-aware components; (2) the constraints of the components and the operating environment, which cannot be anticipated easily at the time of component development, can be considered at the time of integration; (3) consistency in system reliability can be obtained even in spite of diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault-tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real-time performance of the framework-based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

Heejune Ahn

2012-11-01

273

Tolerance towards sensor faults: An application to a flexible arm manipulator  

Digital Repository Infrastructure Vision for European Research (DRIVER)

As more engineering operations become automatic, the need for robustness towards faults increases. Hence, a fault tolerant control (FTC) scheme is a valuable asset. This paper presents a robust sensor fault FTC scheme implemented on a flexible arm manipulator, which has many applications in automation. Sensor faults affect the system's performance in the closed loop when the faulty sensor readings are used to generate the control input. In this paper, the non-faulty sensors are used to recons...

Chee Pin Tan; Habib, Maki K.

2008-01-01

274

Analysis of a cascaded multilevel inverter with fault-tolerant control  

Directory of Open Access Journals (Sweden)

Cascaded multilevel inverters are widely used in industry for speed control of induction motors and, even when the converters' operation is highly reliable, several faults can occur, leading to poor engine performance or even causing the whole system to stop. It is desirable to keep the system operational when a failure occurs, even when degraded, and implementing fault-tolerant systems is thus a good choice. This paper presents a general strategy for fault-tolerant control in a 7-level cascaded multilevel inverter (the faults are in semiconductor devices). The paper includes simulation and experimental results to validate the method.

Jesús Aguayo Alquicira

2011-08-01

275

Optimized Nanometric Fault Tolerant Reversible BCD Adder  

Directory of Open Access Journals (Sweden)

In this study a novel nanometric fault tolerant quantum and reversible binary coded decimal (BCD) adder is proposed. Reversible logic has attracted growing attention in optical information processing, quantum computing, nanotechnology and low power design. A BCD adder is a combinational circuit that can be used for the addition of two numbers in BCD arithmetic. The proposed reversible BCD adder also has the parity preserving property. The proposed circuit is optimized and is compared with the existing circuits in terms of number of constant inputs, number of garbage outputs, quantum cost and hardware complexity. It is reported to be better than all the existing counterparts, with all of these parameters improved dramatically. It is to be noted that all the circuits have nanometric scales.

Majid Haghparast

2012-01-01

276

A fault-tolerant attitude control system for a satellite based on fuzzy global sliding mode control algorithm  

Science.gov (United States)

An effective approach to aeroengine fault diagnosis based on the integration of wavelet analysis and neural networks is presented. The wavelet transform can accurately localize the characteristics of a signal in the time-frequency domain. In view of the relationship between the wavelet transform and exponent theory, the global and local exponents obtained from the wavelet transform coefficients are used as features for extracting fault signals, which are input to a radial basis function network for fault pattern recognition. The fault diagnosis model of the aeroengine is established, and the improved Levenberg-Marquardt training algorithm is used to identify the network structure and parameters. By choosing enough samples to train the fault diagnosis network and feeding the information representing the faults into the neural network, the fault pattern can be determined. The robustness of the wavelet neural network for fault diagnosis is discussed. Practical fault diagnosis of aeroengine vibration proves to be accurate and comprehensive.

Liang, Jinjin; Dong, Chaoyang; Wang, Qing

2008-11-01

277

Fault diagnosis and fault-tolerant control and guidance for aerospace vehicles from theory to application  

CERN Document Server

Fault Diagnosis and Fault-Tolerant Control and Guidance for Aerospace Vehicles demonstrates the attractive potential of recent developments in control for resolving such issues as improved flight performance, self-protection and extended life of structures. Importantly, the text deals with a number of practically significant considerations: tuning, complexity of design, real-time capability, evaluation of worst-case performance, robustness in harsh environments, and extensibility when development or adaptation is required. Coverage of such issues helps to draw the advanced concepts arising from academic research back towards the technological concerns of industry. Initial coverage of basic definitions and ideas and a literature review gives way to a treatment of important electrical flight control system failures: the oscillatory failure case, runaway, and jamming. Advanced fault detection and diagnosis for linear and nonlinear systems are described. Lastly, recovery strategies appropriate to remaining actuator/sensor/c...

Zolghadri, Ali; Cieslak, Jerome; Efimov, Denis; Goupil, Philippe

2014-01-01

278

A Fault-Tolerant Control Architecture for Induction Motor Drives in Automotive Applications  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper describes a fault-tolerant control system for a high-performance induction motor drive that propels an electric vehicle (EV) or hybrid electric vehicle (HEV). In the proposed control scheme, the developed system takes into account the controller transition smoothness in the event of sensor failure. Moreover, due to the EV or HEV requirements for sensorless operations, a practical sensorless control scheme is developed and used within the proposed fault-tolerant control system. Th...

Diallo, Demba; Benbouzid, Mohamed; Makouf, Abdessalam

2004-01-01

279

Fault tolerant wind speed estimator used in wind turbine controllers  

DEFF Research Database (Denmark)

Advanced control schemes can be used to optimize energy production and cost of energy in modern wind turbines. These control schemes most often rely on wind speed estimations. Existing wind speed estimators are, however, not designed to be tolerant of faults in the sensors they use. In this paper a fault tolerant wind speed estimator is designed based on a set of unknown input observers, each designed for a different set of non-faulty sensors. Faults in the rotor, generator and wind speed sensors are considered. The designed wind speed estimator is passively tolerant towards faults in the wind speed sensors, while faults in the generator and rotor speed sensors are accommodated by an active fault tolerant observer scheme in which the faults are detected and identified, and the observer corresponding to the non-faulty sensors is used. The potential of the scheme is shown by applying the proposed wind speed estimator to a simulation model of a wind turbine. Notice that since the faults are accommodated in the observer scheme, the actual controller does not need to be adjusted or reconfigured to accommodate the sensor faults.

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01

280

An approach to the verification of a fault-tolerant, computer-based reactor safety system: A case study using automated reasoning: Volume 1: Interim report  

International Nuclear Information System (INIS)

The purpose of this project is to explore the feasibility of automating the verification process for computer systems. The intent is to demonstrate that both the software and hardware that comprise the system meet specified availability and reliability criteria, that is, total design analysis. The approach to automation is based upon the use of Automated Reasoning Software developed at Argonne National Laboratory. This approach is herein referred to as formal analysis and is based on previous work on the formal verification of digital hardware designs. Formal analysis represents a rigorous evaluation which is appropriate for system acceptance in critical applications, such as a Reactor Safety System (RSS). This report describes a formal analysis technique in the context of a case study, that is, demonstrates the feasibility of applying formal analysis via application. The case study described is based on the Reactor Safety System (RSS) for the Experimental Breeder Reactor-II (EBR-II). This is a system where high reliability and availability are tantamount to safety. The conceptual design for this case study incorporates a Fault-Tolerant Processor (FTP) for the computer environment. An FTP is a computer which has the ability to produce correct results even in the presence of any single fault. This technology was selected as it provides a computer-based equivalent to the traditional analog based RSSs. This provides a more conservative design constraint than that imposed by the IEEE Standard, Criteria For Protection Systems For Nuclear Power Generating Stations (ANSI N42.7-1972)

1987-01-01

 
 
 
 
281

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network is used to realize the on-line fault identification and the weighting matrix computation. The fault accommodation subsyste...

Daqi Zhu; Qian Liu; Yongsheng Yang

2008-01-01

282

Modeling and measurement of fault-tolerant multiprocessors  

Science.gov (United States)

The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studying these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.

Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

1985-01-01

283

Formal validation of fault-tolerance mechanisms inside GUARDS  

International Nuclear Information System (INIS)

In this paper we report the experiments carried out during the specification and validation of the fault-tolerance mechanisms developed in the European project Generic Upgradable Architecture for Real-time Dependable Systems (GUARDS). These mechanisms are the components of an architecture developed for embedded safety-critical systems. The validation approach is based on model-checking techniques and exploits the verification methodology supported by the Just Another Concurrency Kit (JACK) environment. The properties that guarantee the desired behaviour of the mechanisms are specified as temporal logic formulae; the JACK model-checker is then used to verify that the behaviour of the mechanisms satisfies such properties also in the presence of faults.

2001-03-01

284

Fault Tolerance In Grid Computing: State of the Art and Open Issues  

Directory of Open Access Journals (Sweden)

Fault tolerance is an important property for large scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly utilized techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The latter largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks can execute in a well defined order, reliability is another major challenge because of the unreliable nature of the grid resources.

Ritu Garg

2011-02-01
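
The dependence of checkpointing overhead on the interval length, noted in the record above, can be made concrete with a first-order model. This is a minimal sketch and not taken from the paper: it assumes the well-known Young approximation, in which the fraction of time wasted is roughly the checkpoint cost divided by the interval plus the interval divided by twice the mean time between failures (MTBF). The function names and the numbers in the usage note are hypothetical.

```python
import math


def expected_waste(interval, checkpoint_cost, mtbf):
    """First-order fraction of time lost to checkpointing and rework.

    checkpoint_cost / interval : time spent writing checkpoints
    interval / (2 * mtbf)      : average work redone after a failure
    All arguments use the same time unit (for example seconds).
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


def young_interval(checkpoint_cost, mtbf):
    """Interval that minimizes expected_waste: sqrt(2 * cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)
```

For example, with an assumed checkpoint cost of 60 s and an assumed MTBF of one day (86400 s), `young_interval(60.0, 86400.0)` gives an interval of a little under 54 minutes. Shorter intervals spend more time checkpointing, and longer ones lose more work per failure.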

285

Fault Tolerant Neuro-Robust Position Control of DC Motors  

Directory of Open Access Journals (Sweden)

DC motors are widely used in industries such as mechanics, robotics, and aerospace engineering. In this paper, we present a high performance control method for position control of DC motors. A fault-tolerant control model is also addressed and combined with the neuro-robust control approach. It is shown that with the proposed control algorithms, external disturbances and coupled dynamics inherent in the system are effectively compensated using a neural network unit in which no analytical estimation of the upper bound of the reconstruction error and uncertainties is needed. Simulations on various flight conditions also confirm the effectiveness of the proposed methods.

Ran Zhang

2011-10-01

286

Exact Regenerating Codes for Byzantine Fault Tolerance in Distributed Storage  

CERN Document Server

Due to the use of commodity software and hardware, crash-stop and Byzantine failures are likely to be more prevalent in today's large-scale distributed storage systems. Regenerating codes have been shown in the literature to be a more efficient way to disperse information across multiple nodes and recover from crash-stop failures. In this paper, we present the design of regenerating codes in conjunction with an integrity check that allows exact regeneration of failed nodes and data reconstruction in the presence of Byzantine failures. A progressive decoding mechanism is incorporated in both procedures to leverage computation performed thus far. The fault-tolerance and security properties of the schemes are also analyzed.

Han, Yunghsiang S; Mow, Wai Ho

2011-01-01

287

Fault Tolerant CII Middleware for Wide Area Monitoring, Control and Protection in Realistic Operational Environments  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault tolerance and dependability are of high importance to the information infrastructure that supports and controls the operation of the Critical Infrastructure such as the electrical power grid and telecommunication systems due to their vital role in the proper function of society and economy. This thesis examines GridStat, an established middleware approach for providing fault tolerance and dependability and performs a constructive evaluation of its architectural design. It, then, builds ...

Kakavas, Ioannis

2012-01-01

288

Wind turbine fault detection and fault tolerant control : An enhanced benchmark challenge  

DEFF Research Database (Denmark)

In this updated edition of a previous wind turbine fault detection and fault tolerant control challenge, we present a more sophisticated wind turbine model and updated fault scenarios to enhance the realism of the challenge and therefore the value of the solutions. This paper describes the challenge model and the requirements for challenge participants. In addition, it motivates many of the faults by citing publications that give field data from wind turbine control tests.

Odgaard, Peter Fogh; Johnson, Kathryn

2013-01-01

289

Wind turbine fault detection and fault tolerant control : a second challenge  

DEFF Research Database (Denmark)

In this updated edition of a previous wind turbine fault detection and fault tolerant control challenge, we present a more sophisticated wind turbine model and updated fault scenarios to enhance the realism of the challenge and therefore the value of the solutions. This paper describes the challenge model and the requirements for challenge participants. In addition, it motivates many of the faults by citing publications that give field data from wind turbine control tests.

Odgaard, Peter Fogh; Johnson, Kathryn

2013-01-01

290

An extended induction motor model for investigation of faulted machines and fault tolerant variable speed drives  

Digital Repository Infrastructure Vision for European Research (DRIVER)

High performance variable speed induction motor drives have been commercially available for industrial applications for many years. More recently they have been proposed for applications such as hybrid automotive drives, and some pump applications on more electric aircraft. These applications will require the drive to operate in the presence of faults i.e. they must be “Fault Tolerant” and be capable of “Fault Ride Through”. The aim of this project was therefore to investigate fault r...

Jasim, Omar

2010-01-01

291

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

Energy Technology Data Exchange (ETDEWEB)

The similarities and differences of a universal network to normal neural networks are outlined. The description and application of a universal network is discussed by showing how a simple linear system is modeled by normal techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described and example results are presented. It is concluded that the universal network provides different feature recognition capabilities than a neural network and that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system.

Mott, J.E. (Advanced Modeling Techniques Corp., Idaho Falls, ID (United States)); King, R.W.; Monson, L.R.; Olson, D.L.; Staffon, J.D. (Argonne National Lab., Idaho Falls, ID (United States))

1992-03-06

292

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

International Nuclear Information System (INIS)

The similarities and differences of a universal network to normal neural networks are outlined. The description and application of a universal network is discussed by showing how a simple linear system is modeled by normal techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described and example results are presented. It is concluded that the universal network provides different feature recognition capabilities than a neural network and that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system

1992-05-27

293

Fault tolerant control based on set-theoretic methods.  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The scope of the thesis is the analysis and design of fault tolerant control (FTC) schemes through the use of set-theoretic methods. In the framework of multisensor schemes, the faults appearance and the modalities to accurately detect them are investigated as well as the design of control laws which assure the closed-loop stability. By using invariant/contractive sets to describe the residual signals, a fault detection and isolation (FDI) mechanism with reduced computational demands is imple...

2011-01-01

294

A Tool for Assessing Fault Tolerance Mechanisms applied to Web Service applications  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Testing Fault Tolerance Mechanisms (FTMs) is crucial for the development of today's Web Service applications. In this work, we propose a methodology for assessing the efficacy of FTMs applied to Web Service applications distributed over the Internet. We present a tool that uses application level fault injection techniques to inject communication faults by using a network Emulator Service. The emulator also generates additional workload on the tested system in order to produce more realistic...

2009-01-01

295

Architectural concepts and redundancy techniques in fault-tolerant computers  

Science.gov (United States)

This paper presents a description of redundancy techniques employed in the design of fault-tolerant computers, and a discussion of the effects of functional requirements, technology constraints, and cost considerations which enter into the choice of these techniques. The STAR computer, developed at the Jet Propulsion Laboratory for long-duration planetary spacecraft missions, is discussed along with several later fault-tolerant computer designs. The class of computers described in this paper employs dynamic redundancy, i.e., the machine is divided into a set of submodules, each with standby spares; a special hard core monitor unit detects and diagnoses faults, and effects automated recovery by replacing failed parts.

Rennels, D. A.

1974-01-01

296

Fault tolerance through reconfiguration in VLSI and WSI arrays  

Energy Technology Data Exchange (ETDEWEB)

This book discusses the research in fault tolerance. The authors focus in particular on reconfiguration techniques and present their results in the reconfiguration of processing arrays. Contents include: Introduction; Typical Processing Arrays; Failure Mechanisms and Fault Models; Basic Problems of Fault-Tolerance Through Array Configuration; Technologies Supporting Reconfiguration; Testing; Reconfiguration: An Introduction; The Diogenes Approach; Reconfiguration for Linear Arrays; Graph-Theoretical Approaches to Reconfiguration; Local Reconfiguration; Global Reconfiguration Techniques: Row/Column Elimination; Global Mapping: Index Mapping Reconfiguration Techniques; Reconfiguration Based on Request-Acknowledge Local Protocols; Reconfiguration of Multiple-Pipeline Structures; Some Extensions Toward Time-Redundancy; Appendix: Reliability Prediction of Arrays.

Negrini, R.; Sami, M.G.; Stefanelli, R. (Politecnico di Milano (IT))

1989-01-01

297

A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites  

Energy Technology Data Exchange (ETDEWEB)

This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device.

Sandness, G.A.; Bennett, D.W. (Pacific Northwest Lab., Richland, WA (United States)); Martinson, L. (Westinghouse Idaho Nuclear Co., Inc., Idaho Falls, ID (United States)); Bingham, D.N.; Anderson, A.A. (EG and G Idaho, Inc., Idaho Falls, ID (United States))

1992-08-01

298

Quantitative fault tolerant control design for a hydraulic actuator with a leaking piston seal  

Science.gov (United States)

Hydraulic actuators are complex fluid power devices whose performance can be degraded in the presence of system faults. In this thesis a linear, fixed-gain, fault tolerant controller is designed that can maintain the positioning performance of an electrohydraulic actuator operating under load with a leaking piston seal and in the presence of parametric uncertainties. Developing a control system tolerant to this class of internal leakage fault is important since a leaking piston seal can be difficult to detect, unless the actuator is disassembled. The designed fault tolerant control law is of low-order, uses only the actuator position as feedback, and can: (i) accommodate nonlinearities in the hydraulic functions, (ii) maintain robustness against typical uncertainties in the hydraulic system parameters, and (iii) keep the positioning performance of the actuator within prescribed tolerances despite an internal leakage fault that can bypass up to 40% of the rated servovalve flow across the actuator piston. Experimental tests verify the functionality of the fault tolerant control under normal and faulty operating conditions. The fault tolerant controller is synthesized based on linear time-invariant equivalent (LTIE) models of the hydraulic actuator using the quantitative feedback theory (QFT) design technique. A numerical approach for identifying LTIE frequency response functions of hydraulic actuators from acceptable input-output responses is developed so that linearizing the hydraulic functions can be avoided. The proposed approach can properly identify the features of the hydraulic actuator frequency response that are important for control system design and requires no prior knowledge about the asymptotic behavior or structure of the LTIE transfer functions. 
A distributed hardware-in-the-loop (HIL) simulation architecture is constructed that enables the performance of the proposed fault tolerant control law to be further substantiated, under realistic operating conditions. Using the HIL framework, the fault tolerant hydraulic actuator is operated as a flight control actuator against the real-time numerical simulation of a high-performance jet aircraft. A robust electrohydraulic loading system is also designed using QFT so that the in-flight aerodynamic load can be experimentally replicated. The results of the HIL experiments show that using the fault tolerant controller to compensate the internal leakage fault at the actuator level can benefit the flight performance of the airplane.

Karpenko, Mark

299

Design of neuro fuzzy fault tolerant control using an adaptive observer  

International Nuclear Information System (INIS)

New methodologies and concepts are developed in control theory to meet the ever-increasing demands in industrial applications. Fault detection and diagnosis of technical processes have become important in the course of progressive automation in the operation of groups of electric drives. When a group of electric drives is under operation, fault tolerant control becomes complicated. For multiple motors in operation, fault detection and diagnosis might prove to be difficult. Estimation of all states and parameters of all drives is necessary to analyze the actuator and sensor faults. To maintain system reliability, detection and isolation of failures should be performed quickly and accurately, and hardware should be properly integrated. A Luenberger full order observer can be used for estimation of all the states in the system for the detection of actuator and sensor failures. Due to the insensitivity of the Luenberger observer to the system parameter variations, state estimation becomes inaccurate under the varying parameter conditions of the drives. Consequently, the estimation performance deteriorates, making ordinary state observers unsuitable for fault detection. Therefore an adaptive observer, which can estimate the system states and parameters and detect the faults simultaneously, is designed in our paper. For a group of DC drives, there may be parameter variations for some of the drives, and for other drives, there may not be parameter variations, depending on load torque, friction, etc. So, estimation of all states and parameters of all drives is carried out using an adaptive observer. If there is any deviation from the estimated values, it is understood that a fault has occurred, and the nature of the fault, whether sensor fault or actuator fault, is determined by a neural fuzzy network, and the fault tolerant control is reconfigured. 
Experimental results with neuro fuzzy system using adaptive observer-based fault tolerant control are good, so as to confirm the best characteristics of the proposed approach

2001-01-01

300

Coordinated Fault-Tolerance for High-Performance Computing Final Project Report  

Energy Technology Data Exchange (ETDEWEB)

With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. 
This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerant linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.

Panda, Dhabaleswar Kumar [The Ohio State University; Beckman, Pete

2011-07-01
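
The publish-subscribe exchange of fault information described in the CIFTS record above can be illustrated with a minimal in-process sketch. This is not the FTB API: the class `FaultBackplane`, its methods, the event name `node_failure`, and the node label are all hypothetical and serve only to show how several software subsystems might react to one shared fault report.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a fault-information backplane.
# All names are illustrative assumptions, not the FTB specification.
Handler = Callable[[str, dict], None]


class FaultBackplane:
    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        """Register a handler for one named fault event."""
        self._subscribers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> int:
        """Deliver an event to every subscriber; return how many were notified."""
        handlers = self._subscribers.get(event_name, [])
        for handler in handlers:
            handler(event_name, payload)
        return len(handlers)


responses: List[str] = []
bus = FaultBackplane()

# A job scheduler and a checkpoint library both react to the same fault report.
bus.subscribe("node_failure",
              lambda name, p: responses.append(f"scheduler: drain {p['node']}"))
bus.subscribe("node_failure",
              lambda name, p: responses.append(f"checkpoint: restart ranks on {p['node']}"))

notified = bus.publish("node_failure", {"node": "n042"})
```

The point of the pattern is that the publisher does not need to know which subsystems are listening, so a new consumer of fault information can be added without changing the component that detects the fault.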

 
 
 
 
301

Run-Through Stabilization: An MPI Proposal for Process Fault Tolerance  

Energy Technology Data Exchange (ETDEWEB)

The MPI standard lacks semantics and interfaces for sustained application execution in the presence of process failures. Exascale HPC systems may require scalable, fault resilient MPI applications. The mission of the MPI Forum's Fault Tolerance Working Group is to enhance the standard to enable the development of scalable, fault tolerant HPC applications. This paper presents an overview of the Run-Through Stabilization proposal. This proposal allows an application to continue execution even if MPI processes fail during execution. The discussion introduces the implications on point-to-point and collective operations over communicators, though the full proposal addresses all aspects of the MPI standard.

Hursey, Joshua J [ORNL; Graham, Richard L [ORNL; Bronevetsky, Greg [Lawrence Livermore National Laboratory (LLNL); Butinas, Darius [Argonne National Laboratory (ANL); Pritchard, Howard [Cray, Inc.; Solt, David G. [Hewlett-Packard

2011-01-01

302

Block QCA Fault-Tolerant Logic Gates  

Science.gov (United States)

Suitably patterned arrays (blocks) of quantum-dot cellular automata (QCA) have been proposed as fault-tolerant universal logic gates. These block QCA gates could be used to realize the potential of QCA for further miniaturization, reduction of power consumption, increase in switching speed, and increased degree of integration of very-large-scale integrated (VLSI) electronic circuits. The limitations of conventional VLSI circuitry, the basic principle of operation of QCA, and the potential advantages of QCA-based VLSI circuitry were described in several NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35; and Hybrid VLSI/QCA Architecture for Computing FFTs (NPO-20923), which follows this article. To recapitulate the principle of operation (greatly oversimplified because of the limitation on space available for this article): A quantum-dot cellular automaton contains four quantum dots positioned at or between the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantum-mechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns. 
Heretofore, researchers have recognized two major obstacles to realization of QCA-based logic gates: One is the need for (and the difficulty of attaining) operation of QCA circuitry at room temperature or, for that matter, at any temperature above a few kelvins. It has been theorized that room-temperature operation could be made possible by constructing QCA as molecular-scale devices. However, in approaching the lower limit of miniaturization at the molecular level, it becomes increasingly imperative to overcome the second major obstacle, which is the need for (and the difficulty of attaining) high precision in the alignments of adjacent QCA in order to ensure the correct interactions among the quantum dots.

Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon

2003-01-01

303

Formal verification of fault-tolerance using theorem-proving techniques  

Energy Technology Data Exchange (ETDEWEB)

With the increasing interest in applying artificial intelligence techniques to problems in design automation, attention has been directed toward developing additional approaches to verify properties of digital systems. Properties of interest would include functionality, timing behavior, and fault-tolerance capabilities. This paper describes a formal verification system based on the use of automated reasoning techniques to validate fault tolerance. A Petri net representation will be described together with the theorem-proving implementation of a rule-based system for manipulating system descriptions. Digital systems extracted from the literature are used to illustrate the representation and the capabilities of the formal verification system under development. 69 refs., 13 figs., 1 tab.

Kljaich, J. Jr.; Smith, B.T.; Wojcik, A.S.

1989-01-01

304

MCNP load balancing and fault tolerance with PVM  

International Nuclear Information System (INIS)

Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceeded 80% on dedicated workstation clusters; however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance, which are shown to dramatically increase multiuser system efficiency and reliability.

1995-11-01

305

MCNP load balancing and fault tolerance with PVM  

Energy Technology Data Exchange (ETDEWEB)

Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceeded 80% on dedicated workstation clusters; however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance, which are shown to dramatically increase multiuser system efficiency and reliability.

McKinney, G.W.

1995-07-01
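
The MCNP records above note that assigning equal work to every processor makes the run as slow as the slowest machine. A minimal sketch of why load balancing helps is given below. It is not MCNP or PVM code: the relative speeds, the number of histories, and the function names are assumptions made for illustration, and the "balanced" split simply assigns work in proportion to speed.

```python
from typing import List


def finish_time(work: List[float], speeds: List[float]) -> float:
    """Wall-clock time is set by the last processor to finish its share."""
    return max(w / s for w, s in zip(work, speeds))


def equal_split(total: float, speeds: List[float]) -> List[float]:
    """Give every processor the same amount of work."""
    return [total / len(speeds)] * len(speeds)


def proportional_split(total: float, speeds: List[float]) -> List[float]:
    """Give each processor work in proportion to its relative speed."""
    return [total * s / sum(speeds) for s in speeds]


speeds = [4.0, 2.0, 1.0, 1.0]   # hypothetical relative processor speeds
total_histories = 8000.0        # hypothetical number of particle histories

t_equal = finish_time(equal_split(total_histories, speeds), speeds)
t_balanced = finish_time(proportional_split(total_histories, speeds), speeds)
```

With these assumed numbers the equal split takes twice as long as the proportional split, because the two slow machines hold up the whole job.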

306

Active and Passive Fault-Tolerant LPV Control of Wind Turbines  

DEFF Research Database (Denmark)

This paper addresses the design and comparison of active and passive fault-tolerant linear parameter-varying (LPV) controllers for wind turbines. The considered wind turbine plant model is characterized by parameter variations along the nominal operating trajectory and includes a model of an incipient fault in the pitch system. We propose the design of an active fault-tolerant controller (AFTC) based on an existing LPV controller design method and extend this method to apply for the design of a passive fault-tolerant controller (PFTC). Both controllers are based on output feedback and are scheduled on the varying parameter to manage the parameter-varying nature of the model. The PFTC only relies on measured system variables and an estimated wind speed, while the AFTC also relies on information from a fault diagnosis system. Consequently, the optimization problem involved in designing the PFTC is more difficult to solve, as it involves solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs). Simulation results show the performance of the active fault-tolerant control system to be slightly superior to that of the passive fault-tolerant control system.

Sloth, Christoffer; Esbensen, Thomas

2010-01-01

307

FPGA-Based, Self-Checking, Fault-Tolerant Computers  

Science.gov (United States)

A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors. 
It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.

Some, Raphael; Rennels, David

2004-01-01
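
The single-error-correcting, double-error-detecting (SEC-DED) memory function mentioned in the record above can be illustrated in software. The sketch below uses an extended Hamming (8,4) code, which is one textbook SEC-DED code; the record does not say which code or word width the proposed hardware would use, so the code choice and the function names `encode` and `decode` are assumptions.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming(7,4) positions, with parity bits at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = data
    word = [0] * 8
    word[3], word[5], word[6], word[7] = d1, d2, d3, d4
    word[1] = d1 ^ d2 ^ d4       # covers positions 1, 3, 5, 7
    word[2] = d1 ^ d3 ^ d4       # covers positions 2, 3, 6, 7
    word[4] = d2 ^ d3 ^ d4       # covers positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2  # makes the parity of the whole word even
    return word


def decode(word: List[int]) -> Tuple[List[int], str]:
    """Return (data bits, status), status in {'ok', 'corrected', 'double_error'}."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos      # XOR of the positions of all set bits
    overall_parity = sum(word) % 2
    if syndrome == 0 and overall_parity == 0:
        status = "ok"
    elif overall_parity == 1:
        # Odd overall parity means one flipped bit; the syndrome is its
        # position (0 means the overall parity bit itself was hit).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped.
        status = "double_error"
    return [word[3], word[5], word[6], word[7]], status


stored = encode([1, 0, 1, 1])
upset = list(stored)
upset[6] ^= 1                    # simulate a single-event upset in one bit
recovered, status = decode(upset)
```

Any single flipped bit is repaired, while two flipped bits are reported but not repaired, which is the behavior the SEC-DED name describes.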

308

Buffered coscheduling for parallel programming and enhanced fault tolerance  

Science.gov (United States)

A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.

Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

2006-01-31
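
The strobe-interval exchange described in the buffered coscheduling record above can be sketched as a small simulation. This is only an illustration under stated assumptions: processors are modeled as list indices, each buffered communication is reduced to its destination rank, and the function name `strobe_exchange` is hypothetical.

```python
from typing import Dict, List


def strobe_exchange(outgoing: List[List[int]]) -> List[Dict[int, int]]:
    """Simulate one global exchange of buffered control information.

    outgoing[p] lists the destination ranks that processor p buffered
    during the preceding time interval. The result incoming[q] maps each
    source rank to the number of jobs that q should expect from it in
    the next interval.
    """
    incoming: List[Dict[int, int]] = [dict() for _ in outgoing]
    for src, destinations in enumerate(outgoing):
        for dst in destinations:
            incoming[dst][src] = incoming[dst].get(src, 0) + 1
    return incoming


# Hypothetical interval: rank 0 buffered three sends, rank 2 buffered none.
buffered = [[1, 1, 2], [0], [], [0, 2]]
expected = strobe_exchange(buffered)
```

After the exchange every processor knows how many incoming jobs to expect from each peer, which is the information the abstract says is shared during the strobe interval.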

309

Simulating chemistry efficiently on fault-tolerant quantum computers  

CERN Document Server

Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. Here we consider methods to make proposed chemical simulation algorithms computationally fast on fault-tolerant quantum computers in the circuit model. Fault tolerance constrains the choice of available gates, so that arbitrary gates required for a simulation algorithm must be constructed from sequences of fundamental operations. We examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm [C.M. Dawson and M.A. Nielsen, Quantum Inf. Comput. 6:81, 2006]. For a given approximation error $\epsilon$, arbitrary single-qubit gates can be produced fault-tolerantly and using a limited set of gates in time which is $O(\log \epsilon)$ or $O(\log \log \epsilon)$; with sufficient parallel preparation of ancillas, constant average depth is possible using a method w...

Jones, N Cody; McMahon, Peter L; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

2012-01-01

310

A benchmark for fault tolerant flight control evaluation  

Science.gov (United States)

A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. The application of reconstruction and modeling techniques, based on accident flight data, has resulted in high-fidelity nonlinear aircraft and fault models to evaluate new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance to accommodate in-flight failures.

Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

2013-12-01

311

A Fault Tolerance Management Framework for Wireless Sensor Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Wireless Sensor Networks (WSNs) have the potential of significantly enhancing our ability to monitor and interact with our physical environment. Realizing a fault tolerant operation is critical to the success of WSNs. The main challenge is providing fault tolerance (FT) while conserving the limited resources ...

Iman Salehy; Mohamed Eltoweissy; Adnan Agbariax; Hesham El-Sayed

2007-01-01

312

A Novel Nanometric Fault Tolerant Reversible Subtractor Circuit  

Directory of Open Access Journals (Sweden)

Reversibility plays an important role when energy-efficient computations are considered. Reversible logic circuits have received significant attention in quantum computing, low-power CMOS design, optical information processing and nanotechnology in recent years. This study proposes a new fault-tolerant reversible half-subtractor and a new fault-tolerant reversible full-subtractor circuit with nanometric scales. The paper also demonstrates how the well-known and important PERES and TR gates can be synthesized from parity-preserving reversible gates. All the designs have nanometric scales.

Mozhgan Shiri

2012-11-01
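
The two properties named in the record above, reversibility and parity preservation, can be checked exhaustively for small gates. The sketch below is not taken from the paper: it uses the standard definition of the Peres gate, (A, B, C) to (A, A xor B, AB xor C), and picks the Fredkin (controlled-swap) gate as an example of a parity-preserving gate, which is an assumption about what such a building block might look like rather than the authors' choice.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, int, int]
Gate = Callable[[int, int, int], Bits]


def peres(a: int, b: int, c: int) -> Bits:
    """Peres gate: outputs (A, A xor B, (A and B) xor C)."""
    return (a, a ^ b, (a & b) ^ c)


def fredkin(a: int, b: int, c: int) -> Bits:
    """Fredkin gate (controlled swap): B and C are exchanged when A is 1."""
    return (a, c, b) if a else (a, b, c)


def is_reversible(gate: Gate) -> bool:
    """A 3-bit gate is reversible if its 8 inputs map to 8 distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=3)}
    return len(outputs) == 8


def is_parity_preserving(gate: Gate) -> bool:
    """Parity is preserved if the XOR of the inputs equals the XOR of the outputs."""
    return all(sum(bits) % 2 == sum(gate(*bits)) % 2
               for bits in product((0, 1), repeat=3))


summary = {
    "peres": (is_reversible(peres), is_parity_preserving(peres)),
    "fredkin": (is_reversible(fredkin), is_parity_preserving(fredkin)),
}
```

The check shows that the Peres gate is reversible but does not preserve parity on its own, which is consistent with the abstract's interest in synthesizing it from parity-preserving gates; a parity mismatch between inputs and outputs of such a circuit then signals a fault.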

313

Fault tolerant homopolar magnetic bearings with flux invariant control  

International Nuclear Information System (INIS)

The theory for a novel fault-tolerant 4-active-pole homopolar magnetic bearing is developed. If any one of the four coils in the bearing actuator fails, the remaining three coil currents change via an optimal distribution matrix such that the same opposing-pole, C-core type control fluxes as those of the unfailed bearing are produced. The homopolar magnetic bearing thus provides unaltered magnetic forces without any loss of the bearing load capacity even if any one coil suddenly fails. Numerical examples are provided to illustrate the novel fault-tolerant, 4-active-pole homopolar magnetic bearings.

2006-05-01

314

Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting  

Science.gov (United States)

The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short-term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in formal verification or proof methods was recommended. Formal specification and software reliability modeling were recommended as topics for both short- and long-term research.

Dunham, J. R. (editor); Knight, J. C. (editor)

1982-01-01

315

Improvement of Matrix Converter Drive Reliability by Online Fault Detection and a Fault-Tolerant Switching Strategy.  

DEFF Research Database (Denmark)

The matrix converter system is becoming a very promising candidate to replace the conventional two-stage ac/dc/ac converter, but system reliability remains an open issue. The most common reliability problem is that a bidirectional switch has an open-switch fault during operation. In this paper, a matrix converter driving a speed-controlled permanent-magnet synchronous motor is examined under a single open-switch fault. First, a new fault-detection method is proposed using only the motor currents. Second, a novel fault-tolerant switching strategy is presented. By treating the matrix converter as a two-stage rectifier/inverter, existing modulation techniques for the inverter stage can be reused, whereas the rectifier stage is modified by control to counteract the fault. However, the proposed techniques require no additional hardware devices or circuit modifications to the matrix converter. Experimental results show that the proposed method can maintain the motor speed with a maximum ripple of 2%, a fivefold improvement over the uncompensated system. The proposed method therefore offers a very economical and effective solution for the matrix converter fault tolerance problem.

Nguyen-Duy, Khiem

2011-01-01

316

Fault tolerant coverage and connectivity in presence of channel randomness.  

Science.gov (United States)

Some applications of wireless sensor networks require K-coverage and K-connectivity to ensure that the system is fault tolerant and to make it more reliable. This makes coverage and connectivity an important issue in wireless sensor networks. In this paper, we propose K-coverage and K-connectivity models for wireless sensor networks. In both models, nodes are distributed in the sensor field according to a Poisson distribution. To make the proposed models more realistic, we use the log-normal shadowing path loss model to capture radio irregularities and study its impact on K-coverage and K-connectivity. The value of K can differ for different types of applications. Further, we analyze the problem of node failure for the K-coverage model. In the simulation section, the results clearly show that the coverage and connectivity of a wireless sensor network depend on the node density and on shadowing parameters such as the path loss exponent and the standard deviation. PMID:24574922

Sagar, Anil Kumar; Lobiyal, D K

2014-01-01
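
The dependence of K-coverage on node density noted in the record above can be illustrated with a simplified calculation. The sketch assumes an idealized disk sensing model with a fixed radius, which ignores the log-normal shadowing the paper studies; under that assumption the number of sensors covering a point in a Poisson field of density `density` is Poisson distributed with mean `density * pi * radius**2`. The function name and the sample values are hypothetical.

```python
import math


def k_coverage_probability(density: float, radius: float, k: int) -> float:
    """Probability that a point is covered by at least k sensors.

    Assumes sensors form a homogeneous Poisson field and that a sensor
    covers every point within `radius` of it (idealized disk model).
    """
    mean = density * math.pi * radius ** 2
    below_k = sum(math.exp(-mean) * mean ** i / math.factorial(i)
                  for i in range(k))
    return 1.0 - below_k


# Hypothetical field: 0.01 sensors per square metre, 10 m sensing radius.
p1 = k_coverage_probability(0.01, 10.0, 1)
p2 = k_coverage_probability(0.01, 10.0, 2)
p3 = k_coverage_probability(0.01, 10.0, 3)
```

Raising K lowers the coverage probability at a fixed density, so a higher fault tolerance requirement has to be paid for with more nodes or a larger sensing range.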

317

Strategic Planning for Fault-Tolerant Internet Connectivity Using Basic Fault-Tolerant Architectural Design as Platform  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The focus of this study is to provide Internet connectivity without interruption even in the presence of faults or failures, thereby enhancing the performance of Internet services. To achieve this, the deployment and redeployment of faulty component(s) are done using Basic Fault-Tolerant (BFT) architectural design. A framework to provide enhanced performance in terms of confidentiality, integrity and availability in clusters is suggested using BFT, considering all sources of vulnerabilities incl...

Adeosun, O. O.; Adagunodo, E. R.; Adetunde, I. A.; Adeosun, T. H.

2008-01-01

318

Analysis of GPS Abnormal Conditions within Fault Tolerant Control Laws  

Science.gov (United States)

The Global Positioning System (GPS) is a critical element for the functionality of autonomous flying vehicles. GPS operation under normal and abnormal conditions directly impacts the trajectory tracking performance of autonomous unmanned aerial vehicle (UAV) controllers. The effects of GPS parameter variation must be well understood, and user-friendly computational tools must be developed to facilitate the design and evaluation of fault tolerant control laws. This thesis presents the development of a simplified GPS error model in Matlab/Simulink and its use in a sensitivity analysis of the effect of GPS parameters, under normal and abnormal system operation, on different UAV trajectory tracking controllers. The model statistically generates position and velocity errors, simulates the effect of GPS satellite configuration on the position and velocity measurement accuracy, and implements a set of failures to the GPS readings. The model and its graphical user interface were integrated within the WVU UAV simulation environment as a masked Simulink block. The effects on the controllers' trajectory tracking performance of the following GPS parameters were investigated within and outside normal operation ranges: time delay, update rate, error standard deviation, bias, and major position and velocity failures. Several sets of control laws with fixed and adaptive parameters and of different levels of complexity have been used in this investigation. A complex performance index formulated in terms of tracking errors and control activity was used for control law performance evaluation. The composition of various metrics within the performance index was performed using fixed and variable weights depending on the local characteristics of the commanded trajectory. This study has revealed that GPS error parameters have a significant impact on control law performance. 
The proposed GPS model has proved to be a valuable, flexible tool for testing and evaluation of the fault tolerant capabilities of autonomous flight control laws.

Al-Sinbol, Gahssan

319

A New and Efficient Algorithm-Based Fault Tolerance Scheme for A Million Way Parallelism  

CERN Multimedia

The fault tolerance overhead of high performance computing (HPC) applications is becoming critical to the efficient utilization of HPC systems at large scale. HPC applications typically tolerate fail-stop failures by checkpointing. Another promising method works at the algorithm level and is called algorithmic recovery. These two methods can achieve high efficiency when the system scale is not very large, but both will lose their effectiveness when systems approach the scale of exaflops, where the number of processors in the system is expected to reach one million. This paper develops a new and efficient algorithm-based fault tolerance scheme for HPC applications. When a failure occurs during execution, we do not stop to wait for the recovery of corrupted data, but replace them with the corresponding redundant data and continue the execution. A background accelerated recovery method is also proposed to rebuild redundancy so that multiple failures can be tolerated during the execution. To demonstrate the feasibility ...

Yao, Erlin; Wang, Rui; Zhang, Wenli; Tan, Guangming

2011-01-01
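
The idea of keeping redundant data so that a lost block can be rebuilt, which underlies the algorithm-based fault tolerance record above, can be shown with the classical checksum technique. This sketch is not the scheme proposed in the paper: it only demonstrates how one checksum block allows the data of a single failed process to be reconstructed, and the function names and sample data are assumptions.

```python
from typing import List


def make_checksum(blocks: List[List[int]]) -> List[int]:
    """Element-wise sum of all data blocks, stored on a spare process."""
    return [sum(column) for column in zip(*blocks)]


def recover_block(blocks: List[List[int]], checksum: List[int], lost: int) -> List[int]:
    """Rebuild the block at index `lost` from the checksum and the survivors."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return [c - sum(b[j] for b in survivors) for j, c in enumerate(checksum)]


# Hypothetical data held by four processes, three values each.
blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
checksum = make_checksum(blocks)

# Process 2 fails; its data is treated as unavailable during recovery.
rebuilt = recover_block(blocks, checksum, lost=2)
```

Integer data keeps the example exact; with floating-point data the rebuilt block would match only up to rounding error. After a recovery the checksum covers no further failures until redundancy is rebuilt, which is the gap the background recovery method in the abstract is meant to close.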

320

Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements  

DEFF Research Database (Denmark)

This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modified particle filter. This particle filter is able to run in an asynchronous manner to accommodate the measurement drop out problem, and it overcomes the measurement outliers by switching observation models. Simulations with experimental data show that this fault tolerant navigation system can accurately estimate the ROV kinematic states, even when sensor failures appear frequently.

Zhao, Bo; Blanke, Mogens

2012-01-01

 
 
 
 
321

Tolerance towards sensor faults: An application to a flexible arm manipulator  

Directory of Open Access Journals (Sweden)

As more engineering operations become automatic, the need for robustness towards faults increases. Hence, a fault tolerant control (FTC) scheme is a valuable asset. This paper presents a robust sensor fault FTC scheme implemented on a flexible arm manipulator, which has many applications in automation. Sensor faults affect the system's performance in the closed loop when the faulty sensor readings are used to generate the control input. In this paper, the non-faulty sensors are used to reconstruct the faults on the potentially faulty sensors. The reconstruction is subtracted from the faulty sensors to form a compensated 'virtual sensor', and this signal (instead of the normally used faulty sensor output) is then used to generate the control input. A design method is also presented in which the FTC scheme is made insensitive to any system uncertainties. Two fault conditions are tested: total failure and incipient faults. Then the scheme robustness is tested by implementing the flexible joint's FTC scheme on a flexible link, which has different parameters. Excellent results have been obtained for both cases (joint and link); with the FTC scheme, the system performance is almost identical to the fault-free scenario, whilst an indication that a fault is present is provided, even for simultaneous faults.

Chee Pin Tan

2008-11-01

322

Diogenes approach to testable fault-tolerant arrays of processors  

Energy Technology Data Exchange (ETDEWEB)

A strategy for designing testable fault-tolerant arrays of processors is described by a series of examples. The strategy achieves fault tolerance by introducing redundancy in an array's communication links rather than in its processing elements (PEs). The major characteristics of the designs produced are as follows. (1) Testability: the designs always afford isolation and scan-in scan-out capabilities for each PE. (2) Simplicity of configuration: the process of programming an array to its fault-free format consists only of setting a few variables (= control lines) per PE. (3) Dynamic fault tolerance: the settings of variables can be altered at any time. (4) Transparency to PE designer: transforming the design of an array of PEs to a Diogenes design of the array involves changing only the communication links of the array, leaving the PEs and their interfaces unchanged. (5) Area-efficiency: the designs produced by the strategy are often (asymptotically) optimal in area. (6) Regularity and modularity: fault-laden chips can easily be interconnected to build an array of the desired size. (7) Speed: Diogenes layouts need never have signal wires travel more than the width of a single PE without being enhanced; thus PE failures cannot cause arbitrarily long unenhanced runs of wire. 31 references.

Rosenberg, A.L.

1983-10-01

323

Fault-tolerant quantum computing with color codes  

CERN Document Server

We present and analyze protocols for fault-tolerant quantum computing using color codes. We present circuit-level schemes for extracting the error syndrome of these codes fault-tolerantly. We further present an integer-program-based decoding algorithm for identifying the most likely error given the syndrome. We simulated our syndrome extraction and decoding algorithms against three physically-motivated noise models using Monte Carlo methods, and used the simulations to estimate the corresponding accuracy thresholds for fault-tolerant quantum error correction. We also used a self-avoiding walk analysis to lower-bound the accuracy threshold for two of these noise models. We present and analyze two architectures for fault-tolerantly computing with these codes: one in which 2D arrays of qubits are stacked atop each other and one in a single 2D substrate. Our analysis demonstrates that color codes perform slightly better than Kitaev's surface codes when circuit details are ignored. When these details are considered, w...

Landahl, Andrew J; Rice, Patrick R

2011-01-01

324

Fault Tolerant Congestion based Algorithms in OBS Network  

Directory of Open Access Journals (Sweden)

In Optical Burst Switched (OBS) networks, each light path carries a huge amount of traffic, so path failures may damage user applications. Hence fault tolerance becomes an important issue in these networks. Blocking probability is a key index of quality of service in OBS networks. The Erlang formula has been used extensively in the traffic engineering of optical communication to calculate the blocking probability. The paper revisits burst contention resolution problems in OBS networks. When the network is overloaded, no contention resolution scheme can effectively avoid collisions, and blocking results. It is important first to decide on a good routing algorithm and then to choose a wavelength assignment scheme. In this paper we develop two algorithms, the Fault Tolerant Optimized Blocking Algorithm (FTOBA) and the Fault Tolerant Least Congestion Algorithm (FTLCA), and compare their performance on the basis of blocking probability. These algorithms are based on path congestion in the OBS network, and the simulation results show that the reliable and fault-tolerant routing algorithms reduce the blocking probability.

Hardeep Singh, Dr. Jai Prakash, Dinesh Arora & Dr. Amit Wason

2011-12-01
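
The Erlang formula mentioned in the record above, in its Erlang B form, gives the blocking probability of a loss system with a fixed number of channels. A minimal sketch using the standard numerically stable recursion follows; treating wavelengths on a link as the channels and the sample load values are assumptions for illustration, and the code does not reproduce the FTOBA or FTLCA algorithms.

```python
def erlang_b(offered_load: float, channels: int) -> float:
    """Erlang B blocking probability for `offered_load` erlangs on `channels` servers.

    Uses the recursion B(0) = 1, B(n) = A * B(n-1) / (n + A * B(n-1)),
    which avoids the large factorials of the closed-form expression.
    """
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


# Hypothetical link: 2 erlangs of burst traffic offered to 2, 4 and 8 wavelengths.
b2 = erlang_b(2.0, 2)
b4 = erlang_b(2.0, 4)
b8 = erlang_b(2.0, 8)
```

For a fixed offered load, adding wavelengths lowers the blocking probability, which is why routing traffic away from congested paths can reduce blocking.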

325

Fault Tolerant Message Efficient Coordinator Election Algorithm in High Traffic Bidirectional Ring Network  

Directory of Open Access Journals (Sweden)

The use of distributed systems such as the Internet and cloud computing is growing dramatically. A coordinator is crucial in these systems because processes must be coordinated and kept consistent. However, this growth makes election algorithms more complicated. Many algorithms have been proposed in this area, of which the two best known are Bully and Ring. In this paper we propose a fault-tolerant coordinator election algorithm for a typical bidirectional ring topology that is twice as fast as the Ring algorithm while passing far fewer messages during the election. A fault tolerance technique is applied that reduces the waiting time for the election to zero.

Danial Rahdari

2012-12-01
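The abstract above does not describe the proposed bidirectional algorithm, so the sketch below covers only the baseline it is compared against: a textbook unidirectional Ring election that skips crashed processes. The function name `ring_election`, the message-counting convention (one message per hop), and the example ring are assumptions made for illustration.

```python
def ring_election(ids, alive, initiator):
    """Simulate a textbook ring election and count the messages sent.

    ids: process identifiers in ring order.
    alive: list of booleans, False for crashed processes (they are skipped).
    initiator: index of the live process that starts the election.
    Returns (leader_id, message_count).
    """
    if not alive[initiator]:
        raise ValueError("the initiator must be alive")
    n = len(ids)

    def next_alive(i):
        # Walk clockwise until a live process is found.
        j = (i + 1) % n
        while not alive[j]:
            j = (j + 1) % n
        return j

    # Election round: the message collects the id of every live process.
    collected = [ids[initiator]]
    messages = 1
    pos = next_alive(initiator)
    while pos != initiator:
        collected.append(ids[pos])
        pos = next_alive(pos)
        messages += 1
    leader = max(collected)

    # Coordinator round: the result circulates once around the ring.
    messages += 1
    pos = next_alive(initiator)
    while pos != initiator:
        pos = next_alive(pos)
        messages += 1
    return leader, messages


if __name__ == "__main__":
    # Hypothetical ring of five processes; the one with id 9 has crashed.
    print(ring_election([3, 7, 5, 9, 1], [True, True, True, False, True], 0))
```

In this baseline the message count is twice the number of live processes, which gives a reference point for the abstract's claim of passing far fewer messages.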

326

Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. The fault detection filter uses the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of fa...

2012-01-01

327

Design and Analysis of Linear Fault-Tolerant Permanent-Magnet Vernier Machines  

Science.gov (United States)

This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is manufactured only from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor flux barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis.

Xu, Liang; Liu, Guohai; Du, Yi; Liu, Hu

2014-01-01

328

Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder  

CERN Multimedia

Reversible logic is emerging as an important research area having its application in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used as the fundamental building block for designing other arithmetic logic circuits. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.

Islam, Md Saiful; 10.3329/jbas.v32i2.2431

2010-01-01
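The abstract above does not give the truth table of the IG gate, so the sketch below uses the well-known 3*3 Fredkin gate as a stand-in to show what "parity preserving" and "reversible" mean and why a single-signal fault becomes detectable. The helper names are assumptions. The Toffoli gate is included only as a contrast: it is reversible but not parity preserving.

```python
from itertools import product


def fredkin(a, b, c):
    """Controlled swap: b and c are exchanged when the control a is 1."""
    return (a, c, b) if a else (a, b, c)


def toffoli(a, b, c):
    """Controlled-controlled-NOT: c is flipped when a and b are both 1."""
    return (a, b, c ^ (a & b))


def is_reversible(gate, width):
    """A gate is reversible if distinct inputs always give distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


def is_parity_preserving(gate, width):
    """Input parity must equal output parity for every input pattern."""
    return all(
        sum(bits) % 2 == sum(gate(*bits)) % 2
        for bits in product((0, 1), repeat=width)
    )


def single_fault_detected(gate, bits, faulty_output):
    """Flip one output signal and report whether the parity check notices."""
    out = list(gate(*bits))
    out[faulty_output] ^= 1
    return sum(out) % 2 != sum(bits) % 2


if __name__ == "__main__":
    print(is_reversible(fredkin, 3), is_parity_preserving(fredkin, 3))
    print(is_reversible(toffoli, 3), is_parity_preserving(toffoli, 3))
    print(single_fault_detected(fredkin, (1, 0, 1), 2))
```

Because a parity-preserving gate keeps input and output parity equal, flipping any one output bit breaks that equality, which is the detection property the abstract describes.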

329

A Framework-Based Approach for Fault-Tolerant Service Robots  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Recently the component-based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with th...

2012-01-01

330

Combining dynamical decoupling with fault-tolerant quantum computation  

Energy Technology Data Exchange (ETDEWEB)

We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

Ng, Hui Khoon; Preskill, John [Institute for Quantum Information, California Institute of Technology, Pasadena, California 91125 (United States); Lidar, Daniel A. [Departments of Electrical Engineering, Chemistry, and Physics, and Center for Quantum Information Science and Technology, University of Southern California, Los Angeles, California 90089 (United States)

2011-07-15

331

Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation  

CERN Multimedia

Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse syn...

Dolev, Danny; Lenzen, Christoph; Schmid, Ulrich

2011-01-01

332

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Directory of Open Access Journals (Sweden)

Full Text Available This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network is used to realize the on-line fault identification and the weighting matrix computation. The fault accommodation subsystem uses a control algorithm based on the weighted pseudo-inverse to find the solution of the control allocation problem. To illustrate the effectiveness of the proposed method, a simulation example under multiple uncertain abrupt faults is given in the paper.

Yongsheng Yang

2008-11-01
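The weighted pseudo-inverse allocation mentioned in the abstract above can be sketched as follows, assuming a diagonal weighting matrix W in which a larger weight penalizes a (suspected faulty) thruster more heavily. The solution used is the standard u = W^-1 B^T (B W^-1 B^T)^-1 tau. The thruster layout B, the weights and the function names are hypothetical; in the paper the weights come from the ICMAC network, which is not modeled here.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def allocate_thrust(B, weights, tau):
    """Weighted pseudo-inverse control allocation.

    B: m x n thruster configuration matrix (m controlled axes, n thrusters).
    weights: n positive diagonal entries of W (assumed: larger = use less).
    tau: desired generalized forces, length m.
    Returns the thruster commands u minimizing u^T W u subject to B u = tau.
    """
    m, n = len(B), len(B[0])
    # M = B W^-1 B^T is an m x m matrix.
    M = [
        [sum(B[i][k] * B[j][k] / weights[k] for k in range(n)) for j in range(m)]
        for i in range(m)
    ]
    y = solve_linear(M, tau)
    # u = W^-1 B^T y
    return [sum(B[i][k] * y[i] for i in range(m)) / weights[k] for k in range(n)]


if __name__ == "__main__":
    # Hypothetical layout: four thrusters producing surge force and yaw moment.
    B = [[1.0, 1.0, 1.0, 1.0], [0.5, -0.5, 0.5, -0.5]]
    print(allocate_thrust(B, [1.0, 1.0, 1.0, 1.0], [4.0, 0.0]))
    # Thruster 0 flagged as faulty: a large weight steers effort away from it.
    print(allocate_thrust(B, [1e6, 1.0, 1.0, 1.0], [4.0, 0.0]))
```

Raising the weight of a thruster redistributes the demand to the remaining ones while the commanded forces are still met, as long as the healthy thrusters span the controlled axes.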

333

Refinement for fault-tolerance: An aircraft hand-off protocol  

Science.gov (United States)

Part of the Advanced Automation System (AAS) for air-traffic control is a protocol to permit flight hand-off from one air-traffic controller to another. The protocol must be fault-tolerant and, therefore, is subtle -- an ideal candidate for the application of formal methods. This paper describes a formal method for deriving fault-tolerant protocols that is based on refinement and proof outlines. The AAS hand-off protocol was actually derived using this method; that derivation is given.

Marzullo, Keith; Schneider, Fred B.; Dehn, Jon

1994-01-01

334

CPN based fault-tolerance performance evaluation of fieldbus for KNGR NPCS network  

International Nuclear Information System (INIS)

In contrast with conventional Fieldbus research, which focuses on real-time performance and ignores fault-tolerant mechanisms, the aim of this work is real-time performance evaluation of the system including faults. Because the communication network will be applied to the Next Generation NPP, maintaining performance in the presence of recoverable faults is important. To guarantee this in the NPP control network, we should investigate the time characteristics of the target system in case of a recoverable fault. If the time characteristics meet the requirements of the system, the faults will be recovered by the Fieldbus recovery mechanisms and the system will be safe. But if the time characteristics cannot meet the requirements, the faults in the Fieldbus can propagate to system failure. For this purpose, we classified the recoverable faults, derived a formula that represents delays including recovery mechanisms, and built a simulation model. We applied the simulation model to the KNGR NPCS with some assumptions. The outcome of the simulation is realistic delays for the classified fault cases. From the outcome of the simulation and the system requirements, we can calculate the failure propagation probability from the Fieldbus to the outer system.

1998-10-01

335

Fault Tolerant Wind Farm Control : a Benchmark Model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

336

Fault Tolerant Control of Wind Turbines : A benchmark model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

337

Review of fault diagnosis and fault-tolerant control for modular multilevel converter of HVDC  

DEFF Research Database (Denmark)

This review focuses on faults in the Modular Multilevel Converter (MMC) for use in high voltage direct current (HVDC) systems by analyzing the vulnerable spots and failure mechanisms from device to system and illustrating the control and protection methods under failure conditions. At the beginning, several typical topologies of MMC-HVDC systems are presented. Then fault types such as capacitor voltage unbalance and unbalance between upper and lower arm voltages are analyzed, and the corresponding fault detection and diagnosis approaches are explained. In addition, more attention is dedicated to control strategies under MMC faults or grid faults. This paper ends with a discussion of other opportunities for future development.

Liu, Hui; Loh, Poh Chiang

2013-01-01

338

Making classical ground-state spin computing fault-tolerant  

Science.gov (United States)

We examine a model of classical deterministic computing in which the ground state of the classical system is a spatial history of the computation. This model is relevant to quantum dot cellular automata as well as to recent universal adiabatic quantum computing constructions. In its most primitive form, systems constructed in this model cannot compute in an error-free manner when working at nonzero temperature. However, by exploiting a mapping between the partition function for this model and probabilistic classical circuits we are able to show that it is possible to make this model effectively error-free. We achieve this by using techniques in fault-tolerant classical computing and the result is that the system can compute effectively error-free if the temperature is below a critical temperature. We further link this model to computational complexity and show that a certain problem concerning finite temperature classical spin systems is complete for the complexity class Merlin-Arthur. This provides an interesting connection between the physical behavior of certain many-body spin systems and computational complexity.

Crosson, I. J.; Bacon, D.; Brown, K. R.

2010-09-01

339

Fault tolerant, reliable and scalable scientific ballooning control software  

Science.gov (United States)

The Universal Balloon Control Software package (UBCS) was first designed and developed for the ATIC experiment in 1997 and has evolved over the years into a highly reliable and adaptable control system. The system has logged thousands of hours of operation time on ATIC with few reboots and has been adapted for the HASP balloon payload, which had two successful flights in 2006 and 2007. The goal was to develop a UBCS that was fault tolerant and auto-recoverable while at the same time extremely reliable and scalable. In order to meet these goals, we designed a modular software system where each process was able to run in parallel with other processes on the same or different CPUs. These modular processes needed to be relatively independent, so that one process didn't rely on another in order to function. We chose QNX 4.25 as the operating system because of its multi-tasking abilities and the level of abstraction offered in communication between processes. Another key component in the UBCS, called the Buffer Process Group (BPG), was developed to de-couple processes from one another, allowing each to operate independently. The BPG is a client/server process data port with a standardized interface allowing any given server to load records for access by an independent client at any given time. The BPG is capable of handling many data servers and clients simultaneously. Examples of data servers are the data acquisition process and housekeeping processes, and examples of data clients are the archive process, the down link telemetry processes and the ground display processes. Together, the BPG process and the QNX 4.25 OS allow the UBCS to meet all of its design goals. In particular they allow the system to be highly fault tolerant and recoverable. A monitoring process is able to restart failed processes and reboot the computers on which they reside, if necessary.
This allows the UBCS to recover from software errors or bugs as well as hardware glitches such as temporary power problems or single event upsets. During the presentation we will discuss in more detail how this software design is applicable to many different platforms and our plans for evolving the software package for future balloon experiments.

Stewart, Michael F.; Ellison, Steven B.; Isbert, Joachim; Granger, Doug; Guzik, T. Gregory; Wefel, John P.

340

Fault detection, isolation and reconfiguration in FTMP: Methods and experimental results. [fault tolerant multiprocessor]  

Science.gov (United States)

The Fault-Tolerant Multiprocessor (FTMP) is a highly reliable computer designed to meet a goal of 10^-10 failures per hour and built with the objective of flying an active-control transport aircraft. Fault detection, identification, and recovery software is described, and experimental results obtained by injecting faults at the pin level in the FTMP are presented. Over 21,000 faults were injected in the CPU, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one LRU of the multiprocessor. Detection, isolation, and reconfiguration times were recorded for each fault, and the results were found to agree well with earlier assumptions made in reliability modeling.

Lala, J. H.

1983-01-01

341

Direct Fault Tolerant RLV Altitude Control: A Singular Perturbation Approach  

Science.gov (United States)

In this paper, we present a direct fault tolerant control (DFTC) technique, where by "direct" we mean that no explicit fault identification is used. The technique will be presented for the attitude controller (autopilot) for a reusable launch vehicle (RLV), although in principle it can be applied to many other applications. Any partial or complete failure of control actuators and effectors will be inferred from saturation of one or more commanded control signals generated by the controller. The saturation causes a reduction in the effective gain, or bandwidth, of the feedback loop, which can be modeled as an increase in singular perturbation in the loop. In order to maintain stability, the bandwidth of the nominal (reduced-order) system will be reduced proportionally according to the singular perturbation theory. The presented DFTC technique automatically handles momentary saturations and integrator windup caused by excessive disturbances, guidance commands or dispersions under normal vehicle conditions. For multi-input, multi-output (MIMO) systems with redundant control effectors, such as the RLV attitude control system, an algorithm is presented for determining the direction of bandwidth cutback using the method of minimum-time optimal control with constrained control in order to maintain the best performance that is possible with the reduced control authority. Other bandwidth cutback logic, such as one that preserves the commanded direction of the bandwidth or favors a preferred direction when the commanded direction cannot be achieved, is also discussed. In this extended abstract, a simple example is provided to demonstrate the idea. In the final paper, test results on the high fidelity 6-DOF X-33 model with severe dispersions will be presented.

Zhu, J. J.; Lawrence, D. A.; Fisher, J.; Shtessel, Y. B.; Hodel, A. S.; Lu, P.; Jackson, Scott (Technical Monitor)

2002-01-01

342

Passive fault tolerant control of a double inverted pendulum - a case study  

DEFF Research Database (Denmark)

A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the YJBK parameterization, which requires the nominal controller to be implemented in observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

Niemann, Hans Henrik

2005-01-01

343

Passive Fault tolerant Control of an Inverted Double Pendulum : A Case Study Example  

DEFF Research Database (Denmark)

A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the Youla parameterization, which requires the nominal controller to be implemented in the observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

Niemann, H.; Stoustrup, Jakob

2003-01-01

344

Multiversion software reliability through fault-avoidance and fault-tolerance  

Science.gov (United States)

In this project we have proposed to investigate a number of experimental and theoretical issues associated with the practical use of multi-version software in providing dependable software through fault-avoidance and fault-elimination, as well as run-time tolerance of software faults. In the period reported here we have been working on the following: We have continued collection of data on the relationships between software faults and reliability, and the coverage provided by the testing process as measured by different metrics (including data flow metrics). We continued work on software reliability estimation methods based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. We have continued studying back-to-back testing as an efficient mechanism for removal of uncorrelated faults, and common-cause faults of variable span. We have also been studying back-to-back testing as a tool for improvement of the software change process, including regression testing. We continued investigating existing fault-tolerance models and worked on the formulation of new ones. In particular, we have partly finished evaluation of Consensus Voting in the presence of correlated failures, and are in the process of finishing evaluation of Consensus Recovery Block (CRB) under failure correlation. We find both approaches far superior to commonly employed fixed agreement number voting (usually majority voting). We have also finished a cost analysis of the CRB approach.

Vouk, Mladen A.; Mcallister, David F.

1990-01-01
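The abstract above contrasts fixed agreement number voting (usually majority voting) with Consensus Voting but defines neither. The sketch below assumes that Consensus Voting selects the output with the largest agreement among versions even when no majority exists. Tie handling is simplified here to "no decision", which is an assumption of this sketch and not necessarily the authors' rule.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by more than half of the versions, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def consensus_vote(outputs):
    """Return the value with the largest agreement; None on an exact tie.

    Simplification: a tie for first place yields no decision in this sketch.
    """
    if not outputs:
        return None
    ranked = Counter(outputs).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical outputs of five independently developed versions.
    version_outputs = [1, 1, 2, 3, 4]
    print(majority_vote(version_outputs), consensus_vote(version_outputs))
```

With five versions and only two agreeing, majority voting gives no answer while the plurality rule still selects one, which illustrates one way the approaches can differ.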

345

A Byzantine resilient processor with an encoded fault-tolerant shared memory  

Science.gov (United States)

The memory requirements for ultra-reliable computers are expected to increase due to future increases in mission functionality and operating-system requirements. This increase will have a negative effect on the reliability and cost of the system. Increased memory size will also reduce the ability to reintegrate a channel after a transient fault, since the time required to reintegrate a channel in a conventional fault-tolerant processor is dominated by memory realignment time. A Byzantine Resilient Fault-Tolerant Processor with Fault-Tolerant Shared Memory (FTP/FTSM) is presented as a solution to these problems. The FTSM uses an encoded memory system, which reduces the memory requirement by one-half compared to a conventional quad-FTP design. This increases the reliability and decreases the cost of the system. The realignment problem is also addressed by the FTSM. Because any single error is corrected upon a read from the FTSM, a faulty channel's corrupted memory does not need realignment before reintegration of the faulty channel. A combination of correct-on-access and background scrubbing is proposed to prevent the accumulation of transient errors in the memory. With a hardware-implemented scrubber, the scrubbing cycle time, and therefore the memory fault latency, can be upper-bounded at a small value. This technique increases the reliability of the memory system and facilitates validation of its reliability model.

Butler, Bryan; Harper, Richard

1990-01-01

346

On Permutation Capabilities of Fault Tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper an analysis of the permutation capabilities of fault tolerant [1] Multistage Interconnection Networks is presented. I have examined some popular networks, FT (Four Tree) [8], MFT (Modified Four Tree) [2], PHI (Phi Network) [11], NFT (New Four Tree) [4], IFT (Improved Four Tree) [5], IASN (Irregular Augmented Shuffle) [14] and IIASN (Improved Irregular Augmented Shuffle) [3], which are irregular in nature [11]. Permutation capabilities are measured on an incremental and identical basis by introducing various faults at the different stages of the networks.

Sandeep Sharma

2012-11-01

347

Modular Multilevel Converter Control Strategy with Fault Tolerance  

DEFF Research Database (Denmark)

The Modular Multilevel Converter (MMC) technology has recently emerged in VSC-HVDC applications, where it has demonstrated higher efficiency and fault tolerance compared to the classical 2-level topology. Due to the ability of the MMC to connect to HV levels, it can also be used in transformerless STATCOM and large wind turbines. In this paper, a control and communication strategy has been developed to tolerate module failure and capacitor voltage unbalance. A downscaled prototype converter has been built in order to validate and investigate the control strategy, and also to test the proposed communication infrastructure based on Industrial Ethernet.

Teodorescu, Remus; Eni, Emanuel-Petre

2013-01-01

348

Fault Tolerant Characteristics in Quantum-dot Cellular Automata Devices  

Science.gov (United States)

We present analytical results of fault tolerant properties of various quantum-dot cellular automata (QCA) devices. In any electronic computation device such as a computer, one needs digital signals for computation. In this model, the binary numbers are encoded from charge configurations in quantum dots. Data transfer, signal flow, and computations can be performed by electron polarization in the nanostructure. Our main focus is to investigate the functionality of a QCA device by studying the thermal and manufacturing defects. A Hubbard-type Hamiltonian and Inter-cellular Hartree approximation have been used for modeling, and a uniform random distribution has been implemented for the defect simulations. Simple devices such as quantum wire, logical gates, inverter, cross-over, XOR, and Full Adder will be discussed. Results show fault tolerance of a device is strongly dependent on the temperatures as well as on the manufacturing defects.

Khatun, Mahfuza; Padgett, Benjamin

2011-10-01

349

Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: Dealing with arbitrary failures effectively, while reaching agreement, remains a major operational challenge in distributed transactions. In the contemporary literature, standard protocols such as Byzantine Fault Tolerant Distributed Commit (BFTDC) and Practical Byzantine Fault Tolerance (PBFT) handle the problem to a great extent. However, the limitation of these protocols is that they incur increased message overhead as well as large latency. Approach: To improve the failure resiliency with minimum execution overhead, we propose two new protocols based on proactive view change and reactive view change. Also, both approaches have been analyzed and compared. Results: Our dynamic analysis reflects that, in a faulty scenario, the proactive approach is computationally more efficient, with reduced latency, than the reactive one. Conclusion/Recommendations: Moreover, unlike PBFT and BFTDC, our agreement protocol runs in two phases, which leads to reduced message overhead and total execution time.

Poonam Saini

2011-01-01

350

On Universal and Fault-Tolerant Quantum Computing  

CERN Document Server

A novel universal and fault-tolerant basis (set of gates) for quantum computation is described. Such a set is necessary to perform quantum computation in a realistic noisy environment. The new basis consists of two single-qubit gates (Hadamard and $\sigma_z^{1/4}$), and one double-qubit gate (Controlled-NOT). Since the set consisting of Controlled-NOT and Hadamard gates is not universal, the new basis achieves universality by including only one additional elementary (in the sense that it does not include angles that are irrational multiples of $\pi$) single-qubit gate, and hence, is potentially the simplest universal basis that one can construct. We also provide an alternative proof of universality for the only other known class of universal and fault-tolerant basis proposed by Shor and by Kitaev.

Boykin, P O; Pulver, M; Roychowdhury, V P; Vatan, F; Mor, Tal; Pulver, Matthew; Roychowdhury, Vwani; Vatan, Farrokh

1999-01-01

351

Designing an Agent-Based Intrusion Detection System for Heterogeneous Wireless Sensor Networks: Robust, Fault Tolerant and Dynamic Reconfigurable  

Directory of Open Access Journals (Sweden)

Full Text Available Protecting networks against different types of attacks is one of the most important issues in the network and information security domains. For Wireless Sensor Networks (WSNs), given their special properties, this problem is even more important. Several solutions have been proposed to protect WSNs against different types of intrusions, but none of them takes a comprehensive view of the problem, and they are usually designed for a single purpose. The design proposed in this paper takes a comprehensive view by presenting a complete architecture for an Intrusion Detection System (IDS). The main contribution of this architecture is its modularity and flexibility; i.e., it is designed and applicable in four steps of the intrusion detection process, consistent with the application domain and its required security level. The focus of this paper is on heterogeneous WSNs and network-based IDS, designing and deploying the Wireless Sensor Network wide level Intrusion Detection System (WSNIDS) on the base station (sink). Finally, a questionnaire was designed to verify the idea, using the results acquired from analyzing the responses.

Hossein Jadidoleslamy

2011-08-01

352

Compilation and Synthesis for Fault-Tolerant Digital Microfluidic Biochips  

DEFF Research Database (Denmark)

Microfluidic-based biochips are replacing the conventional biochemical analyzers, by integrating all the necessary functions for biochemical analysis using microfluidics. The digital microfluidic biochips (DMBs) manipulate discrete amounts of fluids of nanoliter volume, named droplets, on an array of electrodes to perform operations such as dispensing, transport, mixing, split, dilution and detection. Researchers have proposed compilation approaches, which, starting from a biochemical application and a biochip architecture, determine the allocation, resource binding, scheduling, placement and routing of the operations in the application. During the execution of a bioassay, operations could experience transient faults, thus impacting negatively the correctness of the application. We have proposed both offline (design time) and online (runtime) recovery strategies. The online recovery strategy decides the introduction of the redundancy required for fault-tolerance. We consider both time redundancy, i.e., re-executing erroneous operations, and space redundancy, i.e., creating redundant droplets for fault-tolerance. Error recovery is performed such that the number of transient faults tolerated is maximized and the timing constraints of the biochemical application are satisfied. Previous work has assumed that the biochip architecture is given, and most approaches consider a rectangular shape for the electrode array, where operations execute on rectangular "modules" formed of electrodes. However, non-regular application-specific architectures are common in practice. Hence, we have proposed an approach to the synthesis of application-specific architectures, such that the cost is minimized and the timing constraints of the application are satisfied. We propose an algorithm to build a library of non-regular modules for a given application-specific architecture, so that the area of a non-regular application-specific biochip can be used effectively.
During fabrication, DMBs can be affected by permanent faults, which may lead to the failure of the application. Our approach introduces redundant electrodes to synthesize fault-tolerant architectures aiming at increasing the yield of DMBs. We also propose a method to estimate, at design time, the application completion time in case of permanent faults in order to verify if an application can be successfully run on the architecture. The proposed approaches were evaluated using several real-life case studies and synthetic benchmarks.

Alistar, Mirela

2014-01-01

353

Rapid Recovery for Systems with Scarce Faults  

Directory of Open Access Journals (Sweden)

Full Text Available Our goal is to achieve a high degree of fault tolerance through the control of safety-critical systems. This reduces to solving a game between a malicious environment that injects failures and a controller that tries to establish correct behavior. We suggest a new control objective for such systems that offers a better balance between complexity and precision: we seek systems that are k-resilient. In order to be k-resilient, a system needs to be able to rapidly recover from a small number, up to k, of local faults infinitely many times, provided that blocks of up to k faults are separated by short recovery periods in which no fault occurs. k-resilience is a simple but powerful abstraction from the precise distribution of local faults, but much more refined than the traditional objective to maximize the number of local faults. We argue why we believe this to be the right level of abstraction for safety-critical systems when local faults are few and far between. We show that the computational complexity of constructing optimal control with respect to resilience is low, and demonstrate the feasibility through an implementation and experimental results.

Chung-Hao Huang

2012-10-01
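The environment assumption behind k-resilience in the abstract above (blocks of at most k faults, separated by fault-free recovery periods) can be illustrated with a small trace check. This is only a sketch of one reading of that assumption, not the authors' game-based construction; the function name, the boolean trace encoding, and the `recovery_len` parameter are all hypothetical.

```python
def respects_k_fault_assumption(trace, k, recovery_len):
    """Check a fault trace against one reading of the k-resilience assumption.

    trace        -- iterable of booleans, True where a local fault occurs (assumed encoding)
    k            -- maximum number of faults allowed in one block
    recovery_len -- number of consecutive fault-free steps that ends a block (assumption)
    """
    faults_in_block = 0  # faults seen since the last completed recovery period
    quiet_steps = 0      # consecutive fault-free steps seen so far
    for fault in trace:
        if fault:
            faults_in_block += 1
            quiet_steps = 0
            if faults_in_block > k:
                return False  # more than k faults without a recovery period in between
        else:
            quiet_steps += 1
            if quiet_steps >= recovery_len:
                faults_in_block = 0  # recovery period completed, a new block may start
    return True
```

A controller that is k-resilient in this sense only has to cope with traces for which the check returns `True`.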

354

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper presents a study on the development of a walking gait for fault-tolerant locomotion in unstructured environments. The fault-tolerant gait for adaptive locomotion fulfills stability conditions in the face of a fault event (locked joints or sensor failure) that would prevent a robot from achieving stable locomotion over uneven terrain. To accomplish this, a fault-tolerant gait based on force-position control is proposed in this paper for a hexapod robot to enable stable walking with...

Umar Asif

2012-01-01

355

Fault-tolerant Landau-Zener quantum gates  

International Nuclear Information System (INIS)

We present a method to perform fault-tolerant single-qubit gate operations using Landau-Zener tunneling. In a single Landau-Zener pulse, the qubit transition frequency is varied in time so that it passes through the frequency of the radiation field. We show that a simple three-pulse sequence allows eliminating errors in the gate up to the third order in errors in the qubit energies or the radiation frequency

2006-01-01

356

Fault-Tolerant Landau-Zener Quantum Gates  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a method to perform fault-tolerant single-qubit gate operations using Landau-Zener tunneling. In a single Landau-Zener pulse, the qubit transition frequency is varied in time so that it passes through the frequency of the radiation field. We show that a simple three-pulse sequence allows eliminating errors in the gate up to the third order in errors in the qubit energies or the radiation frequency.

Hicke, C.; Santos, L. F.; Dykman, M. I.

2005-01-01

357

FAULT TOLERANT SCHEDULING STRATEGY FOR COMPUTATIONAL GRID ENVIRONMENT  

Directory of Open Access Journals (Sweden)

Full Text Available Computational grids have the potential for solving large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient scheduling of jobs and providing fault tolerance in a reliable manner. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the Minimum Total Time to Release (MTTR) job scheduling algorithm. TTR includes the service time of the job, the waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the TTR by selecting a computational resource based on job requirements, job characteristics and hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. A critical aspect of automatic recovery is the availability of checkpoint files. A strategy to increase the availability of checkpoints is replication. A Replica Resource Selection Algorithm (RRSA) is proposed to provide a Checkpoint Replication Service (CRS). Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and NWS (Network Weather Service) are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid more trustworthy.

MALARVIZHI NANDAGOPAL,

2010-09-01
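The resource-selection step described in the abstract above can be sketched as follows. The abstract states only that TTR covers service time, queue waiting time, and input/output transfer; treating TTR as the plain sum of these three estimates, and every name in the code, are assumptions made for illustration, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical per-resource estimates for one job, all in seconds."""
    name: str
    service_time: float   # estimated execution time of the job on this resource
    queue_wait: float     # estimated waiting time in the resource's queue
    transfer_time: float  # time to move input and output data to and from the resource


def total_time_to_release(resource: Resource) -> float:
    # Assumption: TTR is the plain sum of the three components named in the abstract.
    return resource.service_time + resource.queue_wait + resource.transfer_time


def select_resource(resources):
    """Pick the resource with the minimum estimated TTR for the job."""
    return min(resources, key=total_time_to_release)
```

For example, a fast but heavily queued resource can lose to a slower idle one once waiting and transfer times are included in the comparison.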

358

Safety in Numbers: Fault Tolerance in Robot Swarms  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The swarm intelligence literature frequently asserts that swarms exhibit high levels of robustness. That claim is, however, rather less frequently supported by empirical or theoretical analysis. But what do we mean by a 'robust' swarm? How would we measure the robustness or – to put it another way – fault-tolerance of a robotic swarm? These questions are not just of academic interest. If swarm robotics is to make the transition from the laboratory to real-world engineering implementation,...

Winfield, A. F. T.; Nembrini, Julien

2006-01-01

359

Is Fault-Tolerant Quantum Computation Really Possible?  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The so-called "threshold" theorem says that, once the error rate per qubit per gate is below a certain value, indefinitely long quantum computation becomes feasible, even if all of the qubits involved are subject to relaxation processes, and all the manipulations with qubits are not exact. The purpose of this article, intended for physicists, is to outline the ideas of quantum error correction and to take a look at the proposed technical instruction for fault-tolerant quantu...

Dyakonov, M. I.

2006-01-01

360

Efficient Model Checking of Fault-Tolerant Distributed Protocols  

Digital Repository Infrastructure Vision for European Research (DRIVER)

To aid the formal verification of fault-tolerant distributed protocols, we propose an approach that significantly reduces the costs of their model checking. These protocols often specify atomic, process-local events that consume a set of messages, change the state of a process, and send zero or more messages. We call such events quorum transitions and leverage them to optimize state exploration in two ways. First, we generate fewer states compared to models where quorum transitions are expres...

Bokor, Péter; Kinder, Johannes; Serafini, Marco; Suri, Neeraj

2011-01-01

 
 
 
 
361

Optimal correction of concatenated fault-tolerant quantum codes  

Science.gov (United States)

We present a method of concatenated quantum error correction in which improved classical processing is used with existing quantum codes and fault-tolerant circuits to more reliably correct errors. Rather than correcting each level of a concatenated code independently, our method uses information about the likelihood of errors having occurred at lower levels to maximize the probability of correctly interpreting error syndromes. Results of simulations of our method applied to the [[4,1,2

Evans, Z. W. E.; Stephens, A. M.

2012-12-01

362

Fault tolerant onboard implementation of control procedures in tethered satellite  

Science.gov (United States)

The Space Shuttle's Tethered Satellite requires general spacecraft management, autonomous data handling, and safety precautions for both the Shuttle and the satellite. Fault tolerance is implemented via a process of task-migration between two processors in the event of a failure in either. The two microprocessors have independent software packages, one for general spacecraft management and the other for attitude control. A backup software package is used when one of the two microprocessors is out of service.

Ranieri, R.; Giannini, G.; Airaghi, A.; Fossati, D.

363

BFTDT: Byzantine Fault Tolerance tryout for Dependable Transactions in Cloud  

Directory of Open Access Journals (Sweden)

Full Text Available Cloud Web Services (CWS) is the technology used for business collaboration and integration among web users. Web Services Atomic Transactions (WS-AT) have been used for trusted distributed transaction processing over the web. In a distributed setting, WS-AT is subject to Byzantine faults; to overcome them, Byzantine Fault Tolerance (BFT) techniques are used. The reliable coordinator provides Coordination, Activation, Registration and Completion services, which make the transaction effective and reliable. In the trusted environment, to avoid congestion of resources, a fair-share bandwidth allocation scheme is used to allocate separate bandwidth to each web user, and the transaction is processed by the coordinator server and the Transaction Processing Monitor (TPM). Analysis of WS-AT for business applications shows a high degree of dependability, security, trust, fault tolerance and fairness in the use of resources in the trusted environment.

Gayathri S

2012-11-01

364

Wireless Fault-Tolerant Controllers in Cascaded Industrial Workcells Using Wi-Fi and Ethernet  

Directory of Open Access Journals (Sweden)

Full Text Available A Wireless Networked Control System using 802.11b is used to model fault tolerance at the controller level of an industrial workcell. The fault-tolerance study in this paper presents the cascading of two independent workcells, where each controller must be able to handle the load of both cells if the other controller fails. The intercommunication between the cells is completely wireless, and this feature is investigated. The model incorporates unmodified 802.11b and 802.11g for communication. Sensors send sampled data to both controllers, and the controllers exchange a watchdog signal. The fault-free and faulty models are both simulated using OPNET Network Modeler. External interference on the critical intercommunication link is also investigated. Results of simulations are presented based on a 95% confidence analysis, guaranteeing correct system performance.

Tarek K. Refaat

2013-11-01
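The watchdog exchange between the two controllers in the abstract above can be sketched as a simple timeout check: each controller records when it last heard from its peer and takes over both cells if the peer stays silent for too long. The class, the timeout value, and the explicit `now` timestamps are hypothetical; the abstract does not give the actual watchdog period or failover logic.

```python
class WatchdogMonitor:
    """Tracks the peer controller's watchdog messages (illustrative sketch)."""

    def __init__(self, timeout_s: float, start: float):
        self.timeout_s = timeout_s  # assumed maximum silence before declaring failure
        self.last_seen = start      # time of the most recent watchdog message

    def heartbeat(self, now: float) -> None:
        # Called whenever a watchdog message arrives from the peer controller.
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        # The peer is presumed failed once it has been silent longer than the timeout.
        return now - self.last_seen > self.timeout_s


def cells_to_control(own_cell: str, peer_cell: str, monitor: WatchdogMonitor, now: float):
    """Return the cells this controller should serve at time `now`."""
    return [own_cell, peer_cell] if monitor.peer_failed(now) else [own_cell]
```

Passing timestamps in explicitly keeps the sketch deterministic; a real controller would read a monotonic clock instead.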

365

Fault-Tolerant Time Synchronization in Wireless Sensor Networks  

Directory of Open Access Journals (Sweden)

Full Text Available Wireless Sensor Networks are a special type of ad-hoc network, where wireless devices collaborate with other devices to send data to the destination. Synchronization is an important issue for wireless sensor networks because temporal coordination is required for many of the collaborative tasks they perform, e.g., data fusion, object tracking and velocity estimation, and setting the sleep modes of the various nodes so that battery life is prolonged. Several synchronization schemes have been put forward to date, but only a few of them are fault-tolerant. Fault-tolerant, in this context, means that the scheme works efficiently even in the presence of malicious nodes. Malicious nodes in this paper refer mainly to nodes that may provide incorrect time. This paper proposes a novel fault-tolerant synchronization scheme which provides internal synchronization, taking into consideration the malicious or faulty nodes present in the network.

Vikram Singh, T. P. Sharma

2013-06-01
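The abstract above does not describe the proposed scheme itself. As a generic illustration of how incorrect time values from a bounded number of faulty nodes can be tolerated, the following sketch uses a trimmed (fault-tolerant) average, a standard technique that is substituted here for illustration and is not claimed to be the authors' method. The function name and the parameter `f` (assumed upper bound on faulty nodes) are hypothetical.

```python
def fault_tolerant_average(readings, f):
    """Average clock readings after discarding the f lowest and f highest values.

    readings -- clock values reported by neighbouring nodes
    f        -- assumed upper bound on the number of faulty or malicious nodes
    """
    if len(readings) <= 2 * f:
        raise ValueError("need more than 2*f readings to tolerate f faulty nodes")
    ordered = sorted(readings)
    trimmed = ordered[f:len(ordered) - f]  # drop the f extremes on each side
    return sum(trimmed) / len(trimmed)
```

With at most `f` bad readings, every value that survives trimming lies within the range spanned by the correct clocks, so a single wildly wrong node cannot drag the estimate away.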

366

Unconstrained and Constrained Fault-Tolerant Resource Allocation  

CERN Document Server

First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objective is to allocate facilities from sites to satisfy $\mathcal{R}$ at a minimum total cost. Next, we introduce the Constrained Fault-Tolerant Resource Allocation (CFTRA) problem. It differs from UFTRA in that the number of resources available at each site $i$ is limited by $R_{i}$. Both problems are practical extensions of the classical Fault-Tolerant Facility Location (FTFL) problem \cite{Jain00FTFL}. For instance, their solutions provide optimal resource allocation (w.r.t. enterprises) and leasing (w.r.t. clients) strategies for the cont...

Liao, Kewen

2011-01-01

367

Design of a fault-tolerant controller for the SP-100 space reactor  

Energy Technology Data Exchange (ETDEWEB)

The control system of an SP-100 space reactor is a key element of space reactor design for meeting the space mission requirements of safety, reliability, and life expectancy. In this work, a fault-tolerant controller (FTC) is developed to control the thermoelectric (TE) power in the SP-100 space reactor. A fault-tolerant controller keeps the control system stable and retains acceptable performance even under system faults. The objectives of the proposed model predictive controller are to minimize both the difference between the predicted TE power and the desired power, and the variation of the control drum angle that adjusts the control reactivity. These objectives are subject to constraints on the maximum and minimum control drum angle and the maximum drum angle variation speed. The model predictive controller incorporates a fault detection and diagnostics algorithm so that the controller can work properly even under input and output measurement faults. A lumped-parameter simulation model of the SP-100 nuclear space reactor is used to verify the proposed controller design. Simulation results show that the TE generator power level, regulated by the proposed controller, could track the target power level effectively even under measurement faults, satisfying all control constraints. (authors)

Na, M. G. [Nuclear Eng. Dept., Chosun Univ., 375 Seosuk-dong, Dong-gu, Gwangju 501-759 (Korea, Republic of); Upadhyaya, B. R. [Nuclear Eng. Dept., Univ. of Tennessee, Knoxville, TN 37996-2300 (United States)

2006-07-01
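The control objective described in the abstract above has the shape of a standard constrained model predictive control problem. One common way to write such a problem is sketched below; the horizons $N$ and $M$, the weight $\lambda$, and all symbol names are assumptions for illustration and are not taken from the paper.

```latex
% Assumed notation: \hat{P}(t+j \mid t) is the TE power predicted j steps ahead,
% P_{\mathrm{ref}} the desired power, u the control drum angle,
% and \Delta u(t) = u(t) - u(t-1) the drum angle change per step.
\min_{\Delta u(t), \dots, \Delta u(t+M-1)} \; J =
  \sum_{j=1}^{N} \bigl[ \hat{P}(t+j \mid t) - P_{\mathrm{ref}}(t+j) \bigr]^{2}
  + \lambda \sum_{j=1}^{M} \bigl[ \Delta u(t+j-1) \bigr]^{2}
\quad \text{subject to} \quad
  u_{\min} \le u(t+j-1) \le u_{\max}, \qquad
  \lvert \Delta u(t+j-1) \rvert \le \Delta u_{\max}, \qquad j = 1, \dots, M .
```

The first sum penalizes power tracking error, the second penalizes drum angle movement, and the two constraints correspond to the drum angle limits and the maximum drum angle variation speed named in the abstract.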

368

Fault-Tolerant, Radiation-Hard DSP  

Science.gov (United States)

Commercial digital signal processors (DSPs) for use in high-speed satellite computers are challenged by the damaging effects of space radiation, mainly single event upsets (SEUs) and single event functional interrupts (SEFIs). Innovations have been developed for mitigating the effects of SEUs and SEFIs, enabling the use of very-high-speed commercial DSPs with improved SEU tolerances. Time-triple modular redundancy (TTMR) is a method of applying traditional triple modular redundancy on a single processor, exploiting the VLIW (very long instruction word) class of parallel processors. TTMR improves SEU rates substantially. SEFIs are addressed by a SEFI-hardened core circuit, external to the microprocessor. It monitors the health of the processor and, if a SEFI occurs, forces the processor to return to normal operation through a series of escalating events. TTMR and hardened-core solutions were developed for both DSPs and reconfigurable field-programmable gate arrays (FPGAs). This includes advancement of TTMR algorithms for DSPs and reconfigurable FPGAs, plus a rad-hard, hardened-core integrated circuit that services both the DSP and FPGA. Additionally, a combined DSP and FPGA board architecture was fully developed into a rad-hard engineering product. This technology enables use of commercial off-the-shelf (COTS) DSPs in computers for satellite and other space applications, allowing rapid deployment at a much lower cost. Traditional rad-hard space computers are very expensive and typically have long lead times. These computers are based either on traditional rad-hard processors, which have extremely low computational performance, or on triple modular redundant (TMR) FPGA arrays, which suffer from power and complexity issues. Even more frustrating is that the TMR arrays of FPGAs require a fixed, external rad-hard voting element, causing them to lose much of their reconfiguration capability and in some cases to suffer a significant speed reduction.
The benefits of COTS high-performance signal processing include significant increase in onboard science data processing, enabling orders of magnitude reduction in required communication bandwidth for science data return, orders of magnitude improvement in onboard mission planning and critical decision making, and the ability to rapidly respond to changing mission environments, thus enabling opportunistic science and orders of magnitude reduction in the cost of mission operations through reduction of required staff. Additional benefits of COTS-based, high-performance signal processing include the ability to leverage considerable commercial and academic investments in advanced computing tools, techniques, and infrastructure, and the familiarity of the science and IT community with these computing environments.

Czajkowski, David

2011-01-01
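The voting idea behind triple modular redundancy in time, as mentioned in the abstract above, can be illustrated in software: the same computation is executed three times on one processor and the majority result is taken. This is only a conceptual sketch; the TTMR technique described above works at the instruction level on VLIW hardware, and the function names here are hypothetical.

```python
from collections import Counter


def vote(results):
    """Return the value produced by at least two of the three executions."""
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three results disagree, so the upset cannot be masked by voting.
        raise RuntimeError("no majority among redundant results")
    return value


def run_with_time_redundancy(fn, *args):
    """Execute fn three times in sequence on the same processor and vote."""
    return vote([fn(*args) for _ in range(3)])
```

A single corrupted execution is outvoted by the other two; results must be hashable for `Counter` to compare them.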

369

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Directory of Open Access Journals (Sweden)

Full Text Available A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit, an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to identify thruster faults. The fault accommodation unit is based on direct calculations of moment, and the result of fault identification is used to solve the control allocation problem. The approach handles the identification of continuous faults in the underwater vehicle (UV). Experimental results are provided to illustrate the performance of the proposed method in uncertain, continuously faulty situations.

Qian Liu

2010-02-01

370

Graphics enhanced computer emulation for improved timing-race and fault tolerance control system analysis. [of Centaur liquid-fuel booster]  

Science.gov (United States)

A computer simulation system has been developed for the Space Shuttle's advanced Centaur liquid fuel booster rocket, in order to conduct systems safety verification and flight operations training. This simulation utility is designed to analyze functional system behavior by integrating control avionics with mechanical and fluid elements, and is able to emulate any system operation, from simple relay logic to complex VLSI components, with wire-by-wire detail. A novel graphics data entry system offers a pseudo-wire wrap data base that can be easily updated. Visual subsystem operations can be selected and displayed in color on a six-monitor graphics processor. System timing and fault verification analyses are conducted by injecting component fault modes and min/max timing delays, and then observing system operation through a red line monitor.

Szatkowski, G. P.

1983-01-01

371

Unitary reflection groups for quantum fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper explores the representation of quantum computing in terms of unitary reflections (unitary transformations that leave invariant a hyperplane of a vector space). The symmetries of qubit systems are found to be supported by Euclidean real reflections (i.e., Coxeter groups) or by specific imprimitive reflection groups, introduced (but not named) in a recent paper [Planat M and Jorrand Ph 2008, J Phys A: Math Theor 41, 182001]. The automorphisms of multiple qubit systems are...

Planat, Michel; Kibler, Maurice

2010-01-01

372

Fault Tolerant Flight Control - A Survey  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Nowadays, control systems are involved in nearly all aspects of our lives. They are all around us, but their presence is not always really apparent. They are in our kitchens, in our DVD-players, computers and our cars. They are found in elevators, ships, aircraft and spacecraft. Control systems are present in every industry, they are used to control chemical reactors, distillation columns, and nuclear power plants. They are constantly and inexhaustibly working, making our life more comfortabl...

2010-01-01

373

TN- or TT-system. The difference of tolerable risks of protection under fault conditions; TN- oder TT-Systemen. Unterschiede in der Grenzrisiken fuer den Schutz gegen elektrischen Schlag unter Fehlerbedingungen  

Energy Technology Data Exchange (ETDEWEB)

For protection against electric shock under fault conditions (protection against indirect contact, or fault protection), building installations in most cases use fault protection by automatic disconnection of supply, in the form of the TN-system (protective neutral earthing) or the TT-system (protective direct earthing with RCDs as protective devices). The differences in the tolerable risks of these protective-conductor measures, with regard to disconnecting times and touch voltages and the associated fault voltages and prospective touch voltages, are investigated using example calculations and comparative measurements in the network. At the end of this contribution the most important definitions are explained. (orig.)

Biegelmeier, G.; Krefter, K.H. [Vereinigte Elektrizitaetswerke Westfalen AG (VEW), Dortmund (Germany). Abt. Energieanwendung; VEW Eurotest GmbH, Dortmund (Germany)

2000-02-07

374

Lecture Notes : Practical Approach to Reliability, Safety, and Active Fault-tolerance  

DEFF Research Database (Denmark)

"The fundamental objective of the combined safety and reliability assessment is to identify critical items in the design and the choice of equipment that may jeopardize safety or availability, and thereby to provide arguments for the selection between different options for the system." Achieving safety and reliability has for decades been one of the prime objectives of designers of safety-critical systems. With growing environmental awareness, concerns, and demands, the scope of the design of reliable (and safe) systems has been extended even to small components such as sensors and actuators. In the past, the normal procedure for addressing the higher demand for reliability was to add hardware redundancy, which in turn increases production and maintenance costs. Active fault-tolerant design is an attempt to achieve higher redundancy while minimizing the costs. In chapter 2, reliability and safety related issues are considered and described. The idea of introducing this chapter is to provide an overview of the concepts and methods used for reliability and safety assessment. The focus in chapter 3 is on the fault-tolerance concept. Types of possible faults in components and customary methods for applying redundancy are described. Finally, the chapter is wrapped up by considering and describing the main subject, which is a formal and consistent procedure for designing active fault-tolerant systems.

Izadi-Zamanabadi, Roozbeh

2000-01-01

375

Actuator fault-tolerant control design based on reconfigurable reference input  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The prospective work reported in this paper explores a new approach to enhance the performance of an active fault tolerant control system. The proposed technique is based on a modified recovery/trajectory control system in which a reconfigurable reference input is considered when performance degradation occurs in the system due to faults in actuator dynamics. An added value of this work is to reduce the energy spent to achieve the desired closed-loop performance. This work is justified by t...

2008-01-01

376

Structural Design of Systems with Safe Behavior under Single and Multiple Faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Handling of multiple simultaneous faults is a complex issue in fault-tolerant control. The design task is made particularly difficult by the numerous different cases that need to be analyzed. Aiming at safe fault-handling, this paper shows how structural analysis can be applied to find the analytical redundancy relations for all relevant combinations of faults, and can cope with the complexity and size of a real system. Being essential for fault-tolerant control schemes t...

Blanke, Mogens; Staroswiecki, Marcel

2006-01-01

377

Byzantine Fault Tolerance of Regenerating Codes  

CERN Document Server

Recent years have witnessed a slew of coding techniques custom designed for networked storage systems. Network coding inspired regenerating codes are the most prolifically studied among these new age storage centric codes. A lot of effort has been invested in understanding the fundamental achievable trade-offs of storage and bandwidth usage to maintain redundancy in presence of different models of failures, showcasing the efficacy of regenerating codes with respect to traditional erasure coding techniques. For practical usability in open and adversarial environments, as is typical in peer-to-peer systems, we need however not only resilience against erasures, but also from (adversarial) errors. In this paper, we study the resilience of generalized regenerating codes (supporting multi-repairs, using collaboration among newcomers) in the presence of two classes of Byzantine nodes, relatively benign selfish (non-cooperating) nodes, as well as under more active, malicious polluting nodes. We give upper bounds on t...

Oggier, Frédérique

2011-01-01

378

FPGA fault tolerance in particle physics experiments  

International Nuclear Information System (INIS)

The behavior of matter under physically extreme conditions is the focus of many high-energy-physics experiments. For this purpose, high-energy charged particles (ions) are collided with each other, creating energy or baryon densities similar to those at the beginning of the universe or to those found in the center of neutron stars. In both cases a plasma of quarks and gluons (QGP) is present, which decomposes into hadrons within a short period of time. In this process, particles are formed which, when captured by large detectors, allow statements about the beginning of the universe, but which also lead to the massive occurrence of hardware failures within the detector's electronic devices. This contribution is about methods to mitigate radiation susceptibility for Field Programmable Gate Arrays (FPGAs), enabling them to be used within particle detector systems to directly gain valid data in the readout chain or to be used as a detector control system.

2010-03-15

379

Towards Fault-Tolerant Dynamical Decoupling  

Science.gov (United States)

Dynamical Decoupling (DD) is an error suppression technique which combats decoherence by applying strong and fast pulses to a quantum system to effectively average out system-environment interactions. Although many DD constructions have been designed which exhibit suppression of interactions to high orders in time-dependent perturbation theory, this result holds predominantly in the ideal pulse limit, as DD effectiveness degrades significantly in the presence of additional errors generated by faulty pulses. Here, we present a decoupling scheme which provides robustness to certain forms of pulse errors and utilizes concatenation to attain high-order error suppression. Using numerical simulations, we convey the advantages of this scheme over other robust DD constructions and provide evidence for the possibility of arbitrary-order error suppression in the presence of pulse errors.

Quiroz, Gregory; Lidar, Daniel

2013-03-01

380

Two New Protocols for Fault Tolerant Agreement  

Directory of Open Access Journals (Sweden)

Full Text Available The paper attempts to handle failures effectively, while reaching agreement, in a distributed transaction processing system. Standard protocols such as BFTDC [3], Zyzzyva [4] and PBFT [5] handle the problem to a great extent. However, the limitation of these protocols is that they incur increased message overhead as well as large latency. Moreover, the nodes are evacuated from the transaction system after being declared faulty. We propose a novel proactive-based agreement which identifies the tentative failures in the system. To improve the failure resiliency with minimum execution overhead, we also propose an optimized reactive view change mechanism. Both mechanisms have been analyzed and compared. The dynamic analysis of the protocol shows that, in a faulty scenario, the proactive approach is computationally more efficient, with reduced latency, compared to the reactive one. Moreover, unlike PBFT and BFTDC, our agreement protocol runs in two phases, which leads to reduced message overhead and total execution time. The protocol handles fail-silent (i.e., crashed) nodes in the system.

Poonam Saini

2011-02-01

 
 
 
 
381

Fault-Tolerant Vision for Vehicle Guidance in Agriculture  

DEFF Research Database (Denmark)

The emergence of widely available vision technologies is enabling for a wide range of automation tasks in industry and other areas. Agricultural vehicle guidance systems have benefitted from advances in 3D vision based on stereo camera technology. By automatically guiding vehicles along crops and other field structures the operatorâ??s stress levels can be reduced. High precision steering in sensitive crops can also be maintained for longer periods of time as the driver is less tired. Safety and availabilitymust be inherent in such systems in order to get widespread market acceptance. To tolerate dropout of 3D vision, faults in classification, or other defects, redundant information should be utilized. Such information can be used to diagnose faulty behavior and to temporarily continue operation with a reduced set of sensors when faults or artifacts occur. Additional sensors include GPS receivers and inertial sensors. To fully utilize the possibilities in 3D vision, the system must also be able to learn and adapt to changing environments. By learning features of the environment new diagnostic relations can be generated by creating redundant feed-forward information about crop location. Also, by mapping the field that is seen by the stereo camera, it is possible to support the guidance system by storing salient information about the environment. By tracking the motion of the vehicle, vision output can be fused over time to create more reliable and robust estimates of crop location. This thesis approaches these challenges by considering systematic design methods using graph-based analysis. It is demonstrated how diagnostic relations can be derived and remedial actions can be done to maintain safety and healthy ii functioning of vision systems. The combination of redundant information from 3D vision, mapping, and aiding sensors such as GPS provide means to detect and isolate single faults in the system. 
In addition, learning is employed to adapt the system to variational changes in the natural environment. 3D vision is enhanced by learning texture and color information. Intensity gradients on small neighborhoods of pixels are shown to be superior to other methods for modeling texture information. Stochastic automata using optimally quantized data are demonstrated to be a strong approach for offline learning. It is considered how 3D vision provides labeling of training data that can subsequently be fed into a learning system. Statistical change detection theory is shown to be a suitable approach to detecting artifacts in the learning process so that safe operation can be maintained. The system can be used to perform real-time classification using a fast online approach that is superior to the state of the art. Advances in tracking vehicle motion using 3D vision are demonstrated to allow maps of the local environment to be created with unprecedented accuracy. Features in the environment are extracted and tracked using novel feature detectors that approximate the Laplacian operator with a bi-level octagonal kernel. It is shown how these features display high levels of accuracy and stability while being considerably faster than similar feature detectors. Artifacts in 3D vision range measurements are demonstrated to be detectable by using the generated 3D maps and a probabilistic approach to fusing and comparing range measurements.

Blas, Morten Rufus

2010-01-01

382

Fault-tolerance performance evaluation of fieldbus for NPCS network of KNGR  

International Nuclear Information System (INIS)

In contrast with conventional fieldbus research, which focuses merely on real-time performance, this study aims to evaluate the real-time performance of the communication system including fault-tolerant mechanisms. Maintaining performance in the presence of recoverable faults is very important when the communication network is applied to a highly reliable system such as a next generation Nuclear Power Plant (NPP). If the time characteristics meet the requirements of the system, the faults will be recovered by fieldbus recovery mechanisms and the system will be safe. If the time characteristics cannot meet the requirements, the faults in the fieldbus can propagate into a system failure. In this study, for the purpose of investigating the time characteristics of fieldbus, the recoverable faults are classified and then the formulas that represent delays including recovery mechanisms are developed. In order to validate the proposed approach, we have developed a simulation model that represents the Korea Next Generation Reactor (KNGR) NSSS Process Control System (NPCS). The results of the simulation show reasonable delay characteristics for the fault cases with recovery mechanisms. Using the simulation results and the system requirements, we can also calculate the failure propagation probability from the fieldbus to the outer system. (author)
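The final step of the abstract above, combining delays with a timing requirement to obtain a failure propagation probability, can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so everything here is an assumption: each fault case is a hypothetical pair of (probability of occurrence, delay including recovery), and a fault is taken to propagate when its delay exceeds the deadline.

```python
def failure_propagation_probability(fault_cases, deadline_ms):
    """Sum the probabilities of fault cases whose recovery delay misses the deadline.

    fault_cases: iterable of (probability, delay_ms) pairs. These values are
    hypothetical inputs for illustration, not data from the study.
    """
    return sum(prob for prob, delay_ms in fault_cases if delay_ms > deadline_ms)


# Hypothetical fault cases: (probability of occurrence, delay with recovery in ms).
cases = [(0.01, 5.0), (0.002, 40.0), (0.0005, 120.0)]
propagation = failure_propagation_probability(cases, deadline_ms=50.0)
```

Only the third case exceeds the 50 ms deadline, so only its probability contributes to the result.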

2001-02-01

383

Advanced information processing system: Fault injection study and results  

Science.gov (United States)

The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.

Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.

1992-01-01

384

Fault-tolerance techniques for high-speed fiber-optic networks  

Science.gov (United States)

Four fiber optic network topologies (linear bus, ring, central star, and distributed star) are discussed relative to their application to high data throughput, fault tolerant networks. The topologies are also examined in terms of redundancy and the need to provide for single-point-failure-free (or better) system operation. Linear bus topology, although traditionally the method of choice for wire systems, presents implementation problems when larger fiber optic systems are considered. Ring topology works well for high speed systems when coupled with a token passing protocol, but it requires a significant increase in protocol complexity to manage system reconfiguration due to ring and node failures. Star topologies offer a natural fault tolerance, without added protocol complexity, while still providing high data throughput capability.

Deruiter, John

1991-01-01

385

Fault diagnosis of nuclear logging system  

International Nuclear Information System (INIS)

In order to diagnose and remove the faults of a nuclear logging system quickly, a fault diagnosis method based on fault trees, fuzzy logic and an expert system is presented. The live examples given show that the method can satisfy the needs of fault diagnosis and removal in the field. The direction of development of fault diagnosis in logging systems is also outlined. (authors)

2009-11-01

386

Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. Fault detection filters use the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of a fault. The fault accommodation scheme is carried out using the Generalized Internal Model Control (GIMC) architecture, also known as modified Youla parameterization. In addition, performance indices are evaluated to indicate that the resulting fault tolerant scheme can detect, identify and accommodate actuator and sensor faults modeled as additive faults. A DC motor example is considered for the demonstration of the proposed scheme.
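The two-step diagnosis described above (detection, then isolation by matching a residual against per-fault patterns) can be sketched as follows. This is a simplified illustration and not the paper's filter design: the signature vectors, the threshold, and the use of cosine similarity in place of the paper's probability measure are all assumptions made for this example.

```python
import math


def isolate_fault(residual, signatures, detect_threshold=0.1):
    """Return the name of the best-matching fault signature, or None if no fault.

    residual: tuple of residual values (measurement minus model prediction).
    signatures: dict mapping a hypothetical fault name to its static residual pattern.
    """
    norm = math.sqrt(sum(r * r for r in residual))
    # Step 1, detection: a small residual is treated as the fault-free case.
    if norm < detect_threshold:
        return None

    def alignment(signature):
        # Absolute cosine similarity between the residual and a fault pattern.
        dot = sum(r * s for r, s in zip(residual, signature))
        sig_norm = math.sqrt(sum(s * s for s in signature))
        return abs(dot) / (norm * sig_norm)

    # Step 2, isolation: pick the fault whose pattern the residual follows most closely.
    return max(signatures, key=lambda name: alignment(signatures[name]))


# Hypothetical signatures for a two-residual system.
patterns = {"actuator": (1.0, 0.0), "sensor": (0.0, 1.0)}
```

With these patterns, a residual of `(0.9, 0.1)` would be attributed to the actuator fault, while `(0.01, 0.02)` falls below the detection threshold.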

Minupriya A

2012-06-01

387

Sensor and Sensorless Fault Tolerant Control for Induction Motors Using a Wavelet Index  

Directory of Open Access Journals (Sweden)

Full Text Available Fault Tolerant Control (FTC) systems are crucial in industry to ensure safe and reliable operation, especially of motor drives. This paper proposes the use of multiple controllers for an FTC system of an induction motor drive, selected based on a switching mechanism. The system switches between sensor vector control, sensorless vector control, closed-loop voltage by frequency (V/f) control and open loop V/f control. Vector control offers high performance, while V/f is a simple, low cost strategy with high speed and satisfactory performance. The faults dealt with are speed sensor failures, stator winding open circuits, shorts and minimum voltage faults. In the event of compound faults, a protection unit halts motor operation. The faults are detected using a wavelet index. For the sensorless vector control, a novel Boosted Model Reference Adaptive System (BMRAS) to estimate the motor speed is presented, which reduces tuning time. Both simulation results and experimental results with an induction motor drive show the scheme to be a fast and effective one for fault detection, while the control methods transition smoothly and ensure the effectiveness of the FTC system. The system is also shown to be flexible, reverting rapidly back to the dominant controller if the motor returns to a healthy state.

Ammar Masaoud

2012-03-01

388

Thermal Fault Tolerance Analysis of Carbon Fiber Rope Barrier Systems for Use in the Reusable Solid Rocket Motor (RSRM) Nozzle Joints  

Science.gov (United States)

Carbon Fiber Rope (CFR) thermal barrier systems are being considered for use in several RSRM (Reusable Solid Rocket Motor) nozzle joints as a replacement for the current assembly gap close-out process/design. This study provides for development and test verification of analysis methods used for flow-thermal modeling of a CFR thermal barrier subject to fault conditions such as rope combustion gas blow-by and CFR splice failure. Global model development is based on a 1-D (one-dimensional) transient volume filling approach where the flow conditions are calculated as a function of internal 'pipe' and porous media 'Darcy' flow correlations. Combustion gas flow rates are calculated for the CFR on a per-linear-inch basis and solved simultaneously with a detailed thermal-gas dynamic model of a local region of gas blow-by (or splice fault). Effects of gas compressibility, friction and heat transfer are accounted for in the model. Computational Fluid Dynamic (CFD) solutions of the fault regions are used to characterize the local flow field, quantify the amount of free jet spreading and assist in the determination of impingement film coefficients on the nozzle housings. Gas to wall heat transfer is simulated by a large thermal finite element grid of the local structure. The employed numerical technique loosely couples the FE (Finite Element) solution with the gas dynamics solution of the faulted region. All free constants that appear in the governing equations are calibrated by hot fire sub-scale test. The calibrated model is used to make flight predictions using motor aft end environments and timelines. Model results indicate that CFR barrier systems provide a near 'vented joint' style of pressurization. Hypothetical fault conditions considered in this study (blow-by, splice defect) are relatively benign in terms of overall heating to nozzle metal housing structures.

Clayton, J. Louie; Phelps, Lisa (Technical Monitor)

2001-01-01

389

Arc fault detection system  

Energy Technology Data Exchange (ETDEWEB)

An arc fault detection system for use on ungrounded or high-resistance-grounded power distribution systems is provided which can be retrofitted outside electrical switchboard circuits having limited space constraints. The system includes a differential current relay that senses a current differential between current flowing from secondary windings located in a current transformer coupled to a power supply side of a switchboard, and a total current induced in secondary windings coupled to a load side of the switchboard. When such a current differential is experienced, a current travels through an operating coil of the differential current relay, which in turn opens an upstream circuit breaker located between the switchboard and a power supply to remove the supply of power to the switchboard.
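The differential principle in the abstract above compares the supply-side current with the total load-side current and trips when they disagree. The following is a minimal software sketch of that comparison only; the actual system is an electromechanical relay, and the function name and the pickup value are assumptions chosen for illustration.

```python
def differential_trip(supply_current_a, load_currents_a, pickup_a=0.5):
    """Return True when supply and total load current differ by more than the pickup.

    supply_current_a: current measured on the supply side, in amperes.
    load_currents_a: list of currents measured on the load-side feeders, in amperes.
    pickup_a: hypothetical differential threshold; a real relay setting depends
    on the installation.
    """
    differential = abs(supply_current_a - sum(load_currents_a))
    # In a healthy switchboard the currents balance; an arc inside the
    # switchboard diverts current, so the balance is lost.
    return differential > pickup_a
```

For example, a 100 A supply feeding loads of 40 A and 60 A balances and does not trip, while loads of 40 A and 50 A leave 10 A unaccounted for and would open the upstream breaker.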

Jha, Kamal N. (Bethel Park, PA)

1999-01-01

390

Realizing fault-tolerant interconnection networks via chaining  

Energy Technology Data Exchange (ETDEWEB)

A scheme applicable to a wide class of multistage interconnection networks to enhance their fault-tolerant capability is proposed. Multiple paths between each input-output pair of a network are created by connecting together switching elements within the same stage. This scheme provides a network with alternative paths at every stage, requires a simple self-routing algorithm, and allows a network to become more robust as its size increases. An analysis is performed to obtain a quantitative measurement of the reliability improvement of the scheme.

Tzeng, N.F.; Yew, P.C.; Zhu, C.Q.

1988-04-01

391

Fault tolerant authenticated quantum direct communication immune to collective noises  

Science.gov (United States)

This study proposes two new coding functions for GHZ states and GHZ-like states, respectively. Based on these coding functions, two fault tolerant authenticated quantum direct communication (AQDC) protocols are proposed, each of which is robust under one kind of collective noise: collective-dephasing noise and collective-rotation noise, respectively. Moreover, the proposed AQDC protocols enable a sender to send a secure as well as authenticated message to a receiver within a single step of quantum transmission, without using classical channels.

Yang, Chun-Wei; Hwang, Tzonelih

2013-11-01

392

Fully distributed and fault tolerant task management based on diffusions  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The task management is a critical component for the computational grids. The aim is to assign tasks on nodes according to a global scheduling policy and a view of local resources of nodes. A peer-to-peer approach for the task management involves a better scalability for the grid and a higher fault tolerance. But some mechanisms have to be proposed to avoid the computation of replicated tasks that can reduce the efficiency and increase the load of nodes. In the same way, thes...

Bui, Alain; Flauzac, Olivier; Rabat, Cyril

2008-01-01

393

Catalysis and activation of magic states in fault tolerant architectures  

CERN Multimedia

In many architectures for fault tolerant quantum computing universality is achieved by a combination of Clifford group unitaries and preparation of suitable non-stabilizer states, the so-called magic states. Universality is possible even for some fairly noisy non-stabilizer states, as distillation can convert many copies into a purer magic state. Here we propose novel protocols that exploit multiple species of magic states in surprising ways. These protocols provide examples of previously unobserved phenomena that are analogous to catalysis and activation well known in entanglement theory.

Campbell, Earl T

2010-01-01

394

Data center networks topologies, architectures and fault-tolerance characteristics  

CERN Document Server

This SpringerBrief presents a survey of data center network designs and topologies and compares several properties in order to highlight their advantages and disadvantages. The brief also explores several routing protocols designed for these topologies and compares the basic algorithms to establish connections, the techniques used to gain better performance, and the mechanisms for fault-tolerance. Readers will be equipped to understand how current research on data center networks enables the design of future architectures that can improve performance and dependability of data centers. This con

Liu, Yang; Veeraraghavan, Malathi; Lin, Dong; Hamdi, Mounir

2013-01-01

395

Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations, causing fault tolerance to become important even for general purpose processors targeted at the commodity market. To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures exec...

Subramanyan, Pramod; Singh, Virendra; Saluja, Kewal K.; Larsson, Erik

2010-01-01

396

Scalable Fault-Tolerant Location Management Scheme for Mobile IP  

Directory of Open Access Journals (Sweden)

Full Text Available As the number of mobile nodes registering with a network rapidly increases in Mobile IP, multiple mobility (home or foreign) agents can be allocated to a network in order to improve performance and availability. Previous fault tolerant schemes (denoted PRT schemes) for masking failures of the mobility agents use passive replication techniques. However, they result in high failure-free latency during the registration process as the number of mobility agents in the same network increases, and they force each mobility agent to manage bindings of all the mobile nodes registering with its network. In this paper, we present a new fault-tolerant scheme (denoted the CML scheme) using checkpointing and message logging techniques. The CML scheme achieves low failure-free latency even if the number of mobility agents in a network increases, and improves scalability to a large number of mobile nodes registering with each network compared with the PRT schemes. Additionally, the CML scheme allows each failed mobility agent to recover the bindings of the mobile nodes registered with it when it is repaired, even if all the other mobility agents in the same network fail concurrently.
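The general idea of checkpointing plus message logging named in the abstract can be sketched as follows. This is a generic illustration and not the CML scheme itself, whose details the abstract does not give: the class and method names are hypothetical, and the `checkpoint` and `log` attributes stand in for stable storage that would survive a crash in a real system.

```python
class MobilityAgent:
    """Toy mobility agent that can rebuild its binding table after a crash."""

    def __init__(self):
        self.bindings = {}    # volatile state: mobile node -> care-of address
        self.checkpoint = {}  # assumed stable storage: last saved binding table
        self.log = []         # assumed stable storage: registrations since checkpoint

    def register(self, node, care_of_address):
        # Log the message first so it can be replayed if the agent fails.
        self.log.append((node, care_of_address))
        self.bindings[node] = care_of_address

    def take_checkpoint(self):
        # Save the full table and discard log entries it already covers.
        self.checkpoint = dict(self.bindings)
        self.log.clear()

    def crash(self):
        # Only volatile state is lost.
        self.bindings = {}

    def recover(self):
        # Restore the checkpoint, then replay logged registrations in order.
        self.bindings = dict(self.checkpoint)
        for node, care_of_address in self.log:
            self.bindings[node] = care_of_address


agent = MobilityAgent()
agent.register("mn1", "10.0.0.5")
agent.take_checkpoint()
agent.register("mn2", "10.0.0.9")
agent.register("mn1", "10.0.1.7")
agent.crash()
agent.recover()
```

After recovery the agent holds the latest binding for both nodes without consulting any other agent, which is the property the abstract highlights for concurrent failures.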

JinHo Ahn

2001-11-01

397

Fully distributed and fault tolerant task management based on diffusions  

CERN Document Server

The task management is a critical component for the computational grids. The aim is to assign tasks on nodes according to a global scheduling policy and a view of local resources of nodes. A peer-to-peer approach for the task management involves a better scalability for the grid and a higher fault tolerance. But some mechanisms have to be proposed to avoid the computation of replicated tasks that can reduce the efficiency and increase the load of nodes. In the same way, these mechanisms have to limit the number of exchanged messages to avoid the overload of the network. In a previous paper, we have proposed two methods for the task management called active and passive. These methods are based on a random walk: they are fully distributed and fault tolerant. Each node owns a local tasks states set updated thanks to a random walk and each node is in charge of the local assignment. Here, we propose three methods to improve the efficiency of the active method. These new methods are based on a circulating word. The...

Bui, Alain; Rabat, Cyril

2008-01-01

398

2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.  

Energy Technology Data Exchange (ETDEWEB)

This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that causes an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. 
There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.
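The checkpoint-to-disk and restart pattern that the report identifies as the most common tool can be sketched in a few lines. This is a toy illustration under stated assumptions: the computation (a running sum), the JSON file format, and the `fail_at` parameter used to simulate a fault are all invented for the example and are not taken from the workshop report.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_every, path, fail_at=None):
    """Sum 0..total_steps-1, saving state periodically and resuming if possible."""
    # Restart: resume from the most recent checkpoint when one exists.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "acc": 0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")  # in-memory state is lost
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            # Write to a temporary file and rename it, so a crash during the
            # write cannot leave a half-written checkpoint behind.
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, path)
    return state["acc"]


ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
try:
    run_with_checkpoints(10, 5, ckpt_path, fail_at=7)
    crashed = False
except RuntimeError:
    crashed = True
result = run_with_checkpoints(10, 5, ckpt_path)  # resumes from step 5
```

The work done between the last checkpoint and the fault (steps 5 and 6 here) is repeated after restart, which is the time cost of this form of redundancy; the I/O concern raised in the report corresponds to the cost of each checkpoint write.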

Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

2009-02-01

399

Pseudothreshold or threshold? - More realistic threshold estimates for fault-tolerant quantum computing  

CERN Document Server

An arbitrarily reliable quantum computer can be efficiently constructed from noisy components using a recursive simulation procedure, provided that those components fail with probability less than the fault-tolerance threshold. Recent estimates of the threshold are n