WorldWideScience
1

Fault tolerant computing systems  

International Nuclear Information System (INIS)

Fault tolerance involves the provision of strategies for error detection, damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (orig.)
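The strategies named above can be made concrete with a recovery block, a classic software fault-tolerance structure that is not itself described in this abstract. The following minimal Python sketch assumes a flat, dict-shaped program state; the names `recovery_block`, `primary` and `secondary` are hypothetical. The acceptance test performs error detection, treating the whole state as suspect is a crude form of damage assessment, and restoring the checkpoint before trying the next alternate is backward error recovery.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails its acceptance test."""


def recovery_block(state: dict, alternates: Sequence[Callable[[dict], T]],
                   acceptance_test: Callable[[T], bool]) -> T:
    """Run alternates in order until one produces an acceptable result.

    The state is checkpointed once; after any failed attempt it is rolled back
    so the next alternate starts from the same clean recovery point.
    """
    checkpoint = dict(state)                 # recovery point (shallow copy)
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):      # error detection
                return result
        except Exception:
            pass                             # a raised exception also counts as a detected error
        state.clear()                        # backward error recovery: discard all changes
        state.update(checkpoint)
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


def primary(s: dict) -> int:
    s["scratch"] = "corrupted"               # damages the state and returns a wrong answer
    return -1


def secondary(s: dict) -> int:
    return sum(s["values"])


state = {"values": [1, 2, 3]}
print(recovery_block(state, [primary, secondary], lambda r: r >= 0))  # 6
print(state)                                 # {'values': [1, 2, 3]}
```

Because the checkpoint is a shallow copy, this sketch only undoes changes to top-level keys; a real implementation would need a deeper or copy-on-write recovery point.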

2

Fault-Tolerant UAV Flight Control System  

OpenAIRE

The main focus of this master's thesis is fault-tolerant control systems (FTCSs) for unmanned aerial vehicles (UAVs). The goals are to develop an automatic flight control system (AFCS) with fault detection and isolation (FDI) and a reconfiguration mechanism for accommodation of faults. The literature study reviews methods for fault-tolerant control and also discusses important faults and failures related to UAVs. The FTCS is implemented in MATLAB Simulink with a nonlinear model of the Ces...

Dybsjord, Kerrin Andre

2013-01-01

3

Reconfigurable fault tolerant avionics system  

Science.gov (United States)

This paper presents the design of a reconfigurable avionics system based on a modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA), intended for future generations of nanosatellites. A major concern in satellite systems, and especially in nanosatellites, is building robust systems with low power consumption. The system is designed to be flexible, reconfiguring itself according to its orbital position. Because Single Event Upsets (SEUs) do not have the same severity and intensity at all orbital locations, peaking at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not need to be fully protected throughout its orbit. Over most of the orbit, an acceptable level of protection against high-energy cosmic rays and charged particles is provided through software fault tolerance: checkpointing and rollback, together with control-flow assertions. In the smaller part of the orbit where severe SEUs are expected, the system FPGA is reconfigured so that the processor systems are triplicated and protected through Triple Modular Redundancy (TMR) with feedback. Reconfiguring the system according to the expected level of SEU-induced threat helps reduce the average dynamic power consumption to one-third of its maximum. This technique can be viewed as smart protection through system reconfiguration. The system is built on the commercial version of the Xilinx Virtex-5 FPGA (XC5VLX50) on bulk silicon with 324 I/Os. Orbital SEU rates were simulated using the SPENVIS web-based software package.
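TMR with feedback can be pictured as a majority voter whose result is written back into all three replicas, so that an upset replica is resynchronized instead of drifting. The sketch below is a software analogy only, assuming each processor is reduced to an integer state; it is not the FPGA design from the paper, and `tmr_vote` and `tmr_step` are hypothetical names.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple


def tmr_vote(outputs: List[Hashable]) -> Tuple[Hashable, Optional[int]]:
    """Majority-vote three replica outputs.

    Returns the voted value and the index of the disagreeing replica
    (None if all three agree). Raises if no two replicas agree.
    """
    if len(outputs) != 3:
        raise ValueError("TMR needs exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three replicas disagree")
    faulty = next((i for i, o in enumerate(outputs) if o != value), None)
    return value, faulty


def tmr_step(replica_states: List[int], increment: int) -> Tuple[List[int], Optional[int]]:
    """One cycle of a toy triplicated counter with voter feedback."""
    next_states = [s + increment for s in replica_states]
    voted, faulty = tmr_vote(next_states)
    # Feedback: write the voted value back into every replica, so an upset
    # replica is repaired before errors can accumulate in a second replica.
    return [voted] * 3, faulty


states = [10, 10, 10]
states[1] ^= 1 << 3                  # inject an SEU: flip bit 3 of replica 1 (10 -> 2)
states, faulty = tmr_step(states, 5)
print(states, faulty)                # [15, 15, 15] 1
```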

Ibrahim, M. M.; Asami, K.; Cho, Mengu

4

Software fault tolerance in computer operating systems  

Science.gov (United States)

This chapter provides data and analysis of the dependability and fault tolerance of three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, originally intended for tolerating hardware faults, allows the system to tolerate about 70% of the defects in system software that result in processor failures. A major reason for this measured software fault tolerance is the loose coupling between processors, which causes the backup execution (the processor state and the sequence of events occurring) to differ from the original execution. The fault tolerance of the IBM/MVS system almost doubles when recovery routines are provided, compared with the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.
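A process pair can be sketched as a primary that checkpoints its state to a backup before acknowledging each request, with the backup taking over from the last checkpoint when the primary fails. The toy Python model below (with hypothetical class and method names) shows only the takeover mechanics; it does not model the execution differences between primary and backup that the measurements identify as the source of software fault tolerance.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Process:
    name: str
    state: int = 0
    alive: bool = True


class ProcessPair:
    """Toy primary/backup process pair with checkpointing (hypothetical API)."""

    def __init__(self) -> None:
        self.primary = Process("primary")
        self.backup = Process("backup")
        self.events: List[str] = []

    def request(self, amount: int) -> int:
        if not self.primary.alive:
            self._fail_over()
        self.primary.state += amount
        # Checkpoint the new state to the backup before acknowledging.
        self.backup.state = self.primary.state
        return self.primary.state

    def _fail_over(self) -> None:
        self.events.append(f"{self.backup.name} takes over")
        # The backup resumes from the last checkpoint; a fresh backup is
        # created from that same checkpoint so the pair is restored.
        self.primary = self.backup
        self.backup = Process("new-backup", state=self.primary.state)


pair = ProcessPair()
pair.request(5)
pair.request(7)
pair.primary.alive = False   # simulate a processor failure
print(pair.request(1))       # 13
print(pair.events)           # ['backup takes over']
```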

Iyer, Ravishankar K.; Lee, Inhwan

1994-01-01

5

Energy-efficient fault-tolerant systems  

CERN Document Server

This book describes the state of the art in energy-efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing, and discusses both circuit-level and system-level approaches. The up-to-date techniques presented enable readers to meet the conflicting design objectives of energy efficiency and fault tolerance for reliability.

Mathew, Jimson; Pradhan, Dhiraj K

2013-01-01

6

Approaches differ for fault-tolerant systems  

Energy Technology Data Exchange (ETDEWEB)

Efforts to provide fault-tolerant computer systems focus on two primary architectures: redundant hardware executing different tasks, and parallel processors operating on the same set of data and instructions. Parallel processing is the approach favored by August Systems (Tigard, Oregon), Hewlett-Packard (Palo Alto, California), Parallel Computers (Santa Cruz, California), Stratus Computers (Natick, Massachusetts) and Tandem Computers (Cupertino, California). Multiple redundant system elements can be found in implementations from Auragen Systems (Fort Lee, New Jersey) and Tolerant Systems (Milpitas, California). The critical differences between the two approaches are the ability to recover from errors in real time and the degree of fault tolerance implemented in hardware and software.

Aseo, J.

1983-09-01

7

Software engineering of fault tolerant systems  

CERN Document Server

In architecting dependable systems, fault tolerance is required to improve overall system robustness. Many methods have been proposed to this end, but the solutions are usually considered late, during the design and implementation phases of the software life cycle (e.g., Java and Windows NT exception handling), which reduces the effectiveness of error and fault handling. Since the system design typically models only the normal behaviour of the system while ignoring exceptional behaviour, the implementation is unable to handle abnormal events. Consequently, the system may fail in unexp...

Pelliccione, P; Muccini, Henry

2007-01-01

8

Fault tolerant aggregation for power system services  

DEFF Research Database (Denmark)

Exploiting the flexibility in distributed energy resources (DER) is seen as an important contribution to allow high penetrations of renewable generation in electrical power systems. However, the present control infrastructure in power systems is not well suited for the integration of a very large number of small units. A common approach is to aggregate a portfolio of such units together and expose them to the power system as a single large virtual unit. In order to realize the vision of a Smart Grid, concepts for flexible, resilient and reliable aggregation infrastructures are required. This paper presents such a concept while focusing on the aspect of resilience and fault tolerance. The proposed concept makes use of a multi-level election algorithm to transparently manage the addition, removal, failure and reorganization of units. It has been implemented and tested as a proof-of-concept on the distributed smart grid test bed SYSLAB at the Technical University of Denmark.
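The paper's multi-level election algorithm is not spelled out in this abstract. As a rough illustration of the idea only, the following sketch substitutes a simple two-level, highest-ID (bully-style) rule: each group of units elects a local aggregator, and the local aggregators elect a top-level one. Group names, unit IDs and function names are hypothetical.

```python
from typing import Dict, List, Set


def elect(candidates: Set[int]) -> int:
    """Pick a leader deterministically: the highest live unit ID (bully-style rule)."""
    if not candidates:
        raise ValueError("no live units to elect from")
    return max(candidates)


def build_hierarchy(groups: Dict[str, List[int]], alive: Set[int]) -> Dict[str, int]:
    """Two-level election: each group elects a local aggregator, then the
    local aggregators elect a top-level aggregator.

    Units missing from `alive` are skipped, so adding, removing or losing a
    unit is handled by simply re-running the election.
    """
    leaders: Dict[str, int] = {}
    for name, members in groups.items():
        live = set(members) & alive
        if live:                       # a group with no live units drops out
            leaders[name] = elect(live)
    leaders["top"] = elect(set(leaders.values()))
    return leaders


groups = {"feeder-A": [1, 4, 7], "feeder-B": [2, 5, 9]}
alive = {1, 2, 4, 5, 7, 9}
print(build_hierarchy(groups, alive))  # {'feeder-A': 7, 'feeder-B': 9, 'top': 9}
alive.discard(9)                       # unit 9 fails
print(build_hierarchy(groups, alive))  # {'feeder-A': 7, 'feeder-B': 5, 'top': 7}
```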

Kosek, Anna Magdalena; Gehrke, Oliver

2013-01-01

9

Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues  

Directory of Open Access Journals (Sweden)

This paper presents an analysis of four online stream processing systems (MillWheel, S4, Spark Streaming and Storm) regarding the strategies they use for fault tolerance. Such systems process data streams that can come from different sources, such as websites, sensors, mobile phones or any set of devices that provide real-time, high-speed data. Typically, these systems are more concerned with data-processing throughput than with fault tolerance. However, depending on the type of application, fault tolerance should be considered an important feature. The work describes some of the main fault-tolerance strategies (component replication, upstream backup, and checkpoint and recovery) and shows how each of the four systems uses them. Finally, the paper discusses the advantages and disadvantages of the combinations of fault-tolerance strategies used in these systems.
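Two of the strategies listed above, checkpoint and recovery plus upstream backup, combine naturally: an operator periodically checkpoints its state, the upstream source keeps only the records the latest checkpoint does not yet cover, and recovery restores the checkpoint and replays those records. The minimal Python sketch below assumes a single stateful operator computing a running sum; it does not reproduce the mechanisms of MillWheel, S4, Spark Streaming or Storm, and all class names are hypothetical.

```python
from typing import Dict, List, Tuple


class Upstream:
    """Source that backs up every record not yet covered by a downstream checkpoint."""

    def __init__(self) -> None:
        self.buffer: Dict[int, int] = {}
        self.next_offset = 0

    def emit(self, value: int) -> Tuple[int, int]:
        offset = self.next_offset
        self.buffer[offset] = value
        self.next_offset += 1
        return offset, value

    def trim(self, upto: int) -> None:
        """Discard records below `upto`; the downstream checkpoint already covers them."""
        for offset in [o for o in self.buffer if o < upto]:
            del self.buffer[offset]

    def replay_from(self, offset: int) -> List[Tuple[int, int]]:
        return sorted((o, v) for o, v in self.buffer.items() if o >= offset)


class SumOperator:
    """Stateful operator (a running sum) that checkpoints every `checkpoint_every` records."""

    def __init__(self, checkpoint_every: int) -> None:
        self.total = 0
        self.next_offset = 0
        self.checkpoint_every = checkpoint_every
        self.checkpoint = (0, 0)  # (next_offset, total), assumed to live in stable storage

    def process(self, offset: int, value: int, upstream: Upstream) -> None:
        self.total += value
        self.next_offset = offset + 1
        if self.next_offset % self.checkpoint_every == 0:
            self.checkpoint = (self.next_offset, self.total)
            upstream.trim(self.next_offset)

    def recover(self, upstream: Upstream) -> None:
        """Restore the last checkpoint, then replay the records upstream still holds."""
        self.next_offset, self.total = self.checkpoint
        for offset, value in upstream.replay_from(self.next_offset):
            self.process(offset, value, upstream)


up = Upstream()
op = SumOperator(checkpoint_every=3)
for v in range(1, 8):
    op.process(*up.emit(v), upstream=up)
op.total = -999                 # simulate the operator losing its in-memory state
op.recover(up)
print(op.total)                 # 28, i.e. 1 + 2 + ... + 7
```

The trade-off the paper discusses shows up even here: checkpointing more often shrinks the upstream buffer and the replay work, at the cost of more frequent writes to stable storage.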

André Leon Sampaio Gradvohl

2014-05-01

10

On Methods for the Formal Specification of Fault Tolerant Systems  

OpenAIRE

This paper introduces different views for understanding problems and faults with the goal of defining a method for the formal specification of systems. The idea of Layered Fault Tolerant Specification (LFTS) is proposed to make the method extensible to fault tolerant systems. The principle is layering the specification in different levels, the first one for the normal behavior and the others for the abnormal. The abnormal behavior is described in terms of an Error Injector (...

Mazzara, Manuel

2012-01-01

11

From fault classification to fault tolerance for multi-agent systems  

CERN Document Server

Faults are a concern for Multi-Agent System (MAS) designers, especially if the MAS are built for industrial or military use, because there must be some guarantee of dependability. Fault classifications exist for classical systems and are used to define faults. When dependability is at stake, such a fault classification may be used from the very beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

Potiron, Katia; Taillibert, Patrick

2013-01-01

12

Fault-Tolerant Onboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

The purpose of this research project is to improve current onboard decision support systems, with special focus on the onboard prediction of the instantaneous sea state. In this project, a new approach to increasing the overall reliability of a monitoring and decision support system has been established. The basic idea is to convert the given system into a fault-tolerant system and to improve multi-sensor data fusion for that particular system. The background of the project is the SeaSense system, which has been installed on several container ships and navy vessels. The SeaSense system provides a crude and simple estimate of the actual sea state (Hs and Tz), information about longitudinal hull girder loading, the seakeeping performance of the ship, and decision support on how to operate the ship within acceptable limits. The system is able to identify critical forthcoming events and to advise on speed and course changes that decrease wave-induced loads. The SeaSense system is based on the combined use of a mathematical model and measurements from a set of sensors. The overall dependability of a shipboard monitoring and decision support system such as SeaSense can be improved using fault-tolerant techniques (fault diagnosis and system redesign) and a Sensor Fusion Quality (SFQ) test. Fault diagnosis means detecting the presence of faults in the system. When sea state estimation is performed using the ship-wave buoy analogy, the best solution is achieved with a set of three different ship responses. Faulty signals should be discarded from the sea state estimation procedure where possible; otherwise, the fault should be estimated. Fault diagnosis can be divided into three steps: fault detection, fault isolation and fault estimation. Fault detection decides whether or not a fault has occurred; this step determines the time at which the system is subjected to the fault.
Fault isolation finds the component in which the fault has occurred; this step determines the location of the fault. Fault estimation provides an estimate of the magnitude of the fault. A supervisory function determines the severity of the fault once its origin has been isolated and its magnitude estimated. Fault-tolerant sensor fusion means that the monitoring and decision support system can accommodate faults so that the overall system continues to satisfy its goal, while in the absence of faults it provides the most accurate information possible, using the SFQ test.
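The three diagnosis steps can be illustrated with a simplified residual-based scheme. The sketch below assumes three redundant sensors measuring the same quantity and at most one additive fault at a time; this is a hypothetical simplification, not the SeaSense method, which uses three different ship responses. Pairwise differences serve as residuals, the pattern of threshold violations isolates the faulty sensor, and comparison with the healthy pair estimates the fault magnitude.

```python
from typing import List, Optional, Tuple


def diagnose(readings: List[float], threshold: float) -> Tuple[bool, Optional[int], float]:
    """Detect, isolate and estimate a single additive sensor fault.

    Assumes three redundant sensors measuring the same quantity and at most
    one faulty sensor at a time (hypothetical simplification).
    Returns (fault_detected, faulty_index, estimated_fault_magnitude).
    """
    a, b, c = readings
    # Residuals: pairwise differences are close to zero when all sensors are healthy.
    residuals = (a - b, b - c, c - a)
    exceeded = tuple(abs(r) > threshold for r in residuals)
    # 1. Detection: has any residual crossed the threshold?
    if not any(exceeded):
        return False, None, 0.0
    # 2. Isolation: the faulty sensor appears in both violated residuals.
    signatures = {0: (True, False, True), 1: (True, True, False), 2: (False, True, True)}
    faulty = next((i for i, sig in signatures.items() if exceeded == sig), None)
    if faulty is None:
        return True, None, 0.0          # detected but not isolable (e.g. multiple faults)
    # 3. Estimation: compare the faulty sensor with the mean of the two healthy ones.
    healthy = [x for i, x in enumerate(readings) if i != faulty]
    return True, faulty, readings[faulty] - sum(healthy) / 2


print(diagnose([2.0, 2.1, 3.5], threshold=0.5))  # fault on sensor 2, magnitude about 1.45
```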

Lajic, Zoran

2010-01-01

13

Fault tolerant oxygen control of a diesel engine air system  

OpenAIRE

This paper addresses the fault-tolerant control problem of a Diesel engine air system with a jammed Exhaust Gas Recirculation (EGR) valve. The fault-tolerant control is based on replanning the trajectory in order to track a new controlled variable, the oxygen concentration in the intake manifold, instead of the fresh air mass flow. The trajectory planning is based on an inverse model approach that uses the fundamental thermodynamic relations of the air system.

Nitsche, Rainer; Bitzer, Matthias; El Khaldi, Mahmoud; Bloch, Gérard

2010-01-01

14

Fault tolerant tracking control for continuous Takagi-Sugeno systems with time varying faults  

OpenAIRE

This paper deals with Fault Tolerant Control design for continuous nonlinear Takagi-Sugeno faulty systems. The goal is to ensure both state and fault estimation and state reference tracking even when faults occur. In this study, the faults affecting the system behavior are considered as time-varying functions modeled by exponential functions or first-order polynomials. Based on the descriptor redundancy property, solutions are proposed for both cases, exponential and polynomial faults, in ter...

Bouarar, Tahar; Marx, Benoît; Maquin, Didier; Ragot, José

2011-01-01

15

H infinity Integrated Fault Estimation and Fault Tolerant Control of Discrete-time Piecewise Linear Systems  

DEFF Research Database (Denmark)

In this paper we consider the problem of fault estimation and accommodation for discrete-time piecewise linear systems. A robust fault estimator is designed such that the estimation error converges to zero and the H∞ performance of the fault estimation is minimized. The fault estimate is then used to compensate for the effect of the fault: a fault-tolerant controller using piecewise linear static output feedback is designed such that it stabilizes the system and provides an upper bound on the H∞ performance of the faulty system. Sufficient conditions for the existence of the robust fault estimator and fault-tolerant controller are derived in terms of linear matrix inequalities. The upper bounds on the H∞ performance can be minimized by solving convex optimization problems with linear matrix inequality constraints. The efficiency of the method is demonstrated by means of a numerical example.

Tabatabaeipour, Seyed Mojtaba; Bak, Thomas

2012-01-01

16

Measurement and analysis of operating system fault tolerance  

Science.gov (United States)

This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

Lee, I.; Tang, D.; Iyer, R. K.

1992-01-01

17

Fault tolerant hypercube computer system architecture  

Science.gov (United States)

A fault-tolerant multiprocessor computer system of the hypercube type is disclosed, comprising a hierarchy of computers of like kind that can be functionally substituted for one another as necessary. Communication between the working nodes is via one communications network, while communication between the working nodes and the watchdog and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises a plurality of first computing nodes; a first network of message conducting paths interconnecting the first computing nodes as a hypercube, which provides a path for message transfer between the first computing nodes; a first watchdog node; and a second network of message connecting paths connecting the first computing nodes to the first watchdog node independently of the first network, which provides an independent path for test messages and reconfiguration-affecting transfers between the first computing nodes and the first watchdog node. There is additionally a plurality of second computing nodes; a third network of message conducting paths interconnecting the second computing nodes as a hypercube, which provides a path for message transfer between the second computing nodes; and a fourth network of message conducting paths connecting the second computing nodes to the first watchdog node independently of the third network.
The fourth network provides an independent path for test messages and reconfiguration-affecting transfers between the second computing nodes and the first watchdog node. A first multiplexer, disposed between the first watchdog node and the second and fourth networks, allows the first watchdog node to selectively communicate with individual computing nodes through the second and fourth networks; a second watchdog node, operably connected to the first multiplexer, can likewise selectively communicate with individual computing nodes through the second and fourth networks. The branch is completed by a first load balancing node and a second multiplexer connected between the first load balancing node and the first and second watchdog nodes, allowing the first load balancing node to selectively communicate with the first and second watchdog nodes.
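Two ideas in this architecture lend themselves to a short sketch: in a d-dimensional hypercube, each node's neighbors are the nodes whose addresses differ from it in exactly one bit, and a watchdog node polls computing nodes over a separate test network and substitutes a node of like kind when one fails. The Python below is a hypothetical simplification; the spare-pool replacement policy and the function names are assumptions, not taken from the patent.

```python
from typing import Callable, Dict, List


def hypercube_neighbors(node: int, dim: int) -> List[int]:
    """In a dim-dimensional hypercube, neighbors differ from `node` in exactly one bit."""
    return [node ^ (1 << k) for k in range(dim)]


def watchdog_sweep(nodes: List[int], responds: Callable[[int], bool],
                   spares: List[int]) -> Dict[int, int]:
    """Poll every working node over the (separate) test network.

    Returns a remapping {logical_node: physical_node}: a node that fails its
    test message is replaced by the next spare of like kind, if one is left.
    Note that `spares` is consumed (mutated) as substitutions are made.
    """
    mapping = {n: n for n in nodes}
    for logical in nodes:
        if not responds(mapping[logical]) and spares:
            mapping[logical] = spares.pop(0)   # functional substitution
    return mapping


print(hypercube_neighbors(5, 3))       # [4, 7, 1]
spares = [8]
print(watchdog_sweep(list(range(8)), lambda n: n != 5, spares))
# {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 8, 6: 6, 7: 7}
```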

Madan, Herb S. (inventor); Chow, Edward (inventor)

1989-01-01

18

Passive Fault-tolerant Control of Discrete-time Piecewise Affine Systems against Actuator Faults  

DEFF Research Database (Denmark)

In this paper, we propose a new method for passive fault-tolerant control of discrete-time piecewise affine systems. Actuator faults are considered. A reliable piecewise linear quadratic regulator (LQR) state feedback is designed such that it can tolerate actuator faults. A sufficient condition for the existence of a passive fault-tolerant controller is derived and formulated as the feasibility of a set of linear matrix inequalities (LMIs). The upper bound on the performance cost can be minimized using a convex optimization problem with LMI constraints, which can be solved efficiently. The approach is illustrated on a numerical example and a two-degree-of-freedom helicopter.

Tabatabaeipour, Seyed Mojtaba; Izadi-Zamanabadi, Roozbeh

2012-01-01

19

Synthesizing Fault Tolerant Safety Critical Systems  

OpenAIRE

To keep pace with today's nanotechnology, safety-critical embedded systems are becoming less tolerant to errors. Research into techniques to cope with errors in these systems has mostly focused on transformational approaches, replication of hardware devices, parallel program design, component-based design and/or information redundancy. It would be better to tackle the issue early in the design process, so that a safety-critical system never fails to satisfy its strict dependability requiremen...

Seemanta Saha; Muhammad Sheikh Sadi

2014-01-01

20

Data-driven design of fault diagnosis and fault-tolerant control systems  

CERN Document Server

Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

Ding, Steven X

2014-01-01

21

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

OpenAIRE

The electric vehicle drive system is a multivariable system operating in a complex and changeable environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to where faults occur, and the possible consequences and causes of failure are analyzed. Considering hardware limitations and the need to guarantee system performance as far as possible, a passive software-redundancy fault-tolerant strategy is put forward, give an e...

Zhang Liwei; Huang Xianjin; Yang Yannan; Xu Chen; Liu Jie

2013-01-01

22

Design a Fault Tolerance for Real Time Distributed System  

OpenAIRE

This paper designs a fault tolerance system for soft real-time distributed systems (FTRTDS). The system is designed to be independent of the specific mechanisms and facilities of the underlying real-time distributed system. It is designed to be distributed across all the computers in the distributed system and controlled by a central unit. Besides gathering information about a target program spontaneously, it provides information about the target operating system and the target hardware in order to diagno...

Khammas, Ban M.

2012-01-01

23

Trends in reliability modeling technology for fault tolerant systems  

Science.gov (United States)

Developments in reliability modeling for large fault tolerant avionic computing systems are presented. Issues of state size and complexity, fault coverage, and practical computation are addressed. A two-fold developmental effort is described based on the structural and fault coverage modeling approaches. A technique which was successfully applied to an 865 state pure death stationary Markov model is presented. Of particular interest is a short computer program which executes very quickly to produce reliability results of a large state space model. This model also incorporates fault coverage states for processor, memory, and bus line replaceable units. A second structural reliability modeling scheme is aimed at solving nonstationary Markov models. This technique provides the tool required for studying the reliability of systems with nonconstant failure rates and includes intermittent/transient faults, electronic hardware which exhibits decreasing failure rates, and hydromechanical devices which typically have wearout failure mechanisms. Several aspects of fault coverage, including modeling and data measurement of intermittent/transient faults and latent faults, are elucidated and illustrated. The CARE II (computer-aided reliability estimation) coverage is presented and shortcomings to be eliminated are discussed.
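To make the idea of fault coverage in a pure-death Markov model concrete, the following sketch builds a four-state model of a triplex system in which each covered failure degrades the system by one unit and each uncovered failure (probability 1 - c) causes immediate system failure, then computes transient state probabilities by uniformization. The model, parameter values and function names are hypothetical; it is far smaller than the 865-state model described above and does not reproduce CARE II.

```python
import math
from typing import List

Matrix = List[List[float]]


def generator(lam: float, c: float) -> Matrix:
    """CTMC generator for a triplex with per-unit failure rate lam and coverage c.

    States: 0 = three units up, 1 = two up, 2 = one up, 3 = system failed.
    A covered failure degrades the system by one unit; an uncovered one
    takes it straight to the failed state (pure-death model).
    """
    return [
        [-3 * lam, 3 * lam * c, 0.0, 3 * lam * (1 - c)],
        [0.0, -2 * lam, 2 * lam * c, 2 * lam * (1 - c)],
        [0.0, 0.0, -lam, lam],
        [0.0, 0.0, 0.0, 0.0],
    ]


def transient(Q: Matrix, p0: List[float], t: float,
              tol: float = 1e-12, max_terms: int = 10_000) -> List[float]:
    """State probabilities p(t) = p0 * exp(Q t), computed by uniformization.

    Suitable for moderate rate * t; exp(-rate * t) underflows for very large values.
    """
    n = len(Q)
    rate = max(-Q[i][i] for i in range(n))    # uniformization rate
    if rate == 0.0 or t == 0.0:
        return list(p0)
    # P = I + Q / rate is a stochastic matrix.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)] for i in range(n)]
    weight = math.exp(-rate * t)              # Poisson probability of zero jumps
    vec = list(p0)
    result = [weight * x for x in p0]
    acc, k = weight, 0
    while 1.0 - acc > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]  # p0 * P^k
        weight *= rate * t / k                # Poisson probability of exactly k jumps
        acc += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


p = transient(generator(lam=1e-4, c=0.99), [1.0, 0.0, 0.0, 0.0], t=1000.0)
print(f"reliability at 1000 h: {1 - p[3]:.6f}")
```

With c = 1 the result reduces to the familiar 1 - (1 - e^(-λt))^3 for three independent units, which is a convenient sanity check; lowering c shows how strongly imperfect coverage dominates the unreliability of redundant systems.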

Bavuso, S. J.

1979-01-01

24

Fault-Tolerant Control of a Distributed Database System  

OpenAIRE

Optimal state information-based control policy for a distributed database system subject to server failures is considered. Fault-tolerance is made possible by the partitioned architecture of the system and data redundancy therein. Control actions include restoration of lost data sets in a single server using redundant data sets in the remaining servers, routing of queries to intact servers, or overhaul of the entire system for renewal. Control policies are determined by solving Markov decisio...
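The policy computation can be illustrated with value iteration on a toy Markov decision process. In the sketch below, the state is the number of failed servers, and the actions loosely mirror those listed above (continue serving, restore one lost data set, or overhaul the whole system). All probabilities, costs and the discount factor are hypothetical, not values from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy parameters (not from the paper).
N = 3              # number of servers
P_FAIL = 0.2       # probability that one more server fails in a step
COST = {"wait": 0.0, "restore": 2.0, "overhaul": 6.0}
GAMMA = 0.95       # discount factor


def next_states(s: int, action: str) -> List[Tuple[float, int]]:
    """(probability, next state) pairs; the state is the number of failed servers."""
    if action == "restore":
        s = max(s - 1, 0)              # rebuild one server's data from redundant copies
    elif action == "overhaul":
        s = 0                          # renew the whole system
    if s == N:
        return [(1.0, N)]
    return [(1 - P_FAIL, s), (P_FAIL, s + 1)]


def stage_cost(s: int, action: str) -> float:
    # Each failed server costs 1 per step in lost or rerouted queries;
    # a fully failed system is penalized heavily.
    return COST[action] + (10.0 if s == N else float(s))


def value_iteration(tol: float = 1e-9) -> Tuple[List[float], Dict[int, str]]:
    """Minimize expected discounted cost; returns state values and a greedy policy."""
    V = [0.0] * (N + 1)
    while True:
        Q = {(s, a): stage_cost(s, a) + GAMMA * sum(p * V[t] for p, t in next_states(s, a))
             for s in range(N + 1) for a in COST}
        new_V = [min(Q[s, a] for a in COST) for s in range(N + 1)]
        if max(abs(x - y) for x, y in zip(new_V, V)) < tol:
            policy = {s: min(COST, key=lambda a: Q[s, a]) for s in range(N + 1)}
            return new_V, policy
        V = new_V


values, policy = value_iteration()
print(policy)
```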

Eva Wu, N.; Ruschmann, Matthew C.; Linderman, Mark H.

2008-01-01

25

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

OpenAIRE

Problem statement: The method proposed in the present study addresses fault-tolerant control systems using decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the author's knowledge, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of the sliding surface, which should be in piecewise sliding surface par...

Rafi Youssef; Hui Peng

2010-01-01

26

Multilevel Gain Cell Arrays for Fault-Tolerant VLSI Systems  

OpenAIRE

Embedded memories dominate the area, power and cost of modern very large scale integration (VLSI) systems-on-chip (SoCs). Furthermore, due to process variations, it is becoming challenging to design reliable, energy-efficient systems. Therefore, fault-tolerant designs that are area efficient, cost effective and low power are needed. The idea of this project is to design embedded memories whose reliability is intentionally compromised to increase storage density. Gain cell memories are smal...

Khalid, Muhammad Umer

2011-01-01

27

Evaluation and Checkpointing of Fault Tolerant Mobile Agents Execution in Distributed Systems  

OpenAIRE

The reliable execution of mobile agents is a very important design issue in building a mobile agent system, and many fault-tolerant schemes have been proposed. In this paper, we present an evaluation of the performance of fault-tolerant schemes for the mobile agent environment. Our evaluation focuses on checkpointing schemes and deals with cooperating agents. We derive the FANTOMAS (Fault-Tolerant approach for Mobile Agents) design, which offers user-transparent fault tolerance ...

Hodjatollah Hamidi; Abbas Vafaei; Seyed Amirhassan Monadjemi

2010-01-01

28

Fault tolerant control for nonlinear systems subject to different types of sensor faults  

OpenAIRE

This paper deals with the problem of fault-tolerant control of nonlinear systems represented by Takagi-Sugeno models subject to sensor faults. Observer-based controllers are designed for each faulty situation (mode). The classical switching law is replaced by a new mechanism that avoids the switching phenomenon, so that the stability of the global closed-loop system can be studied. This new mechanism uses the residual signals obtained from a residual generator. A bank of observers i...

Ichalal, Dalil; Marx, Benoît; Maquin, Didier; Ragot, José

2011-01-01

29

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

Directory of Open Access Journals (Sweden)

The electric vehicle drive system is a multivariable system operating in a complex and changeable environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to where faults occur, and the possible consequences and causes of failure are analyzed. Considering hardware limitations and the need to guarantee system performance as far as possible, a passive software-redundancy fault-tolerant strategy is put forward, and an example is given to analyze the pros and cons of this method.

Zhang Liwei

2013-09-01

30

Fault-tolerant Supervisory Control : System Analysis and Logic Design  

DEFF Research Database (Denmark)

The main purpose of this work has been to achieve active fault tolerance in control systems, defined as a methodology in which fault detection and isolation techniques are combined with supervisory control to achieve autonomous accommodation of faults before they develop into failures. The aim of this work has been to develop and employ concepts and methods suitable for use in different automation processes, with applicability in various industrial fields. The requirements for high productivity and quality have resulted in additional instrumentation and the use of more sophisticated control algorithms. The drawback, however, is that these control systems have become more vulnerable to even simple faults in instrumentation. On the other hand, cost-optimality requirements have prohibited extensive use of hardware redundancy. Nevertheless, dependability and availability can be increased by enhancing control systems' ability to perform on-line fault detection and reconfiguration when a fault occurs, before a safety system shuts down the entire process. The main contributions of this research effort are the development of, and experimentation with, methodologies for systematic analysis of reconfiguration and for design of supervisor logic. In addition, useful experience is obtained by implementing a fault-tolerant control scheme on a simulated ship and its propulsion system. A development methodology suggested in the Control Engineering Department is extended to cope with the important reconfiguration problem. To enable a designer to acquire knowledge about reconfiguration possibilities, the structural analysis method is added as an extension to the existing methodology. This extension builds on the earlier method, of which fault propagation and severity analysis are the essential parts.
Structural analysis (SA) enables the designer to distinguish between the parts of the system with no redundant information and the parts with possible redundant information. This method hence provides the designer with information that is necessary when selecting remedial actions. Furthermore, it is shown how sensor information fusion is obtained using the SA method. The construction of the supervisor's decision logic is essential for the active form of fault-tolerant control. In this regard, two approaches are presented. The first aims at constructing the decision logic in the form of a "language". This language is obtained as a direct result of the component-based approach presented in this thesis, which is based on the definition of a functional component, the placement of components in a control system hierarchy, and the definition of a system-level hierarchy. The supervisor language includes all valid strings, representing the combinations of valid components that keep the system functional. This approach is simple and can be automated. In the second approach, supervisor functionality is implemented on the basis of an extension to traditional state-event machines. Owing to parallelism (inherent modularity), the supervisor logic is more easily modified, updated, maintained and tested. A salient feature is that a change in one task necessitates redesigning essentially only one corresponding state-event machine (SEM). A heuristic guideline is provided for designing the logic in the form of SEMs. A ship propulsion system benchmark has been designed and used as a case study, including experimentation with the above methodologies and implementation of fault-tolerant control against the simulation. Four generic faults have been considered. It is shown how the SA method is easily employed to generate analytical redundancy relations, which in turn are used for FDI purposes.
Three different methods are used to generate residuals: simple numerical calculation, a nonlinear observer, and a neuro-fuzzy method. The choice of method follows from the assumptions about the available system information. The results show that it is p...

Izadi-Zamanabadi, Roozbeh

1999-01-01
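The second supervisor approach in this abstract can be pictured with a small example. It uses parallel state-event machines, each owning one task's logic. This is a minimal sketch under stated assumptions, not the thesis's SEM notation. The task names, events and actions are hypothetical placeholders loosely inspired by a ship propulsion plant.

```python
from dataclasses import dataclass, field


@dataclass
class StateEventMachine:
    """Supervisor logic for one task: (state, event) -> (next_state, action)."""
    name: str
    state: str
    transitions: dict = field(default_factory=dict)

    def handle(self, event):
        """Fire the transition defined for (current state, event), if any."""
        key = (self.state, event)
        if key not in self.transitions:
            return None  # event is irrelevant to this task in its current state
        self.state, action = self.transitions[key]
        return action


class Supervisor:
    """Runs independent SEMs side by side; every event is offered to each one."""

    def __init__(self, machines):
        self.machines = machines

    def dispatch(self, event):
        actions = []
        for machine in self.machines:
            action = machine.handle(event)
            if action is not None:
                actions.append((machine.name, action))
        return actions


# Hypothetical tasks, events and actions (illustrative only).
speed_sem = StateEventMachine("shaft_speed", "normal", {
    ("normal", "speed_sensor_fault"): ("estimated", "use_observer_estimate"),
    ("estimated", "speed_sensor_ok"): ("normal", "use_sensor"),
})
pitch_sem = StateEventMachine("propeller_pitch", "normal", {
    ("normal", "pitch_actuator_fault"): ("safe", "hold_pitch_reduce_thrust"),
})

supervisor = Supervisor([speed_sem, pitch_sem])
print(supervisor.dispatch("speed_sensor_fault"))
# [('shaft_speed', 'use_observer_estimate')]
```

Changing one task's behaviour only touches that task's transition table, which is the modularity property the abstract highlights.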

31

Transient Fault Tolerance and System Safety Enhancement Based on System Theory  

OpenAIRE

Transient faults are hard to detect and locate because of their unpredictable nature and short duration. They are also the dominant cause of system failures, which makes it necessary to consider transient fault-tolerant design in the development of modern safety-critical industrial systems. In this paper an approach based on system theory is proposed to tolerate transient faults in tunnel construction wireless monitoring and control systems (TCWMCS), in which the effects of transient...

Xiongfeng Huang; Chunjie Zhou; Yuanqing Qin; Ye Wang; Mingyue Yang

2011-01-01

32

System Diagnosis and Fault Tolerance for Distributed Computing System: A Review  

Directory of Open Access Journals (Sweden)

The paper considers an adaptive system-diagnosis and fault-tolerance method for distributed systems. The system comprises a network of N nodes, where N is an integer greater than or equal to 3, and each node is able to execute an algorithm to communicate with the network. A computer network, often simply called a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. Because a network is such a collection, faults frequently occur in either the hardware or the software of the network. To deal with such hardware or software faults, fault diagnosis and fault tolerance mechanisms must be implemented for the system to function properly. This paper discusses such fault detection and fault tolerance mechanisms. It describes the kinds of faults and how they occur, and seeks suitable solutions to the stated problem. Various fault detection mechanisms and fault tolerance methodologies are studied, with the main goal of identifying automatic fault detection and fault tolerance techniques.

Nilotpal Baruah

2013-10-01

33

A Ship Propulsion System Model for Fault-tolerant Control  

DEFF Research Database (Denmark)

This report presents a propulsion system model for a low-speed marine vehicle, which can be used as a test benchmark for fault-tolerant control purposes. The benchmark offers realistic and challenging problems relevant to both FDI and (autonomous) supervisory control. The propulsion system model is presented in two versions. The first consists of one engine and one propeller. The other consists of two engines and their corresponding propellers placed in parallel in the ship. The corresponding programs have been developed and are available.

Izadi-Zamanabadi, Roozbeh; Blanke, M.

1998-01-01

34

Fault Tolerant Software: a Multi Agent System Solution  

DEFF Research Database (Denmark)

Development of highly dependable systems remains a labour-intensive task. This paper explores recent advances in adapting the software agent architecture to control applications, with attention to dependability issues. Multi-agent systems theory is reviewed, together with methods for supervising such systems. Software ageing is shown to be the most common problem, and rejuvenation its countermeasure. The paper shows how an agent population can be monitored, and how faulty agents can be isolated and reloaded in a healthy state, hence rejuvenated. The aim is to propose an architecture as a basis for designing control software able to tolerate faults and residual bugs without the need for maintenance stops.

Caponetti, Fabio; Bergantino, Nicola

2009-01-01
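The monitor, isolate and reload cycle described in this abstract can be illustrated with a short example. It is a minimal sketch, assuming ageing is visible as a single hypothetical counter (`leaked`) and that a healthy snapshot taken at start-up is a valid state to reload. It is not the paper's agent architecture.

```python
import copy


class Agent:
    """Toy control agent whose internal state degrades ('ages') with use."""

    def __init__(self, name):
        self.name = name
        self.leaked = 0  # hypothetical stand-in for leaked memory or accumulated error

    def step(self):
        self.leaked += 1


class RejuvenatingSupervisor:
    """Monitors agents and replaces aged ones with a healthy snapshot."""

    def __init__(self, agents, max_leak):
        self.max_leak = max_leak
        # Healthy snapshots taken at start-up, used to reload aged agents.
        self.snapshots = {a.name: copy.deepcopy(a) for a in agents}
        self.agents = {a.name: a for a in agents}
        self.rejuvenations = 0

    def monitor(self):
        for name, agent in list(self.agents.items()):
            if agent.leaked > self.max_leak:  # ageing detected: isolate the agent
                self.agents[name] = copy.deepcopy(self.snapshots[name])  # reload healthy state
                self.rejuvenations += 1


sup = RejuvenatingSupervisor([Agent("pump_ctrl")], max_leak=3)
for _ in range(10):
    for agent in sup.agents.values():
        agent.step()
    sup.monitor()
print(sup.rejuvenations, sup.agents["pump_ctrl"].leaked)  # 2 2
```

Restarting from a snapshot, rather than repairing the aged agent in place, is what lets such a scheme run without a maintenance stop.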

35

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis  

Science.gov (United States)

Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

36

Diagnostic software and fault tolerant microprocessor based system architectures  

International Nuclear Information System (INIS)

In numerous industrial applications, including power generation, the availability of electronic systems to perform their assigned tasks has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost-effective microprocessor-based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic. It depends greatly on the structure of major components and systems external to the electronic system itself, as well as on the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed through recent examples of fault-tolerant microprocessor-based system architectures. These are the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilizes a unique internal architecture that addresses reliability, availability, and communications issues while improving maintainability and man-machine interfaces.

37

Model Driven Configuration of Fault Tolerance Solutions for Component-Based Software System  

OpenAIRE

Fault tolerance is very important for complex component-based software systems, but its configuration is complicated and challenging. In this paper, we propose a model-driven approach to semi-automatic configuration of fault tolerance solutions. At design time, a set of reusable fault tolerance solutions is modeled as architecture styles, with the key properties verified by model checking. At runtime, the runtime software architecture of the target system is automatically constructed by th...

Wu, Yihan; Huang, Gang; Song, Hui; Zhang, Ying

2012-01-01

38

Aspect-oriented fault tolerance for real-time embedded systems  

OpenAIRE

Real-time embedded systems for safety-critical applications have to introduce fault tolerance mechanisms in order to cope with hardware and software errors. Fault tolerance is usually applied by means of redundancy and diversity. Redundant hardware implies the establishment of a distributed system executing a set of fault tolerance strategies by software, and may also employ some form of diversity, by using different variants or versions for the same processing. This paper describes our ap...

Afonso, Francisco; Silva, Carlos A.; Brito, Nuno; Montenegro, Sérgio; Tavares, Adriano

2008-01-01
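The idea of applying redundancy and diversity without tangling it into the application code can be shown in a few lines. The example below is a minimal sketch that uses a Python decorator as a stand-in for aspect weaving. It is an assumption for illustration, not the authors' tooling, and the names are hypothetical. The decorated function and its diverse variants run on the same input, and a majority vote selects the output; results must be hashable for the vote.

```python
from collections import Counter
from functools import wraps


def n_version(*variants):
    """Run the decorated function plus diverse variants and majority-vote the results."""
    def decorate(primary):
        versions = (primary, *variants)

        @wraps(primary)
        def voted(*args, **kwargs):
            results = []
            for version in versions:
                try:
                    results.append(version(*args, **kwargs))
                except Exception:
                    pass  # a crashing version simply casts no vote
            if not results:
                raise RuntimeError("all versions failed")
            value, votes = Counter(results).most_common(1)[0]
            if votes <= len(versions) // 2:
                raise RuntimeError("no majority among versions")
            return value
        return voted
    return decorate


def average_builtin(xs):
    return sum(xs) / len(xs)


def average_buggy(xs):
    return sum(xs) / len(xs) + 1  # deliberately faulty variant


@n_version(average_builtin, average_buggy)
def average(xs):
    total = 0
    for x in xs:
        total += x
    return total / len(xs)


print(average([2, 4, 6]))  # 4.0
```

The voting logic sits in one place, so the application function stays free of fault tolerance code.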

39

Fault Tolerance in Real Time Multiprocessors - Embedded Systems  

CERN Document Server

All real-time tasks are critical by nature and must complete execution before their deadlines, even in the presence of faults. The most popular real-time task assignment algorithms are First Fit (FF), Best Fit (BF), and Bin Packing (BP). The common task scheduling algorithms are Rate Monotonic (RM), Earliest Deadline First (EDF), etc. Current approaches deal with either fault tolerance or criticality in real time. In this paper we propose an integrated approach with a new algorithm, called SASA (Sorting And Sequential Assignment). SASA combines real-time task assignment with task scheduling and fault tolerance.

Persya, A Christy

2010-01-01
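The abstract does not give the SASA algorithm itself. The snippet below is only a generic sketch of the ingredients it names, under these assumptions:

* Tasks are sorted by utilization (largest first) and assigned first-fit.
* A processor accepts a task while the Liu and Layland RM utilization bound still holds.
* Each task gets an active backup copy on a different processor from its primary.

All names are hypothetical.

```python
def rm_bound(n):
    """Liu & Layland sufficient utilization bound for n tasks under RM."""
    return n * (2 ** (1 / n) - 1)


def assign(tasks, n_procs):
    """tasks: list of (name, wcet, period). Places a primary and a backup copy of
    each task on different processors, first-fit, largest utilization first.
    Returns {proc: [(name, role, utilization), ...]} or None if placement fails."""
    procs = {p: [] for p in range(n_procs)}

    def fits(p, u):
        load = [item[2] for item in procs[p]] + [u]
        return sum(load) <= rm_bound(len(load))

    for name, wcet, period in sorted(tasks, key=lambda t: t[1] / t[2], reverse=True):
        u = wcet / period
        used = None  # processor holding the primary, excluded for the backup
        for role in ("primary", "backup"):
            target = next((p for p in procs if p != used and fits(p, u)), None)
            if target is None:
                return None
            procs[target].append((name, role, u))
            used = target
    return procs


plan = assign([("A", 1, 4), ("B", 1, 5), ("C", 2, 10)], n_procs=2)
for proc, items in plan.items():
    print(proc, [(name, role) for name, role, _ in items])
```

The Liu and Layland test is only sufficient, so an exact response-time analysis would accept more task sets.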

40

Design and analysis of reliable and fault-tolerant computer systems  

CERN Document Server

Covering both the theoretical and practical aspects of fault-tolerant mobile systems, and fault tolerance and analysis, this book tackles current issues in several areas. These are reliability-based optimization of computer networks, fault-tolerant mobile systems, and fault tolerance and reliability of high-speed and hierarchical networks. The book is divided into six parts to facilitate coverage of the material by course instructors and computer systems professionals. The sequence of chapters in each part ensures gradual coverage of issues, from the basics to the most recent developments. A useful set of refere...

Abd-El-Barr, Mostafa

2006-01-01

41

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Directory of Open Access Journals (Sweden)

Problem statement: The proposed method addresses fault-tolerant control using decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system. To the author's knowledge, there is no known computational algorithm for the decentralized case. Approach: We present a decoupling strategy based on the selection of the sliding surface. The surface is partitioned into piecewise sliding surfaces so that PwLTool can be applied, its purpose here being to delimit the regions where sliding mode occurs. Results: We obtain a simple linearized model in each of those regions that can represent the complex system. Conclusion: We implement this new design scenario on the three-water-tank level system as an example. Since we are interested in networked control systems, we believe that this kind of controller implementation will not be affected by network delays.

Rafi Youssef

2010-01-01

42

Modeling the Fault Tolerant Capability of a Flight Control System: An Exercise in SCR Specification  

Science.gov (United States)

In life-critical and mission-critical applications, it is important to make provisions for a wide range of contingencies, by providing means for fault tolerance. In this paper, we discuss the specification of a flight control system that is fault tolerant with respect to sensor faults. Redundancy is provided by analytical relations that hold between sensor readings; depending on the conditions, this redundancy can be used to detect, identify and accommodate sensor faults.

Alexander, Chris; Cortellessa, Vittorio; DelGobbo, Diego; Mili, Ali; Napolitano, Marcello

2000-01-01

43

Fault-Tolerant Control using Adaptive Time-Frequency Method in Bearing Fault Detection for DFIG Wind Energy System  

OpenAIRE

With advances in power electronic technology, doubly-fed induction generators (DFIG) have increasingly drawn the interest of the wind turbine industry. To ensure the reliable operation and power quality of wind power systems, fault-tolerant control for DFIG is studied in this paper. The fault-tolerant controller is designed to maintain acceptable performance under bearing fault conditions. Based on measured motor current data, an adaptive statistical time-frequency method is then used to...

Korkua, Suratsavadee Koonlaboon

2015-01-01

44

Reliability modeling of digital component in plant protection system with various fault-tolerant techniques  

Energy Technology Data Exchange (ETDEWEB)

Highlights:
• Integrated fault coverage is introduced to reflect the characteristics of fault-tolerant techniques in the reliability model of digital protection systems in NPPs.
• Integrated fault coverage considers the fault-tolerant technique process from detection through fail-safe generation.
• With integrated fault coverage, the unavailability of a repairable DPS component can be estimated.
• The newly developed reliability model reveals the effects of fault-tolerant techniques explicitly for risk analysis.
• The reliability model makes it possible to confirm changes in unavailability as diverse factors vary.
Abstract: With the improvement of digital technologies, digital protection systems (DPSs) employ multiple sophisticated fault-tolerant techniques (FTTs). These increase fault detection and help the system safely perform its required functions despite the possible presence of faults. Fault detection coverage is a vital factor of FTTs in reliability. However, fault detection coverage alone is insufficient to reflect the effects of various FTTs in a reliability model. To reflect the characteristics of FTTs in the reliability model, integrated fault coverage is introduced. Integrated fault coverage considers the FTT process from detection through fail-safe generation. A model has been developed to estimate the unavailability of a repairable DPS component using integrated fault coverage. The newly developed model can quantify unavailability under a variety of conditions. Sensitivity studies are performed to identify the important variables that affect integrated fault coverage and unavailability.

Kim, Bo Gyung, E-mail: bogyungkim@kaist.ac.kr [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Kang, Hyun Gook [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Department of Nuclear Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi (United Arab Emirates); Kim, Hee Eun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Lee, Seung Jun [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 1045, Daedeok-daero, Daejeon 305-353 (Korea, Republic of); Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)

2013-12-15
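The role of fault coverage in unavailability can be made concrete with a common textbook approximation. It is not the paper's integrated fault coverage model, whose equations are not given in the abstract. Under a rare-event assumption, unavailability ≈ failure rate × mean down time per failure. A fraction `coverage` of failures is detected immediately and repaired within `mttr_h`. The remaining failures stay latent until a periodic test with interval `test_interval_h`, for an average latent period of half the interval. All parameter values below are hypothetical.

```python
def unavailability(lam_per_h, coverage, mttr_h, test_interval_h):
    """Rare-event approximation: U ~ lambda * (mean down time per failure).
    Valid only when lambda * down time << 1."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be in [0, 1]")
    detected = coverage * mttr_h  # covered faults: repaired right away
    undetected = (1.0 - coverage) * (test_interval_h / 2.0 + mttr_h)  # latent until tested
    return lam_per_h * (detected + undetected)


# Hypothetical component: lambda = 1e-5 /h, 90% coverage, 8 h repair, monthly test.
u = unavailability(1e-5, coverage=0.9, mttr_h=8.0, test_interval_h=730.0)
print(f"{u:.3e}")  # 4.450e-04
```

Most of the result comes from the uncovered 10% of failures, which illustrates why the abstract treats coverage as a key variable in its sensitivity studies.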

45

Diagnosis and Fault-Tolerant Control for Thruster-Assisted Position Mooring System  

DEFF Research Database (Denmark)

Development of fault-tolerant control systems is crucial to maintaining safe operation of offshore installations. The objective of this paper is to develop fault-tolerant control for a thruster-assisted position mooring (PM) system with faults occurring in the mooring lines. Faults in line pretension, or line breaks, will degrade the positioning performance of the vessel. Faults will be detected and isolated through a fault diagnosis procedure. When faults are detected, they can be accommodated through the control action, in which only a parameter of the controlled plant has to be updated to cope with the faulty condition. Simulations will be carried out to verify the advantages of the fault-tolerant control strategy for the PM system.

Nguyen, Trong Dong; Blanke, Mogens

2007-01-01

46

Ship Propulsion System as a Benchmark for Fault-Tolerant Control  

DEFF Research Database (Denmark)

Fault-tolerant control combines fault detection and isolation techniques with supervisory control to achieve autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade, the extension to fault-tolerant control is a fairly new area. The paper presents a ship propulsion system as a benchmark that should be useful as a platform for developing new ideas and comparing methods. The benchmark has two main elements. One is the development of efficient FDI algorithms; the other is the analysis and implementation of autonomous fault accommodation. A benchmark kit can be obtained from the authors.

Izadi-Zamanabadi, Roozbeh; Blanke, M.

1998-01-01

47

Observer based actuator fault tolerant control for nonlinear Takagi-Sugeno systems : an LMI approach  

OpenAIRE

A new actuator fault tolerant control strategy is proposed in this paper for nonlinear Takagi-Sugeno (T-S) systems. The control law aims to compensate for the actuator faults and allows the system states to track a reference state corresponding to the output of the system in the fault-free situation. Designing such a control law requires knowledge of the faults, which is obtained with a proportional integral observer (PIO). The robust stability of the system with the fault tolerant c...

Ichalal, Dalil; Marx, Benoît; Ragot, José; Maquin, Didier

2010-01-01
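The proportional integral observer (PIO) mentioned in this abstract estimates an actuator fault by adding integral action on the output estimation error. A minimal sketch of the idea follows. It uses a single scalar linear model, i.e. a Takagi-Sugeno system reduced to one local model, with a constant actuator fault; the plant coefficients and observer gains are hypothetical. With `a = 0.9`, `b = 0.5`, `L = 0.9` and `K = 0.5`, the estimation-error dynamics have the matrix [[a - L, b], [-K, 1]], whose characteristic polynomial is λ² - λ + 0.25. That gives a double pole at 0.5, so the fault estimate converges.

```python
def pi_observer_demo(fault=0.3, steps=100):
    """Estimate a constant actuator fault f in x[k+1] = a x[k] + b (u[k] + f), y = x."""
    a, b = 0.9, 0.5  # hypothetical plant coefficients
    L, K = 0.9, 0.5  # proportional and integral observer gains
    x, x_hat, f_hat = 0.0, 0.0, 0.0
    for k in range(steps):
        u = 1.0 if k % 20 < 10 else -1.0  # square-wave input
        y = x                               # state measured directly
        innovation = y - x_hat
        x_hat = a * x_hat + b * (u + f_hat) + L * innovation  # proportional correction
        f_hat = f_hat + K * innovation                         # integral action tracks the fault
        x = a * x + b * (u + fault)                            # true (faulty) plant
    return f_hat


print(round(pi_observer_demo(), 6))  # 0.3
```

In a fault tolerant control law, the estimate `f_hat` would then be subtracted from the commanded input to compensate for the fault.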

48

Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis  

DEFF Research Database (Denmark)

In this paper we design a passive fault tolerant controller against actuator faults for discrete-time piecewise affine (PWA) systems. Using dissipativity theory and H∞ analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In this paper, the PWA system switches not only because of the state but also because of the control input. The method is applied to a large-scale livestock ventilation model.

Gholami, Mehdi; Cocquempot, Vincent

2011-01-01

49

High-Intensity Radiated Field Fault-Injection Experiment for a Fault-Tolerant Distributed Communication System  

Science.gov (United States)

Safety-critical distributed flight control systems require robustness in the presence of faults. In general, these systems consist of a number of input/output (I/O) and computation nodes interacting through a fault-tolerant data communication system. The communication system transfers sensor data and control commands and can handle most faults under typical operating conditions. However, the performance of the closed-loop system can be adversely affected as a result of operating in harsh environments. In particular, High-Intensity Radiated Field (HIRF) environments have the potential to cause random fault manifestations in individual avionic components and to generate simultaneous system-wide communication faults that overwhelm existing fault management mechanisms. This paper presents the design of an experiment conducted at the NASA Langley Research Center's HIRF Laboratory to statistically characterize the faults that a HIRF environment can trigger on a single node of a distributed flight control system.

Yates, Amy M.; Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Gonzalez, Oscar R.; Gray, W. Steven

2010-01-01

50

Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System  

OpenAIRE

This paper presents an enhanced robust control design structure that realises tolerance of sensor faults and is suitable for implementation in multi-input multi-output (MIMO) systems. The proposed design permits fault detection and controller elements to be designed on a common mathematical platform. The design takes account of stability and of robustness towards uncertainties as well as towards a multiple-fault environment. This framework can also cater to systems requiring fast responses. A design example is illu...

Yang, S. S.; Chen, J.

2008-01-01

51

Optimal structure of fault-tolerant software systems  

International Nuclear Information System (INIS)

This paper considers software systems consisting of fault-tolerant components. These components are built from functionally equivalent but independently developed versions characterized by different reliability and execution time. Because of hardware resource constraints, the number of versions that can run simultaneously is limited. The expected system execution time and its reliability depend strictly on the parameters of the software versions and on the sequence of their execution. Reliability is defined as the probability of obtaining the correct output within a specified time. The system structure optimization problem is formulated as follows: choose software versions for each component, and find the sequence of their execution, so as to achieve the greatest system reliability subject to cost constraints. The versions are to be chosen from a list of available products, each characterized by its reliability, execution time and cost. The suggested optimization procedure combines two elements: an algorithm for determining the system execution time distribution, which uses the moment generating function approach, and a genetic algorithm. Both N-version programming and the recovery block scheme are considered within a universal model. An illustrative example is presented.
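The effect of execution order on recovery block reliability under a time limit can be seen in a small example. It is a minimal sketch under stated assumptions, not the paper's universal model:

* Version failures are independent.
* Execution times are fixed rather than random.
* The acceptance test is perfect.

Under these assumptions, the component succeeds if some version passes before the cumulative execution time exceeds the deadline. Instead of the paper's genetic algorithm, the sketch uses exhaustive search over orders, which is only feasible for a handful of versions.

```python
from itertools import permutations


def recovery_block_reliability(order, rel, time, deadline):
    """Probability that some version in `order` passes the acceptance test
    before the cumulative execution time exceeds `deadline`."""
    p_all_failed, elapsed, reliability = 1.0, 0.0, 0.0
    for v in order:
        elapsed += time[v]
        if elapsed > deadline:
            break  # this version (and all later ones) would finish too late
        reliability += p_all_failed * rel[v]
        p_all_failed *= 1.0 - rel[v]
    return reliability


def best_order(rel, time, deadline):
    """Exhaustive search over execution orders (small version counts only)."""
    return max(permutations(range(len(rel))),
               key=lambda o: recovery_block_reliability(o, rel, time, deadline))


rel, time = [0.9, 0.8], [5, 3]  # hypothetical version reliabilities and run times
print(best_order(rel, time, deadline=7))  # (0, 1): only one version fits; pick the more reliable
print(best_order(rel, time, deadline=4))  # (1, 0): only the faster version can finish
```

With a tight deadline, the faster but less reliable version should run first. With a looser deadline, the more reliable version wins. This interaction between timing and reliability is what the optimization in the abstract explores.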

52

Fault-Tolerant Control using Adaptive Time-Frequency Method in Bearing Fault Detection for DFIG Wind Energy System  

Directory of Open Access Journals (Sweden)

With advances in power electronic technology, doubly-fed induction generators (DFIG) have increasingly drawn the interest of the wind turbine industry. To ensure the reliable operation and power quality of wind power systems, fault-tolerant control for DFIG is studied in this paper. The fault-tolerant controller is designed to maintain acceptable performance under bearing fault conditions. Based on measured motor current data, an adaptive statistical time-frequency method is used to detect the occurrence of a fault in the system and then let the controller compensate for faulty conditions. Feature vectors including frequency components located in the neighborhood of the characteristic fault frequencies are first extracted. They are then used to estimate the next sample of the stator-side current in order to improve current control. Early fault detection, isolation and successful reconfiguration would therefore be very beneficial in wind energy conversion systems. The feasibility of this fault-tolerant controller has been demonstrated by means of a mathematical model and digital simulations based on Matlab/Simulink. The simulation results for the generator output show the effectiveness of the proposed fault-tolerant controller.

Suratsavadee Koonlaboon Korkua

2015-01-01
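The abstract refers to "characteristic fault frequencies" without stating formulas. A widely used model for bearing faults seen through the stator current works in two steps. First, compute a bearing's characteristic vibration frequency, here the outer-race defect frequency. Then look for current components at |f_e ± m·f_v|, where f_e is the electrical supply frequency and f_v the vibration frequency. The snippet below assumes this model; the paper may use a different one, and the bearing geometry values are hypothetical.

```python
import math


def outer_race_frequency(n_balls, shaft_hz, ball_d, pitch_d, contact_angle_rad=0.0):
    """Outer-race defect frequency of a rolling-element bearing."""
    return 0.5 * n_balls * shaft_hz * (1 - ball_d / pitch_d * math.cos(contact_angle_rad))


def stator_current_signatures(supply_hz, vib_hz, harmonics=2):
    """Frequencies |f_e + m f_v| and |f_e - m f_v| for m = 1..harmonics."""
    return sorted({abs(supply_hz + sign * m * vib_hz)
                   for m in range(1, harmonics + 1) for sign in (1, -1)})


# Hypothetical bearing: 8 balls, 25 Hz shaft speed, ball/pitch diameter ratio 0.2.
f_o = outer_race_frequency(8, 25.0, ball_d=10.0, pitch_d=50.0)
print(round(f_o, 6))                                             # 80.0
print([round(f, 6) for f in stator_current_signatures(50.0, f_o)])  # [30.0, 110.0, 130.0, 210.0]
```

A time-frequency detector like the one in the abstract would then track spectral energy in narrow bands around these frequencies.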

53

Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System  

Directory of Open Access Journals (Sweden)

This paper presents an enhanced robust control design structure that realises tolerance of sensor faults and is suitable for implementation in multi-input multi-output (MIMO) systems. The proposed design permits fault detection and controller elements to be designed on a common mathematical platform. The design takes account of stability and of robustness towards uncertainties as well as towards a multiple-fault environment. This framework can also cater to systems requiring fast responses. A design example is illustrated with a fast, multivariable and unstable system, namely the double inverted pendulum. Results indicate the potential of this design framework to handle fast systems with multiple sensor faults.

J. Chen

2008-03-01

54

Fault-tolerant model predictive control within the hybrid systems framework: application to sewer networks  

OpenAIRE

In this paper, the model predictive control (MPC) problem with fault-tolerance capabilities is formulated within the hybrid systems framework. In particular, the mixed logical dynamical form for representing hybrid systems is considered. Using this approach, a hybrid model of the system to be controlled is obtained. The model includes inherent hybrid phenomena and possible modes caused by the occurrence of faults. This allows the system model to be adapted online by taking into account the fault information provided ...

Ocampo-Martinez, Carlos; Puig, Vicenç

2009-01-01

55

Fault tolerant control for nonlinear systems described by Takagi-Sugeno models  

OpenAIRE

In this paper the problem of active fault tolerant control (FTC) in noisy systems is studied. The proposed FTC strategy is based on knowledge of the fault estimate and on the error between the faulty system state and a reference system state. A proportional integral observer is used to estimate the state and the actuator faults. The results obtained are then extended to nonlinear systems described by Takagi-Sugeno models. The problem of designing the proportional integral ...

Kheder, Atef; Ben Othman, Kamel; Benrejeb, Mohamed; Maquin, Didier

2010-01-01

56

Transient Fault Tolerance and System Safety Enhancement Based on System Theory  

Directory of Open Access Journals (Sweden)

Transient faults are hard to detect and locate because of their unpredictable nature and short duration. They are also the dominant cause of system failures, which makes it necessary to consider transient fault-tolerant design in the development of modern safety-critical industrial systems. In this paper an approach based on system theory is proposed to tolerate transient faults in tunnel construction wireless monitoring and control systems (TCWMCS). In this approach, the effects of transient faults are expressed as dysfunctional interactions among software applications. The dysfunctional interactions of the system are analyzed with the operational process model, and the causes of dysfunction are deduced in the functional control diagram. On this basis, a safety enhancement approach is proposed for designers, in which effective safety constraints are set up to tolerate the transient faults. The experimental evaluation indicated that the effects of transient faults could be exposed through the causal factors of dysfunctional interactions, and that system safety could be enhanced by enforcing appropriate constraints.

Xiongfeng Huang

2011-10-01

57

Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report  

Energy Technology Data Exchange (ETDEWEB)

The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research aimed at providing end-to-end, system-wide fault tolerance for applications and other system software. Fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade. However, it has been treated mostly as a collection of isolated stovepipes. Visibility of and response to faults have typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, which allows little flexibility or control on a system-wide basis. This makes it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. For example, communication link failures can be seen by a network library but are not directly visible to the job scheduler. Similarly, node failures can be detected by system monitoring software but are not inherently visible to the resource manager. The network libraries or monitoring software could share information about such faults. Other system software, such as a resource manager or job scheduler, could then ensure that failed nodes or failed network links are excluded from further job allocations and that further diagnosis is performed. We are a founding member and one of the lead developers of the Open MPI project. Over the course of this project, our efforts have focused on making Open MPI more robust to failures in two ways. The first is supporting various fault tolerance techniques. The second is fault information exchange and coordination between MPI and the rest of the HPC system software stack, from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.

Lumsdaine, Andrew

2013-03-08
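The coordination this report argues for can be illustrated with a small example: a fault seen by one subsystem becomes visible to another. It is a minimal sketch using an in-process publish/subscribe channel. `FaultBus`, `Scheduler` and the event fields are hypothetical illustrations, not Open MPI or resource-manager APIs.

```python
from collections import defaultdict


class FaultBus:
    """Minimal publish/subscribe channel for sharing fault events across subsystems."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, callback):
        self._subscribers[kind].append(callback)

    def publish(self, kind, **details):
        for callback in self._subscribers[kind]:
            callback(details)


class Scheduler:
    """Toy job scheduler that stops allocating nodes reported as failed."""

    def __init__(self, nodes, bus):
        self.available = set(nodes)
        bus.subscribe("node_failed", lambda event: self.available.discard(event["node"]))

    def allocate(self, count):
        if count > len(self.available):
            raise RuntimeError("not enough healthy nodes")
        return sorted(self.available)[:count]


bus = FaultBus()
scheduler = Scheduler(["n1", "n2", "n3"], bus)
# A monitoring tool (hypothetical) reports a failure; the scheduler reacts without polling.
bus.publish("node_failed", node="n2", source="monitor")
print(scheduler.allocate(2))  # ['n1', 'n3']
```

The publisher does not need to know which subsystems care about the event. That decoupling is what would let a network library, a monitor and a scheduler cooperate without point-to-point integration.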

58

Application-driven co-design of fault-tolerant industrial systems  

OpenAIRE

This paper presents a novel methodology for the HW/SW co-design of fault tolerant embedded systems that pursues the mitigation of radiation-induced upset events (which are a class of Single Event Effects - SEEs) on critical industrial applications. The proposal combines the flexibility and low cost of Software Implemented Hardware Fault Tolerance (SIHFT) techniques with the high reliability of selective hardware replication. The co-design flow is supported by a hardening platform that compris...

Restrepo Calle, Felipe; Martínez Álvarez, Antonio; Guzmán Miranda, Hipólito; Palomo Pinto, Francisco Rogelio; Cuenca Asensi, Sergio

2010-01-01

59

Enhanced fault-tolerant quantum computing in $d$-level systems  

OpenAIRE

Error correcting codes protect quantum information and form the basis of fault-tolerant quantum computing. Leading proposals for fault-tolerant quantum computation require codes with an exceedingly rare property: a transversal non-Clifford gate. Codes with the desired property are presented for $d$-level (qudit) systems with prime $d$. The codes use $n=d-1$ qudits and can detect up to $\sim d/3$ errors. We quantify the performance of these codes for one approach to quantum com...

Campbell, Earl T.

2014-01-01

60

Mapping of Fault-Tolerant Applications with Transparency on Distributed Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach for the mapping optimization of fault-tolerant embedded systems for safety-critical applications. Processes and messages are statically scheduled. Process re-execution is used for recovering from multiple transient faults. We call process recovery transparent if it does not affect the operation of other processes. Transparent recovery has the advantages of fault containment, improved debuggability and less memory needed to store the fault-tolerant schedules. However, it introduces additional delays that can lead to violations of the application's timing constraints. We propose an algorithm for the mapping of fault-tolerant applications with transparency. The algorithm decides a mapping of processes onto computation nodes such that the application is schedulable and the transparency properties imposed by the designer are satisfied. The mapping algorithm is driven by a heuristic that estimates the worst-case schedule length and indicates whether a given mapping alternative is schedulable.

Izosimov, Viacheslav; Pop, Paul

2006-01-01
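Estimating the worst-case schedule length under re-execution can be illustrated with a rough bound. The example is a minimal sketch for processes on a single node, assuming a shared recovery slack sized for up to `k` transient faults. In the worst case, each fault hits the longest process, which is then re-executed after a recovery overhead `mu`. This is a simplified illustration with hypothetical names, not the paper's heuristic. Transparency constraints restrict how such slack can be shared between processes, which is roughly where the additional delays mentioned in the abstract come from.

```python
def worst_case_length(wcets, k, mu):
    """Single-node bound: all processes run once, plus k re-executions of the
    longest process, each preceded by a recovery overhead mu."""
    if not wcets:
        return 0
    return sum(wcets) + k * (max(wcets) + mu)


def schedulable(wcets, k, mu, deadline):
    """True if the fault-tolerant worst case still meets the deadline."""
    return worst_case_length(wcets, k, mu) <= deadline


print(worst_case_length([10, 20, 30], k=2, mu=5))  # 130
print(schedulable([10, 20, 30], k=2, mu=5, deadline=120))  # False
```

A mapping heuristic could evaluate a bound like this per node to reject mapping alternatives quickly before building a full schedule.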

61

Fault Tolerant Software Architectures  

OpenAIRE

Explicitly coping with failures during the conception and design of software significantly complicates the designer's job. The resulting design complexity leads to software descriptions that are difficult to understand and that must undergo many simplifications before their first functioning version. To support the systematic development of complex, fault-tolerant software, this paper proposes a layered framework for the analysis of software fault tolerance properties, where the top-most lay...

Saridakis, Titos; Issarny, Valérie

1998-01-01

62

Active Fault Tolerant Control-FTC-Design for Takagi-Sugeno Fuzzy Systems with Weighting Functions Depending on the FTC  

OpenAIRE

In this paper the problem of active fault tolerant control design for noisy systems described by Takagi-Sugeno fuzzy models is studied. The proposed control strategy is based on the fault estimate and on the error between the faulty system state and a reference system state. The systems considered are affected by actuator and sensor faults, and their weighting functions depend on the fault tolerant control. A mathematical transformation is used to construct an augmented system ...

Atef Khedher; Kamel Ben Othman; Mohamed Benrejeb

2011-01-01

63

Fault tolerant distributed real time computer systems for I and C of prototype fast breeder reactor  

Energy Technology Data Exchange (ETDEWEB)

Highlights:
• The architecture of the distributed real time computer system (DRTCS) used in the I and C of PFBR is explained.
• The fault tolerant (hot standby) architecture, fault detection and switch over are detailed.
• A scaled down model was used to study the functional and performance requirements of the DRTCS.
• Quality of service parameters for the scaled down model were critically studied.
Abstract: The Prototype fast breeder reactor (PFBR) is in an advanced stage of construction at Kalpakkam, India. A three-tier architecture is adopted for the instrumentation and control (I and C) of PFBR, wherein the bottom tier consists of real time computer (RTC) systems, the middle tier consists of process computers and the top tier consists of display stations. These RTC systems are geographically distributed and networked together with the process computers and display stations. A hot standby architecture comprising dual redundant RTC systems with a switch over logic system is deployed in order to achieve fault tolerance. Fault tolerant dual redundant network connectivity is provided in each RTC system, and the TCP/IP protocol is selected for network communication. In order to assess the performance of the distributed RTC systems, a scaled down model was developed with 9 representative systems, and nearly 15% of the I and C signals of PFBR were connected and monitored. Functional and performance testing was carried out for each RTC system, and the fault tolerant characteristics were studied by injecting various faults into the system and observing its performance. Various quality of service parameters, such as connection establishment delay, priority parameter, transit delay, throughput and residual error ratio, are critically studied for the network.

Manimaran, M., E-mail: maran@igcar.gov.in; Shanmugam, A.; Parimalam, P.; Murali, N.; Satya Murty, S.A.V.

2014-03-15
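Here is a minimal sketch of heartbeat-based switch over between the dual redundant RTC systems described above. The class name, the heartbeat mechanism and the `timeout` parameter are hypothetical. The abstract does not say how the PFBR switch over logic detects a failed unit.

```python
from dataclasses import dataclass, field


@dataclass
class HotStandbyPair:
    """Hypothetical switch-over logic for a dual redundant RTC pair.

    Each unit is expected to report a heartbeat at least every `timeout`
    seconds. If the active unit misses that deadline while the standby is
    still reporting, control passes to the standby.
    """
    timeout: float
    active: str = "A"
    last_heartbeat: dict = field(default_factory=lambda: {"A": 0.0, "B": 0.0})

    def heartbeat(self, unit: str, now: float) -> None:
        self.last_heartbeat[unit] = now

    def standby(self) -> str:
        return "B" if self.active == "A" else "A"

    def check(self, now: float) -> str:
        """Return the unit in control at time `now`, switching over if needed."""
        active_ok = now - self.last_heartbeat[self.active] <= self.timeout
        standby_ok = now - self.last_heartbeat[self.standby()] <= self.timeout
        if not active_ok and standby_ok:
            self.active = self.standby()
        return self.active


pair = HotStandbyPair(timeout=0.5)
pair.heartbeat("A", 1.0)
pair.heartbeat("B", 1.0)
print(pair.check(1.2))   # A still within its timeout -> "A"
pair.heartbeat("B", 1.6)
print(pair.check(1.7))   # A silent for 0.7 s, B healthy -> "B"
```

The standby is required to be healthy before taking over. This avoids switching to a unit that has also stopped reporting.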

64

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

DEFF Research Database (Denmark)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety system shuts down the process. The thesis gives a general development methodology that carries the control system designer through the steps necessary to consider fault handling in an early design phase. It was shown how an existing control loop with an interface to the plant-wide control system could be extended with three additional modules to obtain fault tolerance: fault detection and isolation, remedial action decision, and reconfiguration. The integration of these modules in software was considered. The general methodology covered the analysis, design, and implementation of fault tolerant control systems on an overall level. Two detailed studies were presented, one on fault detection and isolation design and one on the design of the decision logic. Two application case studies were used to emphasize practical aspects of both the development methodology and the detailed studies. The first was an electro-mechanical actuator in a position control loop for a diesel engine speed governor, where the purpose was to avoid a total close-down in case of the most likely faults. The second was a fault tolerant attitude control system for a micro satellite, where the operation of the system is mission critical. The purpose was to avoid hazardous effects from faults and maintain operation if possible.
A method was introduced that, after a systematic examination of possible component failures, enables analysis of the relationship between failures and their consequences for the system's operation. This fault propagation analysis is based on coarse models of the subsystems describing their reaction to faults, for example a variable being zero, low or high. Examples were given that illustrate how such models can be established by simple means and yet provide important information when combined into a complete system. A special achievement was a method to determine how control loops behave in case of faults. This is not straightforward, as the system behaviour depends on the character of the feedback. One of the detailed studies was the design of the decision logic in fault handling, realized as state-event machines. Guidelines for the design were provided, based on experience from the two case studies. Methods for verifying correct operation of the decision logic were described, where a completeness check against the fault propagation analysis can guarantee coverage of all considered faults. The use of software tools to support the development process was illustrated with an off-the-shelf product for constraint logic solving and state-event machine analysis. The coarse system models and the decision logic were analyzed with the toolbox, and it was shown how a simple analysis could be performed to verify the correctness and completeness of the fault handling design. Experience from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and the determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in the speed governor actuator setup: a position sensor fault and an actuator current fault.
The sensor fault detection was trivial, whereas the actuator fault was more challenging. The study demonstrated that many existing methods have the potential to detect and isolate the two faults, but also that the research field still lacks a systematic approach to realistic problems such as low sampling rates and nonlinear system characteristics.

Bøgh, S.A.

1997-01-01
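The thesis realizes fault-handling decision logic as state-event machines and checks that logic for completeness against the fault propagation analysis. The sketch below is a hypothetical and much simplified version. The fault names, states and transition table are invented for illustration. The completeness check only asks whether every considered fault has a reaction defined from the normal state.

```python
# Hypothetical faults identified by a fault propagation analysis.
CONSIDERED_FAULTS = {"position_sensor", "actuator_current", "supply_loss"}

# Hypothetical decision logic: (state, event) -> next state.
TRANSITIONS = {
    ("normal", "position_sensor"): "sensorless_estimation",
    ("normal", "actuator_current"): "reduced_current_limit",
    ("normal", "supply_loss"): "safe_shutdown",
    ("sensorless_estimation", "fault_cleared"): "normal",
    ("reduced_current_limit", "fault_cleared"): "normal",
}


def step(state: str, event: str) -> str:
    """Next state of the decision logic; undefined events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def uncovered_faults(transitions: dict, faults: set, from_state: str = "normal") -> set:
    """Completeness check: faults with no reaction defined from `from_state`."""
    handled = {event for (state, event) in transitions if state == from_state}
    return faults - handled


print(step("normal", "position_sensor"))
print(uncovered_faults(TRANSITIONS, CONSIDERED_FAULTS | {"speed_sensor"}))
```

Keeping the transitions in a plain table makes this kind of mechanical check easy. A dedicated tool, like the one used in the thesis, would go further and analyze the models and the logic together.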

65

An architecture for fault tolerant controllers  

DEFF Research Database (Denmark)

A general architecture for fault tolerant control is proposed. The architecture is based on the (primary) YJBK parameterization of all stabilizing compensators and uses the dual YJBK parameterization to quantify the performance of the fault tolerant system. The suggested approach can be applied to additive faults, parametric faults, and system structural changes. The modeling for each of these fault classes is described. The method allows design for passive as well as active fault handling. Also, the related design method can be fitted either to guarantee stability or to achieve graceful degradation in the sense of guaranteed degraded performance. A number of fault diagnosis problems, fault tolerant control problems, and feedback control with fault rejection problems are formulated and considered, mainly from a fault modeling point of view. The method is illustrated on a servo example including an additive fault and a parametric fault.

Niemann, Hans Henrik; Stoustrup, Jakob

2005-01-01

66

Adaptive sensor-fault tolerant control for a class of multivariable uncertain nonlinear systems.  

Science.gov (United States)

This paper deals with the active fault tolerant control (AFTC) problem for a class of multiple-input multiple-output (MIMO) uncertain nonlinear systems subject to sensor faults and external disturbances. The proposed AFTC method can tolerate three additive sensor faults (bias, drift and loss of accuracy) and one multiplicative sensor fault (loss of effectiveness). By employing the backstepping technique, a novel adaptive backstepping-based AFTC scheme is developed, using the fact that sensor faults and system uncertainties (including external disturbances and unexpected nonlinear functions caused by sensor faults) can be estimated on-line and compensated via robust adaptive schemes. The stability of the closed-loop system is rigorously proven using a Lyapunov approach. The effectiveness of the proposed controller is illustrated by two simulation examples. PMID:25701191

Khebbache, Hicham; Tadjine, Mohamed; Labiod, Salim; Boulkroune, Abdesselem

2015-03-01

67

Fault-Tolerant Control of a 2 DOF Helicopter (TRMS System) Based on H_infinity  

OpenAIRE

In this paper, a fault-tolerant control of a 2 DOF helicopter (TRMS system) based on H-infinity is presented. The introductory part of the paper presents fault-tolerant control (FTC), the first part describes the mathematical model of the TRMS, and in the last part a polytopic Unknown Input Observer (UIO) is synthesized using equalities and LMIs. This UIO is used to observe the faults and then compensate them, ...

Bouguerra, Abderrahmen; Saigaa, Djamel; Kara, Kamel; Zeghlache, Samir; Loukal, Keltoum

2013-01-01

68

Probabilistic safety assessment on the fault-tolerant mechanism of digital I and C systems  

International Nuclear Information System (INIS)

There are various problems in applying digital equipment, including software, to the safety-related systems of a nuclear power plant, because no standard for quantitative safety assessment is well accepted. In particular, fault-tolerant features, which are among the most beneficial aspects of a microprocessor-based system, should be evaluated quantitatively in order to assess the safety of a digital system. This paper describes the fault-tolerant features of digital systems that can be applied at the software, hardware or system level. For the watchdog timer, which is expected to be the most competitive fault-tolerant mechanism for nuclear power plant safety systems, this paper shows an example of the probabilistic safety assessment process. The estimated coverage factor of the applied fault-tolerant mechanism is found to be very important.
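A small calculation shows why the coverage factor matters so much. This is a minimal sketch, not the paper's PSA model. It assumes a constant failure rate and a watchdog that detects a fraction `coverage` of failures, so the remaining failures arrive at rate `rate * (1 - coverage)`. The numbers used are hypothetical.

```python
import math


def p_undetected_failure(rate: float, coverage: float, hours: float) -> float:
    """Probability of at least one failure the watchdog does not catch.

    Assumes a constant failure rate `rate` (per hour) and that a fraction
    `coverage` of failures is detected; undetected failures then occur at
    rate * (1 - coverage), giving an exponential mission-time model.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must lie in [0, 1]")
    return 1.0 - math.exp(-rate * (1.0 - coverage) * hours)


# Hypothetical component: 1e-4 failures per hour over a 1000 h mission.
for c in (0.9, 0.99, 0.999):
    print(c, p_undetected_failure(rate=1e-4, coverage=c, hours=1000))
```

While `rate * (1 - coverage) * hours` stays small, each additional nine of coverage cuts the result roughly tenfold. This is why a poorly estimated coverage factor can dominate the assessment.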

69

Validation Methods Research for Fault-Tolerant Avionics and Control Systems: Working Group Meeting, 2  

Science.gov (United States)

The validation process comprises the activities required to ensure that the system realization agrees with the system specification. A preliminary validation methodology for fault tolerant systems is documented. A general framework for a validation methodology is presented, along with a set of specific tasks intended for the validation of two specimen systems, SIFT and FTMP. Two major areas of research are identified: first, the activities required to support the ongoing development of the validation process itself; and second, the activities required to support the design, development, and understanding of fault tolerant systems.

Gault, J. W. (editor); Trivedi, K. S. (editor); Clary, J. B. (editor)

1980-01-01

70

Fault-tolerant interconnection network and image-processing applications for the PASM parallel processing system  

International Nuclear Information System (INIS)

The demand for very high speed data processing, coupled with falling hardware costs, has made large-scale parallel and distributed computer systems both desirable and feasible. Two modes of parallel processing are single instruction stream-multiple data stream (SIMD) and multiple instruction stream-multiple data stream (MIMD). PASM, a partitionable SIMD/MIMD system, is a reconfigurable multimicroprocessor system being designed for image processing and pattern recognition. An important component of these systems is the interconnection network, the mechanism for communication among the computation nodes and memories. Assuring high reliability for such complex systems is a significant task. Thus, a crucial practical aspect of an interconnection network is fault tolerance. In answer to this need, the Extra Stage Cube (ESC), a fault-tolerant, multistage cube-type interconnection network, is defined. The fault tolerance of the ESC is explored for both single and multiple faults, routing tags are defined, and consideration is given to permuting data and partitioning the ESC in the presence of faults. The ESC is compared with other fault-tolerant multistage networks. Finally, the reliability of the ESC and of an enhanced version of it is investigated.

71

A novel mathematical setup for fault tolerant control systems with state-dependent failure process  

Science.gov (United States)

In this paper, we consider a fault tolerant control system (FTCS) with state-dependent failures and provide a tractable mathematical model to handle them. Assuming abrupt changes in system parameters, we model both the failure process and the fault detection and isolation (FDI) process as jump processes. In particular, we assume that the failure rates of the failure process vary according to the set to which the system state belongs.

Chitraganti, S.; Aberkane, S.; Aubrun, C.

2014-12-01

72

Fault Tolerance Analysis and Self-Healing Strategy of Autonomous, Evolvable Hardware Systems  

OpenAIRE

This paper presents an analysis of the fault tolerance achieved by an autonomous, fully embedded evolvable hardware system, which uses a combination of partial dynamic reconfiguration and an evolutionary algorithm (EA). It demonstrates that the system may self-recover from both transient and cumulative permanent faults. This self-adaptive system, based on a 2D array of 16 (4×4) Processing Elements (PEs), is tested with an image filtering application. Results show that it may properly recover...

Salvador Perea, Rubén; Otero Marnotes, Andres; Mora, Javier; Torre Arnanz, Eduardo La; Sekanina, Lukáš; Riesgo Alcaide, Teresa

2011-01-01

73

Fault Tolerant Control for Takagi-Sugeno systems with unmeasurable premise variables by trajectory tracking  

OpenAIRE

This paper presents a new method for fault tolerant control of nonlinear systems described by Takagi-Sugeno fuzzy systems with unmeasurable premise variables. The idea is to use a reference model and design a new control law that minimizes the state deviation between a healthy reference model and the possibly faulty actual model. This scheme requires knowledge of the system states and of the occurring faults. These signals are estimated from a Proportional-Integral Observer (PIO) or Propo...

Ichalal, Dalil; Marx, Benoît; Ragot, José; Maquin, Didier

2010-01-01

74

Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid  

OpenAIRE

A computational grid environment, due to its heterogeneous, autonomous and dynamic nature, is prone to different kinds of faults, which may delay job completion or even force a job to be re-executed from its starting point. The checkpointing mechanism plays a vital role in making the grid more reliable, cost effective and efficient. In this paper, we propose schemes based on system checkpointing and application checkpointing, and compare their performance in an empirical study. The AB...

Mangesh Ramesh Balpande

2014-01-01

75

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview  

Science.gov (United States)

Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprising a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation.

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

76

Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid  

Directory of Open Access Journals (Sweden)

A computational grid environment, due to its heterogeneous, autonomous and dynamic nature, is prone to different kinds of faults, which may delay job completion or even force a job to be re-executed from its starting point. The checkpointing mechanism plays a vital role in making the grid more reliable, cost effective and efficient. In this paper, we propose schemes based on system checkpointing and application checkpointing, and compare their performance in an empirical study. The ABSC scheme is suitable for applications where computations are not intense. For computationally intense applications where reliability is more important, the ABAC scheme is more suitable. This scheme may, however, produce slight overheads in fault-free situations, while being very reliable in faulty situations.

Mangesh Ramesh Balpande

2014-09-01
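The checkpoint/rollback idea behind both schemes can be illustrated with a small deterministic simulation. This is a generic sketch, not the ABSC or ABAC scheme. The job is a sequence of unit steps, and a checkpoint is taken at no cost every `interval` completed steps. Faults strike at hypothetical time slots listed in `fault_times`.

```python
def run_with_checkpoints(n_steps: int, interval: int, fault_times: set[int]) -> int:
    """Simulate a job of `n_steps` unit steps with checkpoint/rollback.

    A checkpoint is saved after every `interval` completed steps. A fault in
    time slot t (1-based count of all executed steps, including wasted ones)
    destroys the step in progress and rolls the job back to the last
    checkpoint. Returns the total number of time slots used.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    progress = 0        # steps completed so far
    checkpoint = 0      # progress saved at the last checkpoint
    slot = 0            # time slots consumed
    pending = set(fault_times)
    while progress < n_steps:
        slot += 1
        if slot in pending:          # transient fault: roll back
            pending.discard(slot)
            progress = checkpoint
            continue
        progress += 1
        if progress % interval == 0:
            checkpoint = progress    # save state
    return slot


print(run_with_checkpoints(10, 3, {5}))     # checkpoint every 3 steps
print(run_with_checkpoints(10, 100, {5}))   # effectively no checkpoints
```

With `interval=3`, the fault in slot 5 loses only the work done since the last checkpoint. With no intermediate checkpoints, the job restarts from its starting point. A real scheme must also weigh the cost of taking each checkpoint, which this sketch sets to zero.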

77

Fault detection and fault tolerant control of a smart base isolation system with magneto-rheological damper  

Science.gov (United States)

Fault detection and isolation (FDI) in real-time systems can provide early warnings of faulty sensor and actuator signals to prevent events that lead to catastrophic failures. The main objective of this paper is to develop FDI and fault tolerant control techniques for base isolation systems with magneto-rheological (MR) dampers. Thus, this paper presents a fixed-order FDI filter design procedure based on linear matrix inequalities (LMIs). The necessary and sufficient conditions for the existence of a solution for detecting and isolating faults using the H∞ formulation are provided in the proposed filter design. Furthermore, an FDI-filter-based fuzzy fault tolerant controller (FFTC) for a base isolation structure model was designed to preserve the pre-specified performance of the system in the presence of various unknown faults. Simulation and experimental results demonstrated that the designed filter can successfully detect and isolate faults from displacement sensors and accelerometers while maintaining excellent performance of the base isolation technology under faulty conditions.

Wang, Han; Song, Gangbing

2011-08-01

78

Active fault tolerant control of piecewise affine systems with reference tracking and input constraints  

DEFF Research Database (Denmark)

An active fault tolerant control (AFTC) method is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. The AFTC framework contains a supervisory scheme, which selects a suitable controller from a set of controllers such that stability and acceptable performance of the faulty system are maintained. The design of the supervisory scheme is not considered here. The set of controllers is composed of a normal controller for the fault-free case, an active fault detection and isolation controller for isolation and identification of the faults, and a set of passive fault tolerant controller (PFTC) modules designed to be robust against a set of actuator faults. In this research, the piecewise nonlinear model is approximated by a PWA system. The PFTCs are state feedback laws. Each one is robust against a fixed set of actuator faults and is able to track the reference signal while the control inputs are bounded. The PFTC problem is transformed into a feasibility problem for a set of LMIs. The method is applied to a large-scale livestock ventilation model.

Gholami, M.; Cocquempot, V.

2013-01-01

79

Self-stabilizing byzantine-fault-tolerant clock synchronization system and method  

Science.gov (United States)

Systems and methods for rapid Byzantine-fault-tolerant self-stabilizing clock synchronization are provided. The systems and methods are based on a protocol comprising a state machine and a set of monitors that execute once every local oscillator tick. The protocol is independent of application-specific requirements. The faults are assumed to be arbitrary and/or malicious. All timing measures of variables are based on the node's local clock, and thus no central clock or externally generated pulse is used. Instances of the protocol are shown to tolerate bursts of transient failures and to converge deterministically, with a convergence time linear in the synchronization period, as predicted.

Malekpour, Mahyar R. (Inventor)

2012-01-01

80

Fault tolerance control of phase current in permanent magnet synchronous motor control system  

Science.gov (United States)

As photoelectric tracking systems move from earth-based platforms to all kinds of moving platforms, such as plane-, ship-, car-, satellite- and missile-based platforms, a fault tolerance control system for the phase current sensors is studied in order to detect and handle phase current sensor failures on a moving platform. Using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, failure detection and fault tolerance control of the three phase current sensors are achieved. Fault tolerance control remains possible with one, two or three failed sensors. The reason why this method leaves an error between the fault-tolerant (reconstructed) current and the actual phase current is analyzed, and a solution to reduce the error is provided. An experiment on a permanent magnet synchronous motor system shows that the method detects phase current sensor failures effectively and precisely while simultaneously providing fault tolerance control. With this method, even if all three phase current sensors malfunction, the moving platform can still work by reconstructing the phase currents of the motor.

Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

2014-08-01
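The reconstruction principle can be sketched as follows. The sketch assumes an ideal inverter, in which the DC-link current equals `Sa*ia + Sb*ib + Sc*ic` for switch states `(Sa, Sb, Sc)`. It also assumes a machine with an isolated neutral, so that `ia + ib + ic = 0`. During each active SVPWM vector the DC-link current therefore equals one phase current, possibly negated. Function names are hypothetical. Sampling timing and the error sources analyzed in the paper are ignored.

```python
def dc_link_current(states, ia, ib, ic):
    """Ideal inverter: i_dc = Sa*ia + Sb*ib + Sc*ic for switch states (Sa, Sb, Sc)."""
    sa, sb, sc = states
    return sa * ia + sb * ib + sc * ic


# For each active vector: which phase current i_dc equals, and with what sign.
# E.g. (0, 1, 1) gives i_dc = ib + ic = -ia because ia + ib + ic = 0.
VECTOR_TO_PHASE = {
    (1, 0, 0): ("a", +1), (0, 1, 1): ("a", -1),
    (0, 1, 0): ("b", +1), (1, 0, 1): ("b", -1),
    (0, 0, 1): ("c", +1), (1, 1, 0): ("c", -1),
}


def reconstruct(samples):
    """Rebuild (ia, ib, ic) from DC-link samples [(states, i_dc), ...] taken
    during two adjacent active vectors of one PWM period; the third current
    follows from ia + ib + ic = 0."""
    known = {}
    for states, idc in samples:
        phase, sign = VECTOR_TO_PHASE[states]   # zero vectors carry no information
        known[phase] = sign * idc
    if len(known) != 2:
        raise ValueError("need samples from two vectors that measure different phases")
    missing = ({"a", "b", "c"} - set(known)).pop()
    known[missing] = -sum(known.values())
    return known["a"], known["b"], known["c"]


ia, ib, ic = 3.0, -1.0, -2.0
samples = [(v, dc_link_current(v, ia, ib, ic)) for v in [(1, 0, 0), (1, 1, 0)]]
print(reconstruct(samples))
```

Adjacent active vectors always measure two different phases. This is why one DC-link sensor can stand in for all three phase sensors, provided each active vector lasts long enough to be sampled.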

81

Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems  

DEFF Research Database (Denmark)

This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each task, such that transient faults are tolerated, the timing constraints of the application are satisfied, and the energy consumed is minimized. Tasks are scheduled using fixed-priority preemptive scheduling, while replication is used for recovery from multiple transient faults. Addressing energy and reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We present a Tabu Search-based approach that uses an energy/reliability trade-off model to find reliable and schedulable implementations with limited energy and hardware resources. We evaluate the proposed algorithm using several synthetic and real-life benchmarks.

Gan, Junhe; Gruian, Flavius

2011-01-01
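The abstract notes that lowering the voltage raises the transient fault rate. One widely used model of this effect is `lambda(f) = lambda0 * 10^(d * (1 - f) / (1 - f_min))` for normalized frequency `f`. It is assumed here and is not necessarily the model Gan et al. use. The sketch picks, for a single task, the lowest-energy frequency level that still meets a reliability goal. All constants are hypothetical, and energy assumes voltage proportional to frequency. Replication and Tabu search are not modeled.

```python
import math

LAMBDA0 = 1e-6   # hypothetical fault rate at maximum frequency (per cycle unit)
D = 3.0          # hypothetical sensitivity of the fault rate to scaling
F_MIN = 0.4      # lowest normalized frequency level


def fault_rate(f: float) -> float:
    """Assumed exponential fault-rate model: rises as frequency (and voltage) drops."""
    return LAMBDA0 * 10 ** (D * (1 - f) / (1 - F_MIN))


def energy(cycles: float, f: float) -> float:
    """Dynamic energy ~ cycles * V^2, with V assumed proportional to f."""
    return cycles * f ** 2


def reliability(cycles: float, f: float) -> float:
    """Probability of no transient fault during the task's execution time cycles/f."""
    return math.exp(-fault_rate(f) * cycles / f)


def best_level(cycles: float, levels: list[float], r_goal: float):
    """Lowest-energy level meeting the reliability goal, or None if none does."""
    feasible = [f for f in levels if reliability(cycles, f) >= r_goal]
    return min(feasible, key=lambda f: energy(cycles, f)) if feasible else None


levels = [0.4, 0.6, 0.8, 1.0]
for goal in (0.5, 0.98, 0.99):
    print(goal, best_level(1000, levels, goal))
```

A stricter reliability goal pushes the choice toward higher frequencies. Replication, as in the paper, offers another way to meet the goal while still scaling down.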

82

Enhanced fault-tolerant quantum computing in d-level systems.  

Science.gov (United States)

Error-correcting codes protect quantum information and form the basis of fault-tolerant quantum computing. Leading proposals for fault-tolerant quantum computation require codes with an exceedingly rare property, a transversal non-Clifford gate. Codes with the desired property are presented for d-level qudit systems with prime d. The codes use n=d-1 qudits and can detect up to ~d/3 errors. We quantify the performance of these codes for one approach to quantum computation known as magic-state distillation. Unlike prior work, we find performance is always enhanced by increasing d. PMID:25526106

Campbell, Earl T

2014-12-01

83

The fault-tolerant multiprocessor computer  

Science.gov (United States)

The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systems are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, acceptance-test/diagnostic, applications, and support software; FTMP reliability and availability models; SIFT hardware design; and SIFT validation and verification.

Smith, T. B., III (editor); Lala, J. H. (editor); Goldberg, J. (editor); Kautz, W. H. (editor); Melliar-Smith, P. M. (editor); Green, M. W. (editor); Levitt, K. N. (editor); Schwartz, R. L. (editor); Weinstock, C. B. (editor); Palumbo, D. L. (editor)

1986-01-01

84

Fault Tolerant Computer Architecture  

CERN Document Server

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

Sorin, Daniel

2009-01-01

85

A Hybrid Real-time Fault-tolerant Scheduling Algorithm for Partial Reconfigurable System  

Directory of Open Access Journals (Sweden)

A partially reconfigurable system is an architecture consisting of general-purpose processors and FPGAs, in which the FPGA can be reconfigured at run time. In this architecture, software tasks and hardware tasks, executed on the processor and the FPGA respectively, co-exist. In this paper, a real-time fault-tolerant scheduling algorithm is proposed to schedule hybrid software/hardware tasks. In the algorithm, a sufficient condition for the schedulability of hybrid tasks is derived by analyzing the system operating conditions when the first deadline is missed. Rollback/recovery is used to schedule the software subtasks for fault tolerance, and TMR is used for the hardware subtasks. The experimental results demonstrate that all deadlines of accepted hybrid tasks are met and that, when multiple faults occur, processor utilization is greatly increased compared with existing approaches.

Jinyong Yin

2012-11-01
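For the hardware subtasks protected by TMR, the core mechanism is a 2-out-of-3 majority vote. A minimal bitwise voter might look like this. The function names are hypothetical. The second function also reports which replica disagrees, which is the kind of information needed to trigger repair or reconfiguration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a: int, b: int, c: int) -> list[int]:
    """Indices (0, 1, 2) of replicas whose output differs from the voted result."""
    voted = tmr_vote(a, b, c)
    return [i for i, v in enumerate((a, b, c)) if v != voted]


# Replica 2 suffers an upset in two bits; the vote masks it.
print(bin(tmr_vote(0b1010, 0b1010, 0b0110)), disagreeing_replicas(0b1010, 0b1010, 0b0110))
```

Because the vote is taken per bit, it also masks a different single-bit upset in each replica. For example, 0b1110, 0b1101 and 0b1011 still vote to 0b1111. It cannot mask two replicas that are wrong in the same bit.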

86

Diagnosis and Fault-tolerant Control  

DEFF Research Database (Denmark)

The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Four case studies on pilot processes show the applicability of the presented methods. The theoretical results are illustrated by two running examples which are used throughout the book. The book addresses engineering students, engineers in industry and researchers who wish to get a survey over the variety of approaches to process diagnosis and fault-tolerant control.

Blanke, Mogens; Kinnaert, Michel

2003-01-01

87

An Efficient Fault Tolerance System Design for CMOS/Nanodevice Digital Memories  

Directory of Open Access Journals (Sweden)

Targeting future fault-prone hybrid CMOS/nanodevice digital memories, this paper presents two fault-tolerance design approaches that integrally address the tolerance of defects and transient faults. These two approaches share several key features, including the use of a group of Bose-Chaudhuri-Hocquenghem (BCH) codes for both defect tolerance and transient fault tolerance, and the integration of BCH code selection with dynamic logical-to-physical address mapping. A new BCH decoder model is proposed to reduce the area and simplify the computational scheduling of both the syndrome and Chien search blocks without parallelism, leading to high throughput. The goal of fault-tolerant computing is to improve the dependability of systems, where dependability can be defined as the ability of a system to deliver service at an acceptable level of confidence in either the presence or the absence of faults. The results of the simulation and implementation using the Xilinx ISE software and the LCD screen on the FPGA board are shown at the end.

D. Kavitha

2014-11-01
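The sketch below illustrates the syndrome computation that such decoder blocks perform. It uses the textbook binary BCH(15,7) code, which corrects t = 2 errors, over GF(16) with primitive polynomial x^4 + x + 1. The code choice is an assumption, since the paper uses a group of BCH codes, and this is not the paper's decoder. Only single-error location is shown: one error at position p gives S1 = alpha^p and S3 = S1^3. Full two-error decoding (Berlekamp-Massey plus Chien search) is omitted.

```python
def _build_gf16():
    """Antilog (EXP) and log (LOG) tables for GF(16) built from x^4 + x + 1."""
    exp, log = [0] * 15, [0] * 16
    x = 1
    for i in range(15):
        exp[i] = x
        log[x] = i
        x <<= 1
        if x & 0x10:            # degree reached 4: reduce modulo x^4 + x + 1
            x ^= 0b10011
    return exp, log


EXP, LOG = _build_gf16()
N = 15                          # code length
# Generator g(x) = x^8 + x^7 + x^6 + x^4 + 1, as coefficients of x^0 .. x^14.
GENERATOR = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def syndromes(bits, t=2):
    """S_j = r(alpha^j) for j = 1..2t, where bits[i] is the coefficient of x^i."""
    out = []
    for j in range(1, 2 * t + 1):
        s = 0
        for i, bit in enumerate(bits):
            if bit:
                s ^= EXP[(i * j) % N]   # add alpha^(i*j); addition in GF(2^m) is XOR
        out.append(s)
    return out


def locate_single_error(bits):
    """Return -1 for a valid codeword, the error position if the syndromes
    match exactly one bit error, or None otherwise."""
    s1, _, s3, _ = syndromes(bits)
    if s1 == 0 and s3 == 0:
        return -1
    if s1 == 0:
        return None
    p = LOG[s1]                                     # one error at p: S1 = alpha^p
    return p if EXP[(3 * p) % N] == s3 else None    # consistency: S3 = S1^3


received = GENERATOR.copy()
received[5] ^= 1                # inject a single bit error
print(syndromes(GENERATOR), locate_single_error(received))
```

In hardware, the inner XOR accumulation corresponds to the syndrome block. A Chien search would replace the table lookup `LOG[s1]` by evaluating the error-locator polynomial at every alpha^i.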

88

Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems  

DEFF Research Database (Denmark)

This paper proposes a new diagnosis method for open-switch faults and a fault-tolerant control strategy for T-type three-level inverter systems. The location of the faulty switch can be identified from the average of the normalized phase current and the change in the neutral-point voltage. The proposed fault-tolerant strategy is explained by dividing it into two cases: a fault in the half-bridge switches and a fault in the neutral-point switches. The performance of the T-type inverter system improves considerably with the proposed fault-tolerant algorithm when a switch fails. The proposed method does not require additional components or complex calculations. Simulation and experimental results verify the feasibility of the proposed fault diagnosis and fault-tolerant control strategy.

Choi, Uimin; Lee, Kyo Beum

2014-01-01
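Here is a simplified sketch of the current-based part of the diagnosis. Each phase current is averaged over one fundamental period and normalized by its peak, and a phase whose average departs clearly from zero is flagged. The threshold, labels and the idealized clipped waveform are hypothetical. The neutral-point voltage check, which the paper uses to tell half-bridge faults from neutral-point switch faults, is not modeled.

```python
import math


def normalized_average(currents: list[float]) -> float:
    """Mean of a phase current over one fundamental period divided by its peak magnitude."""
    peak = max(abs(i) for i in currents)
    return 0.0 if peak == 0 else sum(currents) / (len(currents) * peak)


def diagnose(phase_currents: dict, threshold: float = 0.2) -> dict:
    """Flag phases whose normalized average current departs from zero.

    A healthy sinusoidal current averages to about zero. A strongly negative
    average means the positive half-cycle is missing or distorted, and a
    strongly positive one means the negative half-cycle is affected.
    """
    result = {}
    for name, samples in phase_currents.items():
        avg = normalized_average(samples)
        if avg < -threshold:
            result[name] = "fault suspected on positive-current path"
        elif avg > threshold:
            result[name] = "fault suspected on negative-current path"
    return result


N = 200
healthy = [math.sin(2 * math.pi * k / N) for k in range(N)]
faulty = [min(v, 0.0) for v in healthy]     # idealized loss of the positive half-cycle
print(diagnose({"a": healthy, "b": faulty}))
```

In this idealized waveform the faulty phase averages about -1/pi, roughly -0.32, of its peak. This sits well beyond the assumed threshold, while the healthy phase stays near zero.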

89

Stochastic Models for Fault Tolerance  

CERN Document Server

As modern society relies on the fault-free operation of complex computing systems, system fault-tolerance has become an indispensable requirement. Therefore, we need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware elements. Redundancy patterns are commonly used, for either redundancy in space or redundancy in time. Wolter's book details methods of redundancy in time that need to be issued at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right ti

Wolter, Katinka M

2010-01-01
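The timeout selection problem for restart can be made concrete with one standard expression. Assume each attempt is independent with completion-time CDF F, and each restart costs an overhead c. Then the expected completion time when restarting every tau time units is `(integral_0^tau (1 - F(t)) dt + c * (1 - F(tau))) / F(tau)`. The sketch below evaluates it numerically. It is an illustration under these assumptions, not necessarily the exact formulation used in the book.

```python
import math


def expected_time_with_restart(cdf, tau: float, overhead: float = 0.0,
                               steps: int = 10_000) -> float:
    """Expected completion time when a task is restarted every `tau` time units.

    Assumes independent attempts with completion-time CDF `cdf`:
    E = (integral_0^tau (1 - F(t)) dt + overhead * (1 - F(tau))) / F(tau).
    The integral, which equals E[min(T, tau)], uses the trapezoidal rule.
    """
    f_tau = cdf(tau)
    if f_tau <= 0.0:
        return math.inf          # an attempt can never finish before the timeout
    h = tau / steps
    area = sum((1 - cdf(k * h)) + (1 - cdf((k + 1) * h)) for k in range(steps)) * h / 2
    return (area + overhead * (1 - f_tau)) / f_tau


# Memoryless (exponential, rate 2): restart neither helps nor hurts, E stays 1/2.
exp_cdf = lambda t: 1 - math.exp(-2.0 * t)
# Heavy-ish tail: half the runs are fast (rate 10), half slow (rate 0.1); mean 5.05.
mix_cdf = lambda t: 1 - 0.5 * math.exp(-10 * t) - 0.5 * math.exp(-0.1 * t)
print(expected_time_with_restart(exp_cdf, 1.0))
print(expected_time_with_restart(mix_cdf, 1.0))
```

The contrast between the two distributions is the crux of the problem. Restart pays off only when the completion time has a long tail, and the best tau depends on that tail and on the restart overhead.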

90

Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network  

Directory of Open Access Journals (Sweden)

Mobile agents are computer programs that act autonomously on behalf of a user or owner and travel through a network of heterogeneous machines. Fault tolerance is important along their itinerary. In this paper, existing methods of fault tolerance for mobile agents, which assume a linear network topology, are described. These methods use three types of agents that cooperate to detect and recover from server and agent failures. The actual agent performs programs for its owner. The witness agent monitors the actual agent and the witness agent after itself. The probe is sent to recover the actual agent or the witness agent on the side of the witness agent. Communication in these methods is by message passing between the agents. We introduce our witness agent approach for fault-tolerant mobile agent systems in a two-dimensional mesh (2D-Mesh) network. Our approach minimizes witness dependency in this network, and we present its algorithm.

Ahmad Rostami

2010-09-01

91

Problems, issues and techniques in validating a fault tolerant system for reactor control  

International Nuclear Information System (INIS)

Nuclear reactor control systems, and many other safety-related systems, have reliability requirements that can only be met by fault-tolerant systems. While some commercially available systems are, or claim to be, fault tolerant, it is necessary to validate that this is indeed true. This raises the issue of whether the analysis used to predict reliability properly reflects the actual system. Another issue is whether the installed system agrees with the design. This covers both hardware and software questions, including topics such as quality assurance and testing. Techniques to address these issues are described, and the research needed to advance these techniques to the point of practical utility is outlined.

92

To err is robotic, to tolerate immunological: fault detection in multirobot systems.  

Science.gov (United States)

Fault detection and fault tolerance represent two of the most important and largely unsolved issues in the field of multirobot systems (MRS). Efficient, long-term operation requires an accurate, timely detection, and accommodation of abnormally behaving robots. Most existing approaches to fault-tolerance prescribe a characterization of normal robot behaviours, and train a model to recognize these behaviours. Behaviours unrecognized by the model are consequently labelled abnormal or faulty. MRS employing these models do not transition well to scenarios involving temporal variations in behaviour (e.g., online learning of new behaviours, or in response to environment perturbations). The vertebrate immune system is a complex distributed system capable of learning to tolerate the organism's tissues even when they change during puberty or metamorphosis, and to mount specific responses to invading pathogens, all without the need of a genetically hardwired characterization of normality. We present a generic abnormality detection approach based on a model of the adaptive immune system, and evaluate the approach in a swarm of robots. Our results reveal the robust detection of abnormal robots simulating common electro-mechanical and software faults, irrespective of temporal changes in swarm behaviour. Abnormality detection is shown to be scalable in terms of the number of robots in the swarm, and in terms of the size of the behaviour classification space. PMID:25642825

Tarapore, Danesh; Lima, Pedro U; Carneiro, Jorge; Christensen, Anders Lyhne

2015-01-01

93

Architectures for fault-tolerant spacecraft computers  

Science.gov (United States)

This paper summarizes the results of a long-term research program in fault-tolerant computing for spacecraft on-board processing. In response to changing device technology this program has progressed from the design of a fault-tolerant uniprocessor to the development of fault-tolerant distributed computer systems. The unusual requirements of spacecraft computing are described along with the resulting real-time computer architectures. The following aspects of these designs are discussed: (1) architectural features to minimize complexity in the distributed computer system, (2) fault-detection and recovery, (3) techniques to enhance reliability and testability, and (4) design approaches for LSI implementation.

Rennels, D. A.

1978-01-01

94

New fault tolerant matrix converter  

Energy Technology Data Exchange (ETDEWEB)

The matrix converter (MC) is a promising topology that will have to overcome certain barriers (protection systems, durability, the development of converters for real applications, etc.) in order to gain a foothold in industry. In some applications, where continuous operation must be ensured in the case of a system failure, improved reliability of the converter is of particular importance. In this sense, this article focuses on the study of a fault tolerant MC. The fault tolerance of a converter is characterized by its total or partial response when any of its components breaks down. Since virtually no work has been done on fault tolerant MCs, this paper describes the most important studies in this area. Moreover, a new method is proposed for detecting the breakage of MC semiconductors. Likewise, a new variant of space vector modulation (SVM) with fault-tolerance capability is presented. This guarantees the continuous operation of the converter and the pseudo-optimum control of a PMSM. This paper also proposes a novel MC topology that allows flexible reconfiguration of the converter when one or several of its semiconductors are damaged. In this way, the MC can continue operating at 100% of its performance without having to double its resources. Thus, the solution described in this article represents a step forward towards the development of reliable matrix converters for real applications. (author)

Ibarra, Edorta; Andreu, Jon; Kortabarria, Inigo; Ormaetxea, Enekoitz; Alegria, Inigo Martinez de; Martin, Jose Luis [Department of Electronics and Telecommunications, University of the Basque Country, Alameda de Urquijo s/n, E-48013 Bilbao (Spain); Ibanez, Pedro [TECNALIA, Energy Unit, Parque Tecnologico de Zamudio, E-48170 Bizkaia (Spain)

2011-02-15

95

Spacecraft formation stabilization and fault tolerance: a state-varying switched system approach  

OpenAIRE

The focus of this paper is the spacecraft formation flying problem where the formation switches successively among multiple shapes, the number and the composition of spacecraft in the group may change among these shapes, and some spacecraft may be faulty. The whole flying process is modeled as a state-varying switched system. The formation stability and fault tolerability are analyzed by using new results on state-varying switched systems.

Yang, Hao; Jiang, Bin; Cocquempot, Vincent; Chen, Mou

2013-01-01

96

Integrity-Enhancing Replica Coordination for Byzantine Fault Tolerant Systems  

CERN Document Server

Strong replica consistency is often achieved by writing deterministic applications, or by using a variety of mechanisms to render replicas deterministic. There exists a large body of work on how to render replicas deterministic under the benign fault model. However, when replicas can be subject to malicious faults, most of the previous work is no longer effective. Furthermore, the determinism of the replicas is often considered harmful from the security perspective, and for many applications, their integrity strongly depends on the randomness of some of their internal operations. This calls for new approaches towards achieving replica consistency while preserving replica randomness. In this paper, we present two such approaches. One is based on Byzantine agreement and the other on threshold coin-tossing. Each approach has its strengths and weaknesses. We compare the performance of the two approaches and outline their respective best use scenarios.

Zhao, Wenbing

2008-01-01

97

Fault tolerant safety related computer based process control system for TAPP- 3 and 4  

International Nuclear Information System (INIS)

Computer-based control systems for safety-related applications in nuclear power plants must meet not only functional, performance and interface requirements, but also regulatory requirements such as enhanced reliability, safety and security. While meeting these stringent requirements, such computer-based systems also need to ensure high availability. The availability of these safety-related systems directly influences the commercial operation of the NPP and the availability of several megawatts of electrical power to the national grid. Several design features, such as fault tolerance, on-line diagnostics and self-supervision, must be incorporated into the computer system architecture, hardware design and software design to meet high reliability and high availability criteria. The Reactor Control Division (RCnD) has designed and developed the 'Dual Processor Hot Standby' (DPHS) fault-tolerant architecture, which not only meets the safety requirements but also provides very high availability. This paper highlights the fault-tolerant features of the DPHS architecture and the design of the Process Control System based on it (DPHS-PCS) for TAPP-3 and 4. DPHS-PCS for Tarapur Atomic Power Project (TAPP) -3 and 4 regulates Primary Heat Transport (PHT) system pressure, pressuriser pressure, pressuriser level, bleed condenser pressure, bleed condenser level and steam generator pressure. (author)
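
The selection principle behind a dual-processor hot-standby arrangement can be sketched in a few lines. The sketch assumes that each processor reports a single health flag from its on-line diagnostics and that the standby tracks the plant state closely enough for a bumpless transfer; the class names are hypothetical and do not describe the RCnD implementation.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    healthy: bool = True  # result of on-line diagnostics / self-supervision


class HotStandbyPair:
    """Hypothetical hot-standby selection logic for two redundant processors."""

    def __init__(self, first: Processor, second: Processor):
        self.active, self.standby = first, second

    def select_active(self) -> str:
        # Switch over only when the active unit fails and the standby is healthy.
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        if not self.active.healthy:
            # Both units have failed: the plant must fall back to a safe state.
            raise RuntimeError("no healthy processor available")
        return self.active.name
```

A repaired unit is not switched back automatically; it simply becomes the new standby, which avoids an unnecessary second transfer.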

98

Fault Tolerant Control: A Simultaneous Stabilization Result  

DEFF Research Database (Denmark)

This paper discusses the problem of designing fault tolerant compensators that stabilize a given system both in the nominal situation and in the situation where one of the sensors or one of the actuators has failed. It is shown that such compensators always exist, provided that the system is detectable from each output and that it is stabilizable. The proof of this result is constructive, and a worked example shows how to design a fault tolerant compensator for a simple, yet challenging system. A family of second-order systems is described that requires fault tolerant compensators of arbitrarily high order. Publication date: February.

Stoustrup, Jakob; Blondel, V.D.

2004-01-01

99

Fault tolerant synchronization of chaotic systems based on T-S fuzzy model with fuzzy sampled-data controller  

Science.gov (United States)

In this paper, the fault tolerant synchronization of two chaotic systems based on a fuzzy model and sampled data is investigated. The fault tolerant synchronization problem is formulated as a study of the global asymptotic stability of the error system under a fuzzy sampled-data controller, which contains a state feedback controller and a fault compensator. Synchronization can be achieved whether or not a fault occurs. To investigate the stability of the error system and facilitate the design of the fuzzy sampled-data controller, a Takagi-Sugeno (T-S) fuzzy model is employed to represent the chaotic system dynamics. To obtain good performance and a less conservative analysis result, a new parameter-dependent Lyapunov-Krasovskii functional and a relaxed stabilization technique are considered. Stability conditions based on linear matrix inequalities are obtained to achieve fault tolerant synchronization of the chaotic systems. Finally, a numerical simulation is presented to verify the results.

Ma, Da-Zhong; Zhang, Hua-Guang; Wang, Zhan-Shan; Feng, Jian

2010-05-01

100

Failure transition distance-based importance sampling schemes for the simulation of repairable fault-tolerant computer systems  

OpenAIRE

Markov models are often used to evaluate dependability attributes of fault-tolerant computer systems. The use in practice of Markov models is, however, hampered by the well-known state space explosion problem. Simulation alleviates the problem. For Markov models of repairable fault-tolerant systems, standard simulation of dependability measures tends to be expensive due to the rarity of the system failure event. Importance sampling can speed up the simulation. This paper develops two importan...

Carrasco, Juan A.

2006-01-01

101

Modeling Run-Time Distributions in Passively Replicated Fault-Tolerant Systems  

OpenAIRE

Many real-time applications have strict reliability requirements in addition to timing requirements. To fulfill these reliability requirements, it may be necessary to use a fault-tolerance strategy. An active replication strategy, where several instances of the task are run in parallel, is the preferred choice for many real-time systems, as the parallel execution of the task instances gives a high probability that at least some of the instances finish successfully before the deadlines...

Tjora, Åsmund

2007-01-01
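
The appeal of active replication follows from a simple calculation. Assuming each replica independently finishes before its deadline with probability p, at least one of k replicas does so with probability 1 - (1 - p)^k. The helper functions below are hypothetical; real replicas often share failure causes, which makes the independence assumption optimistic.

```python
def p_at_least_one_on_time(p_single: float, replicas: int) -> float:
    """P(at least one of `replicas` independent instances finishes before the deadline)."""
    return 1.0 - (1.0 - p_single) ** replicas


def replicas_needed(p_single: float, target: float) -> int:
    """Smallest number of active replicas whose success probability reaches `target`."""
    if not 0.0 < p_single <= 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 < p_single <= 1 and 0 < target < 1")
    k = 1
    while p_at_least_one_on_time(p_single, k) < target:
        k += 1
    return k
```

For example, replicas that each meet the deadline with probability 0.9 need three parallel instances to reach a 0.995 target.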

102

Failure distance-based simulation of repairable fault-tolerant systems  

OpenAIRE

This paper presents a new importance sampling scheme called failure distance biasing for the efficient simulation of Markovian models of repairable fault-tolerant systems. The new scheme enriches the previously proposed failure biasing scheme by exploiting the concept of failure distance. This results in a much more efficient simulation, with speedups over failure biasing of orders of magnitude in typical cases. The paper also discusses the efficient implementation of the new importance sampling scheme...

Carrasco, Juan A.

1992-01-01
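
The basic failure biasing idea that this scheme builds on can be sketched on a toy model. Assume (hypothetically) a system of N identical components in which, from any state with k failed components (0 < k < N), the next event is a failure with probability p and a repair otherwise. The quantity of interest is γ, the probability of reaching the all-failed state before the fully repaired state, starting just after the first failure. Failure biasing raises the failure probability to a bias value and multiplies each sample by the likelihood ratio so the estimator stays unbiased. This sketch shows plain failure biasing only, not the failure distance refinement described above; for this birth-death chain γ also has a gambler's-ruin closed form, which serves as a check.

```python
import random


def failure_biasing_estimate(n_components, p_fail, bias, n_runs, seed=0):
    """Importance-sampling estimate of gamma = P(reach state N before state 0 | start in 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        state, weight = 1, 1.0
        while 0 < state < n_components:
            if rng.random() < bias:               # sample a failure with the biased probability
                state += 1
                weight *= p_fail / bias           # likelihood ratio of this transition
            else:                                 # sample a repair
                state -= 1
                weight *= (1.0 - p_fail) / (1.0 - bias)
        if state == n_components:                 # system failure reached: score the weight
            total += weight
    return total / n_runs


def exact_gamma(n_components, p_fail):
    """Gambler's-ruin probability of hitting N before 0 from state 1 (requires p_fail != 0.5)."""
    r = (1.0 - p_fail) / p_fail
    return (1.0 - r) / (1.0 - r ** n_components)
```

With p = 0.01 and N = 3, γ is about 1.0e-4, so standard simulation would need on the order of a million regenerative cycles to observe about a hundred system failures, whereas the biased estimator with 20,000 runs typically lands within a few percent of the exact value.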

103

Simulation of steady-state availability models of fault-tolerant systems with deferred repair  

OpenAIRE

This paper targets the simulation of continuous-time Markov chain models of fault-tolerant systems with deferred repair. We start by stating sufficient conditions for a given importance sampling scheme to satisfy the bounded relative error property. Using those sufficient conditions, it is noted that many previously proposed importance sampling schemes such as failure biasing and balanced failure biasing satisfy that property. Then, we adapt the importance sampling schemes failure transition ...

Carrasco, Juan A.

2006-01-01

104

Hierarchical object-oriented modeling of fault-tolerant computer systems  

OpenAIRE

This paper gives an overview of a hierarchical, object-oriented modeling language for the specification of dependability models for complex fault-tolerant computer systems. The language incorporates the hierarchical notions of cluster, operational mode and configuration and borrows from object-oriented programming the concepts of class, parameterization, and instantiation. These features together result in a highly expressive environment allowing the concise specification of sophisticated dependability mode...

Carrasco, Juan A.

1991-01-01

105

Output Feedback Robust H∞ Control of Uncertain Active Fault Tolerant Control Systems via Convex Analysis  

OpenAIRE

This paper deals with the problem of H∞ and robust H∞ control, via dynamic output feedback, of continuous-time Active Fault Tolerant Control Systems with Markovian Parameters (AFTCSMP) subject to both structured and unstructured parameter uncertainties. This problem is addressed with a convex programming approach. Indeed, the fundamental tool in the analysis is an LMI (Linear Matrix Inequalities) characterization of dynamical compensators that stochastically (robustly)...

Aberkane, Samir; Sauter, Dominique; Ponsart, Jean-christophe

2007-01-01

106

A Study on Fault-Tolerant Software Architecture for COTS-Based Dependable System  

International Nuclear Information System (INIS)

Recently, with the rapid development of digital computers and information processing technologies, nuclear instrumentation and control (I and C) systems that perform safety-critical functions have adopted digital technologies. The use of commercial off-the-shelf (COTS) software in safety-critical systems has also increased, for reasons such as economic efficiency and technical issues. However, it requires considerable integration effort and raises software quality and safety issues. COTS software is usually provided as a black box that cannot be modified. The biggest problem when integrating such a product into a dependable system is the reliability of the COTS software: there is no guarantee that it will perform its function correctly, and it may contain bugs or unidentified components. Software verification and validation (V and V) has been accepted as a way to assure the dependability of newly developed safety-critical nuclear I and C software, but because of the limitations of COTS software, V and V cannot be applied to it as rigorously as to newly developed software. Considerable attention has been paid to describing software architectures with respect to their dependability properties. In this paper, we present a fault-tolerant software architecture using the C2 architectural style. The remainder of the paper is organized as follows: Section 2 discusses background work on COTS software in nuclear I and C, software fault tolerance, and the C2 architectural style. Section 3 describes the architecture for fault-tolerant COTS-based software. Finally, we discuss the conclusions and future work.

107

Application of Joint Parameter Identification and State Estimation to a Fault-Tolerant Robot System  

DEFF Research Database (Denmark)

The joint parameter identification and state estimation technique is applied to develop a fault-tolerant space robot system. The potential faults in the considered system are abrupt parametric faults, meaning that some system parameters immediately deviate from their nominal values when a fault occurs. The system parameters of concern include deterministic parts as well as those describing the stochastic features of the system. For reconfigurable control design, these deviated parameters need to be identified as precisely and quickly as possible. Moreover, the reconfigurable design task would be further simplified, and system recovery possibly sped up, if information on the system state under the new operating conditions were available along with the faulty parameter information. Joint parameter identification and state estimation using combined Kalman Filter and Maximum Likelihood (KF-ML) techniques is discussed and applied in this study. Simulation results on a space robot system show that the proposed method is quite promising for providing both faulty parameter information and state estimates quickly, accurately and robustly.

Sun, Zhen; Yang, Zhenyu

2011-01-01
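
The idea of estimating a faulty parameter jointly with the state can be illustrated with a deliberately simpler substitute for the KF-ML combination used in the paper: an augmented-state Kalman filter. The sketch assumes a hypothetical scalar plant x[k+1] = a·x[k] + u[k] + d[k] + w[k], measured as y[k] = x[k] + v[k], where d is an actuator bias that jumps from 0 to a fault value at a chosen step and is modelled in the filter as a slow random walk. Because this fault enters additively, the augmented model stays linear; multiplicative parametric faults such as those in the paper would need a nonlinear filter or the ML step.

```python
import random


def estimate_bias_fault(a=0.9, fault=0.5, fault_step=100, steps=400, seed=0):
    """Track x and an actuator bias d with a Kalman filter on the augmented state [x, d].

    Returns the history of bias estimates, one per time step.
    """
    rng = random.Random(seed)
    q_x, q_d, r = 1e-4, 1e-4, 1e-2      # assumed process, bias-drift and measurement noise variances
    x_true = 0.0
    z = [0.0, 0.0]                      # filter estimate of [x, d]
    P = [[1.0, 0.0], [0.0, 1.0]]        # estimate covariance
    history = []
    for k in range(steps):
        u = 1.0 if (k // 50) % 2 == 0 else -1.0   # square-wave test input
        d_true = fault if k >= fault_step else 0.0
        # Simulate the plant and take a noisy measurement of the next state.
        x_true = a * x_true + u + d_true + rng.gauss(0.0, q_x ** 0.5)
        y = x_true + rng.gauss(0.0, r ** 0.5)
        # Predict with F = [[a, 1], [0, 1]]: P <- F P F^T + Q.
        z = [a * z[0] + z[1] + u, z[1]]
        P = [[a * a * P[0][0] + 2 * a * P[0][1] + P[1][1] + q_x, a * P[0][1] + P[1][1]],
             [a * P[1][0] + P[1][1], P[1][1] + q_d]]
        # Update with H = [1, 0].
        s = P[0][0] + r
        gain = [P[0][0] / s, P[1][0] / s]
        innovation = y - z[0]
        z = [z[0] + gain[0] * innovation, z[1] + gain[1] * innovation]
        P = [[P[0][0] - gain[0] * P[0][0], P[0][1] - gain[0] * P[0][1]],
             [P[1][0] - gain[1] * P[0][0], P[1][1] - gain[1] * P[0][1]]]
        history.append(z[1])
    return history
```

The bias estimate should stay near zero before the fault step and settle near the injected value afterwards; a detector could flag the fault when the estimate leaves a band around zero, and a reconfigured controller could compensate for it in the command.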

108

Plan for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System  

Science.gov (United States)

This report presents the plan for the characterization of the effects of high intensity radiated fields on a prototype implementation of a fault-tolerant data communication system. Various configurations of the communication system will be tested. The prototype system is implemented using off-the-shelf devices. The system will be tested in a closed-loop configuration with extensive real-time monitoring. This test is intended to generate data suitable for the design of avionics health management systems, as well as redundancy management mechanisms and policies for robust distributed processing architectures.

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

2008-01-01

109

Fault-tolerant embedded system design and optimization considering reliability estimation uncertainty  

International Nuclear Information System (INIS)

In this paper, we model embedded system design and optimization, considering component redundancy and uncertainty in the component reliability estimates. The systems being studied consist of software embedded in associated hardware components. Very often, component reliability values are not known exactly. Therefore, for reliability analysis studies and system optimization, it is meaningful to treat component reliability estimates as random variables with associated estimation uncertainty. In this new research, the system design process is formulated as a multiple-objective optimization problem to maximize an estimate of system reliability and also to minimize the variance of the reliability estimate. The two objectives are combined by penalizing the variance for prospective solutions. The two most common fault-tolerant embedded system architectures, N-Version Programming and Recovery Block, are considered as strategies to improve system reliability by providing system redundancy. Four distinct models are presented to demonstrate the proposed optimization techniques with or without redundancy. For many design problems, multiple functionally equivalent software versions have correlated failures even if they have been developed independently. The failure correlation may result from faults in the software specification, faults in the voting algorithm, and/or related faults in any two software versions. Our approach considers this correlation in formulating practical optimization models. Genetic algorithms with a dynamic penalty function are applied to solve this optimization problem, and reasonable and interesting results are obtained and discussed.
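
The two architectures named above differ in how they use redundant versions, which a minimal sketch makes concrete. N-Version Programming runs all versions and votes on their outputs; a Recovery Block runs a primary, checks its result with an acceptance test, and falls back to alternates. The functions below are hypothetical illustrations (outputs must be hashable for the vote), not the models optimized in the paper.

```python
from collections import Counter


def n_version_vote(versions, x):
    """N-Version Programming: run every version on x and return the majority output."""
    outputs = [version(x) for version in versions]
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority: the versions disagree")
    return value


def recovery_block(primary, alternates, acceptance_test, x):
    """Recovery Block: try the primary, then each alternate, until a result passes the test."""
    for module in (primary, *alternates):
        try:
            result = module(x)
        except Exception:
            continue  # a crash is treated like a failed acceptance test
        if acceptance_test(x, result):
            return result
    raise RuntimeError("every module failed the acceptance test")
```

The voter also shows why failure correlation matters: if two of three versions share the same fault (say, both compute v*v + 1 instead of v*v), they outvote the correct version and the wrong value is returned.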

110

Survey On Fault Tolerance In Grid Computing  

Directory of Open Access Journals (Sweden)

Grid computing is defined as a hardware and software infrastructure that enables coordinated resource sharing within dynamic organizations. In grid computing, the probability of a failure is much greater than in traditional parallel computing. Therefore, fault tolerance is an important property for achieving reliability, availability and QoS. In this paper, we survey various fault tolerance techniques, fault management in different systems, and related issues. A fault tolerance service deals with various types of resource failures, including process failures, processor failures and network failures. This survey presents related research results on fault tolerance in distinct functional areas of grid infrastructure, outlines future directions for fault tolerance techniques, and serves as a good reference for researchers.

P. Latchoumy

2011-12-01

111

Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems  

DEFF Research Database (Denmark)

Fluid power systems have been in use since 1795, when the first hydraulic press was patented by Joseph Bramah, and today form the basis of many industries. Electro-hydraulic servo systems are fluid power systems controlled in closed loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by means of pressurized hydraulic fluid. With the development of computing power and control techniques during the last few decades, they are used increasingly in many industrial fields that require high actuation forces within limited space. However, despite numerous attractive properties, hydraulic systems are always subject to potential leakage in their components, friction variation in their hydraulic actuators and deficiencies in their sensors. These deviations from normal behaviour reduce system performance and can lead to system failure if they are not detected early and handled. Moreover, controlling electro-hydraulic systems for high-performance operation is challenging because of the highly nonlinear behaviour of such systems and the large uncertainties in their models. This thesis focuses on nonlinear adaptive fault-tolerant control for a representative electro-hydraulic servo-controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identifies the model parameters using experimental data from a test bed by analysing both the time response to standard input signals and the variation of the outputs with different excitation frequencies. The thesis also presents a model that accurately describes the static and dynamic normal behaviour of the system. Further, a fault detector is designed and implemented on the test bed that successfully diagnoses internal or external leakages, friction variations in the actuator, and faults related to the pressure sensors.
The presented algorithm uses the position and pressure measurements to detect and isolate faults while avoiding missed detections and false alarms. The thesis also develops a high-performance adaptive nonlinear controller for the hydraulic system that outperforms comparable linear controllers widely used in industry. Because the controller is adaptive, uncertainties in the model parameters can be handled. Moreover, special attention is given to reducing the complexity of the controller in order to demonstrate its real-time implementation. Finally, the thesis combines the techniques developed for fault detection and nonlinear control to develop an active fault-tolerant controller for electro-hydraulic servo systems. To keep overall service and performance as high as possible when a fault occurs, the fault-tolerant control system prognoses the fault and changes its controller parameters or structure. The consequences of an unexpected fault are thus avoided, high availability is ensured, and overall safety in electro-hydraulic servo systems is increased.

Choux, Martin

2011-01-01
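
The trade-off between missed detections and false alarms mentioned above is often handled by thresholding a residual, i.e. the difference between a measurement and a model prediction, combined with a persistence requirement. The sketch below is a generic illustration with hypothetical names and parameters, not the detector designed in the thesis: a fault is declared only when the residual magnitude exceeds a threshold for several consecutive samples, so isolated noise spikes do not raise an alarm.

```python
from typing import Optional, Sequence


def detect_fault(residuals: Sequence[float], threshold: float, persistence: int) -> Optional[int]:
    """Return the index at which |residual| has exceeded `threshold` for
    `persistence` consecutive samples, or None if no fault is declared."""
    run = 0
    for i, residual in enumerate(residuals):
        run = run + 1 if abs(residual) > threshold else 0  # reset on any in-band sample
        if run >= persistence:
            return i
    return None
```

Raising `persistence` or `threshold` reduces false alarms at the cost of later, or missed, detections.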

112

Fault tolerant, multiplexed control rod position detection and indication system for nuclear power plants  

International Nuclear Information System (INIS)

The majority of Westinghouse nuclear plants placed in service thus far have incorporated a Rod Position Indication system based on an analog design philosophy. While meeting all functional and accuracy requirements, this system has proven somewhat cumbersome, particularly in initial field calibration and maintenance. This paper describes a new Digital Rod Position Indication (DRPI) system developed for use with pressurized water reactors. The system is based on a digital design philosophy and meets all previous design constraints and environmental requirements. In addition, it provides fault tolerance and improved accuracy, addresses interference from adjacent rods, and eliminates adjustments and calibration.

113

Reliability of computer systems and networks fault tolerance, analysis, and design  

CERN Document Server

With computers becoming embedded as controllers in everything from network servers to the routing of subway schedules to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks.

Shooman, Martin L

2002-01-01

114

Fault Tolerant Cache Schemes  

Science.gov (United States)

Most modern microprocessors employ on-chip cache memories to meet memory bandwidth demand. These caches now occupy a growing share of chip area. At the same time, continued downscaling of transistors increases the likelihood of defects in the cache area, which already occupies more than 50% of chip area. For this reason, various techniques have been proposed to tolerate defects in cache blocks. These techniques can be classified into three categories: cache line disabling, replacement with spare blocks, and decoder reconfiguration without spare blocks. This chapter examines each of these fault-tolerant techniques for an L1 cache of fixed, typical size and organization, through extensive simulation of each technique using the SPEC2000 benchmarks. The design and characteristics of each technique are summarized in order to evaluate the scheme. We then present our simulation results and a comparative study of the three methods.

Tu, H.-Yu.; Tasneem, Sarah
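
The first of the three categories, cache line disabling, can be sketched with a small set-associative LRU cache model in which each defective line simply removes one way from its set. This is a hypothetical illustration, not the chapter's simulation setup; the spare-block and decoder-reconfiguration schemes are not modelled.

```python
from collections import Counter, OrderedDict


class LineDisablingCache:
    """Set-associative LRU cache in which defective (set, way) lines are disabled."""

    def __init__(self, n_sets, ways, line_bytes, faulty_lines=()):
        self.n_sets, self.line_bytes = n_sets, line_bytes
        disabled = Counter(s for s, _ in set(faulty_lines))   # defective ways per set
        self.capacity = [max(ways - disabled[s], 0) for s in range(n_sets)]
        self.sets = [OrderedDict() for _ in range(n_sets)]    # tag -> None, in LRU order
        self.hits = self.misses = 0

    def access(self, addr):
        """Simulate one access; return True on a hit."""
        block = addr // self.line_bytes
        index, tag = block % self.n_sets, block // self.n_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)            # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if self.capacity[index] == 0:
            return False                      # whole set disabled: bypass the cache
        if len(lines) >= self.capacity[index]:
            lines.popitem(last=False)         # evict the least recently used line
        lines[tag] = None
        return False
```

For a two-set, two-way cache with 64-byte lines, an access pattern alternating between addresses 0 and 128 (both mapping to set 0) hits every other time; disabling one way of set 0 turns every one of those hits into a miss.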

115

Behavioral System-theoretic approach to fault-tolerant control  

OpenAIRE

The field of system and control theory has achieved an interdisciplinary status during the past five decades, and we refer to the theory that was developed during this period as the conventional control theory. This mainly relates to the study of automation and the design of controllers. A controller is a device that makes the interconnection with a given system so that the controlled system can behave in a desired way. In this thesis, we deal with the issues when the controlled system become...

Jain, Tushar

2012-01-01

116

A Robust and Fault-Tolerant Distributed Intrusion Detection System  

CERN Document Server

Since it is impossible to predict and identify all the vulnerabilities of a network, and penetration into a system by malicious intruders cannot always be prevented, intrusion detection systems (IDSs) are essential for ensuring the security of a networked system. To be effective, IDSs need to be accurate, adaptive, and extensible. Given these stringent requirements and the high level of vulnerability of present-day networks, designing an IDS has become a very challenging task. Although extensive research has been done on intrusion detection in distributed environments, distributed IDSs suffer from a number of drawbacks, e.g., high false-positive rates and low detection efficiency. In this paper, the design of a distributed IDS is proposed that consists of a group of autonomous and cooperating agents. In addition to its ability to detect attacks, the system is capable of identifying and isolating compromised nodes in the network thereby introduc...

Sen, Jaydip

2011-01-01

117

Fault-Tolerant Process Control Methods and Applications  

CERN Document Server

Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes that explicitly deal with actuator/controller failures, sensor faults and data losses. Specifically, the book puts forward: · a framework for detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; · controller reconfiguration and safe-parking-based fault-handling methodologies; · integrated data- and model-based fault detection and isolation and fault-tolerant control methods; · methods for handling sensor faults and data losses; and · ...

Mhaskar, Prashant; Christofides, Panagiotis D

2013-01-01

118

Local rollback for fault-tolerance in parallel computing systems  

Science.gov (United States)

A control logic device performs a local rollback in a parallel supercomputing system. The supercomputing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

Blumrich, Matthias A. (Yorktown Heights, NY); Chen, Dong (Yorktown Heights, NY); Gara, Alan (Yorktown Heights, NY); Giampapa, Mark E. (Yorktown Heights, NY); Heidelberger, Philip (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Steinmacher-Burow, Burkhard (Boeblingen, DE); Sugavanam, Krishnan (Yorktown Heights, NY)

2012-01-24
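
The control flow described in this abstract can be mirrored in software as a rough analogy. In the sketch below, the hypothetical `run_with_local_rollback` takes a checkpoint at the start of each interval, restarts the interval when a `TransientError` is raised, and lets an `UnrecoverableCondition` propagate. The patent describes hardware control logic working with the cache memory; the state copy and exception classes here are assumptions standing in for that mechanism.

```python
import copy


class TransientError(Exception):
    """A detected error that re-executing the interval can correct (e.g. a soft error)."""


class UnrecoverableCondition(Exception):
    """A condition that rules out local rollback (e.g. effects already visible outside)."""


def run_with_local_rollback(state, instructions, interval, max_retries=3):
    """Run `instructions` (callables mutating the dict `state`) in rollback intervals."""
    for start in range(0, len(instructions), interval):
        block = instructions[start:start + interval]      # one local rollback interval
        checkpoint = copy.deepcopy(state)                 # state at the interval start
        for _attempt in range(max_retries + 1):
            try:
                for instruction in block:
                    instruction(state)
                break                                     # interval finished without error
            except TransientError:
                state.clear()                             # error and no unrecoverable
                state.update(copy.deepcopy(checkpoint))   # condition: roll back, restart
        else:
            raise RuntimeError(f"interval starting at {start} failed {max_retries + 1} times")
    # UnrecoverableCondition is not caught: it escalates to a global recovery mechanism.
    return state
```

Restoring a fresh deep copy of the checkpoint on every rollback keeps repeated restarts of the same interval independent of one another.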

119

Fault-tolerant Agreement in Synchronous Message-passing Systems  

CERN Document Server

The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in the presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

Raynal, Michel

2010-01-01
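
The book treats crash, omission and Byzantine failures. To illustrate only the simplest case, here is a sketch of the classic flooding consensus algorithm for crash failures: with at most f crashes, f + 1 synchronous rounds ensure that all surviving processes hold the same set of values, so each can decide the minimum. The simulation model (a process crashing in a round reaches only some receivers in that round) is an assumption of this sketch and is not taken from the book.

```python
from typing import Dict, List, Set, Tuple


def flooding_consensus(
    proposals: List[int],
    crashes: Dict[int, Tuple[int, Set[int]]],
    f: int,
) -> Dict[int, int]:
    """Simulate (f + 1)-round flooding consensus in a synchronous system.

    `crashes` maps process id -> (round, receivers): the process crashes in
    that round after sending only to `receivers`, then stays silent.
    Returns the decision of every process that never crashed.
    """
    n = len(proposals)
    known: List[Set[int]] = [{v} for v in proposals]
    alive = set(range(n))
    for rnd in range(1, f + 2):  # rounds 1 .. f + 1
        inbox: List[Set[int]] = [set() for _ in range(n)]
        for p in sorted(alive):
            if p in crashes and crashes[p][0] == rnd:
                receivers = crashes[p][1]  # partial send, then crash
            else:
                receivers = set(range(n))
            for q in receivers:
                inbox[q] |= known[p]  # send values known at round start
        for p, (crash_round, _) in crashes.items():
            if crash_round == rnd:
                alive.discard(p)
        for q in range(n):
            known[q] |= inbox[q]
    return {p: min(known[p]) for p in alive}
```

In the usage below, process 1 crashes in round 1 after reaching only process 2; the extra round lets process 2 relay value 1 to everyone, so all survivors still agree.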

120

The Isis project: Fault-tolerance in large distributed systems  

Science.gov (United States)

This final status report covers activities of the Isis project during the first half of 1992. During the report period, the Isis effort has achieved a major milestone in its effort to redesign and reimplement the Isis system using Mach and Chorus as target operating system environments. In addition, we completed a number of publications that address issues raised in our prior work; some of these have recently appeared in print, while others are now being considered for publication in a variety of journals and conferences.

Birman, Kenneth P.; Marzullo, Keith

1993-01-01

121

An integrated methodology for the dynamic performance and reliability evaluation of fault-tolerant systems  

Energy Technology Data Exchange (ETDEWEB)

We propose an integrated methodology for the reliability and dynamic performance analysis of fault-tolerant systems. This methodology uses a behavioral model of the system dynamics, similar to the ones used by control engineers to design the control system, but also incorporates artifacts to model the failure behavior of each component. These artifacts include component failure modes (and associated failure rates) and how those failure modes affect the dynamic behavior of the component. The methodology bases the system evaluation on the analysis of the dynamics of the different configurations the system can reach after component failures occur. For each of the possible system configurations, a performance evaluation of its dynamic behavior is carried out to check whether its properties, e.g., accuracy, overshoot, or settling time, which are called performance metrics, meet system requirements. Markov chains are used to model the stochastic process associated with the different configurations that a system can adopt when failures occur. This methodology not only enables an integrated framework for evaluating dynamic performance and reliability of fault-tolerant systems, but also enables a method for guiding the system design process, and further optimization. To illustrate the methodology, we present a case study of a lateral-directional flight control system for a fighter aircraft.

Dominguez-Garcia, Alejandro D. [Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801-2918 (United States)], E-mail: aledan@UIUC.EDU; Kassakian, John G.; Schindall, Joel E. [Laboratory for Electromagnetic and Electronic Systems, Massachusetts Institute of Technology, Cambridge, MA 02139-4307 (United States)]; Zinchuk, Jeffrey J. [Charles Stark Draper Laboratory, Cambridge, MA 02139-3563 (United States)]

2008-11-15
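
A minimal sketch of how the two halves of the methodology combine: a continuous-time Markov chain gives the probability of each configuration at time t, and only configurations whose performance metric meets the requirement count toward success. The duplex example, its failure rate, and the settling-time figures are all hypothetical illustration values, not data from the paper; the chain is integrated with simple Euler steps rather than any particular solver.

```python
import math
from typing import Dict, List, Tuple

Transition = Tuple[str, str, float]  # (from_state, to_state, rate per hour)


def transient_probabilities(
    states: List[str],
    transitions: List[Transition],
    initial: str,
    t: float,
    steps: int = 10_000,
) -> Dict[str, float]:
    """Integrate the Kolmogorov forward equations dp/dt = pQ with Euler steps."""
    p = {s: 0.0 for s in states}
    p[initial] = 1.0
    dt = t / steps
    for _ in range(steps):
        flow = {s: 0.0 for s in states}
        for src, dst, rate in transitions:
            moved = p[src] * rate * dt  # probability mass leaving src this step
            flow[src] -= moved
            flow[dst] += moved
        for s in states:
            p[s] += flow[s]
    return p


# Hypothetical duplex actuator: two identical channels, each failing at rate lam.
lam = 1e-3  # assumed per-channel failure rate (1/h)
states = ["both_ok", "one_ok", "failed"]
transitions = [("both_ok", "one_ok", 2 * lam), ("one_ok", "failed", lam)]
# Assumed settling time (s) of each configuration, e.g. from a dynamic simulation.
settling_time = {"both_ok": 1.2, "one_ok": 2.5, "failed": math.inf}
requirement = 3.0  # assumed settling-time requirement (s)

p = transient_probabilities(states, transitions, "both_ok", t=1000.0)
prob_meets_spec = sum(p[s] for s in states if settling_time[s] <= requirement)
```

For this particular chain the result can be checked against the closed form for two parallel channels, 2e^(-λt) - e^(-2λt), which is about 0.600 at λt = 1.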

122

An integrated methodology for the dynamic performance and reliability evaluation of fault-tolerant systems  

International Nuclear Information System (INIS)

We propose an integrated methodology for the reliability and dynamic performance analysis of fault-tolerant systems. This methodology uses a behavioral model of the system dynamics, similar to the ones used by control engineers to design the control system, but also incorporates artifacts to model the failure behavior of each component. These artifacts include component failure modes (and associated failure rates) and how those failure modes affect the dynamic behavior of the component. The methodology bases the system evaluation on the analysis of the dynamics of the different configurations the system can reach after component failures occur. For each of the possible system configurations, a performance evaluation of its dynamic behavior is carried out to check whether its properties, e.g., accuracy, overshoot, or settling time, which are called performance metrics, meet system requirements. Markov chains are used to model the stochastic process associated with the different configurations that a system can adopt when failures occur. This methodology not only enables an integrated framework for evaluating dynamic performance and reliability of fault-tolerant systems, but also enables a method for guiding the system design process, and further optimization. To illustrate the methodology, we present a case study of a lateral-directional flight control system for a fighter aircraft.

123

Performance Evaluation of SDS Algorithm with Fault Tolerance for Distributed System  

Directory of Open Access Journals (Sweden)

In the recent past, security-sensitive applications such as electronic transaction processing systems and stock quote update systems, which require a high quality of security to guarantee authentication, integrity and confidentiality of information, have adopted Heterogeneous Distributed Systems (HDS) as their platforms. We systematically design a security-driven scheduling architecture that can dynamically measure the trust level of each node in the system using differential equations, and we introduce SRank to estimate the security overhead of critical tasks using the SDS algorithm. Furthermore, we can achieve a high quality of security for applications by using a security-driven scheduling algorithm for DAGs, evaluated in terms of makespan, risk probability and speedup. In addition, fault tolerance is included through the Security Driven Fault Tolerant Scheduling Algorithm (SDFT), which tolerates the failure of N processors at one time and introduces a new global scheduler to improve the efficiency of the scheduling process. Moreover, SDFT supports a flexible security policy applied to real-time tasks according to their security requirements and considers the effect of security overhead during scheduling. We also observe that the improvement obtained by our algorithm increases as the amount of security-sensitive data in applications increases.

K. Sathiya Bharathi

2012-07-01

124

Fault-tolerant routing in peer-to-peer systems  

CERN Document Server

We consider the problem of designing an overlay network and routing mechanism that permits finding resources efficiently in a peer-to-peer system. We argue that many existing approaches to this problem can be modeled as the construction of a random graph embedded in a metric space whose points represent resource identifiers, where the probability of a connection between two nodes depends only on the distance between them in the metric space. We study the performance of a peer-to-peer system where nodes are embedded at grid points in a simple metric space: a one-dimensional real line. We prove upper and lower bounds on the message complexity of locating particular resources in such a system, under a variety of assumptions about failures of either nodes or the connections between them. Our lower bounds in particular show that the use of inverse power-law distributions in routing, as suggested by Kleinberg (1999), is close to optimal. We also give efficient heuristics to dynamically maintain such a system as new...

Aspnes, James; Diamadi, Zoe; Shah, Gauri

2003-01-01
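
The one-dimensional setting studied in the paper can be explored with a short Python sketch. It assumes that each node keeps its two grid neighbours plus a few long-range links drawn with probability inversely proportional to distance (the Kleinberg-style harmonic distribution), and that routing is greedy around failed nodes. The function names, parameters and the give-up rule are illustrative assumptions, not the authors' construction.

```python
import random
from typing import Dict, List, Optional, Set


def build_overlay(n: int, links_per_node: int, rng: random.Random) -> Dict[int, Set[int]]:
    """Nodes 0..n-1 on a line: two grid neighbours plus `links_per_node`
    long-range links, each drawn with Pr[v] proportional to 1 / |u - v|."""
    graph: Dict[int, Set[int]] = {}
    for u in range(n):
        nbrs = {v for v in (u - 1, u + 1) if 0 <= v < n}
        others = [v for v in range(n) if v != u]
        weights = [1.0 / abs(u - v) for v in others]
        nbrs.update(rng.choices(others, weights=weights, k=links_per_node))
        graph[u] = nbrs
    return graph


def greedy_route(
    graph: Dict[int, Set[int]], src: int, dst: int, failed: Set[int]
) -> Optional[List[int]]:
    """Forward to the live neighbour closest to dst; give up if stuck."""
    path = [src]
    cur = src
    while cur != dst:
        live = [v for v in graph[cur] if v not in failed]
        if not live:
            return None
        nxt = min(live, key=lambda v: abs(v - dst))
        if abs(nxt - dst) >= abs(cur - dst):
            return None  # no live neighbour makes progress toward dst
        path.append(nxt)
        cur = nxt
    return path
```

Without failures, greedy routing always succeeds because a grid neighbour is always one step closer; the long-range links only shorten the path. With failures and no long-range links, a single failed node on the line already blocks delivery, which is why redundant links matter.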

125

Fault Tolerant Neural Network for ECG Signal Classification Systems  

Directory of Open Access Journals (Sweden)

The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN) to ECG classification systems. This ANN includes a penalization criterion that improves its performance in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT-BIH ECG signals have shown that significant robustness improvements are obtained with respect to potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performance compared to the standard back-propagation algorithm.

MERAH, M.

2011-08-01

126

Algorithmic Based Fault Tolerance Applied to High Performance Computing  

OpenAIRE

We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance technique (Huang and Abraham, 1984) to the needs of parallel distributed computation. We obtain a strongly scalable mechanism for fault tolerance. We can also detect and correct errors (bit flips) on the fly during a computation. To assess the viability of our approach, we have developed a fault tolerant matrix-m...

Bosilca, George; Delmas, Remi; Dongarra, Jack; Langou, Julien

2008-01-01
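
The underlying Huang and Abraham checksum idea can be shown in a small, single-process Python sketch. A column-checksum row is appended to A and a row-checksum column to B, so their product carries both checksums. A single corrupted entry then shows up as exactly one inconsistent row and one inconsistent column, and the row checksum restores it. Integer matrices are used here to avoid floating-point tolerances. This is the textbook scheme, not the paper's parallel, distributed adaptation.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop product of an n x k and a k x m matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a: Matrix) -> Matrix:
    """Append a row holding the sum of each column."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b: Matrix) -> Matrix:
    """Append a column holding the sum of each row."""
    return [row + [sum(row)] for row in b]


def locate_and_correct(cf: Matrix) -> Optional[Tuple[int, int]]:
    """Check a full-checksum product and fix one corrupted data entry in place.

    Returns the (row, col) that was corrected, or None if all checksums agree.
    """
    n, m = len(cf) - 1, len(cf[0]) - 1  # the data block is n x m
    bad_rows = [i for i in range(n) if sum(cf[i][:m]) != cf[i][m]]
    bad_cols = [j for j in range(m) if sum(cf[i][j] for i in range(n)) != cf[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("error pattern not correctable by this sketch")
    i, j = bad_rows[0], bad_cols[0]
    cf[i][j] += cf[i][m] - sum(cf[i][:m])  # restore using the row checksum
    return i, j
```

For example, with A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], corrupting entry (1, 0) of the checksum product is located and repaired back to 43.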

127

Fault-Tolerant Method in P2P Information Management Systems  

Directory of Open Access Journals (Sweden)

FissionE is a Kautz graph based infrastructure for P2P information management systems. It has the optimal network diameter for node degree d = 2. To address the degraded routing performance caused by node failures, this paper proposes a fault-tolerant routing algorithm for the FissionE system. The basic idea is to bypass failed nodes or links through a specific mechanism, so that FissionE achieves better routing performance.


2012-03-01

128

The NILE system architecture: fault-tolerant, wide-area access to computing and data resources  

International Nuclear Information System (INIS)

NILE is a multi-disciplinary project building a distributed computing environment for high-energy physics (HEP). It provides wide-area, fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though the goals and principles are applicable to many domains. NILE has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation providing a prototype design environment which will also be used by CLEO physicists. This paper focuses on the software and wide-area system architecture design and the computing issues involved in making NILE services highly available. (author)

129

Replicated R-Resilient Process Allocation for Load Distribution in Fault Tolerant System  

Directory of Open Access Journals (Sweden)

Process allocation for load distribution can improve system performance by utilizing resources efficiently. For primary-backup based fault-tolerant systems, a classic load-balancing process allocation method (the two-stage allocation algorithm) has been proposed that can balance the load both before and after faults occur. However, the two-stage allocation algorithm scales poorly, since its load-balancing performance drops dramatically when each primary process is replicated more than once (i.e., has more than one backup process). In this study, we present an improved algorithm named RSA (R-Stage Allocation algorithm) that balances the load better regardless of how many backup processes each primary process has. Simulations are used to compare the proposed algorithm with the two-stage allocation algorithm, and the experimental results show that, when extended to replicated R-resilient processes, RSA achieves significantly better load distribution than the two-stage allocation algorithm.

Jian Wang

2008-01-01
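
For readers unfamiliar with R-resilient placement, here is a minimal Python sketch of the allocation problem. Each primary process and its r backups must land on distinct nodes while node loads stay balanced. The sketch uses a plain greedy "largest process first, least-loaded node" heuristic. It is not the RSA algorithm from the paper, and the assumption that a backup costs a fixed fraction (`backup_fraction`) of its primary's load is purely illustrative. Re-balancing after a backup is promoted, which is the part RSA specifically targets, is not modelled.

```python
from typing import Dict, List, Tuple


def allocate(
    proc_loads: Dict[str, float],
    n_nodes: int,
    r: int,
    backup_fraction: float = 0.2,
) -> Tuple[Dict[str, List[int]], List[float]]:
    """Greedily place each primary and its r backups on distinct nodes.

    Larger processes are placed first; each replica goes to the currently
    least-loaded node that does not already hold a replica of that process.
    A backup is assumed to cost `backup_fraction` of its primary's load.
    """
    if r + 1 > n_nodes:
        raise ValueError("need at least r + 1 nodes to keep replicas apart")
    load = [0.0] * n_nodes
    placement: Dict[str, List[int]] = {}
    for name, w in sorted(proc_loads.items(), key=lambda kv: -kv[1]):
        chosen: List[int] = []
        for replica in range(r + 1):
            cost = w if replica == 0 else w * backup_fraction
            node = min((k for k in range(n_nodes) if k not in chosen),
                       key=lambda k: load[k])
            load[node] += cost
            chosen.append(node)
        placement[name] = chosen  # chosen[0] hosts the primary
    return placement, load
```

With four processes of load 10, 8, 6 and 4 on four nodes and r = 2, every process gets three distinct nodes, and the resulting node loads are 10, 10.4, 8.8 and 10.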

130

Formal specification and mechanical verification of SIFT - A fault-tolerant flight control system  

Science.gov (United States)

The paper describes the methodology being employed to demonstrate rigorously that the SIFT (software-implemented fault-tolerant) computer meets its requirements. The methodology uses a hierarchy of design specifications, expressed in the mathematical domain of multisorted first-order predicate calculus. The most abstract of these, from which almost all details of mechanization have been removed, represents the requirements on the system for reliability and intended functionality. Successive specifications in the hierarchy add design and implementation detail until the PASCAL programs implementing the SIFT executive are reached. A formal proof that a SIFT system in a 'safe' state operates correctly despite the presence of arbitrary faults has been completed all the way from the most abstract specifications to the PASCAL program.

Melliar-Smith, P. M.; Schwartz, R. L.

1982-01-01

131

Synthesis of Fault-Tolerant Schedules with Transparency/Performance Trade-offs for Distributed Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach to the scheduling of fault-tolerant embedded systems for safety-critical applications. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. If process recovery is performed such that the operation of other processes is not affected, we call it transparent recovery. Although transparent recovery has the advantages of fault containment, improved debuggability and less memory needed to store the fault-tolerant schedules, it will introduce delays that can violate the timing constraints of the application. We propose a novel algorithm for the synthesis of fault-tolerant schedules that can handle the transparency/performance trade-offs imposed by the designer, and makes use of the fault-occurrence information to reduce the overhead due to fault tolerance. We model the application as a conditional process graph, where the fault occurrence information is represented as conditional edges and the transparent recovery is captured using synchronization nodes.

Izosimov, Viacheslav; Pop, Paul

2006-01-01
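
The transparency/performance trade-off can be made concrete with a simplified worst-case calculation. Assume a single processor, a chain of processes with worst-case execution times C_i, at most k transient faults, and a recovery overhead mu per re-execution. If recovery slack is shared, the schedule only needs room for the k faults hitting the longest process. If every process is made transparent, each must reserve slack for k re-executions of itself so that later start times never change. This is an illustrative model chosen for the sketch, not the paper's conditional-process-graph synthesis algorithm.

```python
from typing import List


def schedule_length_shared_slack(wcets: List[float], k: int, mu: float) -> float:
    """Non-transparent recovery: processes share one recovery slack sized for
    the worst case, i.e. all k faults hitting the longest process."""
    return sum(wcets) + k * max(c + mu for c in wcets)


def schedule_length_transparent(wcets: List[float], k: int, mu: float) -> float:
    """Fully transparent recovery: each process reserves slack for k
    re-executions of itself, so later processes' start times never move."""
    return sum(c + k * (c + mu) for c in wcets)
```

For processes of 30, 20 and 40 time units with k = 2 and mu = 5, shared slack gives a schedule length of 180, while full transparency gives 300. This gap is what a designer trades against the debuggability and memory benefits described above.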

132

Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems  

Science.gov (United States)

This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

Malekpour, Mahyar R.

2007-01-01

133

State of the art on fault-tolerant real time distributed systems  

International Nuclear Information System (INIS)

The integration of new computerized functions in power plant control and instrumentation systems, especially in nuclear power plants, imposes increasingly stringent requirements on communication system reliability. While an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. For this reason, in the context of preparation for the future PWR 2000 standardized nuclear plants, it seemed appropriate to take a look at fault-tolerant communication systems. Since control and instrumentation (C&I) applications in the control room are divided between several computers and must contend with extremely severe time constraints, EDF has undertaken an investigation of fault-tolerant, real-time distributed systems. This paper summarizes the state of the art in the field as it appears from discussions with computer manufacturers, academics and research workers on related projects. The results obtained were then used to identify trends toward "promising" solutions. The paper concludes with recommended study programs for the PCC department of EDF/R and DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

134

Filtering and fault tolerant control of parameter-varying time-delay systems and applications  

Science.gov (United States)

This dissertation addresses some open problems in control systems theory. The problems considered include the dynamic controller and filter design for Linear Parameter Varying (LPV) time-delay systems, the reconfigurable control design in Fault Tolerant Control Systems (FTCS) and fault diagnostics in Diesel engines. In the first part of this thesis, we investigate the problem of designing parameter-dependent filters for output estimation of LPV time-delay systems. The filters are designed such that the filtering error system guarantees an optimum level of H2 or Hinfinity performance. A state-delay term is included in the filter dynamics to reduce the design conservatism and improve the performance. The Linear Matrix Inequality (LMI)-based synthesis conditions developed for the filter design purposes are categorized into the rate-dependent and delay-dependent conditions which could handle the time-varying state-delay and bounded small delay cases, respectively. Among these two, the latter one is shown to provide a significant reduction in the conservativeness in the filter design. The second part of the thesis examines the analysis and synthesis of Fault Tolerant Control (FTC) systems in an LPV framework. For reconfigurable control design purposes, the information from Fault Detection and Isolation (FDI) module, that provides an estimate of the fault parameters, is utilized to schedule the controller matrices. We will also present a formulation that incorporates the factor of detection delay in the FTC supervisory system. It is shown that including this delay in the synthesis conditions leads to improved performance and reduced control effort. For analysis of the FTC systems including time-delay, where the fault parameters might be identified inaccurately, we first introduce the notion of brief instability for LPV time-delay systems. 
In these systems it is possible that the output trajectory converges to zero even though there are parameter trajectories for which the system is locally unstable for a short period of time. Using the analysis conditions for LPV time-delay systems including brief instability, we develop analysis conditions that lead to explicit formulae indicating how the FTC closed-loop system performance is degraded under false identification of the fault parameters. The results are validated on a model of a Highly Maneuverable Aircraft Technology (HiMAT) vehicle. The last part of this thesis presents a model-based diagnostic algorithm for the detection and estimation of internal leaks and restrictions in the Exhaust Gas Recirculation (EGR) system of Diesel engines. The initial step in the proposed method is the identification of two parameters in a static relationship. As soon as a fault occurs, the identification algorithm detects a change in the coefficients of the static equation. The results of the experimental validation of the diagnostic algorithm are illustrated on data collected from a test cell and from different trucks during the transient cycle. A statistical analysis is also performed to determine the thresholds that capture the normal variability of the healthy system.

Mohammadpour Velni, Javad

135

Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs  

DEFF Research Database (Denmark)

In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. We propose a fine-grained transparent recovery, where the property of transparency can be selectively applied to processes and messages. Transparency hides the recovery actions in a selected part of the application so that they do not affect the schedule of other processes and messages. While leading to longer schedules, transparent recovery has the advantage of both improved debuggability and less memory needed to store the fault-tolerant schedules.

Izosimov, Viacheslav; Pop, Paul

2012-01-01

136

GRID COMPUTING AND FAULT TOLERANCE APPROACH  

Directory of Open Access Journals (Sweden)

Grid computing is a means of allocating the computational power of a large number of computers to complex, difficult computations or problems. Grid computing is a distributed computing paradigm that differs from traditional distributed computing in that it is aimed toward large-scale systems that may even span organizational boundaries. This paper proposes a method to achieve maximum fault tolerance in the Grid environment by combining reliability considerations with a replication approach and a checkpoint approach. Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes cooperate to execute a task. To achieve a high level of reliability and availability, the grid infrastructure should be fully fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly used techniques for providing fault tolerance are job checkpointing and replication. Both techniques reduce the amount of work lost due to changing system availability but can introduce significant runtime overhead, which largely depends on the length of the checkpointing interval and on the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of grid resources.

Pankaj Gupta

2011-10-01
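
The abstract notes that checkpointing overhead depends largely on the checkpoint interval. A common first-order rule for choosing that interval is Young's approximation, T = sqrt(2 C M), where C is the time to write one checkpoint and M is the mean time between failures. The sketch below uses that rule as a stand-in; it is not the method proposed in the paper. The overhead estimate (C / T for writing checkpoints plus T / (2M) for expected rework after a failure) is the same first-order model from which Young's formula is derived.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost: writing checkpoints (C / T) plus
    expected rework after a failure, on average half an interval (T / (2M))."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)
```

For example, with a 50 s checkpoint cost and a 10,000 s MTBF, the rule gives a 1,000 s interval and about 10% overhead. Halving or doubling the interval raises the overhead to 12.5%.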

137

Reliability and maintainability assessment factors for reliable fault-tolerant systems  

Science.gov (United States)

A long-term goal of the NASA Langley Research Center is the development of a reliability assessment methodology powerful enough to enable a credible comparison of the stochastic attributes of one ultrareliable system design against others. This methodology, developed over a 10-year period, is a combined analytic and simulative technique. An analytic component is the Computer Aided Reliability Estimation capability, third generation, or simply CARE III. A simulative component is the Gate Logic Software Simulator capability, or GLOSS. The paper discusses the numerous factors that can degrade system reliability, in particular those peculiar to highly reliable fault-tolerant systems, and the ways in which they are accounted for in credible reliability assessments. Also presented are the modeling difficulties that result from their inclusion and the ways in which CARE III and GLOSS mitigate the intractability of the heretofore unworkable mathematics.

Bavuso, S. J.

1984-01-01

138

Backstepping decentralized fault tolerant control for reconfigurable modular robots  

OpenAIRE

For actuator faults in reconfigurable modular robots, a backstepping decentralized fault-tolerant control (DFTC) algorithm is proposed. The reconfigurable robot system is divided into a set of interconnected subsystems. The fault-tolerant controller is designed based on the backstepping method.

Jinbao He; Xinhua Yi; Zaifei Luo; Guojun Li

2013-01-01

139

A Fault-Tolerant Emergency-Aware Access Control Scheme for Cyber-Physical Systems  

CERN Document Server

Access control is an issue of paramount importance in cyber-physical systems (CPS). In this paper, an access control scheme, namely FEAC, is presented for CPS. FEAC can not only control access to data in normal situations, but also adaptively assign emergency roles and permissions to specific subjects and inform subjects without explicit access requests, so as to handle emergency situations in a proactive manner. In FEAC, emergency groups and emergency dependencies are introduced. Emergencies are processed in sequence within a group and in parallel among groups. A priority and dependency model called PD-AGM is used to select the optimal response-action execution path, aiming to eliminate all emergencies that occur within the system. Fault-tolerant access control policies are used to address failures in emergency management. A case study of a hospital medical care application shows the effectiveness of FEAC.

Wu, Guowei; Xia, Feng; Yao, Lin

2012-01-01

140

Fault tolerant operation of switched reluctance machine  

Science.gov (United States)

The energy crisis and environmental challenges have driven industry towards more energy-efficient solutions. With electric machines consuming nearly 60% of the electricity used in the industrial sector, advances in the efficiency of electric drive systems are of vital importance. Adjustable speed drive systems (ASDS) provide excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronic drives. Industry has witnessed tremendous growth in ASDS applications, not only as a driving force but also as electric auxiliary systems replacing bulky, low-efficiency hydraulic and mechanical auxiliary systems. With the wide penetration of ASDS, fault-tolerant operation capability is increasingly recognized as an important feature of drive performance, especially for aerospace, automotive and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low-cost, highly reliable electric machine with fault-tolerant operation capability, has drawn substantial attention over the past three decades. Nevertheless, the SRM is not free of faults. Certain faults, such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults, are common to all ASDS. In this dissertation, a thorough understanding of various faults and their influence on the transient and steady-state performance of the SRM is developed via simulation and experimental study, providing the knowledge necessary for fault detection and post-fault management. Lumped parameter models are established for fast real-time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for fast and reliable fault diagnosis.
To improve SRM power and torque capacity under faults, maximum torque per ampere excitation is conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same root mean square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for combined physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single and multiple phase faults is proposed and validated through experiments. Continuous position sensorless operation with seamless transitions between different numbers of phase faults is achieved.

Wang, Wei

141

Coordinated Fault Tolerance for High-Performance Computing  

Energy Technology Data Exchange (ETDEWEB)

Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, system-wide fault tolerance, and understanding how to design and implement interfaces for integrating fault tolerance features across multiple layers of the software stack: from the application, math libraries and programming language runtime to other common system software such as job schedulers, resource managers and monitoring tools.

Dongarra, Jack; Bosilca, George; et al.

2013-04-08

142

System-level fault-tolerance in large-scale parallel machines with buffered coscheduling  

Energy Technology Data Exchange (ETDEWEB)

As the number of processors in multi-teraflop systems grows to tens of thousands, with proposed petaflop systems likely to contain hundreds of thousands of processors, the assumption of fully reliable hardware has been abandoned. Although the mean time between failures for individual components can be very high, the large total component count will inevitably lead to frequent failures. It is therefore of paramount importance to develop new software solutions to deal with the unavoidable reality of hardware faults. In this paper we first describe the nature of the failures of current large-scale machines and extrapolate these results to future machines. Based on this preliminary analysis, we present a new technology that we are currently developing, buffered coscheduling, which seeks to implement fault tolerance at the operating system level. Major design goals include dynamic reallocation of resources to allow continued execution in the presence of hardware failures, very high scalability, high efficiency (low overhead), and transparency, requiring no changes to user applications. Preliminary results show that this is attainable with current hardware.

Petrini, F. (Fabrizio); Davis, Kei; Sancho, J. C. (Jose Carlos)

2004-01-01
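
The abstract's argument that high per-component MTBF still yields frequent system failures follows from a short calculation. Assume that the system fails when any one of n independent components with exponentially distributed lifetimes fails. The failure rates then add, so the system MTBF is the component MTBF divided by n. The figures in the example are illustrative assumptions, not measurements from the paper.

```python
import math


def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a series system of n independent, exponentially failing
    components: failure rates add, so MTBF shrinks as M / n."""
    return component_mtbf_hours / n_components


def job_survival_probability(
    component_mtbf_hours: float, n_components: int, job_hours: float
) -> float:
    """Probability that no component fails during a job of the given length."""
    return math.exp(-job_hours / system_mtbf(component_mtbf_hours, n_components))
```

With a component MTBF of 1,000,000 hours (over a century) and 100,000 components, the system MTBF drops to 10 hours, and a 10-hour job completes without any failure only with probability e^(-1), about 37%.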

143

Fault Injection Campaign for a Fault Tolerant Duplex Framework  

Science.gov (United States)

Fault tolerance is an efficient approach to avoid or reduce the damage caused by a system failure. In this work we present the results of a fault injection campaign we conducted on the Duplex Framework (DF). The DF is software developed by the UCLA group [1, 2] that uses a fault-tolerant approach and allows two replicas of the same process to run on two different nodes of a commercial off-the-shelf (COTS) computer cluster. A third process, running on a different node, constantly monitors the results computed by the two replicas and restarts the two replica processes if an inconsistency in their computation is detected. This approach is very cost efficient and can be adopted to control processes on spacecraft, where the fault rate produced by cosmic rays is not very high.

Sacco, Gian Franco; Ferraro, Robert D.; von Allmen, Paul; Rennels, Dave A.

2007-01-01
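
The monitor's compare-and-restart behaviour described above can be sketched in Python. Here the two replicas are plain function calls and the fault is injected by a hypothetical wrapper that flips one bit on its first call. In the real Duplex Framework, the replicas are processes on separate cluster nodes and the monitor is a third process, so this shows only the control logic.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def duplex_run(
    replica_a: Callable[[int], T],
    replica_b: Callable[[int], T],
    step_input: int,
    max_restarts: int = 3,
) -> T:
    """Monitor two replicas of the same computation; restart both on mismatch."""
    for _ in range(max_restarts + 1):
        ra, rb = replica_a(step_input), replica_b(step_input)
        if ra == rb:
            return ra
        # Inconsistent results: a transient fault hit one replica. With only
        # two copies the monitor cannot tell which one, so both are re-run.
    raise RuntimeError("replicas still disagree after restarts")
```

A duplex can only detect an inconsistency. Unlike triple modular redundancy, it cannot vote out the faulty copy, which is why both replicas are restarted. The `max_restarts` bound is an assumption of the sketch.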

144

Reversible Fault-Tolerant Logic  

CERN Document Server

It is now widely accepted that the CMOS technology implementing irreversible logic will hit a scaling limit beyond 2016, and that the increased power dissipation is a major limiting factor. Reversible computing can potentially require arbitrarily small amounts of energy. Recently several nano-scale devices that have the potential to scale, and that naturally perform reversible logic, have emerged. This paper addresses several fundamental issues that must be resolved before any nano-scale reversible computing system can be realized, including reliability and performance trade-offs and architecture optimization. Many nano-scale devices will be limited to near-neighbor interactions only, requiring careful optimization of circuits. We provide efficient fault-tolerant (FT) circuits when restricted to both 2D and 1D. Finally, we compute bounds on the entropy (and hence, heat) generated by our FT circuits and provide quantitative estimates of how large we can make our circuits before we lose any advantage ove...

Boykin, P O; Roychowdhury, Vwani P.

2005-01-01

145

A Byzantine-Fault Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems  

Science.gov (United States)

Embedded distributed systems have become an integral part of safety-critical computing applications, necessitating system designs that incorporate fault-tolerant clock synchronization in order to achieve ultra-reliable assurance levels. Many efficient clock synchronization protocols do not, however, address Byzantine failures, and most protocols that do tolerate Byzantine failures do not self-stabilize. The Byzantine self-stabilizing clock synchronization algorithms that exist in the literature are based either on unjustifiably strong assumptions about the initial synchrony of the nodes or on the existence of a common pulse at the nodes. The Byzantine self-stabilizing clock synchronization protocol presented here does not rely on any assumptions about the initial state of the clocks. Furthermore, there is neither a central clock nor an externally generated pulse system. The proposed protocol converges deterministically, is scalable, and self-stabilizes in a short amount of time. The convergence time is linear with respect to the self-stabilization period. Proofs of the correctness of the protocol as well as the results of formal verification efforts are reported.

Malekpour, Mahyar R.

2006-01-01

146

A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices  

DEFF Research Database (Denmark)

This paper studies the double open-switch fault in a conventional matrix converter driving a three-phase permanent-magnet synchronous motor and proposes a fault-tolerant solution based on a revised modulation strategy. In this switching strategy, the rectifier-stage modulation is adjusted based on knowledge of the inverter-stage switching logic and the operating input voltage sector. The proposed fault-tolerant method does not rely on any redundant power devices or on any reconfiguration of the matrix converter circuit through redundant physical connections. It is shown that the locations of the two open switches affect the availability of the revised modulation. The steady-state absolute speed error achieved with the proposed method is 4% of the nominal speed. Experimental results demonstrate the efficacy of the proposed method.

Chen, Der-Fa; Nguyen-Duy, Khiem

2012-01-01

147

Fault Tolerant Homopolar Magnetic Bearings  

Science.gov (United States)

Magnetic suspensions (MS) satisfy the long-life and low-loss conditions demanded by satellite- and ISS-based flywheels used for Energy Storage and Attitude Control (ACESE) service. This paper summarizes the development of a novel MS that improves reliability via fault-tolerant operation. Specifically, flux coupling between the poles of a homopolar magnetic bearing is shown to deliver the desired forces even after the coil currents to a subset of failed poles are terminated. Linear, coordinate-decoupled force-voltage relations are also maintained before and after failure by bias linearization. Current distribution matrices (CDMs), which adjust the currents and fluxes following a pole set failure, are determined for many faulted pole combinations. The CDMs and the system responses are obtained using 1D magnetic circuit models with fringe and leakage factors derived from detailed 3D finite element field models. Reliability results are presented versus detection/correction delay time and individual power amplifier reliability for 4-, 6-, and 7-pole configurations. Reliability is shown for two success criteria: (a) no catcher bearing contact following pole failures and (b) re-levitation off the catcher bearings following pole failures. An advantage of the method presented over other redundant operation approaches is a significantly reduced requirement for backup hardware such as additional actuators or power amplifiers.

Li, Ming-Hsiu; Palazzolo, Alan; Kenny, Andrew; Provenza, Andrew; Beach, Raymond; Kascak, Albert

2003-01-01

148

Network fault tolerance in LA-MPI  

Energy Technology Data Exchange (ETDEWEB)

LA-MPI is a high-performance, network-fault-tolerant implementation of MPI designed for terascale clusters that are inherently unreliable due to their very large number of system components and to trade-offs between cost and performance. This paper reviews the architectural design of LA-MPI, focusing on our approach to guaranteeing data integrity. We discuss our network data path abstraction that makes LA-MPI highly portable, gives high performance through message striping, and most importantly provides the basis for network fault tolerance. Finally, we include some performance numbers for the Quadrics and UDP network paths.

Aulwes, R. T. (Robbie T.); Daniel, D. J. (David J.); Desai, N. N. (Nehal N.); Graham, R. L. (Richard L.); Risinger, L. D. (Larrd Dean); Sukalski, M. W. (Mitchel W.); Taylor, M. A. (Mark)

2003-01-01
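The general idea behind message striping with integrity checking can be illustrated with a short sketch. It is not LA-MPI's actual data path or API. The sketch splits a message into fragments, assigns them round-robin to several network paths, and attaches a CRC-32 to each fragment so that a receiver can request retransmission of only the corrupted fragments. The fragment size, the `Fragment` layout, and the function names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    seq: int     # sequence number used to restore message order
    path: int    # network path (e.g. one NIC) that carries this fragment
    data: bytes
    crc: int     # CRC-32 computed by the sender


def stripe(message: bytes, paths: int, frag_size: int) -> List[Fragment]:
    """Split `message` into fragments and spread them round-robin over `paths`."""
    fragments = []
    for seq, offset in enumerate(range(0, len(message), frag_size)):
        data = message[offset:offset + frag_size]
        fragments.append(Fragment(seq, seq % paths, data, zlib.crc32(data)))
    return fragments


def reassemble(fragments: List[Fragment]) -> Tuple[Optional[bytes], List[int]]:
    """Verify each fragment; return the message, or the seqs to retransmit."""
    corrupted = sorted(f.seq for f in fragments if zlib.crc32(f.data) != f.crc)
    if corrupted:
        return None, corrupted
    ordered = sorted(fragments, key=lambda f: f.seq)  # fragments may arrive out of order
    return b"".join(f.data for f in ordered), []


msg = b"hello fault tolerant world"
frags = stripe(msg, paths=2, frag_size=4)
print(reassemble(list(reversed(frags))) == (msg, []))  # True
frags[1].data = b"XXXX"                                # corruption on one path
print(reassemble(frags))                               # (None, [1])
```

Per-fragment checksums keep a single corrupted fragment from forcing a retransmission of the whole message. Striping across paths also means that losing one path affects only the fragments assigned to it.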

149

Early Error Detection for Fault Tolerance Strategies  

OpenAIRE

In this paper we present an integration of early run-time monitors in real-time systems to improve their fault tolerance properties. Early Error Detection is a mechanism that provides a theoretically optimal run-time error detection service, based on a formal specification of an application, e.g., given by a timed automaton. We show how our approach can improve classical fault tolerance strategies by investigating two use-cases, namely for a design pattern that provides several degraded modes ...

Robert, Thomas; Roy, Matthieu; Fabre, Jean-charles

2010-01-01

150

Fault Tolerant Environment in web crawler Using Hardware Failure Detection  

OpenAIRE

Fault Tolerant Environment is a complete programming environment for the reliable execution of distributed application programs. Fault Tolerant Distributed Environment encompasses all aspects of modern fault-tolerant distributed computing. The built-in user-transparent error detection mechanism covers processor node crashes and hardware transient failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism c...

Anup Garje, Prof Bhavesh Patel

2012-01-01

151

Simulation Framework for Evaluation of Fault Tolerant Large Dynamic Distributed System  

Directory of Open Access Journals (Sweden)

Java-based simulators are valuable in the design and development of distributed systems for evaluating the dependability of algorithms because of their efficiency and scalability, and they allow realistic simulation scenarios to be designed. In this work, we propose Saturn, a multithreaded, process-oriented simulation framework designed for modeling large-scale distributed systems. It provides realistic simulation of a wide range of distributed system technologies and is an innovative solution to the problem of evaluating the dependability characteristics of distributed systems. Our solution is based on several proposed extensions to the simulation model of the MONARC simulation framework. These extensions cover fault tolerance and system orchestration mechanisms in order to assess the reliability and availability of distributed systems. The extended simulation model includes the components needed to describe various real failure situations and provides mechanisms to evaluate different replication and redundancy strategies as well as security enforcement mechanisms. The simulator also evaluates the major QoS metrics of a heartbeat-based adaptive failure detection mechanism.

Sanjay Bansal

2012-08-01
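The heartbeat-based adaptive failure detection mentioned at the end of the Saturn abstract can be illustrated with a small detector whose timeout adapts to observed heartbeat inter-arrival times. The adaptation rule here, the mean of a sliding window of intervals plus a fixed safety margin, is an assumption for illustration. The abstract does not specify Saturn's rule, and the class and parameter names are hypothetical.

```python
from collections import deque
from statistics import mean
from typing import Optional


class HeartbeatDetector:
    """Suspects a monitored node when its next heartbeat is overdue.

    Assumed adaptive rule (hypothetical): timeout = mean of the last
    `window` inter-arrival times + `margin`.
    """

    def __init__(self, window: int = 5, margin: float = 0.5, initial_timeout: float = 2.0) -> None:
        self.intervals: deque = deque(maxlen=window)  # recent inter-arrival times
        self.margin = margin
        self.initial_timeout = initial_timeout
        self.last: Optional[float] = None  # arrival time of the latest heartbeat

    def heartbeat(self, t: float) -> None:
        """Record a heartbeat received at time `t`."""
        if self.last is not None:
            self.intervals.append(t - self.last)
        self.last = t

    def timeout(self) -> float:
        """Current timeout; uses `initial_timeout` until an interval is known."""
        if not self.intervals:
            return self.initial_timeout
        return mean(self.intervals) + self.margin

    def suspect(self, now: float) -> bool:
        """True if no heartbeat has arrived within the current timeout."""
        if self.last is None:
            return False  # nothing received yet, so nothing to judge
        return now - self.last > self.timeout()


d = HeartbeatDetector(window=5, margin=0.5)
for t in (0.0, 1.0, 2.0, 3.0):
    d.heartbeat(t)
print(d.timeout())     # 1.5
print(d.suspect(4.0))  # False: the heartbeat is only 1.0 late
print(d.suspect(5.0))  # True: 2.0 exceeds the adaptive timeout
```

The margin sets the QoS trade-off that such a simulator would measure. A smaller margin detects crashes faster but raises more false suspicions when heartbeats are delayed.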

152

Fault-tolerant Control of Discrete-time LPV systems using Virtual Actuators and Sensors  

DEFF Research Database (Denmark)

This paper proposes a new fault-tolerant control (FTC) method for discrete-time linear parameter varying (LPV) systems using a reconfiguration block. The basic idea of the method is to achieve the FTC goal without re-designing the nominal controller by inserting a reconfiguration block between the plant and the nominal controller. The reconfiguration block is realized by an LPV virtual actuator and an LPV virtual sensor. Its goal is to transform the signals from the faulty system such that its behavior is similar to that of the nominal system from the viewpoint of the controller. Furthermore, it transforms the output of the controller for the faulty system such that the stability and performance goals are preserved. Input-to-state stabilizing LPV gains of the virtual actuator and sensor are obtained by solving linear matrix inequalities (LMIs). We show that separate design of these gains guarantees the input-to-state stability (ISS) of the closed-loop reconfigured system. Moreover, we characterize performance in terms of the ISS gains of the virtual actuator, the virtual sensor, and their interconnection. Minimizing these performance measures is formulated as convex optimization problems subject to LMI constraints. Finally, the effectiveness of the method is demonstrated via a numerical example and stator current control of an induction motor.

Tabatabaeipour, Mojtaba; Stoustrup, Jakob

2015-01-01

153

Fault tolerant capabilities of the Cosmic Background Explorer attitude control system  

Science.gov (United States)

The Cosmic Background Explorer (COBE), launched on November 18, 1989 from Vandenberg Air Force Base aboard a Delta rocket, has been regarded by the scientific community as a major success for cosmology. Despite a number of anomalies during the mission, the attitude control system (ACS) has performed remarkably well, due in large part to the fault-tolerant capabilities designed into it. A unique triaxial control system oriented in the spacecraft's transverse plane gives the ACS the ability to safely survive various sensor and actuator failures. Features that help achieve this fail-operational system include component cross-strapping and autonomous control electronics switching. This design philosophy was of utmost importance because of the requirement that the ACS keep the spinning observatory and its cryogen-cooled science instruments pointed away from the sun. Although the liquid helium was depleted, as expected, within twelve months of launch, it is still highly desirable to avoid any thermal disturbances to the remaining functional instruments.

Placanica, Samuel J.

1992-01-01

154

Heap Base Coordinator Finding with Fault Tolerant Method in Distributed Systems  

Directory of Open Access Journals (Sweden)

Coordinator finding in wireless networks is a very important problem that is solved by suitable algorithms. The main goal of coordinator finding is to synchronize processes while using resources optimally. Many different algorithms have been presented for coordinator finding; the most important leader election algorithms are the Bully and Ring algorithms. In this paper we analyze and compare these algorithms and propose a new heap-based approach with fault-tolerant mechanisms for coordinator finding in wireless environments. Our algorithm's running time and message complexity compare favorably with existing algorithms. Our work involves substantial modifications of an existing algorithm and its proof, and we adapt the existing algorithms to the noisy environment using fault-tolerant mechanisms.

Mehdi EffatParvar

2011-07-01
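The Bully algorithm named in the abstract elects the highest-numbered live process as coordinator. The sketch below is a sequential simulation of the algorithm's outcome under simplifying assumptions: message delivery is reliable, and the `alive` set stands in for the processes that answer ELECTION messages. It is not the paper's heap-based algorithm, and the function name is hypothetical.

```python
from typing import Set


def bully_election(initiator: int, alive: Set[int]) -> int:
    """Return the coordinator elected by the Bully algorithm (highest live ID wins)."""
    if initiator not in alive:
        raise ValueError("the initiator must be alive")
    candidate = initiator
    while True:
        # `candidate` sends ELECTION to every higher-numbered process;
        # only live processes reply with OK.
        replies = [p for p in alive if p > candidate]
        if not replies:
            # No higher process answered: `candidate` announces itself
            # as COORDINATOR to all other processes.
            return candidate
        # A process that replied takes over the election. In the real
        # protocol every replier starts its own election concurrently;
        # following the lowest one step by step reaches the same winner.
        candidate = min(replies)


print(bully_election(2, {1, 2, 3, 5}))  # 5: process 4 has crashed, so 5 wins
```

In the worst case, when the lowest-numbered process starts the election, the Bully algorithm exchanges on the order of n^2 messages. That cost is the reason message complexity is the natural basis for comparing it with alternatives such as the paper's approach.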

155

Validation Methods Research for Fault-Tolerant Avionics and Control Systems Sub-Working Group Meeting. CARE 3 peer review  

Science.gov (United States)

A computer aided reliability estimation procedure (CARE 3), developed to model the behavior of ultrareliable systems required by flight-critical avionics and control systems, is evaluated. The mathematical models, numerical method, and fault-tolerant architecture modeling requirements are examined, and the testing and characterization procedures are discussed. Recommendations aimed at enhancing CARE 3 are presented; in particular, the need for a better exposition of the method and the user interface is emphasized.

Trivedi, K. S. (editor); Clary, J. B. (editor)

1980-01-01

156

Identification of Critical Factors in Checkpointing Based Multiple Fault Tolerance for Distributed System  

OpenAIRE

The performance of checkpointing-based multiple fault tolerance is low, mainly because of the overheads associated with checkpointing. A checkpointing algorithm can be improved through a better storage strategy and better checkpoint scheduling, both of which reduce these overheads. Performance and efficiency are the most desirable features of checkpoint-based recovery. In this paper important critical issues involved in fast and effic...

Sanjay Bansal; Sanjeev Sharma

2011-01-01
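The checkpoint-scheduling trade-off discussed above can be made concrete with Young's well-known first-order approximation. This is a standard result, not necessarily the model used in this paper. For a checkpoint cost C and a mean time between failures M, the fraction of time lost is roughly C/T + T/(2M): checkpoint overhead plus the expected re-executed work, which is half an interval per failure. This expression is minimized at T = sqrt(2CM). The sketch assumes failures are rare relative to T and ignores restart cost. The function names are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M)."""
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """Approximate fraction of time lost to checkpoints plus re-executed work."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


# Example: a 60 s checkpoint on a system with a 24 h MTBF.
t_opt = young_interval(60.0, 24 * 3600.0)
print(round(t_opt))                              # 3220 s, roughly 54 minutes
print(round(overhead(t_opt, 60.0, 86400.0), 4))  # 0.0373, i.e. about 3.7% lost
```

The formula shows why a cheaper storage strategy (smaller C) helps twice. It lowers the cost of each checkpoint and also lets checkpoints be taken more often, which reduces the work lost per failure.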

157

Research on fault diagnose and fault tolerant control of steam generator based on strong tracking filter  

International Nuclear Information System (INIS)

To further improve the safety of nuclear power plants, a strong tracking filter is applied to a nonlinear system with stochastic noise to estimate the sensor fault bias of the steam generator control system and to reconstruct the sensor outputs, thereby implementing fault-tolerant control. The simulation results demonstrate that this method estimates time-varying sensor fault bias effectively and provides strong fault tolerance, and that the strong-tracking-filter methodology for steam generator fault-tolerant control design is effective. (authors)

158

Fault-tolerance experiments with the JPL STAR computer.  

Science.gov (United States)

Results are presented of fault-tolerance experiments performed on an experimental computer with dynamic (standby) redundancy, including replaceable subsystems and a 'program rollback' provision to eliminate transient-caused errors. After a brief review of the specification of fault tolerance with respect to transient faults, including a description of the method used to inject transient faults in software and system tests, the fault-tolerance experiments carried out on this computer are summarized with regard to the determination of fault classes, software verification, system verification, and recovery stability. A test and repair processor is described, which constitutes a special monitor unit of the computer; it is used to obtain fault-detection information from the other subsystems of the computer and to ensure that proper recovery occurs when a fault is detected.

Avizienis, A.; Rennels, D. A.

1972-01-01

159

Software-implemented hardware fault tolerance  

CERN Document Server

This book addresses the topic of software-implemented hardware fault tolerance (SIHFT), that is, how to deal with faults affecting the hardware by acting only (or mainly) on the software. It presents the theory behind SIHFT as well as the practical aspects of putting it to work on real examples.

Goloubeva, O; Sonza Reorda, M

2006-01-01

160

A study on quantification of unavailability of DPPS with fault tolerant techniques considering fault tolerant techniques' characteristics  

Energy Technology Data Exchange (ETDEWEB)

With the improvement of digital technologies, digital I&C systems have incorporated a wider variety of fault-tolerant techniques than conventional analog I&C systems, in order to improve fault detection and to help the system safely perform its required functions in spite of the presence of faults. The reliability evaluation of digital systems must therefore consider the fault-tolerant techniques (FTTs) and their fault coverage. Several studies have addressed the effects of FTTs on the reliability of digital system models. Building on a literature survey, this research develops a model to evaluate the plant reliability of the digital plant protection system (DPPS) with fault-tolerant techniques, considering detection and process characteristics and human errors. A sensitivity analysis based on the proposed model is performed to identify the important variables affecting fault management coverage and unavailability.

Kim, B. G.; Kang, H. G.; Kim, H. E.; Seung, P. H. [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of); Kang, H. G. [Khalifa Univ. of Science, Abu Dhabi (United Arab Emirates); Lee, S. J. [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2012-03-15

161

A study on quantification of unavailability of DPPS with fault tolerant techniques considering fault tolerant techniques' characteristics  

International Nuclear Information System (INIS)

162

Eigenstructure Assignment for Fault Tolerant Flight Control Design  

Science.gov (United States)

In recent years, fault tolerant flight control systems have attracted increasing interest for high-performance military aircraft as well as civil aircraft. Fault tolerant control systems can be described as either active or passive. An active fault tolerant control system has to either reconfigure or adapt the controller in response to a failure. One approach is to reconfigure the controller based upon detection and identification of the failure. Another approach is to use direct adaptive control to adjust the controller without explicitly identifying the failure. In contrast, a passive fault tolerant control system uses a fixed controller which achieves acceptable performance for a presumed set of failures. We have obtained a passive fault tolerant flight control law for the F/A-18 aircraft which achieves acceptable handling qualities for a class of control surface failures. The class of failures includes the symmetric failure of any one control surface being stuck at its trim value. A comparison was made of an eigenstructure assignment gain designed for the unfailed aircraft with a fault tolerant multiobjective optimization gain. We have shown that time responses for the unfailed aircraft using the eigenstructure assignment gain and the fault tolerant gain are identical. Furthermore, the fault tolerant gain achieves MIL-F-8785C specifications for all failure conditions.

Sobel, Kenneth; Joshi, Suresh (Technical Monitor)

2002-01-01

163

Fault-tolerant control for current sensors of doubly fed induction generators based on an improved fault detection method  

DEFF Research Database (Denmark)

Fault-tolerant control of current sensors is studied in this paper to improve the reliability of a doubly fed induction generator (DFIG). A fault-tolerant control system for the current sensors of the DFIG is presented, consisting of a new current observer and an improved current sensor fault detection algorithm. The current observer is constructed using only voltage signals as inputs. The fault detection algorithm is based on the current observer and takes into account an adaptive threshold and different fault duration times. The performance of the proposed observer, the improved fault detection algorithm, and the fault-tolerant control system is investigated by simulation. The results indicate that the outputs of the observer and the sensor are highly coherent. The fault detection algorithm can efficiently detect both soft and hard faults in current sensors, and the fault-tolerant control system can effectively tolerate both types of faults. © 2013 Published by Elsevier Ltd. All rights reserved.

Li, Hui; Yang, Chao

2014-01-01
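The detection logic described in the DFIG abstract can be sketched as follows: a residual between the observer estimate and the sensor reading, an adaptive threshold, and a fault duration time. The threshold rule `base + rel * |estimate|` and the parameter values are assumptions chosen for illustration. The paper's actual observer and threshold are not reproduced here.

```python
from typing import Optional, Sequence


def detect_sensor_fault(
    measured: Sequence[float],
    estimated: Sequence[float],
    base: float = 0.05,
    rel: float = 0.1,
    duration: int = 3,
) -> Optional[int]:
    """Return the sample index at which a sensor fault is declared, or None.

    residual  = |measured - estimated|    (sensor vs. observer)
    threshold = base + rel * |estimated|  (assumed adaptive rule)
    A fault is declared only after `duration` consecutive violations,
    which filters out isolated noise spikes.
    """
    run = 0
    for k, (y, y_hat) in enumerate(zip(measured, estimated)):
        if abs(y - y_hat) > base + rel * abs(y_hat):
            run += 1
            if run >= duration:
                return k
        else:
            run = 0  # violation streak broken
    return None


estimate = [1.0] * 8
reading = [1.0, 1.0, 1.3, 1.0, 1.5, 1.5, 1.5, 1.5]  # a spike, then a hard offset fault
print(detect_sensor_fault(reading, estimate))       # 6
```

Once a fault is declared, a fault-tolerant controller of this kind would switch from the faulty sensor signal to the observer estimate. Scaling the threshold with the estimated current keeps it from being too tight at high load or too loose at low load.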

164

Adapted importance sampling schemes for the simulation of dependability models of Fault-tolerant systems with deferred repair  

OpenAIRE

This paper targets the simulation of continuous-time Markov chain models of fault-tolerant systems with deferred repair. We start by stating sufficient conditions for a given importance sampling scheme to satisfy the bounded relative error property. Using those sufficient conditions, it is noted that many previously proposed importance sampling techniques such as failure biasing and balanced failure biasing satisfy that property. Then, we adapt the importance sampling schemes failure transiti...

Carrasco, Juan A.

2006-01-01

165

Dynamic and fault-tolerant cluster management  

OpenAIRE

Recent decentralised event-based systems have focused on providing event delivery that scales with an increasing number of processes. While the main focus of research has been on ensuring that processes maintain only a small amount of membership and routing information, an important factor in achieving scalability for event-based peer-to-peer dissemination systems is the number of events disseminated at the same time. This work presents a dynamic and fault tolerant cluster managem...

Gidenstam, Anders; Koldehofe, Boris; Papatriantafilou, Marina; Tsigas, Philippas

2005-01-01

166

System Wide Joint Position Sensor Fault Tolerance in Robot Systems Using Cartesian Accelerometers  

Science.gov (United States)

Joint position sensors are necessary for most robot control systems. A single position sensor failure in a normal robot system can greatly degrade performance. This paper presents a method to obtain position information from Cartesian accelerometers without integration. Depending on the number and location of the accelerometers, the proposed system can tolerate the loss of multiple position sensors. A solution technique suitable for real-time implementation is presented. Simulations were conducted using 5 triaxial accelerometers to recover from the loss of up to 4 joint position sensors on a 7-degree-of-freedom robot moving in general three-dimensional space. The simulations show good estimation performance using non-ideal accelerometer measurements.

Aldridge, Hal A.; Juang, Jer-Nan

1997-01-01

167

Admissible Model Matching Fault Tolerant Control based on LPV Fault Representation  

OpenAIRE

In this paper, an approach to design an Admissible Model Matching (AMM) Fault Tolerant Control (FTC) based on Linear Parameter Varying (LPV) fault representation is proposed. The main contribution of this approach is to consider the fault as a scheduling variable that allows the controller to be reconfigured online. The fault is expressed as a change in the system dynamics (in particular, in the model parameters). The suggested strategy is an active technique that requires the fault to be detect...

Montes Oca, Saul; Puig, Vicenç; Theilliol, Didier; Tornil-Sin, Sebastián

2009-01-01

168

Fault tolerant control - a residual based set-up  

OpenAIRE

A new set-up for fault tolerant control (FTC) for stable systems is presented in this paper. The new set-up is based on a simple implementation of the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. This implementation of the YJBK parameterization will allow a direct and simple reconfiguration of the feedback controller. Another central part of fault tolerant control is fault diagnosis. The controller implementation can be applied directly in connection with both passiv...

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2010-01-01

169

Efficient Fault-Tolerant Strategy Selection Algorithm in Cloud Computing  

Directory of Open Access Journals (Sweden)

Cloud computing is becoming a mainstream feature of information technology, and more and more enterprises deploy their software systems in the cloud. Cloud applications are usually large-scale and contain many distributed cloud components. Building highly reliable cloud applications is a challenging and critical research problem. As reliance on information processing systems grows, so does the importance of their correct and continuous operation even in the presence of faulty components. To address this issue, this work proposes a framework for building fault-tolerant cloud applications. We first propose fault detection algorithms to identify significant components among the huge number of cloud components. Then, we present an efficient fault-tolerance strategy selection algorithm to determine the most suitable fault-tolerance strategy for each significant component. Software fault tolerance is widely adopted to increase overall system reliability in critical applications: system reliability can be enhanced by employing functionally equivalent components to tolerate component failures. Three well-known fault-tolerance strategies are introduced, together with formulas for calculating the failure probabilities of the fault-tolerant modules. Our work is mainly directed toward implementing the framework to measure the strength of the fault-tolerance service and to analyze in depth the cost benefits for all stakeholders. An algorithm is proposed to automatically determine an efficient fault-tolerance strategy for the significant cloud components. Using real failure traces and a model, we evaluate the proposed resource provisioning policies to determine their performance, cost, and cost efficiency. The experimental results show that by tolerating faults in a small fraction of the most important components, the reliability of cloud applications can be greatly improved.

P. Priyanka

2014-02-01
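The cloud abstract mentions three well-known fault-tolerance strategies with failure-probability formulas but does not name them. Suppose they are recovery block (RB), N-version programming (NVP), and parallel execution, which are commonly compared in this setting. Assuming also that component failures are independent and that the RB acceptance test is perfect, the formulas can be sketched as follows. The strategy identification and the function names are assumptions, not the paper's definitions.

```python
from itertools import product
from math import prod
from typing import Sequence


def rb_failure(p: Sequence[float]) -> float:
    """Recovery block: the module fails only if the primary and every alternate fail."""
    return prod(p)


def parallel_failure(p: Sequence[float]) -> float:
    """Parallel: all replicas run at once; the module fails only if all of them fail."""
    return prod(p)


def nvp_failure(p: Sequence[float]) -> float:
    """N-version programming with majority voting.

    The module fails when correct versions do not form a strict majority.
    Enumerates all 2**n fail/succeed outcomes, so keep n small.
    """
    n = len(p)
    total = 0.0
    for failed in product((False, True), repeat=n):
        prob = 1.0
        for version_failed, p_i in zip(failed, p):
            prob *= p_i if version_failed else 1.0 - p_i
        correct = n - sum(failed)
        if 2 * correct <= n:  # no strict majority of correct outputs
            total += prob
    return total


print(rb_failure([0.1, 0.1, 0.1]))   # about 0.001 (up to float rounding)
print(nvp_failure([0.1, 0.1, 0.1]))  # about 0.028 (up to float rounding)
```

Under independence, RB and parallel execution have the same failure probability. They differ in response time and resource cost: RB runs alternates sequentially, while parallel execution runs all replicas simultaneously. Those costs are what a strategy selection algorithm has to weigh against reliability.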

170

A Dynamic Effective Fault Tolerance System in Robotic Manipulator using a Hybrid Neural Network based Controller  

OpenAIRE

Robot manipulators play an important role in the automobile industry, where they are mainly used in gas welding and in the manufacture and assembly of motor parts. On complex trajectories, the speed of each joint of the manipulator is affected. It is therefore necessary to analyze the noise and vibration of the robot's joints in order to predict faults and to improve the control precision of the manipulator. In this study we will propose a new fault detection system for Robot manipul...

Jiji, G.; Rajaram, M.

2014-01-01

171

Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems  

Directory of Open Access Journals (Sweden)

The paper analyzes and experimentally verifies the deployment of Ethernet-based network technology to enable fault-tolerant and timely data exchange among a number of high-voltage protective relays. The relays use a proprietary serial communication line to exchange real-time data on the state of their high-voltage circuitry, facilitating fast protection switching in case of critical failures. The digital serial signal is first fed into a PCM multiplexer, where it is mapped to the corresponding E1 (2 Mbit/s) time-division multiplexed signal. The resulting E1 frames are then packetized and sent through the Ethernet control LAN to the opposite PCM demultiplexer, where the same processing is done in reverse, finally delivering the signal to the opposite protective relay. The challenge of this setup is to ensure very timely delivery of the control information between protective relays even in the case of failures of the Ethernet network itself. Fault tolerance of the Ethernet network is provided by the widespread Per-VLAN Rapid Spanning Tree Protocol, optionally extended by 1+1 PCM protection.

Leos Bohac

2013-01-01
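The E1 (2 Mbit/s) figure in the abstract follows from the standard E1 frame structure: 32 timeslots of 8 bits, repeated 8000 times per second. The sketch below computes that rate, together with the payload size and the packetization delay when several E1 frames are carried in each Ethernet packet. The frames-per-packet value is a hypothetical tuning parameter, not one taken from the paper.

```python
from typing import Tuple

E1_TIMESLOTS = 32         # timeslots per E1 frame
BITS_PER_TIMESLOT = 8
FRAMES_PER_SECOND = 8000  # one frame every 125 microseconds


def e1_bit_rate() -> int:
    """Line rate of an E1 signal in bit/s."""
    return E1_TIMESLOTS * BITS_PER_TIMESLOT * FRAMES_PER_SECOND


def packetization(frames_per_packet: int) -> Tuple[int, float]:
    """Payload bytes and packetization delay (ms) for a given number of E1 frames per packet."""
    payload_bytes = frames_per_packet * E1_TIMESLOTS * BITS_PER_TIMESLOT // 8
    delay_ms = frames_per_packet * 1000.0 / FRAMES_PER_SECOND
    return payload_bytes, delay_ms


print(e1_bit_rate())     # 2048000, the nominal "2 Mbit/s"
print(packetization(8))  # (256, 1.0): 8 frames give 256 bytes and 1 ms of delay
```

Carrying more frames per packet reduces header overhead, but the sender must wait longer to fill each packet. That delay matters for protective relays that need very timely delivery, especially because spanning-tree reconvergence after a network failure adds its own delay.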

172

Recovery in fault-tolerant distributed microcontrollers  

Science.gov (United States)

A critical problem facing both government and commercial space programs is the need for lower cost, higher performance and lower power consumption in on-board processing. Special radiation-hardened processors have been developed to operate in the space radiation environment, but they are typically one to two orders of magnitude behind the performance of commercial devices, and they consume much more power. Yet most future space missions need much greater processing performance. The use of commercial off-the-shelf (COTS) processors in space has been prevented by the fact that the space radiation environment causes an unacceptably high transient error rate, derailing their computations every few hours [MESS 92]. However, protective redundancy can be combined with fault-tolerant computing technology to recover automatically from such errors and thus enable their use. This thesis focuses on one aspect of this problem: embedded microcontrollers, highly integrated single-chip computer systems that, not unlike those used in modern automobiles, control the various subsystems that make up a spacecraft. This thesis examines trade-offs and experiments with the design techniques required to implement fault-tolerant distributed networks using embedded microcontroller processing nodes. A new fault-tolerant node architecture was developed that allows differing amounts of redundancy to be employed with minimal design change. This includes a special isolated wired-OR output system that allows modules to be powered down to recover from some potentially destructive radiation events (latchup). A novel recovery approach was developed that uses comparison voting for error detection and recovery but also employs a "stable" set of recovery actions to allow recovery if multiple errors or Byzantine behaviors occur.
Finally, a redundant intercommunication architecture between embedded processing nodes was developed that provides fault-tolerance in communications between them. A testbed has been constructed, a real-time executive has been developed, and a supporting test environment has also been implemented to allow fault-insertion testing of the experimental architecture. Our initial results strongly support the viability of the fault-tolerance approaches we have developed.

Hwang, Riki I.-Ming
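The comparison voting used for error detection in this thesis can be illustrated with a generic majority voter over the outputs of redundant modules. This is a sketch, not the thesis's voter. It returns the majority value together with the indices of the disagreeing modules, which are candidates for a recovery action such as a power cycle. When no strict majority exists it raises an error, roughly the situation the thesis's "stable" set of recovery actions is meant to handle. The function name is hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return (majority value, indices of modules that disagreed with it)."""
    if not outputs:
        raise ValueError("need at least one module output")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        # No strict majority: voting alone cannot mask the error.
        raise RuntimeError("no majority: escalate to a recovery action")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


print(majority_vote([42, 42, 17]))  # (42, [2]): module 2 is suspected
```

With three modules this is triple modular redundancy: one faulty module is masked and identified. Two disagreeing faulty modules, or modules that report different values to different voters, defeat a simple voter like this one.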

173

Electrical Steering of Vehicles - Fault-tolerant Analysis and Design  

DEFF Research Database (Denmark)

The topic of this paper is systems that need to be designed such that no single fault can cause failure at the overall level. A methodology is presented for analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high-cost triple redundancy solutions and still meet strict requirements for functional safety. The paper applies graph-based analysis of functional system structure to find a novel fault-tolerant architecture for an electrical steering system, where a dedicated AC-motor design and cheap voltage measurements ensure the ability to detect all relevant faults. The paper shows how active control reconfiguration can accommodate all critical faults, and the fault-tolerant abilities are demonstrated on warehouse truck hardware.

Blanke, Mogens; Thomsen, Jesper Sandberg

2006-01-01

174

Fault Tolerance in Control Architectures for Mobile Robots: Fantasy or Reality?  

OpenAIRE

With the expected development of autonomous robotic systems in human environments, fault tolerance will become a central issue in robotics. This article presents a survey of fault tolerance concepts, means, and implementations in robotic architectures.

Crestani, Didier; Godary-dejean, Karen

2012-01-01

175

Fault Tolerance in Cellular Automata at High Fault Rates  

CERN Document Server

A commonly used model for fault-tolerant computation is that of cellular automata. The essential difficulty of fault-tolerant computation is present in the special case of simply remembering a bit in the presence of faults, and that is the case we treat in this paper. We are concerned with the degree (the number of neighboring cells on which the state transition function depends) needed to achieve fault tolerance when the fault rate is high (nearly 1/2). We consider both the traditional transient fault model (where faults occur independently in time and space) and a recently introduced combined fault model which also includes manufacturing faults (which occur independently in space, but which affect cells for all time). We also consider both a purely probabilistic fault model (in which the states of cells are perturbed at exactly the fault rate) and an adversarial model (in which the occurrence of a fault gives control of the state to an omniscient adversary). We show that there are cellular automata that can...

McCann, Mark

2007-01-01
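
To make the transient fault model concrete, here is a minimal simulation sketch, assuming a one-dimensional ring of binary cells that each apply a majority rule over the 2r+1 cells centred on them (degree 2r+1) and are then flipped independently with probability equal to the fault rate. The majority rule and the names `ca_step` and `remember_bit` are illustrative choices, not the constructions from the paper, and nothing here demonstrates tolerance of fault rates near 1/2.

```python
import random
from typing import List, Tuple


def ca_step(cells: List[int], radius: int, fault_rate: float, rng: random.Random) -> List[int]:
    """One synchronous update of a ring of binary cells.

    Each cell takes the majority of the 2*radius+1 cells centred on it (the
    rule's degree), then a transient fault flips it with probability fault_rate,
    independently in space and time.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        ones = sum(cells[(i + d) % n] for d in range(-radius, radius + 1))
        bit = 1 if ones > radius else 0          # strict majority of 2r+1 votes
        if rng.random() < fault_rate:            # transient fault
            bit ^= 1
        nxt.append(bit)
    return nxt


def remember_bit(bit: int, n_cells: int, radius: int, fault_rate: float,
                 steps: int, seed: int = 0) -> Tuple[int, float]:
    """Store `bit` in every cell, run the faulty CA, and decode by global majority.

    Returns (decoded_bit, fraction_of_cells_equal_to_1). n_cells should be odd.
    """
    rng = random.Random(seed)
    cells = [bit] * n_cells
    for _ in range(steps):
        cells = ca_step(cells, radius, fault_rate, rng)
    ones = sum(cells)
    return (1 if 2 * ones > n_cells else 0), ones / n_cells


print(ca_step([1, 1, 0, 1, 1], radius=1, fault_rate=0.0, rng=random.Random(0)))  # [1, 1, 1, 1, 1]
print(remember_bit(1, n_cells=101, radius=2, fault_rate=0.05, steps=100))
```

Sweeping `fault_rate` toward 1/2 while increasing `radius` gives some intuition for why high fault rates call for larger degrees.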

176

A Dynamic Slack Management Technique for Real-Time Distributed Embedded System with Enhanced Fault Tolerance and Resource Constraints  

Directory of Open Access Journals (Sweden)

Full Text Available This project work aims to develop a dynamic slack management technique for real-time distributed embedded systems that reduces total energy consumption while meeting timing, precedence and resource constraints. The proposed slack distribution technique uses a modified Feedback Control Scheduling (FCS) algorithm. This algorithm schedules dependent tasks effectively under precedence and resource constraints. It further minimizes the schedule length and uses the available slack to increase energy efficiency. A fault-tolerant mechanism based on a deferred-active-backup scheme increases schedulability and provides reliability to the system.

Santhi Baskaran

2011-01-01

177

Diagnosis and Fault-tolerant Control, 2nd edition.  

DEFF Research Database (Denmark)

Fault-tolerant control aims at a graceful degradation of the behaviour of automated systems in case of faults. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults that bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Five case studies on pilot processes show the applicability of the presented methods. The theoretical results are illustrated by two running examples used throughout the book. The second edition includes new material about reconfigurable control, diagnosis of nonlinear systems, and remote diagnosis. The application examples are extended by a steering-by-wire system and the air path of a diesel engine, both of which include experimental results. The bibliographical notes at the end of all chapters have been updated. The chapters end with exercises to be used in lectures.

Blanke, Mogens; Kinnaert, Michel

2006-01-01

178

Fault tolerance issues in nanoelectronics  

Science.gov (United States)

The astonishing success story of microelectronics cannot go on indefinitely. Once devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected to impair their behaviour, and fault-tolerant techniques will then be required. The aim of this thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient error rates are estimated for a selection of nanoelectronic gates based on quantum cellular automata and single-electron devices, in which the electrostatic interaction between electrons is used to create Boolean circuits. On the basis of these results, various fault-tolerant solutions are proposed for both logic and memory nanochips. For logic chips, traditional techniques are found to be unsuitable. A new technique is proposed, in which the voting approach of triple modular redundancy (TMR) is extended by cascading TMR units composed of nanogate clusters, and it is generalised to other voting approaches. For memory chips, an error-correcting code approach is found to be suitable. Various codes are considered, and a lookup-table approach is proposed for encoding and decoding. Estimates are then given of the redundancy level that nanochips must provide to make their mean time between failures acceptable. For logic chips, space redundancies of up to a few tens are required if mean times between failures are to be of the order of a few years. Space redundancy can also be traded for time redundancy. For memory chips, mean times between failures of the order of a few years are found to imply both space and time redundancies of the order of ten.

Spagocci, S. M.
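
The basic arithmetic behind TMR voting can be sketched as follows, assuming independent module failures, a perfect (fault-free) voter, and a simplified cascade in which each level triplicates the previous one. These idealizations are assumptions of the sketch, not the thesis's nanogate-cluster model.

```python
def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority vote, as performed by a TMR voter."""
    return (a & b) | (a & c) | (b & c)


def tmr_reliability(r: float) -> float:
    """Probability that at least 2 of 3 independent modules (each reliable with prob. r) work."""
    return 3 * r**2 - 2 * r**3


def cascaded_tmr_reliability(r: float, levels: int) -> float:
    """Apply TMR recursively `levels` times (idealized: perfect voters, independent failures)."""
    for _ in range(levels):
        r = tmr_reliability(r)
    return r


print(majority3(0b1010, 0b1010, 0b0110) == 0b1010)  # True: the odd module is outvoted
print(round(tmr_reliability(0.9), 3))               # 0.972
print(tmr_reliability(0.4) < 0.4)                   # True: TMR hurts when r < 0.5
```

The last line shows that voting only helps when each module is more likely right than wrong (r > 0.5).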

179

Fault Tolerant Magnetic Bearing for Turbomachinery  

Science.gov (United States)

NASA Glenn Research Center (GRC) has developed a Fault-Tolerant Magnetic Bearing Suspension rig to enhance bearing-system safety. It successfully demonstrated that, using only two active poles out of the eight redundant poles of each radial bearing (that is, with 12 of the 16 poles dead), the rotor could be levitated and spun without losing stability or its desired position, up to the maximum allowable speed of 20,000 rpm. This paper demonstrates that, as long as the vector sum of the attracting pole forces and the rotor weight is zero, a fault-tolerant magnetic bearing system keeps the rotor at the desired position without losing stability, even at the maximum rotor speed. A proportional-integral-derivative (PID) controller generated autonomous corrective actions for the fault situations, without operator input and without losing load capacity in terms of rotor position. The paper also deals with a centralized modal controller to better control the dynamic behavior across the system modes.

Choi, Benjamin; Provenza, Andrew

2001-01-01
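
The zero-net-force condition stated above can be checked with a small planar statics sketch. It assumes two active poles with known unit directions (pointing from the rotor toward each pole, since magnetic poles can only attract) and the rotor weight acting straight down. The pole angles and the 100 N load are hypothetical values, and rotor dynamics and the PID loop are ignored.

```python
import math


def pole_forces(angle1_deg: float, angle2_deg: float, weight_n: float) -> tuple:
    """Solve f1*u1 + f2*u2 + W = 0 for the attractive force magnitudes f1, f2.

    u1, u2 are unit vectors from the rotor toward each active pole (angles
    measured counter-clockwise from +x); W = (0, -weight_n) is gravity.
    Raises ValueError if the poles are collinear or if balance would need a
    repulsive (negative) force, which a magnetic pole cannot supply.
    """
    u1 = (math.cos(math.radians(angle1_deg)), math.sin(math.radians(angle1_deg)))
    u2 = (math.cos(math.radians(angle2_deg)), math.sin(math.radians(angle2_deg)))
    bx, by = 0.0, weight_n  # right-hand side: -W
    det = u1[0] * u2[1] - u2[0] * u1[1]
    if abs(det) < 1e-12:
        raise ValueError("active poles are collinear; forces cannot be balanced")
    f1 = (bx * u2[1] - u2[0] * by) / det  # Cramer's rule
    f2 = (u1[0] * by - bx * u1[1]) / det
    if f1 < 0 or f2 < 0:
        raise ValueError("balance would require a repulsive pole force")
    return f1, f2


# Two upper poles at 45 and 135 degrees carrying a 100 N rotor: each pulls about 70.7 N.
print(pole_forces(45.0, 135.0, 100.0))
```

Two lower poles (for example at 225 and 315 degrees) cannot hold the rotor on their own, because balance would require them to push; the function reports this as an error rather than returning negative forces.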

180

Fault Tolerant Environment in web crawler Using Hardware Failure Detection  

Directory of Open Access Journals (Sweden)

Full Text Available Fault Tolerant Environment is a complete programming environment for the reliable execution of distributed application programs. Fault Tolerant Distributed Environment encompasses all aspects of modern fault-tolerant distributed computing. Its built-in, user-transparent error detection mechanism covers processor node crashes and hardware transient failures, and it also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism, combined with a novel low-overhead roll-forward recovery scheme, delivers an efficient, low-overhead backup and recovery mechanism for distributed processes. Fault Tolerant Distributed Environment also provides remote automatic process allocation on distributed system nodes. When recovery is not possible, a new microrebooting approach can be used to restore the system to a stable state.

Anup Garje, Prof. Bhavesh Patel, Dr. B. B. Mesharm

2012-06-01

181

Enhanced Maritime Safety through Diagnosis and Fault Tolerant Control  

DEFF Research Database (Denmark)

Faults in steering, navigation instruments or propulsion machinery are serious on a marine vessel, since the consequence could be loss of maneuvering ability and a risk of damage to the vessel, personnel or the environment. Early diagnosis and accommodation of faults could enhance safety. Fault-tolerant control is a methodology that helps prevent faults from developing into failures. Its means include online fault diagnosis, automatic condition assessment and calculation of remedial actions to avoid hazards. This paper gives an overview of methods to obtain fault tolerance: fault diagnosis, analysis of the properties of a faulty system, and means to determine remedial actions. The paper illustrates the techniques with two marine examples: sensor fusion for automatic steering, and control of the main engine.

Blanke, Mogens

2001-01-01

182

Fault-Tolerant Attitude Control System for a Spacecraft with Control Moment Gyros Using Multi-Objective Optimization  

Directory of Open Access Journals (Sweden)

Full Text Available Recent years have seen a growing requirement for accurate and agile spacecraft attitude control. To control a spacecraft's attitude both quickly and accurately, Control Moment Gyros (CMGs), which can generate much higher torque than conventional spacecraft actuators, are used as actuators. Rapid maneuverability requires driving the motors hard, which negatively affects their life. Thus, the conflicting requirements in spacecraft design are rapid maneuverability and reduced motor drive. Furthermore, the attitude control system needs to be fault-tolerant. The dominant requirement differs for each spacecraft mission, so the relationship between the requirements should be made explicit. In this study, a design method is proposed for the attitude control system that uses multi-objective optimization of the skew angle and the control-system parameters. Optimizing these parameters yields Pareto solutions that show the relationship between the requirements. Numerical analysis confirms the effects of fault tolerance and of parameter differences on the dominant requirement, and a method to guide the choice of attitude-control-system parameters is established.

Ai Noumi

2015-01-01

183

A Survey on Fault Tolerance in Work flow Management and Scheduling  

Directory of Open Access Journals (Sweden)

Full Text Available Fault tolerance is a configuration that prevents a computer or network device from failing in the event of an unexpected problem or error, such as hardware failure, link failure, unauthorized access, variations in the configuration of different systems, or a system running out of memory or disk space. Integrating fault-tolerance measures with scheduling is gaining importance. Workflow management systems support fault tolerance and efficient data-handling mechanisms.

K. Ganga, Dr S. Karthik, A. Christopher Paul

2012-10-01

184

A Dynamic Effective Fault Tolerance System in Robotic Manipulator using a Hybrid Neural Network based Controller  

Directory of Open Access Journals (Sweden)

Full Text Available Robot manipulators play an important role in the automobile industry, mainly in gas-welding applications and in the manufacturing and assembly of motor parts. Along complex trajectories, the speed of the manipulator at each joint is affected. It is therefore necessary to analyze the noise and vibration of the robot's joints in order to predict faults and to improve the control precision of the manipulator. In this study, we propose a new fault detection system for robot manipulators. The proposed hybrid fault detection system is based on a fuzzy support vector machine (SVM) and Artificial Neural Networks (ANNs). In this system, the decoupled joints are identified and corrected using the fuzzy SVM, with non-linear signals used throughout processing and treatment; the ANNs are used to detect free-swinging and locked joints of the robot, and two types of neural predictors are employed in the proposed adaptive neural network structure. Simulation results for the hybrid controller demonstrate the feasibility and performance of the methodology.

G. Jiji

2014-04-01

185

Closed-Loop Evaluation of an Integrated Failure Identification and Fault Tolerant Control System for a Transport Aircraft  

Science.gov (United States)

Formal robustness analysis of aircraft control upset prevention and recovery systems could play an important role in their validation and ultimate certification. Such systems developed for failure detection, identification, and reconfiguration, as well as upset recovery, need to be evaluated over broad regions of the flight envelope or under extreme flight conditions, and should include various sources of uncertainty. To apply formal robustness analysis, formulation of linear fractional transformation (LFT) models of complex parameter-dependent systems is required, which represent system uncertainty due to parameter uncertainty and actuator faults. This paper describes a detailed LFT model formulation procedure from the nonlinear model of a transport aircraft by using a preliminary LFT modeling software tool developed at the NASA Langley Research Center, which utilizes a matrix-based computational approach. The closed-loop system is evaluated over the entire flight envelope based on the generated LFT model which can cover nonlinear dynamics. The robustness analysis results of the closed-loop fault tolerant control system of a transport aircraft are presented. A reliable flight envelope (safe flight regime) is also calculated from the robust performance analysis results, over which the closed-loop system can achieve the desired performance of command tracking and failure detection.

Shin, Jong-Yeob; Belcastro, Christine; Khong, Thuan

2006-01-01
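
For readers unfamiliar with the LFT form, the standard upper linear fractional transformation, which "pulls out" an uncertainty block Δ from a partitioned system matrix M, is reproduced below. It is the textbook definition, not the specific model generated by the Langley tool; in this setting both parameter uncertainty and actuator faults would be collected in Δ, and the interconnection is well-posed only when I − M₁₁Δ is invertible for every admissible Δ.

```latex
M = \begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix},
\qquad
\mathcal{F}_u(M,\Delta) = M_{22} + M_{21}\,\Delta\,\left(I - M_{11}\Delta\right)^{-1} M_{12}
```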

186

A failure-distance based method to bound the reliability of non-repairable fault-tolerant systems without the knowledge of minimal cuts  

OpenAIRE

CTMC (continuous-time Markov chains) are a commonly used formalism for modeling fault-tolerant systems. One of the major drawbacks of CTMC is the well-known state-space explosion problem. This paper develops and analyzes a method (SC-BM) to compute bounds for the reliability of nonrepairable fault-tolerant systems in which only a portion of the state space of the CTMC is generated. SC-BM uses the failure distance concept as the method described in [1] but, unlike that method, which is based o...

Suñé, Víctor; Carrasco, Juan A.

2001-01-01
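
To show what computing the reliability of a non-repairable system from a CTMC involves, here is a minimal sketch that solves a tiny CTMC by uniformization. It assumes a two-unit parallel system with identical, independent failure rate λ and no repair. It generates the full (three-state) state space, so it illustrates the baseline whose state-space explosion SC-BM is designed to avoid, not SC-BM itself; the failure rate and mission time are hypothetical.

```python
import math
from typing import List


def transient_probs(Q: List[List[float]], p0: List[float], t: float, tol: float = 1e-12) -> List[float]:
    """State probabilities p(t) = p0 * exp(Q t) computed by uniformization.

    Q is a CTMC generator (rows sum to zero). Assumes Lambda*t is moderate
    (< 700) so exp(-Lambda*t) does not underflow to zero.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))
    if lam == 0.0:
        return list(p0)  # every state absorbing: nothing changes
    if lam * t > 700:
        raise ValueError("Lambda*t too large for this simple implementation")
    # Uniformized DTMC: P = I + Q / Lambda
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)] for i in range(n)]
    weight = math.exp(-lam * t)  # Poisson probability of k = 0 jumps
    v, result, mass, k = list(p0), [0.0] * n, 0.0, 0
    while True:
        for i in range(n):
            result[i] += weight * v[i]
        mass += weight
        if 1.0 - mass < tol:  # remaining Poisson tail is negligible
            return result
        k += 1
        weight *= lam * t / k
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]


lam_f, t = 1e-3, 500.0  # hypothetical failure rate [1/h] and mission time [h]
# States: 0 = both units up, 1 = one unit up, 2 = system failed (absorbing)
Q = [[-2 * lam_f, 2 * lam_f, 0.0],
     [0.0, -lam_f, lam_f],
     [0.0, 0.0, 0.0]]
p = transient_probs(Q, [1.0, 0.0, 0.0], t)
reliability = p[0] + p[1]
closed_form = 2 * math.exp(-lam_f * t) - math.exp(-2 * lam_f * t)
print(abs(reliability - closed_form) < 1e-9)  # True
```

For this three-state chain the closed form is easy; the point of bounding methods is that realistic models have far too many states to enumerate this way.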

187

Sensitivity Analysis of Unavailability of a Component in DPS with Various Fault-Tolerant Techniques  

International Nuclear Information System (INIS)

With the improvement of digital technologies, digital protection systems (DPSs) employ multiple sophisticated fault-tolerant techniques (FTTs) to increase fault detection and to help the system safely perform its required functions despite the possible presence of faults. In the reliability evaluation of digital systems, FTTs and their fault coverage must be considered. Fault detection coverage is a crucial factor of an FTT in reliability. However, fault detection coverage alone is not enough to reflect the effects of various FTTs in a reliability model. Thus, integrated fault coverage is suggested to reflect the characteristics of FTTs.

188

Sensitivity Analysis of Unavailability of a Component in DPS with Various Fault-Tolerant Techniques  

Energy Technology Data Exchange (ETDEWEB)

With the improvement of digital technologies, digital protection systems (DPSs) employ multiple sophisticated fault-tolerant techniques (FTTs) to increase fault detection and to help the system safely perform its required functions despite the possible presence of faults. In the reliability evaluation of digital systems, FTTs and their fault coverage must be considered. Fault detection coverage is a crucial factor of an FTT in reliability. However, fault detection coverage alone is not enough to reflect the effects of various FTTs in a reliability model. Thus, integrated fault coverage is suggested to reflect the characteristics of FTTs.

Kim, Bo Gyung; Kang, Hyun Gook; Kim, Hee Eun; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of); Lee, Seung Jun [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2012-05-15
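
Why coverage matters can be seen in the classic duplex model with imperfect coverage, sketched below under assumed independent unit failures: the system survives if both units work, or if one fails and the fault is detected and handled (probability c). This textbook formula is a stand-in for illustration; it is not the paper's integrated fault coverage model.

```python
def duplex_reliability(r: float, c: float) -> float:
    """R_sys = r^2 + 2*c*r*(1 - r) for two units of reliability r and fault coverage c."""
    if not (0.0 <= r <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("r and c must be probabilities")
    return r * r + 2.0 * c * r * (1.0 - r)


def coverage_sensitivity(r: float) -> float:
    """dR_sys/dc = 2*r*(1 - r): how much each unit of coverage is worth."""
    return 2.0 * r * (1.0 - r)


for c in (0.0, 0.9, 1.0):
    print(c, round(duplex_reliability(0.9, c), 4))
# 0.0 0.81
# 0.9 0.972
# 1.0 0.99
```

At c = 0 the redundant pair is less reliable than a single unit (0.81 < 0.9), which illustrates why coverage, and not redundancy alone, drives the reliability result.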

189

Fault-tolerant wait-free shared objects  

Science.gov (United States)

A concurrent system consists of processes and shared objects. Previous research focused on the problem of tolerating process failures. We study the complementary problem of tolerating object failures. We divide object failures into two broad classes: responsive and non-responsive. With responsive failures, a faulty object responds to every invocation, but responses may be incorrect. With non-responsive failures, a faulty object may also 'hang' without responding. For each class, we consider crash and arbitrary types of failures, and for each type of failure we seek a universal implementation of fault-tolerant wait-free shared objects. We present (deterministic) implementations for all types of responsive failures, including arbitrary failures. In contrast, we show that even the most benign type of non-responsive failure requires the use of randomization. Of special interest is the problem of implementing fault-tolerant objects using only objects of the same type; we present such fault-tolerant self-implementations for many common object types. Graceful degradation is a desirable property of fault-tolerant implementations: the implemented object never fails more severely than the base objects it is derived from, even if all the base objects fail. For several failure models, we show whether this property can be achieved and, if so, how. In addition to these possibility and impossibility results, we also consider the resource complexity of fault-tolerant implementations. In many cases, we present lower bounds and give matching algorithms.

Jayanti, Prasad; Chandra, Tushar Deepak; Toueg, Sam

1992-01-01

190

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Scientific Electronic Library Online (English)

Full Text Available SciELO Brazil | Language: English
In this work, a model is proposed for assessing the availability of fault-tolerant systems based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that can represent the dynamic characteristics of a system as well as cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows uncertainty to be propagated to the system availability. A numerical procedure is also proposed for solving the state-probability equations of semi-Markov processes described in terms of transition rates. The procedure applies Laplace transforms, which are inverted by the Gaussian quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by an application example in the context of fault-tolerant systems.

Moura, Márcio das Chagas; Droguett, Enrique López

2008-08-01

191

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Directory of Open Access Journals (Sweden)

Full Text Available In this work, a model is proposed for assessing the availability of fault-tolerant systems based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that can represent the dynamic characteristics of a system as well as cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows uncertainty to be propagated to the system availability. A numerical procedure is also proposed for solving the state-probability equations of semi-Markov processes described in terms of transition rates. The procedure applies Laplace transforms, which are inverted by the Gaussian quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by an application example in the context of fault-tolerant systems.

Márcio das Chagas Moura

2008-08-01

192

SABRE: a bio-inspired fault-tolerant electronic architecture  

International Nuclear Information System (INIS)

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser-grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance. (paper)

193

Byzantine-fault tolerant self-stabilizing protocol for distributed clock synchronization systems  

Science.gov (United States)

A rapid Byzantine self-stabilizing clock synchronization protocol is described that self-stabilizes from any state, tolerates bursts of transient failures, and deterministically converges within a time that is linear with respect to the self-stabilization period. Upon self-stabilization, all good clocks proceed synchronously. The protocol does not rely on any assumptions about the initial state of the clocks, and there is neither a central clock nor an externally generated pulse system. The protocol is scalable and self-stabilizes in a short amount of time.

Malekpour, Mahyar R. (Inventor)

2010-01-01
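
The patented protocol's details are not given here, so, as a point of comparison only, this sketch shows a different, well-known building block of Byzantine-tolerant clock synchronization: the fault-tolerant midpoint convergence function of Welch and Lynch, which assumes n ≥ 3f + 1 clocks of which at most f are faulty. It is not the protocol described in the abstract.

```python
from typing import Sequence


def fault_tolerant_midpoint(readings: Sequence[float], f: int) -> float:
    """Discard the f lowest and f highest clock readings and return the midpoint of the rest.

    With at most f Byzantine readings among n >= 3f + 1, the result lies within the
    range of the correct clocks' readings, so a faulty clock cannot drag it outside.
    """
    n = len(readings)
    if f < 0 or n < 3 * f + 1:
        raise ValueError("need n >= 3f + 1 readings")
    trimmed = sorted(readings)[f:n - f]
    return (trimmed[0] + trimmed[-1]) / 2.0


# One Byzantine clock reports a wildly wrong time; it is trimmed away.
print(fault_tolerant_midpoint([10.0, 11.0, 12.0, 1000.0], f=1))  # 11.5
```

Unlike the protocol above, this convergence function assumes the clocks already start roughly synchronized and exchange readings in rounds; it does not self-stabilize from an arbitrary state.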

194

Enhancement of Fault Tolerance in Cloud Computing  

Directory of Open Access Journals (Sweden)

Full Text Available In recent years, researchers have been trying to run scientific applications in the cloud, so as to decrease infrastructure cost, widen the span of research teams and ultimately foster innovative ideas for applications. But the cloud is still not as reliable or controllable as the grid. So, in the evolving cloud computing environment, there is a great need for a fault tolerance mechanism that lets the system work effectively even in the presence of failures. Moreover, large organizations are opting for hybrid clouds instead of private clouds. This paper therefore proposes a new framework that allows the cloud to be used for scientific applications and makes the public cloud a trustworthy platform. A progressive approach is introduced to achieve high fault tolerance in clouds through a new workflow planning method that balances performance, reliability and cost for critical scientific applications, focusing mainly on the use of distributed resources for workflow execution in serial and concurrent manner.

Pushpanjali Gupta

2014-08-01

195

Model-Based Fault Tolerant Control  

Science.gov (United States)

The Model Based Fault Tolerant Control (MBFTC) task was conducted under the NASA Aviation Safety and Security Program. The goal of MBFTC is to develop and demonstrate real-time strategies to diagnose and accommodate anomalous aircraft engine events such as sensor faults, actuator faults, or turbine gas-path component damage that can lead to in-flight shutdowns, aborted takeoffs, asymmetric thrust/loss of thrust control, or engine surge/stall events. A suite of model-based fault detection algorithms was developed and evaluated. Based on the performance and maturity of the developed algorithms, two approaches were selected for further analysis: (i) multiple-hypothesis testing, and (ii) neural networks; both used residuals from an Extended Kalman Filter to detect the occurrence of the selected faults. A simple fusion algorithm was implemented to combine the results from each algorithm to obtain an overall estimate of the identified fault type and magnitude. The identification of the fault type and magnitude enabled the use of an online fault accommodation strategy to correct for the adverse impact of these faults on engine operability, thereby enabling continued engine operation in the presence of these faults. The performance of the fault detection and accommodation algorithm was extensively tested in a simulation environment.

Kumar, Aditya; Viassolo, Daniel

2008-01-01

196

Simulation modeling based method for choosing an effective set of fault tolerance mechanisms for real-time avionics systems  

Science.gov (United States)

In this paper, the reliability allocation problem (RAP) for real-time avionics systems (RTAS) is considered. The proposed method for solving this problem consists of two steps: (i) creation of an RTAS simulation model at the necessary level of abstraction and (ii) application of a metaheuristic algorithm to find an optimal solution (i.e., to choose an optimal set of fault tolerance techniques). Whenever the algorithm needs to measure the execution time of software components, simulation modeling is applied. The simulation modeling procedure itself consists of two steps: automatic construction of a simulation model of the RTAS configuration, and running this model in a simulation environment to measure the required time. The method was implemented as an experimental software tool that works in cooperation with the DYANA simulation environment. The results of experiments with the implemented method are presented. Finally, future plans for development of the presented method and tool are briefly described.

Bakhmurov, A. G.; Balashov, V. V.; Glonina, A. B.; Pashkov, V. N.; Smeliansky, R. L.; Volkanov, D. Yu.

2013-12-01

197

Analysis of an inherently fault tolerant program  

International Nuclear Information System (INIS)

Software for process-control systems, such as nuclear power plant safety control systems and robots, can be very complex because of the large number of cases which have to be considered. The approach proposed here uses decentralized control concepts and is based on Dijkstra's ''relaxation'' problem and self-stabilizing systems. The resulting program is inherently fault tolerant of partial hardware failures. Furthermore, the software is often simplified, so that its correctness can be verified more easily. The authors present an overview of the model using a simple control program for a simulated robot as an example. Then they analyze this control program in terms of the degree to which it is decentralized, its partial correctness proof, its convergence proof and its performance. They also discuss some modifications to the basic algorithm.
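
Dijkstra's self-stabilizing K-state token ring, the classic example of the self-stabilizing systems mentioned above, can be simulated in a few lines. This is a sketch of that textbook algorithm under a central daemon that picks one privileged machine per step at random (an assumption of the sketch), not the authors' robot control program.

```python
import random
from typing import List


def privileged(s: List[int]) -> List[int]:
    """Indices of machines holding a privilege (token) in Dijkstra's K-state ring."""
    tokens = [0] if s[0] == s[-1] else []          # bottom machine
    tokens += [i for i in range(1, len(s)) if s[i] != s[i - 1]]
    return tokens


def fire(s: List[int], i: int, k: int) -> None:
    """Execute machine i's move in place."""
    if i == 0:
        s[0] = (s[0] + 1) % k                      # bottom machine advances its counter
    else:
        s[i] = s[i - 1]                            # others copy their left neighbour


def stabilize(s: List[int], k: int, steps: int, seed: int = 0) -> List[int]:
    """Run `steps` moves from an arbitrary (possibly corrupted) state."""
    rng = random.Random(seed)
    s = list(s)
    for _ in range(steps):
        fire(s, rng.choice(privileged(s)), k)      # at least one machine is always privileged
    return s


n, k = 5, 6                                        # K > n guarantees convergence
state = stabilize([3, 1, 4, 1, 5], k, steps=1000)
print(len(privileged(state)))                      # 1: exactly one token circulates
```

No machine needs to know the global state: each acts only on its own and its neighbour's value, which is the kind of decentralization that makes such programs tolerant of transient corruption.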

198

Software for Fault-Tolerant Matrix Multiplication  

Science.gov (United States)

Formal Linear Algebra Recovery Environment is a computer program for high-performance, fault-tolerant matrix multiplication. The program is based on an extension of the prior theory and practice of fault-tolerant matrix-matrix multiplication of the form C = AB. This extension provides low-overhead methods for detecting errors, not only in C, but also in A and/or B. These methods enable the detection of all errors as long as, in a given case, only one entry in A, B, or C is corrupted. The program also provides for following a low-overhead rollback approach to correct errors once detected. Results of computational experiments have demonstrated that the methods implemented in this program work well in practice while imposing an acceptably low level of overhead, relative to high-performance matrix-multiplication methods that do not afford fault tolerance.

Katz, Daniel; Tisdale, Edwin; Quintana-Orti, Enrique; Gunnels, John; van de Geijn, Robert

2004-01-01
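
The checksum idea behind fault-tolerant matrix multiplication can be sketched in the classic Huang-Abraham style, assuming at most one corrupted entry and, unlike the program described above, checking only C (not A or B). Row and column checksums computed from A and B are compared with those of C; a single mismatching row and column pinpoint the bad entry. The correction step here uses the checksum difference rather than the rollback approach the abstract mentions, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def checksum_check(A: Matrix, B: Matrix, C: Matrix, tol: float = 1e-9) -> Optional[Tuple[int, int, float]]:
    """Return None if C = AB passes both checksums, else (row, col, correction) for a single bad entry."""
    m, q, n = len(A), len(B), len(B[0])
    col_sum_A = [sum(A[i][p] for i in range(m)) for p in range(q)]                # e^T A
    row_sum_B = [sum(B[p]) for p in range(q)]                                     # B e
    exp_cols = [sum(col_sum_A[p] * B[p][j] for p in range(q)) for j in range(n)]  # e^T A B
    exp_rows = [sum(A[i][p] * row_sum_B[p] for p in range(q)) for i in range(m)]  # A B e
    bad_rows = [i for i in range(m) if abs(sum(C[i]) - exp_rows[i]) > tol]
    bad_cols = [j for j in range(n) if abs(sum(C[i][j] for i in range(m)) - exp_cols[j]) > tol]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        return i, j, exp_rows[i] - sum(C[i])
    raise ValueError("error pattern is not a single corrupted entry of C")


A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
C = matmul(A, B)          # [[19, 22], [43, 50]]
C[1][0] = 44              # inject a single soft error
i, j, delta = checksum_check(A, B, C)
C[i][j] += delta          # repair from the checksum difference
print(C == matmul(A, B))  # True
```

The checksums cost O(n²) work against the O(n³) product, which is why this family of techniques can keep overhead low relative to unprotected multiplication.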

199

SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers  

Science.gov (United States)

A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer, with emphasis on implementation, is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level of reliability and fault tolerance is achieved by replicating computing tasks among processing units.

Forman, P.; Moses, K.

1979-01-01

200

On requirements for software fault tolerance for flight controls  

Science.gov (United States)

The need for software fault tolerance techniques in digital flight control systems is argued to follow from the requirements derivable from the safety constraints of such systems; these requirements can be stated in terms of minimum acceptable system reliability levels and, moreover, stated quantitatively. It is further argued that, while fault tolerance appears to be a viable mechanism in general, individual fault tolerance schemes need to be analyzed to ensure that they are adequate to the task and properly utilized; that such analysis is essentially an exercise in software 'reliability' estimation involving software characteristics not currently included in software 'reliability' modeling (most especially, the degree of correlation of malfunctions among redundant, dissimilar software modules); and that, consequently, further research and studies on the characterization of software behavior and malfunctions are required.

Migneault, G. E.

1983-01-01

201

Design and Analysis of a Fault Tolerant Microprocessor Based on Triple Modular Redundancy Using VHDL  

OpenAIRE

There are numerous real-time and operation-critical systems in which failure is unacceptable at any stage of processing; examples include ATMs, satellites, and spacecraft. In this paper, a fault-tolerant microprocessor is developed using checker units with a fault-secure ALU; the fault-secure ALU is built with parity prediction logic and the two-rail checker method. Finally, triple modular redundancy is applied to develop a fault tolerant p...

Deepti Shinghal; Dinesh Chandra

2011-01-01
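
Parity prediction for an adder rests on one identity: each sum bit is s_i = a_i XOR b_i XOR c_i, where c_i is the carry into bit i, so the parity of the sum must equal P(a) XOR P(b) XOR P(c). The sketch below checks an 8-bit addition this way. The width, the function names, and the `fault_mask` used to inject output bit flips are assumptions for illustration, and the two-rail checker is not modeled; in hardware the carries would come from a duplicated carry chain rather than the loop used here.

```python
from typing import Tuple

WIDTH = 8  # assumed datapath width for the illustration


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def carry_vector(a: int, b: int, width: int = WIDTH) -> int:
    """Bit i of the result is the carry into bit position i (the carry into bit 0 is 0)."""
    carry, vector = 0, 0
    for i in range(width):
        vector |= carry << i
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return vector


def checked_add(a: int, b: int, fault_mask: int = 0, width: int = WIDTH) -> Tuple[int, bool]:
    """Return (sum, error_detected); fault_mask flips output bits to emulate a faulty adder."""
    s = ((a + b) & ((1 << width) - 1)) ^ fault_mask
    predicted = parity(a) ^ parity(b) ^ parity(carry_vector(a, b, width))
    return s, parity(s) != predicted


print(checked_add(100, 57))                     # (157, False)
print(checked_add(100, 57, fault_mask=0b100))   # (153, True): a single flipped bit is caught
print(checked_add(100, 57, fault_mask=0b11))    # (158, False): an even number of flips escapes
```

The last line shows the usual limit of a single parity bit, which is one reason the paper adds triple modular redundancy on top of the fault-secure ALU.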

202

Learning Fault-tolerant Speech Parsing with SCREEN  

CERN Document Server

This paper describes a new approach and a system, SCREEN, for fault-tolerant speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech-related errors. The goal of this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. This approach is examined in the SCREEN system using various hybrid connectionist techniques, which were chosen because of their promising properties of inherent fault tolerance, learning, gradedness, and parallel constraint integration. The input to SCREEN consists of hypotheses about recognized words of a spoken utterance potentially analyzed by a spe...

Wermter, S; Wermter, Stefan; Weber, Volker

1994-01-01

203

Interactive animation of fault-tolerant parallel algorithms  

Energy Technology Data Exchange (ETDEWEB)

Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user-interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.

Apgar, S.W.

1992-02-01
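
The Write-All problem asks P fail-prone processors to set every cell of an N-element array to 1 while an adversary crashes processors. The sketch below simulates a deliberately naive solution, not one of the efficient algorithms Raft animates: every live processor sweeps the whole array cyclically from its own starting offset, so the task completes as long as the adversary leaves at least one processor alive, at the cost of up to N*P work. The adversary interface (a function of the round number and the live set) is an assumption of the example.

```python
from typing import Callable, List, Set, Tuple

Adversary = Callable[[int, Set[int]], Set[int]]


def write_all(n: int, p: int, adversary: Adversary) -> Tuple[List[int], int, int]:
    """Simulate synchronous rounds; return (array, rounds, total cell writes)."""
    x = [0] * n
    alive = set(range(p))
    steps = [0] * p            # cells visited so far by each processor
    rounds = work = 0
    while 0 in x:
        rounds += 1
        crashed = adversary(rounds, set(alive)) & alive
        if crashed == alive:   # the adversary must leave at least one processor alive
            crashed.discard(min(alive))
        alive -= crashed
        for pid in alive:
            pos = (pid * n // p + steps[pid]) % n   # start offset plus progress, wrapping
            x[pos] = 1
            steps[pid] += 1
            work += 1
    return x, rounds, work


print(write_all(16, 8, lambda r, live: set()))   # no failures: 2 rounds, 16 writes
print(write_all(16, 8, lambda r, live: live))    # adversary crashes all it can: 16 rounds
```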

204

Implementation of middleware fault tolerance support for real-time embedded applications  

OpenAIRE

Critical real-time embedded systems need to apply fault tolerance strategies to deal with operation-time errors, in either hardware or software. In this paper we present ongoing work to provide application fault tolerance by implementing transparent middleware support over the BOSS embedded operating system. The middleware uses a publisher-subscriber protocol and enables the execution of several fault tolerance strategies with minimal burden on the application-level software.

Afonso, Francisco; Silva, Carlos A.; Montenegro, Sérgio; Tavares, Adriano

2006-01-01

205

Fault tolerant issues in the BTeV trigger  

International Nuclear Information System (INIS)

The BTeV trigger performs sophisticated computations using large ensembles of FPGAs, DSPs, and conventional microprocessors. This system will have between 5,000 and 10,000 computing elements and many networks and data switches. While much attention has been devoted to developing efficient algorithms, the need for fault-tolerant, fault-adaptive, and flexible techniques and software to manage this huge computing platform has been identified as one of the most challenging aspects of this project. The authors describe the problem and offer an approach to solving it based on a distributed, hierarchical fault management system.

206

Concepts and Methods in Fault-tolerant Control  

DEFF Research Database (Denmark)

Faults in automated processes will often cause undesired reactions and shut-down of a controlled plant, and the consequences could be damage to technical parts of the plant, to personnel, or to the environment. Fault-tolerant control combines diagnosis with control methods to handle faults in an intelligent way. The aim is to prevent simple faults from developing into serious failures, and hence to increase plant availability and reduce the risk of safety hazards. Fault-tolerant control merges several disciplines into a common framework to achieve these goals. The desired features are obtained through on-line fault diagnosis, automatic condition assessment, and calculation of appropriate remedial actions to avoid certain consequences of a fault. The envelope of possible remedial actions is very wide. Sometimes, simple accommodation can be achieved by replacing a measurement from a faulty sensor with an estimate. In other situations, complex reconfiguration or on-line controller redesign is required. This paper gives an overview of recent tools to analyze and explore the structure and other fundamental properties of an automated system, so that any inherent redundancy in the controlled process can be fully utilized to maintain availability even when faults occur.

Blanke, Mogens; Staroswiecki, M.

2001-01-01
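
The simplest remedy mentioned above, substituting an estimate for a faulty sensor's measurement, can be sketched as a residual test: compare each measurement with a model-based or redundant estimate, declare the sensor faulty after a few consecutive out-of-bound residuals, and feed the estimate to the controller from then on. The threshold, the persistence count, the class name, and the source of the estimate are assumptions; the paper surveys design tools rather than prescribing this logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SensorAccommodator:
    threshold: float       # residual magnitude treated as abnormal (assumed tuning value)
    persistence: int = 3   # consecutive abnormal residuals before declaring a fault
    faulty: bool = False
    abnormal_count: int = 0

    def select(self, measured: float, estimate: float) -> float:
        """Return the value the controller should use for this sample."""
        if not self.faulty:
            if abs(measured - estimate) > self.threshold:
                self.abnormal_count += 1
                self.faulty = self.abnormal_count >= self.persistence
            else:
                self.abnormal_count = 0
        return estimate if self.faulty else measured


acc = SensorAccommodator(threshold=0.5)
measurements = [10.1, 9.9, 0.0, 0.0, 0.0, 0.0]   # sensor drops out after two samples
used: List[float] = [acc.select(y, estimate=10.0) for y in measurements]
print(used)  # [10.1, 9.9, 0.0, 0.0, 10.0, 10.0]
```

The persistence count trades detection delay (two bad samples still reach the controller here) against false alarms from noise.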

207

Reconfigurable Fault Tolerance for FPGAs  

Science.gov (United States)

The invention allows a field-programmable gate array (FPGA) or similar device to be efficiently reconfigured in whole or in part to provide higher-capacity, non-redundant operation. The redundant device consists of functional units such as adders or multipliers, configuration memory for the functional units, a programmable routing method, configuration memory for the routing method, and various other features such as block RAM (random access memory), I/O (input/output) capability, dedicated carry logic, etc. The redundant device has three identical sets of functional units and routing resources, and majority voters that correct errors. The configuration memory may or may not be redundant, depending on need. For example, SRAM-based FPGAs will need some type of radiation-tolerant configuration memory, or they will need triple-redundant configuration memory. Flash or anti-fuse devices will generally not need redundant configuration memory. Some means of loading and verifying the configuration memory is also required. These are all components of the pre-existing redundant FPGA. This innovation modifies the voter to accept a MODE input, which specifies whether ordinary voting is to occur or whether the redundancy is to be split. Generally, additional routing resources will also be required to pass data between the sections of the device created by splitting the redundancy. In redundancy mode, the voters produce an output corresponding to the two inputs that agree, in the usual fashion. In split mode, the voters select just one input and convey it to the output, ignoring the other inputs. In a dual-redundant system (as opposed to a triple-redundant one), instead of a voter there is some means to latch or gate a state update only when both inputs agree. In this case, the invention would require modifying the latch or gate so that it operates normally in redundant mode and separately latches or gates the inputs in non-redundant mode.

Shuler, Robert, Jr.

2010-01-01
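
The MODE-controlled voter can be modeled behaviorally in a few lines. This is a Python sketch, not the patented circuit: the names are hypothetical, and in the device each voter would be logic fed by three copies of a functional unit, with the split-mode output of each lane routed to a separate section of the device.

```python
from enum import Enum


class Mode(Enum):
    REDUNDANT = 0  # ordinary voting: output follows the inputs that agree
    SPLIT = 1      # redundancy split: pass one selected input, ignore the others


def voter(a: int, b: int, c: int, mode: Mode, lane: int = 0) -> int:
    """Bitwise three-input voter with a MODE input; `lane` picks the input in SPLIT mode."""
    if mode is Mode.REDUNDANT:
        # Each output bit is 1 exactly when at least two of the three inputs have a 1 there.
        return (a & b) | (a & c) | (b & c)
    return (a, b, c)[lane]


print(bin(voter(0b1010, 0b1010, 0b0110, Mode.REDUNDANT)))  # 0b1010: the odd lane is outvoted
print(voter(1, 2, 3, Mode.SPLIT, lane=2))                   # 3: lane 2 passed straight through
```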

208

Fault Tolerant Control for Kori Unit 1 Steam Generator  

International Nuclear Information System (INIS)

In order to implement more reliable control systems, failures of a controller, a sensor, and an actuator should be taken into consideration in the control system design process. Traditionally, there have been two approaches to the fault-tolerant control problem: active redundancy and passive redundancy. Active redundancy has no reconfiguration part that takes actions such as diagnosing a failure and selecting an intact controller; instead, one controller guarantees system stability and performance when the other controller fails. Passive redundancy, by contrast, has reconfiguration parts that supervise the system, reject the faulty controller, and select the sound controller to perform the mission. This paper focuses on the active redundancy structure for fault-tolerant control and proposes design methods for fault-tolerant state feedback control and fault-tolerant output feedback control, which make the control system reliable while guaranteeing stability and performance in the sense of the H∞ norm in the face of controller failures in the dual-controller configuration. The proposed method is applied to the Kori Unit 1 steam generator level control system. The results show that the steam generator water level is well controlled when one controller fails.

209

Implementation of Fault Tolerant Method Using BCH Code on FPGA  

OpenAIRE

Fault tolerance, or graceful degradation, is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. The aim is to design a new 32-bit Arithmetic Logic Unit (ALU) that is secure against many attacks or faults and able to correct any 5-bit fault in any position of its 32-bit input register. Radiation effects on electronic circuits may invert data bits of registers or m...

Mahadevaswamy V P; Sunitha S.L.; Shobha, B. N.

2012-01-01

210

Fault tolerant testbed evaluation, phase 1  

Science.gov (United States)

In recent years, avionics systems development costs have become the driving factor in the development of space systems, military aircraft, and commercial aircraft. A method of reducing avionics development costs is to utilize state-of-the-art software application generator (autocode) tools and methods. The recent maturity of application generator technology has the potential to dramatically reduce development costs by eliminating software development steps that have historically introduced errors and the need for re-work. Application generator tools have been demonstrated to be an effective method for autocoding non-redundant, relatively low-rate input/output (I/O) applications on the Space Station Freedom (SSF) program; however, they have not been demonstrated for fault tolerant, high-rate I/O, flight critical environments. This contract will evaluate the use of application generators in these harsh environments. Using Boeing's quad-redundant avionics system controller as the target system, Space Shuttle Guidance, Navigation, and Control (GN&C) software will be autocoded, tested, and evaluated in the Johnson (Space Center) Avionics Engineering Laboratory (JAEL). The response of the autocoded system will be shown to match the response of the existing Shuttle General Purpose Computers (GPC's), thereby demonstrating the viability of using autocode techniques in the development of future avionics systems.

Caluori, V., Jr.; Newberry, T.

1993-09-01

211

A methodology for testing fault-tolerant software  

Science.gov (United States)

A methodology for testing fault-tolerant software is presented. Testing fault-tolerant software is problematic because many errors are masked or corrected by voters, limiters, or automatic channel synchronization. This methodology illustrates how the same strategies used for testing fault-tolerant hardware can be applied to testing fault-tolerant software. For example, one strategy used in testing fault-tolerant hardware is to disable the redundancy during testing. A similar strategy is proposed for software: move the main testing emphasis earlier in the development cycle (before the redundancy is in place), thus reducing the possibility that undetected errors will be masked when limiters and voters are added.

Andrews, D. M.; Mahmood, A.; Mccluskey, E. J.

1985-01-01
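
The masking problem is easy to reproduce. In the hypothetical example below, three versions of a limiter feed a majority voter and one version carries a seeded bug. Testing the voted system finds nothing, while testing each version on its own, as the methodology recommends doing before the redundancy is in place, exposes the fault. The function names, limits, and test inputs are invented for the illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

Version = Callable[[float], float]


def limit_v1(x: float) -> float:
    return max(-10.0, min(10.0, x))


def limit_v2(x: float) -> float:
    return min(10.0, x)   # seeded bug: the lower limit is missing


def limit_v3(x: float) -> float:
    return -10.0 if x < -10.0 else (10.0 if x > 10.0 else x)


VERSIONS: List[Version] = [limit_v1, limit_v2, limit_v3]


def voted(x: float) -> float:
    value, _ = Counter(v(x) for v in VERSIONS).most_common(1)[0]
    return value


def oracle(x: float) -> float:   # expected behavior from the specification
    return max(-10.0, min(10.0, x))


inputs = [-50.0, -10.0, 0.0, 3.5, 42.0]
system_failures = [x for x in inputs if voted(x) != oracle(x)]
version_failures: Dict[str, List[float]] = {
    v.__name__: [x for x in inputs if v(x) != oracle(x)] for v in VERSIONS
}
print(system_failures)   # []
print(version_failures)  # {'limit_v1': [], 'limit_v2': [-50.0], 'limit_v3': []}
```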

212

Implementations of a four-level mechanical architecture for fault-tolerant robots  

International Nuclear Information System (INIS)

This paper describes a fault-tolerant mechanical architecture with four levels, devised and implemented in concert with NASA (Tesar, D. and Sreevijayan, D., Four-level fault tolerance in manipulator design for space operations. In First Int. Symp. Measurement and Control in Robotics (ISMCR '90), Houston, Texas, 20-22 June 1990). Subsequent work has clarified and revised the architecture. The four levels proceed from fault tolerance at the actuator level, to fault tolerance via in-parallel chains, to fault tolerance using serial kinematic redundancy, and finally to the fault tolerance that multiple-arm systems provide. This is a subsumptive architecture because each successive layer can incorporate the fault tolerance provided by all layers beneath it. For instance, a serially redundant robot can incorporate dual fault-tolerant actuators. Redundant systems provide the fault tolerance, but the guiding principle of this architecture is that functional redundancies actively increase the performance of the system; redundancies do not simply remain dormant until needed. This paper includes specific examples of hardware and/or software implementation at all four levels.

213

Implementation of Fault Tolerant Method Using BCH Code on FPGA  

Directory of Open Access Journals (Sweden)

Fault tolerance, or graceful degradation, is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. The aim is to design a new 32-bit Arithmetic Logic Unit (ALU) that is secure against many attacks or faults and able to correct any 5-bit fault in any position of its 32-bit input register. Radiation effects on electronic circuits may invert data bits of registers or memories, and if one bit of the main storage system is changed, the mission of the system could be completely different. The main motivation for choosing BCH (Bose, Chaudhuri, and Hocquenghem) codes is that they can correct multiple errors; this class of codes comprises powerful random-error-correcting cyclic codes. In terms of area penalty, a 32-bit fault-tolerant ALU using a BCH code is a better choice than Triple Modular Redundancy (TMR) or Residue code. A TMR-based 32-bit ALU with single or triplicated voting needs a single voter or a tripled voter plus two extra 32-bit ALUs, which increases the hardware overhead by 202% and 208%, respectively. The Residue code requires a hardware overhead of 148.9%. In comparison, the BCH code needs a hardware overhead of only 70 to 75%, which reduces the overall cost and power consumption. Thus, the proposed fault-tolerant design has lower hardware overhead and provides multiple-error correction compared with the other techniques.

Mahadevaswamy V P

2012-09-01
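
To make the BCH idea concrete, the sketch below encodes and decodes the small binary BCH(15,7) code, which corrects any 2 bit errors in a 15-bit word. It is a substitute chosen for brevity: correcting 5-bit faults in 32-bit data, as in the paper, needs a longer code and a proper algebraic decoder (for example Berlekamp-Massey with Chien search) rather than the syndrome lookup table used here. The generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1 is the product of the minimal polynomials x^4 + x + 1 and x^4 + x^3 + x^2 + x + 1 over GF(2); the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict

N, K = 15, 7
PARITY_BITS = N - K
GEN = 0b111010001  # g(x) = x^8 + x^7 + x^6 + x^4 + 1


def poly_mod(value: int, gen: int = GEN) -> int:
    """Remainder of value(x) divided by gen(x), with coefficients in GF(2)."""
    deg = gen.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        value ^= gen << (value.bit_length() - 1 - deg)
    return value


def encode(msg: int) -> int:
    """Systematic encoding: message in the high 7 bits, remainder as the 8 parity bits."""
    if not 0 <= msg < (1 << K):
        raise ValueError("message must fit in 7 bits")
    shifted = msg << PARITY_BITS
    return shifted | poly_mod(shifted)


# The syndrome (remainder) of a received word depends only on the error pattern, and
# with minimum distance 5 every pattern of weight 1 or 2 has a distinct syndrome.
SYNDROMES: Dict[int, int] = {}
for weight in (1, 2):
    for positions in combinations(range(N), weight):
        error = sum(1 << p for p in positions)
        SYNDROMES[poly_mod(error)] = error


def decode(word: int) -> int:
    """Correct up to 2 bit errors and return the 7-bit message.

    Patterns of 3 or more errors are either flagged or, if their syndrome happens to
    match a table entry, miscorrected."""
    syndrome = poly_mod(word)
    if syndrome:
        if syndrome not in SYNDROMES:
            raise ValueError("uncorrectable error pattern detected")
        word ^= SYNDROMES[syndrome]
    return word >> PARITY_BITS


codeword = encode(0b1011001)
print(decode(codeword ^ (1 << 3) ^ (1 << 12)) == 0b1011001)  # True: two flipped bits corrected
```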

214

Steps toward fault-tolerant quantum chemistry.  

Energy Technology Data Exchange (ETDEWEB)

Developing quantum chemistry programs for the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use of a task-based model that uses a data-centric approach to allocate work to different processes, as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model for the computational kernel of a coupled-cluster program: matrix multiplication. We also discuss the extensions that would be required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead, one must recognize that these systems will be hierarchical in structure and prone to constant faults, and that global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication (reducing the network load), does not require checkpointing, and avoids the complexity of many HPC runtimes. Developing an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determines the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources.
The complexity of these algorithms means that MPI alone is insufficient to achieve parallel scaling; QC developers have been forced to use alternative approaches to achieve scalability and would be receptive to radical shifts in the programming paradigm. Initial work in adapting the simplest QC method, Hartree-Fock, to this new programming model indicates that the approach is beneficial for QC applications. However, the advantages of being able to scale to exascale computers are greatest for the computationally most expensive algorithms; within QC these are the high-accuracy coupled-cluster (CC) methods. Parallel coupled-cluster programs are available; however, they are based on the conventional MPI paradigm. Much of the effort is spent handling the complicated data dependencies between the various processors, especially as the size of the problem becomes large. The current paradigm will not survive the move to exascale computers. Here we discuss the initial steps toward designing and implementing a CC method within this model. First, we introduce the general concepts behind a CC method, focusing on the aspects that make these methods difficult to parallelize with conventional techniques. Then we outline the computational core of the CC method, a matrix multiply, within the task-based approach that the FAST-OS project is designed to take advantage of. Finally, we outline the general setup to implement the simplest CC method in this model, linearized CC doubles (LinCC).

Taube, Andrew Garvin

2010-05-01
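
The appeal of a task-based model for the matrix-multiply kernel is that each output tile of C = AB is an independent, side-effect-free task, so recovering from a lost worker means re-running that task rather than restarting from a global checkpoint. The single-process sketch below illustrates only that idea; it does not use or imitate any FAST-OS HARE interface, and the tile size, the function names, and the `fail_once` fault injection are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

Matrix = List[List[float]]
Tile = Tuple[range, range]


def tile_task(a: Matrix, b: Matrix, tile: Tile) -> Dict[Tuple[int, int], float]:
    """Pure task: compute the entries of C = AB inside one (rows x cols) tile."""
    rows, cols = tile
    return {(i, j): sum(a[i][k] * b[k][j] for k in range(len(b))) for i in rows for j in cols}


def task_matmul(a: Matrix, b: Matrix, size: int, fail_once: Set[int]) -> Tuple[Matrix, int]:
    """Run all tile tasks from a queue; a task whose worker 'crashes' is simply re-queued."""
    n, m = len(a), len(b[0])
    tiles: List[Tile] = [(range(i, min(i + size, n)), range(j, min(j + size, m)))
                         for i in range(0, n, size) for j in range(0, m, size)]
    queue: Deque[int] = deque(range(len(tiles)))
    c: Matrix = [[0.0] * m for _ in range(n)]
    already_failed: Set[int] = set()
    attempts = 0
    while queue:
        t = queue.popleft()
        attempts += 1
        if t in fail_once and t not in already_failed:
            already_failed.add(t)   # simulated worker crash: the partial result is lost
            queue.append(t)         # recovery is just re-execution; no checkpoint is needed
            continue
        for (i, j), value in tile_task(a, b, tiles[t]).items():
            c[i][j] = value
    return c, attempts


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
EYE = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
C, attempts = task_matmul(A, EYE, size=2, fail_once={1, 3})
print(C == A, attempts)  # True 6: four tiles, two of them executed twice
```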

215

Fault tolerant UAVs are coming; Fault tolerant mujinki jidai no torai (The arrival of the fault-tolerant unmanned aircraft era)

Energy Technology Data Exchange (ETDEWEB)

This paper explains the concept of the UAV (unmanned aerial vehicle). Previous UAVs have succeeded because of their simple systems and simple operation. However, future UAVs will be required to have higher reliability and safety than ordinary aircraft, as expectations for the missions they execute rise. In other words, future UAVs should aim at a fault-tolerant system featuring autonomous operation and a reliability of less than 10^-9 faults per hour. Recently, ordinary aircraft have also adopted auto-sequence control in their flight control systems, achieving a considerable degree of programmed automatic control from takeoff to landing. A UAV with an autonomous operation function capable of returning to base has also been developed. Ordinary commercial aircraft require a system reliability at the 10^-9 level against flight-critical phenomena. It is expected that an equivalent or higher reliability will be required of UAVs as a system design requirement in the near future. (NEDO)

Sumita, J.

1999-06-05

216

Fault tolerant microcomputer based alarm annunciator for Dhruva reactor  

International Nuclear Information System (INIS)

The Dhruva alarm annunciator displays the status of 624 alarm points on an array of display windows using the standard ringback sequence. Recognizing the need for very high availability, the system is implemented as a fault-tolerant configuration. The annunciator is partitioned into three identical units; each unit is implemented using two microcomputers wired in a hot-standby mode. If one computer malfunctions, the standby computer takes over control in a bounce-free transfer. The use of microprocessors has helped build flexibility into the system. The system also has a built-in capability to resolve the sequence in which events occur and conveys this information to another system for display on a CRT. This report describes the system features, the fault-tolerant organisation used, and the hardware and software developed for the annunciation function. (author). 8 figs
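
One way to obtain a bounce-free transfer between hot-standby computers is state shadowing: the standby copies the active computer's output (window) states every cycle, so when it takes over, the displayed states do not change. The sketch below is a hypothetical model; the heartbeat limit, the window-state names, and the class names are not from the report.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Computer:
    name: str
    healthy: bool = True
    outputs: Dict[int, str] = field(default_factory=dict)  # alarm point -> window state


class HotStandbyPair:
    def __init__(self, primary: Computer, standby: Computer, missed_limit: int = 2):
        self.active, self.standby = primary, standby
        self.missed_limit = missed_limit  # assumed: silent cycles tolerated before takeover
        self.missed = 0

    def cycle(self, changes: Dict[int, str]) -> Computer:
        """Run one scan cycle and return the computer currently driving the outputs."""
        if self.active.healthy:
            self.missed = 0
            self.active.outputs.update(changes)
            # Shadowing: the standby mirrors the active outputs every cycle.
            self.standby.outputs = dict(self.active.outputs)
        else:
            self.missed += 1
            if self.missed >= self.missed_limit and self.standby.healthy:
                # Takeover starts from the mirrored outputs, so windows do not blink or reset.
                self.active, self.standby = self.standby, self.active
                self.missed = 0
                self.active.outputs.update(changes)
        return self.active


a, b = Computer("A"), Computer("B")
pair = HotStandbyPair(a, b)
pair.cycle({1: "FAST_FLASH"})
a.healthy = False
pair.cycle({2: "FAST_FLASH"})           # first missed heartbeat: A is still nominally active
now_active = pair.cycle({2: "FAST_FLASH"})
print(now_active.name, now_active.outputs)  # B {1: 'FAST_FLASH', 2: 'FAST_FLASH'}
```

Because field inputs are re-scanned every cycle in this model, a change that arrives while the failed computer is silent is picked up again on the cycle when the standby takes over.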

217

Design study of Software-Implemented Fault-Tolerance (SIFT) computer  

Science.gov (United States)

Software-implemented fault tolerant (SIFT) computer design for commercial aviation is reported. A SIFT design concept is addressed. Alternate strategies for physical implementation are considered. Hardware and software design correctness is addressed. System modeling and effectiveness evaluation are considered from a fault-tolerant point of view.

Wensley, J. H.; Goldberg, J.; Green, M. W.; Kutz, W. H.; Levitt, K. N.; Mills, M. E.; Shostak, R. E.; Whiting-Okeefe, P. M.; Zeidler, H. M.

1982-01-01

218

HPC application fault-tolerance using transparent redundant computation.  

Energy Technology Data Exchange (ETDEWEB)

As the core counts of HPC machines continue to grow, issues such as fault tolerance and reliability are becoming limiting factors for application scalability. Current techniques to ensure progress across faults, for example coordinated checkpoint-restart, are unsuitable for machines of this scale due to their predicted high overheads. In this study, we present the design and implementation of a novel system for ensuring reliability that uses transparent, rank-level, redundant computation. Using this system, we show the overheads involved in redundant computation for a number of real-world HPC applications. Additionally, we relate the communication characteristics of an application to the overheads observed.

Riesen, Rolf E.; Laros, James H., III; Pedretti, Kevin Thomas Tauke; Oldfield, Ron A.; Ferreira, Kurt Brian; Brightwell, Ronald Brian

2009-08-01
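
Rank-level redundancy can be pictured as a mapping from each logical MPI rank to several physical ranks: the job keeps running as long as every logical rank still has one live replica, whereas without redundancy any single failure stops it. The sketch below shows only this bookkeeping, assuming dual redundancy and a placement in which the replicas of logical rank r are physical ranks r and r + L (L being the number of logical ranks); it does not model the message duplication that produces the communication overheads the study measures.

```python
from typing import Dict, List, Set


def replica_map(logical_ranks: int, degree: int = 2) -> Dict[int, List[int]]:
    """Map logical rank r to physical ranks r, r + L, r + 2L, ... (L = logical_ranks)."""
    return {r: [r + k * logical_ranks for k in range(degree)] for r in range(logical_ranks)}


def survives(mapping: Dict[int, List[int]], failed: Set[int]) -> bool:
    """The job survives if every logical rank retains at least one live physical replica."""
    return all(any(p not in failed for p in replicas) for replicas in mapping.values())


mapping = replica_map(4)              # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
print(survives(mapping, {0, 5, 2}))   # True: every logical rank still has a replica
print(survives(mapping, {1, 5}))      # False: both replicas of logical rank 1 are gone
```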

219

Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System  

Directory of Open Access Journals (Sweden)

In this study, we present a method for distributed data storage using the residue number system for mobile systems and wireless networks based on the peer-to-peer paradigm. A redundant residue number system is, in general, capable of error detection and correction. In the proposed method, we build a new system by combining the Redundant Residue Number System (RRNS), the Multi-Level Residue Number System (ML RNS), and the Multiple-Valued Logic RNS (MVL RNS); the result is well suited to parallel, carry-free, high-speed arithmetic and supports secure data communication. In addition, it can detect and correct errors. Compared with other number systems, it offers improvements in data security, error detection and correction, and speed of storage and retrieval.

A. Barati

2008-01-01
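
The error-correcting part of the scheme, the redundant residue number system, can be sketched on its own. An integer is stored as its residues modulo a set of pairwise coprime moduli; extra (redundant) moduli let the decoder spot a corrupted residue, because dropping the bad residue is the only choice whose Chinese-remainder reconstruction falls in the legitimate range. The moduli below are small values picked for illustration, the function names are hypothetical, and the multi-level and multiple-valued-logic parts of the proposed system are not modeled.

```python
from math import prod
from typing import List, Sequence

INFO_MODULI = (3, 5, 7)       # assumed: legitimate values lie in [0, 105)
REDUNDANT_MODULI = (11, 13)   # assumed: two redundant moduli, larger than the info moduli
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def to_rns(x: int) -> List[int]:
    return [x % m for m in MODULI]


def crt(residues: Sequence[int], moduli: Sequence[int]) -> int:
    """Chinese remainder theorem reconstruction for pairwise coprime moduli."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        total += r * partial * pow(partial, -1, m)
    return total % big_m


def correct_single_error(residues: Sequence[int]) -> int:
    """Drop each residue in turn; the reconstruction that lands in range is the true value."""
    for skip in range(len(MODULI)):
        keep = [i for i in range(len(MODULI)) if i != skip]
        x = crt([residues[i] for i in keep], [MODULI[i] for i in keep])
        if x < LEGIT_RANGE:
            return x
    raise ValueError("more than one residue appears to be corrupted")


stored = to_rns(87)        # [0, 2, 3, 10, 9]
stored[3] = 4              # corrupt the residue modulo 11
print(correct_single_error(stored))  # 87
```

The range test works here because any three of the five moduli multiply to at least 105, so a reconstruction that still includes the bad residue cannot also land in the legitimate range.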

220

Special Issue: Fault Tolerant Control of Power Grids  

OpenAIRE

This special issue contains articles on fault detection and isolation and on fault-tolerant control methods applied to different aspects of modern power grids, both for detecting, isolating, and accommodating faults in the power grid and for detecting, isolating, and accommodating faults in power-generating units.

Odgaard, Peter; Aubrun, Christophe; Majanne, Yrjo

2014-01-01

221

Cooperative Fault Tolerant Distributed Computing  

Energy Technology Data Exchange (ETDEWEB)

HARNESS was proposed as a system that combined the best of emerging technologies found in current distributed computing research and commercial products into a very flexible, dynamically adaptable framework that applications could use to evolve and better handle their execution environment. The HARNESS system was designed using the considerable experience from previous projects such as PVM, MPI, IceT and Cumulvs. As such, the system was designed to avoid the common problems found with these systems: it has no single point of failure and can survive machine, node, and software failures. Additional features included improved inter-component connectivity, with full support for dynamically downloading additional components at run time, thus reducing the burden on application developers to build in all the libraries they need in advance.

Fagg, Graham E.

2006-03-15

222

Universal Fault-Tolerant Computation on Decoherence-Free Subspaces  

CERN Document Server

A general scheme to perform universal quantum computation fault-tolerantly within decoherence-free subspaces (DFSs) of a system's Hilbert space is derived. This scheme leads to the first fault-tolerant realization of universal quantum computation on DFSs with the properties that (i) only one- and two-qubit interactions are required, and (ii) the system remains within the DFS throughout the entire implementation of a quantum gate. We show explicitly how to perform universal computation on clusters of the four-qubit DFS encoding one logical qubit each under "collective decoherence" (qubit-permutation-invariant system-bath coupling). Our results have immediate relevance to a number of proposed quantum computer implementations, in particular those in which the internal system Hamiltonian is of the Heisenberg type, such as spin-spin coupled quantum dots.

Bacon, D J; Lidar, D A; Whaley, K B

2000-01-01
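
The protective property behind DFS encodings can be checked numerically. If collective decoherence is simplified to the same SU(2) rotation U applied to every qubit, a four-qubit state with total spin zero (here a singlet on qubits 1 and 2 times a singlet on qubits 3 and 4, one of the two states spanning the four-qubit DFS) is left unchanged by U⊗U⊗U⊗U, while an error on a single qubit moves it. The pure-Python sketch below, with an arbitrarily chosen random seed, shows only this invariance and not the paper's gate constructions; the function names are hypothetical.

```python
import cmath
import math
import random
from typing import List

Vector = List[complex]
Matrix = List[List[complex]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker (tensor) product of two matrices."""
    return [[a[r][c] * b[s][t] for c in range(len(a[0])) for t in range(len(b[0]))]
            for r in range(len(a)) for s in range(len(b))]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def random_su2(rng: random.Random) -> Matrix:
    """Random SU(2) matrix [[a, -conj(b)], [b, conj(a)]] with |a|^2 + |b|^2 = 1."""
    theta, phi, psi = (rng.uniform(0, 2 * math.pi) for _ in range(3))
    a = math.cos(theta) * cmath.exp(1j * phi)
    b = math.sin(theta) * cmath.exp(1j * psi)
    return [[a, -b.conjugate()], [b, a.conjugate()]]


def on_all(u: Matrix, qubits: int) -> Matrix:
    """Collective operation u (x) u (x) ... (x) u."""
    out = u
    for _ in range(qubits - 1):
        out = kron(out, u)
    return out


def distance(v: Vector, w: Vector) -> float:
    return max(abs(x - y) for x, y in zip(v, w))


# Singlet (|01> - |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>.
SINGLET: Vector = [0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0]
DFS_STATE: Vector = [x * y for x in SINGLET for y in SINGLET]

u = random_su2(random.Random(7))
collective_error = distance(matvec(on_all(u, 4), DFS_STATE), DFS_STATE)

X, ID = [[0, 1], [1, 0]], [[1, 0], [0, 1]]
local_error = distance(matvec(kron(kron(kron(X, ID), ID), ID), DFS_STATE), DFS_STATE)
print(collective_error < 1e-12, local_error > 0.1)  # True True
```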

223

Highly Reliable Fault Tolerant Technique for Safety Critical Applications  

Directory of Open Access Journals (Sweden)

This paper presents a highly reliable fault-tolerant technique for safety-critical applications using a Five Modular Redundancy method. In high-radiation environments such as spacecraft and nuclear thermal plants, single event upsets (SEUs) are likely to degrade system operation by causing single bit flips in the sequential elements of the system's electronic components. If these systems are not provided with fault tolerance, there is a high chance of obtaining a false response. To avoid this problem, the system is made redundant and a roll-forward recovery mechanism is used to increase overall reliability. A scan cell design is employed to shift out the internal states of all the flip-flops during the comparison and recovery process. The proposed method is designed in Verilog HDL using the Xilinx ISE simulator.

Nanditha S

2014-05-01
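
The voting and roll-forward steps can be sketched behaviorally: a bitwise 3-of-5 majority over the five modules' register states, followed by overwriting every module with the voted state so execution continues from the current point instead of rolling back. Five-way voting tolerates upsets in two modules at once, which triple modular redundancy cannot. The register width and function names are assumptions, and the scan-chain shifting the paper uses to read out flip-flop states is abstracted into plain integers.

```python
from typing import List

WIDTH = 8  # assumed register width


def majority5(states: List[int], width: int = WIDTH) -> int:
    """Bitwise 3-of-5 majority over five redundant module states."""
    if len(states) != 5:
        raise ValueError("expected exactly five module states")
    voted = 0
    for bit in range(width):
        if sum((s >> bit) & 1 for s in states) >= 3:
            voted |= 1 << bit
    return voted


def roll_forward(states: List[int], width: int = WIDTH) -> List[int]:
    """Overwrite every module with the voted state and continue from there."""
    return [majority5(states, width)] * len(states)


golden = 0b10110010
modules = [golden] * 5
modules[1] ^= 0b00000001   # SEU in module 1
modules[2] ^= 0b00000001   # second SEU, same bit, different module
modules[4] ^= 0b10000000   # SEU in module 4
print(roll_forward(modules) == [golden] * 5)  # True
```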

224

Fault-tolerant battery system employing intra-battery network architecture  

Science.gov (United States)

A distributed energy storing system employing a communications network is disclosed. A distributed battery system includes a number of energy storing modules, each of which includes a processor and communications interface. In a network mode of operation, a battery computer communicates with each of the module processors over an intra-battery network and cooperates with individual module processors to coordinate module monitoring and control operations. The battery computer monitors a number of battery and module conditions, including the potential and current state of the battery and individual modules, and the conditions of the battery's thermal management system. An over-discharge protection system, equalization adjustment system, and communications system are also controlled by the battery computer. The battery computer logs and reports various status data on battery level conditions which may be reported to a separate system platform computer. A module transitions to a stand-alone mode of operation if the module detects an absence of communication connectivity with the battery computer. A module which operates in a stand-alone mode performs various monitoring and control functions locally within the module to ensure safe and continued operation.

Hagen, Ronald A. (Stillwater, MN); Chen, Kenneth W. (Fair Oaks, CA); Comte, Christophe (Montreal, CA); Knudson, Orlin B. (Vadnais Heights, MN); Rouillard, Jean (Saint-Luc, CA)

2000-01-01
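
The fallback rule for a module can be sketched as a small watchdog state machine: every message from the battery computer refreshes a timestamp, and if the silence lasts longer than a timeout, the module switches to stand-alone mode and takes over monitoring locally. The timeout value, the class names, and the rule for returning to network mode when messages resume are assumptions; the abstract specifies only the transition into stand-alone mode.

```python
from enum import Enum


class Mode(Enum):
    NETWORK = "network"
    STAND_ALONE = "stand-alone"


class ModuleController:
    def __init__(self, timeout_s: float = 1.0):
        self.timeout_s = timeout_s   # assumed silence tolerated before going stand-alone
        self.last_contact = 0.0
        self.mode = Mode.NETWORK

    def on_battery_computer_message(self, now: float) -> None:
        self.last_contact = now
        self.mode = Mode.NETWORK     # assumption: rejoin network mode once contact returns

    def tick(self, now: float) -> Mode:
        """Called periodically; switches to local control when the network has gone quiet."""
        if self.mode is Mode.NETWORK and now - self.last_contact > self.timeout_s:
            self.mode = Mode.STAND_ALONE   # monitor and protect the module locally from now on
        return self.mode


module = ModuleController(timeout_s=1.0)
module.on_battery_computer_message(now=0.0)
print(module.tick(now=0.5).value)   # network
print(module.tick(now=1.5).value)   # stand-alone
```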

225

Diagnosis and Fault-tolerant Control for Ship Station Keeping  

DEFF Research Database (Denmark)

This paper addresses the design process of diagnosis and fault-tolerant control when a system should operate despite multiple failures in sensors or actuators. Graph-theory-based analysis of system structure is demonstrated to be a unique design methodology that can cope with diagnosis design for systems of high complexity and can also analyse the cases of cascaded or multiple faults. The paper takes as an example a ship with two CP (controllable-pitch) propellers, rudders, and a bow thruster as actuators, with instrumentation comprising a suite of global position sensors, inertial navigation units, and conventional gyro units to provide ship motion information. A salient feature of the design method is the ability to analyse cases where faults have occurred and easily determine where in the faulty system diagnosability and controllability are retained.

Blanke, Mogens

2005-01-01
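
The graph-theoretic analysis can be illustrated with its core step: build a bipartite graph between constraints (model equations) and unknown variables, find a maximum matching, and treat each unmatched constraint as a redundant relation from which a diagnostic residual can be formed. The toy heading model below (two gyros and a rate sensor) is invented for the example, the function names are hypothetical, and the sketch ignores the causality restrictions on differential constraints that a full structural analysis would apply.

```python
from typing import Dict, Set


def max_matching(constraints: Dict[str, Set[str]]) -> Dict[str, str]:
    """Maximum matching of constraints to unknown variables (augmenting paths, Kuhn)."""
    var_to_constraint: Dict[str, str] = {}

    def augment(c: str, visited: Set[str]) -> bool:
        for v in sorted(constraints[c]):   # sorted for a deterministic result
            if v in visited:
                continue
            visited.add(v)
            if v not in var_to_constraint or augment(var_to_constraint[v], visited):
                var_to_constraint[v] = c
                return True
        return False

    for c in constraints:
        augment(c, set())
    return {c: v for v, c in var_to_constraint.items()}


# Unknowns: heading psi and yaw rate r. Measurements are known and left out of the graph.
SHIP = {
    "c1_kinematics": {"psi", "r"},   # d(psi)/dt = r
    "c2_gyro1": {"psi"},             # y1 = psi
    "c3_gyro2": {"psi"},             # y2 = psi
    "c4_rate_sensor": {"r"},         # y3 = r
}
matching = max_matching(SHIP)
redundant = [c for c in SHIP if c not in matching]
print(matching)    # {'c2_gyro1': 'psi', 'c1_kinematics': 'r'}
print(redundant)   # ['c3_gyro2', 'c4_rate_sensor']
```

Each redundant constraint yields one residual; here, for example, the second gyro can be compared with the heading computed from the first gyro.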

226

Visual Programming of Fault-Tolerant Distributed Applications  

OpenAIRE

The design of fault-tolerant distributed applications is a complex task. In addition to application functionalities, the programmer must consider issues related to both replication and distribution for every application component concerned with fault tolerance. This paper describes an approach that combines two environments (Specs and Garf) so as to (1) graphically design applications using high-level Petri nets and (2) relieve the programmer of fault-tolerance issues.

Muganga, B.; Pacull, F.; Mazouni, K. R.; Wolff, A. -d

1995-01-01

227

Fault Tolerant Distributed Middleware for VLAB  

Science.gov (United States)

With increasingly large storage media, fast processors, and improved data-collecting instruments, the datasets in scientific fields are growing at an exponential rate. Analyzing, visualizing, and manipulating these datasets (geographically distributed in most cases) easily and efficiently within a collaborative environment is challenging. We address this problem through NaradaBrokering (NB), a unique and flexible middleware application program interface (API) (http://www.naradabrokering.org, [1]). Topics, rather than IP addresses and hostnames, are used to locate remote services and support collaboration. In our framework, the underlying hardware, middleware, system load, and resource availability are transparent to the end users. Web Services are the key components within this framework that enable building loosely coupled applications. Furthermore, multiple service providers can provide identical services. Our system routes client requests to the best-qualified service provider according to default or user-defined conditions. This not only provides a desired Quality of Service (QoS) but also acts as a load-balancing mechanism to better distribute the workload across available services. We have deployed NB between Florida State University, the University of Minnesota, and Indiana University, and installed multiple instances of a wavelet service. We will demonstrate fault tolerance with respect to faulty nodes in the NB network and faulty wavelet service providers. Two users sharing an identical view through an applet will illustrate the collaborative nature of our system. [1] S. Pallickara and G. Fox, "NaradaBrokering: A Middleware Framework and Architecture for Enabling Durable Peer-to-Peer Grid", in Proceedings of ACM/IFIP/USENIX International Middleware Conference Middleware-2003, pp. 41-61, (2003)

Lu, Z.; Bollig, E. F.; Erlebacher, G.; Gardgil, H.; Yuen, D.; Pierce, M.; Pallickara, S.

2005-12-01
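The abstract describes routing a request for a topic to the best-qualified provider among several that offer the same service, while skipping faulty ones. The Python sketch below illustrates only that selection step and is not the NaradaBrokering API: `TopicBroker`, `Provider`, the `load` metric and the least-loaded default rule are hypothetical stand-ins for the "default or user-defined conditions".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Provider:
    name: str
    load: float            # hypothetical utilisation metric in [0, 1]
    healthy: bool = True   # set to False when the provider is detected as faulty


@dataclass
class TopicBroker:
    # Topic string -> providers that advertised a service on that topic.
    topics: Dict[str, List[Provider]] = field(default_factory=dict)

    def advertise(self, topic: str, provider: Provider) -> None:
        self.topics.setdefault(topic, []).append(provider)

    def route(self, topic: str,
              condition: Optional[Callable[[Provider], bool]] = None) -> Optional[Provider]:
        """Return the least-loaded healthy provider that satisfies `condition`."""
        candidates = [p for p in self.topics.get(topic, []) if p.healthy]
        if condition is not None:
            candidates = [p for p in candidates if condition(p)]
        if not candidates:
            return None  # no live provider: the caller may retry or report an error
        return min(candidates, key=lambda p: p.load)
```

Picking the least-loaded healthy provider gives failover and a crude form of load balancing at once; a user-supplied `condition` narrows the candidate set before that choice.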

228

Fault Tolerant Weighted Voting Algorithms  

OpenAIRE

Computer networks are now necessities of modern organisations and network security has become a major concern for them. In this paper we have proposed a holistic approach to network security with a hybrid model that includes an Intrusion Detection System (IDS) to detect network attacks and a survivability model to assess the impacts of undetected attacks. A neural network-based IDS has been proposed, where the learning mechanism for the neural network is evolved using genetic algorithm. Then ...

Azad Azadmanesh; Alireza Farahani; Lotfi Najjar

2008-01-01

229

A Blueprint for a Topologically Fault-tolerant Quantum Computer  

CERN Document Server

The advancement of information processing into the realm of quantum mechanics promises a transcendence in computational power that will enable problems to be solved which are completely beyond the known abilities of any "classical" computer, including any potential non-quantum technologies the future may bring. However, the fragility of quantum states poses a challenging obstacle for realization of a fault-tolerant quantum computer. The topological approach to quantum computation proposes to surmount this obstacle by using special physical systems -- non-Abelian topologically ordered phases of matter -- that would provide intrinsic fault-tolerance at the hardware level. The so-called "Ising-type" non-Abelian topological order is likely to be physically realized in a number of systems, but it can only provide a universal gate set (a requisite for quantum computation) if one has the ability to perform certain dynamical topology-changing operations on the system. Until now, practical methods of implementing thes...

Bonderson, Parsa; Freedman, Michael; Nayak, Chetan

2010-01-01

230

Fault tolerant attitude control for small unmanned aircraft systems equipped with an airflow sensor array.  

Science.gov (United States)

Inspired by sensing strategies observed in birds and bats, a new attitude control concept that directly uses real-time pressure and shear stresses has recently been studied. It was shown that with an array of onboard airflow sensors, small unmanned aircraft systems can promptly respond to airflow changes and improve flight performance. In this paper, a mapping function is proposed to compute aerodynamic moments from the real-time pressure and shear data in a practical and computationally tractable formulation. Since many microscale airflow sensors are embedded on the surface of the small unmanned aircraft system, it is quite likely that some sensors will fail. Here, an adaptive control system is developed that is robust to sensor failure as well as to other numerical mismatches in calculating real-time aerodynamic moments. The advantages of the proposed method are shown in the following simulation cases: (i) feedback pressure and wall shear data from a distributed array of 45 airflow sensors; (ii) 50% failure of the symmetrically distributed airflow sensor array; and (iii) failure of all the airflow sensors on one wing. It is shown that even if 50% of the airflow sensors fail, the aircraft remains stable and able to track the attitude commands. PMID:25405953

Shen, H; Xu, Y; Dickinson, B T

2014-12-01

231

Byzantine Fault Tolerance for Nondeterministic Applications  

CERN Document Server

All practical applications contain some degree of nondeterminism. When such applications are replicated to achieve Byzantine fault tolerance (BFT), their nondeterministic operations must be sanitized to ensure replica consistency. To the best of our knowledge, only two types of replica nondeterminism have been studied under the Byzantine fault model, which we refer to as wrappable nondeterminism and verifiable pre-determinable nondeterminism. Wrappable nondeterminism can be controlled using an infrastructure-provided or application-provided wrapper function, without explicit inter-replica coordination. For example, information such as hostnames, process ids, file descriptors, etc. can be determined group-wise. Verifiable pre-determinable nondeterminism is nondeterminism whose values can be independently chosen by the primary replica and verified by other replicas prior to the execution of a client's request, such as the operation to retrieve the local clock v...

Zhao, W

2007-01-01
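For verifiable pre-determinable nondeterminism, the abstract says the primary chooses a value (such as a clock reading) and the other replicas verify it before executing the request. Below is a minimal Python sketch of the backup-side check, assuming a hypothetical acceptance rule (the value must be monotonic and within a skew bound of the backup's own clock); the paper's actual verification criteria are not given in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class BackupReplica:
    """Backup-side verification of a clock value proposed by the primary.

    Hypothetical rule: accept only if the proposal is strictly greater than
    the last agreed value and within `max_skew` seconds of the local clock.
    """
    max_skew: float
    last_agreed: float = float("-inf")

    def verify(self, proposed: float, local_clock: float) -> bool:
        if proposed <= self.last_agreed:
            return False  # would make replicated time go backwards
        if abs(proposed - local_clock) > self.max_skew:
            return False  # primary's value is implausibly far from this replica's clock
        return True

    def commit(self, proposed: float) -> None:
        # Called once the request carrying this clock value has been ordered.
        self.last_agreed = proposed
```

A backup that rejects the proposal would treat the primary as suspect rather than execute the request with an unverified value.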

232

Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems  

CERN Document Server

Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are communication abstractions that allow the processes to communicate consistently (namely the register abstraction

Raynal, Michel

2010-01-01

233

ACID Support and Fault-Tolerant Database Systems on Cloud: A Review  

Directory of Open Access Journals (Sweden)

Cloud computing represents a different way to architect and remotely manage computing resources. One only has to establish an account with Microsoft, Amazon or Google to begin building and deploying application systems in a cloud. These systems can be, but are certainly not restricted to being, simplistic. Some applications require HTTP services; others require a relational database or might require web service infrastructure and message queues. With clouds, IT-related applications can be provided as a service that is accessed through the internet. Some cloud platforms provide scalability and high availability for web applications, but they also raise problems of data consistency, and server failures become a major problem in applications such as payment services. Data needs to be properly managed in the cloud environment, and to achieve proper transaction processing and consistency, RDBMS techniques such as ACID transactions should be used. Web services in Azure ensure application availability by replicating stored data at least three times and offer optional geolocation of replicas in separate Microsoft data centres to provide disaster recovery services. Azure storage services provide scalable persistent storage of structured tables, blobs and queues.

Pratiyush Guleria

2012-10-01

234

Design of fault-tolerant inductive position sensor  

Energy Technology Data Exchange (ETDEWEB)

Position sensors used in a magnetic bearing system should provide some degree of fault tolerance, because the rotor position is needed by the feedback control that overcomes the open-loop instability. In this paper, we propose an inductive position sensor that can cope with a partial fault in the sensor. The sensor has multiple poles, which can be combined to sense the in-plane motion of the rotor. When a high-frequency voltage signal drives each pole of the sensor, the resulting current in the sensor coil contains information about the rotor position, which the sensor's signal processing circuit extracts. We use a magnetic circuit model of the sensor that gives the analytical relationship between the sensor output and the rotor motion. The multipolar structure of the sensor makes it possible to introduce redundancy that can be exploited for fault-tolerant operation. The proposed sensor is applied to a magnetically levitated turbo-molecular vacuum pump, and experimental results validate the fault-tolerance algorithm.

Paek, Sung Kuk; Noh, Myoung Gyu [Chungnam National University, Daejeon (Korea, Republic of); Park, Byeong Cheol [Korea Electric Power Research Institute, Daejeon (Korea, Republic of)

2008-03-15

235

Superior model for fault tolerance computation in designing nano-sized circuit systems  

Science.gov (United States)

As CMOS technology scales to nanometer dimensions, reliability becomes a decisive concern in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate the reliability of nano-electronic circuits. Computing reliability becomes troublesome and time-consuming as the computational complexity builds up with circuit size. Therefore, being able to measure reliability quickly and accurately is fast becoming necessary in designing modern logic integrated circuits. To this end, the paper first develops an automated reliability evaluation tool based on a generalization of the Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The MATLAB-based tool allows users to significantly speed up reliability analysis for very large numbers of nano-electronic circuits. Second, using the developed tool, the paper presents a comparative study of reliability computation and evaluation by the PGM and BDEC models for different implementations of circuits with the same functionality. The analysis shows that BDEC gives exact and transparent reliability measures, but as the complexity of same-functionality circuits with respect to gate error increases, the reliability measured by BDEC tends to be lower than that measured by PGM. This lower BDEC measure is explained in the paper using the distribution of different signal input patterns over time for same-functionality circuits. Simulation results show that the reliability measured by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero, and the probability of error on signal lines.

Singh, N. S. S.; Asirvadam, V. S.; Muthuvalu, M. S.

2014-10-01
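The Probabilistic Gate Model is commonly formulated as follows: a gate whose output flips with probability ε outputs 1 with probability (1 - ε)·p + ε·(1 - p), where p is the fault-free probability of a 1. Below is a minimal Python sketch of that formula, assuming independent gate inputs (no reconvergent fan-out) and the same ε for every gate; it does not reproduce the paper's MATLAB tool or the BDEC model.

```python
def pgm_gate(p_ideal: float, eps: float) -> float:
    """PGM output probability for a gate that flips its output with probability eps."""
    return (1.0 - eps) * p_ideal + eps * (1.0 - p_ideal)


def nand_pgm(a: float, b: float, eps: float) -> float:
    """P(out = 1) of a faulty NAND; a, b are P(input = 1), assumed independent."""
    return pgm_gate(1.0 - a * b, eps)


def not_pgm(a: float, eps: float) -> float:
    """P(out = 1) of a faulty inverter."""
    return pgm_gate(1.0 - a, eps)


# AND built as NOT(NAND(a, b)), with both inputs held at logic 1.
p_and = not_pgm(nand_pgm(1.0, 1.0, 0.01), 0.01)
print(f"P(AND output correct) = {p_and:.4f}")
```

With inputs fixed at 1 and ε = 0.01, the AND output is correct when both gates are correct or both err, so the result is 0.99² + 0.01² = 0.9802.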

236

Superior model for fault tolerance computation in designing nano-sized circuit systems  

International Nuclear Information System (INIS)

As CMOS technology scales to nanometer dimensions, reliability becomes a decisive concern in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate the reliability of nano-electronic circuits. Computing reliability becomes troublesome and time-consuming as the computational complexity builds up with circuit size. Therefore, being able to measure reliability quickly and accurately is fast becoming necessary in designing modern logic integrated circuits. To this end, the paper first develops an automated reliability evaluation tool based on a generalization of the Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The MATLAB-based tool allows users to significantly speed up reliability analysis for very large numbers of nano-electronic circuits. Second, using the developed tool, the paper presents a comparative study of reliability computation and evaluation by the PGM and BDEC models for different implementations of circuits with the same functionality. The analysis shows that BDEC gives exact and transparent reliability measures, but as the complexity of same-functionality circuits with respect to gate error increases, the reliability measured by BDEC tends to be lower than that measured by PGM. This lower BDEC measure is explained in the paper using the distribution of different signal input patterns over time for same-functionality circuits. Simulation results show that the reliability measured by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero, and the probability of error on signal lines.

237

Fault handling schemes in electronic systems with specific application to radiation tolerance and VLSI design  

Science.gov (United States)

Naturally occurring space radiation particles can produce transient and permanent changes in the electrical properties of electronic devices and systems. In this work, the transient radiation effects on DRAM and CMOS SRAM were considered, and the effect of total ionizing dose radiation on the switching times of CMOS logic gates was investigated. Effects of transient radiation on the column and cell of a MOS dynamic memory cell were simulated using SPICE. The critical charge of the bitline was found to be higher than that of the cell, and the critical charge of the combined cell-bitline was found to depend on the gate voltage of the access transistor. The results indicate that the rise time of CMOS logic gates increases, while the fall time decreases, with increasing total ionizing dose. Also, by increasing the size of the P-channel transistor relative to that of the N-channel transistor, the propagation delay of a CMOS logic gate can be made to decrease with, or be independent of, increasing total ionizing dose. Furthermore, a method was developed for replacing the polysilicon feedback resistance of SRAMs with a switched capacitor network. A switched capacitor SRAM was implemented using MOS technology and has a very large critical charge. The results of this work indicate that the switched capacitor SRAM is a viable alternative to SRAM with polysilicon feedback resistance.

Attia, John Okyere

1993-01-01

238

Design and Analysis of a Fault Tolerant Microprocessor Based on Triple Modular Redundancy Using VHDL  

Directory of Open Access Journals (Sweden)

There are numerous real-time and operation-critical systems in which failure is unacceptable at any stage of processing; examples include ATMs, satellites and spacecraft. In this paper, a fault-tolerant microprocessor is developed using checker units with a fault-secure ALU, and the fault-secure ALU is built with parity prediction logic and the two-rail checker method. Finally, triple modular redundancy is applied to develop a fault-tolerant processor. The proposed method was validated in a VHDL test environment, and the results showed that the reliability of the system increased with little area overhead.

Deepti Shinghal

2011-03-01
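Two of the building blocks named in the abstract can be illustrated in software: the 2-of-3 majority voter used in triple modular redundancy, and parity prediction for an adder. The Python sketch below is a behavioural illustration, not the paper's VHDL; the two-rail checker is not modelled and `width=8` is an arbitrary choice. Parity prediction relies on s_i = a_i ⊕ b_i ⊕ c_i, so the parity of the sum equals parity(a) ⊕ parity(b) ⊕ parity(carries).

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit follows at least two module outputs."""
    return (a & b) | (a & c) | (b & c)


def parity(x: int) -> int:
    """Parity bit of a non-negative integer (1 if it has an odd number of ones)."""
    return bin(x).count("1") & 1


def add_with_parity_prediction(a: int, b: int, width: int = 8):
    """Ripple-carry addition returning (sum, predicted parity of the sum)."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    carry, carries, s = 0, 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        carries |= carry << i           # record the carry into bit i
        s |= (ai ^ bi ^ carry) << i     # sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    # In hardware the carries used for prediction would come from a separate
    # (duplicated) carry chain, so that a fault in the main chain is still caught.
    predicted = parity(a) ^ parity(b) ^ parity(carries)
    return s, predicted


def parity_ok(s: int, predicted: int) -> bool:
    """True when the actual parity of the (possibly corrupted) sum matches."""
    return parity(s) == predicted
```

Any single-bit error in the sum flips its parity and is flagged by `parity_ok`; the voter then masks a module that disagrees with the other two.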

239

Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM  

Science.gov (United States)

Hall sensors are widely used to estimate the rotor phase of permanent magnet synchronous motors (PMSM). Because rotor position is an essential parameter of the PMSM control algorithm, Hall sensor faults are very dangerous. However, there is scarcely any research focusing on fault diagnosis and fault-tolerant control of Hall sensors used in PMSMs. From this standpoint, the Hall sensor faults that may occur during PMSM operation are theoretically analyzed. Based on the analysis results, a Hall sensor fault diagnosis algorithm based on three rules is proposed to classify the fault phenomena accurately. Rotor phase estimation algorithms based on one or two Hall sensors are used to build the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total, and every detection can be completed within 1/138 of a rotor rotation period. The fault-tolerant control algorithm achieves smooth torque production, that is, the same control effect as the normal control mode (with three Hall sensors). Finally, a PMSM bench test verifies the accuracy and rapidity of the fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm detects all Hall sensor faults promptly, and the fault-tolerant control algorithm allows the PMSM to keep operating when one or two Hall sensors fail. In addition, the transitions between healthy control and fault-tolerant control are smooth, without additional noise and harshness. The proposed algorithms can handle Hall sensor faults of PMSMs in real applications and can be used to realize fault diagnosis and fault-tolerant control of PMSMs.

Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin

2013-07-01
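The paper's three diagnosis rules are not spelled out in this abstract. As a hedged illustration of what rule-based Hall sensor checks can look like, the Python sketch below applies three generic rules: three Hall sensors spaced 120 electrical degrees apart should never read 000 or 111, consecutive codes should differ by one 60-degree sector, and a code should not persist far longer than the expected sector time. The code-to-sector table and the factor of 2 in the stuck-signal rule are assumptions.

```python
from typing import Optional

# Hypothetical code-to-sector table for one wiring/direction convention:
# codes are (Ha << 2) | (Hb << 1) | Hc; each sector spans 60 electrical degrees.
SECTOR_OF = {0b101: 0, 0b100: 1, 0b110: 2, 0b010: 3, 0b011: 4, 0b001: 5}


def diagnose(prev_code: int, code: int, elapsed: float,
             expected_sector_time: float) -> Optional[str]:
    """Apply three illustrative rules; return a fault label, or None if healthy."""
    # Rule 1: 000 and 111 cannot occur with healthy sensors 120 degrees apart.
    if code not in SECTOR_OF:
        return "invalid-code"
    # Rule 2: a valid change must move exactly one sector (either direction).
    if prev_code in SECTOR_OF and code != prev_code:
        step = (SECTOR_OF[code] - SECTOR_OF[prev_code]) % 6
        if step not in (1, 5):
            return "illegal-transition"
    # Rule 3: a code held much longer than expected suggests a stuck sensor.
    if code == prev_code and elapsed > 2.0 * expected_sector_time:
        return "stuck-signal"
    return None


def sector_centre_deg(code: int) -> int:
    """Coarse electrical rotor angle: the centre of the 60-degree sector."""
    return SECTOR_OF[code] * 60 + 30
```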

240

Fault Tolerant Neuro-Robust Position Control of DC Motors  

OpenAIRE

DC motors are widely used in industry, for example in mechanics, robotics and aerospace engineering. In this paper, we present a high-performance control method for position control of DC motors. A fault-tolerant control model is also combined with the neuro-robust control approach. It is shown that with the proposed control algorithms, external disturbances and coupled dynamics inherent in the system are effectively compensated using a neural network unit in which no analytical estimation on t...

Ran Zhang; Marwan Bikdash

2011-01-01

241

Robustness and fault tolerance make brains harder to study  

OpenAIRE

Brains increase the survival value of organisms by being robust and fault tolerant. That is, brain circuits continue to operate as the organism needs, even when the circuit properties are significantly perturbed. Kispersky and colleagues, in a recent paper in Neural Systems & Circuits, have found that Granger Causality analysis, an important method used to infer circuit connections from the behavior of neurons within the circuit, is defeated by the mechanisms that give rise to this r...

Stevens Charles F; Srinivasan Shyam

2011-01-01

242

Learning fault-tolerant speech parsing with SCREEN  

OpenAIRE

This paper describes a new approach and a system, SCREEN, for fault-tolerant speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech...

Wermter, Stefan; Weber, Volker

1994-01-01

243

A Framework-Based Approach for Fault-Tolerant Service Robots  

Directory of Open Access Journals (Sweden)

Recently, the component-based approach has become a major trend in intelligent service robot development because of its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge, existing robot frameworks do not yet support a fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with a fault tolerance function, we argue that the framework-based fault tolerance approach is feasible and even has many benefits, including: (1) system integrators can build fault-tolerant applications from non-fault-aware components; (2) the constraints of the components and the operating environment, which cannot easily be anticipated at the time of component development, can be considered at the time of integration; (3) consistency in system reliability can be obtained despite diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator, and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault-tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real-time performance of the framework-based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

Heejune Ahn

2012-11-01
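A minimal Python sketch of a framework-level fault manager driven by an XML rule file, in the spirit of the abstract. The XML schema (`component`, `probe`, `contaminates`, `recovery`, `safety`), the heartbeat-timeout probe and the action names are hypothetical; the paper's actual rule format is not given here. Parsing uses the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical rule file written by the system integrator.
RULES_XML = """
<rules>
  <component name="laser_scanner">
    <probe type="heartbeat" timeout="0.5"/>
    <contaminates name="localizer"/>
    <recovery action="restart"/>
    <safety action="stop_base"/>
  </component>
  <component name="arm_controller">
    <probe type="heartbeat" timeout="0.2"/>
    <recovery action="reinitialise"/>
    <safety action="hold_position"/>
  </component>
</rules>
"""


@dataclass
class Rule:
    timeout: float
    recovery: str
    safety: str
    contaminates: List[str] = field(default_factory=list)


def load_rules(xml_text: str) -> Dict[str, Rule]:
    rules = {}
    for comp in ET.fromstring(xml_text.strip()).findall("component"):
        rules[comp.get("name")] = Rule(
            timeout=float(comp.find("probe").get("timeout")),
            recovery=comp.find("recovery").get("action"),
            safety=comp.find("safety").get("action"),
            contaminates=[c.get("name") for c in comp.findall("contaminates")],
        )
    return rules


class FaultManager:
    def __init__(self, rules: Dict[str, Rule]):
        self.rules = rules

    def check(self, now: float, last_heartbeat: Dict[str, float]) -> List[Tuple[str, str]]:
        """Return (action, component) pairs for every component whose probe timed out."""
        actions = []
        for name, rule in self.rules.items():
            if now - last_heartbeat.get(name, float("-inf")) > rule.timeout:
                actions.append((rule.safety, name))                      # make the robot safe first
                actions.extend(("isolate", v) for v in rule.contaminates)
                actions.append((rule.recovery, name))                    # then try to recover
        return actions
```

Keeping the rules in data rather than code is what lets an integrator add fault handling to components that were never written to be fault-aware.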

244

Fault Tolerance in ZigBee Wireless Sensor Networks  

Science.gov (United States)

Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

2011-01-01

245

Fault-tolerance techniques for SRAM-based FPGAs  

CERN Document Server

Fault tolerance in integrated circuits is no longer the exclusive concern of space designers or high-reliability applications engineers. Today, designers of many next-generation products must cope with reduced noise margins. The continuous evolution of semiconductor fabrication technology (shrinking transistor geometry and power supply, increasing speed and logic density) has significantly reduced the reliability of very deep submicron integrated circuits in the face of various internal and external sources of noise. Field Programmable Gate Arrays (FPGAs) customized by SRAM cells are the latest advance in integrated circuit evolution: millions of memory cells implement the logic, embedded memories, routing and embedded microprocessor cores. These reprogrammable system-on-chip platforms must be fault tolerant to cope with current requirements.

Kastensmidt, Fernanda Lima; Reis, Ricardo

2006-01-01

246

Fault-tolerant and Diagnostic Methods for Navigation  

DEFF Research Database (Denmark)

Precise and reliable navigation is crucial, and for reasons of safety, essential navigation instruments are often duplicated. Hardware redundancy is mostly used to switch manually between instruments should faults occur. In contrast, diagnostic methods are available that use analytic redundancy to diagnose faults and autonomously provide valid navigation data, disregarding any faulty sensor data and using sensor fusion to obtain a best estimate for users. This paper discusses how diagnostic and fault-tolerant methods are applicable in marine systems. The example chosen is sensor fusion for navigation. Diagnosis design is based on parity relations and statistical hypothesis tests. Sensor fusion on healthy signals is performed using a Kalman filter with inverse covariance updating to deal with asynchronous or missing data from instruments. The paper is presented at a tutorial level.

Blanke, Mogens

2003-01-01
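The abstract mentions a Kalman filter with inverse covariance (information) updating, so that asynchronous or missing instrument data can be handled. A minimal scalar sketch in Python, assuming a random-walk state and independent measurement noise; the innovation gate is a simple stand-in for the paper's parity relations and statistical hypothesis tests, not a reproduction of them.

```python
import math
from typing import List, Optional, Sequence, Tuple


class ScalarInformationFilter:
    """1-D random-walk Kalman filter kept in information form (Y = 1/P, y = x/P)."""

    def __init__(self, x0: float, p0: float, q: float):
        self.Y = 1.0 / p0
        self.y = x0 / p0
        self.q = q  # process-noise variance added at each prediction step

    @property
    def x(self) -> float:
        return self.y / self.Y

    @property
    def P(self) -> float:
        return 1.0 / self.Y

    def predict(self) -> None:
        x, p = self.x, self.P + self.q
        self.Y, self.y = 1.0 / p, x / p

    def update(self, measurements: Sequence[Optional[Tuple[float, float]]],
               gate: float = 3.0) -> List[int]:
        """Fuse the (z, r) pairs that arrived; None marks a silent instrument.

        Measurements whose normalised innovation exceeds `gate` are rejected
        as suspect. Returns the indices of rejected measurements.
        """
        x_prior, p_prior = self.x, self.P
        rejected = []
        for i, m in enumerate(measurements):
            if m is None:
                continue  # no data this step: contributes no information
            z, r = m
            if abs(z - x_prior) / math.sqrt(p_prior + r) > gate:
                rejected.append(i)
                continue
            self.Y += 1.0 / r  # information adds for independent measurements
            self.y += z / r
        return rejected
```

In information form, fusing a measurement is just an addition, which is why instruments that report at different rates, or not at all in a given step, are easy to accommodate.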

247

Full Tolerant Archiving System  

Science.gov (United States)

The archiving system at the Italian center for Astronomical Archives (IA2) manages data from external sources such as telescopes, observatories or surveys and handles them to guarantee preservation, dissemination and reliability, in most cases in a Virtual Observatory (VO) compliant manner. A metadata model dynamic constructor and a data archive manager are new concepts aimed at automating the management of different astronomical data sources in a fault-tolerant environment. The goal is a fully tolerant archiving system, a goal complicated by the presence of various and time-changing data models, file formats (FITS, HDF5, ROOT, PDS, etc.) and metadata content, even within the same project. To avoid this scenario, a novel approach is proposed to guarantee data ingestion, backward compatibility and information preservation.

Knapic, C.; Molinaro, M.; Smareglia, R.

2013-10-01

248

A Reflective Object-Oriented Architecture for Developing Fault-Tolerant Software  

Scientific Electronic Library Online (English)

This paper proposes a reflective object-oriented architecture for developing fault-tolerant software. Reflective object-oriented programming promotes a modular structuring of systems by means of a new dimension of modularization: the separation between base-level objects and meta-level objects. This property allows the creation of metaobjects responsible for managing tasks of application objects located at the base level. In the context of this work, computational reflection is applied to implement various fault tolerance strategies at the meta-level in a manner transparent to the application programmer, that is, without interfering with the original structure of application objects that require fault tolerance facilities. The proposed architecture has the following advantages: (i) separation of concerns, that is, separating the concerns related to the application domain from those related to the implementation of fault-tolerant mechanisms; (ii) it promotes code reuse of fault-tolerance mechanisms; (iii) it allows application programmers to use the most adequate fault-tolerance strategy for their implementation; and (iv) it provides a design that is more adaptable, flexible and easier to extend than traditional designs for developing fault-tolerant software. Our reflective architecture is composed of three levels and is based on the abstraction of object groups.

Buzato, Luiz E.; Rubira, Cecília M. F.; Lisboa, Maria Lúcia B.

1997-11-01
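A minimal Python sketch of the base-level/meta-level separation: a metaobject intercepts calls to an unmodified application object and adds a retry strategy transparently. `RetryMetaObject`, `FlakySensor` and the retry policy are hypothetical; the paper's three-level architecture and object groups are not modelled.

```python
from typing import Any, Tuple, Type


class RetryMetaObject:
    """Meta-level wrapper adding retry-based fault tolerance to a base object."""

    def __init__(self, base: Any, retries: int = 2,
                 handled: Tuple[Type[BaseException], ...] = (IOError,)):
        self._base = base
        self._retries = retries
        self._handled = handled

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the wrapper itself,
        # i.e. for the base-level object's own members.
        attr = getattr(self._base, name)
        if not callable(attr):
            return attr

        def intercepted(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(self._retries + 1):
                try:
                    return attr(*args, **kwargs)
                except self._handled:
                    if attempt == self._retries:
                        raise  # strategy exhausted: propagate to the caller
        return intercepted


class FlakySensor:
    """Base-level object with no fault-tolerance code: fails its first reads."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def read(self) -> int:
        self.calls += 1
        if self.calls <= self.failures:
            raise IOError("transient read error")
        return 42
```

Swapping `RetryMetaObject` for a different metaobject (for example, one that votes over replicas) changes the strategy without touching `FlakySensor`, which is the separation-of-concerns benefit the abstract lists first.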

249

Fault tolerant wind speed estimator used in wind turbine controllers  

DEFF Research Database (Denmark)

Advanced control schemes can be used to optimize energy production and cost of energy in modern wind turbines. These control schemes most often rely on wind speed estimates. However, these wind speed estimators are not designed to be tolerant of faults in the sensors they use. In this paper, a fault-tolerant wind speed estimator is designed based on a set of unknown input observers, each designed for a different set of non-faulty sensors. Faults in the rotor, generator and wind speed sensors are considered. The designed wind speed estimator is passively tolerant of faults in the wind speed sensors, while faults in the generator and rotor speed sensors are accommodated by an active fault-tolerant observer scheme in which the faults are detected and identified, and the observer corresponding to the non-faulty sensors is used. The potential of the scheme is shown by applying the proposed wind speed estimator to a simulation model of a wind turbine. Note that since the faults are accommodated in the observer scheme, the actual controller does not need to be adjusted or reconfigured to accommodate the sensor faults.

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01
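The active part of the scheme, switching to the observer designed for the currently non-faulty sensors, can be sketched as a lookup. In the Python sketch below the observers are arbitrary callables supplied by the caller, not the paper's unknown input observers, and the rule of picking the largest registered subset of healthy sensors is an assumption.

```python
from typing import Callable, Dict, FrozenSet, Mapping

Estimator = Callable[[Mapping[str, float]], float]


class ObserverBank:
    """Switch among estimators, each designed for one set of healthy sensors."""

    def __init__(self, bank: Dict[FrozenSet[str], Estimator]):
        self.bank = bank

    def select(self, healthy: FrozenSet[str]) -> FrozenSet[str]:
        # Usable observers rely only on sensors currently judged healthy.
        usable = [s for s in self.bank if s <= healthy]
        if not usable:
            raise RuntimeError("no observer matches the healthy sensor set")
        return max(usable, key=len)  # prefer the observer using the most sensors

    def estimate(self, readings: Mapping[str, float], healthy: FrozenSet[str]) -> float:
        key = self.select(healthy)
        # Pass only the selected sensors, so faulty readings never reach the observer.
        return self.bank[key]({k: readings[k] for k in key})
```

Because the switch happens inside the estimator, the downstream controller keeps receiving a wind speed estimate in the same form, consistent with the abstract's note that the controller itself need not be reconfigured.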

250

Formal Verification of Fault Tolerant NoC-based Architecture  

OpenAIRE

Designing fault-tolerant Networks-on-Chip (NoC) for System-on-Chip (SoC)-based reconfigurable Field-Programmable Gate Array (FPGA) technology poses challenges for the conceptualisation of Multiprocessor System-on-Chip (MPSoC) designs. For this purpose, the use of rigorous formal approaches, based on incremental design and proof theory, has become an essential step in validating an architecture. The Event-B formal method is a promising formal approach that can be used to develop, model...

Andriamiarina, Manamiary Bruno; Daoud, Hayat; Belarbi, Mostefa; Méry, Dominique; Tanougast, Camel

2012-01-01

251

Modeling and measurement of fault-tolerant multiprocessors  

Science.gov (United States)

The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studying these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model, the vital components affecting performance can be determined. However, because the modified SPN is complex to solve, a simpler model, a closed priority queuing network, is constructed that represents the same critical aspects. Using this model for a specific application requires partitioning the workload into job classes. It is shown that the steady-state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related work has assumed zero or negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.

Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

1985-01-01
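The abstract notes that the steady-state solution of a closed queuing network directly yields useful results. As a simplified illustration (one job class, no priorities, product-form stations), exact Mean Value Analysis computes throughput and per-station response times; the paper's priority network with multiple job classes would need a different solution method, so this Python sketch is background rather than the authors' model.

```python
from typing import List, Sequence, Tuple


def mva(demands: Sequence[float], n_customers: int,
        think_time: float = 0.0) -> Tuple[float, List[float], List[float]]:
    """Exact single-class Mean Value Analysis for a closed queuing network.

    demands[k] is the total service demand per job at queuing station k.
    Returns (throughput, response time per station, mean queue length per station).
    """
    q = [0.0] * len(demands)
    x, r = 0.0, [0.0] * len(demands)
    for n in range(1, n_customers + 1):
        # An arriving job sees the queue left by a network with n - 1 jobs.
        r = [d * (1.0 + qk) for d, qk in zip(demands, q)]
        x = n / (think_time + sum(r))   # Little's law over the whole cycle
        q = [x * rk for rk in r]        # Little's law at each station
    return x, r, q
```

For example, one station with demand 1.0 and two circulating jobs gives throughput 1.0 and a mean queue of 2.0: the single server is saturated.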

252

Hybrid fault tolerance techniques to detect transient faults in embedded processors  

CERN Document Server

This book describes how fault tolerance techniques based on software and hardware are combined to create hybrid techniques. These are able to reduce overall performance degradation and increase error detection when applied to applications implemented in embedded processors. Coverage begins with an extensive discussion of the current state of the art in fault tolerance techniques. The authors then discuss the best trade-off between software-based and hardware-based techniques and introduce novel hybrid techniques. The proposed techniques increase existing fault detection rates up to 100%, while maintaining low overheads in area and application execution time.
• Discusses the effects of radiation on modern integrated circuits;
• Provides a comprehensive overview of state-of-the-art fault tolerance techniques based on software, hardware, and hybrid techniques;
• Introduces novel hybrid fault tolerance techniques for reconfigurable FPGAs and ASICs;
• Performs fault injection campaigns by simulation, bitstream ...

Azambuja, José Rodrigo; Becker, Jürgen

2014-01-01
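As background for the software side of such hybrid schemes, control-flow checking by software signatures (in the style of CFCSS) assigns each basic block a signature and updates a run-time signature register G at every block entry, so that an illegal jump leaves G mismatched. The Python sketch replays an executed block sequence; the four-block graph and signature values are hypothetical, it assumes each block has at most one fan-in successor, and it is not the book's own hybrid technique.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical control-flow graph: block -> predecessor blocks. D has fan-in 2.
SIG = {"A": 0b0001, "B": 0b0010, "C": 0b0100, "D": 0b1000}
PREDS = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}


def build_tables(sig: Dict[str, int],
                 preds: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Compile-time tables: signature differences and run-time adjusting values."""
    diff, adjust = {}, {}
    for v, ps in preds.items():
        if not ps:
            continue
        base = ps[0]
        diff[v] = sig[base] ^ sig[v]
        if len(ps) > 1:                 # fan-in block: each predecessor sets D
            for p in ps:
                adjust[p] = sig[p] ^ sig[base]
    return diff, adjust


def check_path(path: Sequence[str], sig: Dict[str, int] = SIG,
               preds: Dict[str, List[str]] = PREDS) -> Optional[int]:
    """Replay executed blocks; return the index where a control-flow error is caught."""
    diff, adjust = build_tables(sig, preds)
    g = sig[path[0]]                    # run-time signature register G
    for i in range(1, len(path)):
        prev, cur = path[i - 1], path[i]
        d_reg = adjust.get(prev, 0)     # D as assigned inside block `prev`
        g ^= diff.get(cur, 0)
        if len(preds[cur]) > 1:
            g ^= d_reg
        if g != sig[cur]:
            return i                    # signature mismatch: illegal jump
    return None
```

A purely software check like this costs extra instructions in every block, which is the overhead that hybrid techniques try to move into hardware.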

253

CMOS processor element for a fault-tolerant SVD array  

Science.gov (United States)

This paper describes the VLSI implementation of a CORDIC-based processor element for use in a fault-reconfigurable systolic array that computes the singular value decomposition (SVD) of a matrix. The chip implements a time-redundant fault tolerance scheme, which allows processors adjacent to a faulty processor to act as computation backups during the systolic idle time. In addition, processors around a fault collaborate to reroute data around the faulty processor. This form of time redundancy is attractive when tolerance to a few faults must be achieved with little hardware overhead.

Kota, Kishore; Cavallaro, Joseph R.

1993-11-01
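The processor element is CORDIC-based. As a generic illustration (not the chip's datapath or its fault-tolerance scheme), CORDIC vectoring mode rotates a vector onto the x-axis using only shift-and-add micro-rotations, yielding its magnitude and angle, the kind of angle computation that rotation-based SVD arrays rely on. The Python sketch uses floating point for clarity and assumes x > 0.

```python
import math
from typing import Tuple


def cordic_vectoring(x: float, y: float, iterations: int = 32) -> Tuple[float, float]:
    """Rotate (x, y) onto the x-axis with CORDIC micro-rotations.

    Returns (magnitude, angle) with the CORDIC gain removed. Assumes x > 0;
    a hardware element would pre-rotate by +/-90 degrees for other quadrants.
    """
    angle = 0.0
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i                # a right shift by i bits in hardware
        d = -1.0 if y > 0 else 1.0      # rotate toward y = 0
        x, y = x - d * y * step, y + d * x * step
        angle -= d * math.atan(step)    # accumulate the rotation applied
        gain *= math.sqrt(1.0 + step * step)
    return x / gain, angle
```

In hardware, `math.atan(step)` values come from a small lookup table and the gain (about 1.6468 for many iterations) is a precomputed constant, so each iteration needs only shifts, additions and a sign test.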

254

Fault Tolerance In Grid Computing: State of the Art and Open Issues  

Directory of Open Access Journals (Sweden)

Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes cooperate to execute a task. To achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly used techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead, which largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of grid resources.

Ritu Garg

2011-02-01
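
The grid survey above describes a trade-off in the checkpointing interval. Young's well-known first-order approximation makes that trade-off concrete; it is a standard result and is not taken from this paper. The minimal Python sketch below assumes a fixed checkpoint cost C and exponentially distributed failures with mean time between failures M. The function names are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order optimal checkpoint interval: sqrt(2 * C * M).

    Only meaningful when the checkpoint cost C is much smaller than the MTBF M.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order overhead model that Young's interval minimizes.

    C / tau is the time spent writing checkpoints; tau / (2 * M) is the expected
    rework, since a failure loses half an interval of work on average.
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    tau = young_checkpoint_interval(60.0, 24 * 3600.0)
    print(f"checkpoint every {tau:.0f} s, overhead ~ {overhead_fraction(tau, 60.0, 24 * 3600.0):.1%}")
```

Take a 60 s checkpoint cost and a 24 h MTBF. The interval comes out at about 3220 s, with roughly 3.7% overhead. The model ignores restart cost and replication, so treat it as a starting point rather than a tuned value.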

255

Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database  

OpenAIRE

In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared-nothing database. Rather than aborting and restarting queries, our system, Osprey, divides running queries into subqueries. It replicates data so that each subquery can be rerun on a different node if the node initially responsible fails or returns too slowly. Our approach is inspired by the fault tolerance properties of MapReduce, in which map or reduce jobs are greedily assign...

Madden, Samuel R.; Yen, Christine Y.; Yang, Christopher M.; Tan, Ceryen C.

2010-01-01
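
Osprey's key idea is to rerun a failed subquery on another replica instead of aborting the whole query. The small Python sketch below illustrates that idea. It is a hypothetical simplification, not Osprey's code:
- `run_subqueries` and the `execute` callback are invented names.
- A node failure or slow response is modeled as a `ConnectionError` or `TimeoutError`.

```python
from typing import Callable, Dict, List, Optional


def run_subqueries(
    replicas: Dict[str, List[str]],
    execute: Callable[[str, str], int],
) -> Dict[str, int]:
    """Run one subquery per data partition, rerunning it elsewhere on failure.

    replicas maps a partition id to the nodes that hold a copy of it, in
    preference order. execute(node, partition) returns that partition's partial
    result, or raises ConnectionError/TimeoutError if the node fails or is too slow.
    """
    results: Dict[str, int] = {}
    for partition, nodes in replicas.items():
        last_error: Optional[Exception] = None
        for node in nodes:
            try:
                results[partition] = execute(node, partition)
                break  # this subquery is done; keep its partial result
            except (ConnectionError, TimeoutError) as err:
                last_error = err  # rerun the same subquery on the next replica
        else:
            raise RuntimeError(f"no replica of {partition} succeeded") from last_error
    return results
```

Each partition's result is computed independently. Only the subqueries whose node failed are rerun, and partial results from healthy nodes are kept and combined afterwards (for example with `sum(results.values())`).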

256

Fault Tolerant Neuro-Robust Position Control of DC Motors  

Directory of Open Access Journals (Sweden)

DC motors are widely used in industrial fields such as mechanics, robotics, and aerospace engineering. In this paper, we present a high-performance method for position control of DC motors and combine a fault-tolerant control model with the neuro-robust control approach. With the proposed control algorithms, external disturbances and the coupled dynamics inherent in the system are effectively compensated by a neural network unit. This unit requires no analytical estimate of the upper bound of the reconstruction error and uncertainties. Simulations under various flight conditions also confirm the effectiveness of the proposed methods.

Ran Zhang

2011-10-01

257

Reliability and fault tolerance in the European ADS project  

CERN Document Server

After an introduction to reliability theory, this paper describes the linear proton accelerator proposed for the European ADS demonstration project. Design issues are discussed, and examples of fault-tolerance cases are given.

Biarrotte, Jean-Luc

2013-01-01

258

Tolerance of Radial-Basis Functions Against Stuck-At-Faults  

OpenAIRE

Neural networks are intended to be used in future nanoelectronic systems, since neural architectures appear to be robust against malfunctioning elements and noise in their weights. In this paper we analyze the fault tolerance of Radial Basis Function networks to stuck-at faults at the trained weights and at the outputs of neurons. Moreover, we determine upper bounds on the mean square error arising from these faults.

Eickhoff, Ralf; Rückert, Ulrich

2005-01-01
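
The stuck-at fault analysis described for Radial Basis Function networks can be sketched in Python for a one-dimensional Gaussian RBF network. The sketch makes these assumptions:
- The basis functions are Gaussian with a shared width `sigma`.
- A single hidden neuron has its output forced to a constant.

Forcing a neuron output to 0 has the same effect as its output weight being stuck at zero. The bound noted in the comments is an elementary one, not the bounds derived in the paper.

```python
import math
from typing import Optional, Sequence


def rbf_output(x: float, centers: Sequence[float], weights: Sequence[float],
               sigma: float, stuck_neuron: Optional[int] = None,
               stuck_value: float = 0.0) -> float:
    """Output of a 1-D Gaussian RBF network.

    If stuck_neuron is given, that hidden neuron's output is replaced by
    stuck_value, modeling a stuck-at fault at the neuron output.
    """
    total = 0.0
    for j, (c, w) in enumerate(zip(centers, weights)):
        phi = math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
        if j == stuck_neuron:
            phi = stuck_value
        total += w * phi
    return total


def fault_mse(xs: Sequence[float], centers: Sequence[float], weights: Sequence[float],
              sigma: float, stuck_neuron: Optional[int], stuck_value: float = 0.0) -> float:
    """Mean square deviation between fault-free and faulty outputs over xs.

    Gaussian outputs lie in [0, 1], so for stuck_value in [0, 1] each deviation
    is at most |w_j| and this MSE is at most w_j ** 2 (an elementary bound).
    """
    errors = [
        (rbf_output(x, centers, weights, sigma)
         - rbf_output(x, centers, weights, sigma, stuck_neuron, stuck_value)) ** 2
        for x in xs
    ]
    return sum(errors) / len(errors)
```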

259

An extended induction motor model for investigation of faulted machines and fault tolerant variable speed drives  

OpenAIRE

High-performance variable-speed induction motor drives have been commercially available for industrial applications for many years. More recently, they have been proposed for applications such as hybrid automotive drives and some pump applications on more electric aircraft. These applications will require the drive to operate in the presence of faults, i.e., they must be “Fault Tolerant” and capable of “Fault Ride Through”. The aim of this project was therefore to investigate fault r...

Jasim, Omar

2010-01-01

260

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

Energy Technology Data Exchange (ETDEWEB)

The similarities and differences between a universal network and conventional neural networks are outlined. The description and application of a universal network are discussed by showing how a simple linear system is modeled by conventional techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described, and example results are presented. The paper concludes that the universal network provides feature recognition capabilities different from those of a neural network. It also concludes that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system.

Mott, J.E. [Advanced Modeling Techniques Corp., Idaho Falls, ID (United States); King, R.W.; Monson, L.R.; Olson, D.L.; Staffon, J.D. [Argonne National Lab., Idaho Falls, ID (United States)

1992-03-06

261

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

International Nuclear Information System (INIS)

The similarities and differences between a universal network and conventional neural networks are outlined. The description and application of a universal network are discussed by showing how a simple linear system is modeled by conventional techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described, and example results are presented. The paper concludes that the universal network provides feature recognition capabilities different from those of a neural network. It also concludes that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system.

262

Topological fault-tolerance in cluster state quantum computation  

OpenAIRE

We describe a fault-tolerant version of the one-way quantum computer using a cluster state in three spatial dimensions. Topologically protected quantum gates are realized by choosing appropriate boundary conditions on the cluster. We provide equivalence transformations for these boundary conditions that can be used to simplify fault-tolerant circuits and to derive circuit identities in a topological manner. The spatial dimensionality of the scheme can be reduced to two by co...

Raussendorf, Robert; Harrington, Jim; Goyal, Kovid

2007-01-01

263

Simulating chemistry efficiently on fault-tolerant quantum computers  

OpenAIRE

Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. Here we consider methods to make proposed chemical simulation algorithms computationally fast on fault-tolerant quantum computers in the circuit model. Fault tolerance constrains the choice of available gates, so that arbitrary gates required for a simulation algorithm must be constructed from sequences of fundamental operatio...

Jones, N. Cody; Whitfield, James D.; McMahon, Peter L.; Yung, Man-hong; Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

2012-01-01

264

Fault Tolerance Structure of Radix 2 Signed Digital Adders  

Directory of Open Access Journals (Sweden)

In this study, a fault-tolerant adder structure based on the radix-2 signed-digit (SD) representation is proposed. The carry-free property of the SD adder limits the impact of a fault to a few digits. Under a single-fault assumption, this property can be exploited for fault detection based on parity checking. An encoding scheme yields the parity values of the digits involved in the computation, and these parity values are used to check the circuit. An error-information register stores the checking results, with each bit indicating whether the corresponding unit is faulty. Depending on the fault type, recomputation or reconfiguration is used for error correction. Adding fault tolerance costs about 120% in hardware overhead. The maximum combinational path delay of the proposed adder remains constant as the operand size increases.

Jishun Kuang

2012-01-01

265

Architectural concepts and redundancy techniques in fault-tolerant computers  

Science.gov (United States)

This paper describes the redundancy techniques employed in the design of fault-tolerant computers. It also discusses how functional requirements, technology constraints, and cost considerations affect the choice of these techniques. The STAR computer, developed at the Jet Propulsion Laboratory for long-duration planetary spacecraft missions, is discussed along with several later fault-tolerant computer designs. The class of computers described in this paper employs dynamic redundancy, i.e., the machine is divided into a set of submodules, each with standby spares. A special hard-core monitor unit detects and diagnoses faults and effects automated recovery by replacing failed parts.

Rennels, D. A.

1974-01-01
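
The dynamic-redundancy scheme described for the STAR computer can be sketched in Python. In that scheme, each submodule has standby spares, and a monitor replaces failed parts. The following names are hypothetical stand-ins for the hardware monitor's fault detection and diagnosis:
- the `Submodule` class;
- `monitor_pass`;
- the `is_healthy` test callback.

They do not model the actual STAR hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Submodule:
    name: str
    active: str                                        # unit currently in service
    spares: List[str] = field(default_factory=list)    # standby spares, in order of use
    retired: List[str] = field(default_factory=list)   # units switched out after failing


def monitor_pass(modules: List[Submodule],
                 is_healthy: Callable[[str], bool]) -> Dict[str, str]:
    """One pass of a monitor: test each active unit and, if it fails, switch in
    standby spares until a healthy one is found.

    Returns {submodule name: new active unit} for every replacement made.
    """
    replacements: Dict[str, str] = {}
    for m in modules:
        while not is_healthy(m.active):
            if not m.spares:
                raise RuntimeError(f"{m.name}: spares exhausted")
            m.retired.append(m.active)
            m.active = m.spares.pop(0)
            replacements[m.name] = m.active
    return replacements
```

Each newly switched-in spare is tested before it is accepted. The sketch therefore skips a spare that has itself failed while dormant.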

266

Fault-tolerant holonomic quantum computation in surface codes  

Science.gov (United States)

We show that universal holonomic quantum computation can be achieved fault tolerantly by adiabatically deforming the gapped stabilizer Hamiltonian of the surface code, where quantum information is encoded in the degenerate ground space of the system Hamiltonian. We explicitly propose procedures to perform each logical operation, including logical state initialization, logical state measurement, logical controlled-NOT (CNOT), state injection, distillation, etc. In particular, adiabatic braiding of different types of holes on the surface leads to a topologically protected, non-Abelian geometric logical CNOT. Throughout the computation, quantum information is protected from both small perturbations and low-weight thermal excitations by a constant energy gap that is independent of the system size. Also, the Hamiltonian terms have weight at most four during the whole process. The effect of thermal error propagation is considered during the adiabatic code deformation. With the help of active error correction, this scheme is fault tolerant, in the sense that the computation time can be arbitrarily long for a large-enough lattice size. It is shown that the frequency of error correction and the physical resources needed can be greatly reduced by the constant energy gap.

Zheng, Yi-Cong; Brun, Todd A.

2015-02-01

267

Coordinated Fault-Tolerance for High-Performance Computing Final Project Report  

Energy Technology Data Exchange (ETDEWEB)

In the Coordinated Infrastructure for Fault Tolerance Systems project (CIFTS, as the original project came to be called), our aim has been to understand and tackle the following broad research questions. Their answers will help the HEC community analyze and shape the direction of research on fault tolerance and resiliency for future high-end leadership systems.
- Will the availability of global fault information, obtained by exchanging fault information between the different HEC software components on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults?
- If fault awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide more comprehensive end-to-end fault management?
- Which fault-tolerance features does widely used HEC system software lack today that would prevent it from taking advantage of systemwide global fault information?
- What are the practical limitations of a systemwide approach to end-to-end fault management based on fault awareness and coordination?
- What mechanisms, tools, and technologies are needed to bring about fault awareness and coordinated responses on a leadership-class system?
- What standards, outreach, and community interaction are needed for the concept of fault awareness and coordination to be adopted for fault management on future systems?

Keeping these overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a lightweight, scalable infrastructure with a simple, standardized interface. This interface allows fault-related information to be communicated throughout the system and facilitates coordinated responses.
This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging this infrastructure, and developing and using additional tools, we examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we investigated log and root-cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work included:
- libraries for fault-tolerant linear algebra;
- application frameworks for coupled multiphysics applications;
- external frameworks to support monitoring and response for general applications.

Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.

Panda, Dhabaleswar Kumar [The Ohio State University]; Beckman, Pete

2011-07-28
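
The publish-subscribe pattern behind the Fault Tolerance Backplane can be illustrated with a toy in-process event bus in Python. This is not the FTB API. `FaultBus`, `FaultEvent` and their fields are hypothetical names. The sketch shows how a monitoring tool could publish one fault event that several subsystems react to, for example a checkpoint library and a job scheduler.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class FaultEvent:
    source: str         # publishing component, e.g. "monitor" or "mpi"
    name: str           # event name, e.g. "node_failure"
    severity: str       # e.g. "info", "warning", "fatal"
    payload: str = ""   # free-form detail, e.g. the affected node


class FaultBus:
    """Toy in-process publish-subscribe bus for fault events (illustrative only)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[FaultEvent], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[FaultEvent], None]) -> None:
        """Register handler to be called for every event with this name."""
        self._subs[event_name].append(handler)

    def publish(self, event: FaultEvent) -> int:
        """Deliver event to every subscriber of its name; return the delivery count."""
        handlers = list(self._subs.get(event.name, []))
        for handler in handlers:
            handler(event)
        return len(handlers)
```

Publishers and subscribers share only the event name, so a new subsystem can react to existing fault events without changes to the component that detects them.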

268

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

OpenAIRE

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to realize the fault identification of the thruster. The fault accommodation unit is based on direct calculations of moment...

Qian Liu; Daqi Zhu

2010-01-01

269

A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites  

International Nuclear Information System (INIS)

This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes:
- a unique low-signature survey vehicle;
- a base station;
- radio telemetry data links;
- satellite-based vehicle tracking;
- stereo vision;
- sensors for noninvasive inspection of the surface and subsurface.

The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position-sensing system that can track a survey vehicle or instrument in the field. This system can provide coordinate updates at a rate of 200 per second with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals over a wide range of frequencies and can be operated as a passive or active device.

270

Energy Efficient Fault Tolerant Routing Mechanism for Wireless Sensor Network  

Directory of Open Access Journals (Sweden)

Wireless sensor networks are self-organizing, resource-constrained systems that are often deployed in inhospitable and inaccessible environments to gather data about some phenomenon in the outside world. For most sensor network applications, point-to-point reliability is not the main objective (Paradis & Qi, 2007). Instead, reliable delivery of the event of interest to the server has to be guaranteed, possibly with a certain probability. Communication in such networks is unpredictable and failure-prone, even more so than in regular wireless ad hoc networks. Hence, it is vital to provide fault-tolerant techniques for distributed applications in sensor networks. Many recent studies have proposed approaches that address fault tolerance in the application, transport, and/or routing layers. In this paper, we propose a slight modification of conventional (destination, next hop) routing. Second-hop information is introduced in the route construction phase and used in case of node or link failure, so that only the failed link is skipped. Furthermore, the proposed routing technique:
- stabilizes the throughput;
- reduces the average jitter;
- incurs low control overhead;
- decreases the energy consumption of the network.

As a result, the reliability, availability, energy efficiency, and maintainability of the network are improved.

Ahmed Roumane

2012-05-01
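
The second-hop fallback proposed for wireless sensor network routing can be sketched in Python. The sketch assumes:
- a routing table that maps each destination to a `(next_hop, second_hop)` pair;
- a node that knows its direct neighbors and which links have failed.

The table layout and the function name are hypothetical, not the paper's implementation.

```python
from typing import Dict, Optional, Set, Tuple

# destination -> (next hop, hop after the next hop); hypothetical table layout
RouteTable = Dict[str, Tuple[str, Optional[str]]]


def choose_forwarder(dest: str, table: RouteTable, neighbors: Set[str],
                     failed_links: Set[str]) -> Optional[str]:
    """Pick the node to forward a packet for dest to.

    Use the normal next hop while its link is up. If that link has failed,
    skip only that hop by sending straight to the second hop, provided the
    second hop is a direct neighbor with a working link. Return None when
    neither option exists, meaning the route has to be rebuilt.
    """
    if dest not in table:
        return None
    next_hop, second_hop = table[dest]
    if next_hop not in failed_links:
        return next_hop
    if second_hop is not None and second_hop in neighbors and second_hop not in failed_links:
        return second_hop
    return None
```

The fallback is a local decision, so no route rediscovery messages are needed when a single link fails. This matches the abstract's emphasis on low control overhead.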

271

Scalability, performance, and fault tolerance of PACS architectures  

Science.gov (United States)

Three database architectures may be distinguished among Picture Archiving and Communication Systems (PACSs):
(1) configurations with a logically and physically centralized database and file server;
(2) systems with physically distributed file servers and a logically centralized database;
(3) installations with logically and physically distributed databases and file servers.
A brief overview of these architectures and their scalability, performance, and fault tolerance is given. Using given image production data and workflow, a PACS for an existing large university hospital is designed for both the first and the second architecture. We then evaluate the fault tolerance of the two architectures. By modeling the workflow and employing queuing theory, solutions with practically realizable data transfer requirements are found for both architectures. Given today's performance and cost of computers, storage, and information management technologies, the second or third architecture is preferable, depending on the size of the installation. These architectures offer almost unlimited scalability, very high fault tolerance, and optimized workflow.

We describe a modern commercial PACS that adheres to the open-systems concept. It consists of software application programs that run independently of specific computer and network components, on off-the-shelf hardware and under standard multi-platform operating systems. The programs use commercial database management systems and network managers. The system is based on the second architecture, with multiple islands of functionality, each with servers and archive modules, and a physically distributed database. Our PACS architecture supports browser technology: workstations use the database to determine the location of needed information and then, through the image browser, mount the appropriate file server for access.
The architecture supports a concept similar to domain name server (DNS) directory services on the Internet. The system can be expanded to enterprise-wide installations with a logically distributed database. The openness, scalability, and longevity of a PACS also depend strongly on two further factors: the architecture of the software applications in the operating and tool-set environment, and the distribution of image processing tasks across the PACS. These issues are discussed in the last section of our paper. We present an image processing strategy that provides consistent rendering of image gray scale and spatial resolution throughout the entire PACS.

Blume, Hartwig R.; Prior, Fred W.; di Pierro, Milan C.; Goble, John C.; Lodgberg, Jonas; Kenney, Robert S.; Goeringer, Fred

1998-07-01

272

Superconducting generator field winding design for high fault tolerance  

International Nuclear Information System (INIS)

Development of rotating electrical machines with superconducting field windings is proceeding at numerous sites worldwide. The primary emphasis is on large turbine generators for application to power systems. The EPRI/Westinghouse 300 MVA superconducting generator program is directed towards demonstrating the technology in an actual utility environment over a long period of time. For superconducting generators, the concept of stability covers two aspects. The first is the traditional stability with respect to the electromechanical interactions and oscillations of the machine with the power system. The second is the thermohydraulic stability of the cryogenic rotor and its helium supply system. Power system disturbances, such as faults, produce flow and pressure transients in the rotor cooling system. Depending on the severity and time history of the disturbances, these transients may cause normalization of the superconductor and destabilize the generator output through loss of field excitation. This paper addresses how to design the superconducting winding and its cryogenic cooling system for stability in the presence of large disturbances, a capability that has been called high fault tolerance.

273

Fault-tolerant computer architecture based on INMOS transputer processor  

Science.gov (United States)

Redundant processing has been used for several years in mission flight systems. In these systems, more than one processor performs the same task at the same time, but only one processor is actually in use. A fault-tolerant computer architecture based on the features provided by INMOS Transputers is presented. The Transputer architecture provides several communication links that allow data and command communication with other Transputers without the use of a bus. Additionally, the Transputer supports parallel processing, which can increase system speed considerably. The processor architecture consists of three processors working in parallel and kept at the same operational level, but only one processor is in actual control of the process. The design allows each Transputer to test the other two Transputers and report the operating condition of its neighboring processors. A graphical display was developed to help the user identify any problem.

Ortiz, Jorge L.

1987-01-01
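
In the Transputer design, each of three processors tests the other two. This mutual-testing idea can be sketched as a diagnosis step in Python. The sketch assumes at most one faulty processor. It treats a processor as faulty only when both of its peers report it as failed, because a faulty tester's own reports cannot be trusted. The reporting format and the rule of handing control to the lowest-numbered healthy processor are also assumptions, not details from the paper.

```python
from typing import Dict, Optional, Set, Tuple

# reports[(tester, tested)] is True if the tester found the tested processor healthy
Reports = Dict[Tuple[int, int], bool]


def diagnose(reports: Reports, n: int = 3) -> Set[int]:
    """Flag a processor as faulty when every other processor's test of it fails.

    With at most one faulty processor, the two healthy testers agree, so a
    single bad report from a faulty tester cannot condemn a healthy processor.
    """
    faulty: Set[int] = set()
    for p in range(n):
        verdicts = [reports[(q, p)] for q in range(n) if q != p]
        if not any(verdicts):
            faulty.add(p)
    return faulty


def pick_controller(faulty: Set[int], n: int = 3) -> Optional[int]:
    """Hand control of the process to the lowest-numbered unflagged processor."""
    for p in range(n):
        if p not in faulty:
            return p
    return None
```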

274

Empirical Study of FFANNs Tolerance to Weight Stuck at Zero Fault  

OpenAIRE

The fault tolerance of artificial neural networks has been investigated with reference to their hardware model. A weight fault is an important fault type, since it breaks the link between two nodes. In this paper, weight faults are explained, and experiments are performed for the weight-stuck-at-0 fault. The effect of this fault on trained networks is analyzed. The obtained results suggest that the networks are not tolerant to this type of fault.

Chandra Sekhar Rai; Pravin Chandra; Amit Prakash Singh

2010-01-01

275

Buffered coscheduling for parallel programming and enhanced fault tolerance  

Science.gov (United States)

A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. During a defined time interval, the control information communications generated by each process running on each processor are accumulated in buffers. Adjacent time intervals are separated by strobe intervals for a global exchange of control information. This global exchange of the accumulated control information is performed during the intervening strobe interval at the end of each time interval. Each processor is thus informed by all of the other processors of the number of incoming jobs it will receive in the subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.

Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

2006-01-31
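
The strobe-time global exchange in buffered coscheduling can be sketched in Python. During an interval, each processor only buffers its outgoing messages. At the strobe, all processors share per-destination counts, so that each one knows how many messages it will receive next. The data layout and the function name `strobe_exchange` are hypothetical. A real system would perform this exchange with a collective operation over the network rather than in one process.

```python
from typing import Dict, List


def strobe_exchange(buffers: Dict[int, List[int]], n_procs: int) -> Dict[int, int]:
    """Simulate the global exchange performed during a strobe interval.

    buffers[p] lists the destination of every message processor p buffered
    during the interval that just ended. Returns, for each processor q, the
    number of messages q will receive in the next interval.
    """
    # counts[p][q]: messages p will send to q (the row p broadcasts at the strobe)
    counts = [[0] * n_procs for _ in range(n_procs)]
    for p, dests in buffers.items():
        for q in dests:
            counts[p][q] += 1
    # after the all-to-all exchange, each processor q sums its own column
    return {q: sum(counts[p][q] for p in range(n_procs)) for q in range(n_procs)}
```

For example, with `{0: [1, 2, 2], 1: [0], 2: []}` over three processors, processor 2 learns at the strobe that it will receive two messages and processors 0 and 1 one each.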

276

Fault tolerant vector control of induction motor drive  

Science.gov (United States)

For electric drives that form part of technical installations in hazardous industries, such as nuclear, military, and chemical plants, increasing resiliency and survivability is an urgent task. The paper presents the design principle of a fault-tolerant vector control system for an asynchronous (induction) electric drive. It shows how a three-phase induction motor drive can recover its performance in emergency mode using a two-phase vector control system. A simulation model of the unbalanced asynchronous drive in emergency mode is developed. The model uses a coordinate transformation that allows the unbalanced drive to operate in emergency mode. Simulation results are presented for the transient following the loss of a stator phase. During a supply-phase failure, the induction motor cannot maintain a circular rotating field in the air gap or restore its performance at rated torque and speed.

Odnokopylov, G.; Bragin, A.

2014-10-01

277

A benchmark for fault tolerant flight control evaluation  

Science.gov (United States)

A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. It was developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008). The benchmark includes a suitable set of assessment criteria and failure cases, based on reconstructed accident scenarios, to assess the potential of new adaptive control strategies to improve aircraft survivability. Reconstruction and modeling techniques were applied to accident flight data. This produced high-fidelity nonlinear aircraft and fault models for evaluating new Fault Tolerant Flight Control (FTFC) concepts and their real-time performance in accommodating in-flight failures.

Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

2013-12-01

278

Improvement of Matrix Converter Drive Reliability by Online Fault Detection and a Fault-Tolerant Switching Strategy.  

DEFF Research Database (Denmark)

The matrix converter system is becoming a very promising candidate to replace the conventional two-stage ac/dc/ac converter, but system reliability remains an open issue. The most common reliability problem is an open-switch fault in a bidirectional switch during operation. In this paper, a matrix converter driving a speed-controlled permanent-magnet synchronous motor is examined under a single open-switch fault. First, a new fault-detection method is proposed that uses only the motor currents. Second, a novel fault-tolerant switching strategy is presented. By treating the matrix converter as a two-stage rectifier/inverter, existing modulation techniques for the inverter stage can be reused, whereas the rectifier stage is modified by control to counteract the fault. Moreover, the proposed techniques require no additional hardware devices or circuit modifications to the matrix converter. Experimental results show that the proposed method can maintain the motor speed with a maximum ripple of 2%, a fivefold improvement over the uncompensated system. The proposed method therefore offers a very economical and effective solution for the matrix converter fault tolerance problem.

Nguyen-Duy, Khiem; Liu, Tian-Hua

2011-01-01

279

Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements  

DEFF Research Database (Denmark)

This paper presents a fault-tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve integrated navigation. The fault-tolerant functionality is based on a modified particle filter. This particle filter can run asynchronously to accommodate measurement dropouts, and it copes with measurement outliers by switching observation models. Simulations with experimental data show that this fault-tolerant navigation system can accurately estimate the ROV kinematic states even when sensor failures occur frequently.

Zhao, Bo; Blanke, Mogens

2012-01-01
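
The ROV navigation filter above combines asynchronous updates with outlier handling. A one-dimensional particle filter in Python can illustrate this combination. It is a simplified stand-in, not the authors' filter:
- DVL velocity drives the prediction.
- An HPR position fix updates the weights only when one is available; `None` marks a dropout.
- Instead of switching observation models, the sketch uses a mixture likelihood with a flat outlier component.

All parameter values are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def particle_filter_1d(velocities: Sequence[float], fixes: Sequence[Optional[float]],
                       dt: float = 1.0, n: int = 500, process_std: float = 0.1,
                       fix_std: float = 1.0, outlier_prob: float = 0.1,
                       outlier_density: float = 1e-3, seed: int = 0) -> List[float]:
    """Estimate 1-D position from DVL velocities and intermittent HPR fixes.

    fixes[k] is None when no position fix arrived at step k (a dropout).
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    estimates: List[float] = []
    for v, z in zip(velocities, fixes):
        # Predict: dead-reckon every particle with the DVL velocity plus noise.
        particles = [p + v * dt + rng.gauss(0.0, process_std) for p in particles]
        if z is not None:
            # Update only when a fix exists. The flat outlier term keeps weights
            # positive, so a wild fix gives near-uniform weights and barely moves
            # the estimate.
            weights = [(1.0 - outlier_prob) * gaussian_pdf(z, p, fix_std)
                       + outlier_prob * outlier_density for p in particles]
            particles = rng.choices(particles, weights=weights, k=n)  # resample
        estimates.append(sum(particles) / n)
    return estimates
```

During dropouts, the filter simply keeps dead-reckoning. When a fix lies far from every particle, the outlier component dominates, and the fix is effectively ignored rather than dragging the estimate away.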

280

A Fault Tolerant Resource Allocation Architecture for Mobile Grid  

Directory of Open Access Journals (Sweden)

Problem statement: To achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant. Because resource failures fatally affect job execution, a fault tolerance service is essential to satisfy QoS requirements in grid computing with mobile nodes. Approach: We propose a fault-tolerant technique that improves reliability in a mobile grid environment by taking node mobility into account. The cluster head and monitoring agent are designed to address both resource and network failures and to provide recovery techniques for overcoming the faults. Results: The proposed model achieves a noticeable performance improvement over the previous model (HRAA). Using simulation, we analyze the effect of node and link failures on parameters such as delivery ratio, throughput, and delay against the rate of success. Conclusion: The proposed fault-tolerant approach checks the availability of the least-loaded nodes for transferring the executed job to the cluster head. It provides an alternate path in case of failure, thereby enhancing the reliability of the grid environment.

P. T. Vanathi

2012-01-01

281

Fault tolerant homopolar magnetic bearings with flux invariant control  

International Nuclear Information System (INIS)

The theory for a novel fault-tolerant 4-active-pole homopolar magnetic bearing is developed. If any one of the four coils in the bearing actuator fails, the remaining three coil currents change via an optimal distribution matrix. They then produce the same opposing-pole, C-core-type control fluxes as the unfailed bearing. The homopolar magnetic bearing thus provides unaltered magnetic forces, without any loss of load capacity, even if any one coil suddenly fails. Numerical examples are provided to illustrate the novel fault-tolerant 4-active-pole homopolar magnetic bearing.

282

Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting  

Science.gov (United States)

The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short-term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in formal verification or proof methods was recommended. Formal specification and software reliability modeling were recommended as topics for both short- and long-term research.

Dunham, J. R. (editor); Knight, J. C. (editor)

1982-01-01

283

Fault-tolerant Sensor Fusion for Marine Navigation  

DEFF Research Database (Denmark)

The reliability of navigation data is critical for steering and manoeuvring control, particularly at high speed or in critical phases of a mission. Should faults occur, faulty instruments need to be autonomously isolated and faulty information discarded. This paper designs a navigation solution in which essential navigation information is provided even with multiple faults in instrumentation. The paper proposes a provably correct implementation through auto-generated state-event logic in a supervisory part of the algorithms. Test results from naval vessels document the performance. They also show events where the fault-tolerant sensor fusion provided uninterrupted navigation data despite temporary instrument defects.

Blanke, Mogens

2006-01-01
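
A simple median-based consistency check in Python can illustrate how a faulty instrument is isolated among redundant sensors. This technique is swapped in for illustration only. The paper uses auto-generated state-event logic in a supervisory layer, which this sketch does not reproduce. The sensor names and the fixed `threshold` are hypothetical.

```python
from statistics import median
from typing import Dict, Set, Tuple


def fuse_redundant(readings: Dict[str, float], threshold: float) -> Tuple[float, Set[str]]:
    """Isolate sensors that disagree with the median by more than threshold.

    Returns the mean of the remaining (consistent) readings and the set of
    isolated sensor names. At least three sensors are needed to outvote one.
    """
    if len(readings) < 3:
        raise ValueError("need at least three redundant sensors to isolate one")
    m = median(readings.values())
    isolated = {name for name, value in readings.items() if abs(value - m) > threshold}
    good = [value for name, value in readings.items() if name not in isolated]
    if not good:
        raise RuntimeError("no consistent sensors left")
    return sum(good) / len(good), isolated
```

The median is used because a single wild reading cannot move it far. That makes it a reasonable reference for deciding which instrument to discard.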

284

A Bypass-Ring Scheme for a Fault Tolerant Multicast  

Directory of Open Access Journals (Sweden)

We present a fault tolerant scheme for recovery from single or multiple node failures in multi-directional multicast trees. The scheme is based on cyclic structures providing alternative paths to eliminate faulty nodes and reroute the traffic. Our scheme is independent of message source and direction in the tree, provides a basis for on-the-fly repair and can be used as a platform for various strategies for reconnecting tree partitions. It only requires an underlying infrastructure to provide a reliable routing service. Although it is described in the context of a message multicast, the scheme can be used universally in all systems using tree-based overlay networks for communication among components.

V. Dynda

2003-01-01

285

Fault-Tolerant Operation of an Open-End Winding Five-Phase PMSM Drive with Inverter Faults  

OpenAIRE

Multi-phase machines are known for their fault-tolerant capability. However, star-connected machines have no tolerance to inverter switch short-circuit faults. This paper investigates the fault-tolerant operation of an open-end winding five-phase drive, i.e., a multi-phase machine fed by a dual-inverter supply. Open-circuit faults and inverter switch short-circuit faults are considered and handled with various degrees of reconfiguration. Theoretical developments and experimental results validat...

Meinguet, Fabien; Nguyen, Ngac-ky; Sandulescu, Paul; Kestelyn, Xavier; Semail, Eric

2013-01-01

286

Design of Fault Tolerant Network Interfaces for NoCs  

DEFF Research Database (Denmark)

Networks-on-Chip (NoCs) emerged as a strategy to deal with the communication requirements of complex IP-based Systems-on-Chip. As design complexity increases and technology scales down into the deep-submicron domain, the probability of malfunctions and failures in NoC components increases. This paper focuses on the study and evaluation of techniques for increasing the reliability and resilience of Network Interfaces (NIs). NIs act as interfaces between IP cores and the communication infrastructure; faulty behavior in them could therefore affect the overall system. In this work, we propose a functional fault model for the NI components, and we present a two-level fault tolerant solution that can be employed to mitigate the effects of both single-event upset soft errors and hard errors on the NI. Experiments show that, with limited overhead, we can significantly improve the reliability of the NI while saving up to 83% in area with respect to a standard Triple Modular Redundancy implementation, as well as significantly reducing energy.

Fiorin, Leandro; Micconi, Laura

2011-01-01
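The abstract measures its area saving against a standard Triple Modular Redundancy (TMR) implementation. For reference, a minimal sketch of the bitwise 2-of-3 majority voter that TMR places after three replicated copies of a block; the function names are hypothetical, and this is the baseline technique, not the paper's two-level scheme.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a: int, b: int, c: int) -> list:
    """Indices (0, 1, 2) of the replicas whose word differs from the voted word."""
    voted = tmr_vote(a, b, c)
    return [i for i, word in enumerate((a, b, c)) if word != voted]


if __name__ == "__main__":
    golden = 0b1011_0010
    upset = golden ^ 0b0000_1000  # a single-event upset flips bit 3 in replica 1
    print(bin(tmr_vote(golden, upset, golden)))         # 0b10110010
    print(disagreeing_replicas(golden, upset, golden))  # [1]
```

The voter masks any fault confined to a single replica, but it costs roughly three copies of the protected logic plus the voter itself, which is the overhead the paper's area comparison is made against.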

287

Fault Tolerant Message Efficient Coordinator Election Algorithm in High Traffic Bidirectional Ring Network  

Directory of Open Access Journals (Sweden)

The use of distributed systems such as the Internet and cloud computing is growing dramatically. A coordinator is crucial in these systems because processes must be coordinated and kept consistent. However, this growth makes coordinator election algorithms even more complicated. Many algorithms have been proposed in this area, but the two best known are Bully and Ring. In this paper, we propose a fault tolerant coordinator election algorithm for a typical bidirectional ring topology that is twice as fast as the Ring algorithm while passing far fewer election messages. A fault tolerance technique is applied that reduces the waiting time for the election to zero.

Danial Rahdari

2012-12-01
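The abstract does not spell out the proposed bidirectional algorithm, so the sketch below shows only the classic Ring election it is compared against. It assumes a unidirectional logical ring in which a sender bypasses a crashed successor (modeled here as simply skipping IDs not in `alive`); all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ElectionResult:
    coordinator: int
    messages: int


def ring_election(ids, alive, initiator):
    """Classic Ring election on a unidirectional logical ring.

    ids lists process IDs in ring order. Crashed processes (not in `alive`)
    are bypassed, as if a sender had timed out and tried the next successor.
    """
    if initiator not in alive:
        raise ValueError("the initiator must be alive")
    n = len(ids)
    start = ids.index(initiator)
    collected = [initiator]
    messages = 0
    pos = start
    # Phase 1: an ELECTION message circles the ring, collecting live IDs.
    while True:
        nxt = (pos + 1) % n
        while ids[nxt] not in alive:
            nxt = (nxt + 1) % n
        messages += 1
        if nxt == start:
            break
        collected.append(ids[nxt])
        pos = nxt
    # Phase 2: a COORDINATOR message announcing the highest ID circles once more.
    messages += len(collected)
    return ElectionResult(max(collected), messages)


if __name__ == "__main__":
    ring = [3, 7, 1, 9, 4]
    print(ring_election(ring, alive={3, 7, 1, 4}, initiator=1))
    # ElectionResult(coordinator=7, messages=8): process 9 has crashed
```

With k live processes this baseline always costs 2k messages and two full trips around the ring, which is the cost the paper's bidirectional variant aims to cut.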

288

Analysis of GPS Abnormal Conditions within Fault Tolerant Control Laws  

Science.gov (United States)

The Global Positioning System (GPS) is a critical element for the functionality of autonomous flying vehicles. GPS operation under normal and abnormal conditions directly impacts the trajectory-tracking performance of autonomous Unmanned Aerial Vehicle (UAV) controllers. The effects of GPS parameter variation must be well understood, and user-friendly computational tools must be developed to facilitate the design and evaluation of fault tolerant control laws. This thesis presents the development of a simplified GPS error model in Matlab/Simulink and its use in a sensitivity analysis of the effect of GPS parameters, under normal and abnormal system operation, on different UAV trajectory-tracking controllers. The model statistically generates position and velocity errors, simulates the effect of GPS satellite configuration on position and velocity measurement accuracy, and applies a set of failures to the GPS readings. The model and its graphical user interface were integrated into the WVU UAV simulation environment as a masked Simulink block. The effects of the following GPS parameters on the controllers' trajectory-tracking performance were investigated both within and outside normal operating ranges: time delay, update rate, error standard deviation, bias, and major position and velocity failures. Several sets of control laws with fixed and adaptive parameters and of different levels of complexity were used in this investigation. A complex performance index formulated in terms of tracking errors and control activity was used to evaluate control-law performance. The various metrics within the performance index were combined using fixed and variable weights that depend on the local characteristics of the commanded trajectory. This study revealed that GPS error parameters have a significant impact on control-law performance.
The proposed GPS model has proved to be a valuable, flexible tool for testing and evaluation of the fault tolerant capabilities of autonomous flight control laws.

Al-Sinbol, Gahssan

289

Design and analysis of linear fault-tolerant permanent-magnet vernier machines.  

Science.gov (United States)

This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is made only of iron. Hence, the proposed machine is well suited to long-stroke applications. The key feature of this machine is that the magnetizer splits the two movers, which have modular and complementary structures. As a result, the proposed machine offers more symmetrical and sinusoidal back electromotive force waveforms and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine has favorable fault-tolerant capability, namely independent phases. In particular, unlike existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density, because it adopts neither fault-tolerant teeth nor flux barriers. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959

Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu

2014-01-01

290

A novel supervisory-based fault tolerant control: application to hydraulic process  

OpenAIRE

In this paper, we demonstrate a performance-based supervisory approach to achieving fault tolerance that does not require any explicit fault-diagnosis module. Moreover, in our real-time approach, information about the plant is unavailable. The time-valued trajectories generated by the system determine the behavior of the plant-working mode. These trajectories are supposed to follow a certain desired behavior. Therefore, the trajectories when does not belong to that desired behavior assumes t...

Jain, Tushar; Yamé, Joseph Julien; Sauter, Dominique

2011-01-01

291

Fault tolerance and reliability in integrated ship control : the ATOMOS concept  

DEFF Research Database (Denmark)

Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding reliability issues. A generic framework for simulation of network traffic under fault conditions is suggested and the first practical experiences from a prototype implementation are reported.

Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh

2002-01-01

292

Fault Tolerance Using Credentials Management in Online Transaction Application  

Directory of Open Access Journals (Sweden)

Web applications play a vital role in the IT field in satisfying web customers, who depend on online transaction processing systems. A web application has various forms that together provide a complete service to the customer, and these forms offer options intended to meet customers' needs in the face of competition among web sites in the global market. Because of the single sign-on property, traditional web pages close the current session whenever the customer selects an improper option. Selecting an option that is not suitable for the current session therefore leads to a reliability problem: if the same user needs the same service again, he has to navigate from the home page to the required page, which places an extra burden on the customer. The customer session should be maintained properly so that customer satisfaction with the online web application is retained. The existing system classifies users by their access level and their fault level. The main objective of the proposed work is to manage credentials at all levels in order to keep valuable customers in the current session for a long time. Credential management and session management are used to manage multilevel credentials from the web client to the web resource level and vice versa. The options selected by the customer can be classified by fault and type of access. Credential management also performs maintenance to fix the fault tolerance level for each web user. A complete log is recorded to trace the overall process in online transaction processing.

L. Javid Ali

2014-07-01

293

Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation  

CERN Document Server

Today's hardware technology presents a new challenge in designing robust systems. Deep-submicron VLSI technology has introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and must be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is part of an effort to close this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build a very robust, high-precision clocking system for hardware designs such as systems-on-chip in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse syn...

Dolev, Danny; Lenzen, Christoph; Schmid, Ulrich

2011-01-01

294

Microprocessor-based fault-tolerant nuclear turbine governor  

International Nuclear Information System (INIS)

A new microprocessor-based fault-tolerant nuclear turbine governor has been developed. A hierarchically distributed configuration and an asynchronous triplicated architecture with middle-value voting logic maximize plant availability. A problem-oriented language is provided for ease of design and program maintainability. The turbine governor with these features is described, together with test results.
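Middle-value voting over a triplicated architecture selects the median of the three channel outputs. A minimal sketch, assuming scalar channel outputs (for example, a valve demand); the function name is hypothetical and the governor's actual voting logic is not given in the abstract.

```python
def middle_value_select(a: float, b: float, c: float) -> float:
    """Median of three redundant channel outputs.

    Whichever channel is faulty, the median always lies between (or equals
    one of) the other two channels, so a channel that fails high or low is
    never the one selected.
    """
    return max(min(a, b), min(max(a, b), c))


if __name__ == "__main__":
    print(middle_value_select(50.1, 49.9, 100.0))  # channel 3 failed high -> 50.1
    print(middle_value_select(50.1, 0.0, 49.9))    # channel 2 failed low  -> 49.9
```

Mid-value selection needs no explicit fault detection: a channel stuck at either extreme is simply outvoted by the other two.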

295

Fault tolerant computer for nuclear power plant applications  

International Nuclear Information System (INIS)

A quadruply redundant synchronous fault-tolerant processor (FTP) is now under fabrication at the C.S. Draper Laboratory, to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by Argonne National Laboratory in Idaho Falls, Idaho. The hardware architecture of this processor is described, and certain issues unique to quadruply redundant computers are discussed.

296

Beam Dynamics Studies for the Fault Tolerance Assessment of the PDS-XADS Linac Design  

International Nuclear Information System (INIS)

To meet the high availability and reliability required by the PDS-XADS design, the accelerator needs to implement, to the maximum possible extent, a fault tolerance strategy that allows beam operation in the presence of most of the envisaged faults that could occur in its beam line components. In this work, we report the results of beam dynamics simulations performed to characterize the effects of faults in the main linac components (cavities and focusing magnets) on the beam parameters. The outcome of this activity is the definition of the possible corrective actions that could be conceived (and implemented in the system) to guarantee the fault tolerance of the accelerator. This work has been supported by the PDS-XADS program, funded by the EU 5th Framework Program under contract FIKW-CT-2001-00179.

297

Combining dynamical decoupling with fault-tolerant quantum computation  

International Nuclear Information System (INIS)

We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

298

Combining dynamical decoupling with fault-tolerant quantum computation  

CERN Document Server

We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise, and have a lower overhead cost, than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer, and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

Ng, Hui Khoon; Preskill, John

2009-01-01

299

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Directory of Open Access Journals (Sweden)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem, an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controller) neural network is used to realize on-line fault identification and weighting matrix computation. The fault accommodation subsystem uses a control algorithm based on the weighted pseudo-inverse to solve the control allocation problem. To illustrate the effectiveness of the proposed method, a simulation example under multiple uncertain abrupt faults is given.

Yongsheng Yang

2008-11-01
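The accommodation subsystem solves control allocation with a weighted pseudo-inverse. A minimal sketch for a 2-DOF demand (surge force, yaw moment) and four thrusters, assuming a hypothetical thruster geometry and assuming that a faulty thruster is represented simply by a larger weight; how the ICMAC network actually computes the weighting matrix is not described in the abstract, and thruster saturation limits are ignored here.

```python
def allocate(B, w, tau):
    """Weighted pseudo-inverse allocation for a 2-DOF demand.

    Solves min sum(w_i * u_i**2) subject to B u = tau, whose closed form is
    u = W^-1 B^T (B W^-1 B^T)^-1 tau with W = diag(w).
    B   : 2 x n thruster configuration matrix (row 0: surge force, row 1: yaw moment)
    w   : n positive weights; a larger weight discourages use of that thruster
    tau : desired [surge force, yaw moment]
    """
    n = len(w)
    winv = [1.0 / wi for wi in w]
    # M = B W^-1 B^T is a 2 x 2 matrix.
    m = [[sum(B[r][k] * winv[k] * B[c][k] for k in range(n)) for c in range(2)] for r in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) < 1e-12:
        raise ValueError("thruster layout cannot produce independent surge and yaw")
    # lam = M^-1 tau via the explicit 2 x 2 inverse.
    lam = [(m[1][1] * tau[0] - m[0][1] * tau[1]) / det,
           (-m[1][0] * tau[0] + m[0][0] * tau[1]) / det]
    return [winv[k] * (B[0][k] * lam[0] + B[1][k] * lam[1]) for k in range(n)]


if __name__ == "__main__":
    B = [[1.0, 1.0, 1.0, 1.0],      # every thruster pushes forward
         [0.5, -0.5, 0.3, -0.3]]    # hypothetical lever arms for yaw
    tau = [10.0, 0.0]
    healthy = allocate(B, [1.0, 1.0, 1.0, 1.0], tau)
    degraded = allocate(B, [100.0, 1.0, 1.0, 1.0], tau)  # thruster 0 flagged as faulty
    print([round(u, 3) for u in healthy])   # [2.5, 2.5, 2.5, 2.5]
    print([round(u, 3) for u in degraded])
```

Raising a thruster's weight shifts the demand onto the healthy thrusters while still meeting B u = tau exactly, which is the basic mechanism a weighted pseudo-inverse offers for fault accommodation.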

300

Review of fault diagnosis and fault-tolerant control for modular multilevel converter of HVDC  

DEFF Research Database (Denmark)

This review focuses on faults in Modular Multilevel Converters (MMCs) used in high voltage direct current (HVDC) systems, analyzing the vulnerable spots and failure mechanisms from device to system level and illustrating the control and protection methods under failure conditions. First, several typical topologies of MMC-HVDC systems are presented. Then fault types such as capacitor voltage unbalance and unbalance between upper and lower arm voltages are analyzed, and the corresponding fault detection and diagnosis approaches are explained. In addition, more attention is dedicated to control strategies for operation under MMC faults or grid faults. The paper ends with a discussion of other opportunities for future development.

Liu, Hui; Loh, Poh Chiang

2013-01-01

301

Fault Tolerant Control of Wind Turbines : A benchmark model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

302

Fault Tolerant Wind Farm Control : a Benchmark Model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

303

Making classical ground-state spin computing fault-tolerant.  

Science.gov (United States)

We examine a model of classical deterministic computing in which the ground state of the classical system is a spatial history of the computation. This model is relevant to quantum dot cellular automata as well as to recent universal adiabatic quantum computing constructions. In its most primitive form, systems constructed in this model cannot compute in an error-free manner when working at nonzero temperature. However, by exploiting a mapping between the partition function for this model and probabilistic classical circuits we are able to show that it is possible to make this model effectively error-free. We achieve this by using techniques in fault-tolerant classical computing and the result is that the system can compute effectively error-free if the temperature is below a critical temperature. We further link this model to computational complexity and show that a certain problem concerning finite temperature classical spin systems is complete for the complexity class Merlin-Arthur. This provides an interesting connection between the physical behavior of certain many-body spin systems and computational complexity. PMID:21230024

Crosson, I J; Bacon, D; Brown, K R

2010-09-01

304

Fault Tolerant Control Using Proportional-Integral-Derivative Controller Tuned by Genetic Algorithm  

Directory of Open Access Journals (Sweden)

Problem statement: The growing demand for reliability, maintainability and survivability in industrial processes has drawn significant research attention to fault detection and fault tolerant control. A fault is usually defined as an unexpected change in a system, such as a component malfunction or a variation in operating conditions, which tends to degrade overall system performance. The purpose of fault detection is to detect these malfunctions so that proper action can be taken to prevent faults from developing into a total system failure. Approach: In this study, an effective integrated fault detection and fault tolerant control scheme was developed for a class of LTI systems. The scheme was based on a Kalman filter for simultaneous state and fault parameter estimation, statistical decisions for fault detection, and activation of controller reconfiguration. Proportional-Integral-Derivative (PID) control schemes continue to provide the simplest yet effective solutions to most control engineering applications today. Determining, or tuning, the PID parameters remains important because these parameters have a great influence on the stability and performance of the control system. In this study, a genetic algorithm (GA) was proposed to tune the PID controller. Results: The results show that the proposed scheme improves the performance of the process in terms of time domain specifications, robustness to parametric changes and optimum stability. A comparison with the conventional Ziegler-Nichols method also shows the superiority of the GA-based system. Conclusion: This study demonstrates the effectiveness of the genetic algorithm in tuning a PID controller with optimum parameters. The GA-tuned controller is, moreover, shown to be robust to variations in plant dynamic characteristics and disturbances, assuring parameter-insensitive operation of the process.

S. Kanthalakshmi

2011-01-01
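A minimal sketch of GA-based PID tuning, covering only the tuning step (the Kalman-filter fault detection part is not shown). Everything concrete here is an assumption for illustration: a first-order plant dy/dt = -y + u integrated with forward Euler, an ITAE step-response cost, the gain bounds, and the GA operators (tournament selection, blend crossover, Gaussian mutation, elitism). The abstract does not state which plant, cost or operators the authors used.

```python
import math
import random

DT = 0.05      # integration step (s)
STEPS = 200    # 10 s of simulated time
BOUNDS = [(0.0, 10.0), (0.0, 5.0), (0.0, 0.2)]  # assumed search ranges for Kp, Ki, Kd


def itae(gains):
    """ITAE cost of the unit-step response of an assumed plant dy/dt = -y + u (Euler)."""
    kp, ki, kd = gains
    y = integral = prev_err = 0.0
    cost = 0.0
    for k in range(STEPS):
        err = 1.0 - y
        integral += err * DT
        u = kp * err + ki * integral + kd * (err - prev_err) / DT
        prev_err = err
        y += DT * (-y + u)
        if abs(y) > 1e6:
            return math.inf  # diverging response: reject this gain set
        cost += (k * DT) * abs(err) * DT
    return cost


def tune_pid(pop_size=30, generations=40, seed=0):
    """Real-coded GA: tournament selection, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted((itae(ind), ind) for ind in pop)
        history.append(scored[0][0])
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two best unchanged
        while len(next_pop) < pop_size:
            p1 = min(rng.sample(scored, 3))[1]   # tournament of size 3
            p2 = min(rng.sample(scored, 3))[1]
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [min(hi, max(lo, g + rng.gauss(0.0, 0.1 * (hi - lo))))
                     if rng.random() < 0.2 else g
                     for g, (lo, hi) in zip(child, BOUNDS)]
            next_pop.append(child)
        pop = next_pop
    best_cost, best = min((itae(ind), ind) for ind in pop)
    return best, best_cost, history


if __name__ == "__main__":
    gains, cost, history = tune_pid()
    print("Kp, Ki, Kd =", [round(g, 3) for g in gains], "ITAE =", round(cost, 4))
```

Elitism guarantees that the best cost never gets worse from one generation to the next; the fixed seed makes runs reproducible. A Ziegler-Nichols comparison like the paper's would require adding that tuning rule separately.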

305

Real-time optimal torque control of fault-tolerant permanent magnet brushless machines  

Science.gov (United States)

The paper describes issues that are pertinent to control system hardware and software design for the real-time implementation of an optimal torque control strategy for fault-tolerant permanent magnet brushless ac drives, and reports experimental results. The influence of the current control loop bandwidth and pulse width modulation on the torque ripple are investigated and quantified.

Max, L.; Wang, J.; Atallah, K.; Howe, D.

2005-05-01

306

Direct Fault Tolerant RLV Attitude Control: A Singular Perturbation Approach  

Science.gov (United States)

In this paper, we present a direct fault tolerant control (DFTC) technique, where "direct" means that no explicit fault identification is used. The technique is presented for the attitude controller (autopilot) of a reusable launch vehicle (RLV), although in principle it can be applied to many other applications. Any partial or complete failure of control actuators and effectors is inferred from saturation of one or more commanded control signals generated by the controller. Saturation reduces the effective gain, or bandwidth, of the feedback loop, which can be modeled as an increase in singular perturbation in the loop. To maintain stability, the bandwidth of the nominal (reduced-order) system is reduced proportionally, according to singular perturbation theory. The presented DFTC technique automatically handles momentary saturations and integrator windup caused by excessive disturbances, guidance commands, or dispersions under normal vehicle conditions. For multi-input, multi-output (MIMO) systems with redundant control effectors, such as the RLV attitude control system, an algorithm is presented for determining the direction of bandwidth cutback using the method of minimum-time optimal control with constrained control, in order to maintain the best performance possible with the reduced control authority. Other bandwidth cutback logic, such as one that preserves the commanded direction of the bandwidth or favors a preferred direction when the commanded direction cannot be achieved, is also discussed. In this extended abstract, a simple example is provided to demonstrate the idea. In the final paper, test results on the high-fidelity 6-DOF X-33 model with severe dispersions will be presented.

Zhu, J. J.; Lawrence, D. A.; Fisher, J.; Shtessel, Y. B.; Hodel, A. S.; Lu, P.

2002-01-01

307

Making Classical Ground State Spin Computing Fault-Tolerant  

CERN Document Server

We examine a model of classical deterministic computing in which the ground state of the classical system is a spatial history of the computation. This model is relevant to quantum dot cellular automata as well as to recent universal adiabatic quantum computing constructions. In its most primitive form, systems constructed in this model cannot compute in an error free manner when working at non-zero temperature. However, by exploiting a mapping between the partition function for this model and probabilistic classical circuits we are able to show that it is possible to make this model effectively error free. We achieve this by using techniques in fault-tolerant classical computing and the result is that the system can compute effectively error free if the temperature is below a critical temperature. We further link this model to computational complexity and show that a certain problem concerning finite temperature classical spin systems is complete for the complexity class Merlin-Arthur. This provides an inter...

Crosson, Isaac J; Brown, Kenneth R

2010-01-01

308

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

OpenAIRE

This paper presents a study on the development of a walking gait for fault tolerant locomotion in unstructured environments. The fault tolerant gait for adaptive locomotion fulfills stability conditions in the event of a fault (locked joints or sensor failure) that would otherwise prevent a robot from achieving stable locomotion over uneven terrain. To accomplish this, a fault tolerant gait based on force-position control is proposed in this paper for a hexapod robot to enable stable walking with...

Umar Asif

2012-01-01

309

Wireless Fault-Tolerant Controllers in Cascaded Industrial Workcells Using Wi-Fi and Ethernet  

Directory of Open Access Journals (Sweden)

A Wireless Networked Control System using 802.11b is used to model fault tolerance at the controller level of an industrial workcell. The fault tolerance study in this paper considers two cascaded independent workcells, where each controller must be able to handle the load of both cells if the other controller fails. Intercommunication between the cells is completely wireless, and this feature is investigated. The model incorporates unmodified 802.11b and 802.11g for communication. Sensors send sampled data to both controllers, and the controllers exchange a watchdog signal. The fault-free and faulty models are both simulated using OPNET Network Modeler. External interference on the critical intercommunication link is also investigated. Simulation results are presented based on a 95% confidence analysis, guaranteeing correct system performance.

Tarek K. Refaat

2013-11-01
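The takeover logic described above (each controller can carry both cells, and the controllers exchange a watchdog) can be sketched as a heartbeat timeout. This is a minimal, tick-based sketch, assuming ideal heartbeat delivery (no Wi-Fi loss or interference, which the paper does study in OPNET) and a hypothetical timeout of three ticks; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Controller:
    name: str
    own_cell: str
    timeout: int                  # ticks of peer silence tolerated before takeover
    alive: bool = True
    last_peer_beat: int = 0
    takeover_at: Optional[int] = None
    cells: set = field(default_factory=set)

    def __post_init__(self) -> None:
        self.cells.add(self.own_cell)

    def receive_heartbeat(self, now: int) -> None:
        self.last_peer_beat = now

    def check_peer(self, now: int, peer_cell: str) -> None:
        """Take over the peer's workcell once its watchdog has been silent too long."""
        if self.alive and self.takeover_at is None and now - self.last_peer_beat > self.timeout:
            self.cells.add(peer_cell)
            self.takeover_at = now


def simulate(ticks: int, fail_at: Optional[int] = None, timeout: int = 3):
    """Two cascaded controllers exchanging heartbeats; B crashes at tick `fail_at`."""
    a = Controller("A", "cell-1", timeout)
    b = Controller("B", "cell-2", timeout)
    for now in range(ticks):
        if now == fail_at:
            b.alive = False
        if a.alive:
            b.receive_heartbeat(now)  # A's heartbeat reaches B
        if b.alive:
            a.receive_heartbeat(now)  # B's heartbeat reaches A
        a.check_peer(now, b.own_cell)
        b.check_peer(now, a.own_cell)
    return a, b


if __name__ == "__main__":
    a, b = simulate(ticks=12, fail_at=5)
    print(sorted(a.cells), a.takeover_at)  # ['cell-1', 'cell-2'] 8
```

The timeout trades detection latency against false takeovers: on a lossy wireless link, a timeout that is too short would let a single dropped heartbeat trigger an unnecessary takeover.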

310

Lightweight storage and overlay networks for fault tolerance.  

Energy Technology Data Exchange (ETDEWEB)

The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state of the art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as the implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

Oldfield, Ron A.

2010-01-01
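Application-directed checkpoint/restart, the mechanism whose I/O cost the report targets, can be sketched as follows. This is a minimal sketch only: the in-memory `CheckpointStore` is a hypothetical stand-in for real storage or a compute-node cache, and the loop body is a placeholder for real computation.

```python
import copy


class CheckpointStore:
    """In-memory stand-in for checkpoint storage (e.g. a node-local or network cache)."""

    def __init__(self):
        self._latest = None

    def save(self, step, state):
        self._latest = (step, copy.deepcopy(state))

    def load(self):
        if self._latest is None:
            return None
        step, state = self._latest
        return step, copy.deepcopy(state)


def run(total_steps, interval, store, fail_at=None):
    """Iterative job that checkpoints every `interval` steps and resumes from the latest one."""
    restored = store.load()
    step, state = restored if restored is not None else (0, {"x": 0})
    while step < total_steps:
        if step == fail_at:
            raise RuntimeError(f"simulated node failure at step {step}")
        state["x"] += step  # stand-in for one step of real computation
        step += 1
        if step % interval == 0:
            store.save(step, state)  # application-directed checkpoint
    return state["x"]


if __name__ == "__main__":
    store = CheckpointStore()
    try:
        run(100, 10, store, fail_at=57)
    except RuntimeError as exc:
        print(exc)                   # simulated node failure at step 57
    print(store.load()[0])           # 50
    print(run(100, 10, store))       # 4950, same as a failure-free run
```

The checkpoint interval sets the trade-off: a shorter interval loses less work on failure (here steps 50 to 56 are recomputed) but issues more checkpoint I/O, which is the overhead the report aims to reduce.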

311

Modular Multilevel Converter Control Strategy with Fault Tolerance  

DEFF Research Database (Denmark)

The Modular Multilevel Converter (MMC) technology has recently emerged in VSC-HVDC applications, where it has demonstrated higher efficiency and fault tolerance than the classical 2-level topology. Because the MMC can connect to HV levels, it can also be used in transformerless STATCOMs and large wind turbines. In this paper, a control and communication strategy has been developed to tolerate module failures and capacitor voltage unbalance. A downscaled prototype converter has been built to validate and investigate the control strategy and to test the proposed communication infrastructure based on Industrial Ethernet.

Teodorescu, Remus; Eni, Emanuel-Petre

2013-01-01

312

Empirical Study of FFANN Tolerance to Weight Stuck at Max/Min Fault  

OpenAIRE

The fault tolerance of artificial neural networks has been investigated with reference to the hardware model of artificial neural networks. A weight fault is an important fault type because it disrupts the link between two nodes. In this paper, three types of weight faults are explained. Experiments have been performed to demonstrate the fault tolerance behavior of a feedforward artificial neural network under the weight-stuck-at-MAX/MIN fault. The effect of the weight-stuck-at-MAX/MIN fault on the trained network has been analyzed...

Amit Prakash Singh; Chandra Shekhar Rai; Pravin Chandra

2010-01-01
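The weight-stuck-at-MAX/MIN experiment can be sketched as a fault-injection sweep: force each weight, one at a time, to the extreme of its range and record the worst output deviation. A minimal sketch, assuming a hypothetical one-hidden-layer tanh network with hand-picked weights and an assumed weight range of [-3, 3]; the paper's network, training data and error measure are not given in the abstract.

```python
import math

W_MAX = 3.0  # assumed symmetric weight range [-W_MAX, W_MAX] of a hardware synapse


def forward(w_hidden, w_out, x):
    """One hidden layer of tanh units followed by a linear output unit."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def stuck_at_sweep(w_hidden, w_out, inputs, stuck_value):
    """Force each weight, one at a time, to `stuck_value`; return the worst |output error|."""
    def max_error(faulty_hidden, faulty_out):
        return max(abs(forward(faulty_hidden, faulty_out, x) - forward(w_hidden, w_out, x))
                   for x in inputs)

    worst = 0.0
    for i, row in enumerate(w_hidden):
        for j in range(len(row)):
            faulty = [r[:] for r in w_hidden]
            faulty[i][j] = stuck_value
            worst = max(worst, max_error(faulty, w_out))
    for k in range(len(w_out)):
        faulty_out = w_out[:]
        faulty_out[k] = stuck_value
        worst = max(worst, max_error(w_hidden, faulty_out))
    return worst


if __name__ == "__main__":
    w_hidden = [[0.5, -1.2], [1.0, 0.3]]  # hypothetical trained weights
    w_out = [0.8, -0.6]
    inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print("stuck-at-MAX worst error:", stuck_at_sweep(w_hidden, w_out, inputs, W_MAX))
    print("stuck-at-MIN worst error:", stuck_at_sweep(w_hidden, w_out, inputs, -W_MAX))
```

Because the hidden units saturate, a stuck hidden-layer weight is partly absorbed by tanh, whereas a stuck output weight scales a hidden activation directly; a sweep like this makes that difference measurable.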

313

Design of a fault-tolerant controller for the SP-100 space reactor  

International Nuclear Information System (INIS)

The control system of the SP-100 space reactor is a key element of space reactor design for meeting the space mission requirements of safety, reliability, and life expectancy. In this work, a fault-tolerant controller (FTC) is developed to control the thermoelectric (TE) power in the SP-100 space reactor. A fault-tolerant controller keeps the control system stable and retains acceptable performance even under system faults. The objectives of the proposed model predictive controller are to minimize both the difference between the predicted TE power and the desired power and the variation of the control drum angle that adjusts the control reactivity. These objectives are subject to constraints on the maximum and minimum control drum angle and the maximum drum angle variation speed. The model predictive controller incorporates a fault detection and diagnostics algorithm so that it can work properly even under input and output measurement faults. A lumped parameter simulation model of the SP-100 nuclear space reactor is used to verify the proposed controller design. Simulation results show that the TE generator power level, regulated by the proposed controller, can track the target power level effectively even under measurement faults while satisfying all control constraints. (authors)

314

Topological fault-tolerance in cluster state quantum computation  

Energy Technology Data Exchange (ETDEWEB)

We describe a fault-tolerant version of the one-way quantum computer using a cluster state in three spatial dimensions. Topologically protected quantum gates are realized by choosing appropriate boundary conditions on the cluster. We provide equivalence transformations for these boundary conditions that can be used to simplify fault-tolerant circuits and to derive circuit identities in a topological manner. The spatial dimensionality of the scheme can be reduced to two by converting one spatial axis of the cluster into time. The error threshold is 0.75% for each source in an error model with preparation, gate, storage and measurement errors. The operational overhead is poly-logarithmic in the circuit size.

Raussendorf, R [Perimeter Institute for Theoretical Physics, Waterloo, ON, M6P 1N8 (Canada); Harrington, J [Perimeter Institute for Theoretical Physics, Waterloo, ON, M6P 1N8 (Canada); Goyal, K [Institute for Quantum Information, California Institute of Technology, Pasadena, CA 91125 (United States)

2007-06-15

315

Fault Tolerant Characteristics of Artificial Neural Network Electronic Hardware  

Science.gov (United States)

The fault tolerant characteristics of analog-VLSI artificial neural network (with 32 neurons and 532 synapses) chips are studied by exposing them to high energy electrons, high energy protons, and gamma ionizing radiations under biased and unbiased conditions. The biased chips became nonfunctional after receiving a cumulative dose of less than 20 krads, while the unbiased chips only started to show degradation with a cumulative dose of over 100 krads. As the total radiation dose increased, all the components demonstrated graceful degradation. The analog sigmoidal function of the neuron became steeper (increase in gain), current leakage from the synapses progressively shifted the sigmoidal curve, and the digital memory of the synapses and the memory addressing circuits began to gradually fail. From these radiation experiments, we can learn how to modify certain designs of the neural network electronic hardware without using radiation-hardening techniques to increase its reliability and fault tolerance.

Zee, Frank

1995-01-01

316

Compilation and Synthesis for Fault-Tolerant Digital Microfluidic Biochips  

DEFF Research Database (Denmark)

Microfluidic-based biochips are replacing conventional biochemical analyzers by integrating all the functions necessary for biochemical analysis using microfluidics. Digital microfluidic biochips (DMBs) manipulate discrete nanoliter-volume amounts of fluid, called droplets, on an array of electrodes to perform operations such as dispensing, transport, mixing, splitting, dilution and detection. Researchers have proposed compilation approaches which, starting from a biochemical application and a biochip architecture, determine the allocation, resource binding, scheduling, placement and routing of the operations in the application. During the execution of a bioassay, operations can experience transient faults, which negatively impact the correctness of the application. We have proposed both offline (design time) and online (runtime) recovery strategies. The online recovery strategy decides when to introduce the redundancy required for fault tolerance. We consider both time redundancy, i.e., re-executing erroneous operations, and space redundancy, i.e., creating redundant droplets for fault tolerance. Error recovery is performed such that the number of tolerated transient faults is maximized and the timing constraints of the biochemical application are satisfied. Previous work has assumed that the biochip architecture is given, and most approaches consider a rectangular electrode array, where operations execute on rectangular “modules” formed of electrodes. However, non-regular application-specific architectures are common in practice. Hence, we have proposed an approach to synthesizing application-specific architectures such that the cost is minimized and the timing constraints of the application are satisfied. We propose an algorithm to build a library of non-regular modules for a given application-specific architecture, so that the area of a non-regular application-specific biochip can be used effectively.
During fabrication, DMBs can be affected by permanent faults, which may lead to the failure of the application. Our approach introduces redundant electrodes to synthesize fault-tolerant architectures aimed at increasing the yield of DMBs. We also propose a method to estimate, at design time, the application completion time in case of permanent faults, in order to verify whether an application can be successfully run on the architecture. The proposed approaches were evaluated using several real-life case studies and synthetic benchmarks.

Alistar, Mirela

2014-01-01

317

FAULT TOLERANT SCHEDULING STRATEGY FOR COMPUTATIONAL GRID ENVIRONMENT  

Directory of Open Access Journals (Sweden)

Computational grids have the potential to solve large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient job scheduling and reliable fault tolerance. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the Minimum Total Time to Release (MTTR) job scheduling algorithm. The Total Time to Release (TTR) includes the service time of the job, its waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the TTR by selecting a computational resource based on job requirements, job characteristics and hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. A critical aspect of automatic recovery is the availability of checkpoint files, and replication is a strategy for increasing that availability. A Replica Resource Selection Algorithm (RRSA) is proposed to provide a Checkpoint Replication Service (CRS). The Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and NWS (Network Weather Service) are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid.
It also increases the percentage of jobs completed within their specified deadlines, making the grid more trustworthy.
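A minimal sketch of the two ideas in the abstract: choose the resource with the smallest estimated TTR, and tie the checkpoint interval to the resource failure rate. The abstract does not give a checkpoint formula, so Young's first-order approximation, interval = sqrt(2 * checkpoint_cost * MTBF), is swapped in here as an assumption; the `Resource` fields and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical per-resource figures a grid monitor might supply."""
    name: str
    speed: float        # work units processed per second
    queue_wait: float   # expected waiting time in the queue, seconds
    bandwidth: float    # data transfer rate to/from the resource, MB/s
    mtbf: float         # mean time between failures, seconds


def total_time_to_release(work, data_mb, r):
    """TTR = service time + queue waiting time + input/output transfer time."""
    return work / r.speed + r.queue_wait + data_mb / r.bandwidth


def select_resource(work, data_mb, resources):
    """MTTR-style choice: the resource with the smallest estimated TTR."""
    return min(resources, key=lambda r: total_time_to_release(work, data_mb, r))


def checkpoint_interval(checkpoint_cost, mtbf):
    """Young's approximation (assumed): less reliable resources, i.e. those
    with a smaller MTBF, get more frequent checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


resources = [
    Resource("fast-but-busy", speed=100.0, queue_wait=300.0, bandwidth=10.0, mtbf=3600.0),
    Resource("slow-but-idle", speed=50.0, queue_wait=10.0, bandwidth=10.0, mtbf=7200.0),
]
best = select_resource(work=10_000.0, data_mb=500.0, resources=resources)
interval = checkpoint_interval(checkpoint_cost=30.0, mtbf=best.mtbf)
```

With these numbers the idle resource wins (TTR 260 s versus 450 s) despite its lower speed, because the queue wait dominates the faster resource's total.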

MALARVIZHI NANDAGOPAL

2010-09-01

318

Logic Synthesis for Fault-Tolerant Quantum Computers  

OpenAIRE

Efficient constructions for quantum logic are essential since quantum computation is experimentally challenging. This thesis develops quantum logic synthesis as a paradigm for reducing the resource overhead in fault-tolerant quantum computing. The model for error correction considered here is the surface code. After developing the theory behind general logic synthesis, the resource costs of magic-state distillation for the $T = \exp(i \pi (I-Z)/8)$ gate are quantitatively an...
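As a quick check of the gate definition: since I - Z = diag(0, 2) is diagonal, the matrix exponential reduces to elementwise exponentials, giving T = diag(1, e^{iπ/4}). The short sketch below computes this diagonal and confirms two standard identities, T^2 = S = diag(1, i) and T^8 = I.

```python
import cmath
import math


def t_gate_diagonal():
    """Diagonal of T = exp(i*pi*(I - Z)/8).

    I - Z = diag(0, 2) is diagonal, so the matrix exponential is just the
    elementwise exponential of the diagonal entries.
    """
    i_minus_z = (0.0, 2.0)
    return tuple(cmath.exp(1j * math.pi * d / 8) for d in i_minus_z)


def diag_power(diag, n):
    """n-th power of a diagonal matrix, represented by its diagonal."""
    return tuple(d ** n for d in diag)


t = t_gate_diagonal()          # (1, e^{i*pi/4})
s = diag_power(t, 2)           # T^2 = S = diag(1, i)
identity = diag_power(t, 8)    # T^8 = I
```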

Jones, N. Cody

2013-01-01

319

Safety in Numbers: Fault Tolerance in Robot Swarms  

OpenAIRE

The swarm intelligence literature frequently asserts that swarms exhibit high levels of robustness. That claim is, however, rather less frequently supported by empirical or theoretical analysis. But what do we mean by a 'robust' swarm? How would we measure the robustness or – to put it another way – fault-tolerance of a robotic swarm? These questions are not just of academic interest. If swarm robotics is to make the transition from the laboratory to real-world engineering implementation,...

Winfield, A. F. T.; Nembrini, Julien

2006-01-01

320

Resource optimization for fault-tolerant quantum computing  

OpenAIRE

In this thesis we examine a variety of techniques for reducing the resources required for fault-tolerant quantum computation. First, we show how to simplify universal encoded computation by using only transversal gates and standard error correction procedures, circumventing existing no-go theorems. We then show how to simplify ancilla preparation, reducing the cost of error correction by more than a factor of four. Using this optimized ancilla preparation, we develop improve...

Paetznick, Adam

2014-01-01

321

BFTDT: Byzantine Fault Tolerance tryout for Dependable Transactions in Cloud  

Directory of Open Access Journals (Sweden)

Cloud Web Services (CWS) is the technology used for business collaboration and integration among web users. Web Services Atomic Transactions (WS-AT) have been used for trusted distributed transaction processing over the web. In a distributed setting, WS-AT must cope with Byzantine faults, and Byzantine Fault Tolerance (BFT) techniques are used to overcome them. The reliable coordinator provides Coordination, Activation, Registration and Completion services, which make the transaction effective and reliable. In the trusted environment, a fair-share bandwidth allocation scheme allocates separate bandwidth to each web user to avoid resource congestion, and the transaction is processed by the Coordinator server and the Transaction Processing Monitor (TPM). Analysis of WS-AT for business applications shows a high degree of dependability, security, trust, fault tolerance and fairness of resource use in the trusted environment.
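The abstract does not specify the BFT protocol. As a hedged illustration of the core rule behind most BFT schemes, the sketch assumes n coordinator replicas, of which at most f = floor((n - 1) / 3) may be Byzantine, and accepts a transaction outcome only when a quorum large enough that any two quorums share at least one correct replica reports the same value. This is only the voting step, not a full agreement protocol; `decide` and the vote labels are hypothetical.

```python
from collections import Counter


def max_byzantine(n):
    """Largest f such that n >= 3f + 1 replicas tolerate f Byzantine ones."""
    return (n - 1) // 3


def byzantine_quorum(n):
    """Smallest quorum q with 2q - n >= f + 1, so any two quorums overlap
    in at least one correct replica."""
    f = max_byzantine(n)
    return (n + f) // 2 + 1


def decide(votes, n):
    """Accept an outcome (e.g. 'commit' or 'abort') reported by the replicas
    only if a Byzantine quorum of the n replicas agrees on it."""
    if not votes:
        return None
    outcome, count = Counter(votes).most_common(1)[0]
    return outcome if count >= byzantine_quorum(n) else None
```

With n = 4 replicas, one faulty coordinator cannot force a decision on its own, and a 2-2 split yields no decision rather than a possibly wrong one.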

Gayathri S

2012-11-01

322

Bayesian reliability assessment of legacy safety-critical systems upgraded with fault-tolerant off-the-shelf software  

International Nuclear Information System (INIS)

This paper presents a new way of applying Bayesian assessment to systems that consist of many components. Full Bayesian inference for such systems is problematic because it is computationally hard and, far more seriously, because one needs to specify a multivariate prior distribution with many counterintuitive dependencies between the probabilities of component failures. The approach taken here is one of decomposition. The system is decomposed into partial views of the system, or parts thereof, with different degrees of detail, and a mechanism for propagating the knowledge obtained with the more refined views back to the coarser views is then applied (recalibration of coarse models). The paper describes the recalibration technique and then evaluates the accuracy of recalibrated models numerically on contrived examples using two techniques developed by others for software reliability growth models: the u-plot and prequential likelihood. The results indicate that the recalibrated predictions are often more accurate than the predictions obtained with the less detailed models, although this is not guaranteed. The techniques used to assess the accuracy of the predictions are accurate enough for one to be able to choose the model giving the most accurate prediction.
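For readers unfamiliar with the two accuracy measures, here is a minimal sketch of how they are usually computed from one-step-ahead predictions. It assumes, purely for illustration, that each prediction is an exponential distribution with a given rate. The u-plot distance is the Kolmogorov distance between the sorted values u_i = F_i(t_i) and the uniform distribution, and the prequential likelihood is the product of the predictive densities at the observed times, summed here in log form.

```python
import math


def u_plot_distance(u_values):
    """Kolmogorov distance between the empirical CDF of the u_i and the
    uniform CDF on [0, 1]; smaller means better-calibrated predictions."""
    us = sorted(u_values)
    n = len(us)
    return max(max(u - i / n, (i + 1) / n - u) for i, u in enumerate(us))


def exp_cdf(t, rate):
    return 1.0 - math.exp(-rate * t)


def exp_pdf(t, rate):
    return rate * math.exp(-rate * t)


def u_values(times, rates):
    """u_i = F_i(t_i): each (assumed exponential) predictive CDF evaluated
    at the failure time that was actually observed next."""
    return [exp_cdf(t, r) for t, r in zip(times, rates)]


def log_prequential_likelihood(times, rates):
    """Sum of log predictive densities at the observed times. The difference
    of two models' sums is the log prequential likelihood ratio."""
    return sum(math.log(exp_pdf(t, r)) for t, r in zip(times, rates))
```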

323

Faster quantum chemistry simulation on fault-tolerant quantum computers  

International Nuclear Information System (INIS)

Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource. To improve execution time, we examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay–Kitaev algorithm (Dawson and Nielsen 2006 Quantum Inform. Comput. 6 81). For a given approximation error $\epsilon$, arbitrary single-qubit gates can be produced fault-tolerantly, using a restricted set of gates, in time $O(\log(1/\epsilon))$ or $O(\log\log(1/\epsilon))$; with sufficient parallel preparation of ancillas, constant average depth is possible using a method we call programmable ancilla rotations. Moreover, we construct and analyze efficient implementations of first- and second-quantized simulation algorithms using the fault-tolerant arbitrary gates and other techniques, such as implementing various subroutines in constant time. A specific example we analyze is the ground-state energy calculation for lithium hydride. (paper)

324

Unconstrained and Constrained Fault-Tolerant Resource Allocation  

CERN Document Server

First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objective is to allocate facilities from sites to satisfy $\mathcal{R}$ at a minimum total cost. Next, we introduce the Constrained Fault-Tolerant Resource Allocation (CFTRA) problem. It differs from UFTRA in that the number of resources available at each site $i$ is limited by $R_{i}$. Both problems are practical extensions of the classical Fault-Tolerant Facility Location (FTFL) problem \cite{Jain00FTFL}. For instance, their solutions provide optimal resource allocation (w.r.t. enterprises) and leasing (w.r.t. clients) strategies for the cont...
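To make the two problem statements concrete, the sketch below is an exhaustive exact solver for tiny instances. It is exponential and purely illustrative, not the paper's algorithm. It relies on one observation that follows from the definitions: if client j uses x_ij distinct facilities at site i, then site i must open max_j x_ij facilities, and those facilities can be shared across clients. CFTRA simply adds the per-site cap R_i. The function and parameter names are hypothetical.

```python
from itertools import product


def compositions(total, parts):
    """All ways to write `total` as an ordered sum of `parts` non-negative ints."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def solve_ftra(open_cost, conn_cost, req, capacity=None):
    """Brute-force exact solver for tiny UFTRA/CFTRA instances.

    open_cost[i]: cost of opening one facility at site i
    conn_cost[i][j]: cost of connecting client j to one facility at site i
    req[j]: number of distinct facilities client j must connect to
    capacity[i]: facility limit R_i per site (CFTRA); None means UFTRA.
    Returns (total cost, facilities opened per site).
    """
    sites, clients = len(open_cost), len(req)
    per_client = [list(compositions(req[j], sites)) for j in range(clients)]
    best = (float("inf"), None)
    for choice in product(*per_client):
        # choice[j][i] = number of facilities client j uses at site i; the
        # site must open as many facilities as its most demanding client.
        opened = [max(choice[j][i] for j in range(clients)) for i in range(sites)]
        if capacity is not None and any(o > c for o, c in zip(opened, capacity)):
            continue
        cost = sum(f * o for f, o in zip(open_cost, opened))
        cost += sum(conn_cost[i][j] * choice[j][i]
                    for i in range(sites) for j in range(clients))
        if cost < best[0]:
            best = (cost, opened)
    return best
```

For example, with two sites costing 3 and 1 per facility, one client requiring two facilities and connection costs 1 and 5, UFTRA opens two facilities at the first site (total cost 8); capping that site at one facility (CFTRA) forces a split across both sites (total cost 10).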

Liao, Kewen

2011-01-01

325

A Fault Tolerance Management Framework for Wireless Sensor Networks  

Directory of Open Access Journals (Sweden)


Wireless Sensor Networks (WSNs) have the potential to significantly enhance our ability to monitor and interact with our physical environment. Realizing fault-tolerant operation is critical to the success of WSNs. The main challenge is providing fault tolerance (FT) while conserving the limited resources of the network. Many schemes have been proposed in this area. Our main contribution in this paper is a general framework for fault tolerance in WSNs. The proposed framework can be used to guide the design and development of FT solutions and to evaluate existing ones. We present a comparative study of the existing schemes and identify potential enhancements. A primary module of the framework is the learning and refinement module, which enables an FT solution to be adaptive and self-configurable based on changes in network conditions. We view this as vital to resource-constrained and highly dynamic WSNs. To our knowledge, we are the first to propose implementing such a module in FT solutions for WSNs.

Hesham El-Sayed

2007-06-01

326

A Dynamic Slack Management Technique for Real-Time Distributed Embedded System with Enhanced Fault Tolerance and Resource Constraints  

OpenAIRE

This project work aims to develop a dynamic slack management technique for real-time distributed embedded systems that reduces total energy consumption while meeting timing, precedence and resource constraints. The proposed Slack Distribution Technique uses a modified Feedback Control Scheduling (FCS) algorithm. This algorithm schedules dependent tasks effectively under precedence and resource constraints. It further minimizes the schedule length and utilizes the available slack to inc...

Santhi Baskaran; Gugan, I.; Aswin Kumar, A.; Govindarajan, D.

2011-01-01

327

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Directory of Open Access Journals (Sweden)

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit, an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controller) neural network information fusion model is used to identify thruster faults. The fault accommodation unit is based on direct calculation of moments, and the result of fault identification is used to solve the control allocation problem. The approach addresses the identification of continuous faults in the underwater vehicle. Experimental results illustrate the performance of the proposed method in uncertain, continuous fault situations.
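The abstract describes fault accommodation through control allocation but not the allocation law itself. As an assumption, the sketch uses the common minimum-norm pseudo-inverse allocation: each thruster's column in a hypothetical 2-DOF configuration matrix (surge force, yaw moment) is scaled by an effectiveness factor that a fault identification unit might estimate, so a fully failed thruster (factor 0) automatically receives no command. Thruster saturation is ignored.

```python
def allocate(B, effectiveness, tau):
    """Minimum-norm thrust allocation u = B_f^T (B_f B_f^T)^{-1} tau.

    B: 2 x m thruster configuration matrix (rows: surge force, yaw moment)
    effectiveness: per-thruster factor in [0, 1] (1 = healthy, 0 = failed)
    tau: desired (surge force, yaw moment)
    """
    m = len(effectiveness)
    # Faulty configuration matrix: scale each thruster's column.
    Bf = [[B[row][i] * effectiveness[i] for i in range(m)] for row in range(2)]
    # M = Bf Bf^T is symmetric 2 x 2: [[a, b], [b, d]].
    a = sum(x * x for x in Bf[0])
    b = sum(x * y for x, y in zip(Bf[0], Bf[1]))
    d = sum(y * y for y in Bf[1])
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("remaining thrusters cannot produce both DOFs")
    # w = M^{-1} tau using the explicit 2 x 2 inverse.
    w0 = (d * tau[0] - b * tau[1]) / det
    w1 = (-b * tau[0] + a * tau[1]) / det
    # u = Bf^T w
    return [Bf[0][i] * w0 + Bf[1][i] * w1 for i in range(m)]


def produced(B, effectiveness, u):
    """Generalized force actually produced by commands u on the faulty vehicle."""
    return [sum(B[row][i] * effectiveness[i] * u[i] for i in range(len(u)))
            for row in range(2)]
```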

Qian Liu

2010-02-01

328

Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors  

Science.gov (United States)

Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.
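The paper's two conversion methods are not detailed in the abstract. A minimal, hypothetical illustration of recovering joint position from accelerometers through kinematics rather than integration is the static single-link case: an accelerometer on a link rotating in a vertical plane measures gravity resolved into the link's axes, and the link angle follows from atan2. This assumes slow motion (dynamic accelerations negligible) and cannot observe rotation about a vertical axis.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def accel_reading(theta, g=G):
    """Static specific force seen by a link-mounted accelerometer whose x axis
    lies along a link at angle theta above horizontal (y axis perpendicular)."""
    return g * math.sin(theta), g * math.cos(theta)


def joint_angle_from_accel(ax, ay):
    """Recover the link angle from the measured gravity direction.

    Valid only when the link is static or slow, so gravity dominates the
    reading; no integration of acceleration is involved.
    """
    return math.atan2(ax, ay)
```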

Aldridge, Hal A.; Juang, Jer-Nan

1997-01-01

329

Empirical Study of FFANN Tolerance to Weight Stuck at Max/Min Fault  

Directory of Open Access Journals (Sweden)

The fault tolerance of artificial neural networks has been investigated with reference to their hardware model. A weight is the link between two nodes, and a weight fault can break that link. In this paper, three types of weight faults are explained. Experiments have been performed to demonstrate the fault tolerance behavior of a feedforward artificial neural network under the weight-stuck-MAX/MIN fault, and the effect of this fault on a trained network is analyzed. The obtained results suggest that networks are not fault tolerant to this type of fault.
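A small fault-injection sketch in the spirit of this experiment: a tiny sigmoid feedforward network with fixed example weights, in which each weight in turn is forced to a hypothetical hardware limit (W_MAX or W_MIN) and the worst output change is recorded. The same routine covers a weight stuck at zero. Biases, weight values and limits are illustrative assumptions, not the paper's setup.

```python
import copy
import math

W_MAX, W_MIN = 10.0, -10.0  # hypothetical hardware weight limits


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, x):
    """Feedforward pass; weights[l][r][c] links input c to neuron r of layer l.
    Biases are omitted to keep the sketch short."""
    a = list(x)
    for layer in weights:
        a = [sigmoid(sum(w * v for w, v in zip(row, a))) for row in layer]
    return a


def inject_stuck_at(weights, layer, row, col, value):
    """Return a copy of the network with one weight stuck at `value`."""
    faulty = copy.deepcopy(weights)
    faulty[layer][row][col] = value
    return faulty


def worst_case_deviation(weights, inputs, value):
    """Largest output change over all single stuck-at-`value` weight faults."""
    worst = 0.0
    for l, layer in enumerate(weights):
        for r, row in enumerate(layer):
            for c in range(len(row)):
                faulty = inject_stuck_at(weights, l, r, c, value)
                for x in inputs:
                    ok, bad = forward(weights, x), forward(faulty, x)
                    worst = max(worst, max(abs(p - q) for p, q in zip(ok, bad)))
    return worst


net = [[[0.5, -0.3], [0.8, 0.1]], [[1.2, -0.7]]]  # example 2-2-1 network
samples = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
dev = {name: worst_case_deviation(net, samples, v)
       for name, v in [("MAX", W_MAX), ("MIN", W_MIN), ("ZERO", 0.0)]}
```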

Amit Prakash Singh

2010-04-01

330

Task-based Dynamic Fault Tolerance for Humanoid Robot Applications and Its Hardware Implementation  

Directory of Open Access Journals (Sweden)

This paper presents a new fault tolerance scheme suitable for humanoid robot applications. In the future, various tasks ranging from daily chores to safety-related tasks will be carried out by individual humanoid robots. If the importance of the tasks is different, the required dependability will vary accordingly. Therefore, for mobile humanoid robots operating under power constraints, fault tolerance that dynamically changes based on the importance of the tasks is desirable because fault-tolerant designs involving hardware redundancy are power intensive. In the proposed fault tolerance scheme, a duplex computer system switches between hot standby and cold standby according to each individual task. However, in mobile humanoid robots, a safety issue arises when cold standby is used for the standby computer unit. Since an unpowered unit cannot immediately start to operate, a biped-walking robot falls down when failover occurs during cold standby. This paper proposes a safety failover method to resolve this issue and describes the hardware design of the safety failover subsystem.

Masayuki Murakami

2008-08-01

331

Reliable multicast fault tolerant MPI in the Grid environment  

CERN Document Server

Grid environments have recently been developed with low stretch and overheads that increase with the logarithm of the number of nodes in the system. Getting and sending data to and from a large number of nodes is gaining importance due to the increasing number of independent data providers and the heterogeneity of the network/Grid. One of the key challenges is to achieve a balance between low bandwidth consumption and good reliability. In this paper, we present an implementation of a reliable multicast protocol over a fault-tolerant MPI, MPICHV2. It provides one way to solve the problem of transferring large chunks of data between applications running on a grid with limited network links. We first show that, by using multicast with data compression in a cluster, we can achieve performance similar to that of the MPICH-P4 implementation. Next, we provide a theoretical cluster organization and Grid network architecture to harness the performance provided by multicast. Finally, we present the conclusion and future work...

Hudzia, Benoit; Petiton, Serge

2006-01-01

332

Empirical Study of FFANNs Tolerance to Weight Stuck at Zero Fault  

Directory of Open Access Journals (Sweden)

The fault tolerance of artificial neural networks has been investigated with reference to their hardware model. A weight is the link between two nodes, and a weight fault can break that link. In this paper, weight faults are explained. Experiments have been performed for the weight-stuck-0 fault, and the effect of this fault on a trained network is analyzed. The obtained results suggest that networks are not fault tolerant to this type of fault.

Chandra Sekhar Rai

2010-04-01

333

Fault-tolerance techniques for high-speed fiber-optic networks  

Science.gov (United States)

Four fiber-optic network topologies (linear bus, ring, central star, and distributed star) are discussed with respect to their application to high-data-throughput, fault-tolerant networks. The topologies are also examined in terms of redundancy and the need to provide single-point-failure-free (or better) system operation. Linear bus topology, although traditionally the method of choice for wired systems, presents implementation problems when larger fiber-optic systems are considered. Ring topology works well for high-speed systems when coupled with a token-passing protocol, but it requires a significant increase in protocol complexity to manage system reconfiguration after ring and node failures. Star topologies offer natural fault tolerance, without added protocol complexity, while still providing high data throughput.

Deruiter, John

1991-01-01

334

Fault-Tolerant, Radiation-Hard DSP  

Science.gov (United States)

Commercial digital signal processors (DSPs) for use in high-speed satellite computers are challenged by the damaging effects of space radiation, mainly single event upsets (SEUs) and single event functional interrupts (SEFIs). Innovations have been developed for mitigating the effects of SEUs and SEFIs, enabling the use of very-high-speed commercial DSPs with improved SEU tolerance. Time-triple modular redundancy (TTMR) is a method of applying traditional triple modular redundancy on a single processor, exploiting the VLIW (very long instruction word) class of parallel processors. TTMR substantially reduces SEU-induced error rates. SEFIs are solved by a SEFI-hardened core circuit, external to the microprocessor. It monitors the health of the processor and, if a SEFI occurs, forces the processor back to normal operation through a series of escalating events. TTMR and hardened-core solutions were developed for both DSPs and reconfigurable field-programmable gate arrays (FPGAs). This includes advancement of TTMR algorithms for DSPs and reconfigurable FPGAs, plus a rad-hard, hardened-core integrated circuit that services both the DSP and FPGA. Additionally, a combined DSP and FPGA board architecture was fully developed into a rad-hard engineering product. This technology enables use of commercial off-the-shelf (COTS) DSPs in computers for satellite and other space applications, allowing rapid deployment at a much lower cost. Traditional rad-hard space computers are very expensive and typically have long lead times. These computers are based either on traditional rad-hard processors, which have extremely low computational performance, or on triple modular redundant (TMR) FPGA arrays, which suffer from power and complexity issues. Even more frustrating, TMR arrays of FPGAs require a fixed, external rad-hard voting element, causing them to lose much of their reconfiguration capability and, in some cases, to suffer a significant speed reduction.
The benefits of COTS high-performance signal processing include a significant increase in onboard science data processing, enabling orders-of-magnitude reduction in the communication bandwidth required for science data return, orders-of-magnitude improvement in onboard mission planning and critical decision making, and the ability to respond rapidly to changing mission environments, thus enabling opportunistic science and an orders-of-magnitude reduction in the cost of mission operations through reduced staffing. Additional benefits of COTS-based, high-performance signal processing include the ability to leverage considerable commercial and academic investments in advanced computing tools, techniques, and infrastructure, and the familiarity of the science and IT communities with these computing environments.
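The core of TMR-style protection is a 2-of-3 majority vote. The sketch below runs the same computation three times on one processor and votes bitwise on the results, with a hypothetical hook that flips one bit in one run to mimic an SEU. This is sequential time redundancy only; real TTMR interleaves the three copies across VLIW execution slots, which this sketch does not model.

```python
def majority3(a, b, c):
    """Bitwise 2-of-3 majority: each output bit takes the value held by
    at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def ttmr(compute, *args, upset=None):
    """Execute `compute` three times and vote on the integer results.

    upset: optional hypothetical fault-injection hook (run_index, result) ->
    result, used only to simulate a single-event upset in one run.
    """
    results = []
    for run in range(3):
        result = compute(*args)
        if upset is not None:
            result = upset(run, result)
        results.append(result)
    return majority3(*results)


def checksum(data):
    """Example workload: 16-bit sum of a list of integers."""
    return sum(data) & 0xFFFF


def flip_bit5_in_run1(run, result):
    """Simulated SEU: flip bit 5 of the second run's result only."""
    return result ^ (1 << 5) if run == 1 else result
```

A single corrupted copy is outvoted; two copies corrupted in the same bit would not be, which is why the voter itself and the timing of the copies matter in real designs.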

Czajkowski, David

2011-01-01

335

Fault-Tolerant Vision for Vehicle Guidance in Agriculture  

DEFF Research Database (Denmark)

The emergence of widely available vision technologies is enabling a wide range of automation tasks in industry and other areas. Agricultural vehicle guidance systems have benefited from advances in 3D vision based on stereo camera technology. Automatically guiding vehicles along crops and other field structures can reduce the operator's stress levels. High-precision steering in sensitive crops can also be maintained for longer periods, as the driver is less tired. Safety and availability must be inherent in such systems for them to gain widespread market acceptance. To tolerate dropout of 3D vision, faults in classification, or other defects, redundant information should be utilized. Such information can be used to diagnose faulty behavior and to temporarily continue operation with a reduced set of sensors when faults or artifacts occur. Additional sensors include GPS receivers and inertial sensors. To fully utilize the possibilities of 3D vision, the system must also be able to learn and adapt to changing environments. By learning features of the environment, new diagnostic relations can be generated by creating redundant feed-forward information about crop location. Also, by mapping the field seen by the stereo camera, it is possible to support the guidance system by storing salient information about the environment. By tracking the motion of the vehicle, vision output can be fused over time to create more reliable and robust estimates of crop location. This thesis approaches these challenges by considering systematic design methods using graph-based analysis. It is demonstrated how diagnostic relations can be derived and remedial actions taken to maintain safety and the healthy functioning of vision systems. The combination of redundant information from 3D vision, mapping, and aiding sensors such as GPS provides means to detect and isolate single faults in the system.
In addition, learning is employed to adapt the system to variational changes in the natural environment. 3D vision is enhanced by learning texture and color information. Intensity gradients on small neighborhoods of pixels are shown to model texture information better than other methods. Stochastic automata using optimally quantized data are demonstrated to be a strong approach for offline learning. It is considered how 3D vision provides labeling of training data that can subsequently be fed into a learning system. Statistical change detection theory is shown to be a suitable approach for detecting artifacts in the learning process so that safe operation can be maintained. The system can perform real-time classification using a fast online approach that is superior to the state of the art. Advances in tracking vehicle motion using 3D vision are demonstrated to allow unprecedentedly accurate maps of the local environment to be created. Features in the environment are extracted and tracked using novel feature detectors that approximate the Laplacian operator with a bi-level octagonal kernel. These features are shown to display high levels of accuracy and stability while being considerably faster than similar feature detectors. Artifacts in 3D vision range measurements are demonstrated to be detectable by using the generated 3D maps and a probabilistic approach to fusing and comparing range measurements.
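Statistical change detection is often done with a cumulative sum (CUSUM) test; the thesis does not say which detector it uses, so CUSUM is an assumption here. The one-sided sketch below accumulates deviations of a monitored quantity (for example, a hypothetical per-frame classification error score) above its nominal mean plus a drift allowance, and raises an alarm once the sum crosses a threshold.

```python
def cusum(samples, target_mean, drift, threshold):
    """One-sided (upper) CUSUM change detector.

    Accumulates deviations above target_mean + drift, resetting at zero,
    and returns the first index where the statistic exceeds threshold,
    or None if no change is detected.
    """
    s = 0.0
    for k, x in enumerate(samples):
        s = max(0.0, s + (x - target_mean - drift))
        if s > threshold:
            return k
    return None
```

The drift term keeps small fluctuations around the nominal mean from accumulating, and the threshold trades detection delay against the false-alarm rate.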

Blas, Morten Rufus

2010-01-01

336

Sensor and sensorless fault tolerant control for induction motors using a wavelet index.  

Science.gov (United States)

Fault Tolerant Control (FTC) systems are crucial in industry to ensure safe and reliable operation, especially of motor drives. This paper proposes the use of multiple controllers for a FTC system of an induction motor drive, selected based on a switching mechanism. The system switches between sensor vector control, sensorless vector control, closed-loop voltage by frequency (V/f) control and open loop V/f control. Vector control offers high performance, while V/f is a simple, low cost strategy with high speed and satisfactory performance. The faults dealt with are speed sensor failures, stator winding open circuits, shorts and minimum voltage faults. In the event of compound faults, a protection unit halts motor operation. The faults are detected using a wavelet index. For the sensorless vector control, a novel Boosted Model Reference Adaptive System (BMRAS) to estimate the motor speed is presented, which reduces tuning time. Both simulation results and experimental results with an induction motor drive show the scheme to be a fast and effective one for fault detection, while the control methods transition smoothly and ensure the effectiveness of the FTC system. The system is also shown to be flexible, reverting rapidly back to the dominant controller if the motor returns to a healthy state. PMID:22666016
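The abstract does not define its wavelet index. As a hedged illustration only, the sketch uses one level of the orthonormal Haar transform and takes the share of signal energy in the detail (high-frequency) band as an index; a rise above a hypothetical threshold flags a possible fault that would trigger the controller switch. The test signals (a clean sinusoid and one with an added high-frequency component) are synthetic.

```python
import math


def haar_level1(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_index(x):
    """Share of signal energy in the level-1 detail band (hypothetical index).
    The orthonormal transform preserves energy, so this lies in [0, 1]."""
    _, detail = haar_level1(x)
    total = sum(v * v for v in x)
    return sum(d * d for d in detail) / total if total > 0 else 0.0


def fault_detected(current_samples, threshold=0.05):
    """Flag a possible fault when the index exceeds a hypothetical threshold."""
    return wavelet_index(current_samples) > threshold
```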

Gaeid, Khalaf Salloum; Ping, Hew Wooi; Khalid, Mustafa; Masaoud, Ammar

2012-01-01

337

Fault tolerant techniques for spacecraft data recorders  

Science.gov (United States)

This paper presents the techniques for improving system reliability that SEAKR Engineering employs in the design of its spacecraft solid-state data recorders. Briefly, these techniques include Hamming code error correction, periodic memory scrubbing, latch-up protection, excess capacity, redundant power supplies/control/bus circuits, microcode protection, and shielding.
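Hamming codes and scrubbing work together: the code corrects a single flipped bit per word, and periodic scrubbing rewrites corrected words before a second upset can hit the same word. The sketch uses the textbook Hamming(7,4) code for brevity; flight recorders typically use wider SEC-DED codes, and nothing here reflects SEAKR's actual design.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as 7 bits p1 p2 d1 p3 d2 d3 d4
    (codeword positions 1..7; parity bits sit at positions 1, 2 and 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def scrub(memory):
    """One scrubbing pass: decode and re-encode every stored word so that
    single-bit upsets are repaired before they can accumulate."""
    return [hamming74_encode(hamming74_decode(word)) for word in memory]
```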

Anderson, Scott R.

338

Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework  

OpenAIRE

MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data, and it has been widely applied in data-intensive systems. In cloud environments, node and task failures are no longer accidental but a common feature of large-scale systems. In the MapReduce framework, although the rescheduling-based fault-tolerant method is simple to implement, it fails to fully consider the location of distributed data and the computation and storage overhead. Thus, a...

Yang Liu; Wei Wei; Yuhong Zhang

2013-01-01

339

Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing  

OpenAIRE

The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on its data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-complete problem, and it becomes more challenging when task failures are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In thi...

Jayadivya S K; Jaya Nirmala S; Mary Saira Bhanu S

2012-01-01

340

Byzantine Fault Tolerance of Regenerating Codes  

CERN Document Server

Recent years have witnessed a slew of coding techniques custom-designed for networked storage systems. Network-coding-inspired regenerating codes are the most prolifically studied among these new-age, storage-centric codes. A lot of effort has been invested in understanding the fundamental achievable trade-offs between storage and bandwidth usage to maintain redundancy in the presence of different models of failures, showcasing the efficacy of regenerating codes with respect to traditional erasure coding techniques. For practical usability in open and adversarial environments, as is typical in peer-to-peer systems, we need, however, resilience not only against erasures but also against (adversarial) errors. In this paper, we study the resilience of generalized regenerating codes (supporting multi-repairs, using collaboration among newcomers) in the presence of two classes of Byzantine nodes: relatively benign selfish (non-cooperating) nodes, and more active, malicious polluting nodes. We give upper bounds on t...

Oggier, Frédérique

2011-01-01

341

FPGA fault tolerance in particle physics experiments  

International Nuclear Information System (INIS)

The behavior of matter under physically extreme conditions is the focus of many high-energy-physics experiments. For this purpose, high-energy charged particles (ions) are collided with each other, creating energy or baryon densities similar to those at the beginning of the universe or to those found in the center of neutron stars. In both cases a plasma of quarks and gluons (QGP) is present, which decays into hadrons within a short period of time. In this process, particles are formed that, when captured by large detectors, allow statements about the beginning of the universe, but that also cause a massive occurrence of hardware failures within the detector's electronic devices. This contribution is about methods to mitigate the radiation susceptibility of Field Programmable Gate Arrays (FPGAs), enabling them to be used within particle detector systems, either to gain valid data directly in the readout chain or as part of the detector control system.

342

Fault tolerances using toroidal zone plate encryption  

Science.gov (United States)

We present an analysis of the sensitivity of a zone plate used in an encryption method to misalignment, changes in the illuminating wavelength, or alterations of its construction parameters. The security system is based on a computer-generated toroidal phase mask. We use a processor in which this phase mask is placed in the Fourier plane of the object to be encrypted. Because the original data are recovered using the conjugate of the encryption mask, how closely the original conditions and mask characteristics are matched influences not only the quality of the decrypted image but also the success of the method. Our results show that the scheme is an optimized design for both data storage and encryption, as it exhibits less degenerate noise while keeping the security standards of other methods.

Barrera, John Fredy; Henao, Rodrigo; Torroba, Roberto

2005-12-01

343

Bulk fault-tolerant quantum information processing with boundary addressability  

International Nuclear Information System (INIS)

We present a fault-tolerant (FT) semi-global control strategy for universal quantum computers. We show that an N-dimensional array of qubits where only (N-1)-dimensional addressing resolution is available is compatible with FT universal quantum computation. What is more, we show that measurements and individual control of qubits are required only at the boundaries of the FT computer. Our model alleviates the heavy physical conditions on current qubit candidates imposed by addressability requirements and represents an option for improving their scalability.

344

Data center networks topologies, architectures and fault-tolerance characteristics  

CERN Document Server

This SpringerBrief presents a survey of data center network designs and topologies and compares several properties in order to highlight their advantages and disadvantages. The brief also explores several routing protocols designed for these topologies and compares the basic algorithms to establish connections, the techniques used to gain better performance, and the mechanisms for fault-tolerance. Readers will be equipped to understand how current research on data center networks enables the design of future architectures that can improve performance and dependability of data centers. This con

Liu, Yang; Veeraraghavan, Malathi; Lin, Dong; Hamdi, Mounir

2013-01-01

345

Reliability analysis of fault-tolerant reconfigurable nano-architectures  

Energy Technology Data Exchange (ETDEWEB)

Manufacturing defects and transient errors will be abundant in high-density reconfigurable nano-scale designs. Recently, we have automated a computational scheme based on Markov Random Field (MRF) and Belief Propagation algorithms in a tool named NANOLAB to evaluate the reliability of nano architectures. In this paper, we show how our methodology can be exploited to design defect- and fault-tolerant programmable logic architectures. The effectiveness of such automation is illustrated by analyzing reconfigurable Boolean networks formed using different industry-based configurable logic blocks (CLBs), both in the presence of thermal perturbations and signal noise.

Bhaduri, D. (Debayan); Graham, P. S. (Paul S.); Shukla, S. K. (Sandeep K.)

2004-01-01

346

2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.  

Energy Technology Data Exchange (ETDEWEB)

This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that causes an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years, but much like other issues, the community has managed to put off addressing it during this period.
There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.

Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

2009-02-01
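The workshop report identifies checkpoint/restart, i.e. redundancy in time, as the most common recovery tool. A minimal in-process sketch of the idea follows; the toy computation, the in-memory `checkpoint` copy (standing in for a write to disk) and the simulated `HypotheticalNodeCrash` fault are all illustrative assumptions, not part of the report.

```python
import copy


class HypotheticalNodeCrash(Exception):
    """Stands in for a fault that crashes or halts the application."""


def run_with_checkpoints(n_steps: int, interval: int, crash_at=frozenset()):
    """Run a toy iterative job, checkpointing every `interval` completed steps.

    `crash_at` lists step numbers at which a simulated fault strikes (each fires once).
    Returns (final_state, steps_recomputed).
    """
    state = {"step": 0, "acc": 0}
    checkpoint = copy.deepcopy(state)        # stands in for writing a checkpoint to disk
    pending = set(crash_at)
    recomputed = 0
    while state["step"] < n_steps:
        try:
            s = state["step"]
            if s in pending:
                pending.discard(s)
                raise HypotheticalNodeCrash(s)
            state["acc"] += s                # one step of "useful work"
            state["step"] = s + 1
            if state["step"] % interval == 0:
                checkpoint = copy.deepcopy(state)
        except HypotheticalNodeCrash:
            # Work done since the last checkpoint is lost and must be redone.
            recomputed += state["step"] - checkpoint["step"]
            state = copy.deepcopy(checkpoint)  # restart from the most recent checkpoint
    return state, recomputed


if __name__ == "__main__":
    print(run_with_checkpoints(100, 10, crash_at={25, 57}))
    # ({'step': 100, 'acc': 4950}, 12)
```

The count of recomputed steps is the work lost between the last checkpoint and each fault; shortening `interval` reduces that loss but increases checkpoint overhead, which is the I/O-bandwidth tension the report describes.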

347

Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework  

Directory of Open Access Journals (Sweden)

MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data, which has been widely applied in data-intensive systems. In cloud environments, node and task failures are no longer accidental but a common feature of large-scale systems. In the MapReduce framework, although the rescheduling-based fault-tolerant method is simple to implement, it fails to fully consider the location of distributed data and the computation and storage overhead. Thus, a single node failure will increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT) is proposed, which takes both task and node failures into consideration. Preliminary experiments show that, with less storage and network overhead, CROFT significantly reduces the completion time when failures occur, and the overall performance of MapReduce can be improved by at least 30% over the original mechanism in Hadoop.

Yang Liu

2013-09-01

348

Design and Bandwidth Analysis of Fault-Tolerant Multistage Interconnection Networks  

OpenAIRE

The design of a suitable interconnection network for inter-processor communication is one of the key issues of the system performance. In this study a new irregular interconnection network IABN (Irregular Augmented Baseline) has been proposed. IABN is designed by modifying existing ABN (Augmented Baseline Network). ABN is a regular multi-path network with limited fault tolerance. IABN provides three times more paths between any pair of source-destination in comparison to ABN. The ABN and IABN...

Aggarwal, R.; Kaur, L.

2008-01-01

349

High Performance and Fault Tolerance Double Precision Floating Point Arithmetic Units  

OpenAIRE

Floating-point arithmetic units involve complex algorithms, and many scientific problems require floating-point units with high accuracy. Hence, for increased performance and fault-tolerant operation, double-precision floating-point arithmetic units (adder, subtractor, multiplier and divider) are designed, which suffice for most System on Chip (SoC) applications and also improve accuracy over long chains of computations. The synthesized code results are verified and the com...

Kittur Harish Maillikarju; Ravi, M. S.; Vinothkumar, N.

2013-01-01

350

A fault tolerant VLSI implementation of a nuclear control rod controller  

International Nuclear Information System (INIS)

This paper presents a VLSI implementation of a control system used for automatic control of control rods in a typical nuclear power station. Fast, efficient, and reliable control over the control rods is achieved. The design is divided into two VLSI chips that form the heart of a hybrid redundant scheme for fault tolerance. The layout was generated using the MOSIS CMOS 1-2 micron process design rules.

351

The Nile fast-track implementation: fault-tolerant parallel processing of legacy CLEO data  

International Nuclear Information System (INIS)

Nile is a multi-disciplinary project building distributed parallel fault-tolerant computing for high energy physics and related fields. Nile Fast-Track is an early prototype of many key design principles of the full Nile project which is distributed computing over a wide area network. Object oriented design techniques are employed to produce a test-bed system which is extremely modular. We report on the Fast-Track project design, its status, and future plans. (author)

352

ADHOCFTSIM: A Simulator of Fault Tolerance in Ad-Hoc Networks  

OpenAIRE

The flexibility and diversity of Wireless Mobile Networks offer many opportunities that are not always taken into account by existing distributed systems. In particular, the proliferation of mobile users and the use of mobile Ad-Hoc networks promote the formation of collaborative groups to share resources. We propose a solution for the management of fault tolerance in Ad-Hoc networks, combining the functions needed to improve data availability. Our contribution takes into account the characteristics...

Esma Insaf Djebbar; Abderahmann Benaissa; Ali Cherif Brakeche; Ghalem Belalem

2010-01-01

353

Decoherence-Free Subspaces for Multiple-Qubit Errors (II) Universal, Fault-Tolerant Quantum Computation  

CERN Document Server

Decoherence-free subspaces (DFSs) shield quantum information from errors induced by the interaction with an uncontrollable environment. Here we study a model of correlated errors forming an Abelian subgroup (stabilizer) of the Pauli group (the group of tensor products of Pauli matrices). Unlike previous studies of DFSs, this type of errors does not involve any spatial symmetry assumptions on the system-environment interaction. We solve the problem of universal, fault-tolerant quantum computation on the associated class of DFSs.

Lidar, Daniel A.; Bacon, David; Kempe, Julia; Whaley, K. B.

2001-01-01

354

New Results on the Fault-Tolerant Facility Placement Problem  

CERN Document Server

We studied the Fault-Tolerant Facility Placement problem (FTFP), which generalizes the uncapacitated facility location problem (UFL). In FTFP, we are given a set $F$ of sites at which facilities can be built, and a set $C$ of clients with demands that need to be satisfied by different facilities. A client $j$ has demand $r_j$. Building one facility at a site $i$ incurs a cost $f_i$, and connecting one unit of demand from client $j$ to a facility at site $i \in F$ costs $d_{ij}$. The $d_{ij}$ are assumed to form a metric. A feasible solution specifies the number of facilities to be built at each site and the way demands from clients are connected to facilities, with the restriction that demands from the same client must go to different facilities. Facilities at the same site are considered different. The goal is to find a solution with minimum total cost. We gave a 1.7245-approximation algorithm for the FTFP problem. Our technique is via a reduction to the Fault-Tolerant Facility Location problem, in which each cli...

Yan, Li

2011-01-01
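The FTFP definition above can be made concrete with a small feasibility-and-cost checker. This is not the 1.7245-approximation algorithm from the paper; it only evaluates a given solution. The data layout (lists indexed by site and client, and `(site, k)` pairs naming the k-th facility built at a site) and the function name `ftfp_cost` are assumptions made for illustration.

```python
def ftfp_cost(f, d, r, built, assign):
    """Check an FTFP solution for feasibility and return its total cost.

    f[i]      cost of building one facility at site i
    d[i][j]   cost of connecting one unit of client j's demand to site i (assumed metric)
    r[j]      demand of client j
    built[i]  number of facilities built at site i
    assign[j] list of (i, k) pairs: one unit of j's demand goes to the k-th facility at site i
    """
    for j, units in enumerate(assign):
        if len(units) != r[j]:
            raise ValueError(f"client {j} needs exactly {r[j]} connections")
        # Demands from the same client must go to different facilities;
        # (i, 0) and (i, 1) are different facilities at the same site.
        if len(set(units)) != len(units):
            raise ValueError(f"client {j} uses one facility twice")
        for i, k in units:
            if not 0 <= k < built[i]:
                raise ValueError(f"facility {k} at site {i} was not built")
    facility_cost = sum(fi * yi for fi, yi in zip(f, built))
    connection_cost = sum(d[i][j] for j, units in enumerate(assign) for i, _ in units)
    return facility_cost + connection_cost


if __name__ == "__main__":
    f = [3, 5]
    d = [[1, 2], [4, 1]]
    r = [2, 1]
    # Two facilities at site 0 serve both units of client 0; one of them also serves client 1.
    print(ftfp_cost(f, d, r, built=[2, 0], assign=[[(0, 0), (0, 1)], [(0, 0)]]))  # 10
```

Note that building two facilities at the same site is what lets client 0 place both units of demand at site 0, which is exactly the freedom FTFP adds over the classic fault-tolerant facility location problem.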

355

Design and Bandwidth Analysis of Fault-Tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

The design of a suitable interconnection network for inter-processor communication is one of the key issues for system performance. In this study, a new irregular interconnection network, IABN (Irregular Augmented Baseline), has been proposed. IABN is designed by modifying the existing ABN (Augmented Baseline Network). ABN is a regular multi-path network with limited fault tolerance. IABN provides three times as many paths between any source-destination pair as ABN. The ABN and IABN MINs are analyzed and compared in terms of the performance parameters bandwidth, cost, and bandwidth per unit cost. The proposed IABN network provides much better fault tolerance and almost double the bandwidth at the expense of slightly higher cost than ABN.

R. Aggarwal

2008-01-01

356

Fault tolerant onboard packet switch architecture for communication satellites: Shared memory per beam approach  

Science.gov (United States)

The NASA Lewis Research Center is developing a multichannel communication signal processing satellite (MCSPS) system which will provide low data rate, direct to user, commercial communications services. The focus of current space segment developments is a flexible, high-throughput, fault tolerant onboard information switching processor. This information switching processor (ISP) is a destination-directed packet switch which performs both space and time switching to route user information among numerous user ground terminals. Through both industry study contracts and in-house investigations, several packet switching architectures were examined. A contention-free approach, the shared memory per beam architecture, was selected for implementation. The shared memory per beam architecture, fault tolerance insertion, implementation, and demonstration plans are described.

Shalkhauser, Mary JO; Quintana, Jorge A.; Soni, Nitin J.

1994-01-01

357

Actuator usage and fault tolerance of the James Webb Space Telescope optical element mirror actuators  

Science.gov (United States)

The secondary mirror and eighteen primary mirror segments of the James Webb Space Telescope (JWST) are each actively controlled in rigid-body position via six hexapod actuators. The mirrors are stowed to the mirror support structure to survive the launch environment and then must be deployed 12.5 mm to reach the nominally deployed position before the Wavefront Sensing & Control (WFS&C) alignment and phasing process begins. The actuation system is electrically but not mechanically redundant. Therefore, given the large number of hexapod actuators, the fault tolerance of the Optical Telescope Element (OTE) architecture and of the WFS&C alignment process has been carefully considered. The details of the fault tolerance will be discussed, including motor life budgeting, failure signatures, and motor life.

Barto, A.; Acton, D. S.; Finley, P.; Gallagher, B.; Hardy, B.; Knight, J. S.; Lightsey, P.

2012-09-01

358

Robust and Fault-Tolerant Linear Parameter-Varying Control of Wind Turbines  

DEFF Research Database (Denmark)

High performance and reliability are required for wind turbines to be competitive within the energy market. To capture their nonlinear behavior, wind turbines are often modeled using parameter-varying models. In this paper we design and compare multiple linear parameter-varying (LPV) controllers, designed using a proposed method that allows the inclusion of both faults and uncertainties in the LPV controller design. We specifically consider a 4.8 MW, variable-speed, variable-pitch wind turbine model with a fault in the pitch system. We propose the design of a nominal controller (NC), handling the parameter variations along the nominal operating trajectory caused by nonlinear aerodynamics. To accommodate the fault in the pitch system, an active fault-tolerant controller (AFTC) and a passive fault-tolerant controller (PFTC) are designed. In addition to the nominal LPV controller, we also propose a robust controller (RC). This controller is able to take into account model uncertainties in the aerodynamic model. The controllers are based on output feedback and are scheduled on an estimated wind speed to manage the parameter-varying nature of the model. Furthermore, the AFTC relies on information from a fault diagnosis system. The optimization problems involved in designing the PFTC and RC are based on solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs) due to unmeasured parameter variations. Consequently, they are more difficult to solve. The paper presents a procedure, where the BMIs are rewritten into two necessary LMI conditions, which are solved using a two-step procedure. Simulation results show the performance of the LPV controllers to be superior to that of a reference controller designed based on classical principles.

Sloth, Christoffer; Esbensen, Thomas

2011-01-01

359

Fault-Tolerant Control of Wind Turbines using a Takagi-Sugeno Sliding Mode Observer  

Science.gov (United States)

In this paper, observer-based fault-tolerant control schemes for actuator and sensor faults are implemented within dynamic wind turbine simulations. The faults are directly reconstructed by means of a Takagi-Sugeno sliding mode observer. As simulation models, both a reduced-order model with 4 degrees of freedom and the aero-elastic code FAST by NREL are used. A fault-tolerant control scheme is set up by subtracting the reconstructed fault from the faulty control signal or the faulty sensor value, respectively. With these fault compensation schemes, the corrected controller behaviour is close to the fault-free case. The global stability of the controller in the full-load region, in the presence of faults and with active fault compensation, is shown by analysing the derivative of an appropriate Lyapunov function.

Georg, Sören; Schulte, Horst

2014-06-01

360

Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing  

Directory of Open Access Journals (Sweden)

The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on its data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-complete problem, and it becomes more challenging when task failures are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In this paper, the proposed Fault Tolerant Workflow Scheduling algorithm (FTWS) provides fault tolerance by using replication and resubmission of tasks based on their priority. The replication of tasks depends on a heuristic metric that is calculated by finding the trade-off between the replication factor and the resubmission factor. This heuristic metric is used because replication alone may lead to resource wastage, while resubmission alone may increase the makespan. Tasks are prioritized by their criticality, which is calculated using parameters such as out-degree, earliest deadline and high resubmission impact. Prioritization helps in meeting task deadlines and thereby reduces resource wastage. FTWS schedules workflows within a deadline even in the presence of failures, without using any historical information. The experiments were conducted in a simulated cloud environment by scheduling workflows in the presence of randomly generated failures. The experimental results demonstrate an effective success rate in spite of various failures.

Jayadivya S K

2012-06-01
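The abstract does not spell out the FTWS heuristic, but the two mechanisms it balances can be sketched. In the hypothetical helper below, `replicas` copies of a task run per attempt (spending resources), and the task is resubmitted up to `max_resubmits` times if every copy fails (spending time). The `attempt_fn` callback stands in for executing one copy on a cloud resource and is an assumption, not part of the paper.

```python
from typing import Callable

# attempt_fn(task, copy_id, attempt) -> True if that copy finished successfully.
AttemptFn = Callable[[str, int, int], bool]


def run_with_replication_and_resubmission(
    task: str, attempt_fn: AttemptFn, replicas: int, max_resubmits: int
) -> tuple[bool, int, int]:
    """Run `replicas` copies per attempt; resubmit if all copies of an attempt fail.

    Returns (succeeded, attempts_used, copies_launched). More replicas cost resources;
    more resubmissions cost time (makespan).
    """
    copies_launched = 0
    for attempt in range(max_resubmits + 1):
        outcomes = [attempt_fn(task, c, attempt) for c in range(replicas)]
        copies_launched += replicas
        if any(outcomes):
            return True, attempt + 1, copies_launched
    return False, max_resubmits + 1, copies_launched


def flaky(task: str, copy_id: int, attempt: int) -> bool:
    """Hypothetical transient failure: every copy of the first attempt fails."""
    return attempt >= 1


if __name__ == "__main__":
    print(run_with_replication_and_resubmission("t1", flaky, replicas=2, max_resubmits=3))
    # (True, 2, 4)
```

Per the abstract, FTWS chooses the replication level from a heuristic that trades the replication factor against the resubmission factor and favours critical tasks; since that metric is not given here, both counts are left as plain parameters.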

361

Fault-Tolerant, Multiple-Zone Temperature Control  

Science.gov (United States)

A computer program has been written as an essential part of an electronic temperature control system for a spaceborne instrument that contains several zones. The system was developed because the temperature and the rate of change of temperature in each zone are required to be maintained to within limits that amount to degrees of precision thought to be unattainable by use of simple bimetallic thermostats. The software collects temperature readings from six platinum resistance thermometers, calculates temperature errors from the readings, and implements a proportional + integral + derivative (PID) control algorithm that adjusts heater power levels. The software accepts, via a serial port, commands to change its operational parameters. The software attempts to detect and mitigate a host of potential faults. It is robust to many kinds of faults in that it can maintain PID control in the presence of those faults.

Granger, James; Franklin, Brian; Michalik, Martin; Yates, Phillip; Peterson, Erik; Borders, James

2008-01-01
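The control loop described above can be sketched as a discrete PID update whose output is clamped to a heater-power range, plus a hypothetical plausibility filter that discards out-of-range thermometer readings before they reach the controller. The gains, limits, anti-windup rule and the `zone_temperature` helper are illustrative assumptions; the abstract does not describe the flight software's actual fault checks.

```python
class PID:
    """Discrete PID controller with output clamping and a simple anti-windup rule."""

    def __init__(self, kp: float, ki: float, kd: float,
                 out_min: float = 0.0, out_max: float = 1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate = self.integral + error * dt
        out = self.kp * error + self.ki * candidate + self.kd * derivative
        # Anti-windup: only accept the new integral while the output is not saturated.
        if self.out_min <= out <= self.out_max:
            self.integral = candidate
        return min(max(out, self.out_min), self.out_max)  # heater power fraction


def zone_temperature(readings: list[float], lo: float, hi: float):
    """Hypothetical fault check: drop NaN or out-of-range readings, average the rest.

    Returns None when no reading is plausible, so the caller can fall back to a safe heater level.
    """
    valid = [t for t in readings if lo <= t <= hi]  # NaN fails both comparisons
    return sum(valid) / len(valid) if valid else None


if __name__ == "__main__":
    pid = PID(kp=0.1, ki=0.01, kd=0.0)
    t = zone_temperature([20.1, float("nan"), 19.9, 500.0], lo=-50.0, hi=150.0)
    power = pid.update(setpoint=25.0, measurement=t, dt=1.0) if t is not None else 0.0
    print(t, power)
```

Keeping the readings filter separate from the controller means a failed thermometer degrades the zone estimate rather than driving the PID output, which is one plausible way to "maintain PID control in the presence of faults."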

362

Fault-tolerant sub-lithographic design with rollback recovery.  

Science.gov (United States)

Shrinking feature sizes and energy levels, coupled with high clock rates and decreasing node capacitance, lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation ($P_f = 10^{-7}$) in systems with $10^{12}$ susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme. PMID:21730568

Naeimi, Helia; Dehon, André

2008-03-19

363

High Speed Fault Injection Tool Implemented With Verilog HDL on FPGA for Testing Fault Tolerance Designs  

Directory of Open Access Journals (Sweden)

This paper presents an FPGA-based fault injection tool, called FITO, that supports several synthesizable fault models for dependability analysis of digital systems modeled in Verilog HDL. Using FITO, experiments can be performed in real time with good controllability and observability. As a case study, an OpenRISC 1200 microprocessor was evaluated using an FPGA circuit. About 4000 permanent, transient, and SEU faults were injected into this microprocessor. The results show that the FITO tool is more than 79 times faster than pure simulation-based fault injection, with only 2.5% FPGA area overhead.

G. Gopinath Reddy

2013-11-01
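FITO injects faults into synthesizable Verilog on an FPGA; the Python sketch below only illustrates the underlying SEU fault model (a single bit flip in a register) and how an injection campaign counts masked faults, using a bitwise triple-modular-redundancy voter as a hypothetical design under test. `WORD_BITS`, `inject_seu`, `tmr_vote` and `campaign` are names invented for this example.

```python
import random

WORD_BITS = 32  # assumed register width


def inject_seu(value: int, bit: int) -> int:
    """Flip one bit of a WORD_BITS-wide register value (single-event-upset fault model)."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit index out of range")
    return value ^ (1 << bit)


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority voter used in triple modular redundancy."""
    return (a & b) | (a & c) | (b & c)


def campaign(golden: int, n_faults: int, seed: int = 0) -> int:
    """Inject n_faults random SEUs, each into one random replica; count how many the voter masks."""
    rng = random.Random(seed)  # fixed seed keeps the campaign reproducible
    masked = 0
    for _ in range(n_faults):
        replicas = [golden, golden, golden]
        victim = rng.randrange(3)
        replicas[victim] = inject_seu(replicas[victim], rng.randrange(WORD_BITS))
        if tmr_vote(*replicas) == golden:
            masked += 1
    return masked


if __name__ == "__main__":
    print(campaign(0xDEADBEEF, 4000))  # 4000: every single-replica fault is masked
```

Because a 2-of-3 voter masks any fault confined to one replica, every injection in this campaign is masked; flipping the same bit in two replicas would defeat the voter, which is the kind of weakness a real injection campaign is meant to expose.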

364

Second-order sliding mode fault-tolerant control of heat recovery steam generator boiler in combined cycle power plants  

International Nuclear Information System (INIS)

Power generation plants are intrinsically complex systems due to their numerous internal components. Higher energy efficiency in power plants is now achieved by employing combined cycles. In this article, an adaptive robust Sliding Mode Controller (SMC) is designed to overcome faults in Heat Recovery Steam Generator boilers (HRSG boilers), one of the main parts of a combined cycle plant. When a fault occurs in the HRSG boiler, the control system must be able to reconfigure its parameters to keep dynamic variables such as drum pressure, steam temperature, and drum water level within admissible thresholds. To achieve good performance for the boiler, the proposed adaptive robust SMC overcomes the effects of faults and uncertainties by estimating their upper bounds adaptively, and forces the outputs of the multivariable boiler to track the outputs of a desired multivariable reference model. By manipulating a suitable control input and using a second-order sliding mode control strategy, the output tracking error slides to zero on a PID sliding surface. Besides tracking, the controlled boiler tolerates faults in the system matrix, faults in the input matrix, and an external disturbance signal. Numerical simulations confirm the effectiveness of the proposed FTC (Fault-Tolerant Control) system for an uncertain non-minimum-phase HRSG boiler.
Highlights:
- This paper proposes a PID-based adaptive second-order sliding mode controller (SMC).
- The SMC is robust to actuator and sensor faults and tracks the outputs of a reference system.
- The SMC is used in fault-tolerant control of heat recovery steam generator boilers.
- The boiler and the reference system have different numbers of states and inputs.
- The performance of the SMC is investigated with different fault scenarios in simulations.

365

ROBUST FAULT TOLERANT CONTROL WITH SENSOR FAULTS FOR A FOUR-ROTOR HELICOPTER  

OpenAIRE

This paper considers the control problem for an underactuated quadrotor UAV system in the presence of sensor faults. Dynamic modelling of the quadrotor, taking into account various physical phenomena that can influence the dynamics of a flying structure, is presented. Subsequently, a new control strategy based on a robust integral backstepping approach using sliding mode and taking into account the sensor faults is developed. Lyapunov-based stability analysis shows that the proposed control strat...

Fouad Yacef; Belkacem Sait; Hicham Khebbache

2012-01-01

366

Fault Injection for Embedded Microprocessor-based Systems  

OpenAIRE

Microprocessor-based embedded systems are increasingly used to control safety-critical systems (e.g., air and railway traffic control, nuclear plant control, aircraft and car control). In this case, fault tolerance mechanisms are introduced at the hardware and software level. Debugging and verifying the correct design and implementation of these mechanisms require effective environments, and Fault Injection represents a viable solution for their implementation. In this paper we present a Faul...

Benso, Alfredo; Rebaudengo, Maurizio; Sonza Reorda, Matteo

1999-01-01

367

Robust and Fault Tolerant Control of CD-players  

DEFF Research Database (Denmark)

Several new standards have emerged recently in the area of portable optical data storage media, and more are on their way. In addition to the well-known Compact Disc (CD), portable optical media now also include media for video storage (DVDs) and general data storage media for computer purposes (CD-ROMs). DVDs can be two-sided with multiple layers, allowing read, write and rewrite operations. Most significantly in this context, the new media typically have much higher physical data densities. This constitutes a significant challenge in terms of playability (the ability to reproduce the information from non-ideal discs in non-ideal circumstances), which is the main focus of this Ph.D. thesis. The thesis makes three important contributions to the technical field. It is known that the specific characteristics of CD drives vary from unit to unit. Traditionally, parameter estimation is performed in closed loop, probably because open-loop estimation has been considered very difficult or even impossible. A novel method, which requires an additional current measurement, is presented in this work, where parameter estimation is accomplished in open loop in a simple and reliable way. The second main contribution is related to robust control. Usually, the nominal and uncertainty models are assumed to be known, and the designer is limited to specifying the performance requirements. In a more realistic situation, the designer may only have a set of complex points in the Nyquist plane from several worst-case plants as a result of measurement experiments. The thesis proposes a deterministic method that generates a nominal and uncertainty model from this set of complex points in a less conservative way than conventional methods. Finally, the third main contribution lies in the fields of fault diagnosis and fault-tolerant control.
One of the main challenges in the positioning control of the focus point in CD-players is to handle two types of disturbances with conflicting requirements in an effective way. While a high bandwidth is desired to better suppress shocks, a low bandwidth is preferred in the presence of surface defects. Traditionally, a simple defect detector is employed to deal with this trade-off. In this work, two fault diagnosis schemes are suggested which are able not only to detect but also to separate, to certain extent, the characteristics of the signals originated by the surface defects. Furthermore two fault-tolerant control schemes are proposed such that the mentioned trade-off is handled in a more efficient way.

Vidal, Enrique Sanchez

2003-01-01

368

Relaxed fault-tolerant hardware implementation of neural networks in the presence of multiple transient errors.  

Science.gov (United States)

Reliability should be identified as the most important challenge in future nano-scale very large scale integration (VLSI) implementation technologies for the development of complex integrated systems. Normally, fault tolerance (FT) in a conventional system is achieved by increasing its redundancy, which also implies higher implementation costs and lower performance, sometimes even making it infeasible. In contrast to custom approaches, this paper identifies a new class of applications that are inherently capable of absorbing some degree of vulnerability and of providing FT based on their natural properties. Neural networks are good examples of such imprecision-tolerant applications. We also propose a new class of FT techniques, called relaxed fault-tolerant (RFT) techniques, which are developed for the VLSI implementation of imprecision-tolerant applications. The main advantage of RFT techniques over traditional FT solutions is that they exploit the inherent FT of different applications to reduce implementation costs while improving performance. To show the applicability as well as the efficiency of the RFT method, experimental results are presented for the implementation of a computationally intensive face-recognition neural network and its corresponding RFT realization. The results show promisingly higher performance for artificial neural network VLSI solutions to complex applications in faulty nano-scale implementation environments. PMID:24807519

Mahdiani, Hamid Reza; Fakhraie, Sied Mehdi; Lucas, Caro

2012-08-01

369

Observer-Based Fault Estimation and Accomodation for Dynamic Systems  

CERN Document Server

Due to the increasing security and reliability demands of industrial process control systems, the study of fault diagnosis and fault-tolerant control of dynamic systems has received considerable attention. Fault accommodation (FA) is an effective way to enhance system stability and reliability, so it has been investigated widely and in depth and has become a hot topic in recent years. Fault detection, the first step in FA, monitors whether a fault has occurred. Building on fault detection, fault estimation (FE) determines the magnitude of the fault online; this is a very important step because the additional controller is designed using the fault estimate. Compared with fault detection, FE is considerably harder to design, so research on FE and accommodation is very challenging. Although advances have been reported on FE and accommodation for dynamic systems, the common methods at the present stage have design difficulties, whi...
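The three steps named above (detect, estimate, accommodate) can be illustrated with a deliberately simplified sketch. It assumes a hypothetical scalar discrete-time plant x[k+1] = a*x[k] + b*(u[k] + f) with a constant additive actuator fault f and a measured state. The estimate is updated with a simple residual-driven adaptive rule chosen only for illustration; it is not the observer design from the book, and all parameter values are arbitrary.

```python
def run_fault_tolerant_loop(a=0.9, b=0.5, fault=1.5, fault_start=20,
                            steps=200, gamma=0.5, kp=0.4,
                            threshold=0.05, x_ref=1.0):
    """Detect, estimate and accommodate a constant additive actuator fault
    on the hypothetical plant x[k+1] = a*x[k] + b*(u[k] + f)."""
    x = x_ref            # start at the set-point
    f_hat = 0.0          # online fault estimate
    detected_at = None   # first step at which the residual exceeded the threshold
    for k in range(steps):
        f = fault if k >= fault_start else 0.0   # true fault, unknown to the controller
        # Feedforward for x_ref plus proportional feedback, minus the current
        # fault estimate (this subtraction is the accommodation step).
        u = (1.0 - a) * x_ref / b + kp * (x_ref - x) - f_hat
        x_next = a * x + b * (u + f)             # plant step
        # Residual: measured next state minus the model prediction using f_hat.
        r = x_next - (a * x + b * (u + f_hat))
        if detected_at is None and abs(r) > threshold:
            detected_at = k                      # 1) detection
        if detected_at is not None:
            f_hat += gamma * r / b               # 2) estimation, started after detection
        x = x_next
    return detected_at, f_hat, x


detected_at, f_hat, x = run_fault_tolerant_loop()
```

In this noise-free model the residual equals b*(f - f_hat), so the estimation error shrinks by a factor |1 - gamma| per step for 0 < gamma < 2. With measurement noise, both the threshold and the gain would need tuning.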

Zhang, Ke; Shi, Peng

2013-01-01

370

Experimental magic state distillation for fault-tolerant quantum computing.  

Science.gov (United States)

Any physical quantum device for quantum information processing (QIP) is subject to errors in implementation. In order to be reliable and efficient, quantum computers will need error-correcting or error-avoiding methods. Fault-tolerance achieved through quantum error correction will be an integral part of quantum computers. Of the many methods that have been discovered to implement it, a highly successful approach has been to use transversal gates and specific initial states. A critical element for its implementation is the availability of high-fidelity initial states, such as |0⟩ and the 'magic state'. Here, we report an experiment, performed in a nuclear magnetic resonance (NMR) quantum processor, showing sufficient quantum control to improve the fidelity of imperfect initial magic states by distilling five of them into one with higher fidelity. PMID:21266968

Souza, Alexandre M; Zhang, Jingfu; Ryan, Colm A; Laflamme, Raymond

2011-01-25

371

Ultrafast and Fault-Tolerant Quantum Communication across Long Distances  

Science.gov (United States)

Quantum repeaters (QRs) provide a way of enabling long-distance quantum communication by establishing entangled qubits between remote locations. In this Letter, we investigate a new approach to QRs in which quantum information can be faithfully transmitted via a noisy channel without the use of long-distance teleportation, thus eliminating the need to establish remote entangled links. Our approach makes use of small encoding blocks to fault-tolerantly correct both operational and photon loss errors. We describe a way to optimize the resource requirements of these QRs for the generation of a secure key. Numerical calculations indicate that the number of quantum memory bits at each repeater station required for the generation of one secure key has favorable polylogarithmic scaling with the distance across which communication is desired.

Muralidharan, Sreraman; Kim, Jungsang; Lütkenhaus, Norbert; Lukin, Mikhail D.; Jiang, Liang

2014-06-01

372

Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging  

Science.gov (United States)

By developing Radiation Hardening by Software (RHBSW) techniques leveraged from the High Performance Computing community, our work seeks to deliver radiation-tolerant, high-performance System on a Chip (SoC) processors to the remote sensing community. This SoC architecture is uniquely suited both to high-performance signal processing tasks and to autonomous agent processing. This allows situational awareness to be developed in situ, resulting in a 10-100x decrease in processing latency, which directly translates into more science experiments conducted per day and a more thorough, timely analysis of captured data. With the increase in computational throughput made possible by commodity high-performance processors and low-overhead fault tolerance, new applications can be considered for on-board processing. A high-performance, low-overhead fault tolerance strategy targeting scientific applications on the SpaceCube 1.0 platform has been enhanced, with initial results showing an order-of-magnitude increase in Mean Time Between Data Error and a complete elimination of processor hangs. An initial study of representative hyperspectral applications also looks promising, owing to the high levels of data parallelism and the fine-grained parallelism achievable within FPGA System on a Chip architectures enabled by our RHBSW techniques. To demonstrate the kinds of capabilities these fault tolerance approaches yield, the team focused on applications representative of the Decadal Survey HyspIRI mission, which uses high-throughput Thermal Infrared Scanner (132 Mbps) and Hyperspectral Visible Shortwave Infrared (804 Mbps) instruments while having only a 15 Mbps downlink channel. This mission offers many use scenarios for on-board processing, from high-compression algorithms, to pre-processing and selective download of high-priority images, to full on-board classification.
This paper focuses on recent efforts which revolve around developing a fault emulator for the embedded PowerPC within Xilinx V4FX devices, validating the RHBSW techniques developed in the prior year, and initial performance results on a representative autonomous Hyperspectral application. In the future, fault analysis data will be refined and correlated between software fault emulation, laser testing, and space based results. This project will also deliver expected performance results on an optimized, representative Hyperspectral imaging application demonstrating autonomous operations.
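The HyspIRI figures quoted above already show why on-board processing matters. The sketch below works through that arithmetic. It assumes both instruments produce data continuously and ignores framing, protocol overhead and buffering between ground passes. If the downlink is only available during passes, the required reduction is larger still.

```python
"""Back-of-the-envelope sizing for on-board data reduction (HyspIRI-like figures)."""

INSTRUMENT_RATES_MBPS = {
    "Thermal Infrared Scanner": 132.0,
    "Hyperspectral VSWIR": 804.0,
}
DOWNLINK_MBPS = 15.0


def required_reduction(rates_mbps, downlink_mbps):
    """Return (aggregate instrument rate, minimum on-board reduction factor).

    Assumes continuous acquisition and a continuously available downlink.
    """
    total = sum(rates_mbps.values())
    return total, total / downlink_mbps


total, factor = required_reduction(INSTRUMENT_RATES_MBPS, DOWNLINK_MBPS)
print(f"{total:.0f} Mbps generated, at least {factor:.1f}x reduction needed")
# prints: 936 Mbps generated, at least 62.4x reduction needed
```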

French, M.; Walters, J.; Zick, K.

2011-12-01

373

Declarative Specification of Fault Tolerant Auction Protocols: The English Auction Case Study  

DEFF Research Database (Denmark)

Auction mechanisms are nowadays widely used on electronic commerce Web sites for buying and selling items among different users. The increasing importance of auction protocols in the negotiation phase is not limited to online marketplaces. In fact, the wide applicability of auctions as resource-allocation and negotiation mechanisms has also led to a great deal of interest in auctions within the agent community. A challenging issue for agents operating in open Multiagent Systems (such as the emerging semantic Web infrastructure) concerns the specification of declarative communication rules that can be published and shared, allowing agents to dynamically engage in well-known and trusted negotiation protocols. To cope with real-world applications, these rules should also specify fault-tolerant patterns of interaction that enable negotiating agents to interact with each other while tolerating failures, for instance terminating an auction process even if some bidding agents crash dynamically. In this paper, we propose an approach to specifying fault-tolerant auction protocols in open and dynamic environments by means of communication rules that deal with crash failures of agents. We illustrate these concepts with a case study on the specification of an English Auction protocol that tolerates crashes of bidding agents, and we discuss its properties.
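To make the crash-tolerance property concrete, here is a minimal imperative simulation of an English auction that still terminates when bidders crash. It is not the declarative rule set from the paper.

Several assumptions apply:
- A crash is modelled as a missing reply (a timeout).
- A crashed bidder is dropped from the auction.
- A crashed leader's standing bid remains binding.
- The `Bidder` class and its `crash_round` field are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)                   # identity equality, so list.remove is exact
class Bidder:
    name: str
    valuation: int                     # private maximum price the agent accepts
    crash_round: Optional[int] = None  # round from which the agent stops replying

    def respond(self, round_no: int, price: int) -> Optional[bool]:
        """True = bid at `price`, False = pass, None = no reply (crashed)."""
        if self.crash_round is not None and round_no >= self.crash_round:
            return None
        return price <= self.valuation


def english_auction(bidders: List[Bidder], start_price: int, increment: int,
                    max_rounds: int = 10_000) -> Tuple[Optional[str], int]:
    """Ascending-price auction that keeps running when bidders crash.

    A bidder that does not reply is removed; the last standing bid stays
    binding. The auction ends after a full round without new bids.
    """
    active = list(bidders)
    price, leader = start_price, None
    for round_no in range(max_rounds):
        new_bid = False
        for bidder in list(active):        # copy: we may remove while iterating
            if bidder is leader:
                continue                   # the leader never outbids itself
            reply = bidder.respond(round_no, price + increment)
            if reply is None:
                active.remove(bidder)      # crash detected via missing reply
            elif reply:
                price, leader, new_bid = price + increment, bidder, True
        if not new_bid:
            break
    return (leader.name if leader else None), price


bidders = [Bidder("alice", 100), Bidder("bob", 80), Bidder("carol", 120, crash_round=1)]
print(english_auction(bidders, start_price=50, increment=10))
# prints: ('alice', 90)
```

Carol has the highest valuation but crashes in round 1. The auction therefore closes at Alice's bid instead of blocking while waiting for Carol's reply.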

Dragoni, Nicola; Gaspari, Mauro

2012-01-01

374

Fault tolerant small satellite attitude control using adaptive non-singular terminal sliding mode  

Science.gov (United States)

The Attitude Control System (ACS) plays a pivotal role in the overall performance of a spacecraft in orbit; it is therefore vitally important to design the control system for rapid response, high control precision and insensitivity to external perturbations. First, this paper proposes two adaptive nonlinear control algorithms based on sliding mode control (SMC), designed for a small-satellite attitude control system. The nonlinear dynamics describing the attitude of the small satellite are considered in a circular reference orbit, and the stability of the closed-loop system in the presence of external perturbations is investigated. Then, to account for accidental or degradation faults in the satellite actuators, fault-tolerant control schemes are presented. Two adaptive fault-tolerant control laws (continuous sliding mode control and non-singular terminal sliding mode control) are developed using the nonlinear analytical model of the system; they guarantee global asymptotic convergence of the attitude control error in the presence of unknown external perturbations. A terminal sliding mode based on a nonlinear hyperplane is introduced into the control law design, which improves the convergence performance so that the control error converges in "finite time". Accordingly, the study focuses on non-singular terminal sliding mode control, with continuous sliding mode control used as a baseline for comparison. Meanwhile, an adaptive fuzzy algorithm is proposed to suppress the chattering phenomenon. Moreover, several numerical examples demonstrate the efficacy of the proposed controllers in correcting for external perturbations. Simulation results confirm that the suggested methodologies yield high control precision.
In addition, actuator degradation, a stuck actuator and actuator failure over a period of time are simulated to demonstrate the fault recovery capability of the fault-tolerant controllers. The numerical results clearly demonstrate the good performance of the adaptive non-singular terminal sliding mode control in the event of actuator faults, compared with the continuous sliding mode control.
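For reference, below is the non-singular terminal sliding surface commonly used in the sliding-mode literature. It is written for generic second-order error dynamics x1' = x2, x2' = f(x) + b(x)u + d(t), with b(x) nonzero and |d(t)| <= D. This textbook single-axis form is assumed here for illustration only; the paper's adaptive gains and fuzzy chattering suppression are not reproduced.

```latex
% beta > 0; p, q positive odd integers with 1 < p/q < 2; eta > 0
s = x_1 + \frac{1}{\beta}\, x_2^{p/q}

u = -b(x)^{-1}\left( f(x) + \beta \frac{q}{p}\, x_2^{\,2 - p/q} + (D + \eta)\,\operatorname{sgn}(s) \right)

\dot s = x_2 + \frac{1}{\beta}\frac{p}{q}\, x_2^{\,p/q - 1}\, \dot x_2
       = \frac{1}{\beta}\frac{p}{q}\, x_2^{\,(p-q)/q}\,\bigl( d(t) - (D + \eta)\,\operatorname{sgn}(s) \bigr)

s\,\dot s \le -\frac{\eta}{\beta}\frac{p}{q}\, x_2^{\,(p-q)/q}\, |s|
```

Because 0 < 2 - p/q < 1, no negative power of x2 appears in u, which is why the law is called "non-singular". Because p - q is even, x2^((p-q)/q) is non-negative, so the bound on s·s' is non-positive and strictly negative whenever x2 and s are nonzero.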

Cao, Lu; Chen, XiaoQian; Sheng, Tao

2013-06-01

375

Asynchronous and Multiprecision Linear Solvers - Scalable and Fault-Tolerant Numerics for Energy Efficient High Performance Computing  

OpenAIRE

Asynchronous methods minimize idle times by removing synchronization barriers, and therefore allow the efficient usage of computer systems. The implied high tolerance with respect to communication latencies improves the fault tolerance. As asynchronous methods also enable the usage of the power and energy saving mechanisms provided by the hardware, they are suitable candidates for the highly parallel and heterogeneous hardware platforms that are expected for the near future.
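The tolerance described above can be illustrated with a sequential simulation of asynchronous (chaotic) Jacobi relaxation. Updates happen in random order, read values that may be a few updates old, and are sometimes dropped altogether. This is a sketch under stated assumptions, not one of the thesis's solvers. It assumes a strictly diagonally dominant matrix, a classical sufficient condition for asynchronous Jacobi to converge with bounded delays.

```python
import random
from typing import List


def async_jacobi(A: List[List[float]], b: List[float], sweeps: int = 200,
                 max_delay: int = 3, drop_prob: float = 0.2,
                 seed: int = 0) -> List[float]:
    """Sequential simulation of asynchronous Jacobi relaxation.

    Each component update reads a possibly stale copy of the iterate (up to
    `max_delay` updates old) and some updates are dropped entirely, mimicking
    missing synchronization barriers, late messages and transient faults.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    history = [x[:]]                        # most recent snapshots of x
    for _ in range(sweeps):
        for i in rng.sample(range(n), n):   # no fixed update order
            if rng.random() < drop_prob:
                continue                    # lost or delayed update
            stale = history[-1 - rng.randrange(len(history))]
            s = sum(A[i][j] * stale[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
            history.append(x[:])
            if len(history) > max_delay:
                history.pop(0)              # keep staleness bounded
    return x


A = [[4.0, 1.0, 0.0], [1.0, 5.0, 2.0], [0.0, 2.0, 6.0]]
b = [5.0, 8.0, 8.0]                         # exact solution is [1, 1, 1]
x = async_jacobi(A, b)
```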

Anzt, Hartwig

2012-01-01

376

Row fault detection system  

Science.gov (United States)

An apparatus, program product and method check for nodal faults in a row of nodes by causing each node in the row to concurrently communicate with its adjacent neighbor nodes in the row. The communications are analyzed to determine a presence of a faulty node or connection.
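A minimal sketch of the neighbour-exchange idea is shown below. The patent abstract does not say how the communications are analysed, so the diagnosis rule here is a hypothetical one:
- An interior node whose exchanges with both neighbours fail is blamed.
- Any other failed exchange is blamed on the connection.

Two limitations follow from this rule. A failure at an end node is indistinguishable from a failed link. Two failed links around a healthy node look like a node fault.

```python
from typing import Dict, FrozenSet, Set, Tuple

Link = Tuple[int, int]


def exchange_row(n: int, faulty_nodes: FrozenSet[int] = frozenset(),
                 faulty_links: FrozenSet[Link] = frozenset()) -> Dict[Link, bool]:
    """Simulate all adjacent pairs in a row of n nodes exchanging messages at once.

    The exchange over link (i, i+1) succeeds only if both nodes and the link
    itself are healthy.
    """
    return {(i, i + 1): (i not in faulty_nodes and i + 1 not in faulty_nodes
                         and (i, i + 1) not in faulty_links)
            for i in range(n - 1)}


def diagnose_row(n: int, results: Dict[Link, bool]) -> Tuple[Set[int], Set[Link]]:
    """Blame an interior node whose exchanges with both neighbours failed;
    blame the connection for every other failed exchange."""
    failed = {link for link, ok in results.items() if not ok}
    nodes = {i for i in range(1, n - 1)
             if (i - 1, i) in failed and (i, i + 1) in failed}
    links = {(i, j) for (i, j) in failed if i not in nodes and j not in nodes}
    return nodes, links


results = exchange_row(6, faulty_nodes=frozenset({2}), faulty_links=frozenset({(4, 5)}))
print(diagnose_row(6, results))
# prints: ({2}, {(4, 5)})
```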

Archer, Charles Jens (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Ratterman, Joseph D. (Rochester, MN); Smith, Brian Edward (Rochester, MN)

2012-02-07

377

Data Structures: Sequence Problems, Range Queries, and Fault Tolerance  

DEFF Research Database (Denmark)

The focus of this dissertation is on algorithms, in particular data structures, that give provably efficient solutions for sequence analysis problems, range queries, and fault-tolerant computing. The work presented in this dissertation is divided into three parts. In Part I we consider algorithms for a range of sequence analysis problems that have arisen from applications in pattern matching, bioinformatics, and data mining. On a high level, each problem is defined by a function and some constraints, and the job at hand is to locate subsequences that score high with this function and are not invalidated by the constraints. Many variants and similar problems have been proposed, leading to several different approaches and algorithms. We consider problems where the function is the sum of the elements in the sequence and the constraints only bound the length of the subsequences considered. We give optimal algorithms for several variants of the problem based on a simple idea and classic algorithms and data structures. In Part II we consider range query data structures. This is a category of problems where the task is to preprocess an input sequence using as little time and space as possible such that one can efficiently compute a certain function on the elements in a given query subsequence. Many types of functions have been considered in connection with input from different sources. The input could be IP data sorted by IP address, real estate prices sorted by zip code, advertising costs sorted by time, etc. We consider data structures for two classic statistics functions, namely median and mode. Finally, Part III investigates fault-tolerant algorithms and data structures. This work is motivated by the trend of avoiding elaborate error checking and correction circuitry, which would impose non-negligible costs in terms of hardware performance and money, in the design of today's high-speed memory technologies.
Hardware and power failures, and environmental conditions such as cosmic rays and alpha particles, can all alter memory in unpredictable ways. In applications where large memory capacities are needed at low cost, it makes sense to assume that the algorithms themselves are responsible for dealing with memory faults. We investigate searching, sorting and counting algorithms and data structures that provably return sensible information in spite of memory corruptions.
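One common building block in work on algorithms for unreliable memory is a "reliable value". The variable is replicated 2δ + 1 times, where δ bounds the number of corruptions, and it is read back by majority vote. The abstract does not say whether the thesis uses exactly this construction, so treat the sketch below as an illustration of the general idea. Each access costs Θ(δ) time and space.

```python
from collections import Counter
from typing import Hashable


class ReliableValue:
    """A variable replicated 2*delta + 1 times in unreliable memory.

    If at most `delta` copies are corrupted, the correct value still holds a
    strict majority (delta + 1 copies), so a majority vote recovers it.
    """

    def __init__(self, value: Hashable, delta: int):
        self.delta = delta
        self.copies = [value] * (2 * delta + 1)

    def write(self, value: Hashable) -> None:
        for k in range(len(self.copies)):
            self.copies[k] = value

    def read(self) -> Hashable:
        value, _count = Counter(self.copies).most_common(1)[0]
        return value


rv = ReliableValue(42, delta=2)
rv.copies[0] = 7        # simulate two memory corruptions
rv.copies[3] = -1
print(rv.read())
# prints: 42
```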

Jørgensen, Allan Grønlund

2010-01-01

378

Fault-tolerant ancilla preparation and noise threshold lower bounds for the 23-qubit Golay code  

CERN Document Server

In fault-tolerant quantum computing schemes, the overhead is often dominated by the cost of preparing codewords reliably. This cost generally increases quadratically with the block size of the underlying quantum error-correcting code. In consequence, large codes that are otherwise very efficient have found limited fault-tolerance applications. Fault-tolerant preparation circuits therefore are an important target for optimization. We study the Golay code, a 23-qubit quantum error-correcting code that protects the logical qubit to a distance of seven. In simulations, even using a naive ancilla preparation procedure, the Golay code is competitive with other codes both in terms of overhead and the tolerable noise threshold. We provide two simplified circuits for fault-tolerant preparation of Golay code-encoded ancillas. The new circuits minimize error propagation, reducing the overhead by roughly a factor of four compared to standard encoding circuits. By adapting the malignant set counting technique to depolariz...

Paetznick, Adam

2011-01-01

379

Fault tolerance techniques to assure data integrity in high-volume PACS image archives  

Science.gov (United States)

Picture archiving and communication systems (PACS) perform the systematic acquisition, archiving, and presentation of large quantities of radiological image and text data. In the UCLA Radiology PACS, for example, the volume of archived image data currently exceeds 2500 gigabytes. Furthermore, the distributed heterogeneous PACS is expected to have near real-time response, be continuously available, and assure the integrity and privacy of patient data. The off-the-shelf subsystems that make up the current PACS cannot meet these expectations; therefore fault tolerance techniques had to be incorporated into the system. This paper reports our first steps towards this goal and is organized as follows. First, we discuss data integrity and identify fault classes in the PACS operational environment; then we describe the auditing and accounting schemes developed for error detection and analyze the operational data collected. Finally, we outline plans for future research.
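The abstract does not detail the auditing and accounting schemes. As one illustration of the general idea, the sketch below records a SHA-256 digest per object at archive time (accounting) and later re-verifies every object against the ledger (auditing). Hashing is an assumed choice here. The archive is modelled as an in-memory mapping from hypothetical image identifiers to bytes; a real PACS would read from its storage subsystem.

```python
import hashlib
from typing import Dict, List, Mapping


def record_digests(archive: Mapping[str, bytes]) -> Dict[str, str]:
    """Accounting step: remember a SHA-256 digest for every stored object."""
    return {key: hashlib.sha256(data).hexdigest() for key, data in archive.items()}


def audit(archive: Mapping[str, bytes], ledger: Mapping[str, str]) -> List[str]:
    """Auditing step: list objects that are missing or whose bytes changed."""
    problems = []
    for key, expected in ledger.items():
        data = archive.get(key)
        if data is None or hashlib.sha256(data).hexdigest() != expected:
            problems.append(key)
    return sorted(problems)


archive = {"img1": b"abc", "img2": b"def", "img3": b"ghi"}
ledger = record_digests(archive)
clean_report = audit(archive, ledger)   # nothing has changed yet
archive["img2"] = b"dxf"                # simulated silent corruption
del archive["img3"]                     # simulated lost object
print(audit(archive, ledger))
# prints: ['img2', 'img3']
```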

He, Yutao; Huang, Lu J.; Valentino, Daniel J.; Wingate, W. Keith; Avizienis, Algirdas

1995-05-01

380

Fault Diagnosis for Electrical Distribution Systems using Structural Analysis  

DEFF Research Database (Denmark)

Fault tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the design of efficient diagnostic algorithms, which is a prerequisite for fault-tolerant control of power distribution. Diagnosis in a grid depends on the available analytic redundancies, and hence on the network topology. When the topology changes, due to earlier fault(s) or maintenance, the analytic redundancy relations (ARRs) are likely to change. The algorithms used for diagnosis may need to change accordingly, and finding efficient methods for ARR generation is essential to employing fault-tolerant methods in the grid. Structural analysis (SA) is based on graph-theoretical results that make it possible to find analytic redundancies in large sets of equations from the structure (topology) of the equations alone. A salient feature is the automated generation of redundancy relations. The method is indeed feasible in electrical networks, where circuit theory and network topology together formulate the constraints that define a structure graph. This paper shows how three-phase networks are modelled and analysed using structural methods, and it extends earlier results by showing how physical faults can be identified so that adequate remedial actions can be taken. The paper illustrates a feasible modelling technique for structural analysis of power systems, demonstrates detection and isolation of failures in a network, and shows how typical faults are diagnosed. Nonlinear fault simulations illustrate the results.
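The core structural-analysis step can be sketched as a maximum bipartite matching between constraints and unknown variables. Constraints left unmatched are redundant, and each yields one candidate analytic redundancy relation. The toy model below (a resistor with two voltage sensors and one current sensor) is hypothetical. The sketch ignores causality restrictions and the three-phase modelling from the paper.

```python
from typing import Dict, List, Set


def max_matching(model: Dict[str, Set[str]]) -> Dict[str, str]:
    """Maximum bipartite matching between constraints and unknown variables
    (Kuhn's augmenting-path algorithm). Returns {unknown: constraint}."""
    match: Dict[str, str] = {}

    def assign(constraint: str, seen: Set[str]) -> bool:
        for var in sorted(model[constraint]):        # sorted: deterministic result
            if var in seen:
                continue
            seen.add(var)
            if var not in match or assign(match[var], seen):
                match[var] = constraint
                return True
        return False

    for constraint in model:
        assign(constraint, set())
    return match


def redundant_constraints(model: Dict[str, Set[str]]) -> List[str]:
    """Constraints left unmatched; each yields one candidate ARR."""
    matched = set(max_matching(model).values())
    return [c for c in model if c not in matched]


# Hypothetical toy model. Only unknown variables are listed per constraint;
# the measurements y1..y3 and the resistance R are known.
model = {
    "c1: v = R*i": {"v", "i"},
    "c2: y1 = v": {"v"},
    "c3: y2 = v": {"v"},
    "c4: y3 = i": {"i"},
}
print(redundant_constraints(model))
# prints: ['c3: y2 = v', 'c4: y3 = i']
```

Eliminating v and i through the matched constraints c2 and c1 turns the two unmatched constraints into the residuals y2 - y1 = 0 and y3 - y1/R = 0.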

Knüppel, Thyge; Blanke, Mogens

2014-01-01

381

ROBUST FAULT TOLERANT CONTROL WITH SENSOR FAULTS FOR A FOUR-ROTOR HELICOPTER  

Directory of Open Access Journals (Sweden)

This paper considers the control problem for an underactuated quadrotor UAV system in the presence of sensor faults. A dynamic model of the quadrotor is presented that takes into account various physical phenomena which can influence the dynamics of a flying structure. Subsequently, a new control strategy based on a robust integral backstepping approach using sliding mode, and taking the sensor faults into account, is developed. A Lyapunov-based stability analysis shows that the proposed control strategy keeps the closed-loop dynamics of the quadrotor UAV stable even in the presence of sensor failures. Numerical simulation results are provided to show the good tracking performance of the proposed control laws.

Fouad Yacef

2012-03-01

382

Fault Tolerance Mechanism using Clustering for Power Saving in Wireless Sensor Networks  

Directory of Open Access Journals (Sweden)

The dependability of wireless sensor networks (WSNs) is affected by faults that may occur for various reasons, such as malfunctioning hardware, software glitches, dislocation, or environmental hazards, e.g. fire or flood. A WSN that is not prepared to handle such events may suffer a reduction in overall lifetime, or the faults may lead to hazardous consequences in critical application contexts. In this work we propose a fault tolerance mechanism that identifies the failover scenario and selects a backup cluster head. If the primary cluster head fails, the backup cluster head automatically takes its place. The results show that the proposed scheme improves fault tolerance in wireless sensor networks. The traditional SECA scheme provides a good solution for selecting cluster heads, but the proposed scheme performs considerably better: the accuracy of cluster-head selection is much improved compared with the SECA method for energy saving in wireless sensor networks. Backup cluster heads are also very helpful in saving energy.
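A minimal sketch of primary/backup cluster-head failover is shown below. The election criterion (the two live nodes with the most residual energy) is an assumption made for illustration. The abstract does not state the paper's criterion, and SECA is not reproduced. The `SensorNode` type is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SensorNode:
    node_id: int
    energy: float        # residual energy (arbitrary units)
    alive: bool = True


def elect_heads(cluster: List[SensorNode]) -> Tuple[Optional[SensorNode], Optional[SensorNode]]:
    """Primary and backup cluster heads: the two live nodes with most energy."""
    ranked = sorted((n for n in cluster if n.alive),
                    key=lambda n: n.energy, reverse=True)
    primary = ranked[0] if ranked else None
    backup = ranked[1] if len(ranked) > 1 else None
    return primary, backup


def active_head(cluster: List[SensorNode], primary: Optional[SensorNode],
                backup: Optional[SensorNode]) -> Optional[SensorNode]:
    """Fail over to the backup when the primary dies; re-elect if both are gone."""
    if primary is not None and primary.alive:
        return primary
    if backup is not None and backup.alive:
        return backup
    return elect_heads(cluster)[0]


cluster = [SensorNode(0, 5.0), SensorNode(1, 9.0), SensorNode(2, 7.0), SensorNode(3, 3.0)]
primary, backup = elect_heads(cluster)   # node 1 is primary, node 2 is backup
```

Keeping the backup pre-selected means failover needs no new election round. That saves the message exchanges, and therefore the energy, that a full re-election would cost.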

Navmeet Kaur; Kamaljit Kaur

2013-08-01

383

Optimal Configuration of Fault-Tolerance Parameters for Distributed Server Access  

DEFF Research Database (Denmark)

Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as enhanced name servers. Such architectures provide an increased number of redundancy configuration choices. The influence of a (wide area) network connection can be quite significant and induce trade-offs between dependability and user-perceived performance. This paper develops a quantitative stochastic model using stochastic activity networks (SAN) for the evaluation of performance and dependability metrics of a generic transaction-based service implemented on a distributed replication architecture. The composite SAN model can be easily adapted to a wide range of client-server applications deployed in replicated server architectures. In order to obtain insight into the system behaviour, a set of relevant environment parameters and controllable fault-tolerance parameters are chosen and the dependability/performance trade-off is evaluated.

Daidone, Alessandro; Renier, Thibault

2013-01-01

384

LQCD workflow execution framework: Models, provenance and fault-tolerance  

International Nuclear Information System (INIS)

Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, especially when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete, data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on the allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon the occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic, enabling a dynamic management design that reduces manual administrative workload and increases cluster productivity.

385

A pattern-recognition-based, fault-tolerant monitoring and diagnostic technique  

International Nuclear Information System (INIS)

A properly designed monitoring and diagnostic system must be capable of detecting and distinguishing sensor and process malfunctions in the presence of signal noise, varying process states and multiple faults. The technique presented addresses these objectives through the implementation of a multivariate state estimation algorithm based upon pattern recognition methodology coupled with a statistically-based hypothesis test. Utilizing a residual signal vector generated from the difference between the estimated and measured current states of a process, disturbances are detected and identified with statistical hypothesis testing. Since the hypothesis testing utilizes the inherent noise on the signals to obtain a conclusion and the state estimation algorithm requires only a majority of the sensors to be functioning to ascertain the current state, this technique has proven to be quite robust and fault-tolerant. Several examples of its application are presented. (author)
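The abstract does not name its hypothesis test. A common choice for this kind of residual monitoring is Wald's sequential probability ratio test (SPRT), shown below. The sketch assumes Gaussian residuals with a known standard deviation `sigma`. It tests a zero mean (normal operation) against a mean shift `shift` (disturbance). The pattern-recognition state estimator that produces the residuals is not reproduced.

```python
import math
from typing import Iterable, Optional, Tuple


def sprt_monitor(residuals: Iterable[float], shift: float, sigma: float,
                 alpha: float = 0.01, beta: float = 0.01
                 ) -> Tuple[str, Optional[int]]:
    """Wald SPRT on a residual stream: H0 mean 0 vs H1 mean `shift`.

    alpha: false-alarm probability, beta: missed-alarm probability.
    Returns ("fault", index) at the first H1 decision, else ("normal", None).
    """
    upper = math.log((1 - beta) / alpha)    # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))    # accept H0 at or below this
    llr = 0.0
    for k, r in enumerate(residuals):
        # Log-likelihood ratio increment for Gaussian residuals.
        llr += (shift / sigma ** 2) * (r - shift / 2.0)
        if llr >= upper:
            return "fault", k
        if llr <= lower:
            llr = 0.0   # H0 accepted: restart the test for continuous monitoring
    return "normal", None


print(sprt_monitor([0.0] * 30 + [1.0] * 30, shift=1.0, sigma=1.0))
# prints: ('fault', 39)
```

The test uses the signal noise itself, through sigma, to decide how much evidence is enough. This matches the abstract's point that the hypothesis test exploits the inherent noise on the signals.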

386

Fault-tolerant linear optics quantum computation by error-detecting quantum state transfer  

CERN Document Server

A scheme for linear optical implementation of fault-tolerant quantum computation is proposed, which is based on an error-detecting code. Each computational step is mediated by transfer of quantum information into an ancilla system embedding error-detection capability. Photons are assumed to be subjected to both photon loss and depolarization, and the threshold region of their strengths for scalable quantum computation is obtained, together with the amount of physical resources consumed. Compared to currently known results, the present scheme reduces the resource requirement, while yielding a comparable threshold region.

Cho, J

2006-01-01

387

High Performance and Fault Tolerance Double Precision Floating Point Arithmetic Units  

Directory of Open Access Journals (Sweden)

Floating-point arithmetic units involve complex algorithms, and many scientific problems require floating-point units with high accuracy. Hence, for increased performance and fault-tolerant operation, double-precision floating-point arithmetic units (adder, subtractor, multiplier and divider) are designed; this precision is sufficient for most System on Chip (SoC) applications and also improves accuracy over long chains of computations. The results of the synthesized code are verified, and the complete layout is generated using the back-end flow.

Kittur Harish Maillikarju

2013-01-01

388

P2P-MPI : A fault-tolerant Message Passing Interface Implementation for Grids  

OpenAIRE

This thesis aims to demonstrate that message-passing parallel programs can be deployed onto large, heterogeneous distributed systems. This work consists of the design and development of a proof-of-concept middleware named P2P-MPI, released under a public license. P2P-MPI alleviates this task by proposing a peer-to-peer based platform in which available resources are dynamically discovered upon job requests, and by providing a fault-tolerant message-passing library for Java programs. The motiv...

Rattanapoka, Choopan

2008-01-01

389

Universal Fault Tolerant Quantum Computation on a Class of Decoherence-Free Subspaces Without Spatial Symmetry  

CERN Document Server

Decoherence-free subspaces (DFSs) are constructed without the assumption of spatially symmetric system-bath coupling. Instead the underlying assumption is that subgroups of the full Pauli group of errors are responsible for the decoherence. The corresponding decoherence-free states can protect quantum information in the presence of multiple-qubit errors, and are stabilizer codes. It is shown how to perform universal fault tolerant quantum computation on this class of DFSs. This is the first demonstration that it is possible to use only one- and two-body quantum gates to perform full-blown quantum computation on a class of DFSs, with a finite number of measurements.

Lidar, Daniel A.; Bacon, David; Kempe, Julia; Whaley, K B

2001-01-01

390

Separation of Fault Tolerance and Non-Functional Concerns: Aspect Oriented Patterns and Evaluation  

Directory of Open Access Journals (Sweden)

Dependable computer-based systems employing fault tolerance and robust software development techniques demand additional error detection and recovery related tasks. This results in tangling of core functionality with these cross-cutting non-functional concerns. In this regard, the current work identifies these dependability-related, non-functional, cross-cutting concerns and proposes design and implementation solutions in an aspect-oriented framework that modularizes them and separates them from core functionality. The degree of separation has been quantified using software metrics. A case study based on a Lego NXT robot has been completed to evaluate the proposed design framework.
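The separation idea can be illustrated with a Python decorator. A decorator is only a lightweight analogue of an aspect: it has no pointcut language, and the paper's own aspect-oriented framework is not reproduced here. In the sketch, a recovery concern (retrying a call after transient failures) is kept out of the core function. The decorator `retry` and the function `read_sensor` are hypothetical names.

```python
import functools
from typing import Callable, Tuple, Type


def retry(times: int, exceptions: Tuple[Type[BaseException], ...] = (Exception,)):
    """Recovery concern as a decorator: re-run the call on failure, up to `times` attempts."""
    if times < 1:
        raise ValueError("times must be >= 1")

    def decorate(fn: Callable):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise            # give up: re-raise the last error
        return wrapper
    return decorate


# The core function contains no error-handling code of its own.
calls = {"count": 0}


@retry(times=3, exceptions=(IOError,))
def read_sensor() -> int:
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("transient sensor glitch")  # simulated transient fault
    return 42
```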

Kashif Hameed

2010-04-01

391

The Matrix Method of Determining the Fault Tolerance Degree of a Computer Network Topology  

OpenAIRE

This work presents a graph-theoretical method of determining the fault tolerance degree of computer network interconnections and nodes. Experimental results obtained from simulations of this method in a distributed computing network environment are also presented.

Krivoi, Sergey; Hajder, Miroslaw; Dymora, Pawel; Mazurek, Miroslaw

2006-01-01

392

A Byzantine resilient fault tolerant computer for nuclear power plant applications  

International Nuclear Information System (INIS)

A quadruply redundant synchronous fault tolerant processor, capable of tolerating Byzantine faults, is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. This paper describes the hardware architecture of this processor and discusses certain issues unique to quadruply redundant computers.

393

Improvement of Fault Tolerance Using Checkpoint Optimization Technique in Grid Computing Environment  

OpenAIRE

A grid is an association of computer resources from several administrative domains that work towards a common goal, while abstracting the origin of services away from the user. Fault tolerance is an important property in grid computing, as the dependability of individual grid resources may not be guaranteed. A fault-tolerant approach can potentially prevent a malicious node from affecting the overall performance of the application. In this paper, grid computing and related work are discussed.
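The abstract does not describe its checkpoint optimization technique. A widely used starting point is Young's first-order approximation, which is swapped in here as an illustration. The optimal interval between checkpoints is roughly sqrt(2·C·M), where C is the time to write one checkpoint and M is the mean time between failures. The approximation is reasonable when C is much smaller than M. The values of C and M below are hypothetical.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost: checkpoint overhead C/T plus the
    expected rework after a failure, about half an interval per MTBF."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


C, M = 60.0, 24 * 3600.0      # hypothetical: 60 s per checkpoint, 24 h MTBF
T = young_interval(C, M)
print(f"checkpoint every {T / 60:.1f} min, ~{100 * waste_fraction(T, C, M):.1f}% waste")
# prints: checkpoint every 53.7 min, ~3.7% waste
```

At the optimum the two waste terms are equal (C/T = T/(2M)). Checkpointing more often wastes time writing checkpoints; checkpointing less often wastes time redoing lost work.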

Sumant Jain, Jyoti Choudhary

2013-01-01

394

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

International Nuclear Information System (INIS)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

395

Fault Tolerant Distributed Portfolio Optimization in Smart Grids  

DEFF Research Database (Denmark)

This work considers a portfolio of units for electrical power production and the problem of utilizing it to maintain power balance in the electrical grid. We treat the portfolio as a graph in which the nodes are distributed generators and the links are communication paths. We present a distributed optimization scheme for power balancing, where communication is allowed only between units that are linked in the graph. We include consumers with controllable consumption as an active part of the portfolio. We show that a suboptimal but arbitrarily good power balance can be obtained in an uncoordinated, distributed optimization framework, and argue that the scheme will work even if the computation time is limited. We further show that our approach can tolerate changes in the portfolio, in the sense that increasing or reducing the number of units in the portfolio requires only local updates. This ensures that units that experience faults or need maintenance can be removed from the graph without affecting the overall performance or convergence of the optimization. The results are illustrated by numerical case studies.
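The neighbour-only communication pattern can be illustrated with plain distributed average consensus. This is a stand-in technique, not the authors' optimization scheme: every unit shares its local imbalance (demand minus production) with its graph neighbours, using Metropolis weights, until all units agree on the average imbalance, and then raises its output by that amount so that total production meets total demand. Unit names, the graph, and the iteration count are hypothetical.

```python
def metropolis_weights(adj):
    """Symmetric edge weights that need only each endpoint's degree (local information)."""
    deg = {i: len(nbrs) for i, nbrs in adj.items()}
    return {(i, j): 1.0 / (1 + max(deg[i], deg[j])) for i in adj for j in adj[i]}


def average_consensus(values, adj, iters=200):
    """Each unit repeatedly moves toward its neighbours' values; on a connected
    graph all values converge to the network-wide average."""
    w = metropolis_weights(adj)
    x = {i: float(values[i]) for i in adj}
    for _ in range(iters):
        x = {i: x[i] + sum(w[(i, j)] * (x[j] - x[i]) for j in adj[i]) for i in adj}
    return x


def balance(production, demand, adj, iters=200):
    """Every unit raises its output by the agreed average imbalance."""
    imbalance = {i: demand[i] - production[i] for i in adj}
    avg = average_consensus(imbalance, adj, iters)
    return {i: production[i] + avg[i] for i in adj}


def remove_unit(adj, unit):
    """Taking a unit out only changes its neighbours' lists (a local update)."""
    return {i: [j for j in nbrs if j != unit] for i, nbrs in adj.items() if i != unit}
```

Removing a faulty unit with `remove_unit` touches only its neighbours, echoing the local-update property claimed above; the sketch assumes the remaining graph stays connected, otherwise each component balances only itself.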

Juelsgaard, Morten; Wisniewski, Rafal

2014-01-01

396

Adaptive fault-tolerant routing in hypercube multicomputers  

Science.gov (United States)

A connected hypercube with faulty links and/or nodes is called an injured hypercube. To enable any non-faulty node to communicate with any other non-faulty node, information on component failures has to be made available to non-faulty nodes so as to route messages around the faulty components. A distributed adaptive fault tolerant routing scheme is proposed in which each node is required to know only the condition of its own links. This scheme is shown to be capable of routing messages successfully as long as the number of faulty components is less than n (the dimension of the hypercube), and to route messages via shortest paths with a rather high probability. A second routing scheme based on depth-first search is proposed which works in the presence of an arbitrary number of faulty components; however, the paths chosen by this scheme may not always be the shortest. To guarantee shortest paths, every node must be given information beyond that on its own links; the additional information to be kept at each node for shortest-path routing is determined. Several examples are given to illustrate the results.
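The depth-first routing idea can be sketched as follows. Assumptions for illustration: nodes are integers whose n-bit labels define hypercube adjacency, each node checks only its own links (a link to a faulty neighbour counts as faulty), and the message carries the set of already-visited nodes so that it can backtrack. Dimensions in which the current node differs from the destination are tried first, so with no faults the route is a shortest path; this ordering is a simplification, not a reproduction of the published scheme.

```python
def ordered_neighbors(node, dst, n):
    """Neighbors of `node` in an n-cube: dimensions that move toward `dst` first, spares after."""
    diff = node ^ dst
    preferred = [d for d in range(n) if (diff >> d) & 1]
    spare = [d for d in range(n) if not (diff >> d) & 1]
    return [node ^ (1 << d) for d in preferred + spare]


def link_ok(u, v, faulty_nodes, faulty_links):
    """Local knowledge only: is the link u-v (and the node at its far end) usable?"""
    return v not in faulty_nodes and frozenset((u, v)) not in faulty_links


def dfs_route(n, src, dst, faulty_nodes=frozenset(), faulty_links=frozenset()):
    """Depth-first routing in an injured n-cube; returns a node path or None."""
    if src in faulty_nodes or dst in faulty_nodes:
        return None
    path, visited = [src], {src}  # carried along with the message (assumption)
    pending = [iter(ordered_neighbors(src, dst, n))]  # untried next hops per path position
    while path:
        node = path[-1]
        if node == dst:
            return path
        for nxt in pending[-1]:
            if nxt not in visited and link_ok(node, nxt, faulty_nodes, faulty_links):
                visited.add(nxt)
                path.append(nxt)
                pending.append(iter(ordered_neighbors(nxt, dst, n)))
                break
        else:
            path.pop()  # dead end: backtrack to the previous node
            pending.pop()
    return None
```

For example, `dfs_route(3, 0, 7)` follows a three-hop shortest path, while marking node 1 faulty makes the message detour through node 2; if every neighbour of the source is faulty, the function returns `None`.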

Chen, Ming-Syan; Shin, Kang G.

1990-01-01

397

Fault tolerant channel-encrypting quantum dialogue against collective noise  

Science.gov (United States)

In this paper, two fault tolerant channel-encrypting quantum dialogue (QD) protocols against collective noise are presented: one against collective-dephasing noise and the other against collective-rotation noise. Decoherence-free states, each composed of two physical qubits, act as traveling states that combat collective noise. Einstein-Podolsky-Rosen pairs, which play the role of a private quantum key, are securely shared in advance between the two participants over a collective-noise channel. Through encryption and decryption with the private quantum key, the initial state of each traveling two-photon logical qubit is privately shared between the two participants. Because the initial state of each traveling logical qubit is shared via quantum encryption, the problem of information leakage is overcome. The private quantum key can be reused after rotation as long as the rotation angle is properly chosen, which economizes quantum resources. As a result, the information-theoretic efficiency of the protocols reaches nearly 66.7%. The proposed QD protocols require only single-photon measurements rather than two-photon joint measurements. Security analysis shows that an eavesdropper cannot obtain any useful information about the secret messages during the dialogue without being detected. Furthermore, the proposed QD protocols can be implemented experimentally with current techniques.
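Why two-qubit logical states resist collective dephasing can be checked numerically. Assuming the standard model in which collective dephasing leaves |0> unchanged and multiplies |1> by the same phase e^(i*phi) on every qubit, states in the span of |01> and |10> acquire only a global phase. The sketch below uses plain Python with amplitudes ordered |00>, |01>, |10>, |11>, and compares a logical superposition with a state outside that subspace; it does not model the collective-rotation case or the QD protocols themselves.

```python
import cmath
import math


def collective_dephasing(state, phi):
    """Apply the same phase e^(i*phi) to |1> of each qubit in a 2-qubit state.
    `state` holds amplitudes for |00>, |01>, |10>, |11> (index = 2*q1 + q2)."""
    return [amp * cmath.exp(1j * phi * bin(idx).count("1")) for idx, amp in enumerate(state)]


def fidelity(a, b):
    """|<a|b>|^2 for normalized pure states."""
    inner = sum(complex(x).conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2


s = 1 / math.sqrt(2)
logical_plus = [0, s, s, 0]  # (|01> + |10>)/sqrt(2): inside the decoherence-free subspace
outside = [s, 0, 0, s]       # (|00> + |11>)/sqrt(2): not protected against dephasing

for phi in (0.3, math.pi / 3, 2.0):
    print(phi,
          fidelity(logical_plus, collective_dephasing(logical_plus, phi)),
          fidelity(outside, collective_dephasing(outside, phi)))
```

The logical state keeps fidelity 1 for every phi, while the other state's fidelity falls to cos^2(phi), which is why the protocols encode each traveling qubit in a two-photon logical state.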

Ye, TianYu

2015-04-01

398

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

Energy Technology Data Exchange (ETDEWEB)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

Parker, L.E.

1995-02-01

399

Load Balancing with Fault Tolerance and Optimal Resource Utilization in Grid Computing  

Directory of Open Access Journals (Sweden)

In grid computing, load balancing with optimal resource utilization and fault tolerance are important issues. The availability of the resources selected for job execution is a primary factor in computing performance. Failures are typically more likely in grid computing than in traditional parallel computing, and a resource failure can be fatal to job execution. A fault tolerance service is therefore essential in a grid. Grid services are also often expected to meet minimum levels of Quality of Service (QoS). To address this issue, we propose load balancing with optimal resource utilization and a fault tolerance service that satisfies QoS requirements. The fault tolerance service handles various types of resource failures, including process, processor, and network failures. We design and implement a fault detector and a fault manager. The approach is effective in that the fault detector detects the occurrence of resource failures and the fault manager guarantees that submitted jobs are completely executed on optimal resources. Job execution performance improves because jobs are migrated using a Mobile Agent (MA) even when failures occur. The MA executes one of the checkpointing algorithms, and its performance is compared with a checkpointing algorithm that uses the Message Passing Interface (MPI). The overhead generated during job migration is also compared between MA and MPI.
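A fault detector and fault manager of the kind described can be sketched as follows. The heartbeat timeout, the least-loaded migration rule, and all class and method names are hypothetical; the mobile-agent and MPI migration mechanisms compared in the paper are not modeled, and `checkpoint` simply records saved progress that a migrated job would resume from.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    checkpoint: int = 0  # last saved progress; a migrated job resumes from here


class FaultDetector:
    """Heartbeat-based detector: a resource is suspected if silent longer than `timeout`."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}  # resource -> time of last heartbeat

    def heartbeat(self, resource, now):
        self.last_seen[resource] = now

    def failed(self, now):
        return {r for r, t in self.last_seen.items() if now - t > self.timeout}


class FaultManager:
    """Moves jobs off failed resources to the least-loaded healthy resource."""

    def __init__(self, detector):
        self.detector = detector
        self.assignments = {}  # resource -> list of Job

    def submit(self, job, resource):
        self.assignments.setdefault(resource, []).append(job)

    def recover(self, now):
        failed = self.detector.failed(now)
        healthy = [r for r in self.detector.last_seen if r not in failed]
        if failed and not healthy:
            raise RuntimeError("no healthy resource left to migrate jobs to")
        moved = []
        for r in sorted(failed):
            for job in self.assignments.pop(r, []):
                # Load balancing: pick the healthy resource with the fewest jobs.
                target = min(healthy, key=lambda h: len(self.assignments.get(h, [])))
                self.submit(job, target)
                moved.append((job.job_id, r, target))
        return moved
```

A `Job` keeps its `checkpoint` value when it is moved, so in this sketch a migrated job would restart from saved progress rather than from the beginning.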

Neeraj Nehra

2007-01-01

400

High Speed Operation and Testing of a Fault Tolerant Magnetic Bearing  

Science.gov (United States)

Research activities undertaken to upgrade the fault-tolerant facility, continue testing high-speed fault-tolerant operation, and assist in commissioning the high-temperature (1000 °F) thrust magnetic bearing are described. The fault-tolerant magnetic bearing test facility was upgraded to operate at up to 40,000 RPM. The necessary upgrades included new state-of-the-art position sensors with high-frequency modulation and new power edge filtering of the amplifier outputs. The new sensors were compared with the previous system, and the noise in the sensor-to-controller signals was assessed. Power edge filtering of the amplifier-to-actuator signals was also compared; this information is valuable for all position-sensing and motor-actuation applications. With these facility upgrades completed, the rig is believed to be capable of 40,000 RPM operation, though this has yet to be demonstrated. Other upgrades included verifying and upgrading the safety shielding and upgrading the control algorithms. The rig will now also be used to demonstrate motoring capabilities, and the corresponding control algorithms are being developed. An extreme-temperature thrust magnetic bearing was recently designed from the ground up to fit within the existing high-temperature facility. The retrofit began near the end of summer 2004 and is ongoing. Contract staff authored a NASA-TM entitled "An Overview of Magnetic Bearing Technology for Gas Turbine Engines", which compiles bearing data relevant to operation in the gas turbine engine regime and presents how magnetic bearings can become a viable candidate for use in future engine technology.

DeWitt, Kenneth; Clark, Daniel

2004-01-01

401

An Adaptive Job Scheduling with efficient Fault Tolerance Strategy in Computational Grid  

Directory of Open Access Journals (Sweden)

Grid computing is an emerging technology with the potential to solve large-scale scientific problems in an integrated heterogeneous environment. However, certain aspects of the grid computing environment reduce the efficiency of the system. Scheduling jobs to the best-suited resources, achieving load balancing, and providing fault tolerance are key to improving efficiency and exploiting the capabilities of emerging computational systems. Because of the dynamic and distributed nature of the grid, traditional scheduling methods do not use the available resources effectively. In this paper, an efficient adaptive job scheduling algorithm is proposed to improve the efficiency of the grid system for a large number of tasks. Combined with a fault tolerance strategy based on checkpointing, the proposed adaptive job scheduling improves the overall computation time even in the worst-case scenario in a heterogeneous grid environment. The simulation results show that the proposed strategy schedules grid jobs effectively, improving overall performance by more than 10% and thereby reducing the overall execution time.
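The interplay of scheduling on heterogeneous resources and checkpoint-based recovery can be sketched with two small functions. The scheduler below is the well-known greedy minimum-completion-time (MCT) heuristic, used here as a stand-in; it is not the adaptive algorithm proposed in the paper. Job sizes, resource speeds, and the checkpoint interval are hypothetical.

```python
def schedule_mct(job_work, speeds):
    """Greedy minimum-completion-time scheduling on heterogeneous resources.

    job_work: list of work amounts; speeds: dict resource -> work units per second.
    Returns (assignment job_index -> resource, ready time of each resource)."""
    ready = {r: 0.0 for r in speeds}
    assignment = {}
    # Place the largest jobs first; each goes where it would finish earliest.
    for j, work in sorted(enumerate(job_work), key=lambda jw: -jw[1]):
        best = min(speeds, key=lambda r: ready[r] + work / speeds[r])
        ready[best] += work / speeds[best]
        assignment[j] = best
    return assignment, ready


def work_left_after_failure(total_work, work_done, checkpoint_interval=None):
    """Work that must be redone elsewhere when a resource fails mid-job.
    With checkpointing, only the progress since the last checkpoint is lost."""
    if checkpoint_interval is None:
        return total_work
    saved = (work_done // checkpoint_interval) * checkpoint_interval
    return total_work - saved
```

For instance, a job of 100 work units that fails after 47 units must restart from scratch without checkpoints, but with a checkpoint every 10 units only 60 units remain to be rescheduled.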

S. Gokuldev

2014-08-01

402

Single event upset injection simulation and fault-tolerant design for image compression applications  

Science.gov (United States)

This paper describes an SEU fault injection framework. Based on assumptions about SEU effects and the SEU distribution, the quantitative relationship between measured data and the simulation model is investigated. By adjusting parameters of the simulation-based framework, its results can be brought close to published data and to accelerated radiation experiments. The framework also reveals how sensitive the JPEG2000-based hardware architecture is to SEUs. Fault-tolerant techniques can then be applied to the more sensitive parts, subject to hardware-resource and operating-frequency constraints, which demonstrates the framework's effectiveness for fault-tolerant design of image compression applications.
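The fault-injection idea (flip one stored bit, rerun, compare with a fault-free "golden" run) can be sketched in a few lines. Everything here is a hypothetical toy, not the paper's framework: a one-level integer Haar step stands in for the JPEG2000 wavelet stage, detail coefficients are quantized by a right shift, and registers are assumed to be 16 bits wide. Because the injection is exhaustive over every element and bit, the per-buffer sensitivity is deterministic.

```python
WORD_BITS = 16  # assumed register width
SHIFT = 3       # assumed quantization: drop 3 low-order bits of each detail coefficient


def haar_stage(x):
    """One level of an integer Haar transform on pairs of samples."""
    approx = [(x[i] + x[i + 1]) >> 1 for i in range(0, len(x), 2)]
    detail = [x[i] - x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail


def run(samples, target=None, index=0, bit=0):
    """Encode `samples`; optionally flip one bit of the stored buffer `target`."""
    approx, detail = haar_stage(samples)
    buffers = {"approx": approx, "detail": detail}
    if target is not None:
        buffers[target][index] ^= 1 << bit  # the single event upset
    return buffers["approx"], [d >> SHIFT for d in buffers["detail"]]


def sensitivity(samples, target):
    """Fraction of all single-bit upsets in `target` that change the encoder output."""
    golden = run(samples)
    n = len(samples) // 2
    corrupted = sum(
        run(samples, target, i, b) != golden
        for i in range(n)
        for b in range(WORD_BITS)
    )
    return corrupted / (n * WORD_BITS)


samples = [10, 12, 40, 38, 7, 7, 100, 90]
for stage in ("approx", "detail"):
    print(stage, sensitivity(samples, stage))
```

In this toy, flips in the three low-order bits of a detail coefficient are erased by the 3-bit quantization, so the detail buffer is less sensitive (13 of every 16 bits) than the approximation buffer (every bit); a designer following this logic would protect the approximation buffer first.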

Guo, Jie; Li, Yunsong; Liu, Kai; Lei, Jie; Wu, Chengke

2012-10-01

403

A hybrid framework for design and analysis of fault-tolerant architectures for nanoscale molecular crossbar memories.  

Energy Technology Data Exchange (ETDEWEB)

Self-assembled ultra-dense nanomemories are expected to be more susceptible to manufacturing defects and transient faults than conventional CMOS-based memories, so fault-tolerant memory architectures are needed. Developing such architectures will require rigorous analysis of achievable performance measures: power dissipation, area, delay, and reliability. In this paper, we propose and develop a hybrid automation framework, called HMAN, that aids the design and analysis of fault-tolerant architectures for nanomemories. Our framework can analyze memory architectures at two levels of design abstraction, namely the system and circuit levels. To the best of our knowledge, this is the first attempt to analyze memory systems at different levels of abstraction and then correlate the different performance measures to give system designers guidelines for designing a robust nanomemory. We also illustrate the application of our framework to self-assembled crossbar architectures by analyzing a hierarchical fault-tolerant crossbar-based memory architecture that we have developed and comparing it with existing crossbar architectures.
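One system-level reliability measure of the kind such a framework correlates can be illustrated with a simple binomial model. Assumptions: bit faults are independent with probability p, a word survives if at most t of its n bits are faulty (t being what its error-correcting code can repair), and the memory survives only if every word does. The function names and the 72-bit word example are hypothetical and are not taken from HMAN.

```python
from math import comb


def word_reliability(n_bits, p_fault, t_correctable):
    """Probability that at most t_correctable of n_bits independent bits are faulty."""
    return sum(
        comb(n_bits, k) * p_fault**k * (1 - p_fault) ** (n_bits - k)
        for k in range(t_correctable + 1)
    )


def memory_reliability(n_words, n_bits, p_fault, t_correctable):
    """All words must survive (independent-word assumption)."""
    return word_reliability(n_bits, p_fault, t_correctable) ** n_words


if __name__ == "__main__":
    # Compare no correction (t=0) with single- and double-error correction.
    for t in (0, 1, 2):
        print(t, memory_reliability(1024, 72, 1e-4, t))
```

Raising `t_correctable` increases reliability but, in a real design, costs extra check bits, area, delay, and power, which is exactly the kind of trade-off a multi-level framework has to expose.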

Graham, P. S. (Paul S.); Gokhale, M. (Maya); Bhaduri, D. (Debayan); Shukla, S. K. (Sandeep K.); Coker, D. (Deji); Taylor, V. (Valerie)

2005-01-01

404

Design of a fault tolerant airborne digital computer. Volume 1: Architecture  

Science.gov (United States)

This volume is concerned with the architecture of a fault tolerant digital computer for an advanced commercial aircraft. All of the computations of the aircraft, including those presently carried out by analogue techniques, are to be carried out in this digital computer. Among the important qualities of the computer are the following: (1) the capacity is to be matched to the aircraft environment; (2) the reliability is to be selectively matched to the criticality and deadline requirements of each of the computations; (3) the system is to be readily expandable and contractible; and (4) the design is to be appropriate to post-1975 technology. Three candidate architectures are discussed and assessed in terms of these qualities. Of the three candidates, a newly conceived architecture, Software Implemented Fault Tolerance (SIFT), provides the best match to these qualities. In addition, SIFT is particularly simple and believable. The other candidates, the Bus Checker System (BUCS), also newly conceived in this project, and the Hopkins multiprocessor, are potentially more efficient than SIFT in their use of redundancy but are otherwise not as attractive.
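SIFT's central idea, replicating each task in software across processors and voting on the results, with the degree of replication chosen per task, can be sketched as below. The task names, replication counts, and the way a faulty processor corrupts its output are hypothetical illustrations, not details from the SIFT design documents.

```python
from collections import Counter

# Hypothetical criticality-based replication: more copies for more critical tasks.
REPLICAS = {"flight_control": 5, "navigation": 3, "cabin_display": 1}


def execute(compute, arg, processors, faulty=frozenset()):
    """Run `compute` on each processor. A faulty processor returns a corrupted value
    (here: offset by its position, so independent faults disagree with each other)."""
    results = {}
    for i, p in enumerate(processors):
        value = compute(arg)
        results[p] = (value + i + 1) if p in faulty else value
    return results


def vote(results):
    """Majority vote over replica outputs; also report dissenting processors."""
    value, count = Counter(results.values()).most_common(1)[0]
    if 2 * count <= len(results):
        raise RuntimeError("no majority: fault cannot be masked")
    dissenters = sorted(p for p, v in results.items() if v != value)
    return value, dissenters


def run_task(task, compute, arg, processors, faulty=frozenset()):
    """Run `task` on as many processors as its criticality calls for, then vote."""
    chosen = processors[: REPLICAS[task]]
    return vote(execute(compute, arg, chosen, faulty))
```

With five replicas two independent faults are masked and the faulty processors identified, three replicas tolerate one, and a single-copy task gets no protection; this is the sense in which reliability is matched to each computation's criticality.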

Wensley, J. H.; Levitt, K. N.; Green, M. W.; Goldberg, J.; Neumann, P. G.

1973-01-01

405

Parameter Estimation Analysis for Hybrid Adaptive Fault Tolerant Control  

Science.gov (United States)

In recent years, research efforts have increased toward the development of intelligent fault tolerant control laws capable of helping the pilot safely maintain aircraft control under post-failure conditions. Researchers at West Virginia University (WVU) have been actively involved in the development of fault tolerant adaptive control laws in all three major categories: direct, indirect, and hybrid. The first design implemented to provide adaptation was a direct adaptive controller, which used artificial neural networks (NNs) to generate augmentation commands that reduce the modeling error. Indirect adaptive laws were implemented in another controller, which used online parameter identification (PID) to estimate and update the controller parameters. Finally, a new controller design that integrates both direct and indirect control laws was introduced; it is known as the hybrid adaptive controller. This last design outperformed the two earlier designs, requiring less NN effort and achieving better tracking quality. Because the performance of online PID plays an important role in the quality of the hybrid controller, the quality of the estimates is of great importance. However, PID is not perfect, and the online estimation process has some inherent issues: the online PID estimates are primarily affected by delays and biases. To ensure that reliable estimates are passed to the controller, the estimator needs time to converge, and it often converges to a biased value. This thesis conducts a sensitivity analysis of these estimation issues, delay and bias, and their effect on tracking quality. In addition, the performance of the hybrid controller is compared with that of the direct adaptive controller. For this purpose, a simulation environment was created in MATLAB/SIMULINK.
The simulation environment is customized to give the user the flexibility to add different combinations of biases and delays to the explored derivatives. Biases in the range -500% to 500% and delays in the range 0.5 to 40 seconds were considered. The stability and control derivatives considered in this research are a combination of decoupled derivatives in the three channels: longitudinal, lateral, and directional. Numerous simulation scenarios and flight conditions are considered to make the results more credible, and a statistical analysis was conducted to assess them. The performance of the control laws was evaluated in terms of the integral of the error in tracking the three desired angular rates (pitch, roll, and yaw). The effort exerted by the neural networks to compensate for tracking errors is also considered in the analysis. The results show that, to obtain reliable estimates for the investigated derivatives, the estimator must generate values with less than five seconds of delay, and the derivative estimates must be within +50% or -15% of the exact values. Moreover, the importance of updating the derivatives depends on the maneuver scenario and the flight condition. Estimation at quasi-steady-state conditions provides reliable estimates, unlike estimation during fast dynamic changes; the estimation process also performs better at large rates of change of the derivative values.
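The simulation environment's core operation, corrupting an estimated stability or control derivative with a bias and a delay before it reaches the controller, and scoring tracking by the integral of the error, can be sketched as below. This is a hypothetical illustration in Python rather than the thesis's MATLAB/SIMULINK implementation; the function names, holding the initial value until the delay has elapsed, and rectangle-rule integration are assumptions.

```python
def corrupt_estimate(true_values, dt, bias_pct, delay_s):
    """Return the estimate the controller would see: the true derivative history,
    delayed by delay_s seconds and scaled by (1 + bias_pct/100).
    Before the delay has elapsed, the (biased) initial value is held."""
    lag = round(delay_s / dt)
    scale = 1.0 + bias_pct / 100.0
    return [scale * true_values[max(k - lag, 0)] for k in range(len(true_values))]


def integral_abs_error(reference, actual, dt):
    """Rectangle-rule approximation of the integral of |reference - actual| over time."""
    return sum(abs(r - a) for r, a in zip(reference, actual)) * dt
```

A sweep in the spirit of the thesis would call `corrupt_estimate` with `bias_pct` between -500 and 500 and `delay_s` between 0.5 and 40, feed each corrupted history to the controller, and compare the resulting tracking errors for the pitch, roll, and yaw rates.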

Eshak, Peter B.

406

Clustering and fault tolerance for target tracking using wireless sensor networks  

International Nuclear Information System (INIS)

Over the last few years, the deployment of WSNs (Wireless Sensor Networks) has been fostered in diverse applications, and WSNs have great potential in domains ranging from scientific experiments to commercial applications. Because WSNs are deployed in dynamic and unpredictable environments, they must be able to cope with a variety of faults. This paper proposes an energy-aware, fault-tolerant clustering protocol for target tracking applications, termed the FTTT (Fault Tolerant Target Tracking) protocol. Identifying RNs (Redundant Nodes) makes SN (Sensor Node) fault tolerance feasible, and clustering enables the recovery of sensors supervised by a faulty CH (Cluster Head). The FTTT protocol reduces energy consumption in two ways: first, by identifying RNs in the network; second, by restricting the number of SNs sending data to the CH. Simulations validate the scalability and low power consumption of the FTTT protocol in comparison with the LEACH protocol. (author)
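The three mechanisms named in the abstract (standby redundant nodes, limiting how many nodes report to the CH, and re-clustering the members of a failed CH) can be illustrated with simple geometric rules. These rules (greedy proximity-based redundancy, k-nearest reporters, nearest-surviving-CH rejoin) are hypothetical stand-ins, not the FTTT protocol's actual criteria; node names, coordinates, and the radius are made up for the example.

```python
import math


def find_redundant(positions, radius):
    """Greedy redundancy check: scanning nodes in name order, a node lying within
    `radius` of an already-active node becomes a standby backup for it.
    Returns {redundant_node: active_node_it_backs_up}."""
    active, backups = [], {}
    for name in sorted(positions):
        near = [a for a in active if math.dist(positions[a], positions[name]) <= radius]
        if near:
            backups[name] = near[0]
        else:
            active.append(name)
    return backups


def select_reporters(positions, candidates, target, k):
    """Only the k candidates closest to the target send data to the cluster head."""
    return sorted(candidates, key=lambda n: math.dist(positions[n], target))[:k]


def rejoin(positions, members, heads, failed_head):
    """Members of a failed cluster head join the nearest surviving cluster head."""
    survivors = [h for h in heads if h != failed_head]
    return {
        m: min(survivors, key=lambda h: math.dist(positions[m], positions[h]))
        for m in members
    }
```

Redundant nodes can sleep until the node they back up fails, and restricting reports to the k nearest sensors limits radio traffic to the CH; both choices trade some tracking accuracy for energy, which is the balance the protocol targets.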