WorldWideScience
 
 
1

Fault tolerant computing systems  

CERN Multimedia

Fault tolerance involves the provision of strategies for error detection, damage assessment, fault treatment and error recovery. A survey is given of the different sorts of strategies used in highly reliable computing systems, together with an outline of recent research on the problems of providing fault tolerance in parallel and distributed computing systems. (15 refs).

Randell, B

1981-01-01

2

Fault tolerance in distributed systems  

Energy Technology Data Exchange (ETDEWEB)

Due to advances in microelectronics, the growing complexity of systems and the increasingly widespread use of information-processing systems, the aspect of system reliability and availability is acquiring new importance. Fault tolerance - the capability of a system to go on executing its specific functions despite the failure of a limited number of its subsystems - represents an approach to realizing an overall system whose reliability is assured for a definite length of time under specified operating conditions. Especially in distributed systems, the realization of fault tolerance would seem - not least on account of the inherent redundancies - to represent an important aspect of systems development. The associated problems, theoretic foundations and relevant approaches are reviewed.

Schmitter, E.

1983-02-01

3

Fault tolerant control for uncertain systems with parametric faults  

DEFF Research Database (Denmark)

A fault tolerant control (FTC) architecture based on active fault diagnosis (AFD) and the YJBK (Youla, Jabr, Bongiorno and Kucera) parameterization is applied in this paper. Based on the FTC architecture, fault tolerant control of uncertain systems with slowly varying parametric faults is investigated. Conditions are given for closed-loop stability in case of false alarms or missing fault detection/isolation.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2006-01-01

4

Synthesis of Fault-Tolerant Embedded Systems  

DEFF Research Database (Denmark)

This work addresses the issue of design optimization for fault-tolerant hard real-time systems. In particular, our focus is on the handling of transient faults using both checkpointing with rollback recovery and active replication. Fault-tolerant schedules are generated based on a conditional process graph representation. The formulated system synthesis approaches decide the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors, such that multiple transient faults are tolerated, transparency requirements are considered, and the timing constraints of the application are satisfied.

Eles, Petru; Izosimov, Viacheslav

2008-01-01
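
As a rough illustration of the checkpointing with rollback recovery mentioned in the abstract above, the following Python sketch re-executes a process from its last checkpoint when a transient fault occurs. It is not the synthesis approach of the cited work; the function name `run_with_checkpoints`, the use of `RuntimeError` to model a transient fault, and the fixed checkpoint interval are all assumptions made for this example.

```python
import copy


def run_with_checkpoints(steps, state, checkpoint_every, max_retries=3):
    """Run `steps` in order, saving state every `checkpoint_every` steps.

    A RuntimeError raised by a step stands in for a transient fault
    (an assumption of this sketch). On a fault, execution rolls back to
    the most recent checkpoint and resumes from there.
    """
    checkpoint = (0, copy.deepcopy(state))  # (next step index, saved state)
    index = 0
    retries = 0
    while index < len(steps):
        try:
            state = steps[index](state)
        except RuntimeError:
            retries += 1
            if retries > max_retries:
                raise  # give up: the fault does not look transient
            index, saved = checkpoint
            state = copy.deepcopy(saved)  # roll back to the saved state
            continue
        index += 1
        if index % checkpoint_every == 0:
            checkpoint = (index, copy.deepcopy(state))
    return state
```

A shorter checkpoint interval reduces the work repeated after a fault but adds saving overhead to every fault-free run, which is the trade-off that checkpoint placement has to balance.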

5

Fault Tolerance in Real Time Distributed System  

Directory of Open Access Journals (Sweden)

In this paper we investigate the different fault tolerance techniques used in many real-time distributed systems. The main focus is on the types of fault occurring in the system, the fault detection techniques and the recovery techniques used. A fault can occur due to link failure, resource failure or any other cause, and must be tolerated for the system to work smoothly and accurately. These faults can be detected and recovered from by applying the appropriate techniques. An appropriate fault detector can avoid loss due to a system crash, and a reliable fault tolerance technique can save the system from failure. This paper describes how these methods are applied to detect and tolerate faults in various real-time distributed systems.

Arvind Kumar; Rama Shankar Yadav; Ranvijay, Anjali Jain

2011-01-01

6

Reconfigurable fault tolerant avionics system  

Science.gov (United States)

This paper presents the design of a reconfigurable avionics system based on a modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA) to be used in future generations of nano satellites. A major concern in satellite systems, and especially nano satellites, is to build robust systems with low-power consumption profiles. The system is designed to be flexible by providing the capability of reconfiguring itself based on its orbital position. As Single Event Upsets (SEUs) do not have the same severity and intensity in all orbital locations, having their maximum at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not have to be fully protected all the time in its orbit. An acceptable level of protection against high-energy cosmic rays and charged particles roaming in space is provided within the majority of the orbit through software fault tolerance. Checkpointing and rollback, together with control flow assertions, are used for that level of protection. In the minority part of the orbit where severe SEUs are expected to exist, a reconfiguration of the system FPGA is initiated where the processor systems are triplicated and protection through Triple Modular Redundancy (TMR) with feedback is provided. This technique of reconfiguring the system according to the level of threat expected from SEU-induced faults helps in reducing the average dynamic power consumption of the system to one-third of its maximum. This technique can be viewed as smart protection through system reconfiguration. The system is built on the commercial version of the Xilinx Virtex5 FPGA (XC5VLX50) on bulk silicon with 324 IO. Simulations of orbit SEU rates were carried out using the SPENVIS web-based software package.

Ibrahim, M. M.; Asami, K.; Cho, Mengu
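
The Triple Modular Redundancy mentioned in the abstract above rests on majority voting over three replica outputs. The following Python sketch shows the voting step only, under the assumption that replica outputs are plain values or integer bit patterns; the function names are hypothetical, and the feedback path and FPGA reconfiguration described in the paper are not modelled.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value produced by at least two of three replicas.

    Returns None when all three replicas disagree, since no majority
    exists in that case.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else None


def tmr_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise majority of three integers, as a hardware voter would compute.

    Each output bit is 1 exactly when at least two input bits are 1, so a
    single upset bit in one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)
```

The bitwise form masks one flipped bit per bit position even when the flips occur in different replicas, whereas the value-level voter needs two replicas to agree on the whole word.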

7

Fault Tolerance Techniques in Distributed System  

Directory of Open Access Journals (Sweden)

Fault tolerance is an important issue in distributed computing. Developers of early distributed systems took a simplistic approach to providing fault tolerance: they just used another copy of the same hardware as a backup. There are various factors

Sourabh Dave; Abhishek Raghuvanshi

2012-01-01

8

Fault Tolerance in Critical Information Systems  

UK PubMed Central (United Kingdom)

Critical infrastructure applications provide services upon which society depends heavily; such applications require constant, dependable operation in the face of various failures, natural disasters, and other disruptive events that might cause a loss of service. These applications are themselves dependent on distributed information systems for all aspects of their operation, so survivability of these critical information systems is an important issue. Survivability is the ability of a system to continue to provide service, though possibly alternate or degraded, in the face of various types of failure and disruption. A fundamental mechanism by which survivability can be achieved in critical information systems is fault tolerance. Much of the literature on fault-tolerant distributed systems focuses on tolerance of local faults by detecting and masking the effects of those faults. I describe a direction for fault tolerance in the face of non-local faults (faults whose effects have significant non-local impact, sometimes widespread and sometimes catastrophic), where often the effects of these faults cannot be masked using available resources. The goal is to recognize these non-local faults through detection and analysis, then to provide continued service (possibly alternate or degraded) by reconfiguring the system in response to these faults.

Dean Richard; W. Miksad; John C. Knight (advisor); Alfred C. Weaver (committee chair); James P. Cohoon; Anita K. Jones; Matthew C. Elder

9

Backup fault tolerant computer system  

Energy Technology Data Exchange (ETDEWEB)

A method of transferring messages is described for a parallel computer system having at least a first primary task performing means, a first secondary task performing means acting as a backup for the first primary task performing means, a second primary task performing means, a second secondary task performing means acting as a backup for the second primary task performing means, each of the task performing means having a task performing memory means, and a message bus means interconnecting the task performing means. The method consists of: simultaneously sending on the message bus means a plurality of messages to the second primary task performing means and to the first and second secondary task performing means, each of the messages including a header, a body and an end of message indicator; the second primary task performing means operating on the plurality of messages received from the first primary task performing means by initially storing the messages in a queue in its associated task performing memory means and thereafter sequentially reading the messages from the queue for processing in accordance with the task associated with the second primary task performing means; the second secondary task performing means only storing the plurality of messages received from the first primary task performing means in a corresponding queue of its associated task performing memory means unless instructed to process at least some of the messages as a result of the failure of the operation on at least one of the messages by the second primary task performing means; and the first secondary task performing means at least counting the number of messages of the plurality of messages received from the first primary task performing means.

Glazer, S.D.; Baumbach, J.; Borg, A.; Wittels, E.

1986-05-20
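
The message flow in the abstract above can be pictured with a small Python sketch: each message is delivered at once to the receiving primary, to the receiver's backup (which only queues it) and to the sender's backup (which only counts it). The class and function names are hypothetical, the bus is modelled as a simple loop, and message headers and end-of-message indicators are omitted; this is an illustration under those assumptions rather than the patented mechanism.

```python
from collections import deque


class TaskUnit:
    """A task performer with its own message queue (primary or backup role)."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # the task applied to each message
        self.queue = deque()
        self.results = []

    def receive(self, message):
        # Both primary and backup start by storing the message.
        self.queue.append(message)

    def process_all(self):
        # A primary calls this in normal operation; a backup calls it
        # only when told to take over after a primary failure.
        while self.queue:
            self.results.append(self.handler(self.queue.popleft()))


class SenderBackup:
    """Backup of the sending task: it only counts the messages sent."""

    def __init__(self):
        self.sent_count = 0

    def receive(self, message):
        self.sent_count += 1


def broadcast(message, receivers):
    """Deliver one message to every receiver, standing in for the bus."""
    for receiver in receivers:
        receiver.receive(message)
```

For example, after broadcasting three messages to a primary, its backup and a `SenderBackup`, calling `process_all()` on the primary consumes its queue while the backup still holds all three messages, ready to be processed if the primary fails.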

10

Fault-tolerant distributed measurement systems  

Energy Technology Data Exchange (ETDEWEB)

A 100 kbit/s battery-powered fault-tolerant communications network was developed for use in industrial distributed measurement systems, where a loop controller supervises up to 64 addressable field devices with a network polling period of 250 ms. Safety and reliability were optimized using fibre-optic data links and low-power circuitry throughout. Based on a highly redundant loop topology of two receiver/two transmitter communications nodes, the network can tolerate any double node or any quadruple link failure. Each node circuit is designed to operate continuously for five years using a standard D-type lithium cell, and consists essentially of a CMOS single-chip microcomputer, a specially designed CMOS communications interface chip, some analogue circuitry for the optical receivers and transmitters, and interfaces for a sensor/actuator and roving hand-held terminal. The communications interface was implemented on a 2436-cell CMOS gate array and features a self-test facility which provides over 86% fault coverage using only three test vectors. The chip can also be used in the loop controller. Control procedures were developed to detect, locate, and reconfigure around faults that occur in the communications network.

Gater, C.

1987-01-01

11

Fault tolerance in multiprocessor systems without dedicated redundancy  

Energy Technology Data Exchange (ETDEWEB)

This paper describes an algorithm called RAFT (recursive algorithm for fault tolerance) for achieving fault tolerance in multiprocessor systems. Through the use of a combination of dynamic space and time redundancy techniques, RAFT achieves fault tolerance in the presence of permanent as well as intermittent faults. Performance and reliability of a multiprocessor system using RAFT are determined as a function of individual processor reliability and the total number of fault modes in a processor. RAFT-based systems are superior to TMR systems in hardware economy and provide comparable reliability. A multiprocessor architecture adopting RAFT is given.

Agrawal, P.

1988-03-01

12

Fault-tolerant actuator system for electrical steering of vehicles  

DEFF Research Database (Denmark)

Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

Thomsen, Jesper Sandberg; Blanke, Mogens

2006-01-01

13

Fault-tolerant Actuator System for Electrical Steering of Vehicles  

DEFF Research Database (Denmark)

Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

Sørensen, Jesper Sandberg; Blanke, Mogens

2006-01-01

14

From fault classification to fault tolerance for multi-agent systems  

CERN Multimedia

Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

Potiron, Katia; Taillibert, Patrick

2013-01-01

15

An integrated study of fault tolerance in computing systems  

Energy Technology Data Exchange (ETDEWEB)

A general framework for the design and analysis of distributed fault-tolerant systems is proposed including fault/error occurrence and detection, error propagation, fault location, retry, system reconfiguration, damage assessment, and error recovery. Detection mechanisms are usually assumed to be so perfect that problems within a particular phase of fault tolerance can be studied without considering its interplay with other phases. This dissertation shows that the assumption of imperfect detection mechanisms will greatly influence fault diagnosis, rollback recovery, and checkpointing. Two additional related problems are studied. One is concerned with the use of retry following a fault detection and the other with the optimal placement of checkpoints in a real-time task with or without the perfect detection assumption. A fault-classification scheme is developed for on-line estimation of fault parameters.

Lin, Tein-Hsiang.

1988-01-01

16

Intelligent System for Parallel Fault-Tolerant Diagnostic Tests Construction  

Directory of Open Access Journals (Sweden)

This investigation deals with an intelligent system for the construction of parallel fault-tolerant diagnostic tests (DTs). A modified parallel algorithm for fault-tolerant diagnostic test construction is proposed. The algorithm makes it possible to optimize the processing time of test construction. A matrix model of data and knowledge representation, as well as various kinds of regularities in data and knowledge, are presented. An applied intelligent system for diagnosing the mental health of the population, developed using the intelligent system for parallel fault-tolerant DT construction, is suggested.

Anna Yankovskaya; Sergei Kitler

2013-01-01

17

Active Fault Tolerant Control of Livestock Stable Ventilation System  

DEFF Research Database (Denmark)

Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems of livestock buildings in three steps: (1) deriving a model for the climate control system of a pig stable; (2) designing an active fault diagnosis (AFD) algorithm for different kinds of fault; (3) designing a fault tolerant control scheme for the climate control system. In the first step, a conceptual multi-zone model for climate control of a livestock building is derived. In the next step, two methods for active fault diagnosis are proposed. The AFD methods excite the system by injecting a so-called excitation input. Two different algorithms, the EKF and a new adaptive filter, are used to detect the faults. The fault tolerant controller (FTC) is based on a switching scheme between a set of predefined passive fault tolerant controllers (PFTCs). In the FTC part of the thesis, first a passive fault tolerant controller (PFTC) based on state feedback is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. Then the PFTC problem is reformulated as the feasibility of a set of linear matrix inequalities (LMIs).

Gholami, Mehdi

2011-01-01

18

Fault-tolerant computation with higher-dimensional systems  

Energy Technology Data Exchange (ETDEWEB)

Instead of a quantum computer where the fundamental units are 2-dimensional qubits, the author can consider a quantum computer made up of d-dimensional systems. There is a straightforward generalization of the class of stabilizer codes to d-dimensional systems, and he will discuss the theory of fault-tolerant computation using such codes. He proves that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d.

Gottesman, D.

1998-07-01
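
For orientation, stabilizer codes on $d$-dimensional systems are usually built from generalized Pauli operators. The definitions below are the standard ones and are stated here as background, not quoted from the cited abstract; the symbols $X$, $Z$ and $\omega$ are notation chosen for this note.

```latex
\[
  X\,|j\rangle = |j+1 \bmod d\rangle, \qquad
  Z\,|j\rangle = \omega^{j}\,|j\rangle, \qquad
  \omega = e^{2\pi i/d}, \quad j = 0, 1, \dots, d-1 .
\]
\[
  ZX\,|j\rangle = \omega^{\,j+1}\,|j+1 \bmod d\rangle
  = \omega\, XZ\,|j\rangle
  \quad\Longrightarrow\quad ZX = \omega\, XZ .
\]
```

For $d = 2$ this reduces to the familiar qubit relation $ZX = -XZ$.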

19

Fault tolerant decentralized H∞ control for symmetric composite systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This note discusses a class of large-scale systems composed of symmetrically interconnected identical subsystems. We consider the decentralized H∞ control design problem and study the fault tolerance of the resulting system. By exploiting the special structure of the systems, a sufficient condition ...

Huang, S; Lam, J; Yang, GH; Zhang, S

20

Design of fault tolerant control system for steam generator using  

Energy Technology Data Exchange (ETDEWEB)

A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks controller-induced and sensor-induced performance to identify which part, a controller or a sensor, is faulty. To analyze controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. The fuzzy weighting modulator generates an output signal compensated for a faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effects of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases further. 2 refs., 9 figs., 1 tab. (Author)

Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

1998-12-31

 
 
 
 
21

Fault-Tolerant Architecture for High Performance Embedded System Applications  

UK PubMed Central (United Kingdom)

The architecture of a fault-tolerant embedded computer system is presented. It employs multiple processors for high performance and dual-port memory units for interprocessor communication. The high performance embedded computer (HPEC) system consists of five processors that are partitioned into two sets, namely the computing and IO partitions. The computing partition is concerned with computationally intensive tasks and it consists of three worker processors. The IO partition performs general-purpose and real-time I/O related tasks. It has two interface processors with high-speed I/O and fast interrupt capabilities. The processor cores for these partitions are selected according to computational and high-speed I/O functions. The HPEC system size can be adjusted for varying needs of computing and real-time I/O without affecting the basic architecture features. The HPEC architecture is fault-tolerant in terms of fault containment and isolation of faulty units. Reliability modeling and analysis of the system indicates that it degrades gracefully under different fault scenarios.

Gul N. Khan

22

Fault Tolerance by Replication in Parallel System  

Directory of Open Access Journals (Sweden)

In this paper the author concentrates on the architecture of a cluster computer and its operation in the context of parallel paradigms. The author's interest is in guaranteeing that a node works efficiently and that the data on it is available at any time to run tasks in parallel. Applications may face resource faults during execution, and must dynamically prepare for, and recover from, the expected failure. Typically, checkpointing is used to minimize the loss of computation. Checkpointing is a purely local strategy, but can be very costly. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. The author suggests replication as an alternative technique. Replication has been studied for parallel databases in general. The author has worked on parallel execution of a task on a node; if the node fails, a self-protecting feature should be turned on. Self-protecting in this context means that computer clusters should detect and handle failures automatically with the help of replication.

Madhavi Vaidya

2011-01-01

23

On fault-tolerant mechanisms in distributed systems  

Energy Technology Data Exchange (ETDEWEB)

For a system of communicating processes to be able to recover from hardware failures, there should be adequate mechanisms for process migration, checkpointing and recovery. In this dissertation, protocols are developed for each one of these aspects of fault-tolerance. A set of processes can communicate efficiently with each other if the underlying interconnection structure matches the communication pattern of the processes. In a fault-tolerant distributed system, this requirement should be satisfied even when node failures occur. The interconnection structure of a distributed system is represented by a graph $G_p$, where the nodes of the graph represent the processors and the edges of the graph represent communication links between the processors. For a given distributed system $G_p$ with $m$ nodes, an interconnection structure represented by a graph $G_n$ with $m + k$ nodes is said to be a $k$-fault-tolerant structure of $G_p$ if $G_n - F$ (where $F$ is a subset of the nodes in $G_n$ with $\mathrm{card}(F) \le k$) has a subgraph isomorphic to $G_p$. Fault-tolerant structures for systems where $G_p$ has a loop, star or star-loop structure are presented. A reconfiguration strategy is a procedure used to migrate the processes on a failed node to a spare node. Reconfiguration strategies for loop and star systems are presented. A checkpointing and a recovery protocol are also presented for fault-tolerant distributed systems. It is shown that this checkpointing protocol requires only a minimum number of processes to save their states during each checkpointing instance. It is also shown that the recovery protocol requires only a minimum number of additional processes to roll back, following the failure of a process. The proposed protocols are non-intrusive in the sense that they do not require the processes to stop their computational activity during checkpointing or recovery.

Israel, S.R.

1988-01-01
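
The definition of a k-fault-tolerant structure in the abstract above can be checked directly on small graphs. The Python sketch below does so by brute force: it removes every node subset of size at most k from the larger graph and searches for a copy of the target graph among the surviving nodes. The function names are hypothetical, graphs are given as node lists and undirected edge lists, and the exhaustive search is only practical for a handful of nodes; it is not one of the constructions presented in the dissertation.

```python
from itertools import combinations, permutations


def contains_subgraph(host_nodes, host_edges, pattern_nodes, pattern_edges):
    """True if the host graph has a subgraph isomorphic to the pattern."""
    host_edge_set = {frozenset(edge) for edge in host_edges}
    pattern_nodes = list(pattern_nodes)
    # Try every injective placement of pattern nodes onto host nodes.
    for image in permutations(host_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        if all(frozenset((mapping[u], mapping[v])) in host_edge_set
               for u, v in pattern_edges):
            return True
    return False


def is_k_fault_tolerant(gn_nodes, gn_edges, gp_nodes, gp_edges, k):
    """Check the definition: every failure set F with card(F) <= k must
    leave a subgraph of G_n - F isomorphic to G_p."""
    for size in range(k + 1):
        for failed in combinations(gn_nodes, size):
            failed = set(failed)
            alive = [n for n in gn_nodes if n not in failed]
            alive_edges = [(u, v) for u, v in gn_edges
                           if u not in failed and v not in failed]
            if not contains_subgraph(alive, alive_edges, gp_nodes, gp_edges):
                return False
    return True
```

As a usage example, a complete graph on five nodes is a 1-fault-tolerant structure for a four-node loop, whereas a plain five-node loop is not, because removing any one node leaves only a path.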

24

Fault tolerance of the NIF power conditioning system  

International Nuclear Information System (INIS)

The tolerance of the circuit topology proposed for the National Ignition Facility (NIF) power conditioning system to specific fault conditions is investigated. A new pulsed power circuit is proposed for the NIF which is simpler and less expensive than previous ICF systems. The inherent fault modes of the new circuit are different from the conventional approach, and must be understood to ensure adequate NIF system reliability. A test-bed which simulates the NIF capacitor module design was constructed to study the circuit design. Measurements from test-bed experiments with induced faults are compared with results from a detailed circuit model. The model is validated by the measurements and used to predict the behavior of the actual NIF module during faults. The model can be used to optimize fault tolerance of the NIF module through an appropriate distribution of circuit inductance and resistance. The experimental and modeling results are presented, and fault performance is compared with the ratings of pulsed power components. Areas are identified which require additional investigation.

1995-06-02

25

Testing Virtual Reconfigurable Circuit Designed For A Fault Tolerant System  

Directory of Open Access Journals (Sweden)

This research describes the testing of a virtual reconfigurable circuit (VRC) designed and implemented for a fault tolerant system that averages three sensor inputs. The circuits to be tested are those successfully evolved in this system under different situations: (i) all three sensors are faultless, (ii) one of the input sensors fails open, and (iii) sensors fail as a short circuit. The objective of this research is to test the desired optimal circuits evolved by decoding the configuration bit streams. The logic simulation tool used to perform fault simulation is AUSIM (Auburn University Simulator).

P. N. Kumar; S. Anandhi; M. Elancheralathan; J. R.P. Perinbam

2007-01-01

26

OPTIMAL CHOICE WITHIN A FAULT TOLERANT FLIGHT CONTROL SYSTEM  

Directory of Open Access Journals (Sweden)

Safety of aircraft during flight is one of the most important problems concerning all of aviation. Failures or faults in the main elements of the automatic control system, and damage to the external contour of the aircraft by foreign objects, always lead to a change in the characteristics of the aircraft, to direct and indirect economic costs, and sometimes to injury or death of passengers and crew. A real-time active fault tolerant control system makes it possible to warn of or prevent emergency situations and thus improve safety.

Vasily Kazak; Dmitriy Shevchuk; Sergiy Bugryk; Yuri Smerechynskyy

2013-01-01

27

Computer Aided Design of Fault-Tolerant VLSI Systems  

UK PubMed Central (United Kingdom)

An ever increasing demand for affordable on-chip fault-tolerance, the inherent unreliability attendant upon very large scale integration (VLSI), and the overwhelming complexity of fault-tolerance have elevated the automatic design of fault-tolerant VLSI systems into a research problem of immediate practical relevance. In this paper, we will outline (i) a flexible methodology for compiling an algorithmic description into an equivalent fault-tolerant VLSI IC subject to an application specific policy for fault-tolerance and (ii) a framework that embodies this methodology. The framework subsumes algorithms for synthesizing self-recovering, fault-secure, and reliable VLSI ICs from high-level algorithmic descriptions. Keywords: Fault-Tolerance, High Level Synthesis, CAD. 1 Introduction. The rapidly emerging trend towards very large scale integrated circuit implementation of crucial tasks in life-critical, mission-critical, and safety-critical applications (such as automobile/process contro...

Ramesh Karri; Karin Hogstedt; Alex Orailoglu

28

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

Directory of Open Access Journals (Sweden)

The electric vehicle drive system is a multi-variable system whose operating environment is complex and changeable, so its failure modes are complicated. In this paper, a vehicle fault table is established according to the position at which each fault occurs, and the possible consequences and causes of failure are analyzed. Taking into account hardware limitations and the requirement to preserve system performance as far as possible, a passive software-redundancy fault-tolerant strategy is put forward, and an example is given to analyze the pros and cons of this method.

Zhang Liwei; Huang Xianjin; Yang Yannan; Xu Chen; Liu Jie

2013-01-01

29

Design of defect/fault-tolerant, testable VLSI systems  

Energy Technology Data Exchange (ETDEWEB)

This dissertation addresses three areas. It proposes a simple methodology to quantify the metrics of area, performance, testability, and yield of a VLSI system. Change in these metrics when the design is modified to make it more testable or defect/fault-tolerant represent the cost/performance of that technique. The Tree Random Access Memory Architecture, is then presented. This is a methodology for the design of future multi-megabit Dynamic Random Access Memories so that they are easily testable, have good performance, low refresh time, and are defect/fault-tolerant. The increase in area is compensated by enhanced yield. Finally, several related issues are discussed: extensions of the TRAM architecture, wafer-scale memory systems, testing encoded memories, evaluating the cost/performance of using partial scan as a design for testability, developing benchmark models for other architectures, and integrating the modeling techniques presented.

Jarwala, N.T.

1988-01-01

30

A Game-theoretic Approach for Synthesizing Fault-Tolerant Embedded Systems  

CERN Document Server

In this paper, we present an approach for fault-tolerant synthesis by combining predefined patterns for fault-tolerance with algorithmic game solving. A non-fault-tolerant system, together with the relevant fault hypothesis and fault-tolerant mechanism templates in a pool, is translated into a distributed game, and we perform an incomplete search of strategies to cope with undecidability. The result of the game is translated back to executable code concretizing fault-tolerant mechanisms using constraint solving. The overall approach is implemented in a prototype tool chain and is illustrated using examples.

Cheng, Chih-Hong; Knoll, Alois; Buckl, Christian

2010-01-01

31

A Fault Tolerant Mobile Agent Information Retrieval System  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: Most information retrieval systems used only client-server architectures. The client-server model, though powerful, had some limitations. In a mobile computing environment, which has both wired and wireless networks with limited communication capabilities, the performance of the system was very low. Approach: Mobile agents are considered a suitable technology for developing applications such as information retrieval systems for mobile computing environments. Mobile agents are autonomous and dynamic entities that can migrate between various nodes in the network. They offer many advantages over traditional design methodologies, such as reduced network load, tolerance of network latency and support for disconnected operation. Since mobile agents do not need continuous communication with the mobile host, they are not affected by sudden disconnection of the wireless network or by the mobile host being turned off to save power. To get the complete benefit of a mobile agent system, the system must be fault tolerant. In the context of mobile agents, fault tolerance prevents a partial or complete loss of the agent. Results: Our system in a mobile computing environment ensured that the agent arrived at its destination with its result, and the performance of the system improved through a reduction in response time. The system also allowed more requests to be sent by creating many mobile agents without affecting performance. Conclusion: Our research compared the performance of the client-server architecture and the fault tolerant mobile agent information retrieval system and proved that our system solved the limitations faced by the client-server architecture. The system can also be extended to ad hoc networks.

R. Punithavathi; K. Duraiswamy

2010-01-01

32

Fault-tolerant reactor protection system  

Energy Technology Data Exchange (ETDEWEB)

A reactor protection system is disclosed having four divisions, with quad redundant sensors for each scram parameter providing input to four independent microprocessor-based electronic chassis. Each electronic chassis acquires the scram parameter data from its own sensor, digitizes the information, and then transmits the sensor reading to the other three electronic chassis via optical fibers. To increase system availability and reduce false scrams, the reactor protection system employs two levels of voting on a need for reactor scram. The electronic chassis perform software divisional data processing, vote 2/3 with spare based upon information from all four sensors, and send the divisional scram signals to the hardware logic panel, which performs a 2/4 division vote on whether or not to initiate a reactor scram. Each chassis makes a divisional scram decision based on data from all sensors. Each division performs independently of the others (asynchronous operation). All communications between the divisions are asynchronous. Each chassis substitutes its own spare sensor reading in the 2/3 vote if a sensor reading from one of the other chassis is faulty or missing. Therefore the presence of at least two valid sensor readings in excess of a set point is required before terminating the output to the hardware logic of a scram inhibition signal even when one of the four sensors is faulty or when one of the divisions is out of service. 16 figs.

Gaubatz, D.C.

1997-04-15
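
The two-level voting described in the abstract above can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: each division is taken to vote 2-out-of-3 on the readings received from the other three chassis, its own sensor acts as the spare for at most one bad reading, and `None` marks a faulty or missing reading. The names `divisional_vote` and `scram_required` are hypothetical, and divisions taken out of service are modeled only as missing readings.

```python
from typing import List, Optional, Sequence

Reading = Optional[float]  # None marks a faulty or missing reading (assumption)


def divisional_vote(own: Reading, others: Sequence[Reading], set_point: float) -> bool:
    """First level: 2-out-of-3 vote with spare inside one chassis."""
    readings = list(others)
    if own is not None:
        # Substitute the chassis's own (spare) reading for one bad reading.
        for i, value in enumerate(readings):
            if value is None:
                readings[i] = own
                break
    exceeding = sum(1 for v in readings if v is not None and v > set_point)
    return exceeding >= 2


def scram_required(sensors: Sequence[Reading], set_point: float) -> bool:
    """Second level: 2-out-of-4 vote over the four divisional decisions."""
    votes: List[bool] = []
    for i, own in enumerate(sensors):
        others = [r for j, r in enumerate(sensors) if j != i]
        votes.append(divisional_vote(own, others, set_point))
    return sum(votes) >= 2


if __name__ == "__main__":
    print(scram_required([10.0, 1.0, 1.0, 1.0], 5.0))   # one high reading only
    print(scram_required([None, 10.0, 10.0, 1.0], 5.0))  # one faulty, two high
```

In this sketch a single spurious high reading never produces a scram, while two valid readings above the set point do, even with one sensor missing.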

33

Fault-tolerant reactor protection system  

Energy Technology Data Exchange (ETDEWEB)

A reactor protection system having four divisions, with quad redundant sensors for each scram parameter providing input to four independent microprocessor-based electronic chassis. Each electronic chassis acquires the scram parameter data from its own sensor, digitizes the information, and then transmits the sensor reading to the other three electronic chassis via optical fibers. To increase system availability and reduce false scrams, the reactor protection system employs two levels of voting on a need for reactor scram. The electronic chassis perform software divisional data processing, vote 2/3 with spare based upon information from all four sensors, and send the divisional scram signals to the hardware logic panel, which performs a 2/4 division vote on whether or not to initiate a reactor scram. Each chassis makes a divisional scram decision based on data from all sensors. Each division performs independently of the others (asynchronous operation). All communications between the divisions are asynchronous. Each chassis substitutes its own spare sensor reading in the 2/3 vote if a sensor reading from one of the other chassis is faulty or missing. Therefore the presence of at least two valid sensor readings in excess of a set point is required before terminating the output to the hardware logic of a scram inhibition signal even when one of the four sensors is faulty or when one of the divisions is out of service.

Gaubatz, Donald C. (Cupertino, CA)

1997-01-01

34

Fault-tolerant reactor protection system  

International Nuclear Information System (INIS)

[en] A reactor protection system is disclosed having four divisions, with quad redundant sensors for each scram parameter providing input to four independent microprocessor-based electronic chassis. Each electronic chassis acquires the scram parameter data from its own sensor, digitizes the information, and then transmits the sensor reading to the other three electronic chassis via optical fibers. To increase system availability and reduce false scrams, the reactor protection system employs two levels of voting on a need for reactor scram. The electronic chassis perform software divisional data processing, vote 2/3 with spare based upon information from all four sensors, and send the divisional scram signals to the hardware logic panel, which performs a 2/4 division vote on whether or not to initiate a reactor scram. Each chassis makes a divisional scram decision based on data from all sensors. Each division performs independently of the others (asynchronous operation). All communications between the divisions are asynchronous. Each chassis substitutes its own spare sensor reading in the 2/3 vote if a sensor reading from one of the other chassis is faulty or missing. Therefore the presence of at least two valid sensor readings in excess of a set point is required before terminating the output to the hardware logic of a scram inhibition signal even when one of the four sensors is faulty or when one of the divisions is out of service. 16 figs

1995-07-14

35

Task allocation and reallocation for fault tolerance in multicomputer systems  

Energy Technology Data Exchange (ETDEWEB)

The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and reduce the job turnaround time. A simple but effective method is presented to allocate tasks in multicomputer systems so as to minimize the interprocessor communication cost subject to resource limitations defined by the system and the designer. The authors demonstrate the effectiveness of task allocation and reallocation for hardware fault tolerance by applying the methods to different examples and to practical communications network multiprocessor systems. 27 refs.

Chen, C.H.; Cherkassky, V.

1994-10-01
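
As a rough illustration of allocation and reallocation under resource limits, the sketch below uses a generic greedy heuristic. It is not the authors' method, which the abstract does not specify. The data layout (task loads, processor capacities, pairwise communication costs) and the function names `allocate` and `reallocate` are assumptions made for this example.

```python
from typing import Dict, FrozenSet, Optional


def allocate(tasks: Dict[str, int], capacity: Dict[str, int],
             comm: Dict[FrozenSet[str], int],
             placement: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Greedily place each unplaced task on the processor that adds the least
    interprocessor communication cost while respecting capacity."""
    placement = dict(placement or {})
    free = dict(capacity)
    for task, proc in placement.items():
        free[proc] -= tasks[task]
    for task in sorted(tasks, key=tasks.get, reverse=True):  # heaviest first
        if task in placement:
            continue

        def added_cost(proc: str) -> int:
            # Cost of every link to an already placed partner on another processor.
            return sum(cost for pair, cost in comm.items()
                       if task in pair
                       and any(placement.get(o) not in (None, proc)
                               for o in pair - {task}))

        candidates = [p for p in free if free[p] >= tasks[task]]
        if not candidates:
            raise ValueError(f"no processor can host task {task!r}")
        best = min(candidates, key=lambda p: (added_cost(p), p))
        placement[task] = best
        free[best] -= tasks[task]
    return placement


def reallocate(tasks, capacity, comm, placement, failed):
    """After a processor failure, keep surviving placements and re-place the rest."""
    survivors = {t: p for t, p in placement.items() if p != failed}
    remaining = {p: c for p, c in capacity.items() if p != failed}
    return allocate(tasks, remaining, comm, survivors)


if __name__ == "__main__":
    tasks = {"a": 2, "b": 2, "c": 1}
    capacity = {"P1": 4, "P2": 4, "P3": 4}
    comm = {frozenset({"a", "b"}): 5, frozenset({"b", "c"}): 1}
    first = allocate(tasks, capacity, comm)
    print(first)
    print(reallocate(tasks, capacity, comm, first, failed="P1"))
```

Because the heuristic is greedy, the placement after a failure is feasible but not necessarily the one with minimum communication cost.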

36

A Fully Automated Fault-tolerant System for Distributed  

UK PubMed Central (United Kingdom)

Different fields including biomedical-engineering, educational research and geology have an increasing need to process large amounts of video and make them electronically available at different locations. So far, this has been a failure-prone tedious operation with an operator needed to babysit the processing and off-site replication of processed video. In this work, we developed a fault-tolerant system that handles large scale processing and replication of digital video in a fully automated manner. The system is highly resilient and handles a variety of hardware, software and network failures making it possible to process videos using commodity clusters or grid resources. Finally, we discuss how the system is being used in educational research to process several hundred terabytes of video. Categories and Subject Descriptors: D.1.3 [Software]: Programming Techniques - Concurrent Programming. General Terms: Performance, Design, Reliability, Experimentation. Key Words: Video processing, data pipelines, distributed systems, clusters, off-site replication, fault-tolerance, grid, educational research.

George Kola; Tevfik Kosar; Miron Livny

37

A Detailed Review of Fault-Tolerance Techniques in Distributed System  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper, we give a survey of various fault tolerance techniques and related issues in distributed systems. More specifically, we discuss the two most important issues: multiple fault handling capability and performance. This survey presents the related research results, explores future directions for fault tolerance techniques, and is a good reference for researchers.

Sanjay Bansal; Sanjeev Sharma; Ishita Trivedi

2011-01-01

38

Fault Tolerant Feedback Control  

DEFF Research Database (Denmark)

An architecture for fault tolerant feedback controllers based on the Youla parameterization is suggested. It is shown that the Youla parameterization will give a residual vector directly in connection with the fault diagnosis part of the fault tolerant feedback controller. It turns out that there is a separation between the feedback controller and the fault tolerant part. The closed loop feedback properties are handled by the nominal feedback controller and the fault tolerant part is handled by the design of the Youla parameter. The design of the fault tolerant part will not affect the design of the nominal feedback controller.

Stoustrup, Jakob; Niemann, H.

2001-01-01

39

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: The method proposed in the present study deals with fault tolerant control by applying decentralized control theory with decoupled sliding mode control, treating subsystems instead of the whole system; to the knowledge of the authors, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of a sliding surface, which should be partitioned piecewise so that PwLTool can be applied, in our case to delimit the regions where sliding mode occurs. Results: We obtain a simple linearized model in those regions that can depict the complex system. Conclusion: With a three-tank water level system as an example, we implement this new design scenario, and since we are interested in networked control systems, we believe that this kind of controller implementation will not be affected by network delays.

Rafi Youssef; Hui Peng

2010-01-01

40

Abstractions for Fault-Tolerance  

UK PubMed Central (United Kingdom)

Abstractions for Fault-Tolerance. Flaviu Cristian, Computer Science and Engineering, University of California, San Diego, CA 92093-0114. Designing and understanding fault-tolerant distributed system architectures is notoriously difficult: one has to maintain control not only over standard (failure-free) behaviors, but also over a multitude of failure behaviors caused by component failures. The lack of clear structuring concepts and terminology can exacerbate this difficulty. This paper complements earlier attempts at introducing some order and discipline in this area [5], by discussing a number of basic concepts and services that simplify the understanding and design of fault-tolerant systems. Fault-tolerance has two different meanings. First, a system is said to be fault-tolerant if its behavior remains well-defined when components fail. For example, a storage service that either reads correctly a value written previously or signals an exception is fault-tolerant in the above sense: low level bit corr...

Flaviu Cristian; Computer Science

 
 
 
 
41

Design and analysis of reliable and fault-tolerant computer systems  

CERN Document Server

Covering both the theoretical and practical aspects of fault-tolerant mobile systems, and fault tolerance and analysis, this book tackles the current issues of reliability-based optimization of computer networks, fault-tolerant mobile systems, and fault tolerance and reliability of high speed and hierarchical networks. The book is divided into six parts to facilitate coverage of the material by course instructors and computer systems professionals. The sequence of chapters in each part ensures the gradual coverage of issues from the basics to the most recent developments. A useful set of refere

Abd-El-Barr, Mostafa

2006-01-01

42

A fault tolerant superheat control strategy for supermarket refrigeration systems  

DEFF Research Database (Denmark)

In this paper, a fault tolerant control (FTC) strategy is proposed for evaporator superheat control in supermarket refrigeration systems. Conventional control uses a pressure sensor and a temperature sensor for this purpose; however, the pressure sensor can fail to function. A contingency control strategy, based on a maximum slope-seeking control method and only a single temperature sensor, is developed to drive the evaporator outlet temperature to a level that gives a suitable superheat of the refrigerant. The FTC strategy requires no a priori system knowledge or additional hardware and functions in a plug & play fashion. The strategy is outlined by means of procedural steps as well as a flow chart that also illustrates the process of automatic tuning of the maximum slope-seeking controller. Test results are furthermore presented for a display case in a full scale CO2 supermarket refrigeration system.

Vinther, Kasper; Izadi-Zamanabadi, Roozbeh

2013-01-01

43

Fault-tolerant for Electric Vehicles Drive System Sensor Failure  

Directory of Open Access Journals (Sweden)

Full Text Available When an EV failure happens, a fault-tolerant method is needed to ensure people's safety. When the current sensor and speed sensor fail, a software fault-tolerant control strategy that switches control algorithms can be used. This paper presents a theoretical analysis of switching from the rotor field-oriented vector control algorithm to the open-loop constant V/F control algorithm; a phase angle compensation method is used to reduce the current and torque shock, and simulation is carried out in MATLAB/Simulink.

Zhang Liwei; Xu Chen; Liu Jie; Wu Jialong

2013-01-01

44

Boolean Logic with Fault Tolerant Coding  

CERN Multimedia

Error-detecting and error-correcting coding in Hamming space was studied to discover possible fault tolerant coding constellations that can implement a Boolean lattice with the fault tolerance property. Basic logic operators of Boolean algebra were developed to apply fault tolerant coding in logic circuits. It was shown that applying three-bit fault tolerant codes gives the digital system an auto-recovery capability without the need to design additional fault tolerance mechanisms.

Alagoz, B Baykant

2009-01-01
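
To make the idea of logic operators on three-bit fault tolerant codes concrete, here is a minimal Python sketch. It assumes the simplest such code, the triple-repetition code (0 as 000, 1 as 111) with nearest-codeword decoding. The abstract does not say which coding constellation the paper uses, so this choice and all function names are illustrative assumptions.

```python
from typing import Tuple

Word = Tuple[int, int, int]


def encode(bit: int) -> Word:
    """Map a logical bit to its three-bit codeword (assumed repetition code)."""
    return (bit, bit, bit)


def decode(word: Word) -> int:
    """Nearest-codeword decoding in Hamming space: a majority of the three bits."""
    return 1 if sum(word) >= 2 else 0


def ft_and(a: Word, b: Word) -> Word:
    """AND on codewords; any single bit error per input is corrected first."""
    return encode(decode(a) & decode(b))


def ft_or(a: Word, b: Word) -> Word:
    return encode(decode(a) | decode(b))


def ft_not(a: Word) -> Word:
    return encode(1 - decode(a))


def flip(word: Word, index: int) -> Word:
    """Inject a single bit error at the given position."""
    bits = list(word)
    bits[index] ^= 1
    return (bits[0], bits[1], bits[2])


if __name__ == "__main__":
    damaged_one = flip(encode(1), 0)          # (0, 1, 1)
    print(ft_and(damaged_one, encode(1)))     # still decodes as logical 1 AND 1
```

Each operator re-encodes its result, so a single bit error on an input does not propagate to the output; this is the auto-recovery behavior under the assumed code.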

45

Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis  

DEFF Research Database (Denmark)

In this paper we design a passive fault tolerant controller against actuator faults for discrete-time piecewise affine (PWA) systems. By using dissipativity theory and H∞ analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In the current paper, the PWA system switches not only due to the state but also due to the control input. The method is applied to a large scale livestock ventilation model.

Gholami, Mehdi; Cocquempot, Vincent

2011-01-01

46

An Algebra of Fault Tolerance  

CERN Multimedia

Every system of any significant size is created by composition from smaller sub-systems or components. It is thus fruitful to analyze the fault-tolerance of a system as a function of its composition. In this paper, two basic types of system composition are described, and an algebra to describe fault tolerance of composed systems is derived. The set of systems forms monoids under the two composition operators, and a semiring when both are concerned. A partial ordering relation between systems is used to compare their fault-tolerance behaviors.

Rao, Shrisha

2009-01-01

47

Tolerant control for multiple faults of sensors in VAV systems  

International Nuclear Information System (INIS)

[en] Principal component analysis, joint angle plots and reconstruction schemes are presented in this paper to detect, isolate and evaluate multiple sensor faults occurring in variable air volume (VAV) systems. Multi-level principal component analysis models, including system level and local level, are built to detect multiple faults occurring in VAV systems. For the initial detection, a system level model is used to discover abnormalities from the viewpoint of the whole system. Two local level models are used to further confirm the occurrence of the faults. Moreover, with the multiple faults separated into different locations by the two local level detection models, joint angle plots are used to isolate the faults one by one. Finally, the reconstruction scheme is used to estimate the magnitude of the bias in order to recover from the faulty operation.

2007-01-01

48

Synthesis of Fault-Tolerant Embedded Systems with Checkpointing and Replication  

DEFF Research Database (Denmark)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes are statically scheduled and communications are performed using the time-triggered protocol. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that transient faults are tolerated and the timing constraints of the application are satisfied. We present several synthesis algorithms which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

Izosimov, Viacheslav; Pop, Paul

2006-01-01
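
The checkpointing with rollback recovery mentioned in the abstract above can be sketched in a few lines of Python. This covers only the recovery mechanism itself, not the paper's scheduling, replication or checkpoint placement. The fault-injection set `faults`, the retry limit and the function name are hypothetical, introduced purely to make the example runnable.

```python
import copy
from typing import Callable, List, Set, Tuple


def run_with_checkpoints(steps: List[Callable], state,
                         faults: Set[Tuple[int, int]],
                         max_retries: int = 3):
    """Run the steps in order, saving a checkpoint after each one.

    `faults` holds (step index, attempt number) pairs at which a transient
    fault is assumed to be detected; the work of that attempt is discarded
    and the step is re-executed from the last checkpoint.
    """
    checkpoint = copy.deepcopy(state)
    rollbacks = 0
    for index, step in enumerate(steps):
        attempt = 0
        while True:
            candidate = step(copy.deepcopy(checkpoint))
            if (index, attempt) in faults:
                # Error detected: roll back by dropping the candidate state.
                rollbacks += 1
                attempt += 1
                if attempt > max_retries:
                    raise RuntimeError(f"step {index} failed permanently")
                continue
            checkpoint = candidate  # commit a new checkpoint
            break
    return checkpoint, rollbacks


if __name__ == "__main__":
    steps = [lambda s: s + [1], lambda s: s + [2], lambda s: s + [3]]
    print(run_with_checkpoints(steps, [], faults={(1, 0), (1, 1)}))
```

Each re-execution costs time, which is why the number and placement of checkpoints matter when timing constraints must be met.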

49

Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System  

Directory of Open Access Journals (Sweden)

Full Text Available This paper presents an enhanced robust control design structure to realise fault tolerance towards sensor faults, suitable for implementation in multi-input-multi-output (MIMO) systems. The proposed design permits the fault detection and controller elements to be designed on a common mathematical platform, with consideration given to stability and to robustness towards uncertainties as well as a multiple-fault environment. This framework can also cater to systems requiring fast responses. A design example is illustrated with a fast, multivariable and unstable system, that is, the double inverted pendulum system. Results indicate the potential of this design framework to handle fast systems with multiple sensor faults.

S.S. Yang; J. Chen

2008-01-01

50

Fault tolerant EHA architectures  

Science.gov (United States)

An evaluation is conducted of fault-tolerant electrohydrostatic actuator (EHA) architectures applicable to prospective military aircraft, defining fault tolerances in terms of mission-success probability and safety reliability. The functional-level failure modes of an EHA and its interfacing equipment are used to analyze levels of fault coverage and redundancy required by MIL-F-9490 and MIL-STD 882B. A summary is presented of estimates of fault tolerance, performance, and weight of candidate EHA architectures, to allow selection of an architecture suited for a specific application.

Sadeghi, Tom; Lyons, Arthur

1992-03-01

51

Fault tolerant control for nonlinear systems described by Takagi-Sugeno models  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper the problem of active fault tolerant control (FTC) in noisy systems is studied. The proposed FTC strategy is based on knowledge of the fault estimate and the error between the faulty system state and a reference system state. A proportional integral observer is used in order to estimat...

Kheder, Atef; Ben Othman, Kamel; Benrejeb, Mohamed; Maquin, Didier

52

Transient Fault Tolerance and System Safety Enhancement Based on System Theory  

Directory of Open Access Journals (Sweden)

Full Text Available Transient faults are hard to detect and locate due to their unpredictable nature and short duration, and they are the dominant cause of system failures, which makes it necessary to consider transient fault-tolerant design in the development of modern safety-critical industrial systems. In this paper an approach based on system theory is proposed to tolerate transient faults in tunnel construction wireless monitoring and control systems (TCWMCS), in which the effects of transient faults are expressed as dysfunctional interactions among software applications. After analyzing the dysfunctional interactions of the system with the operational process model and identifying the causes of dysfunction in the functional control diagram, a safety enhancement approach is proposed for designers, in which effective safety constraints are set up to tolerate the transient faults. The experimental evaluation indicated that the effects of transient faults could be exposed by the causal factors of dysfunctional interactions and that system safety could be enhanced by enforcing appropriate constraints.

Xiongfeng Huang; Chunjie Zhou; Yuanqing Qin; Ye Wang; Mingyue Yang

2011-01-01

53

Towards fault-tolerant decision support systems for ship operator guidance  

DEFF Research Database (Denmark)

Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis when the ship responses are assumed to be linear in the wave excitation. A novel numerical procedure is described for the calculation of residuals using the ship's transfer functions, which correlate the wave excitation and the ship responses. As tests, multiplicative faults have been artificially imposed on full-scale motion measurements, and it is shown that the developed model is able to detect and isolate all faults.

Nielsen, Ulrik Dam; Lajic, Zoran

2012-01-01

54

Quorums Systems as a Method to Enhance Collaboration for Achieving Fault Tolerance in Distributed System  

Directory of Open Access Journals (Sweden)

Full Text Available A system that implements the Byzantine agreement algorithm is supposed to be very reliable and robust because of its fault-tolerating feature. For very realistic environments, Byzantine agreement protocols become inadequate, because they are based on the assumption that failures are controlled and that they have unlimited severity. The Byzantine agreement model works with a bounded number of failures that have to be tolerated. It is never concerned with identifying these failures or excluding them from the system. In this paper, we tackle quorum systems, a particular sort of distributed system where some storage or computations are replicated on various machines with the idea that some of them work correctly to produce a reliable output at a given moment in time. Thus, by majority voting collaboration with quorums, one can achieve fault tolerance in distributed systems. Further, we argue that an algorithm to identify faulty-behaving machines is useful to identify purposeful malicious behaviors.

Ioan PETRI

2009-01-01
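
Majority voting over replicated results, as described in the abstract above, can be sketched as follows. This is a generic majority-quorum read, not the specific quorum construction of the paper. The convention that `None` stands for an unresponsive replica and the helper names `quorum_read` and `suspects` are assumptions for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def quorum_read(responses: Sequence[Optional[Hashable]],
                n_replicas: int) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    quorum = n_replicas // 2 + 1
    counts = Counter(r for r in responses if r is not None)
    if not counts:
        return None
    value, votes = counts.most_common(1)[0]
    return value if votes >= quorum else None


def suspects(responses: Sequence[Optional[Hashable]],
             agreed: Hashable) -> List[int]:
    """Indices of replicas that were silent or disagreed with the agreed value."""
    return [i for i, r in enumerate(responses) if r != agreed]


if __name__ == "__main__":
    responses = [42, 42, 7, 42, None]
    agreed = quorum_read(responses, n_replicas=5)
    print(agreed, suspects(responses, agreed))
```

The `suspects` helper hints at the identification step the abstract argues for: replicas that repeatedly land in this list are candidates for faulty or malicious behavior.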

55

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

DEFF Research Database (Denmark)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety system makes a close-down of the process. A general development methodology is given in the thesis that carried the control system designer through the steps necessary to consider fault handling in an early design phase. It was shown how an existing control loop with interface to the plant wide control system could be extended with three additional modules to obtain fault tolerance: Fault detection and isolation, remedial action decision, and reconfiguration. The integration of these modules in software was considered. The general methodology covered the analysis, design, and implementation of fault tolerant control systems on an overall level. Two detailed studies were presented, one on fault detection and isolation design and one on design of the decision logic. Two application case studies were used to emphasize practical aspects of both the development methodology and the detailed studies. One was an electro-mechanical actuator in a position control loop for a diesel engine speed governor where the purpose was to avoid a total close-down in case of the most likely faults. The second was a fault tolerant attitude control system for a micro satellite where the operation of the system is mission critical. The purpose was to avoid hazardous effects from faults and maintain operation if possible.
A method was introduced that, after a systematic examination of possible component failures, enables analysis of the relationship between failures and their consequences for the system's operation. This fault propagation analysis is based on coarse models of the subsystems describing the reaction to faults, as for example a variable being zero, low or high. Examples were given that illustrate how such models can be established by simple means, and yet provide important information when combined into a complete system. A special achievement was a method to determine how control loops behave in case of faults. This is not straightforward as the system behaviour depends on the character of the feedback. One of the detailed studies was the design of the decision logic in fault handling, realized as state-event machines. Guidelines for the design were provided, based on experience from the two case studies. Methods for verifying correct operation of the decision logic were described, where a completeness check against the fault propagation analysis is able to guarantee coverage of all considered faults. The usage of software tools to support the development process was illustrated with an off-the-shelf product for constraint logic solving and state-event machine analysis. The coarse system models and the decision logic were analyzed with the tool-box and it was shown how an easy analysis could be performed to verify correctness and completeness of the fault handling design. Experience from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in the speed governor actuator setup. One was a position sensor fault and the second was an actuator current fault.
The sensor fault detection was trivial, whereas the actuator fault was more challenging. The study demonstrated that many existing methods have a potential to detect and isolate the two faults, but also that the research field still misses a systematic approach to handle realistic problems such as low sampling rate and nonlinear characteristics of the system.

Bøgh, S.A.

1997-01-01
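
The decision logic realized as state-event machines, together with the completeness check described in the abstract above, can be illustrated with a small Python sketch. The two fault events are taken from the scenarios named in the abstract. The states, remedial actions, transition table and function names are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List, Tuple

# Fault events named in the abstract's two scenarios.
FAULTS = ["position_sensor_fault", "actuator_current_fault"]
# Hypothetical operating modes of the control loop.
STATES = ["nominal", "sensor_estimated", "degraded", "safe_stop"]

# (state, event) -> (next state, remedial action); all entries are assumptions.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("nominal", "position_sensor_fault"): ("sensor_estimated", "switch to estimated position"),
    ("nominal", "actuator_current_fault"): ("degraded", "limit actuator demand"),
    ("sensor_estimated", "position_sensor_fault"): ("sensor_estimated", "no further action"),
    ("sensor_estimated", "actuator_current_fault"): ("safe_stop", "close down the loop"),
    ("degraded", "position_sensor_fault"): ("safe_stop", "close down the loop"),
    ("degraded", "actuator_current_fault"): ("degraded", "no further action"),
    ("safe_stop", "position_sensor_fault"): ("safe_stop", "no further action"),
    ("safe_stop", "actuator_current_fault"): ("safe_stop", "no further action"),
}


def handle(state: str, event: str) -> Tuple[str, str]:
    """Look up the next state and the remedial action for a detected fault."""
    return TRANSITIONS[(state, event)]


def missing_transitions(states: List[str], faults: List[str],
                        table: Dict[Tuple[str, str], Tuple[str, str]]):
    """Completeness check: every considered fault must be handled in every state."""
    return [(s, f) for s in states for f in faults if (s, f) not in table]


if __name__ == "__main__":
    print(missing_transitions(STATES, FAULTS, TRANSITIONS))  # expect no gaps
    state = "nominal"
    for event in ["position_sensor_fault", "actuator_current_fault"]:
        state, action = handle(state, event)
        print(state, "-", action)
```

Keeping the logic as a plain table makes the completeness check a simple enumeration over all state and fault pairs.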

56

Fault detection and fault tolerant control of a smart base isolation system with magneto-rheological damper  

Science.gov (United States)

Fault detection and isolation (FDI) in real-time systems can provide early warnings for faulty sensors and actuator signals to prevent events that lead to catastrophic failures. The main objective of this paper is to develop FDI and fault tolerant control techniques for base isolation systems with magneto-rheological (MR) dampers. Thus, this paper presents a fixed-order FDI filter design procedure based on linear matrix inequalities (LMI). The necessary and sufficient conditions for the existence of a solution for detecting and isolating faults using the H∞ formulation are provided in the proposed filter design. Furthermore, an FDI-filter-based fuzzy fault tolerant controller (FFTC) for a base isolation structure model was designed to preserve the pre-specified performance of the system in the presence of various unknown faults. Simulation and experimental results demonstrated that the designed filter can successfully detect and isolate faults from displacement sensors and accelerometers while maintaining excellent performance of the base isolation technology under faulty conditions.

Wang, Han; Song, Gangbing

2011-08-01

57

The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications  

Energy Technology Data Exchange (ETDEWEB)

Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.

Graham, Richard L [ORNL; Hursey, Joshua J [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Boehm, Swen [ORNL

2012-01-01

58

A Survey of Transformational Approaches to the Specification and Verification of Fault-Tolerant Systems  

UK PubMed Central (United Kingdom)

Proving that a program suits its specification and thus can be called correct has been a research subject for many years resulting in a wide range of methods and formalisms. However, it is a common experience that even systems which have been proven correct can fail due to physical faults occurring in the system. As computer programs control an increasing part of today's critical infrastructure, the notion of correctness has been extended to fault tolerance, meaning correctness in the presence of a certain amount of faulty behavior of the environment. Formalisms to verify fault-tolerant systems must in some form or another model faults and faulty behavior. Common ways to do this are based on a notion of transformation either at the program or the specification level. We survey the wide range of formal methods to verify fault-tolerant systems which are based on some form of transformation. Our aim is to structure the area and relate these methods to one another. Finally we discuss the...

Felix C. Gartner

59

Active Fault Tolerant Control-FTC-Design for Takagi-Sugeno Fuzzy Systems with Weighting Functions Depending on the FTC  

Directory of Open Access Journals (Sweden)

In this paper the problem of active fault tolerant control design for noisy systems described by Takagi-Sugeno fuzzy models is studied. The proposed control strategy is based on knowledge of the estimated fault and of the error between the faulty system state and a reference system state. The considered systems are affected by actuator and sensor faults and have weighting functions that depend on the fault tolerant control. A mathematical transformation is used to construct an augmented system in which all the faults affecting the initial system appear as actuator faults. Then, an adaptive proportional integral observer is used to estimate the state and the faults. The design of the proportional integral observer and of the fault tolerant control strategy is formulated in terms of linear matrix inequalities, which can be solved easily. To illustrate the proposed method, it is applied to the three-tank system.

Atef Khedher; Kamel Ben Othman; Mohamed Benrejeb

2011-01-01

60

Validation of the supervisory portion of a distributed fault tolerant control system  

Energy Technology Data Exchange (ETDEWEB)

The supervisory portion of a distributed fault tolerant control system (DFTCS) is responsible for managing redundancy, ensuring consistent control, and recovering from failures. Such software poses a unique set of challenges for validation testing. A test environment for such validation is described in this paper. Data has been collected on a total of 1000 test hours involving 2 million control actions and 700,000 randomly injected single and multiple faults. No anomalous behavior has been observed. Quantitative results include a coverage of 0.98 in the presence of an average of 2 simultaneous faults (maximum of 4 simultaneous faults) and an average response time (in the presence of faults) of 10 msec when fewer than 2 simultaneous faults were injected. A separate long term stability test at the Experimental Breeder Reactor II site of the Argonne National Laboratory West has been running continuously since November of 1991.

Hecht, M.; Agron, J. [SoHaR, Inc., Beverly Hills, CA (United States)]; Groves, C. [Argonne National Lab., IL (United States)]

1992-07-01

61

STUDIES ON CONFIGURATION AND RECOVERY TECHNIQUES FOR FAULT-TOLERANT COMPUTING SYSTEMS

Digital Repository Infrastructure Vision for European Research (DRIVER)

It is of great importance to operate a computer system with high reliability. Several techniques to achieve the high reliability of a computer system have been proposed and implemented in the real computer systems. This dissertation discusses configuration and recovery techniques for fault-tolerant ...

Fukumoto, Satoshi

62

An evaluation method of fault-tolerance for digital plant protection system in nuclear power plants  

International Nuclear Information System (INIS)

In recent years, analog-based nuclear power plant (NPP) safety-related instrumentation and control (I and C) systems have been replaced by modern digital I and C systems. NPP safety-related I and C systems require very high design reliability compared to conventional digital systems, so reliability assessment is very important. In the reliability assessment of a digital system, fault tolerance evaluation is one of the crucial factors. However, the evaluation is very difficult because the digital system in an NPP is very complex. In this paper, a simulation-based fault injection technique on a simplified processor is used to evaluate the fault tolerance of the digital plant protection system (DPPS) with high efficiency and low cost.

2005-01-01

63

An evaluation method of fault-tolerance for digital plant protection system in nuclear power plants  

Energy Technology Data Exchange (ETDEWEB)

In recent years, analog-based nuclear power plant (NPP) safety-related instrumentation and control (I and C) systems have been replaced by modern digital I and C systems. NPP safety-related I and C systems require very high design reliability compared to conventional digital systems, so reliability assessment is very important. In the reliability assessment of a digital system, fault tolerance evaluation is one of the crucial factors. However, the evaluation is very difficult because the digital system in an NPP is very complex. In this paper, a simulation-based fault injection technique on a simplified processor is used to evaluate the fault tolerance of the digital plant protection system (DPPS) with high efficiency and low cost.

Lee, Jun Seok; Kim, Man Cheol; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of)]; Kang, Hyun Gook; Jang, Seung Cheol [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)]

2005-07-01
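
To make the notion of simulation-based fault injection in the abstract above more concrete, here is a minimal sketch of an injection campaign. It does not model the DPPS or the simplified processor used in the paper: the detector (a single even-parity bit), the word width, and all function names are assumptions chosen only to show how coverage can be estimated as detected faults divided by injected faults.

```python
import random


def parity(word):
    """Even-parity bit of a non-negative integer."""
    return bin(word).count("1") % 2


def inject_bit_flips(word, width, n_flips, rng):
    """Flip n_flips distinct, randomly chosen bit positions within width bits."""
    for pos in rng.sample(range(width), n_flips):
        word ^= 1 << pos
    return word


def run_campaign(trials, width=16, max_flips=2, seed=0):
    """Inject 1..max_flips bit flips per trial; return the fraction detected by parity."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        word = rng.randrange(1 << width)
        stored_parity = parity(word)          # computed before the fault occurs
        faulty = inject_bit_flips(word, width, rng.randint(1, max_flips), rng)
        if parity(faulty) != stored_parity:   # detector fires
            detected += 1
    return detected / trials
```

With single bit flips only, parity detects every fault; once double flips are allowed, coverage drops, which is the kind of effect such a campaign is meant to expose.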

64

An Overview of Checkpointing Techniques for Fault Tolerance in Distributed Computing Systems  

Directory of Open Access Journals (Sweden)

Checkpointing is an important feature in distributed computing systems. It gives fault tolerance without requiring additional effort from the programmer [1]. To provide fault tolerance for distributed systems, the checkpointing technique has been widely used, and much research has been performed to reduce the overhead of checkpointing coordination. A checkpoint is a snapshot of the current state of a process. It saves enough information in non-volatile stable storage that, if the contents of the volatile storage are lost due to process failure, the process state can be reconstructed from the saved information [1].

Jagdish Makhijani; Dr. Anil Rajput

2012-01-01
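
The checkpoint definition in the abstract above (save enough state to stable storage to rebuild the process after a failure) can be sketched for a single process as follows. This is an illustrative example, not code from the paper: the JSON file format, the function names, and the simulated crash are all assumptions, and coordination between several processes is not shown.

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write state to stable storage via a temp file and an atomic rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # a crash before this line leaves the old checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run(path, total_steps, crash_at=None):
    """Sum 0..total_steps-1, checkpointing after every step; optionally crash."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated process failure")
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(state, path)
    return state["acc"]
```

Calling `run` again with the same path after a simulated failure resumes from the last saved step instead of starting from zero.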

65

Designing an Adaptive Fault Tolerance Structure in Distributed Real Time Systems  

Directory of Open Access Journals (Sweden)

This study reviews the Fault Tolerance CORBA (FT-CORBA) structure, which is used to support fault tolerance programs, together with the relatively important parameters, including the replication style and the number of replicas, that play a role in improving performance and adapting the structure to real time distributed systems. Based on these specifications, a structure adapted to real time systems, with higher performance than the FT-CORBA structure, is developed. Finally, the implementation of this structure, the determination of the number of replicas and of the object replication style, and the significance of the related parameters are investigated.

N. Mosharraf; M.R. Khayyambashi

2009-01-01

66

Fault Tolerance in a Multi-Layered DRE System: A Case Study  

Directory of Open Access Journals (Sweden)

Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we encountered, some due to the fault tolerance requirements we needed to meet and others due to characteristics of the resource management software. The challenges include the need for extremely rapid recovery; supporting the characteristics of component middleware, including peer-to-peer communication and multi-tiered calling semantics; supporting multiple languages; and the co-existence of replicated and non-replicated elements. Making our multi-layer dynamic resource manager fault-tolerant required simultaneously overcoming all of these challenges, presenting a significant fault tolerance research challenge.

Paul Rubel; Joseph Loyall; Richard Schantz; Matthew Gillen

2006-01-01

67

Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems  

DEFF Research Database (Denmark)

This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each task, such that transient faults are tolerated, the timing constraints of the application are satisfied, and the energy consumed is minimized. Tasks are scheduled using fixed-priority preemptive scheduling, while replication is used for recovery from multiple transient faults. Addressing energy and reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We present a Tabu Search-based approach which uses an energy/reliability trade-off model to find reliable and schedulable implementations with limited energy and hardware resources. We evaluate the proposed algorithm using several synthetic and real-life benchmarks.

Gan, Junhe; Gruian, Flavius

2011-01-01

68

Reliability Monitoring of Fault Tolerant Control Systems with Demonstration on an Aircraft Model  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper proposes a reliability monitoring scheme for active fault tolerant control systems using a stochastic modeling method. The reliability index is defined based on system dynamical responses and a safety region; the plant and controller are assumed to have a multiple regime model structure, ...

Hongbin Li; Qing Zhao; Zhenyu Yang

69

On-line intelligent fault-tolerant control for catastrophic system failures  

Science.gov (United States)

While most research attention has been focused on fault detection and diagnosis, much less research effort has been dedicated to 'general' failure accommodation. Due to the inherent complexity of nonlinear systems, most model-based analytical redundancy studies of fault diagnosis and accommodation deal with linear systems subject to simple additive or multiplicative faults. This assumption has limited their effectiveness and usefulness in practical applications. In this research work, the on-line fault accommodation control problems under catastrophic system failures are investigated. The main interest is in dealing with unanticipated system component failures in the most general formulation. Through discrete-time Lyapunov stability theory, the necessary and sufficient conditions to guarantee the system on-line stability and performance under failures are derived, and a systematic procedure and technique for proper fault accommodation under unanticipated failures are developed. A complete architecture of fault diagnosis and accommodation is also presented by incorporating the developed intelligent fault tolerant control scheme with a cost-effective fault detection scheme and a multiple-model based failure diagnosis process, to efficiently handle false alarms and the accommodation of both anticipated and unanticipated failures in on-line situations.

Yen, Gary G.; Ho, Liang-Wei

2001-07-01

70

A Fault tolerant Control Supervisory System development Procedure for Small Satellites : The AAUSAT-II case  

DEFF Research Database (Denmark)

The paper presents a stepwise procedure to develop a fault tolerant control system for small satellites. The procedure is illustrated through implementation on the AAUSAT-II spacecraft. As shown, the presented procedure requires expertise from several disciplines, which are nevertheless necessary for obtaining a complete and consistent solution.

Izadi-Zamanabadi, Roozbeh; Larsen, Jesper Abildgaard

71

An Introduction to Software Engineering and Fault Tolerance  

CERN Multimedia

This book consists of chapters describing novel approaches to integrating fault tolerance into the software development process. They cover a wide range of topics focusing on fault tolerance during the different phases of software development, software engineering techniques for verification and validation of fault tolerance means, and languages for supporting fault tolerance specification and implementation. Accordingly, the book is structured into the following three parts: Part A: Fault tolerance engineering: from requirements to code; Part B: Verification and validation of fault tolerant systems; Part C: Languages and Tools for engineering fault tolerant systems.

Pelliccione, Patrizio; Guelfi, Nicolas; Romanovsky, Alexander

2010-01-01

72

A Review of Checkpointing Based Fault Tolerance Techniques in Mobile Distributed Systems  

Directory of Open Access Journals (Sweden)

Fault tolerance techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. In case of a fault in a distributed system, checkpointing enables the execution of a program to be resumed from a previous consistent global state rather than from the beginning. In this way, the amount of useful processing lost because of the fault is significantly reduced. Checkpointing is an effective fault tolerance technique in distributed systems, as it avoids the domino effect and requires minimal storage. Most earlier coordinated checkpointing algorithms have one of the following drawbacks: they block computation during checkpointing; they force checkpoints, many of which may not be necessary; they are non-blocking and minimum-process but take useless checkpoints; or they reduce useless checkpoints at the cost of higher synchronization message overhead or high checkpoint request propagation time. In this paper, we discuss various issues related to checkpointing for distributed systems and mobile computing environments. We also present a survey of some checkpointing algorithms for distributed systems.

Rachit Garg; Praveen Kumar

2010-01-01
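
The abstract above relies on the idea of a consistent global state built from one local checkpoint per process. A common way to state the condition is that no checkpoint may record the receipt of a message whose send is not recorded in any checkpoint (an orphan message). The sketch below checks that condition; the data layout (sets of message identifiers per checkpoint) and the function names are assumptions made for illustration and are not taken from the surveyed algorithms.

```python
def orphan_messages(checkpoints):
    """Return ids of messages recorded as received but not recorded as sent.

    checkpoints maps a process id to a dict with two sets of message ids:
    {"sent": {...}, "received": {...}}, both as of that process's checkpoint.
    """
    all_sent = set()
    for cp in checkpoints.values():
        all_sent |= cp["sent"]
    orphans = set()
    for cp in checkpoints.values():
        orphans |= cp["received"] - all_sent
    return orphans


def is_consistent(checkpoints):
    """A set of local checkpoints is consistent if it has no orphan messages."""
    return not orphan_messages(checkpoints)
```

A message recorded as sent but not yet received is merely in transit and does not make the state inconsistent; only a receive without a matching send does.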

73

Distributed Fault-Tolerant Avionic Systems - A Real-Time Perspective  

CERN Multimedia

This paper examines the problem of introducing advanced forms of fault-tolerance via reconfiguration into safety-critical avionic systems. This is required to enable increased availability after fault occurrence in distributed integrated avionic systems (compared to static federated systems). The approach taken is to identify a migration path from current architectures to those that incorporate re-configuration to a lesser or greater degree. Other challenges identified include change of the development process; incremental and flexible timing and safety analyses; configurable kernels applicable for safety-critical systems.

Burke, Michael

2010-01-01

74

BYZANTINE FAULT TOLERANCE MODEL FOR SOAP FAULTS  

Directory of Open Access Journals (Sweden)

The proposed model configures a Byzantine Fault Tolerance mechanism for every SOAP fault message that is transmitted. Reliability and availability are major requirements of Web services, since they operate in a distributed environment. One of the reliability issues is handling faults. Faults occur in all phases of Service Oriented Architecture, i.e. during publishing, discovery, composition, binding, and execution. These faults may lead to service downtime, abnormal behavior, and incorrect responses. These abnormalities are classified as Byzantine faults in Web services. Even though the SOAP specification provides fault handling mechanisms, the correctness of the received SOAP fault messages is not known. In this paper, a model is proposed to check the correctness of the SOAP fault message received, by incorporating Byzantine agreement for fault tolerance. The existing fault tolerant mechanism detects server failure and routes the request to the next available server without the knowledge of the client. The proposed model ensures a transparent environment by providing fault handling information to the client. This is achieved by incorporating an active replication technique.

S. Murugan; V. Ramachandran

2012-01-01
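
One way to picture checking the correctness of a fault message under active replication, as described in the abstract above, is a client-side vote over the replies of several replicas. The sketch below accepts a reply only when at least f + 1 replicas report the same value, a rule commonly used by clients of Byzantine fault tolerant replication when up to f replicas may be faulty. It is not the paper's protocol: the function name, the value of f, and the fault-code strings are illustrative assumptions.

```python
from collections import Counter


def accept_reply(replies, f):
    """Return the reply reported by at least f + 1 replicas, else None.

    replies: list of hashable reply values, one per responding replica.
    f: assumed maximum number of Byzantine (arbitrarily faulty) replicas.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= f + 1 else None


# Example: four replicas (3f + 1 with f = 1), one of which lies about the fault.
replies = ["soap:Server", "soap:Server", "soap:Client", "soap:Server"]
accepted = accept_reply(replies, f=1)
```

With f + 1 matching replies, at least one of them comes from a correct replica, so the client can pass that fault information on with some confidence.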

75

Stochastic Models for Fault Tolerance  

CERN Document Server

As modern society relies on the fault-free operation of complex computing systems, system fault-tolerance has become an indispensable requirement. Therefore, we need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware elements. Redundancy patterns are commonly used, for either redundancy in space or redundancy in time. Wolter's book details methods of redundancy in time that need to be issued at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right ti

Wolter, Katinka M

2010-01-01

76

Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems  

Directory of Open Access Journals (Sweden)

Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs) in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Besides the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

Streichert Thilo; Koch Dirk; Haubelt Christian; Teich Jürgen

2006-01-01

77

Reliability Monitoring of Fault Tolerant Control Systems with Demonstration on an Aircraft Model  

Directory of Open Access Journals (Sweden)

This paper proposes a reliability monitoring scheme for active fault tolerant control systems using a stochastic modeling method. The reliability index is defined based on system dynamical responses and a safety region; the plant and controller are assumed to have a multiple regime model structure, and a semi-Markov model is built for reliability evaluation based on the safety behavior of each regime model estimated by using Monte Carlo simulation. Moreover, the history data of fault detection and isolation decisions is used to update its transition characteristics and reliability model. This method provides an up-to-date reliability index as demonstrated on an aircraft model.

Hongbin Li; Qing Zhao; Zhenyu Yang

2007-01-01

78

Assumptions for fault tolerant quantum computing  

Energy Technology Data Exchange (ETDEWEB)

Assumptions useful for fault tolerant quantum computing are stated and briefly discussed. We focus on assumptions related to properties of the computational system. The strongest form of the assumptions seems to be sufficient for achieving highly fault tolerant quantum computation. We discuss weakenings which are also likely to suffice.

Knill, E.; Laflamme, R.

1996-06-01

79

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems with Checkpointing and Replication  

DEFF Research Database (Denmark)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the timing constraints of the application are satisfied. We present several design optimization approaches which are able to find fault-tolerant implementations given a limited amount of resources. The developed algorithms are evaluated using extensive experiments, including a real-life example.

Pop, Paul; Izosimov, Viacheslav

2009-01-01
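
The trade-off behind checkpoint placement in the abstract above can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not necessarily the model used by the authors. A process with fault-free execution time C is split into n equal segments, each checkpoint costs chi, each recovery costs mu, and each of k transient faults forces re-execution of one segment, so the worst-case time is W(n) = C + n*chi + k*(C/n + mu). More checkpoints shorten re-execution but add overhead.

```python
import math


def worst_case_time(c, n, chi, mu, k):
    """Worst-case time with n checkpoints when each of k faults re-runs one segment."""
    return c + n * chi + k * (c / n + mu)


def best_checkpoint_count(c, chi, mu, k, n_max=1000):
    """Integer n in 1..n_max that minimizes the worst-case time (brute force)."""
    return min(range(1, n_max + 1),
               key=lambda n: worst_case_time(c, n, chi, mu, k))


def continuous_optimum(c, chi, k):
    """Real-valued minimizer of W(n): setting dW/dn = 0 gives n = sqrt(k*c/chi)."""
    return math.sqrt(k * c / chi)
```

For example, with C = 100, chi = 1, mu = 2 and k = 4 (arbitrary illustrative numbers), both the brute-force search and the closed form point to 20 checkpoints.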

80

Guaranteed Cost Fault-tolerant Controller Design of Networked Control Systems under Variable-period Sampling  

Directory of Open Access Journals (Sweden)

This study investigates the problem of integrity against actuator failures for networked control systems under variable-period sampling. Assuming that the distance between any two consecutive sampling instants is less than a given bound, and using the input delay approach, the networked control systems under variable-period sampling are transformed into continuous-time networked control systems with time-varying delays. Existence conditions for a guaranteed cost fault-tolerant control law are then established using Lyapunov stability theory combined with Linear Matrix Inequalities (LMIs). Furthermore, the guaranteed cost fault-tolerant controller gain and the minimum guaranteed cost can be obtained by solving a minimization problem. A numerical simulation example demonstrates that the conclusions are feasible and effective. The proposed control method resolves the problems of variable-period sampling and actuator failures, which meets the requirements of industrial networked control systems.

Xuan Li; Xiao-Bei Wu

2009-01-01

81

Fault-Tolerant Identification in Wireless Sensor Networks for Maximizing System Lifetime  

Directory of Open Access Journals (Sweden)

Wireless Sensor Networks (WSNs) are used by many applications such as security, command and control, and surveillance monitoring. In all such applications, the main function of the WSN is sensing and retrieving data. Many WSN systems are query based: they give responses within a stipulated time based on the user’s query. However, a WSN may suffer sensor faults, so it is not reliable, and the network energy level goes down as a result. This reduces the lifetime of the network. To overcome this, fault tolerance mechanisms can be used to improve reliability by finding failed nodes, which are then recovered by cluster heads. This paper presents an algorithm that can effectively increase the lifetime of a WSN while satisfying the QoS requirements of the application. The algorithm is adaptive and fault tolerant. It uses path and source redundancy and is based on hop-by-hop data delivery. Simulation results show that the proposed system is feasible. The system also authenticates all kinds of identified faults and provides quality services. It increases the data flow and reduces the faults.

Middela Shailaja; AnandaRaj S.P; Poornima.S

2012-01-01

82

New approach to the synchronisation problem in dynamic fault tolerant control systems  

Energy Technology Data Exchange (ETDEWEB)

The authors present a new design of dynamic computer architecture which improves fault tolerance in microprocessor-based control systems. It consists of three or more processors which are linked using a new device called a bus collator. This has the capability of configuring the system into TMR, independent of distributing mode. The important new feature is that the processors work asynchronously all the time, and scheduling is achieved at the task level. 6 references.

Rave, O.L.; Gillies, D.F.

1982-01-01
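
The TMR (triple modular redundancy) configuration mentioned in the abstract above rests on majority voting over three replica outputs. The sketch below shows two generic voters; it does not model the bus collator or the task-level scheduling described by the authors, and the function names are hypothetical.

```python
def tmr_vote(a, b, c):
    """Return the majority value of three replica outputs; raise if all differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no majority: all three replicas disagree")


def bitwise_tmr_vote(a, b, c):
    """Bit-by-bit majority of three integers, as a hardware voter would compute."""
    return (a & b) | (a & c) | (b & c)
```

The word-level voter masks one faulty replica as long as the other two agree; the bitwise form can additionally mask faults that hit different bits in different replicas.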

83

Fault tolerant safety related computer based process control system for TAPP- 3 and 4  

International Nuclear Information System (INIS)

Computer based control systems for safety related applications in nuclear power plants have to meet not only the functional, performance and interface requirements, but in addition, they have to meet regulatory requirements like enhanced reliability, safety and security. While meeting these stringent requirements, such computer based systems also need to ensure high availability. Availability of these safety related systems has a direct influence on commercial operation of the NPP and on the availability of several megawatts of electrical power to the national grid. Several design features such as fault tolerance, on-line diagnostics and self-supervision etc. are to be incorporated in the computer system architecture, hardware design and software design to meet high reliability and high availability criteria. Reactor Control Division (RCnD) has designed and developed 'Dual Processor Hot Standby' (DPHS) fault tolerant architecture, which not only meets the safety requirements but also provides very high availability. The fault tolerant features of DPHS architecture and the design of Process Control System based on DPHS architecture (DPHS-PCS) for TAPP-3 and 4 are highlighted in this paper. DPHS-PCS for Tarapur Atomic Power Project (TAPP) -3 and 4 regulates Primary Heat Transport (PHT) system pressure, Pressuriser pressure, Pressuriser level, Bleed condenser pressure, Bleed condenser level and Steam generator pressure. (author)

2005-01-01

84

Active Fault Tolerant Control-FTC-Design for Takagi-Sugeno Fuzzy Systems with Weighting Functions Depending on the FTC  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper the problem of active fault tolerant control design for noisy systems described by Takagi-Sugeno fuzzy models is studied. The proposed control strategy is based on knowledge of the estimated fault and of the error between the faulty system state and a reference system state. The consider...

Atef Khedher; Kamel Ben Othman; Mohamed Benrejeb

85

A Decoding Approach to Fault Tolerant Control of Linear Systems with Quantized Disturbance Input  

CERN Multimedia

The aim of this paper is to propose an alternative method to solve a Fault Tolerant Control problem. The model is a linear system affected by a disturbance term: this represents a large class of technological faulty processes. The goal is to make the system able to tolerate the undesired perturbation, i.e., to remove or at least reduce its negative effects; such a task is performed in three steps: the detection of the fault, its identification and the consequent process recovery. When the disturbance function is known to be quantized over a finite number of levels, the detection can be successfully executed by a recursive decoding algorithm, arising from Information and Coding Theory and suitably adapted to the control framework. This technique is analyzed and tested in a flight control issue; both theoretical considerations and simulations are reported.

Fosson, Sophie M

2010-01-01
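
The detection step described in the abstract above exploits the fact that the disturbance takes values in a known finite set. A much simplified, one-step version of that idea is sketched below for a scalar system x[k+1] = a*x[k] + b*u[k] + d[k] with a fully measured state: the residual is compared with each admissible level and the closest one is chosen. This is not the recursive decoding algorithm of the paper; the scalar model, the parameter values, and the function names are assumptions for illustration.

```python
def decode_disturbance(x_prev, u_prev, x_next, a, b, levels):
    """Pick the quantization level closest to the one-step residual."""
    residual = x_next - (a * x_prev + b * u_prev)
    return min(levels, key=lambda d: abs(residual - d))


def simulate_and_decode(a, b, levels, true_disturbances, x0=0.0):
    """Simulate the disturbed plant with zero input and decode d[k] at every step."""
    x, decoded = x0, []
    for d in true_disturbances:
        x_next = a * x + d  # u[k] = 0 here, so the b*u[k] term vanishes
        decoded.append(decode_disturbance(x, 0.0, x_next, a, b, levels))
        x = x_next
    return decoded
```

Because the decoder snaps to the nearest level, it tolerates measurement errors smaller than half the spacing between levels; once d[k] is identified, a recovery input can be computed to cancel it.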

86

Reliability analysis of fault tolerant computer based core temperature monitoring system of PFBR  

International Nuclear Information System (INIS)

The architecture of the fault tolerant computer based core temperature monitoring system (CTMS) of the Prototype Fast Breeder Reactor (PFBR) and its hardware reliability evaluation are presented in this paper. In PFBR, protection against individual fuel subassembly blockage is provided by a real time computer based system (RTCS). This process involves computations in every cycle and hence a real time computer is used. This paper elaborates the importance of the RTCS and how the fault-tolerant architecture (triple modular redundancy) is arrived at with the help of reliability evaluation. The operational and safety requirements, i.e., both safe and unsafe failures, are analyzed. Common cause failure analysis has also been carried out to arrive at the results. (author)

2006-01-01

87

Fault Tolerant Control: A Simultaneous Stabilization Result  

DEFF Research Database (Denmark)

This paper discusses the problem of designing fault tolerant compensators that stabilize a given system both in the nominal situation, as well as in the situation where one of the sensors or one of the actuators has failed. It is shown that such compensators always exist, provided that the system is detectable from each output and that it is stabilizable. The proof of this result is constructive, and a worked example shows how to design a fault tolerant compensator for a simple, yet challenging system. A family of second order systems is described that requires fault tolerant compensators of arbitrarily high order. Publication date: FEB

Stoustrup, Jakob; Blondel, V.D.

2004-01-01

88

Fault tolerant design of a servo manipulator system for hot cell operation  

Energy Technology Data Exchange (ETDEWEB)

In this paper, fault tolerant mechanisms are presented for a servo manipulator system designed to operate in a hot cell. A hot cell is a sealed and shielded room for handling radioactive materials, and it is dangerous for people to work in it, so remote operations are necessary to handle the radioactive materials. KAERI has developed a servo manipulator system to perform such remote operations. However, since electric components such as servo motors are degraded by radiation, fault tolerant mechanisms have to be considered. For fault tolerance of the servo manipulator system, hardware and software redundancy has been considered. In the case of hardware, radiation-resistant electric components such as cables and connectors have been adopted and the motors driving a transport have been duplicated. In the case of software, a reconfiguration algorithm accommodating one motor's failure has been developed. The algorithm uses redundant axes to recover the end effector's motion in spite of one motor's failure.

Jin, Jae Hyun; Park, Byung Suk; Ahn, Sung Ho; Yoon, Ji Sup; Jung, Jae Hoo [KAERI, Taejon (Korea, Republic of)

2003-07-01

89

On Building a Scalable Real-Time Fault-Tolerant System for Embedded Applications  

Canada Institute for Scientific and Technical Information (Canada)

Real-time embedded systems have evolved during the past several decades from small custom-designed digital hardware to large distributed processing systems. As these systems become more complex, their interoperability, evolvability and cost-effectiveness requirements motivate the use of commercial off-the-shelf components. This raises the challenge of constructing dependable and predictable real-time services for application developers on top of inexpensive hardware and software components which have minimal support for timeliness and dependability guarantees. We are addressing this challenge in the ARMADA project. ARMADA is a set of communication and middleware services that provide support for fault-tolerance and end-to-end guarantees for embedded real-time distributed applications. Since real-time performance of such applications depends heavily on the communication subsystem, the first thrust of the project is to develop a predictable communication service and architecture to ensure QoS-sensitive message delivery. In its second thrust, ARMADA aims to offload the complexity of developing fault-tolerant applications from the application programmer by focusing on a collection of modular, composable middleware for fault-tolerant group communication and replication under timing constraints. Finally, we develop tools for testing and validating the behavior of our services.

2001-01-01

90

Fault Tolerant Adiabatic Quantum Computation  

CERN Document Server

We develop a theory of fault tolerant adiabatic quantum computation (AQC), using a hybrid methodology involving subsystem and stabilizer codes, concatenated dynamical decoupling, and energy gaps. As an example we show how to perform fault tolerant AQC against 1-local noise using only 2-local interactions, as suitable for capacitively coupled flux qubits and polar molecules.

Lidar, Daniel A

2007-01-01

91

Application of Joint Parameter Identification and State Estimation to a Fault-Tolerant Robot System  

DEFF Research Database (Denmark)

The joint parameter identification and state estimation technique is applied to develop a fault-tolerant space robot system. The potential faults in the considered system are abrupt parametric faults, meaning that some system parameters deviate immediately from their nominal values when a fault happens. The system parameters concerned consist of deterministic parts as well as those describing the stochastic features of the system. For the design of reconfigurable control, these deviated system parameters need to be identified as precisely and quickly as possible. Meanwhile, it would further simplify the reconfigurable design task and possibly speed up system recovery if the system state information under the new operating conditions were available along with the faulty parameter information. The joint parameter identification and state estimation using the combined Kalman Filter and Maximum Likelihood (KF-ML) techniques is discussed and applied in this study. The simulation results on a space robot system showed that the proposed method is quite promising in providing both faulty parameter information and state estimation in a quick, accurate and robust manner.

Sun, Zhen; Yang, Zhenyu

2011-01-01

92

Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems  

DEFF Research Database (Denmark)

Fluid power systems have been in use since 1795 with the first hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are fluid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by the means of hydraulic fluid under pressure. With the development of computing power and control techniques during the last few decades, they are used increasingly in many industrial fields which require high actuation forces within limited space. However, despite numerous attractive properties, hydraulic systems are always subject to potential leakages in their components, friction variation in their hydraulic actuators and deficiency in their sensors. These violations of normal behaviour reduce the system performances and can lead to system failure if they are not detected early and handled. Moreover, the task of controlling electro hydraulic systems for high performance operations is challenging due to the highly nonlinear behaviour of such systems and the large amount of uncertainties present in their models. This thesis focuses on nonlinear adaptive fault-tolerant control for a representative electro hydraulic servo controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identifies the model parameters using experimental data from a test bed by analysing both the time response to standard input signals and the variation of the outputs with different excitation frequencies. The thesis also presents a model that accurately describes the static and dynamic normal behaviour of the system. Further, in this thesis, a fault detector is designed and implemented on the test bed that successfully diagnoses internal or external leakages, friction variations in the actuator or fault related to pressure sensors. 
The presented algorithm uses the position and pressure measurements to detect and isolate faults, avoiding missed detections and false alarms. The thesis also develops a high performance adaptive nonlinear controller for the hydraulic system which outperforms comparable linear controllers widely used in the industry. Because of the controller adaptivity, uncertainties in the model parameters can be handled. Moreover, special attention is given to reducing the complexity of the controller in order to demonstrate its real-time implementation. Finally the thesis combines the techniques developed in fault detection and nonlinear control in order to develop an active fault-tolerant controller for electro hydraulic servo systems. In order to maintain overall service and performances as high as possible when a potential fault occurs, the fault-tolerant controlled system prognoses the fault and changes its controller parameters or structure. The consequences of an unexpected fault are avoided, high availability is ensured and the overall safety in electro hydraulic servo systems is increased.

Choux, Martin

2011-01-01

93

Robust fault tolerant control based on sliding mode method for uncertain linear systems with quantization.  

Science.gov (United States)

This paper is concerned with the problem of robust fault-tolerant compensation control for uncertain linear systems subject to both state and input signal quantization. By successfully incorporating a novel matrix full-rank factorization technique into the sliding surface design, the total failure of certain actuators can be coped with, under a special actuator redundancy assumption. In order to compensate for quantization errors, an adjustment range of quantization sensitivity for a dynamic uniform quantizer is given through the flexible choices of design parameters. Compared with existing results, the derived inequality condition leads to a stronger fault tolerance ability and a much wider scope of applicability. With a static adjustment policy of quantization sensitivity, an adaptive sliding mode controller is then designed to maintain the sliding mode, where the gain of the nonlinear unit vector term is updated automatically to compensate for the effects of actuator faults, quantization errors, exogenous disturbances and parameter uncertainties without the need for a fault detection and isolation (FDI) mechanism. Finally, the effectiveness of the proposed design method is illustrated via a structural-acoustic model of a rocket fairing. PMID:23701895

Hao, Li-Ying; Yang, Guang-Hong

2013-05-20

94

Local rollback for fault-tolerance in parallel computing systems  

Science.gov (United States)

A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.

Blumrich, Matthias A. (Yorktown Heights, NY); Chen, Dong (Yorktown Heights, NY); Gara, Alan (Yorktown Heights, NY); Giampapa, Mark E. (Yorktown Heights, NY); Heidelberger, Philip (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Steinmacher-Burow, Burkhard (Boeblingen, DE); Sugavanam, Krishnan (Yorktown Heights, NY)

2012-01-24
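
The local rollback loop described in the abstract of record 94 can be illustrated in software. This is only a minimal sketch under stated assumptions, not the patented control logic: the function name, the `UnrecoverableError` class, the retry limit and the use of a deep copy in place of cache-held state are all hypothetical.

```python
import copy


class UnrecoverableError(Exception):
    """Hypothetical marker for a condition that local rollback cannot undo."""


def run_with_local_rollback(state, instructions, interval, max_retries=3):
    """Run `instructions` in intervals of `interval` steps.

    Each instruction is a callable that mutates the `state` dict. A
    RuntimeError models a recoverable error: the interval is restarted from
    the saved state. An UnrecoverableError is not caught and propagates.
    """
    start = 0
    while start < len(instructions):
        snapshot = copy.deepcopy(state)  # assumption: stands in for cached state
        retries = 0
        while True:
            try:
                for op in instructions[start:start + interval]:
                    op(state)
                break  # interval finished without error
            except RuntimeError:
                retries += 1
                if retries > max_retries:
                    raise
                # Restore the state saved at the start of the interval, then retry.
                state.clear()
                state.update(copy.deepcopy(snapshot))
        start += interval
    return state


def make_flaky_add(amount):
    """Build an instruction that fails once with a recoverable error."""
    calls = {"n": 0}

    def op(state):
        calls["n"] += 1
        if calls["n"] == 1:
            raise RuntimeError("transient error")
        state["x"] += amount

    return op


def increment(state):
    state["x"] += 1
```

With `interval=2`, a transient error in the second instruction causes both instructions of that interval to be re-run from the saved state, so the first increment is not applied twice.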

95

Fault-tolerant Agreement in Synchronous Message-passing Systems  

CERN Multimedia

The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

Raynal, Michel

2010-01-01

96

A Fault Tolerant Mobile Agent Information Retrieval System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Problem statement: Most of the information retrieval systems used only client-server architectures. The client-server model though powerful, had some limitations. In mobile computing environment which has both wired network and wireless networks with limited communication capabilities, the pe...

R. Punithavathi; K. Duraiswamy

97

Fault-tolerant routing in peer-to-peer systems  

CERN Multimedia

We consider the problem of designing an overlay network and routing mechanism that permits finding resources efficiently in a peer-to-peer system. We argue that many existing approaches to this problem can be modeled as the construction of a random graph embedded in a metric space whose points represent resource identifiers, where the probability of a connection between two nodes depends only on the distance between them in the metric space. We study the performance of a peer-to-peer system where nodes are embedded at grid points in a simple metric space: a one-dimensional real line. We prove upper and lower bounds on the message complexity of locating particular resources in such a system, under a variety of assumptions about failures of either nodes or the connections between them. Our lower bounds in particular show that the use of inverse power-law distributions in routing, as suggested by Kleinberg (1999), is close to optimal. We also give efficient heuristics to dynamically maintain such a system as new...

Aspnes, J; Shah, G; Aspnes, James; Diamadi, Zoe; Shah, Gauri

2003-01-01

98

Theory of algorithm-based fault tolerance in array processor systems  

Energy Technology Data Exchange (ETDEWEB)

This thesis deals with a theoretical study of the scheme of algorithm-based fault tolerance and addresses four issues. First, it deals with some design issues of specific fault-tolerant and fault-secure schemes. Algorithms are classified into broad classes called paradigms which are determined exclusively by the communication patterns of the processors. The second part deals with the development of a model that can be used to analyze the fault-detecting and -locating capabilities of such algorithms. The model uses a broad interpretation of errors, faults and checks, which are represented as a tripartite graph. Three parameters are introduced to characterize the fault-tolerance scheme: the closure index, the masking index and the exposure index. In the third part, some graph-theoretic bounds are presented on various useful characteristics in algorithm-based fault tolerance. The model is used to determine bounds on the number of data elements that a processor may affect while allowing t-fault detection or t-fault location. Using these results, some upper and lower bounds are presented on the number of checks required to achieve detection or location. Finally, in order to estimate the overhead required in this fault-tolerant scheme, some bounds are derived on the number of processors and the time required for the execution of the checks. The last part of the thesis deals with a probabilistic study of the scheme.

Banerjee, P.

1985-01-01

99

Fault-tolerant quantum computation -- a dynamical systems approach  

CERN Multimedia

We apply a dynamical systems approach to concatenation of quantum error correcting codes, extending and generalizing the results of Rahn et al. [8] to both diagonal and nondiagonal channels. Our point of view is global: instead of focusing on particular types of noise channels, we study the geometry of the coding map as a discrete-time dynamical system on the entire space of noise channels. In the case of diagonal channels, we show that any code with distance at least three corrects (in the infinite concatenation limit) an open set of errors. For CSS codes, we give a more precise characterization of that set. We show how to incorporate noise in the gates, thus completing the framework. We derive some general bounds for noise channels, which allows us to analyze several codes in detail.

Fern, J; Simic, S; Sastry, S; Fern, Jesse; Kempe, Julia; Simic, Slobodan; Sastry, Shankar

2004-01-01

100

Fault tolerant, multiplexed control rod position detection and indication system for nuclear power plants  

International Nuclear Information System (INIS)

The majority of Westinghouse nuclear plants placed in service thus far have incorporated a Rod Position Indication system based upon an analog design philosophy. This system, while meeting all functional and accuracy requirements, has proven somewhat cumbersome, particularly in the area of initial field calibration and maintenance. This paper describes a new Digital Rod Position Indication system (DRPI) developed for use with pressurized water reactors. The system is based upon a digital design philosophy and meets all previous design constraints and environmental requirements. Further, fault tolerance, improved accuracy, interference from adjacent rods and the elimination of adjustments and calibration have been provided.

1976-10-20

 
 
 
 
101

Survey On Fault Tolerance In Grid Computing  

Directory of Open Access Journals (Sweden)

Grid computing is defined as a hardware and software infrastructure that enables coordinated resource sharing within dynamic organizations. In grid computing, the probability of a failure is much greater than in traditional parallel computing. Therefore, fault tolerance is an important property in order to achieve reliability, availability and QoS. In this paper, we give a survey on various fault tolerance techniques, fault management in different systems and related issues. A fault tolerance service deals with various types of resource failures, which include process failure, processor failure and network failures. This survey provides the related research results about fault tolerance in distinct functional areas of grid infrastructure, gives future directions for fault tolerance techniques, and is a good reference for researchers.

P. Latchoumy; P. Sheik Abdul Khader

2011-01-01

102

(m,n)-Semirings and a Generalized Fault Tolerance Algebra of Systems  

CERN Multimedia

We propose a new class of mathematical structures called (m,n)-semirings (which generalize the usual semirings), and describe their basic properties. We also define partial ordering, and generalize the concepts of congruence, homomorphism, ideals, etc., for (m,n)-semirings. Following earlier work by Rao, we consider a system as made up of several components whose failures may cause it to fail, and represent the set of systems algebraically as an (m,n)-semiring. Based on the characteristics of these components we present a formalism to compare the fault tolerance behaviour of two systems using our framework of a partially ordered (m,n)-semiring.

Alam, Syed Eqbal; Davvaz, Bijan

2010-01-01

103

Software implemented fault tolerance: a methodology  

Energy Technology Data Exchange (ETDEWEB)

A methodology for fault tolerance is proposed. It is based on the interactions between hardware and software in a scheme made of intelligent modules, and is particularly applicable to VLSI systems. Particular emphasis has been placed on the software implementation to reduce the external hardware, as this is the main source of hard core failures. A design of a duplex hybrid system with software implemented fault tolerance is presented to demonstrate the novel characteristics of this approach. 21 references.

Lombardi, F.; Obac Roda, V.

1982-01-01

104

A modular-design approach to software fault tolerance in distributed computer systems  

Energy Technology Data Exchange (ETDEWEB)

An approach to designing fault-tolerant software that must behave very reliably (complying with its design specifications) in distributed computer systems (DCS) is proposed. The DCS is modeled as a set of communicating sequential processes with constraints on their execution time. Each process corresponds to the execution of software components that are part of the distributed software system. The approach is based on a decentralized protocol to monitor the behavior of software components distributed over processors and communicating among them. In this approach, the Distributed Fault Handler (DFH), which is distributed over processors, is developed for error detection and recovery during the execution of DCS software. An error classification scheme is discussed, and an error-detection technique is presented. An asynchronous rollback recovery is proposed that minimizes the rollback distance. This approach is useful for constructing general distributed applications that require high reliability.

Park, Eun-Kyo.

1988-01-01

105

Fault-tolerant delivery algorithms  

Energy Technology Data Exchange (ETDEWEB)

This dissertation addresses the problem of constructing a highly reliable delivery system in a distributed environment. It presents fault tolerance algorithms that guarantee the delivery of a message to its destination despite faults in one or more nodes in a system of loosely coupled processors. These algorithms are distinguished by not using the extra hardware or checkpoint facilities that are common to many algorithms of their type. Instead, they maintain an appropriate number of copies of the message in nodes through which the message passes. In the case of a fault, the algorithms locate the copy of the message closest to the destination, and resume delivery of the message from this location. The mechanism introduced in this dissertation can be implemented on existing distributed systems without the addition of specialized hardware or changes in the existing application program. Moreover, the proposed mechanism can be used transparently so that failure detection and recovery are automatic, and users are completely unaware of the details of the algorithms. A complete analysis of both algorithms is presented in this dissertation. The communication overhead of each algorithm is presented. Also, the author discusses the conditions under which a loop may occur in a system where the algorithms are implemented. The availability of the system where the algorithms are implemented is found. The reliability model is presented in detail for each algorithm, and different topologies are examined. The parameters that affect the performance of both algorithms when implemented in a distributed system are presented based on simulation results.

Al Jaber, H.S.

1990-01-01
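
The copy-and-resume idea in the abstract of record 105 can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not either of the dissertation's two algorithms: the graph representation, the breadth-first routing, the `copies` parameter and the failure schedule are all hypothetical.

```python
from collections import deque


def shortest_path(graph, src, dst, failed):
    """Breadth-first path from src to dst that avoids failed nodes, or None."""
    if src in failed or dst in failed:
        return None
    if src == dst:
        return [src]
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt in parent or nxt in failed:
                continue
            parent[nxt] = node
            if nxt == dst:
                path = [dst]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


def deliver(graph, src, dst, failures=None, copies=2):
    """Forward a message hop by hop, keeping copies on the last `copies` nodes.

    `failures` maps a node to the hop count at which it fails (assumption).
    Returns the list of nodes that held the live message, or None if lost.
    """
    failures = failures or {}
    holders = deque([src], maxlen=copies)
    current, hops, trace = src, 0, [src]
    while current != dst:
        failed = {n for n, t in failures.items() if t <= hops}
        if current in failed:
            # Resume from the surviving copy closest to the destination.
            candidates = []
            for h in holders:
                path = shortest_path(graph, h, dst, failed)
                if path is not None:
                    candidates.append((len(path), h))
            if not candidates:
                return None
            current = min(candidates)[1]
            holders = deque([h for h in holders if h not in failed], maxlen=copies)
            trace.append(current)
        path = shortest_path(graph, current, dst, failed)
        if path is None:
            return None
        current = path[1]
        holders.append(current)
        trace.append(current)
        hops += 1
    return trace


GRAPH = {
    "A": ["B"],
    "B": ["A", "C", "E"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["B", "D"],
}
```

In the example graph, if node C fails while holding the message, delivery resumes from the copy kept at B and continues through E.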

106

Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants  

Energy Technology Data Exchange (ETDEWEB)

Recently, traditional analog-based safety-related instrumentation and control (I&C) systems in nuclear power plants (NPPs) have been replaced with modern digital-based systems. Due to the digitalization of nuclear I&C systems, the safety assessment has become a major issue, as it is crucial to the system's reliability. In the safety assessment of the digitalized system, evaluation of error detection coverage and fault-tolerance are critical factors. For the evaluation, we use C++ based hardware description instead of a board with integrated circuit components. We select the digital plant protection system (DPPS) in NPPs as a target system. Permanent fault is used as a possible fault in the system and some error detection methods are used to detect errors. From the experiment, we confirmed that the proposed approach can evaluate the error detection coverage and the fault-tolerance of DPPS in NPPs.

Lee, Jun Seok [Center for Advanced Reactor Research, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: wahrheit@kaist.ac.kr; Kim, Man Cheol [Center for Advanced Reactor Research, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: charleskim@kaist.ac.kr; Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: phseong@kaist.ac.kr; Kang, Hyun Gook [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 150 Deokjin-dong, Yuseong-gu, Daejeon 305-353 (Korea, Republic of)]. E-mail: hgkang@kaeri.re.kr; Jang, Seung Cheol [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 150 Deokjin-dong, Yuseong-gu, Daejeon 305-353 (Korea, Republic of)]. E-mail: scjang@kaeri.re.kr

2006-04-15

107

A Study on Fault-Tolerant Software Architecture for COTS-Based Dependable System  

International Nuclear Information System (INIS)

Recently, with the rapid development of digital computers and information processing technologies, nuclear instrumentation and control (I&C) systems which need safety-critical functions have adopted digital technologies. Also, the use of commercial off-the-shelf (COTS) software in safety-critical systems has increased for several reasons, such as economic efficiency and technical problems. But it requires a considerable integration effort and brings about software quality and safety issues. COTS software is usually provided as a black box that cannot be modified. The biggest problem when we integrate such a product into dependable systems is the reliability of COTS software. There is no guarantee that the software will perform its function correctly. It may have bugs or unidentified components. Recently, the method of software verification and validation (V&V) has been accepted as a way to assure the dependability of newly developed safety-critical nuclear I&C software. But, because of the limitations of COTS software, software V&V cannot be applied as rigorously as for newly developed software. Considerable attention has been paid to describing software architectures with respect to their dependability properties. In this paper, we present a fault-tolerant software architecture using the C2 architectural style. The remainder of the paper is organized as follows: Section 2 discusses background work on COTS software in nuclear I&C, software fault tolerance and the C2 architectural style. Section 3 describes the architecture for fault-tolerant COTS-based software. Finally, we discuss the conclusion and future work.

2005-01-01

108

Fault tolerant synchronization of chaotic heavy symmetric gyroscope systems versus external disturbances via Lyapunov rule-based fuzzy control.  

Science.gov (United States)

In this paper, fault tolerant synchronization of chaotic gyroscope systems versus external disturbances via Lyapunov rule-based fuzzy control is investigated. Taking the general nature of faults in the slave system into account, a new synchronization scheme, namely, fault tolerant synchronization, is proposed, by which the synchronization can be achieved no matter whether the faults and disturbances occur or not. By making use of a slave observer and a Lyapunov rule-based fuzzy control, fault tolerant synchronization can be achieved. Two techniques are considered as control methods: classic Lyapunov-based control and Lyapunov rule-based fuzzy control. On the basis of Lyapunov stability theory and fuzzy rules, the nonlinear controller and some generic sufficient conditions for global asymptotic synchronization are obtained. The fuzzy rules are directly constructed subject to a common Lyapunov function such that the error dynamics of two identical chaotic motions of symmetric gyros satisfy stability in the Lyapunov sense. Two proposed methods are compared. The Lyapunov rule-based fuzzy control can compensate for the actuator faults and disturbances occurring in the slave system. Numerical simulation results demonstrate the validity and feasibility of the proposed method for fault tolerant synchronization. PMID:21868010

Farivar, Faezeh; Shoorehdeli, Mahdi Aliyari

2011-08-24

109

Quantification of Unavailability of Digital Plant Protection System with Various Fault Tolerant Techniques in Nuclear Power Plants  

Energy Technology Data Exchange (ETDEWEB)

A digital plant protection system (DPPS) maintains safety by monitoring selected plant parameters and initiating appropriate protective action when any parameter reaches its set-point value. The protection system generates a signal to actuate a reactor trip whenever the process parameters exceed predefined limits. A DPPS is a very important system for protecting the core and the reactor coolant system. Therefore, it has various fault tolerant techniques to maintain system reliability and reactor safety. However, systematic frameworks or reasonable models to obtain the reliability of digital systems by considering the effects of fault tolerant techniques have not been proposed.

Kim, Bo Gyung; Kang, Hyun Gook; Seong, Poong Hyun [KAIST, Daejeon (Korea, Republic of); Lee, Seung Jun [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2011-10-15

110

Quantification of Unavailability of Digital Plant Protection System with Various Fault Tolerant Techniques in Nuclear Power Plants  

International Nuclear Information System (INIS)

A digital plant protection system (DPPS) maintains safety by monitoring selected plant parameters and initiating appropriate protective action when any parameter reaches its set-point value. The protection system generates a signal to actuate a reactor trip whenever the process parameters exceed predefined limits. A DPPS is a very important system for protecting the core and the reactor coolant system. Therefore, it has various fault tolerant techniques to maintain system reliability and reactor safety. However, systematic frameworks or reasonable models to obtain the reliability of digital systems by considering the effects of fault tolerant techniques have not been proposed.

2011-01-01

111

State of the art on fault-tolerant real time distributed systems  

International Nuclear Information System (INIS)

The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies more and more stringent requirements as to communication system reliability. While an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. It is for this reason that, in the context of preparation for the future PWR 2000 standardized nuclear plants, it would seem appropriate to take a look at fault-tolerant communication systems. Since C&I type applications (in the control room) are divided between several computers and are required to contend with extremely severe time constraints, EDF has undertaken investigation of fault-tolerant, real time distributed systems. This paper summarizes the state of the art in the field as it appears from discussion with computer manufacturers, academics and research workers on related projects. The results obtained were then used to determine trends as to "promising" solutions. The paper concludes with recommended study programs for the PCC department of EDF/R&DD for the next few years. (author), 9 figs., 10 refs., 2 annexes.

1992-01-01

112

Economical and Fault-Tolerant Load Balancing in Distributed Stream Processing Systems  

Science.gov (United States)

We present an economical and fault-tolerant load balancing strategy (EFTLBS) based on an operator replication mechanism and a load shedding method, that fully utilizes the network resources to realize continuous and highly-available data stream processing without dynamic operator migration over wide area networks. In this paper, we first design an economical operator distribution (EOD) plan based on a bin-packing model under the constraints of each stream bandwidth as well as each server's CPU capacity. Next, we devise super-operator (SO) that load balances multi-degree operator replicas. Moreover, for improving the fault-tolerance of the system, we color the SOs based on a coloring bin-packing (CBP) model that assigns peer operator replicas to different servers. To minimize the effects of input rate bursts upon the system, we take advantage of a load shedding method while keeping the QoS guarantees made by the system based on the SO scheme and the CBP model. Finally, we substantiate the utility of our work through experiments on ns-3.

Xiao, Fuyuan; Kitasuka, Teruaki; Aritsugi, Masayoshi
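
The placement constraints in the abstract of record 112 (a per-server CPU capacity, and peer replicas of an operator assigned to different servers) can be illustrated with a simple heuristic. This is a minimal sketch that swaps in first-fit decreasing packing for illustration; it is not the authors' EOD or CBP model, it ignores stream bandwidth, and the function name, operator names and integer load units are hypothetical.

```python
def place_replicas(operators, capacity, replicas=2):
    """Assign `replicas` copies of each operator to servers.

    `operators` maps an operator name to its CPU load (integer units).
    Operators are placed in decreasing order of load on the first server that
    has enough free capacity and does not already host a replica of the same
    operator. A new server is opened when no existing one fits.
    """
    servers = []
    ordered = sorted(operators.items(), key=lambda kv: -kv[1])
    for name, load in ordered:
        if load > capacity:
            raise ValueError(f"operator {name!r} exceeds server capacity")
        for _ in range(replicas):
            for server in servers:
                if name not in server["ops"] and server["free"] >= load:
                    break
            else:
                server = {"free": capacity, "ops": set()}
                servers.append(server)
            server["ops"].add(name)
            server["free"] -= load
    return servers


placement = place_replicas({"join": 60, "filter": 30, "map": 30}, capacity=100)
```

Because a server never hosts two replicas of one operator, the failure of any single server leaves at least one replica of every operator running in this sketch.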

113

Reverse Computation for Rollback-based Fault Tolerance in Large Parallel Systems  

Energy Technology Data Exchange (ETDEWEB)

Reverse computation is presented here as an important future direction in addressing the challenge of fault tolerant execution on very large cluster platforms for parallel computing. As the scale of parallel jobs increases, traditional checkpointing approaches suffer scalability problems ranging from computational slowdowns to high congestion at the persistent stores for checkpoints. Reverse computation can overcome such problems and is also better suited for parallel computing on newer architectures with smaller, cheaper or energy-efficient memories and file systems. Initial evidence for the feasibility of reverse computation in large systems is presented with detailed performance data from a particle simulation scaling to 65,536 processor cores and 950 accelerators (GPUs). Reverse computation is observed to deliver very large gains relative to checkpointing schemes when nodes rely on their host processors/memory to tolerate faults at their accelerators. A comparison between reverse computation and checkpointing with measurements such as cache miss ratios, TLB misses and memory usage indicates that reverse computation is hard to ignore as a future alternative to be pursued in emerging architectures.

Perumalla, Kalyan S [ORNL; Park, Alfred J [ORNL

2013-01-01

114

Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs  

DEFF Research Database (Denmark)

In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messages are statically scheduled, and we use process reexecution for recovering from multiple transient faults. We propose a fine-grained transparent recovery, where the property of transparency can be selectively applied to processes and messages. Transparency hides the recovery actions in a selected part of the application so that they do not affect the schedule of other processes and messages. While leading to longer schedules, transparent recovery has the advantage of both improved debuggability and less memory needed to store the fault-tolerant schedules.

Izosimov, Viacheslav; Pop, Paul

2012-01-01

115

Fault-tolerant adaptive control for load-following in static space nuclear power systems  

International Nuclear Information System (INIS)

In this paper the possible use of a dual-loop, model-based adaptive control system for load-following in static space nuclear power systems is investigated. The objective of the fault-tolerant, autonomous control system is to deliver the demanded electric power at the desired voltage level, by appropriately manipulating the neutron power through the control drums. As a result sufficient thermal power is produced to meet the required demand in the presence of dynamically changing system operating conditions and potential sensor failures. The designed controller is proposed for use in combination with the currently considered shunt regulators, or as a back-up controller when other means of power system control, including some of the sensors, fail.

1992-01-01

116

Computation method of minimal cut sets for fault-tolerant systems based on structural-automatic model

Directory of Open Access Journals (Sweden)

Introduction. The computation method of minimal cut sets for fault-tolerant structures based on a structural-automatic model is presented in this work. Method. The first step of this method is the development of a binary structural-automatic model, a formalized representation of the system. The structural-automatic model is the input data for the program ASNA, which forms the state and transition graph of the fault-tolerant system automatically. The program also forms the system of Chapman-Kolmogorov differential equations and solves it. The probability distribution over the states is obtained as the result of this calculation. The minimal cut sets are calculated by analyzing the list of states. Conclusion. An alternative to fault tree analysis is shown in the presented work. The method is automated, so it is faster than fault tree analysis.

B. Y. Volochiy; L. D. Ozirkovskyy; A. V. Mashchak; O. P. Shkil?uk; I. V. Kulyk

2013-01-01
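
Editorial sketch (not taken from the cited paper): the abstract above derives minimal cut sets from a generated state graph. For a small system the same notion can be illustrated by brute-force enumeration of failed-component sets. The function names, the component labels and the example structure function below are assumptions made only for this illustration.

```python
from itertools import combinations


def minimal_cut_sets(components, system_works):
    """Enumerate minimal cut sets by brute force.

    components:   list of component names.
    system_works: function taking the set of working components and
                  returning True if the system still operates.
    A cut set is a set of failed components that brings the system down;
    it is minimal if no smaller cut set is contained in it.
    """
    cuts = []
    # Visit candidate failure sets in order of increasing size, so every
    # smaller minimal cut is already known when a larger set is examined.
    for size in range(1, len(components) + 1):
        for failed in combinations(components, size):
            failed_set = frozenset(failed)
            if any(cut <= failed_set for cut in cuts):
                continue  # contains a smaller cut, so it is not minimal
            working = set(components) - failed_set
            if not system_works(working):
                cuts.append(failed_set)
    return cuts


def example_system(working):
    # Hypothetical structure: A in series with the parallel pair (B, C).
    return "A" in working and ("B" in working or "C" in working)


cuts = minimal_cut_sets(["A", "B", "C"], example_system)
print(sorted(sorted(c) for c in cuts))
```

The enumeration is exponential in the number of components, so it is only a way to see what a minimal cut set is, not a substitute for the automated state-based method the abstract describes.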

117

Fault-tolerant system analysis: imperfect switching and maintenance. Final technical paper  

Energy Technology Data Exchange (ETDEWEB)

This final report presents the results of research into two important areas of concern for fault-tolerant avionics systems: testability analysis and innovative repair policies. The algorithms developed from this research have been included in the Mission Reliability Model (MIREM) and verified by comparison with known results from several Integrated Communication, Navigation, and Identification Avionics architectures. The purpose of the testability analysis was to develop techniques for assessing the impact of imperfect switching on the overall reliability of fault-tolerant avionics. A method of quantifying the effects of undetected errors and false alarms has been developed and included in MIREM. Under the next phase of the program, three repair statistics were identified: Mean Time To Repair, Mean Time Between Maintenance Actions, and Inherent Availability. These were used to define four alternative repair policies: immediate repair, deferred repair, scheduled maintenance, and repair at degraded level. Also included in MIREM as model outputs, these four options offer greater flexibility in evaluating and developing avionics designs.

Veatch, M.H.; Foley, R.D.

1987-01-01
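
Editorial sketch (not the MIREM algorithms): the abstract above names inherent availability and imperfect switching without giving formulas. The block below uses the standard textbook definition of inherent availability, MTBF / (MTBF + MTTR), and a common coverage model for a duplicated unit whose switch-over succeeds with probability c. Function names and the numeric values are assumptions chosen for illustration.

```python
def inherent_availability(mtbf_hours, mttr_hours):
    """Textbook inherent availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def duplex_reliability(unit_reliability, switch_coverage):
    """Reliability of two identical, independent units with imperfect switching.

    The system survives if the primary works, or if the primary fails,
    the switch-over succeeds (probability switch_coverage) and the spare works.
    """
    r, c = unit_reliability, switch_coverage
    if not (0.0 <= r <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    return r + c * (1.0 - r) * r


print(inherent_availability(990.0, 10.0))
print(duplex_reliability(0.9, 1.0))   # perfect switching
print(duplex_reliability(0.9, 0.95))  # imperfect switching lowers the gain
```

With perfect coverage the duplex model reduces to the usual parallel formula 1 - (1 - R)^2; any coverage below 1 reduces the benefit of the spare, which is the effect the testability analysis sets out to quantify.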

118

Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems  

Directory of Open Access Journals (Sweden)

Industrial distributed networked control systems use different communication networks to exchange information of different criticality levels. Real-time control, fault diagnosis (FDI) and Fault Tolerant Networked Control (FTNC) systems place some of the most stringent data exchange demands on the communication networks of these networked control systems (NCS). When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of spatially distributed, network-connected sensors and actuators. To address this issue, an FTNC platform and toolbox are presented in this paper. They use simple and verifiable principles coming mainly from a decentralized design based on causal-modelling partitioning of the NCS and on distributed computing using the multi-agent systems paradigm, which allows the use of agents with well-established FTC methodologies or new ones developed to take the NCS specificities into account. The multi-agent platform and toolbox for FTNC systems have been built in the Matlab/Simulink environment, which is currently the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

Mário J. G. C. Mendes; Bruno M. S. Santos; José Sá da Costa

2009-01-01

119

Advances in Database Technology: F1-Fault Tolerant RDBMS, C-Block and Q system  

Directory of Open Access Journals (Sweden)

In this paper, we discuss the latest database technologies that address the critical challenges currently faced by organizations, for which managing data effectively has become a major need. In particular, we discuss the F1 fault-tolerant distributed RDBMS, a hybrid database that combines the scalability of Bigtable with the functionality of SQL. We then discuss the C-Block system, which addresses the challenge of identifying duplicates in large datasets efficiently, and the Q system, which performs automatic data integration for incoming datasets. Finally, we examine the integration of all these technologies in a system that would address the issues pertaining to data management.

Y. Sailaja; M. Nalini Sri

2013-01-01

120

Energy-Aware Fault Tolerance in Hard Real-Time Embedded Systems  

Directory of Open Access Journals (Sweden)

Energy consumption of electronic devices has become a serious concern in recent years. Energy efficiency is necessary to lengthen the battery lifetime in portable systems, as well as to reduce the operational costs and the environmental impact of stationary systems. Dynamic power management (DPM) algorithms aim to reduce the energy consumption at the system level by selectively placing components into low-power states. Dynamic voltage scaling (DVS) algorithms reduce energy consumption by changing processor speed and voltage at run-time depending on the needs of the running applications. The proposed method is extended by integrating the DPM model with the DVS algorithm, thus enabling larger energy savings. The proposed methods are (i) the Postponement method and (ii) the Hybrid method. Fault tolerance is also achieved by increasing transistor density and decreasing supply voltage.

S.Subha; N.Kumaresan

2012-01-01

 
 
 
 
121

GRID COMPUTING AND FAULT TOLERANCE APPROACH  

Directory of Open Access Journals (Sweden)

Grid computing is a means of allocating the computational power of a large number of computers to a complex, difficult computation or problem. Grid computing is a distributed computing paradigm that differs from traditional distributed computing in that it is aimed at large-scale systems that may even span organizational boundaries. This paper proposes a method to achieve maximum fault tolerance in the Grid environment by taking reliability into account through a replication approach and a checkpointing approach. Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve a high level of reliability and availability, the grid infrastructure should be dependably fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly utilized techniques for providing fault tolerance are job checkpointing and replication. Both techniques reduce the amount of work lost due to changing system availability but can introduce significant runtime overhead. This overhead largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of the grid resources.

Pankaj Gupta

2011-01-01
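
Editorial sketch (not taken from the cited paper): the abstract above notes that checkpointing overhead depends on the length of the checkpointing interval. A widely used first-order rule for choosing that interval is Young's approximation, sqrt(2 * C * MTBF), where C is the time to write one checkpoint. The function name and the example figures are assumptions for illustration only.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order approximation of the optimal checkpoint interval.

    checkpoint_cost_s: time needed to write one checkpoint, in seconds.
    mtbf_s:            mean time between failures, in seconds.
    The approximation assumes the checkpoint cost is small relative to the MTBF.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical job: 60 s per checkpoint, one failure per day on average.
interval = young_checkpoint_interval(60.0, 24 * 3600.0)
print(f"checkpoint roughly every {interval / 60.0:.1f} minutes")
```

Checkpointing more often than this wastes time writing state; checkpointing less often increases the expected amount of work lost per failure.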

122

Achieving consensus in fault-tolerant distributed computer systems: protocols, lower bounds, and simulations  

Energy Technology Data Exchange (ETDEWEB)

Several new results in the area of fault-tolerant distributed computing are given. Of specific interest are consensus protocols, that is, protocols that enable correct processors to reach agreement in the presence of disruptive behavior by faulty processors. Results presented here are as follows: two new efficient agreement protocols, one randomized and one deterministic; a method for efficiently transforming a protocol that reaches agreement on a single bit into a protocol that reaches agreement on values chosen from a larger set; a general method for compiling a protocol that tolerates relatively benign processor faults into one that tolerates more serious processor faults; and a strengthening of the known lower bound on the number of rounds of communication required by consensus protocols.

Coan, B.A.

1987-01-01

123

Fault Tolerant Control of Induction Motor  

Directory of Open Access Journals (Sweden)

The principle of vector control of electrical machines is to control both the magnitude and the phase of each phase current and voltage. A MATLAB/Simulink simulation has been performed to assess the operating features of the proposed scheme. A Proportional Integral (PI) speed controller is designed in this paper. The test response of the developed variable speed drive, along with the simulated response, is given and discussed in detail for torque and speed. The fault tolerant scheme is applied to the system when it is subject to a system fault. Two faults are investigated in this paper: a stator winding short and a broken rotor bar. The induction motor operates with acceptable performance in both speed and torque. The induction motor model, along with the vector control fault tolerant scheme, is investigated to show the optimal response of the control system.

Khalaf Salloum Gaeid

2011-01-01

124

Optimised sensor selection for control and fault tolerance of electromagnetic suspension systems: A robust loop shaping approach.  

UK PubMed Central (United Kingdom)

This paper presents a systematic design framework for selecting the sensors in an optimised manner, simultaneously satisfying a set of given complex system control requirements, i.e. optimum and robust performance as well as fault tolerant control for high integrity systems. It is worth noting that optimum sensor selection in control system design is often a non-trivial task. Among all candidate sensor sets, the algorithm explores and separately optimises system performance with all the feasible sensor sets in order to identify fallback options under single or multiple sensor faults. The proposed approach combines modern robust control design, fault tolerant control, multiobjective optimisation and Monte Carlo techniques. Without loss of generality, its efficacy is tested on an electromagnetic suspension system via appropriate realistic simulations.

Michail K; Zolotas AC; Goodall RM

2013-09-01

125

Reversible Fault-Tolerant Logic  

CERN Multimedia

It is now widely accepted that the CMOS technology implementing irreversible logic will hit a scaling limit beyond 2016, and that the increased power dissipation is a major limiting factor. Reversible computing can potentially require arbitrarily small amounts of energy. Recently several nano-scale devices which have the potential to scale, and which naturally perform reversible logic, have emerged. This paper addresses several fundamental issues that need to be addressed before any nano-scale reversible computing systems can be realized, including reliability and performance trade-offs and architecture optimization. Many nano-scale devices will be limited to only near neighbor interactions, requiring careful optimization of circuits. We provide efficient fault-tolerant (FT) circuits when restricted to both 2D and 1D. Finally, we compute bounds on the entropy (and hence, heat) generated by our FT circuits and provide quantitative estimates on how large can we make our circuits before we lose any advantage ove...

Boykin, P O; Roychowdhury, Vwani P.

2005-01-01

126

Backstepping decentralized fault tolerant control for reconfigurable modular robots  

Directory of Open Access Journals (Sweden)

For actuator faults of reconfigurable modular robots, a backstepping decentralized fault tolerant control (DFTC) algorithm is proposed. The reconfigurable robot system is divided into a set of interconnected subsystems. The fault tolerant controller is designed based on the backstepping method.

Jinbao He; Xinhua Yi; Zaifei Luo; Guojun Li

2013-01-01

127

A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices  

DEFF Research Database (Denmark)

This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier-stage modulation is adjusted based on knowledge of the switching logic of the inverter stage and the operating input voltage sectors. The proposed fault-tolerant method does not rely on the assistance of any redundant power devices or on any reconfiguration of the matrix converter circuit by means of redundant physical connections. It is shown that different locations of the double open switch affect the availability of the revised modulation. The steady-state absolute speed error achieved with the proposed method is 4% of the nominal speed. Experimental results are presented to demonstrate the efficacy of the proposed methods.

Chen, Der-Fa; Nguyen-Duy, Khiem

2012-01-01

128

Coordinated Fault Tolerance for High-Performance Computing  

Energy Technology Data Exchange (ETDEWEB)

Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack, from the application, math libraries, and programming language runtime to other common system software such as job schedulers, resource managers, and monitoring tools.

Dongarra, Jack; Bosilca, George; et al.

2013-04-08

129

Simulation Framework for Evaluation of Fault Tolerant Large Dynamic Distributed System  

Directory of Open Access Journals (Sweden)

The use of Java-based simulators in the design and development of distributed systems for evaluating the dependability of algorithms is attractive because of their efficiency and scalability. Such simulators allow realistic simulation scenarios to be designed. In this work, we propose Saturn, a multithreaded, process-oriented simulation framework designed for modeling large-scale distributed systems. It provides realistic simulation of a wide range of distributed system technologies. It is an innovative solution to the problem of evaluating the dependability characteristics of distributed systems. Our solution is based on several proposed extensions to the simulation model of the MONARC simulation framework. These extensions cover fault tolerance and system orchestration mechanisms in order to assess the reliability and availability of distributed systems. The extended simulation model includes the necessary components to describe various actual failure situations and provides the mechanisms to evaluate different strategies for replication and redundancy procedures as well as security enforcement mechanisms. The simulator also evaluates the major QoS metrics of the heartbeat-based adaptive failure detection mechanism.

Sanjay Bansal

2012-01-01

130

Fault Tolerant Routing in SCI  

UK PubMed Central (United Kingdom)

We study the problem of routing-based fault tolerance in SCI networks. The paper provides a two-step algorithm to route around faulty regions. The first phase consists of finding new paths for the packets that should have traversed the faulty region. The second phase, which is the challenging one, consists of modifying the routing strategy so that the resulting configuration is free from deadlocks. The strategy has been evaluated through simulation of node faults in a 12×8 torus, using the SCI model for OPNET.

Olav Lysne; Sissel Herambtangen

131

Scheduling and Voltage Scaling for Energy/Reliability Trade-offs in Fault-Tolerant Time-Triggered Embedded Systems  

DEFF Research Database (Denmark)

In this paper we present an approach to the scheduling and voltage scaling of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Processes and messages are statically scheduled, and we use process re-execution for recovering from multiple transient faults. Addressing simultaneously energy and reliability is especially challenging because lowering the voltage to reduce the energy consumption has been shown to exponentially increase the number of transient faults. In addition, time-redundancy based fault-tolerance techniques such as re-execution and dynamic voltage scaling-based low-power techniques are competing for the slack in the schedules. Our approach decides the voltage levels and start times of processes and the transmission times of messages, such that the transient faults are tolerated, the timing constraints of the application are satisfied and the energy is minimized. We present a constraint logic programming-based approach which is able to find reliable and schedulable implementations within limited energy and hardware resources. The developed algorithms have been evaluated using extensive experiments.

Pop, Paul; Poulsen, Kåre Harbo

2007-01-01

132

Mapping tasks into fault tolerant manipulators  

Energy Technology Data Exchange (ETDEWEB)

The application of robots in critical missions in hazardous environments requires the development of reliable or fault tolerant manipulators. In this paper, we define fault tolerance as the ability to continue the performance of a task after immobilization of a joint due to failure. Initially, no joint limits are considered, in which case we prove the existence of fault tolerant manipulators and develop an analysis tool to determine the fault tolerant work space. We also derive design templates for spatial fault tolerant manipulators. When joint limits are introduced, analytic solutions become infeasible but instead a numerical design procedure can be used, as is illustrated through an example.

Paredis, C.J.J.; Khosla, P.K.; Kanade, T. [Carnegie Mellon Univ., Pittsburgh, PA (United States)]

1994-12-31

133

Fault tolerant architectures by partial reconfiguration  

Science.gov (United States)

The use of SRAM-based FPGAs in the implementation of embedded systems is growing continuously. The flexibility that these devices offer in terms of hardware re-programming can also be a critical point to take into account when designing fault tolerant systems. As configuration values are stored in volatile memory, any event that affects this configuration memory can lead to undesirable changes in the circuits and, as a consequence, erroneous outcomes can be obtained. This paper presents an approach to add fault tolerance to an aerospace application implemented in a commercial off-the-shelf FPGA (Virtex-5). By using this device, the partial reconfiguration facility can be exploited. This feature gives more flexibility in hardware management at run-time and also provides a means to correct specific parts of the system when faults are detected. Results on the area impact of the different approaches are presented.

Cardona, Luis Andrés; Guo, Yi; Ferrer, Carles

2013-05-01

134

Aspects of fault tolerant ring structures  

Energy Technology Data Exchange (ETDEWEB)

Fault tolerant ring structures are a variation of the classical n-modular redundant (NMR) structures. In this paper, various aspects of synchronisation of TMR (triple modular redundant) ring structures are analysed. Some practical results are presented for an experimental system using 8748 microcomputers. 18 references.

Obac-roda, V.; Davies, O.J.

1982-01-01
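
Editorial sketch (not taken from the cited paper): the abstract above refers to triple modular redundant (TMR) structures. The core of any n-modular redundant scheme is a majority voter over replica outputs, which can be sketched as below. The replica functions and names are hypothetical and the ring synchronisation the paper analyses is not modelled.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas.

    Raises ValueError when no value is held by more than half of the replicas.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


def run_redundant(replicas, x):
    """Feed the same input to every replica and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


def good(x):
    # Hypothetical fault-free module.
    return x * x


def faulty(x):
    # Hypothetical module with a permanent fault.
    return x * x + 1


result = run_redundant([good, faulty, good], 7)
print(result)  # the single faulty replica is outvoted
```

With three replicas the voter masks any single faulty module; two simultaneous, disagreeing faults leave no majority and are reported rather than masked.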

135

Scalable distributed consensus to support MPI fault tolerance.  

Energy Technology Data Exchange (ETDEWEB)

As system sizes increase, the amount of time in which an application can run without experiencing a failure decreases. Exascale applications will need to address fault tolerance. In order to support algorithm-based fault tolerance, communication libraries will need to provide fault-tolerance features to the application. One important fault-tolerance operation is distributed consensus. This is used, for example, to collectively decide on a set of failed processes. This paper describes a scalable, distributed consensus algorithm that is used to support new MPI fault-tolerance features proposed by the MPI 3 Forum's fault-tolerance working group. The algorithm was implemented and evaluated on a 4,096-core Blue Gene/P. The implementation was able to perform a full-scale distributed consensus in 305 μs and scaled logarithmically.

Buntinas, D. (Mathematics and Computer Science)

2011-01-01

136

Fault Tolerant External Memory Algorithms  

DEFF Research Database (Denmark)

Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano. However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable. We show a lower bound on the number of I/Os required for any deterministic dictionary that is resilient to memory faults. We design a static and a dynamic deterministic dictionary with optimal query performance as well as an optimal sorting algorithm and an optimal priority queue. Finally, we consider scenarios where only cells in memory or only cells on disk are corruptible and separate randomized and deterministic dictionaries in the latter.

Jørgensen, Allan Grønlund; Brodal, Gerth Stølting

2009-01-01

137

Heap Base Coordinator Finding with Fault Tolerant Method in Distributed Systems  

Directory of Open Access Journals (Sweden)

Coordinator finding in wireless networks is a very important problem that is solved by suitable algorithms. The main goals of coordinator finding are synchronizing the processes and making optimal use of the resources. Many different algorithms have been presented for coordinator finding. The most important leader election algorithms are the Bully and Ring algorithms. In this paper we analyze and compare these algorithms with each other, and we propose a new heap-based approach with fault tolerant mechanisms for coordinator finding in a wireless environment. Our algorithm's running time and message complexity compare favorably with existing algorithms. Our work involves substantial modifications of an existing algorithm and its proof, and we adapt the existing algorithms to the noisy environment based on fault tolerant mechanisms.

Mehdi EffatParvar; AmirMasoud Rahmani; MohammadReza EffatParvar; Mehdi Dehghan

2011-01-01
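
Editorial sketch (not the heap-based algorithm of the cited paper): the abstract above takes the classic Bully algorithm as its baseline. The block below is a simplified, synchronous simulation of one Bully election that also counts messages, which is the cost the paper aims to reduce. The function name, the message accounting and the example node set are assumptions for illustration.

```python
def bully_election(all_ids, alive, initiator):
    """Simulate one Bully election and return (coordinator, messages_sent).

    all_ids:   every node identifier in the system, including crashed nodes.
    alive:     identifiers of the nodes that are currently running.
    initiator: the live node that noticed the coordinator is missing.
    """
    alive = set(alive)
    if initiator not in alive:
        raise ValueError("initiator must be a live node")
    messages = 0
    pending = [initiator]   # nodes that still have to hold an election
    started = set()         # nodes that already held one
    coordinator = None
    while pending:
        node = pending.pop(0)
        if node in started:
            continue
        started.add(node)
        higher = [i for i in all_ids if i > node]
        messages += len(higher)             # ELECTION to every higher id
        responders = [i for i in higher if i in alive]
        messages += len(responders)         # OK replies from live higher nodes
        if responders:
            pending.extend(responders)      # each responder runs its own election
        else:
            coordinator = node              # nobody higher answered
            messages += len(all_ids) - 1    # COORDINATOR broadcast to all others
    return coordinator, messages


# Hypothetical system of five nodes in which node 5, the old coordinator, crashed.
coordinator, messages = bully_election([1, 2, 3, 4, 5], {1, 2, 3, 4}, initiator=2)
print(coordinator, messages)
```

The live node with the highest identifier always wins; the message count grows quickly when a low-numbered node starts the election, which is why alternatives with lower message complexity are of interest.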

138

System Description for a Scalable, Fault-Tolerant, Distributed Garbage Collector  

CERN Document Server

We describe an efficient and fault-tolerant algorithm for distributed cyclic garbage collection. The algorithm imposes few requirements on the local machines and allows for flexibility in the choice of local collector and distributed acyclic garbage collector to use with it. We have emphasized reducing the number and size of network messages without sacrificing the promptness of collection throughout the algorithm. Our proposed collector is a variant of back tracing to avoid extensive synchronization between machines. We have added an explicit forward tracing stage to the standard back tracing stage and designed a tuned heuristic to reduce the total amount of work done by the collector. Of particular note is the development of fault-tolerant cooperation between traces and a heuristic that aggressively reduces the set of suspect objects.

Allen, N

2002-01-01

139

A Direct Design from Input/Output Data of Fault-Tolerant Control System Based on GIMC Structure  

Science.gov (United States)

This paper deals with a design method of fault-tolerant control system based on Generalized Internal Model Control (GIMC) structure consisting of a standard outer loop feedback controller and an extra inner loop controller. The distinguished feature of GIMC structure is that the controller design for performance and robustness may be done separately. The outer loop controller is designed for nominal performance using some controller synthesis to meet (nominal) control specification, while the inner loop controller is designed to make a trade-off between robustness and performance. This feature is suitable for fault-tolerant control. The outer loop controller is designed for fault-free case, and the inner loop controller for faulty case. In the conventional methods, the inner loop controller is designed to maximize the robust stability margin without information on fault. Therefore, the performance in the faulty case tends to become conservative. In this paper, the inner loop controller is directly designed from experimental data collected from the faulty system. Since the collected data contains information on the fault, conservativeness in the conventional methods is decreased. The inner loop controller is designed by Virtual Reference Feedback Tuning (VRFT). VRFT is a direct design method from input-output data without identifying any models. Since complexity of the controller can be specified by the designer, no complexity reduction has to be required, which becomes advantageous upon implementation. The effectiveness of the proposed design method is confirmed by an experiment.

Sakuishi, Tsubasa; Yubai, Kazuhiro; Hirai, Junji

140

Detectors and Correctors: A Theory of Fault-Tolerance Components  

UK PubMed Central (United Kingdom)

In this paper, we show that two types of tolerance components, namely detectors and correctors, appear in a rich class of fault-tolerant systems. This class includes systems designed using the well-known techniques of encapsulation and refinement, as well as systems designed using extant fault-tolerance methods such as replication and the state-machine approach. Our demonstration is via a theory of detectors and correctors, which characterizes the particular role of these components in achieving various types of fault-tolerance. Based on this theory and on our experience with using these components in designs, we suggest that detectors and correctors provide a powerful basis for efficient, component-based design of fault-tolerance. Keywords: Composition, Fault environment, Tolerance components, Tolerance design. A preliminary version of this paper appeared as [6].

Anish Arora; Sandeep S. Kulkarni

 
 
 
 
142

FAULT TOLERANCE IN FPGA THROUGH KING SHIFTING  

Directory of Open Access Journals (Sweden)

A wide range of fault tolerance methods for FPGAs have been proposed. Approaches range from simple architectural redundancy to fully on-line adaptive implementations. The homogeneous structure of field programmable gate arrays (FPGAs) suggests that defect tolerance can be achieved by shifting the configuration data inside the FPGA. All methods and schemes are qualitatively compared and some particularly promising approaches are highlighted. The applications of these methods also differ; some are used only for manufacturing yield enhancement, while others can be used in-system. This survey attempts to provide an overview of the current state of the art for fault tolerance in FPGAs. In this paper we discuss the king shifting allocation method.

R.V.Kshirsagar; S. Sharma

2012-01-01

143

A knowledge model for software fault tolerance  

Energy Technology Data Exchange (ETDEWEB)

In this dissertation, a knowledge-based model is presented to deal with software fault tolerance for a finite state machine (FSM) based system. The inference rules stored in the knowledge base are derived from the process requirements and specifications, which are described in the Specification and Description Language (SDL), a CCITT recommendation standard. To supplement insufficient facts in the inference rules, a set of inference axioms is added to the knowledge base. The inference axioms are derived from the heuristic and empirical knowledge of an expert. They are simple in nature and can be incrementally added by the experts to the knowledge base. In addition to the inference rules, the author also presents an effective fault recovery algorithm to recover the process from all possible software faults. Some theoretical support for the fault recovery scheme is presented too. Finally, to better understand the performance of the model, the author implements an experimental system and performs a simulation on it. The performance of the model is measured by serviceability and recoverability. The simulation shows a 60% improvement in serviceability on average, and the recoverability obtained from the simulation (67.2%) is comparable to that previously reported for another fault tolerant system. All of these have been shown to be quite satisfactory.

Hsueh, J.C.C.

1989-01-01

144

FaulTM: Fault-Tolerance Using Hardware Transactional Memory  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerance has become an essential concern for processor designers due to increasing soft-error rates. In this study, we are motivated by the fact that Transactional Memory (TM) hardware provides an ideal base upon which to build a fault-tolerant system. We show how it is possible to provide lo...

Yalcin, Gulay; Unsal, Osman; Hur, Ibrahim; Cristal, Adrian; Valero, Mateo

145

Identification of Critical Factors in Checkpointing Based Multiple Fault Tolerance for Distributed System  

Directory of Open Access Journals (Sweden)

Full Text Available The performance of checkpointing-based multiple fault tolerance is low. The main reason is the overhead associated with checkpointing. A checkpointing algorithm can be improved by a better storage strategy and better checkpoint scheduling, which reduce the overhead associated with checkpointing. Performance and efficiency are the most desirable features of checkpointing-based recovery. In this paper, the critical issues involved in fast and efficient checkpointing-based recovery are discussed. The impact of each issue on the performance of checkpointing-based recovery is also discussed. Relationships among the issues are also explored. Finally, coordinated checkpointing and uncoordinated checkpointing are compared with respect to the important issues.

Sanjay Bansal; Sanjeev Sharma

2011-01-01
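
One way to make the trade-off between checkpoint overhead and lost work concrete is Young's first-order approximation for the checkpoint interval, T ≈ sqrt(2·C·M), where C is the cost of writing one checkpoint and M is the mean time between failures. The sketch below is not taken from the paper above; the function names and the example numbers are illustrative assumptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the compute time between checkpoints."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order fraction of time lost to checkpointing and rework.

    Writing checkpoints costs C per interval T (C / T); a failure loses on
    average half an interval of work (T / (2 * MTBF)).
    """
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    # Hypothetical numbers: a 1-minute checkpoint and a 1-day MTBF.
    cost, mtbf = 60.0, 24 * 3600.0
    t = young_interval(cost, mtbf)
    print(f"interval ~ {t:.0f} s, overhead ~ {overhead_fraction(t, cost, mtbf):.2%}")
```

Under this simple model, checkpointing more often than the estimate wastes time writing checkpoints, while checkpointing less often wastes time redoing lost work.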

146

Development of advanced instrumentation and control techniques. A study on the development of fault-tolerant digital control systems  

Energy Technology Data Exchange (ETDEWEB)

This project developed control algorithms for steam generator level control systems in the low power range and proposed fault detection methods applicable above the 20% power range. The control schemes designed for the low power range were PID control, adaptive GPC, $H_2$ control, $H_\infty$ control and the $\mu$-synthesis scheme. Feed-forward control was also designed for PID and adaptive GPC, and comparative simulation studies were carried out under various scenarios. Based on the simulation studies, the adaptive GPC algorithm can be recommended as a very promising control scheme in the low power range. To develop the fault detection scheme, we classified and modeled various faults of the steam generator level control system based on a field document. Among the various fault detection methods, the parameter identification method and the unknown input observer method were designed, and their performance was shown by simulation studies. Finally, we surveyed existing fault tolerant control methods for the next year's project. (author). 63 refs., 64 figs., 6 tabs.

Lee, Sang Jeong; Seong, Se Jin [Pohang University of Science Technology, Pohang (Korea, Republic of); Kim, Sang Woo; Hong, Seok Min; Yun, Sang Jun; Park, Sang Hyun; Jeong, Il Young [Chungnam National University, Taejon (Korea, Republic of)

1995-08-01

147

Strategies for Fault Tolerance in Multicomponent Applications  

Energy Technology Data Exchange (ETDEWEB)

This paper discusses on-going work with the Integrated Plasma Simulator (IPS), a framework for coupled multiphysics simulations of plasmas, to allow simulations to run through the loss of nodes on which the simulation is executing. While many different techniques are available to improve the fault tolerance of computational science applications on high-performance computer systems, checkpoint/restart (C/R) remains virtually the only one that sees widespread use in practice. Our focus here is to augment the traditional C/R approach with additional techniques that can provide a more localized and tailored response to faults based on the ability to restart failed tasks on an individual basis, and the use of information external to the application itself in order to guide decision-making, in many cases avoiding the need to stop and restart the entire simulation. This capability involves several features within the IPS framework, and leverages the Fault Tolerance Backplane, a publish/subscribe event service to disseminate fault-related information throughout HPC systems, to obtain information from the Reliability, Availability and Serviceability (RAS) subsystem of the HPC system. This work is described in the context of Cray XT-series computer systems for concreteness, but is applicable to other environments as well. As part of the analysis of this work, we discuss the requirements to generalize this approach to other complex simulation applications beyond the Integrated Plasma Simulator.

Shet, Aniruddha G [ORNL; Elwasif, Wael R [ORNL; Foley, Samantha S [ORNL; Park, Byung H [ORNL; Bernholdt, David E [ORNL; Bramley, Randall B [ORNL

2011-01-01
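
The idea of restarting failed tasks individually, rather than stopping the whole simulation, can be sketched in a few lines. This is only an illustration and not the IPS framework's API: `run_with_task_restart`, `make_flaky` and the choice of `RuntimeError` as the signal for a lost node are all hypothetical.

```python
from typing import Callable, Dict


def run_with_task_restart(
    tasks: Dict[str, Callable[[], object]],
    max_attempts: int = 3,
) -> Dict[str, object]:
    """Run each task; on failure, retry only that task, not the whole set."""
    results: Dict[str, object] = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = task()
                break  # task succeeded; move on to the next one
            except RuntimeError as exc:
                # Assumed policy: RuntimeError stands for a transient node fault.
                if attempt == max_attempts:
                    raise RuntimeError(
                        f"task {name!r} failed {max_attempts} times"
                    ) from exc
    return results


def make_flaky(fail_times: int, value: object) -> Callable[[], object]:
    """Build a task that fails `fail_times` times before returning `value`."""
    state = {"calls": 0}

    def task() -> object:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise RuntimeError("simulated node loss")
        return value

    return task
```

For example, `run_with_task_restart({"a": make_flaky(0, 1), "b": make_flaky(2, 2)})` completes both tasks, re-running only `b` after its two simulated failures. A real system would also consult external fault information before deciding where to restart the task.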

148

A Concept for fault tolerant controllers  

DEFF Research Database (Denmark)

This paper describes a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture allows the controller to be changed on-line in the case of faults in the system. In the described FTC concept, a safe mode controller is applied as the basic feedback controller. A controller for normal operation with high performance is obtained by including certain YJBK parameters (transfer functions) in the controller. This allows a fast switch from normal operation to safe mode operation in case of critical faults in the system. The described FTC architecture allows the different feedback controllers to apply different sets of sensors and actuators.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01

149

First steps toward a fault-tolerance Multi-agent Systems  

Directory of Open Access Journals (Sweden)

Full Text Available The presence of errors and faults in systems is still a problem that needs to be handled. This paper examines this problem in the context of multi-agent systems. To solve it, we analyze the use of two mechanisms, self-healing and self-organization, in order to manage faults at the agent level as well as at the level of the overall network infrastructure. To realize these, we propose the use of adaptive agents which change their status according to the current situation and transmit their decisions to other agents by means of a gossip communication protocol.

Cristina Amza

2011-01-01

150

Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems  

Energy Technology Data Exchange (ETDEWEB)

The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, mean time between failures will range from a few minutes to a few tens of minutes, making the crash of a processor the common case, instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors are rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.

Bronevetsky, G; Meneses, E; Kale, L V

2011-02-25
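
The contrast with checkpoint/restart is that, with message logging, only the crashed process rolls back and the messages it lost are replayed from logs kept by its peers. The toy model below illustrates that idea with a simplified sender-based log; it is not the causal protocol evaluated in the paper, and the `Process` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Process:
    name: str
    state: int = 0       # application state: running sum of received payloads
    delivered: int = 0   # number of messages delivered so far
    # Sender-side log of (destination, receive sequence number, payload).
    sent_log: List[Tuple[str, int, int]] = field(default_factory=list)
    checkpoint: Tuple[int, int] = (0, 0)  # (state, delivered) at last checkpoint

    def deliver(self, payload: int) -> int:
        """Apply a message and return its receive sequence number."""
        self.state += payload
        self.delivered += 1
        return self.delivered

    def send(self, dest: "Process", payload: int) -> None:
        rsn = dest.deliver(payload)
        self.sent_log.append((dest.name, rsn, payload))  # kept by the sender

    def take_checkpoint(self) -> None:
        self.checkpoint = (self.state, self.delivered)

    def crash_and_recover(self, peers: List["Process"]) -> None:
        """Roll back only this process, then replay messages logged by peers."""
        self.state, self.delivered = self.checkpoint
        pending = sorted(
            (rsn, payload)
            for peer in peers
            for dest, rsn, payload in peer.sent_log
            if dest == self.name and rsn > self.delivered
        )
        for _, payload in pending:  # replay in the original delivery order
            self.deliver(payload)
```

In this sketch the peers never roll back: they only hand over the messages sent after the failed process's last checkpoint, which is the property that makes message logging attractive when failures are frequent.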

151

Policy Specification for Non-Local Fault Tolerance in Large Distributed Information Systems  

UK PubMed Central (United Kingdom)

The services provided by critical infrastructure systems are essential to the operation of modern society. These systems include the financial payments system, transportation systems, military command and control systems, the electric power grid, and telecommunications systems including the Internet. Widespread failure of any of these systems might result in severe financial loss or perhaps human injury. Critical infrastructure systems rely heavily on distributed information systems for operation. These information systems must therefore be dependable; that is, they must "deliver service that can justifiably be trusted." Traditional dependability alone does not provide a rich enough model to deal with the faults in large, critical distributed systems operating in hostile environments. These systems require not simply dependability but instead require survivability. Informally, survivability is when a system has "the ability to continue to provide service (possibly degraded or different) in a given environment when various events cause major damage to the system or its operating environment."

E. Varner; Anita K. Jones; David Evans (committee chair); Richard W. Miksad (dean)

152

Fault tolerant massively parallel processing architecture  

Energy Technology Data Exchange (ETDEWEB)

This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of divide-and-conquer type for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of $N = L \cdot (\log_2(L)+1)$ processing elements (PE) organized as $L$ levels of $\log_2(L)+1$ stages, with a butterfly connection between PEs in consecutive stages and straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with $O(N)$ connection cost, $O(\log_2(N/\log_2 N))$ communication paths, and a small number (=4) of I/O ports per PE. However, this architecture is not fault tolerant. The authors, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses $O(\log_2(N)/N)$ overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within $2 \cdot \log_2(L)+1$ time steps, thus making it a truly fault tolerant architecture.

Balasubramanian, V.; Banerjee, P.

1987-08-01

153

Fault tolerant control - a residual based set-up  

DEFF Research Database (Denmark)

A new set-up for fault tolerant control (FTC) for stable systems is presented in this paper. The new set-up is based on a simple implementation of the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. This implementation of the YJBK parameterization will allow a direct and simple reconfiguration of the feedback controller. Another central part of fault tolerant control is fault diagnosis. The controller implementation can be applied directly in connection with both passive diagnosis (PFD) as well as with active fault diagnosis (AFD). The presented FTC set-up is investigated with respect to sensor reconfiguration. Actuator reconfiguration can be dealt with in a similar way.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01

154

Enhanced Maritime Safety through Diagnosis and Fault Tolerant Control  

DEFF Research Database (Denmark)

Faults in steering, navigation instruments or propulsion machinery are serious on a marine vessel, since the consequence could be loss of maneuvering ability and a risk of damage to vessel, personnel or environment. Early diagnosis and accommodation of faults could enhance safety. Fault-tolerant control is a methodology that helps prevent faults from developing into failures. The means include on-line fault diagnosis, automatic condition assessment and calculation of remedial action to avoid hazards. This paper gives an overview of methods to obtain fault tolerance: fault diagnosis; analysis of the properties of a faulty system; and means to determine remedial actions. The paper illustrates the techniques with two marine examples: sensor fusion for automatic steering, and control of the main engine.

Blanke, Mogens

2001-01-01

155

Federated Filter for Fault-Tolerant Integrated Navigation.  

Science.gov (United States)

This paper describes federated filter applications to integrated, fault-tolerant navigation systems. The federated filter is an optimal or near-optimal estimator for decentralized, multisensor data fusion. Its decentralized estimation architecture is base...

N. A. Carlson

1995-01-01

156

An efficient fault-tolerant out-patient order entry system based on special distributed client/server architecture.  

UK PubMed Central (United Kingdom)

An automatic order entry system is very important for processing out-patient information. This system not only helps physicians to enter their orders directly, but can also reduce order communication error and thus improve medical quality. Therefore, many hospitals have high aspirations to generate and implement direct order entry systems, but they are also concerned about the setbacks of system failure. In this paper, we present an effective and efficient fault-tolerant order entry system based on special distribution client/server architecture that satisfies the requirements of out-patient order entry very well. From the experimental results carried out on a prototype, we found that this system can improve the system response time of order entry and can also generate an operational method having a user friendly interface. The physicians can enter their orders easily, accurately, directly, flexibly and at a faster rate by making choices from standardized and personalized menus in this system.

Chuang CT

1998-04-01

157

Concatenated codes for fault tolerant quantum computing  

Energy Technology Data Exchange (ETDEWEB)

The application of concatenated codes to fault tolerant quantum computing is discussed. We have previously shown that for quantum memories and quantum communication, a state can be transmitted with error $\epsilon$ provided each gate has error at most $c\epsilon$. We show how this can be used with Shor's fault tolerant operations to reduce the accuracy requirements when maintaining states not currently participating in the computation. Viewing Shor's fault tolerant operations as a method for reducing the error of operations, we give a concatenated implementation which promises to propagate the reduction hierarchically. This has the potential of reducing the accuracy requirements in long computations.

Knill, E.; Laflamme, R.; Zurek, W.

1995-05-01

158

Fault-Tolerant Exact State Transmission  

CERN Document Server

We show that a category of one-dimensional XY-type models may enable high-fidelity quantum state transmissions, regardless of details of coupling configurations. This observation leads to a fault-tolerant design of a state transmission setup. The setup is fault-tolerant, with specified thresholds, against engineering failures of coupling configurations, fabrication imperfections or defects, and even time-dependent noises. We propose the implementation of the fault-tolerant scheme using hard-core bosons in one-dimensional optical lattices.

Wang, Zhao-Ming; Modugno, Michele; Yao, Wang; Shao, Bin

2012-01-01

159

Fault-tolerant logics for FPGA linux  

Energy Technology Data Exchange (ETDEWEB)

The increasing use of SRAM-based reconfigurable architectures in important areas of research and development (like particle accelerators and space applications) brings with it new effects that are currently only partially addressed. An already well known, but nevertheless important problem of such systems is their susceptibility to radiation, which increases in conjunction with particle flux and energy. According to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware by the use of spatial and temporal redundancy features. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, starting with the FPGA bit layer and ending in the software application layer, to achieve maximum radiation tolerance in systems running FPGA Linux in radiation susceptible environments. Only a collaboration of all these layers is able to create an adequate amount of data security and process integrity.

Gebelein, Jano; Abel, Norbert; Kebschull, Udo [Kirchhoff-Institute for Physics, University of Heidelberg (Germany)

2009-07-01

160

Fault-tolerant logics for FPGA linux  

International Nuclear Information System (INIS)

The increasing use of SRAM-based reconfigurable architectures in important areas of research and development (like particle accelerators and space applications) brings with it new effects that are currently only partially addressed. An already well known, but nevertheless important problem of such systems is their susceptibility to radiation, which increases in conjunction with particle flux and energy. According to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware by the use of spatial and temporal redundancy features. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, starting with the FPGA bit layer and ending in the software application layer, to achieve maximum radiation tolerance in systems running FPGA Linux in radiation susceptible environments. Only a collaboration of all these layers is able to create an adequate amount of data security and process integrity.

2009-01-01

 
 
 
 
161

An efficient fault-tolerant order entry management information system based on special distributed client/server architecture.  

UK PubMed Central (United Kingdom)

An automatic order entry system is very important for the processing of out-patient information, not only helping doctors to enter their orders directly but also reducing errors of communication. Many hospitals are anxious to set up a direct order entry system but are concerned about possible system failures. In this paper we report on an effective and efficient fault-tolerant order entry management system which satisfies the requirements for out-patient order entry. From the results of experiments on a prototype we found that the system was user friendly and reduced the time taken. Doctors are able to enter their orders more easily, accurately and quickly by selecting from the standardized and personalized menus to be found in the system.

Chuang CT

1998-11-01

162

Systolic Array Fault Tolerance Performance Analysis.  

Science.gov (United States)

The reliability performance of six different systolic array fault tolerance techniques is determined and compared in terms of mean time between failure (MTBF). The six techniques include redundant arrays, companion processors, sequential row elimination ...

T. C. Choinski; M. H. Leonhardt

1988-01-01

163

A Fault-Tolerant Duplex Microcontroller Architecture  

Energy Technology Data Exchange (ETDEWEB)

This paper presents a fault-tolerant duplex architecture to build a high-reliability microcontroller using commercial VLSI processors. The architecture supports fail-silence under all single-failure situations and facilitates recovery from transient failures. The paper implements the duplex architecture using two Motorola MC68360 processors and evaluates its fault tolerance in a real application environment. (author). 12 refs., 10 figs., 2 tabs.

Kim, B.J.; Baek, S.S.; Lee, I.H.; Lim, D.J. [Hanyang University, Seoul (Korea)

2002-04-01
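
The fail-silence property of a duplex design comes from comparing the two replicas before any output is released. The following is a software analogy only, not the MC68360-based hardware described above; `duplex_execute` and its retry policy are illustrative assumptions.

```python
from typing import Callable, Optional


def duplex_execute(
    primary: Callable[[int], int],
    shadow: Callable[[int], int],
    x: int,
    retries: int = 1,
) -> Optional[int]:
    """Run both replicas and compare their results.

    Returns the agreed value, or None (the unit stays silent) if the
    replicas still disagree after `retries` re-executions.
    """
    for _ in range(retries + 1):
        a, b = primary(x), shadow(x)
        if a == b:
            return a  # replicas agree: the output may be emitted
        # Disagreement: assume a transient fault and re-execute both replicas.
    return None  # fail-silent: never emit an unverified value
```

Re-execution lets the pair recover from a transient fault, while a persistent disagreement yields no output at all, so downstream consumers see either a checked value or silence.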

164

SEU fault tolerance in artificial neural networks  

Energy Technology Data Exchange (ETDEWEB)

In this paper the authors investigate the robustness of Artificial Neural Networks when encountering transient modification of information bits related to the network operation. These kinds of faults are likely to occur as a consequence of interaction with radiation. Results of tests performed to evaluate the fault tolerance properties of two different digital neural circuits are presented.

Velazco, R.; Assoum, A.; Radi, N.E. [Lab. de Genie Informatique, Grenoble (France); Ecoffet, R. [Centre National d'Etudes Spatiales, Toulouse (France); Botey, X. [Univ. Politecnica de Catalunya, Barcelona (Spain)

1995-12-01

165

Virtualization-Based Fault-Tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis presents an evaluation of the Virtualization-Based Fault Tolerance solutions and approaches available from different vendors. The performance of a SIP (Session Initiation Protocol) based application among “native environment”, “virtual environment” and “virtual environment with fault-to...

Devinani, Jagan

166

On Fault Tolerance of Resources in Computational Grids  

Directory of Open Access Journals (Sweden)

Full Text Available Grid computing, or the computational grid, has always been a vast research field in academia as well as in industry. Computational grids provide resource sharing through multi-institutional virtual organizations for dynamic problem solving. Various heterogeneous resources of different administrative domains are virtually distributed through different networks in computational grids. Thus any type of failure can occur at any point in time, and a job running in a grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing, as the dependability of individual grid resources may not be guaranteed. In order to make computational grids more effective and reliable, a fault tolerant system is necessary. The objective of this paper is to review the different existing fault tolerance techniques applicable in grid computing. This paper presents the state of the art of various fault tolerance techniques and a comparative study of the existing algorithms.

Arindam Das; Ajanta De Sarkar

2012-01-01

167

Sensitivity Analysis of Unavailability of a Component in DPS with Various Fault-Tolerant Techniques  

International Nuclear Information System (INIS)

With the improvement of digital technologies, digital protection systems (DPS) have acquired multiple, more sophisticated fault-tolerant techniques (FTTs) in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. In the reliability evaluation of digital systems, FTTs and their fault coverage must be considered. Fault detection coverage is a crucial factor of an FTT in reliability. However, fault detection coverage alone is not enough to reflect the effects of various FTTs in a reliability model. Thus, integrated fault coverage is suggested to reflect the characteristics of FTTs.

2012-01-01

168

Sensitivity Analysis of Unavailability of a Component in DPS with Various Fault-Tolerant Techniques  

Energy Technology Data Exchange (ETDEWEB)

With the improvement of digital technologies, digital protection systems (DPS) have acquired multiple, more sophisticated fault-tolerant techniques (FTTs) in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. In the reliability evaluation of digital systems, FTTs and their fault coverage must be considered. Fault detection coverage is a crucial factor of an FTT in reliability. However, fault detection coverage alone is not enough to reflect the effects of various FTTs in a reliability model. Thus, integrated fault coverage is suggested to reflect the characteristics of FTTs.

Kim, Bo Gyung; Kang, Hyun Gook; Kim, Hee Eun; Seong, Poong Hyun [Korea Advanced Institute of Science and Technology, Daejeon (Korea, Republic of); Lee, Seung Jun [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2012-05-15
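
The abstract does not define integrated fault coverage, so the following is only an illustration of why the detection coverage of a single technique understates a component protected by several FTTs. It assumes, purely for the sake of the example, that the techniques detect faults independently; `combined_coverage` is a hypothetical helper and not the paper's model.

```python
from math import prod
from typing import Iterable


def combined_coverage(coverages: Iterable[float]) -> float:
    """Probability that at least one technique detects a fault.

    Assumes independent detection: a fault escapes only if every
    technique misses it, so coverage = 1 - product of (1 - c_i).
    """
    values = list(coverages)
    if any(not 0.0 <= c <= 1.0 for c in values):
        raise ValueError("each coverage must lie in [0, 1]")
    return 1.0 - prod(1.0 - c for c in values)
```

In practice the techniques often overlap in the faults they can see, so the independence assumption gives an optimistic figure; a realistic model has to account for which fault classes each FTT actually covers.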

169

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Scientific Electronic Library Online (English)

Full Text Available This work proposes a model for assessing the availability of fault tolerant systems based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example application in the context of fault tolerant systems.

Moura, Márcio das Chagas; Droguett, Enrique López

2008-08-01

170

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Directory of Open Access Journals (Sweden)

Full Text Available This work proposes a model for assessing the availability of fault tolerant systems based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example application in the context of fault tolerant systems.

Márcio das Chagas Moura; Enrique López Droguett

2008-01-01

171

Fault Tolerance in Grid Computing Using WADE  

Directory of Open Access Journals (Sweden)

Full Text Available Grid computing is coordinated resource sharing and problem solving in organizations that are dynamic and virtual in nature. Apart from the dynamic nature of grids, which means that resources may enter and leave the grid at any time, in many cases outside of the application's control, grid resources are also heterogeneous in nature. Many grid applications will run in environments where interaction faults between diverse grid nodes occur more commonly. As resources may also be used outside of organizational boundaries, it becomes increasingly difficult to guarantee that a resource being used is not a malicious one. Because of the diverse faults and failure conditions, developing, deploying, and executing long-running applications over the grid remains a challenge. Hence fault tolerance is a primary factor for grid computing. The prototype system is designed using agents to provide service replication and reactivation and to avoid a single point of failure. The agents and the workflows are provided by a common software platform called WADE.

Gangineni Veeranjaneyulu

2012-01-01

172

SABRE: a bio-inspired fault-tolerant electronic architecture.  

Science.gov (United States)

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance. PMID: 23302298

Bremner, P; Liu, Y; Samie, M; Dragffy, G; Pipe, A G; Tempesti, G; Timmis, J; Tyrrell, A M

2013-01-09

173

SABRE: a bio-inspired fault-tolerant electronic architecture.  

UK PubMed Central (United Kingdom)

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance.

Bremner P; Liu Y; Samie M; Dragffy G; Pipe AG; Tempesti G; Timmis J; Tyrrell AM

2013-03-01

174

Hybrid Fault Tolerant Peer to Peer Video Streaming Architecture  

Science.gov (United States)

In this paper, we propose a fault tolerant hybrid P2P-CDN video streaming architecture to overcome the problems caused by peer behavior in peer-to-peer (P2P) video streaming systems. Although several studies model and analytically investigate peer behavior in P2P video streaming systems, they do not offer a solution that guarantees the required Quality of Service (QoS). Therefore, in this study a hybrid clustering algorithm based on geographical location, time and interest is proposed to improve the success ratio and reduce the delivery time of required content. A Hybrid Fault Tolerant Video Streaming System (HFTS) over P2P networks that conforms to the required QoS and fault tolerance is also offered. The simulations indicate that the required QoS can be achieved in streaming video applications using the proposed hybrid approach.

Öztoprak, Kasim; Akar, Gözde Bozdagi

175

Fault Tolerant Heterogeneous Limited Duplication Scheduling algorithm for Decentralized Grid  

Directory of Open Access Journals (Sweden)

Full Text Available Fault tolerance is one of the most desirable properties in decentralized grid computing systems, where computational resources are geographically distributed. These resources collaborate in order to execute workflow applications as fast as possible. In workflow applications, tasks depend on each other, so it is vital that scheduling techniques also have a decentralized fault tolerant mechanism. In this paper, we propose a decentralized fault tolerant mechanism that uses the checkpoint concept for the Heterogeneous Limited Duplication (HLD) algorithm. HLD is based on task duplication scheduling in a heterogeneous environment. The benefits are twofold. Firstly, if a node failure occurs, the rest of the grid nodes sustain the execution of the application. Secondly, a shorter application makespan is obtained using the checkpoint concept. Therefore, applications scheduled over decentralized grid systems (which are known for their unreliable behavior) will yield results quickly using the algorithm proposed in this paper.

DR. NITIN; Neha Agarwal; Piyush Chauhan

2013-01-01
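
As an illustration of the checkpoint idea mentioned in the abstract above, here is a minimal Python sketch. It is not the HLD algorithm itself: the function name `run_with_checkpoints`, the `fail_at` parameter and the dictionary layout of the checkpoint are assumptions made for this example, and a plain dictionary stands in for whatever shared storage a real grid would use.

```python
def run_with_checkpoints(steps, checkpoint, fail_at=None):
    """Run `steps` starting at checkpoint["next"], saving progress after each step.

    `fail_at` is a hypothetical knob that simulates a node failure just
    before the step with that index is executed.
    """
    state = checkpoint["state"]
    for i in range(checkpoint["next"], len(steps)):
        if i == fail_at:
            raise RuntimeError(f"node failed before step {i}")
        state = steps[i](state)
        # Persist progress; a real grid would write this to shared storage.
        checkpoint["state"] = state
        checkpoint["next"] = i + 1
    return state


# Three dependent tasks of a toy workflow.
steps = [lambda s: s + 1, lambda s: s * 2, lambda s: s - 3]
checkpoint = {"state": 0, "next": 0}

try:
    run_with_checkpoints(steps, checkpoint, fail_at=2)  # first node dies
except RuntimeError:
    pass

# A second node resumes from the saved checkpoint instead of restarting.
result = run_with_checkpoints(steps, checkpoint)
```

Because the second run starts at the saved step index, the two completed steps are not re-executed, which is the source of the makespan saving the abstract refers to.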

176

Fault tolerant issues in the BTeV trigger  

Energy Technology Data Exchange (ETDEWEB)

The BTeV trigger performs sophisticated computations using large ensembles of FPGAs, DSPs, and conventional microprocessors. This system will have between 5,000 and 10,000 computing elements and many networks and data switches. While much attention has been devoted to developing efficient algorithms, the need for fault-tolerant, fault-adaptive, and flexible techniques and software to manage this huge computing platform has been identified as one of the most challenging aspects of this project. They describe the problem and offer an approach to solving it based on a distributed, hierarchical fault management system.

Jeffrey A. Appel et al.

2002-12-03

177

Interactive animation of fault-tolerant parallel algorithms  

Energy Technology Data Exchange (ETDEWEB)

Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user-interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.

Apgar, S.W.

1992-02-01

178

Fault-tolerant computing: Theory and technique, Volume I  

Energy Technology Data Exchange (ETDEWEB)

This book presents papers discussing the aspects of making computer applications systems reliable - fault diagnosis and fault tolerance. It provides a current perspective of the testing and test generation area and an overview of the basic theoretical issues presented. The book includes test generation algorithms such as PODEM, functional testing and random testing. Techniques and issues involved in the design for testability area are addressed. The theory and application of error correcting codes at the subsystem level are covered.

Pradhan, D.K.

1986-01-01

179

Design Approach for Fault Tolerance in FPGA Architecture  

Directory of Open Access Journals (Sweden)

Full Text Available Failures of nanometric technologies owing to defects and shrinking process tolerances give rise to significant challenges for IC testing. In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for future FPGAs. With advances in process technology, feature sizes are decreasing, which leads to higher defect densities, and more sophisticated techniques at increased cost are required to avoid defects. If nanotechnology fabrication is applied, the yield may go down to zero, as avoiding defects during fabrication will not be a feasible option. Hence, future architectures have to be defect tolerant. In regular structures like FPGAs, redundancy is commonly used for fault tolerance. In this work we present a solution in which the configuration bit-stream of the FPGA is modified by a hardware controller that is present on the chip itself. The technique uses a redundant device to replace a faulty device and increases the yield.

Ms. Shweta S. Meshram; Ujwala A. Belorkar

2011-01-01

180

Fault-tolerant holonomic quantum computation  

CERN Multimedia

We explain how to combine holonomic quantum computation (HQC) with fault tolerant quantum error correction. This establishes the scalability of HQC, putting it on equal footing with other models of computation, while retaining the inherent robustness the method derives from its geometric nature.

Oreshkov, Ognyan; Lidar, Daniel A

2008-01-01

 
 
 
 
181

Fault-Tolerant Partial Replication in Large-Scale Database Systems  

CERN Document Server

We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely and/or compute a total order of transactions. In contrast, ours applies update values, and orders only conflicting transactions. As a result, transactions execute faster, and distributed databases commit in small committees. Both effects help preserve scalability as the number of databases and transactions increases. Our algorithm ensures serializability, and is live and safe in spite of faults.

Sutra, Pierre

2008-01-01

182

A modified NARMAX model-based self-tuner with fault tolerance for unknown nonlinear stochastic hybrid systems with an input-output direct feed-through term.  

UK PubMed Central (United Kingdom)

A modified nonlinear autoregressive moving average with exogenous inputs (NARMAX) model-based state-space self-tuner with fault tolerance is proposed in this paper for the unknown nonlinear stochastic hybrid system with a direct transmission matrix from input to output. Through the off-line observer/Kalman filter identification method, one has a good initial guess of the modified NARMAX model to reduce the on-line system identification process time. Then, based on the modified NARMAX-based system identification, a corresponding adaptive digital control scheme is presented for the unknown continuous-time nonlinear system, with an input-output direct transmission term, which also has measurement and system noises and inaccessible system states. Besides, an effective state-space self-tuner with a fault tolerance scheme is presented for the unknown multivariable stochastic system. A quantitative criterion is suggested by comparing the innovation process error estimated by the Kalman filter estimation algorithm, so that a weighting matrix resetting technique, which adjusts and resets the covariance matrices of the parameter estimates obtained by the Kalman filter estimation algorithm, is utilized to achieve the parameter estimation for faulty system recovery. Consequently, the proposed method can effectively cope with partially abrupt and/or gradual system faults and input failures by means of fault detection.

Tsai JS; Hsu WT; Lin LG; Guo SM; Tann JW

2013-09-01

183

Aspect-Oriented Approach for the Improvement of the Reliability and Time Performance of a Fault-Tolerant System  

Directory of Open Access Journals (Sweden)

Full Text Available The principle of separation of concerns is a basic element of software engineering: it allows properties to be divided into progressively smaller parts so that their complexity can be mastered, from the design phase to the implementation phase. This paper proposes the probabilistic assessment of critical fault-tolerant programmed systems to improve the reliability and availability of an embedded system. In addition, to improve their response time, we use a separation-of-concerns approach that separates functional (behavior) from non-functional (control) concerns. This is achieved by developing a simulator based on aspect-oriented programming (AspectJ). The main objective is to show the impact of this separation on the response time when the hardware architecture of a processor correctly executes the instructions and routines of a software application. The probabilistic assessment is based on the failure rate of software instructions executed on the hardware architecture of a stack processor, whose choice will be justified. The failures considered in this work are the basis of a study of decomposition and refinement carried out with the NFR Framework. As a result, this work treats the issue of hardware/software interaction in programmed critical systems and the improvement of execution time.

Khalid Bouragba; Hicham Belhadaoui; Mohammed Ouzzif; Mounir Rifi

2011-01-01

184

Robustness and fault tolerance make brains harder to study  

Directory of Open Access Journals (Sweden)

Full Text Available Brains increase the survival value of organisms by being robust and fault tolerant. That is, brain circuits continue to operate as the organism needs, even when the circuit properties are significantly perturbed. Kispersky and colleagues, in a recent paper in Neural Systems & Circuits, have found that Granger Causality analysis, an important method used to infer circuit connections from the behavior of neurons within the circuit, is defeated by the mechanisms that give rise to this robustness and fault tolerance. See research article: http://www.neuralsystemsandcircuits.com/content/1/1/9/abstract

Srinivasan Shyam; Stevens Charles F

2011-01-01

185

Graceful fault tolerance in large networks of microcomputers  

Energy Technology Data Exchange (ETDEWEB)

This work considers the problem of fault diagnosis in a network of distributed multicomputers, and a strategy for repeated reconfiguration is presented in detail to help improve the degree of fault tolerance. The overall system diagnosability is shown to be enhanced further by constructing a large network with small well-known graphs as its basis and then applying reconfiguration techniques locally in various system partitions and exchanging diagnostic information globally. A detailed description of this new attractive approach is presented along with the diagnostic algorithm suitable for large networks of microcomputers in VLSI based distributed systems. A systematic procedure for defining near-optimal fault-tolerance graph theoretic networks is investigated which is well suited for multicomputer structures. A distributed algorithm along with a new system diagnostic theory is proposed.

Agrawal, B.K.

1984-01-01

186

Approach to Modeling of Fault-Tolerant Techniques using Fault Tree  

International Nuclear Information System (INIS)

Recently, the analog I and C based reactor protection system (RPS) in nuclear power plants (NPPs) has been replaced with a digital I and C system. Because of this replacement of analog with digital systems, the development of a methodology for the probabilistic safety assessment (PSA) of digital systems is an important issue. The digital plant protection system (DPPS) has four identical safety channel cabinets, and it has diversity, a dual/triple structure, and enhanced automatic system functions. Since the DPPS uses complex and heterogeneous components, it should have automatic system functions such as various fault tolerant techniques for high availability and reliability. Therefore, it is necessary to evaluate the relative effects of fault tolerant techniques in the DPPS using PSA techniques such as fault tree analysis.

2011-01-01

187

Control switching in high performance and fault tolerant control  

DEFF Research Database (Denmark)

The problem of reliability in high performance control and in fault tolerant control is considered in this paper. A feedback controller architecture for high performance and fault tolerance is considered. The architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. By using the nominal controller in the architecture as a simple and robust controller, it is possible to use the YJBK transfer function for optimization of the closed-loop performance. This can be done both in connection with normal operation of the system and in connection with faults in the system. The architecture also allows the applied sensors and/or actuators to be changed when switching between different controllers. This switching becomes particularly simple for open-loop stable systems.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2010-01-01

188

Fault-tolerant quantum computation  

Energy Technology Data Exchange (ETDEWEB)

It has recently been realized that use of the properties of quantum mechanics might speed up certain computations dramatically. Interest in quantum computation has since been growing. One of the main difficulties in realizing quantum computation is that decoherence tends to destroy the information in a superposition of states in a quantum computer, making long computations impossible. A further difficulty is that inaccuracies in quantum state transformations throughout the computation accumulate, rendering long computations unreliable. However, these obstacles may not be as formidable as originally believed. For any quantum computation with t gates, we show how to build a polynomial size quantum circuit that tolerates O(1/log^c t) amounts of inaccuracy and decoherence per gate, for some constant c; the previous bound was O(1/t). We do this by showing that operations can be performed on quantum data encoded by quantum error-correcting codes without decoding this data.

Shor, P.W. [AT&T Research, Murray Hill, NJ (United States)]

1996-12-31

189

Quantum Control and Fault-tolerance  

Science.gov (United States)

Quantum control (QC) and the methods of fault-tolerant quantum computing (FTQC) are two of the cornerstones on which the hope for a quantum computer rests. However QC methods do not generally scale well with the size of the system, and it is not known how their performance is hindered when integration with FTQC methods, especially considering these demand a large system size overhead, is attempted under realistic noise models. Here we study this problem using dynamical decoupling in the bang-bang limit as a toy model, with a non-Markovian noise where interactions decay with distance, and show that there exists a regime of the norms of the relevant Hamiltonians, in which dynamical decoupling protected gates provide an advantage over the bare gate implementation. This is a first step towards showing that QC protocols designed for a small set of qubits can be extended to larger sets without a significant loss of performance, as long as the noise model behaves reasonably well.

Paz Silva, Gerardo; Dominy, Jason; Lidar, Daniel

2013-03-01

190

Fault tolerant UAVs are coming; Fault tolerant mujinki jidai no torai  

Energy Technology Data Exchange (ETDEWEB)

This paper explains the concept of the UAV (unmanned aviation vehicle). Previous UAVs have achieved success because of their simple systems and simple operation. However, for future UAVs, higher reliability and safety than those of ordinary aircraft are strongly required as expectations rise for the missions to be executed. In other words, future UAVs should aim at a fault tolerant system featuring autonomous operation and a reliability of less than 10^-9 faults/hour. Recently, ordinary aircraft have also come to adopt auto-sequence control in flight control systems to achieve highly programmed automatic control from takeoff to landing. A UAV with an autonomous operation function that enables it to return to base has also been developed. A system reliability at the 10^-9 level against flight-critical events is required for ordinary commercial aircraft. It is expected that reliability equal to or better than this will be required of UAVs as a system design requirement in the near future. (NEDO)

Sumita, J.

1999-06-05

191

Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System  

Directory of Open Access Journals (Sweden)

Full Text Available In this study, we present a method for distributed data storage using a residue number system for mobile systems and wireless networks based on the peer-to-peer paradigm. Generally, a redundant residue number system is capable of error detection and correction. In the proposed method, we build a new system by combining the Redundant Residue Number System (RRNS), Multi Level Residue Number System (ML RNS) and Multiple Valued Logic (MVL RNS), which is well suited to parallel, carry-free, high-speed arithmetic and supports secure data communication. In addition, it is capable of error detection and correction. In comparison to other number systems, it offers many improvements in data security, error detection and correction, and speed of storage and retrieval.

A. Barati; M. Dehghan; A. Movaghar; H. Barati

2008-01-01
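
To make the error detection and correction property of a redundant residue number system concrete, the following Python sketch encodes an integer as residues and recovers it after one residue is corrupted. It only illustrates the general RRNS principle, not the combined RRNS/ML RNS/MVL RNS scheme of the paper; the moduli and all function names are assumptions chosen for the example.

```python
from math import prod

# Hypothetical moduli chosen for illustration: pairwise coprime, increasing.
INFO_MODULI = (3, 5, 7)        # legitimate range: 0 .. 104
REDUNDANT_MODULI = (11, 13)    # extra residues used only for checking
ALL_MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def encode(x):
    """Return the residues of x with respect to every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in ALL_MODULI]


def crt(residues, moduli):
    """Chinese remainder theorem: rebuild the integer from its residues."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return x % total


def has_error(residues):
    """A decoded value outside the legitimate range signals a corrupted residue."""
    return crt(residues, ALL_MODULI) >= LEGIT_RANGE


def correct_single_error(residues):
    """Drop one residue at a time; return the first decoding that is in range."""
    for skip in range(len(ALL_MODULI)):
        rs = [r for i, r in enumerate(residues) if i != skip]
        ms = [m for i, m in enumerate(ALL_MODULI) if i != skip]
        candidate = crt(rs, ms)
        if candidate < LEGIT_RANGE:
            return candidate
    return None


stored = encode(42)   # residues could be spread over different peers
stored[1] = 4         # one peer returns a wrong residue
recovered = correct_single_error(stored)
```

With two redundant moduli that are larger than the information moduli, any two legitimate values differ in at least three residues, which is why a single corrupted residue can be both detected and corrected here.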

192

Fault Tolerant Weighted Voting Algorithms  

Directory of Open Access Journals (Sweden)

Full Text Available Computer networks are now necessities of modern organisations and network security has become a major concern for them. In this paper we have proposed a holistic approach to network security with a hybrid model that includes an Intrusion Detection System (IDS) to detect network attacks and a survivability model to assess the impacts of undetected attacks. A neural network-based IDS has been proposed, where the learning mechanism for the neural network is evolved using genetic algorithm. Then the case where an attack evades the IDS and takes the system into a compromised state is discussed. We propose a stochastic model which enables us to do a cost/benefit analysis for systems security. This integrated approach allows systems managers to make more informed decisions regarding both intrusion detection and system protection.

Azad Azadmanesh; Alireza Farahani; Lotfi Najjar

2008-01-01

193

Cooperative Fault Tolerant Distributed Computing  

Energy Technology Data Exchange (ETDEWEB)

HARNESS was proposed as a system that combined the best of emerging technologies found in current distributed computing research and commercial products into a very flexible, dynamically adaptable framework that could be used by applications to allow them to evolve and better handle their execution environment. The HARNESS system was designed using the considerable experience from previous projects such as PVM, MPI, IceT and Cumulvs. As such, the system was designed to avoid the common problems found with these systems: it has no single point of failure and can survive machine, node and software failures. Additional features included improved inter-component connectivity, with full support for dynamic downloading of additional components at run-time, thus reducing the pressure on application developers to build in all the libraries they need in advance.

Fagg, Graham E.

2006-03-15

194

Fault-tolerant battery system employing intra-battery network architecture  

Energy Technology Data Exchange (ETDEWEB)

A distributed energy storing system employing a communications network is disclosed. A distributed battery system includes a number of energy storing modules, each of which includes a processor and communications interface. In a network mode of operation, a battery computer communicates with each of the module processors over an intra-battery network and cooperates with individual module processors to coordinate module monitoring and control operations. The battery computer monitors a number of battery and module conditions, including the potential and current state of the battery and individual modules, and the conditions of the battery's thermal management system. An over-discharge protection system, equalization adjustment system, and communications system are also controlled by the battery computer. The battery computer logs and reports various status data on battery level conditions which may be reported to a separate system platform computer. A module transitions to a stand-alone mode of operation if the module detects an absence of communication connectivity with the battery computer. A module which operates in a stand-alone mode performs various monitoring and control functions locally within the module to ensure safe and continued operation.

Hagen, Ronald A. (Stillwater, MN); Chen, Kenneth W. (Fair Oaks, CA); Comte, Christophe (Montreal, CA); Knudson, Orlin B. (Vadnais Heights, MN); Rouillard, Jean (Saint-Luc, CA)

2000-01-01

195

Fault Detection for Shipboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.

Lajic, Zoran; Nielsen, Ulrik Dam

2009-01-01
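
The residual-plus-GLR detection scheme mentioned in the abstract above can be sketched in a few lines of Python. This is a textbook form of the generalized likelihood ratio test for an unknown constant bias in a zero-mean Gaussian residual with known standard deviation, not the authors' shipboard implementation; the function names, the window length and the threshold are assumptions made for this example.

```python
def glr_statistic(residuals, sigma, window):
    """GLR decision value for a bias of unknown size that started at an
    unknown time within the last `window` samples."""
    best = 0.0
    running_sum = 0.0
    n = 0
    # Walk backwards so that each prefix is a candidate fault onset time.
    for r in reversed(residuals[-window:]):
        running_sum += r
        n += 1
        best = max(best, running_sum ** 2 / (2.0 * sigma ** 2 * n))
    return best


def detect(residuals, sigma=1.0, window=20, threshold=10.0):
    """Return the index of the first sample at which the alarm is raised, or None."""
    for k in range(1, len(residuals) + 1):
        if glr_statistic(residuals[:k], sigma, window) > threshold:
            return k - 1
    return None


# Fault-free residual followed by a sensor bias of 2.0 starting at sample 30.
residual = [0.0] * 30 + [2.0] * 30
alarm_index = detect(residual)
```

In practice the residual would be the difference between a measured signal and its model-based estimate, and the threshold would be tuned to trade false alarms against detection delay.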

196

Fault tolerant microcomputer based alarm annunciator for Dhruva reactor  

International Nuclear Information System (INIS)

The Dhruva alarm annunciator displays the status of 624 alarm points on an array of display windows using the standard ringback sequence. Recognizing the need for very high availability, the system is implemented as a fault tolerant configuration. The annunciator is partitioned into three identical units; each unit is implemented using two microcomputers wired in a hot standby mode. In the event of one computer malfunctioning, the standby computer takes over control in a bounce-free transfer. The use of microprocessors has helped build flexibility into the system. The system also provides built-in capability to resolve the sequence of occurrence of events and conveys this information to another system for display on a CRT. This report describes the system features, the fault tolerant organisation used and the hardware and software developed for the annunciation function. (author). 8 figs

1988-01-01

197

Reconfiguration-Based Fault Tolerant Control of Dynamical Systems: A Control Reallocation Approach  

Science.gov (United States)

In this paper, the problem of control reconfiguration in the presence of actuator failure preserving the nominal controller is addressed. In the actuator failure condition, the processing algorithm of the control signal should be adapted in order to re-achieve the desired performance of the control loop. To do so, the so-called reconfiguration block, is inserted into the control loop to reallocate nominal control signals among the remaining healthy actuators. This block can be either a constant mapping or a dynamical system. In both cases, it should be designed so that the states or output of the system are fully recovered. All these situations are completely analysed in this paper using a novel structural approach leading to some theorems which are supported in each section by appropriate simulations.

Moradi Amani, Ali; Afshar, Ahmad; Menhaj, Mohammad Bagher
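
For the constant-mapping case described in the abstract above, one simple way to picture a reconfiguration block is a minimum-norm (pseudo-inverse) redistribution of a single control demand over the actuators that are still healthy. The Python sketch below is only an illustration under that assumption; the paper's structural design method may differ, and the function name, gains and health flags are hypothetical.

```python
def reallocate(virtual_cmd, gains, healthy):
    """Return actuator commands u with sum(gains[i] * u[i]) == virtual_cmd,
    using only healthy actuators and minimising the sum of u[i] ** 2."""
    denom = sum(g * g for g, ok in zip(gains, healthy) if ok)
    if denom == 0:
        raise ValueError("no healthy actuator can produce the demanded effect")
    return [g * virtual_cmd / denom if ok else 0.0
            for g, ok in zip(gains, healthy)]


gains = [1.0, 2.0, 2.0]            # effectiveness of each actuator
healthy = [True, False, True]      # the second actuator has failed
commands = reallocate(10.0, gains, healthy)
effect = sum(g * u for g, u in zip(gains, commands))
```

The nominal controller keeps producing the same demand; only the mapping from that demand to individual actuators changes after the failure.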

198

Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems  

CERN Multimedia

Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are communication abstractions that allow the processes to communicate consistently (namely the register abstraction

Raynal, Michel

2010-01-01

199

ACID Support and Fault-Tolerant Database Systems on Cloud:A Review  

Directory of Open Access Journals (Sweden)

Full Text Available Cloud computing represents a different way to architect and remotely manage computing resources. One has only to establish an account with Microsoft, Amazon or Google to begin building and deploying application systems into a cloud. These systems can be, but certainly are not restricted to being, simplistic. Some applications require HTTP services, some require a relational database, and others might require web service infrastructure and message queues. With clouds, IT-related applications can be provided as a service that is accessed through the internet. There are cloud platforms that provide scalability and high availability for web applications, but there are also problems related to data consistency, and in case of server failures this becomes a major problem in applications related to payment services. Data needs to be properly managed in a cloud environment, and to achieve proper transaction processing and consistency, RDBMS techniques such as ACID transactions should be used. Web services in Azure ensure application availability by replicating stored data at least three times and offer optional geolocation of replicas in separate Microsoft data centres to provide disaster recovery services. Azure storage services provide scalable persistent storage of structured tables, blobs and queues.

Pratiyush Guleria

2012-01-01

200

A High Performance Protocol for Fault Tolerant Distributed Shared Memory (FaTP)  

Directory of Open Access Journals (Sweden)

Full Text Available In distributed environments, runtime failures often occur. If the distributed system has the ability to handle such failures dynamically (at runtime), it is said to be fault tolerant. Such systems tend to be slow compared to non-fault tolerant systems. Moreover, if the system relies on a Distributed Shared Memory (DSM) for exchanging data among the members of the distributed application, it will be slower still and may be inefficient. In this study, a generic DSM based Fault Tolerance Protocol (FaTP) is introduced. FaTP is a high performance fault tolerance protocol. The proposed protocol is based on the Linda tuple space DSM model. It introduces a compact set of DSM access primitives and is supplied with a fault tolerance layer based on dynamic replication. The complexity of FaTP has been measured and its performance has been evaluated.

Mutasem Alsmadi; Usama A. Badawi; Hosam E. Reffat

2013-01-01
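
The combination of Linda-style tuple space primitives with a replication layer, as described in the abstract above, can be pictured with the following Python sketch. It is not FaTP: the class names are hypothetical, replication here is static rather than dynamic, and `rd` and `in_` return `None` instead of blocking as the classical Linda operations do.

```python
class TupleSpace:
    """One replica of a Linda-style tuple space."""

    def __init__(self):
        self.tuples = []
        self.alive = True

    @staticmethod
    def matches(pattern, tup):
        # None in a pattern acts as a wildcard field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def find(self, pattern):
        for tup in self.tuples:
            if self.matches(pattern, tup):
                return tup
        return None


class ReplicatedTupleSpace:
    """Applies every operation to all live replicas so that one can fail."""

    def __init__(self, n_replicas=3):
        self.replicas = [TupleSpace() for _ in range(n_replicas)]

    def _live(self):
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("all replicas have failed")
        return live

    def out(self, tup):
        """Write a tuple to every live replica."""
        for replica in self._live():
            replica.tuples.append(tup)

    def rd(self, pattern):
        """Read a matching tuple without removing it."""
        return self._live()[0].find(pattern)

    def in_(self, pattern):
        """Remove a matching tuple from every live replica and return it."""
        live = self._live()
        tup = live[0].find(pattern)
        if tup is not None:
            for replica in live:
                replica.tuples.remove(tup)
        return tup


space = ReplicatedTupleSpace(n_replicas=3)
space.out(("task", 1))
space.replicas[0].alive = False       # simulate a crashed replica
survivor = space.rd(("task", None))   # still served by the remaining replicas
```

The cost of the extra writes in `out` and `in_` is the kind of overhead the abstract refers to when it says fault tolerant DSM systems tend to be slower.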

 
 
 
 
201

Low-cost Fault-tolerance in Barrier Synchronizations  

UK PubMed Central (United Kingdom)

In this paper, we show how fault-tolerance can be effectively added to several types of faults in program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results we show that the cost of adding fault-tolerance is low, in part by comparing the times required by our program with that required by the corresponding fault-intolerant counterpart. Keywords: fault-tolerance, multitolerance, detectable and undetectable faults, synchronization, concurrency. Email: fkulkarni,anishg@cis.ohio-state.edu; Web: http://www.cis.ohio-state.edu/f~ kulkarni,~anish g. Research supported in part by NSF Grant CCR-93...

Sandeep S. Kulkarni; Anish Arora

202

Low-cost Fault-tolerance in Barrier Synchronizations  

UK PubMed Central (United Kingdom)

In this paper, we show how fault-tolerance can be effectively added to several types of faults in program computations that use barrier synchronization. We divide the faults that occur in practice into two classes, detectable and undetectable, and design a fully distributed program that tolerates the faults in both classes. Our program guarantees that every barrier is executed correctly even if detectable faults occur, and that eventually every barrier is executed correctly even if undetectable faults occur. Via analytical as well as simulation results we show that the cost of adding fault-tolerance is low, in part by comparing the times required by our program with that required by the corresponding fault-intolerant counterpart. Keywords: fault-tolerance, multitolerance, detectable and undetectable faults, synchronization, concurrency. Email: fkulkarni,anishg@cis.ohio-state.edu; Web: http://www.cis.ohio-state.edu/f~ kulkarni,~anish g. Research supported in part by NSF Gr...

Sandeep S. Kulkarni; Anish Arora

203

Diagnosis and Fault-tolerant Control for Ship Station Keeping  

DEFF Research Database (Denmark)

This paper addresses the design process of diagnosis and fault-tolerant control when a system should operate despite multiple failures in sensors or actuators. Graph-theory based analysis of system structure is demonstrated to be a unique design methodology that can cope with the diagnosis design for systems of high complexity and can also analyse cases of cascaded or multiple faults. The paper takes as an example a ship with two CP propellers, rudders and a bow thruster as actuators, and instrumentation with a suite of global position sensors, inertial navigation units and conventional gyro units to provide ship motion information. A salient feature of the design method is the ability to analyse cases where faults have occurred and easily determine where in the faulty system diagnosability and controllability are retained.

Blanke, Mogens

2005-01-01

204

Universal Fault-Tolerant Computation on Decoherence-Free Subspaces  

CERN Multimedia

A general scheme to perform universal quantum computation fault-tolerantly within decoherence-free subspaces (DFSs) of a system's Hilbert space is derived. This scheme leads to the first fault-tolerant realization of universal quantum computation on DFSs with the properties that (i) only one- and two-qubit interactions are required, and (ii) the system remains within the DFS throughout the entire implementation of a quantum gate. We show explicitly how to perform universal computation on clusters of the four-qubit DFS encoding one logical qubit each under "collective decoherence" (qubit-permutation-invariant system-bath coupling). Our results have immediate relevance to a number of proposed quantum computer implementations, in particular those in which the internal system Hamiltonian is of the Heisenberg type, such as spin-spin coupled quantum dots.

Bacon, D J; Lidar, D A; Whaley, K B

2000-01-01

205

Byzantine Fault Tolerance for Nondeterministic Applications  

CERN Multimedia

All practical applications contain some degree of nondeterminism. When such applications are replicated to achieve Byzantine fault tolerance (BFT), their nondeterministic operations must be sanitized to ensure replica consistency. To the best of our knowledge, only two types of replica nondeterminism have been studied under the Byzantine fault model, which we refer to as wrappable nondeterminism and verifiable pre-determinable nondeterminism. The wrappable nondeterminism is a type of nondeterminism that can be controlled using an infrastructure-provided or application-provided wrapper function, without explicit inter-replica coordination. For example, information such as hostnames, process ids, file descriptors, etc. can be determined group-wise. The verifiable pre-determinable nondeterminism is a type of nondeterminism whose values can be independently chosen by the primary replica and verified by other replicas prior to the execution of a client's request, such as the operation to retrieve the local clock v...

Zhao, W

2007-01-01
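
The two kinds of nondeterminism named in the abstract above can be illustrated with a minimal sketch. The function names, the group identifier format, and the skew tolerance are all hypothetical, and the abstract is truncated, so the verification rule shown here (monotonicity plus a bounded difference from the replica's own clock) is an assumption rather than the paper's stated check.

```python
def wrapped_process_id(group_id: str) -> str:
    """Wrappable nondeterminism: each replica returns the same group-wise
    identifier instead of its own local process id (hypothetical wrapper)."""
    return f"group-{group_id}"


def verify_proposed_time(proposed: float, local_clock: float,
                         last_accepted: float, max_skew: float) -> bool:
    """Verifiable pre-determinable nondeterminism: a backup replica checks a
    clock value proposed by the primary before executing the request.

    Assumed rule: the value must move forward and must be within max_skew
    seconds of the backup's own clock.
    """
    moves_forward = proposed > last_accepted
    close_to_local = abs(proposed - local_clock) <= max_skew
    return moves_forward and close_to_local


if __name__ == "__main__":
    print(wrapped_process_id("A"))
    print(verify_proposed_time(1000.0, 1000.4, 999.0, max_skew=0.5))
    print(verify_proposed_time(1005.0, 1000.4, 999.0, max_skew=0.5))
```

In a real BFT protocol the accepted value would also have to be agreed on by a quorum of replicas; that step is left out of this sketch.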

206

Fabrication of fault-tolerant systolic array processors  

Energy Technology Data Exchange (ETDEWEB)

Methods for designing fault-tolerant systolic array processors are discussed. Several ways of bypassing faulty elements in configurations, which depend on an input-data flow organization, are suggested. An analysis of the additional hardware costs of providing fault tolerance by various techniques and for various levels of redundancy is presented. Hadamard fault-tolerant processor design was used to illustrate the efficiency of the techniques suggested.

Golovko, V.A. [Brest Polytechnical Institute (Belarus)]

1995-05-01

207

Fault Tolerant Algorithms for Network-On-Chip Interconnect  

UK PubMed Central (United Kingdom)

As technology scales, fault tolerance is becoming a key concern in on-chip communication. Consequently, this work examines fault tolerant communication algorithms for use in the NoC domain. Two different flooding algorithms and a random walk algorithm are investigated. We show that the flood-based fault tolerant algorithms have an exceedingly high communication overhead. We find that the redundant random walk algorithm offers significantly reduced overhead while maintaining useful levels of fault tolerance. We then compare the implementation costs of these algorithms, both in terms of area as well as in energy consumption, and show that the flooding algorithms consume an order of magnitude more energy per message transmitted.

M. Pirretti; G. M. Link; R. R. Brooks; N. Vijaykrishnan; M. J. Irwin

208

Fault-Tolerant WSN Time Synchronization  

Directory of Open Access Journals (Sweden)

This paper proposes a new fault-tolerant time synchronization algorithm for wireless sensor networks that requires a short time for synchronization, achieves a guaranteed time synchronization level for all non-faulty nodes, accommodates nodes that enter suspended mode and then wake up, is computationally efficient, operates in a completely decentralized manner and tolerates up to f (out of 2f + 1 total) faulty nodes. The performance of the proposed algorithm is analyzed, and an equation is derived for the resynchronization interval required for a specific level of synchronization precision. Results obtained from real runs on multi-hop networks are used to demonstrate the claimed features of the proposed algorithm.

Ung-Jin Jang; Sung-Gu Lee; Jun-Young Park; Sung-Joo Yoo

2010-01-01
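
The "f out of 2f + 1" bound in the abstract above can be made concrete with a small sketch. This is not the paper's algorithm, which the abstract does not spell out; it is the standard median argument, used here as a stand-in. With at least 2f + 1 clock readings of which at most f are arbitrary, the median always lies between the smallest and largest non-faulty reading. The function name and the sample readings are hypothetical.

```python
from statistics import median


def fault_tolerant_clock_estimate(readings, f):
    """Estimate a reference time from neighbour clock readings.

    Assumes at most f readings are faulty (arbitrary values). Requires at
    least 2f + 1 readings so that the median is bracketed by readings from
    non-faulty nodes.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if len(readings) < 2 * f + 1:
        raise ValueError("need at least 2f + 1 readings to tolerate f faults")
    return median(readings)


if __name__ == "__main__":
    # Two non-faulty nodes close to 100 s and one wildly wrong node (f = 1).
    print(fault_tolerant_clock_estimate([100.0, 100.2, 9999.0], f=1))
```

Whichever single reading is corrupted, and in whichever direction, the estimate stays inside the range spanned by the two non-faulty readings.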

209

Design of fault-tolerant inductive position sensor  

International Nuclear Information System (INIS)

Position sensors used in a magnetic bearing system should provide some degree of fault tolerance, because the rotor position is needed for the feedback control that overcomes the open-loop instability. In this paper, we propose an inductive position sensor that can cope with a partial fault in the sensor. The sensor has multiple poles which can be combined to sense the in-plane motion of the rotor. When a high-frequency voltage signal drives each pole of the sensor, the resulting current in the sensor coil contains information regarding the rotor position. The signal processing circuit of the sensor extracts this position information. In this paper, we used the magnetic circuit model of the sensor that shows the analytical relationship between the sensor output and the rotor motion. The multi-polar structure of the sensor makes it possible to introduce redundancy which can be exploited for fault-tolerant operation. The proposed sensor is applied to a magnetically levitated turbo-molecular vacuum pump. Experimental results validate the fault-tolerance algorithm.

2008-01-01

210

Fault Detection and Isolation and Fault Tolerant Control of Wind Turbines Using Set-Valued Observers  

DEFF Research Database (Denmark)

Research on wind turbine Operations & Maintenance (O&M) procedures is critical to the expansion of Wind Energy Conversion systems (WEC). In order to reduce O&M costs and increase the lifespan of the turbine, we study the application of Set-Valued Observers (SVO) to the problem of Fault Detection and Isolation (FDI) and Fault Tolerant Control (FTC) of wind turbines, by taking advantage of the recent advances in SVO theory for model invalidation. A simple wind turbine model is presented along with possible faulty scenarios. The FDI algorithm is built on top of the described model, taking into account process disturbances, uncertainty and sensor noise. The FTC strategy takes advantage of the proposed FDI algorithm, enabling the controller reconfiguration shortly after fault events. Additionally, a robust controller is designed so as to increase the wind turbine's performance during low severity faults. Finally, the FDI algorithm is assessed within a publicly available benchmark model, using Monte-Carlo simulation runs.

Casau, Pedro; Rosa, Paulo Andre Nobre

2012-01-01

211

On Reliability Analysis of Fault-tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

The design of a suitable interconnection network for inter-processor communication is one of the key issues in system performance. The reliability of these networks and their ability to continue operating despite failures are major concerns in determining overall system performance. In this paper, a new irregular network, IABN, is proposed by modifying the existing ABN network. ABN is a regular multipath network with limited fault tolerance. The reliabilities of the IABN and ABN multistage interconnection networks have been calculated and compared in terms of the upper and lower bounds of mean time to failure (MTTF). The IABN provides much better fault tolerance, by providing three times more paths between any source-destination pair, and better reliability at the expense of slightly higher cost than ABN.

Rinkle Aggarwal; Dr. Lakhwinder Kaur

2008-01-01

212

Checkpoint-based Intelligent Fault tolerance For Cloud Service Providers  

Directory of Open Access Journals (Sweden)

With the increasing demand for and benefits of cloud computing infrastructure, real-time computing can be performed on cloud infrastructure. A real-time system can take advantage of the intensive computing capabilities and scalable virtualized environment of cloud computing to execute real-time tasks. In most real-time cloud applications, processing is done on remote cloud computing nodes, so there is a greater chance of errors due to the undetermined latency and loose control over the computing nodes. On the other hand, most real-time systems are also safety critical and should be highly reliable. There is therefore an increased requirement for fault tolerance to achieve reliability for real-time computing on cloud infrastructure. This paper proposes a smart checkpoint infrastructure for virtualized service providers and a fault tolerance model for real-time cloud computing. The checkpoints are stored in a Hadoop Distributed File System. This allows a task to resume execution faster after a node crash and increases the fault tolerance of the system, since checkpoints are distributed and replicated across all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure. One advantage of cloud computing is the dynamicity of resource provisioning. Our architecture makes use of this advantage by enabling dynamic run-time modifications of replication groups.

Rejin Paul

2012-01-01
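
The checkpoint-and-resume idea in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's infrastructure: a local JSON file stands in for the Hadoop Distributed File System, and the function names, the state layout, and the `crash_at` parameter (used only to simulate a node crash) are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically rename it over
    the target so a crash never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or the default if none exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


def run_task(path, total_steps, checkpoint_every, crash_at=None):
    """Sum the step indices 0..total_steps-1, checkpointing periodically.

    On start, the task resumes from the last checkpoint if one exists.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated node crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "task.ckpt")
    try:
        run_task(ckpt, total_steps=10, checkpoint_every=3, crash_at=7)
    except RuntimeError:
        pass  # work done after the checkpoint at step 6 is lost
    print(run_task(ckpt, total_steps=10, checkpoint_every=3))  # resumes at step 6
```

Only the work done since the last checkpoint is repeated after the crash, which is the trade-off between checkpoint frequency and recovery time that the abstract refers to.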

213

Design and Analysis of a Fault Tolerant Microprocessor Based on Triple Modular Redundancy Using VHDL  

Directory of Open Access Journals (Sweden)

There are numerous real-time and operation-critical systems in which failure is unacceptable at any stage of processing. Examples of such systems include ATMs, satellites and spacecraft. In this paper, a fault-tolerant microprocessor is developed using checker units with a fault-secure ALU; the fault-secure ALU is built using parity prediction logic and the two-rail checker method. Finally, triple modular redundancy is applied to obtain a fault-tolerant processor. The proposed method was validated in a VHDL test environment, and the results showed that the reliability of the system increased with little area overhead.

Deepti Shinghal; Dinesh Chandra

2011-01-01
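
Triple modular redundancy, as named in the abstract above, reduces to a 2-of-3 majority vote over replica outputs. The sketch below models that voter in software rather than VHDL; the 8-bit adder and the injected fault are hypothetical stand-ins for the paper's ALU, and the parity prediction and two-rail checker stages are not modelled.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value that at
    least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


def tmr_execute(replicas, *args):
    """Run three replicas of the same operation and vote on the results."""
    if len(replicas) != 3:
        raise ValueError("TMR needs exactly three replicas")
    results = [replica(*args) for replica in replicas]
    return majority_vote(*results)


def alu_add(x: int, y: int) -> int:
    """Hypothetical 8-bit adder standing in for one ALU replica."""
    return (x + y) & 0xFF


def faulty_alu_add(x: int, y: int) -> int:
    """The same adder with an injected fault that flips bit 2 of the result."""
    return alu_add(x, y) ^ 0x04


if __name__ == "__main__":
    # One faulty replica out of three is masked by the voter.
    print(tmr_execute([alu_add, faulty_alu_add, alu_add], 20, 22))  # prints 42
```

The voter masks any fault confined to a single replica; two replicas failing in the same bit position would defeat it, which is why TMR is usually paired with fault detection such as the checker units the abstract describes.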

214

A Framework-Based Approach for Fault-Tolerant Service Robots  

Directory of Open Access Journals (Sweden)

Recently the component-based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support a fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with a fault tolerance function, we argue that the framework-based fault tolerance approach is feasible and even has many benefits, including that: 1) system integrators can build fault-tolerant applications from non-fault-aware components; 2) the constraints of the components and the operating environment, which cannot easily be anticipated at the time of component development, can be considered at the time of integration; 3) consistency in system reliability can be obtained despite diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator, and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault-tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real-time performance of the framework-based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

Heejune Ahn; Woong-Kee Loh; Woon-Young Yeo

2012-01-01

215

Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration.  

Science.gov (United States)

The modeling and design of a fault-tolerant multiprocessor system is addressed. Of interest is the behavior of the system during recovery and restoration after a fault has occurred. The multiprocessor systems are based on the Algorithm to Architecture Map...

R. A. Obando; J. W. Stoughton

1995-01-01

216

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem an ICMAC (Improved C...

Daqi Zhu; Qian Liu; Yongsheng Yang

217

Fault tolerant wind speed estimator used in wind turbine controllers  

DEFF Research Database (Denmark)

Advanced control schemes can be used to optimize energy production and cost of energy in modern wind turbines. These control schemes most often rely on wind speed estimations. Existing wind speed estimators are, however, not designed to tolerate faults in the sensors they use. In this paper a fault tolerant wind speed estimator is designed based on a set of unknown input observers, each designed for a different set of non-faulty sensors. Faults in the rotor, generator and wind speed sensors are considered. The designed wind speed estimator is passively tolerant towards faults in the wind speed sensors, while faults in the generator and rotor speed sensors are accommodated by an active fault tolerant observer scheme in which the faults are detected and identified, and the observer corresponding to the non-faulty sensors is used. The potential of the scheme is shown by applying the proposed wind speed estimator to a simulation model of a wind turbine. Note that since the faults are accommodated in the observer scheme, the actual controller does not need to be adjusted or reconfigured to accommodate the sensor faults.

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01

218

CMOS processor element for a fault-tolerant SVD array  

Science.gov (United States)

This paper describes the VLSI implementation of a CORDIC based processor element for use in a fault-reconfigurable systolic array to compute the singular value decomposition (SVD) of a matrix. The chip implements a time redundant fault tolerance scheme, which allows processors adjacent to a faulty processor to act as computation backup during the systolic idle time. Also, processors around a fault collaborate to reroute data around the faulty processor. This form of time redundancy is attractive when tolerance to a few faults needs to be achieved with little hardware overhead.

Kota, Kishore; Cavallaro, Joseph R.

1993-11-01

219

FTCS 13th annual international symposium. Fault-tolerant computing. Digest of papers  

Energy Technology Data Exchange (ETDEWEB)

The following topics were dealt with: fault-tolerant systems; testable arrays; recovery; interacting automata; test generation; experimental evaluation; software reliability evaluation; self checking circuits; synchronisation protocols; robust programming; on-line monitoring of systems; self-testing; fault-tolerant software; fast modelling and testing tools for VLSI; reliability modelling and evaluation; fault location and error correction; system level diagnosis; coding technique for memories; microprocessors testing; telephone switching systems; design and testing aids; PLA testing; interconnection networks; real time control application; circuit design. Abstracts of individual papers can be found under the relevant classification codes in this or future issues.

1983-01-01

220

Full Tolerant Archiving System  

Science.gov (United States)

The archiving system at the Italian center for Astronomical Archives (IA2) manages data from external sources like telescopes, observatories, or surveys and handles them in order to guarantee preservation, dissemination, and reliability, in most cases in a Virtual Observatory (VO) compliant manner. A metadata model dynamic constructor and a data archive manager are new concepts aimed at automatizing the management of different astronomical data sources in a fault tolerant environment. The goal is a full tolerant archiving system, nevertheless complicated by the presence of various and time changing data models, file formats (FITS, HDF5, ROOT, PDS, etc.) and metadata content, even inside the same project. To avoid this unpleasant scenario a novel approach is proposed in order to guarantee data ingestion, backward compatibility, and information preservation.

Knapic, C.; Molinaro, M.; Smareglia, R.

2013-10-01

 
 
 
 
221

Fault Tolerance In Grid Computing: State of the Art and Open Issues  

Directory of Open Access Journals (Sweden)

Fault tolerance is an important property for large scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly utilized techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The latter largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks can execute in a well defined order, reliability is another major challenge because of the unreliable nature of the grid resources.

Ritu Garg; Awadhesh Kumar Singh

2011-01-01
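
The dependence of checkpointing overhead on the interval length, mentioned in the abstract above, can be illustrated with a classic first-order rule that is not taken from this paper: Young's approximation, which puts the interval that minimises expected lost time at sqrt(2 x checkpoint cost x mean time between failures). It assumes the checkpoint cost is small compared with the time between failures. The function name and the sample figures are hypothetical.

```python
import math


def young_checkpoint_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval that
    minimises expected lost time: sqrt(2 * C * MTBF).

    Both arguments are in seconds; the approximation assumes C << MTBF.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Hypothetical numbers: a 60 s checkpoint on a resource that fails
    # about once a day (86400 s).
    interval = young_checkpoint_interval(60.0, 86400.0)
    print(f"checkpoint roughly every {interval / 60.0:.1f} minutes")
```

Checkpointing more often than this wastes time writing checkpoints; checkpointing less often loses more work per failure.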

222

A Study on the Noise Threshold of Fault-tolerant Quantum Error Correction  

CERN Multimedia

Quantum circuits implementing fault-tolerant quantum error correction (QEC) for the three-qubit bit-flip code and five-qubit code are studied. To describe the effect of noise, we apply a model based on a generalized effective Hamiltonian where the system-environment interactions are taken into account by including stochastic fluctuating terms in the system Hamiltonian. This noise model enables us to investigate the effect of noise in quantum circuits under realistic device conditions and avoid strong assumptions such as maximal parallelism and weak storage errors. Noise thresholds of the QEC codes are calculated. In addition, the effects of imprecision in projective measurements, collective bath, fault-tolerant repetition protocols, and level of parallelism in circuit constructions on the threshold values are also studied with emphasis on determining the optimal design for the fault-tolerant QEC circuit. These results provide insights into the fault-tolerant QEC process as well as useful information for desig...

Cheng, Y C

2004-01-01
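
As a point of reference for the three-qubit bit-flip code named in the abstract above, the sketch below computes the logical error rate of its classical analogue under a much simpler noise model than the paper's: each of the three bits flips independently with probability p, and decoding is a perfect majority vote. Under those assumptions the logical error rate is 3p^2 - 2p^3. This toy figure is not the noise threshold the paper calculates, which depends on the effective-Hamiltonian model and on faults in the correction circuit itself.

```python
from itertools import product


def encode(bit: int):
    """Repetition encoding: one logical bit becomes three physical bits."""
    return (bit, bit, bit)


def decode(codeword) -> int:
    """Majority-vote decoding."""
    return 1 if sum(codeword) >= 2 else 0


def logical_error_rate(p: float) -> float:
    """Exact probability that majority decoding returns the wrong bit when
    each physical bit flips independently with probability p."""
    total = 0.0
    for flips in product((0, 1), repeat=3):
        probability = 1.0
        for flipped in flips:
            probability *= p if flipped else (1.0 - p)
        received = tuple(b ^ f for b, f in zip(encode(0), flips))
        if decode(received) != 0:
            total += probability
    return total


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        print(p, logical_error_rate(p))
```

In this toy model, encoding helps exactly when 3p^2 - 2p^3 < p, that is, for p below 1/2.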

223

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

Energy Technology Data Exchange (ETDEWEB)

The similarities and differences of a universal network to normal neural networks are outlined. The description and application of a universal network is discussed by showing how a simple linear system is modeled by normal techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described and example results are presented. It is concluded that the universal network provides different feature recognition capabilities than a neural network and that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system.

Mott, J.E. [Advanced Modeling Techniques Corp., Idaho Falls, ID (United States); King, R.W.; Monson, L.R.; Olson, D.L.; Staffon, J.D. [Argonne National Lab., Idaho Falls, ID (United States)

1992-03-06

224

Fault Tolerant Neuro-Robust Position Control of DC Motors  

Directory of Open Access Journals (Sweden)

DC motors are widely used in industries such as mechanics, robotics, and aerospace engineering. In this paper, we present a high performance control method for position control of DC motors. A fault-tolerant control model is also addressed and combined with the neuro-robust control approach. It is shown that with the proposed control algorithms, external disturbances and coupled dynamics inherent in the system are effectively compensated using a neural network unit for which no analytical estimate of the upper bound of the reconstruction error and uncertainties is needed. Simulations on various flight conditions also confirm the effectiveness of the proposed methods.

Ran Zhang; Marwan Bikdash

2011-01-01

225

Fault tolerant task execution through global trajectory planning  

International Nuclear Information System (INIS)

Whether a task can be completed after a failure of one of the degrees-of-freedom of a redundant manipulator depends on the joint angle at which the failure takes place. It is possible to achieve fault tolerance by globally planning a trajectory that avoids unfavourable joint positions before a failure occurs. In this article, we present a trajectory planning algorithm that guarantees fault tolerance while simultaneously satisfying joint limit and obstacle avoidance requirements.

1996-01-01

226

Fault tolerant task execution through global trajectory planning  

Energy Technology Data Exchange (ETDEWEB)

Whether a task can be completed after a failure of one of the degrees-of-freedom of a redundant manipulator depends on the joint angle at which the failure takes place. It is possible to achieve fault tolerance by globally planning a trajectory that avoids unfavourable joint positions before a failure occurs. In this article, we present a trajectory planning algorithm that guarantees fault tolerance while simultaneously satisfying joint limit and obstacle avoidance requirements.

Paredis, Christiaan J.J.; Khosla, Pradeep K

1996-09-01

227

Designing fault-tolerant manipulators: How many degrees of freedom?  

Energy Technology Data Exchange (ETDEWEB)

One of the most important parameters to consider when designing a manipulator is the number of degrees of freedom (DOFs). This article focuses on the question: How many DOFs are necessary and sufficient for fault tolerance, and how should these DOFs be distributed along the length of the manipulator? A manipulator is fault tolerant if it can complete its task even when one of its joints fails and is immobilized. The number of DOFs needed for fault tolerance strongly depends on the knowledge available about the task. In this article, two approaches are explored. First, for the design of a general purpose fault-tolerant manipulator, it is assumed that neither the exact task trajectory nor the redundancy resolution algorithm is known a priori and the manipulator has no joint limits. In this case, two redundant DOFs are necessary and sufficient to sustain one joint failure, as is demonstrated in two design templates for spatial fault-tolerant manipulators. In the second approach, both the Cartesian task path and the redundancy resolution algorithm are assumed to be known. The design of such a task-specific fault-tolerant manipulator requires only one degree of redundancy. 22 refs., 11 figs., 2 tabs.

Paredis, C.J.J.; Khosla, P.K. [Carnegie Mellon Univ., Pittsburgh, PA (United States)

1996-12-01

228

A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites  

International Nuclear Information System (INIS)

This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device.

1992-01-01

229

A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites  

Energy Technology Data Exchange (ETDEWEB)

This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device.

Sandness, G.A.; Bennett, D.W. [Pacific Northwest Lab., Richland, WA (United States); Martinson, L. [Westinghouse Idaho Nuclear Co., Inc., Idaho Falls, ID (United States); Bingham, D.N.; Anderson, A.A. [EG and G Idaho, Inc., Idaho Falls, ID (United States)

1992-08-01

230

Fault-tolerant locomotion of the hexapod robot.  

UK PubMed Central (United Kingdom)

In this paper, we propose a scheme for fault detection and tolerance in hexapod robot locomotion on even terrain. The fault stability margin is defined to represent the potential stability that a gait retains if a sudden fault occurs in one leg. Based on this, the fault-tolerant quadruped periodic gaits of the hexapod walking over perfectly even terrain are derived. It is demonstrated that the derived quadruped gait is the optimal one the hexapod can have while keeping the fault stability margin nonnegative, and that a geometric condition should be satisfied for optimal locomotion. With this scheme, when one leg fails, the hexapod robot uses the modified tripod gait to continue the optimal locomotion.

Yang JM; Kim JH

1998-01-01

231

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model ...

Qian Liu; Daqi Zhu

232

Realization of User Level Fault Tolerant Policy Management through a Holistic Approach for Fault Correlation  

Energy Technology Data Exchange (ETDEWEB)

Many modern scientific applications, which are designed to utilize high performance parallel computers, occupy hundreds of thousands of computational cores running for days or even weeks. Since many scientists compete for resources, most supercomputing centers practice strict scheduling policies and perform meticulous accounting on their usage. Thus computing resources and time assigned to a user are considered invaluable. However, most applications are not well prepared for unforeseeable faults, still relying on primitive fault tolerance techniques. Considering that ever-plunging mean time to interrupt (MTTI) is making scientific applications more vulnerable to faults, it is increasingly important to provide users not only an improved fault tolerant environment, but also a framework to support their own fault tolerance policies so that their allocation times can be best utilized. This paper addresses a user level fault tolerance policy management based on a holistic approach to digest and correlate fault related information. It introduces simple semantics with which users express their policies on faults, and illustrates how event correlation techniques can be applied to manage and determine the most preferable user policies. The paper also discusses an implementation of the framework using open source software, and demonstrates, as an example, how a molecular dynamics simulation application running on the institutional cluster at Oak Ridge National Laboratory benefits from it.

Park, Byung H [ORNL; Naughton, III, Thomas J [ORNL; Agarwal, Pratul K [ORNL; Bernholdt, David E [ORNL; Geist, Al [ORNL; Tippens, Jennifer L [ORNL

2011-01-01

233

Energy Efficient Fault Tolerant Routing Mechanism for Wireless Sensor Network  

Directory of Open Access Journals (Sweden)

Wireless sensor networks are self-organizing systems with resource constraints that are often deployed in inhospitable and inaccessible environments in order to gather data about some phenomenon in the outside world. For most sensor network applications, point-to-point reliability is not the main objective (Paradis & Qi, 2007); instead, reliable delivery of the interesting event to the server has to be guaranteed (possibly with a certain probability). The communication in such networks is unpredictable and failure-prone, even more so than in regular wireless ad hoc networks. Hence, it is vital to provide fault tolerant techniques for distributed applications in sensor networks. Several approaches have been proposed in many recent studies to address the fault tolerance issue in the application, transport and/or routing layers. In this paper, we propose a slight modification of the conventional routing entry (destination, next hop) by introducing second hop information in the route construction phase in order to use it in case of node/link failure (skipping only the failed link). Furthermore, the implementation of this proposed routing technique stabilizes the throughput, reduces the average jitter, provides low control overhead and decreases the energy consumption of the network. As a result, the reliability, availability, energy-efficiency and maintainability of the network are achieved.

Ahmed Roumane; Bouabdellah Kechar; Belkacem Kouninef

2012-01-01
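
The routing idea in this abstract can be pictured with a small sketch. The table layout, the node names and the `forward` helper are hypothetical illustrations, not the authors' implementation. The sketch assumes only that each route entry stores a second hop next to the conventional (destination, next hop) pair, and that the second hop can be reached directly when the next hop has failed.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical route table: destination -> (next_hop, second_hop).
RouteTable = Dict[str, Tuple[str, Optional[str]]]


def forward(table: RouteTable, destination: str, failed: Set[str]) -> Optional[str]:
    """Return the node to send to, skipping the next hop if it has failed."""
    entry = table.get(destination)
    if entry is None:
        return None  # no route known for this destination
    next_hop, second_hop = entry
    if next_hop not in failed:
        return next_hop
    # Next hop is down: fall back to the stored second hop, if it is alive.
    if second_hop is not None and second_hop not in failed:
        return second_hop
    return None  # both hops unavailable; a new route would have to be built


routes: RouteTable = {"sink": ("n2", "n5")}  # illustrative node names
print(forward(routes, "sink", failed=set()))
print(forward(routes, "sink", failed={"n2"}))
```

Storing the second hop costs one extra field per entry, which is the kind of low-overhead trade-off the abstract describes; whether the second hop is actually within radio range is an assumption of this sketch.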

234

MCNP load balancing and fault tolerance with PVM  

Energy Technology Data Exchange (ETDEWEB)

Version 4A of the Monte Carlo neutron, photon, and electron transport code MCNP, developed by LANL (Los Alamos National Laboratory), supports distributed-memory multiprocessing through the software package PVM (Parallel Virtual Machine, version 3.1.4). Using PVM for interprocessor communication, MCNP can simultaneously execute a single problem on a cluster of UNIX-based workstations. This capability provided system efficiencies that exceeded 80% on dedicated workstation clusters; however, on heterogeneous or multiuser systems, the performance was limited by the slowest processor (i.e., equal work was assigned to each processor). The next public release of MCNP will provide multiprocessing enhancements that include load balancing and fault tolerance, which are shown to dramatically increase multiuser system efficiency and reliability.

McKinney, G.W.

1995-07-01
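
The remark that equal work assignment is limited by the slowest processor can be checked with a little arithmetic. The following is a minimal sketch with made-up processor speeds; the function names are hypothetical, and the proportional split is an idealized stand-in for load balancing, not the scheme used in MCNP.

```python
from typing import Sequence


def equal_split_time(total_work: float, speeds: Sequence[float]) -> float:
    """Wall-clock time when every processor gets the same share of work."""
    share = total_work / len(speeds)
    return max(share / s for s in speeds)  # the slowest processor finishes last


def balanced_time(total_work: float, speeds: Sequence[float]) -> float:
    """Wall-clock time when work is split in proportion to processor speed."""
    return total_work / sum(speeds)


def efficiency(time_taken: float, total_work: float, speeds: Sequence[float]) -> float:
    """Fraction of the aggregate processing capacity that was actually used."""
    return total_work / (time_taken * sum(speeds))


speeds = [1.0, 1.0, 1.0, 0.5]  # assumed relative speeds; one slow machine
work = 700.0                   # assumed amount of work, arbitrary units

t_equal = equal_split_time(work, speeds)
t_balanced = balanced_time(work, speeds)
print(t_equal, efficiency(t_equal, work, speeds))
print(t_balanced, efficiency(t_balanced, work, speeds))
```

With these assumed numbers, the equal split leaves the three fast machines idle while the slow one finishes, which is the effect the abstract attributes to heterogeneous or multiuser systems.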

235

Buffered coscheduling for parallel programming and enhanced fault tolerance  

Energy Technology Data Exchange (ETDEWEB)

A computer implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.

Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

2006-01-31
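
The buffer-then-exchange step described above can be sketched as follows. This is an illustration under assumptions, not the patented method: the `strobe_exchange` name and the buffer layout (source processor mapped to a list of destination/payload pairs) are hypothetical, and the global exchange is modelled as a single function call instead of real network traffic.

```python
from typing import Dict, List, Tuple


def strobe_exchange(buffers: Dict[int, List[Tuple[int, str]]]) -> Dict[int, int]:
    """Model the global exchange performed during one strobe interval.

    buffers[src] holds the (dst, payload) messages that processor src
    accumulated during the preceding time interval. The result maps each
    destination to the number of messages it should expect to receive in
    the next time interval.
    """
    incoming = {proc: 0 for proc in buffers}
    for messages in buffers.values():
        for dst, _payload in messages:
            incoming[dst] = incoming.get(dst, 0) + 1
    return incoming


# Three processors; messages buffered during one time interval.
buffers = {
    0: [(1, "a"), (2, "b")],
    1: [(2, "c")],
    2: [],
}
print(strobe_exchange(buffers))
```

Because every processor learns its incoming count before the next interval starts, a missing contribution to the exchange is visible at a well-defined point, which is one plausible reading of why the scheme helps fault tolerance.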

236

Fault tolerant strategies for automated operation of nuclear reactors  

International Nuclear Information System (INIS)

This paper introduces an automatic control system incorporating a number of verification, validation, and command generation tasks within a fault-tolerant architecture. The integrated system utilizes recent methods of artificial intelligence such as neural networks and fuzzy logic control. Furthermore, advanced signal processing and nonlinear control methods are also included in the design. The primary goal is to create an on-line capability to validate signals, analyze plant performance, and verify the consistency of commands before control decisions are finalized. The application of this approach to the automated startup of the Experimental Breeder Reactor-II (EBR-II) is performed using a validated nonlinear model. The simulation results show that the advanced concepts have the potential to improve plant availability and safety.

1991-01-01

237

Simulating chemistry efficiently on fault-tolerant quantum computers  

CERN Document Server

Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. Here we consider methods to make proposed chemical simulation algorithms computationally fast on fault-tolerant quantum computers in the circuit model. Fault tolerance constrains the choice of available gates, so that arbitrary gates required for a simulation algorithm must be constructed from sequences of fundamental operations. We examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm [C.M. Dawson and M.A. Nielsen, Quantum Inf. Comput., 6:81, 2006]. For a given approximation error $\epsilon$, arbitrary single-qubit gates can be produced fault-tolerantly and using a limited set of gates in time which is $O(\log \epsilon)$ or $O(\log \log \epsilon)$; with sufficient parallel preparation of ancillas, constant average depth is possible using a method w...

Jones, N Cody; McMahon, Peter L; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

2012-01-01

238

A Fault Tolerant Resource Allocation Architecture for Mobile Grid  

Directory of Open Access Journals (Sweden)

Problem statement: In order to achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing with respect to mobile nodes. Approach: We propose a fault tolerant technique for improving reliability in a mobile grid environment, taking node mobility into account. The cluster head and monitoring agent were designed to address both resource and network failures and to provide recovery techniques for overcoming the faults. Results: The proposed model achieves an identifiable performance gain when compared to the previous model (HRAA). Using simulation results, we analyze the effect of node and link failures on parameters such as delivery ratio, throughput and delay against the rate of success. Conclusion: The proposed fault tolerant approach checks for the availability of the nodes with the least workload for transferring the executed job to the cluster head, providing an alternate path in case of failure and thereby enhancing the reliability of the grid environment.

S. Thenmozhi; A. Tamilarasi; P. T. Vanathi

2012-01-01

239

An upper bound on quantum fault tolerant thresholds  

CERN Document Server

In this paper we calculate upper bounds on fault tolerance without restrictions on the overhead involved. Optimally adaptive recovery operators are used, and the Shannon entropy is used to estimate the thresholds. By allowing for unrealistically high levels of overhead, we find a quantum fault tolerant threshold of 6.88% for the depolarizing noise used by Knill, which compares to "above 3%" evidenced by Knill. We conjecture that the optimal threshold is 6.90%. We also perform threshold calculations for types of noise other than that discussed by Knill.

Fern, Jesse

2008-01-01

240

Design of Fault Tolerant Network Interfaces for NoCs  

DEFF Research Database (Denmark)

Networks-on-Chip (NoCs) appeared as a strategy to deal with the communication requirements of complex IP-based System-on-Chips. As the complexity of designs increases and the technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the NoC components increases. This paper focuses on the study and evaluation of techniques for increasing the reliability and resilience of Network Interfaces (NIs). NIs act as interfaces between IP cores and the communication infrastructure; faulty behavior in them could therefore affect the overall system. In this work, we propose a functional fault model for the NI components, and we present a two-level fault tolerant solution that can be employed for mitigating the effects of both single-event upset soft errors and hard errors on the NI. Experiments show that with a limited overhead we can obtain a significant reliability of the NI, while saving up to 83% in area with respect to a standard Triple Modular Redundancy implementation, as well as a significant energy reduction.

Fiorin, Leandro; Micconi, Laura

2011-01-01

 
 
 
 
241

Energy Bounds for Fault-Tolerant Nanoscale Designs  

CERN Multimedia

The problem of determining lower bounds for the energy cost of a given nanoscale design is addressed via a complexity theory-based approach. This paper provides a theoretical framework that is able to assess the trade-offs existing in nanoscale designs between the amount of redundancy needed for a given level of resilience to errors and the associated energy cost. Circuit size, logic depth and error resilience are analyzed and brought together in a theoretical framework that can be seamlessly integrated with automated synthesis tools and can guide the design process of nanoscale systems comprised of failure prone devices. The impact of redundancy addition on the switching energy and its relationship with leakage energy is modeled in detail. Results show that 99% error resilience is possible for fault-tolerant designs, but at the expense of at least 40% more energy if individual gates fail independently with probability of 1%.

Marculescu, Diana

2011-01-01

242

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Directory of Open Access Journals (Sweden)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network is used to realize the on-line fault identification and the weighting matrix computation. The fault accommodation subsystem uses a control algorithm based on the weighted pseudo-inverse to find the solution of the control allocation problem. To illustrate the effectiveness of the proposed method, a simulation example under multiple uncertain abrupt faults is given in the paper.

Daqi Zhu; Qian Liu; Yongsheng Yang

2008-01-01

243

FTCS 12th annual international symposium on fault-tolerant computing. Digest of papers  

Energy Technology Data Exchange (ETDEWEB)

The following topics were dealt with: fault tolerance; architecture; distributed systems; design for testability; recovery; test generation; computer networks; interconnection networks; system level techniques; on-line monitoring; analytical evaluation; self testing; redundancy techniques; multicomputer system diagnosis; experimental evaluation; and VLSI design issues. Abstracts of individual papers can be found under the relevant classification codes in this or future issues.

1982-01-01

244

Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements  

DEFF Research Database (Denmark)

This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modified particle filter. This particle filter is able to run in an asynchronous manner to accommodate the measurement drop out problem, and it overcomes the measurement outliers by switching observation models. Simulations with experimental data show that this fault tolerant navigation system can accurately estimate the ROV kinematic states, even when sensor failures appear frequently.

Zhao, Bo; Blanke, Mogens

2012-01-01

245

Ethernet Implementation of Fault Tolerant Train Network for Entertainment and Mixed Control Traffic  

Directory of Open Access Journals (Sweden)

This paper studies the integration of the control system and entertainment on board train wagons. Both the control and entertainment loads are implemented on top of Gigabit Ethernet, each with a dedicated controller/server. The control load has mixed sampling periods. It is proven that this system can tolerate the failure of one controller in one wagon. In a two-wagon scenario, fault tolerance at the controller level is studied, and simulation results show that the system can tolerate the failure of 3 controllers. The system is successful in meeting the packet end-to-end delay with zero packet loss in all OPNET simulated scenarios. The maximum permissible entertainment load is determined for the fault tolerant scenarios.

Tarek K. Refaat; Mai Hassan; Ramez M. Daoud; Hassanein H. Amer

2013-01-01

246

Diogenes approach to testable fault-tolerant arrays of processors  

Energy Technology Data Exchange (ETDEWEB)

A strategy for designing testable fault-tolerant arrays of processors is described by a series of examples. The strategy achieves fault tolerance by introducing redundancy in an array's communication links rather than in its processing elements (PEs). The major characteristics of the designs produced are as follows. (1) Testability: the designs always afford isolation and scan-in scan-out capabilities for each PE. (2) Simplicity of configuration: the process of programming an array to its fault-free format consists only of setting a few variables (= control lines) per PE. (3) Dynamic fault tolerance: the settings of variables can be altered at any time. (4) Transparency to PE designer: transforming the design of an array of PEs to a Diogenes design of the array involves changing only the communication links of the array, leaving the PEs and their interfaces unchanged. (5) Area-efficiency: the designs produced by the strategy are often (asymptotically) optimal in area. (6) Regularity and modularity: fault-laden chips can easily be interconnected to build an array of the desired size. (7) Speed: Diogenes layouts need never have signal wires travel more than the width of a single PE without being enhanced; thus PE failures cannot cause arbitrarily long unenhanced runs of wire. 31 references.

Rosenberg, A.L.

1983-10-01

247

Fault Tolerant Message Efficient Coordinator Election Algorithm in High Traffic Bidirectional Ring Network  

Directory of Open Access Journals (Sweden)

The use of distributed systems such as the Internet and cloud computing is growing dramatically. A coordinator is crucial in these systems because of process coordination and consistency requirements. However, this growth makes coordinator election algorithms even more complicated. Many algorithms have been proposed in this area, but the two best known are Bully and Ring. In this paper we propose a fault tolerant coordinator election algorithm for a typical bidirectional ring topology which is twice as fast as the Ring algorithm while passing far fewer messages during the election. A fault tolerance technique is applied which reduces the waiting time for the election to zero.

Danial Rahdari; Amir Masoud Rahmani; Afsane Arabshahi

2012-01-01
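
For orientation, the classic Ring election that the abstract uses as its baseline can be simulated in a few lines. This is a sketch of the textbook unidirectional algorithm under simplifying assumptions (the starter is alive, no process crashes during the election); it is not the bidirectional algorithm proposed in the paper, and the function and parameter names are hypothetical.

```python
from typing import List, Set, Tuple


def ring_election(ids: List[int], starter: int, crashed: Set[int]) -> Tuple[int, int]:
    """Simulate one classic ring election; return (leader_id, messages_sent).

    ids      -- process identifiers in ring order (positions 0..n-1)
    starter  -- position of the live process that starts the election
    crashed  -- positions of crashed processes, which are skipped
    """
    n = len(ids)

    def next_alive(pos: int) -> int:
        pos = (pos + 1) % n
        while pos in crashed:
            pos = (pos + 1) % n
        return pos

    messages = 0
    candidates = [ids[starter]]
    pos = next_alive(starter)
    # Election pass: the message collects the id of every live process.
    while pos != starter:
        messages += 1
        candidates.append(ids[pos])
        pos = next_alive(pos)
    messages += 1  # final hop that returns the message to the starter
    leader = max(candidates)
    # Announcement pass: the result travels once more around the live ring.
    messages += len(candidates)
    return leader, messages


print(ring_election([3, 7, 5, 2], starter=0, crashed=set()))
print(ring_election([3, 7, 5, 2], starter=0, crashed={1}))
```

In this baseline the message count is two full trips around the live ring, which gives a reference point for the abstract's claim of far fewer messages.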

248

Fault tolerance and reliability in integrated ship control : the ATOMOS concept  

DEFF Research Database (Denmark)

Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding reliability issues. A generic framework for simulation of network traffic under fault conditions is suggested and the first practical experiences from a prototype implementation are reported.

Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh

2002-01-01

249

Lossless fault-tolerant data structures with additive overhead  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We develop the first dynamic data structures that tolerate δ memory faults, lose no data, and incur only an O(δ) additive overhead in overall space and time per operation. We obtain such data structures for arrays, linked lists, binary search trees, interval trees, predecessor search, and suffix tr...

Christiano, Paul F.; Demaine, Erik D.; Kishore, Shaunak

250

Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder  

CERN Multimedia

Reversible logic is emerging as an important research area having its application in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used as the fundamental building block for designing other arithmetic logic circuits. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.

Islam, Md Saiful; 10.3329/jbas.v32i2.2431

2010-01-01

251

Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation  

CERN Multimedia

Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse syn...

Dolev, Danny; Lenzen, Christoph; Schmid, Ulrich

2011-01-01

252

Fault Tolerant Wind Farm Control - a Benchmark Model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

253

Fault Tolerant Control of Wind Turbines - A benchmark model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

254

Combining dynamical decoupling with fault-tolerant quantum computation  

CERN Multimedia

We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise, and have a lower overhead cost, than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer, and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

Ng, Hui Khoon; Preskill, John

2009-01-01

255

Fault Tolerant Control Using Proportional-Integral-Derivative Controller Tuned by Genetic Algorithm  

Directory of Open Access Journals (Sweden)

Problem statement: The growing demand for reliability, maintainability and survivability in industrial processes has drawn significant research in the fault detection and fault tolerant control domain. A fault is usually defined as an unexpected change in a system, such as a component malfunction or a variation in operating conditions, which tends to degrade the overall system performance. The purpose of fault detection is to detect these malfunctions so that proper action can be taken to prevent faults from developing into a total system failure. Approach: In this study an effective integrated fault detection and fault tolerant control scheme was developed for a class of LTI systems. The scheme was based on a Kalman filter for simultaneous state and fault parameter estimation, statistical decisions for fault detection and activation of controller reconfiguration. Proportional-Integral-Derivative (PID) control schemes continue to provide the simplest yet effective solutions to most control engineering applications today. Determination or tuning of the PID parameters continues to be important as these parameters have a great influence on the stability and performance of the control system. In this study a genetic algorithm (GA) was proposed to tune the PID controller. Results: The results show that the proposed scheme improves the performance of the process in terms of time domain specifications, robustness to parametric changes and optimum stability. Also, a comparison with the conventional Ziegler-Nichols method proves the superiority of the GA based system. Conclusion: This study demonstrates the effectiveness of the genetic algorithm in tuning a PID controller with optimum parameters. It is, moreover, shown to be robust to variations in plant dynamic characteristics and disturbances, assuring a parameter-insensitive operation of the process.

S. Kanthalakshmi; V. Manikandan

2011-01-01
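
To make the tuning problem concrete, the sketch below evaluates a candidate set of PID gains by simulating a unit-step response and returning a cost that a genetic algorithm could minimise. Everything specific here is an assumption made for illustration: the first-order plant, the integral-of-squared-error cost, the Euler integration and the function name are not taken from the paper, and the GA search itself is omitted.

```python
def step_response_cost(kp: float, ki: float, kd: float,
                       tau: float = 1.0, dt: float = 0.01,
                       t_end: float = 10.0) -> float:
    """Integral of squared error for a unit step on an assumed first-order plant.

    Plant model (assumption): tau * dx/dt = -x + u, integrated with Euler steps.
    """
    x = 0.0                 # plant output
    integral = 0.0          # running integral of the error
    prev_error = 1.0 - x    # avoids a derivative kick on the first step
    cost = 0.0
    for _ in range(round(t_end / dt)):
        error = 1.0 - x
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative  # PID control law
        x += dt * (-x + u) / tau                          # Euler plant update
        cost += error * error * dt
        prev_error = error
    return cost


# Two hypothetical candidates, as a GA population member would be scored.
cost_p_only = step_response_cost(kp=0.5, ki=0.0, kd=0.0)
cost_pi = step_response_cost(kp=2.0, ki=1.0, kd=0.0)
print(cost_p_only, cost_pi)
```

A GA would treat `(kp, ki, kd)` as the chromosome and this cost as the fitness to minimise; the proportional-only candidate scores worse here because it leaves a steady-state error on the assumed plant.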

256

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Directory of Open Access Journals (Sweden)

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to realize the fault identification of the thruster. The fault accommodation unit is based on direct calculations of moment, and the result of fault identification is used to find the solution of the control allocation problem. The approach resolves the identification of continuous faults in the UV. Results from the experiment are provided to illustrate the performance of the proposed method in uncertain continuous fault situations.

Qian Liu; Daqi Zhu

2010-01-01

257

Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization  

Directory of Open Access Journals (Sweden)

In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. The fault detection filter uses the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of a fault. The fault accommodation scheme is carried out using the Generalized Internal Model Control (GIMC) architecture, also known as modified Youla parameterization. In addition, performance indices are evaluated to indicate that the resulting fault tolerant scheme can detect, identify and accommodate actuator and sensor faults under additive faults. A DC motor example is considered for the demonstration of the proposed scheme.

Minupriya A; S. Kanthalakshmi; V. Manikandan

2012-01-01

258

Separation of Fault Tolerance and Non-Functional Concerns: Aspect Oriented Patterns and Evaluation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Dependable computer based systems employing fault tolerance and robust software development techniques demand additional error detection and recovery related tasks. This results in tangling of core functionality with these cross cutting non-functional concerns. In this regard current work identifies...

Kashif Hameed; Rob Williams; Jim Smith

259

Autonomous Decentralized Loop network - ADL aiming at fault-tolerance  

Science.gov (United States)

An Autonomous Decentralized System (ADS) network is proposed which provides fault detection, fault recovery, transmission, and maintenance for a space system in a distributed manner. An Autonomous Decentralized Loop (ADL) network system is presented as an application of ADS. The ADL system construction, communication protocol, transmission control, and fault detection and recovery are examined. The ADS features autonomous nodes which allow no subsystem to be down without advance notice. The functional availability of ADL is compared with that of a two-redundant loop.

Kanbe, Seiichiro; Ashida, Akira; Tanaka, Toshiyuki; Mori, Kinji; Ihara, Hirokazu

260

Lightweight storage and overlay networks for fault tolerance.  

Energy Technology Data Exchange (ETDEWEB)

The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

Oldfield, Ron A.

2010-01-01

 
 
 
 
261

Self-checking VLSI building blocks for fault-tolerant multicomputers  

Energy Technology Data Exchange (ETDEWEB)

The use of self-checking nodes and links for implementing fault-tolerant VLSI multicomputers is proposed. The system consists of a large number of VLSI computers interconnected by high-speed dedicated links. Hardware which performs error detection is combined with system-level protocols which handle error recovery and fault treatment. The self-checking nodes notify the rest of the system when their output is erroneous. In order to achieve high fault coverage, error detection is accomplished by duplication and matching. The critical circuit in this scheme is a comparator, which must not be susceptible to faults which can remain undetected and later mask the failure of the functional modules. With both NMOS and CMOS technologies it is possible to implement a self-testing comparator which will produce an error indication if the comparator incurs any single physical defect. 13 references.

Tamir, Y.; Sequin, C.H.

1983-01-01
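
Error detection by duplication and matching, as described in this abstract, can be mimicked in software. The sketch is only an analogy to the hardware scheme: the `self_checking` wrapper and the example functions are hypothetical, and the comparison here is an ordinary inequality test, whereas the paper's point is that the hardware comparator must itself be self-testing.

```python
from typing import Callable, Tuple


def self_checking(primary: Callable[[int], int],
                  duplicate: Callable[[int], int]) -> Callable[[int], Tuple[int, bool]]:
    """Wrap two copies of a module; flag an error whenever their outputs differ."""
    def node(x: int) -> Tuple[int, bool]:
        a = primary(x)
        b = duplicate(x)
        return a, a != b  # (output, error_indication)
    return node


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    return x * x + 1  # models a copy with a stuck fault


healthy_node = self_checking(square, square)
faulty_node = self_checking(square, faulty_square)
print(healthy_node(3))
print(faulty_node(3))
```

The error flag lets the rest of the system ignore the node's output, which corresponds to the notification behaviour of the self-checking nodes in the abstract.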

262

Design of a fault-tolerant controller for the SP-100 space reactor  

Energy Technology Data Exchange (ETDEWEB)

The control system of an SP-100 space reactor is a key element of space reactor design to meet the space mission requirements of safety, reliability, and life expectancy. In this work, a fault-tolerant controller (FTC) is developed to control the thermoelectric (TE) power in the SP-100 space reactor. A fault-tolerant controller makes the control system stable and retains acceptable performance even under system faults. The objectives of the proposed model predictive controller are to minimize both the difference between the predicted TE power and the desired power, and the variation of control drum angle that adjusts the control reactivity. Also, the objectives are subject to constraints of maximum and minimum control drum angle and maximum drum angle variation speed. The model predictive controller incorporates a fault detection and diagnostics algorithm so that the controller can work properly even under input and output measurement faults. A lumped parameter simulation model of the SP-100 nuclear space reactor is used to verify the proposed controller design. Simulation results show that the TE generator power level, regulated by the proposed controller, could track the target power level effectively even under measurement faults, satisfying all control constraints. (authors)

Na, M. G. [Nuclear Eng. Dept., Chosun Univ., 375 Seosuk-dong, Dong-gu, Gwangju 501-759 (Korea, Republic of); Upadhyaya, B. R. [Nuclear Eng. Dept., Univ. of Tennessee, Knoxville, TN 37996-2300 (United States)

2006-07-01
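
The objective and constraints described in this abstract can be written, in one common model predictive control form, as the following problem. All symbols are assumptions introduced here for illustration and are not taken from the paper: $\hat{y}$ is the predicted TE power, $w$ the desired power, $u$ the control drum angle, $\Delta u$ its change per step, $N$ and $M$ the prediction and control horizons, and $\lambda$ a weighting factor.

```latex
\[
\begin{aligned}
\min_{\Delta u(t),\,\dots,\,\Delta u(t+M-1)} \quad
  & J = \sum_{j=1}^{N} \bigl[\hat{y}(t+j \mid t) - w(t+j)\bigr]^{2}
      + \lambda \sum_{j=1}^{M} \bigl[\Delta u(t+j-1)\bigr]^{2} \\
\text{subject to} \quad
  & u_{\min} \le u(t+j-1) \le u_{\max}, \\
  & \lvert \Delta u(t+j-1) \rvert \le \Delta u_{\max},
    \qquad j = 1, \dots, M .
\end{aligned}
\]
```

The first sum penalises the tracking error in TE power, the second penalises drum angle movement, and the two constraints correspond to the angle limits and the maximum variation speed mentioned in the abstract.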

263

BFTDT: Byzantine Fault Tolerance tryout for Dependable Transactions in Cloud  

Directory of Open Access Journals (Sweden)

Cloud Web Services (CWS) is the technology used for business collaboration and integration among web users. Web Services Atomic Transactions (WS-AT) have been used for trusted distributed transaction processing over the web. In a distributed setting, WS-AT is subject to Byzantine faults; to overcome them, Byzantine Fault Tolerance (BFT) techniques are used. The reliable coordinator provides Coordination, Activation, Registration and Completion services, which make the transaction effective and reliable. In the trusted environment, to avoid congestion of the resources, a fair share bandwidth allocation scheme is used to allocate separate bandwidth to each web user, and the transaction is processed by the Coordinator server and the Transaction Processing Monitor (TPM). Analysis of WS-AT for business applications shows a high degree of dependability, security, trust, fault tolerance and fairness of the resources in the trusted environment.

Gayathri S; Prasath T; Jamuna P

2012-01-01

264

Designing an Agent-Based Intrusion Detection System for Heterogeneous Wireless Sensor Networks: Robust, Fault Tolerant and Dynamic Reconfigurable  

Directory of Open Access Journals (Sweden)

Protecting networks against different types of attacks is one of the most important issues in the network and information security domains. For Wireless Sensor Networks (WSNs), given their special properties, the problem is even more important. Several solutions have been proposed to protect WSNs against different types of intrusions, but none of them takes a comprehensive view of the problem, and they are usually designed for a single purpose. The design proposed in this paper takes a comprehensive view by presenting a complete architecture for an Intrusion Detection System (IDS). The main contribution of this architecture is its modularity and flexibility; i.e., it is designed to be applied in four steps of the intrusion detection process, consistent with the application domain and its required security level. The focus of this paper is on heterogeneous WSNs and network-based IDS, designing and deploying the Wireless Sensor Network wide level Intrusion Detection System (WSNIDS) on the base station (sink). Finally, a questionnaire was designed to verify the idea, using the results acquired from analyzing the responses.

Hossein Jadidoleslamy

2011-01-01

265

A fault-tolerant one-way quantum computer  

International Nuclear Information System (INIS)

We describe a fault-tolerant one-way quantum computer on cluster states in three dimensions. The presented scheme uses methods of topological error correction resulting from a link between cluster states and surface codes. The error threshold is 1.4% for local depolarizing error and 0.11% for each source in an error model with preparation, gate, storage, and measurement errors.

2006-01-01

266

Unconstrained and Constrained Fault-Tolerant Resource Allocation  

CERN Multimedia

First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objective is to allocate facilities from sites to satisfy $\mathcal{R}$ at a minimum total cost. Next, we introduce the Constrained Fault-Tolerant Resource Allocation (CFTRA) problem. It differs from UFTRA in that the number of resources available at each site $i$ is limited by $R_{i}$. Both problems are practical extensions of the classical Fault-Tolerant Facility Location (FTFL) problem \cite{Jain00FTFL}. For instance, their solutions provide optimal resource allocation (w.r.t. enterprises) and leasing (w.r.t. clients) strategies for the cont...

Liao, Kewen

2011-01-01

267

TN- or TT-system. The difference of tolerable risks of protection under fault conditions; TN- oder TT-Systemen. Unterschiede in der Grenzrisiken fuer den Schutz gegen elektrischen Schlag unter Fehlerbedingungen  

Energy Technology Data Exchange (ETDEWEB)

For protection against electric shock under fault conditions (protection against indirect contact, or fault protection) in building installations, the measures most often used are fault protection by automatic disconnection of supply in the form of the TN-system (protective neutral earthing) or the TT-system (protective direct earthing with RCDs as protective devices). The differences in tolerable risk between these protective-conductor measures, with regard to disconnecting times and touch voltages and the associated fault voltages and prospective touch voltages, are investigated using example calculations and comparison measurements in the network. At the end of this contribution the most important definitions are explained. (orig.)

Biegelmeier, G.; Krefter, K.H. [Vereinigte Elektrizitaetswerke Westfalen AG (VEW), Dortmund (Germany). Abt. Energieanwendung; VEW Eurotest GmbH, Dortmund (Germany)]

2000-02-07

268

Implementation of Fault-Tolerance Techniques for Real-Time Multiprocessor Scheduling  

UK PubMed Central (United Kingdom)

In this report I present some fault-tolerance techniques that have been implemented in a tool for real-time scheduling. The techniques implemented are based on some of the most established existing techniques found in the real-time community. The purpose of this work has only been to implement the techniques; later on they are supposed to be used for studies on fault-tolerance properties of real-time systems. The work basically consists of two parts: replication and reliability. The possibility to replicate tasks has been integrated into the scheduling tool. A few simple algorithms are implemented and more can easily be added. Support for tasks with different period times has also been integrated. A reliability measure, probability of no dynamic failure, has been implemented. The measure is presented in conjunction with a brief introduction to basic fault tolerance theory. This quality measure could also be used to aid the scheduler for the purpose of improving reliability. Several reliab...

Ola Lundkvist; Jan Jonsson

269

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper encompasses a study on the development of a walking gait for fault tolerant locomotion in unstructured environments. The fault tolerant gait for adaptive locomotion fulfills stability conditions in opposition to a fault (locked joints or sensor failure) event preventing a robot to realize...

Umar Asif

270

Realization of Fault Tolerant Routing Protocol for Zigbee  

Directory of Open Access Journals (Sweden)

Problem statement: Increased use of handheld devices and sensor devices poses problems for existing routing protocols. The performance of existing routing protocols deteriorates sharply in these dense scenarios. Control overhead, introduced during the route discovery and maintenance processes, is a very important parameter in determining the performance of routing protocols. The denser the network, the higher the control overhead in establishing and maintaining the communication path between end systems. This study aims at implementing an improved fault tolerant routing algorithm that minimizes the routing overhead for ad hoc networks using Zigbee. Approach: We propose a routing protocol which minimizes the routing overhead by exploiting the network density. The number of nodes involved in handling the control packets is minimized in the proposed protocol by selecting a few of the neighbors of each node based on the received signal strength. Link breaks are handled locally, thereby reducing the control overhead in the network. Results: The performance of the proposed protocol is tested using the OMNeT++ simulator. The implementation using Zigbee nodes indicates that the control overhead is reduced by up to 80% in dense environments and 60% in heterogeneous and sparse environments, thereby saving energy in the sensor nodes. Conclusion: The proposed protocol increases energy conservation and hence the lifetime of the nodes and of the network.

Sharmila Sankar; Sankaranarayanan

2012-01-01

271

Sensor and Sensorless Fault Tolerant Control for Induction Motors Using a Wavelet Index  

Directory of Open Access Journals (Sweden)

Fault Tolerant Control (FTC) systems are crucial in industry to ensure safe and reliable operation, especially of motor drives. This paper proposes the use of multiple controllers for an FTC system of an induction motor drive, selected based on a switching mechanism. The system switches between sensor vector control, sensorless vector control, closed-loop voltage by frequency (V/f) control and open loop V/f control. Vector control offers high performance, while V/f is a simple, low cost strategy with high speed and satisfactory performance. The faults dealt with are speed sensor failures, stator winding open circuits, shorts and minimum voltage faults. In the event of compound faults, a protection unit halts motor operation. The faults are detected using a wavelet index. For the sensorless vector control, a novel Boosted Model Reference Adaptive System (BMRAS) to estimate the motor speed is presented, which reduces tuning time. Both simulation results and experimental results with an induction motor drive show the scheme to be a fast and effective one for fault detection, while the control methods transition smoothly and ensure the effectiveness of the FTC system. The system is also shown to be flexible, reverting rapidly back to the dominant controller if the motor returns to a healthy state.

Khalaf Salloum Gaeid; Hew Wooi Ping; Mustafa Khalid; Ammar Masaoud

2012-01-01

272

Network Fault Tolerance in Open MPI  

Energy Technology Data Exchange (ETDEWEB)

High Performance Computing (HPC) systems are rapidly growing in size and complexity. As a result, transient and persistent network failures can occur on the time scale of application run times, reducing the productive utilization of these systems. The ubiquitous network protocol used to deal with such failures is TCP/IP; however, available implementations of this protocol provide unacceptable performance for HPC system users, and do not provide the high bandwidth, low latency communications of modern interconnects. This paper describes methods used to provide protection against several network errors such as dropped packets, corrupt packets, and loss of network interfaces while maintaining high-performance communications. Micro-benchmark experiments using vendor supplied TCP/IP and O/S bypass low-level communications stacks over InfiniBand and Myrinet are used to demonstrate the high-performance characteristics of our protocol. The NAS Parallel Benchmarks are used to demonstrate the scalability and the minimal performance impact of this protocol. The micro-benchmarks show that providing higher data reliability decreases performance by up to 30% relative to unprotected communications, but provides performance improvements of a factor of four over TCP/IP running over InfiniBand DDR. The NAS Parallel Benchmarks show virtually no impact of the data reliability protocol on overall run-time.

Shipman, Galen [Los Alamos National Laboratory (LANL); Graham, Richard L [ORNL; Bosilca, George [University of Tennessee, Knoxville (UTK)

2007-01-01

273

Close range fault tolerant noncontacting position sensor  

Energy Technology Data Exchange (ETDEWEB)

A method and system for locating the three dimensional coordinates of a moving or stationary object in real time. The three dimensional coordinates of an object in half space or full space are determined based upon the time of arrival or phase of the wave front measured by a plurality of receiver elements and established vector magnitudes proportional to the measured time of arrival or phase at each receiver element. The coordinates of the object are calculated by solving a matrix equation or a set of closed form algebraic equations.

Bingham, Dennis N. (Idaho Falls, ID); Anderson, Allen A. (Shelley, ID)

1996-01-01

274

Self-checking and Fault Tolerant approaches can help BIST fault coverage: a case study  

UK PubMed Central (United Kingdom)

ts - the control unit responsible for the "normal mode" behavior is a critical component, therefore 100% fault coverage is required - in the BIST controller faults that shorten the test sequence have to be covered. 2. Test strategy. Given the above constraints, different design choices concerning the BIST architecture were adopted: - the dual-port memory, inclusive of all decoding logic: the March B- test for SOA memories is selected [VdGZ93], that meets fault coverage goals - the data path: a functional test is applied by the BIST controller while the March test is in progress - the normal-mode control unit, controlling the FIFO behavior of the component while not under test. Several alternatives were examined. The implemented alternative is a fault-tolerant control unit, using triple redundancy. Since the control unit is

Fulvio Corno; Paolo Prinetto; Matteo Sonza Reorda

275

Catalysis and activation of magic states in fault tolerant architectures  

CERN Document Server

In many architectures for fault tolerant quantum computing universality is achieved by a combination of Clifford group unitaries and preparation of suitable non-stabilizer states, the so-called magic states. Universality is possible even for some fairly noisy non-stabilizer states, as distillation can convert many copies into a purer magic state. Here we propose novel protocols that exploit multiple species of magic states in surprising ways. These protocols provide examples of previously unobserved phenomena that are analogous to catalysis and activation well known in entanglement theory.

Campbell, Earl T

2010-01-01

276

Fault Tolerant Platform for Application Mobility across devices  

Directory of Open Access Journals (Sweden)

In the mobile era, users have started using smartphones, tablets and other handheld devices, and advances in telecom technologies such as 3G accelerate the migration towards smartphones. However, battery power and frequent changes of handset remain constraints. They burden users, who have to manually synchronize their contacts and the applications they use to the new phone. Users also lose whatever they are doing when the mobile device powers down. In this paper, we propose a solution to these problems with a new fault tolerant platform which can provide application mobility across devices.

T. N. Anitha; Jayanth. A

2012-01-01

277

Fault Tolerant Air Bubble Sensor using Triple Modular Redundancy Method  

Directory of Open Access Journals (Sweden)

Detection of air bubbles in the blood is important for various medical treatments that use Extracorporeal Blood Circuits (ECBC), such as hemodialysis, hemofiltration and cardio-pulmonary bypass. Therefore a reliable air bubble detector is needed. This study designed a fault tolerant air bubble detector. The Triple Modular Redundancy (TMR) method is used in the sensor section. A voter circuit in the TMR arrangement chooses one of the three sensor outputs to be processed further. Applying TMR prevents errors in the detection of air bubbles, especially if a sensor fails.

Noor Cholis Basjaruddin; Yoga Priyana; Kuspriyanto Kuspriyanto

2013-01-01
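
The voter in the triple modular redundancy entry above can be illustrated in software. This is a minimal sketch of majority voting over three redundant readings, not the circuit from the paper; the function name `tmr_vote` and the boolean "bubble detected" readings are assumptions made for illustration.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the majority value of three redundant sensor readings.

    Raises ValueError when all three readings disagree, because no
    majority exists in that case.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority among redundant readings")
    return value


# Hypothetical readings: True means "bubble detected".
# The single disagreeing sensor is outvoted by the other two.
decision = tmr_vote(True, True, False)
```

With three modules, any single faulty reading is masked; two simultaneous faults that agree with each other would still win the vote, which is the known limit of plain TMR.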

278

Fault Tolerant Electrical Machines. State of the Art and Future Directions  

Directory of Open Access Journals (Sweden)

In recent years electrical engineering has expanded successfully into the area of fault tolerant electrical machines. To achieve fault tolerance, researchers have designed various geometries and different electrical drives. When new designs are to be developed, knowledge of the current state of the work is urgently needed. The paper summarizes the most important information on these topics. Both fault tolerant machine and drive structures are taken into account. The paper also presents a new idea for a fault tolerant switched reluctance machine with a special winding. Future tasks to be performed are also mentioned.

Mircea RUBA; Loránd SZABÓ

2008-01-01

279

2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.  

Energy Technology Data Exchange (ETDEWEB)

This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that causes an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.

Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

2009-02-01
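
The checkpoint-and-restart approach that the workshop report above calls the most common tool can be sketched in a few lines. This is an illustrative toy, assuming a single process that periodically pickles its state to a local file; the function `run_with_checkpoints`, the `fail_at` knob and the file name are hypothetical and do not come from the report.

```python
import os
import pickle
import tempfile


def run_with_checkpoints(n_steps, path, fail_at=None, every=10):
    """Sum the integers 0..n_steps-1, checkpointing every `every` steps.

    `fail_at` raises an error at one step to imitate a crash. A later
    call with the same `path` resumes from the most recent checkpoint
    instead of starting again from step 0.
    """
    state = {"step": 0, "total": 0}
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file first so that a crash during
            # the write cannot corrupt the previous checkpoint.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, path)
    return state["total"]


# Usage: crash at step 57, then restart from the checkpoint taken at step 50.
path = os.path.join(tempfile.mkdtemp(), "state.pkl")
try:
    run_with_checkpoints(100, path, fail_at=57)
except RuntimeError:
    pass
total = run_with_checkpoints(100, path)
```

The work done between the last checkpoint and the fault (steps 50 to 56 here) is repeated after restart, which is the time redundancy the report refers to; the checkpoint interval trades that lost work against I/O cost.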

280

A FAULT TOLERANT TOKEN BASED ATOMIC BROADCAST ALGORITHM RELYING ON RESPONSIVE PROPERTY  

Directory of Open Access Journals (Sweden)

In a distributed environment where shared resources are involved, there are basically two types of mechanism for allocating the shared resources: passing tokens, or exchanging Request and Reply messages. In the shared environment, a processor might fail (i.e., it may crash, which may lead to failure). This paper proposes a fault tolerant token based atomic broadcast algorithm which relies on unreliable failure detectors. It combines the failure detector with a token based mechanism, satisfying the responsiveness property. The mechanism can tolerate processor level faults, in contrast to the existing system level failure handling, because the proposed system relies on both the unreliable failure detector and the responsive property.

NEELAMANI SAMAL; DEBASIS GOUNTIA; MADHUSMITA SAHU

2013-01-01

281

Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework  

Directory of Open Access Journals (Sweden)

MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data, which has been widely applied in data-intensive systems. In a cloud environment, node and task failure is no longer accidental but a common feature of large-scale systems. In the MapReduce framework, although the rescheduling-based fault-tolerant method is simple to implement, it fails to fully consider the location of distributed data and the computation and storage overhead. Thus, a single node failure will increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT) is proposed, which takes both task and node failure into consideration. Preliminary experiments show that, with less storage and network overhead, CROFT significantly reduces the completion time when failures occur, and the overall performance of MapReduce can be improved by at least 30% over the original mechanism in Hadoop.

Yang Liu; Wei Wei; Yuhong Zhang

2013-01-01

282

Fault Tolerance and Scalability in DSM Coherence Protocols - A Simulation Approach  

UK PubMed Central (United Kingdom)

With the advent of large networks and the demand to have uninterrupted service, there is a pressing need for computer systems to be more robust and fault tolerant. There are numerous ways to implement fault tolerance and recovery [5, 50]. Yet, a central concept in all these methods is the requirement for replicated data leading to high data availability. We believe that a protocol must not only provide data replication, but also that it should do so at low operational overhead. Further, the protocol must provide mechanisms for varying the level of replication (so that the system may be operated at a desired overhead cost), and must scale well. At the University of California, Riverside, we have developed a program-driven ...

Kirit Shah

283

Fault-tolerant sub-lithographic design with rollback recovery  

Energy Technology Data Exchange (ETDEWEB)

Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation ($P_f = 10^{-7}$) in systems with $10^{12}$ susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme.

Naeimi, Helia [Department of Computer Science, California Institute of Technology, Pasadena, CA 91125 (United States); DeHon, Andre [Department of Electrical and System Engineering, University of Pennsylvania, 200 S. 33rd Street, Philadelphia, PA 19104 (United States)], E-mail: helia@caltech.edu

2008-03-19

284

Fault-tolerant sub-lithographic design with rollback recovery.  

Science.gov (United States)

Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation ($P_f = 10^{-7}$) in systems with $10^{12}$ susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate level feed-forward redundancy scheme. PMID:21730568

Naeimi, Helia; Dehon, André

2008-02-19

285

Fault-tolerant self-routing computer network topology  

Energy Technology Data Exchange (ETDEWEB)

This document reports on the development and analysis of a new, easily expandable, highly fault tolerant self-routing computer network topology. The topology applies equally to any general-purpose computer-networking environment, whether local, metropolitan, or wide area. This new connectivity scheme is named the spiral topology because the architecture is built around modules of four computer nodes each, connected by top and bottom spirals. The spiral topology features a simple internal self-routing algorithm that adapts quickly, and automatically, to failed nodes and links. The six most important direct consequences of the spiral computer-network architecture are the topology's (1) ease of expansion; (2) fast, on-the-fly self-routing; (3) extremely high tolerance to network faults; (4) increased network security; (5) potential for the total elimination of store and forward transmissions due to routing decision delays; and (6) rendering the maximum path length issue moot. The fast on-the-fly routing capability of the spiral topology makes it highly amenable to fiber optic communications in any networking environment.

Mitchell, T.L.

1987-01-01

286

New Results on the Fault-Tolerant Facility Placement Problem  

CERN Multimedia

We studied the Fault-Tolerant Facility Placement problem (FTFP) which generalizes the uncapacitated facility location problem (UFL). In FTFP, we are given a set $F$ of sites at which facilities can be built, and a set $C$ of clients with some demands that need to be satisfied by different facilities. A client $j$ has demand $r_j$. Building one facility at a site $i$ incurs a cost $f_i$, and connecting one unit of demand from client $j$ to a facility at site $i \in F$ costs $d_{ij}$. The $d_{ij}$'s are assumed to form a metric. A feasible solution specifies the number of facilities to be built at each site and the way to connect demands from clients to facilities, with the restriction that demands from the same client must go to different facilities. Facilities at the same site are considered different. The goal is to find a solution with minimum total cost. We gave a 1.7245-approximation algorithm to the FTFP problem. Our technique is via a reduction to the Fault-Tolerant Facility Location problem, in which each cli...

Yan, Li

2011-01-01
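
The FTFP problem statement in the entry above can be written as an integer program. The block below is a sketch derived only from the abstract's description; the decision variables $y_i$ (number of facilities built at site $i$) and $x_{ij}$ (units of client $j$'s demand connected to site $i$) are assumed notation, not symbols quoted from the paper.

```latex
\begin{aligned}
\min\quad & \sum_{i \in F} f_i \, y_i \;+\; \sum_{i \in F} \sum_{j \in C} d_{ij} \, x_{ij} \\
\text{subject to}\quad & \sum_{i \in F} x_{ij} = r_j && \forall j \in C, \\
& x_{ij} \le y_i && \forall i \in F,\; j \in C, \\
& x_{ij},\, y_i \in \mathbb{Z}_{\ge 0} && \forall i \in F,\; j \in C .
\end{aligned}
```

The constraint $x_{ij} \le y_i$ encodes the requirement that demands from the same client go to different facilities: a client cannot use more connections at a site than there are facilities built there.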

287

A fault tolerant VLSI implementation of a nuclear control rod controller  

International Nuclear Information System (INIS)

This paper presents a VLSI implementation of a control system used for automatic control of control rods in a typical nuclear power station. Fast, efficient, and reliable control over the control rods is achieved. The design is divided into two VLSI chips that form the heart of a hybrid redundant scheme for fault tolerance. The layout was generated using the MOSIS CMOS 1-2 micron process design rules

1989-01-01

288

Robust and Fault-Tolerant Linear Parameter-Varying Control of Wind Turbines  

DEFF Research Database (Denmark)

High performance and reliability are required for wind turbines to be competitive within the energy market. To capture their nonlinear behavior, wind turbines are often modeled using parameter-varying models. In this paper we design and compare multiple linear parameter-varying (LPV) controllers, designed using a proposed method that allows the inclusion of both faults and uncertainties in the LPV controller design. We specifically consider a 4.8 MW, variable-speed, variable-pitch wind turbine model with a fault in the pitch system. We propose the design of a nominal controller (NC), handling the parameter variations along the nominal operating trajectory caused by nonlinear aerodynamics. To accommodate the fault in the pitch system, an active fault-tolerant controller (AFTC) and a passive fault-tolerant controller (PFTC) are designed. In addition to the nominal LPV controller, we also propose a robust controller (RC). This controller is able to take into account model uncertainties in the aerodynamic model. The controllers are based on output feedback and are scheduled on an estimated wind speed to manage the parameter-varying nature of the model. Furthermore, the AFTC relies on information from a fault diagnosis system. The optimization problems involved in designing the PFTC and RC are based on solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs) due to unmeasured parameter variations. Consequently, they are more difficult to solve. The paper presents a procedure, where the BMIs are rewritten into two necessary LMI conditions, which are solved using a two-step procedure. Simulation results show the performance of the LPV controllers to be superior to that of a reference controller designed based on classical principles.

Sloth, Christoffer; Esbensen, Thomas

2011-01-01

289

Residue Arithmetic for Fault-Tolerant Multiplier: The Residue Generator.  

Science.gov (United States)

Detection and correction of errors due to hardware faults in a VLSI multiplier are treated, with regard to silicon area and time performances. Due to its regularity, modularity, and carry-free properties, Residue Number System (RNS) is considered to desig...

V. Piuri; A. Fabi

1986-01-01
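
The idea of checking a multiplier with residues, as in the entry above, can be shown with a small example. The abstract is truncated, so this is only a sketch of a generic residue check under an assumed single check modulus of 7; the function name and the injected bit flip are hypothetical and do not describe the paper's design.

```python
def residue_check_multiply(a, b, product, modulus=7):
    """Return True when `product` is consistent with a * b modulo `modulus`.

    The check relies on (a * b) mod m == ((a mod m) * (b mod m)) mod m,
    so only small residues have to be multiplied in the checker.
    """
    expected = ((a % modulus) * (b % modulus)) % modulus
    return product % modulus == expected


a, b = 1234, 5678
good = a * b
# Flip one bit of the result to imitate a hardware fault in the multiplier.
bad = good ^ (1 << 5)

ok_for_good = residue_check_multiply(a, b, good)
ok_for_bad = residue_check_multiply(a, b, bad)
```

A single flipped bit changes the product by a power of two, and no power of two is divisible by 7, so this modulus catches every single-bit error in the result; errors that are multiples of the modulus would go undetected.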

290

Design and Bandwidth Analysis of Fault-Tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

The design of a suitable interconnection network for inter-processor communication is one of the key issues for system performance. In this study a new irregular interconnection network, IABN (Irregular Augmented Baseline Network), is proposed. IABN is designed by modifying the existing ABN (Augmented Baseline Network). ABN is a regular multi-path network with limited fault tolerance. IABN provides three times more paths between any source-destination pair than ABN. The ABN and IABN MINs are analyzed and compared in terms of the performance parameters Bandwidth, Cost and Bandwidth per unit Cost. The proposed network IABN provides much better fault tolerance and almost double the bandwidth at the expense of slightly higher cost than ABN.

R. Aggarwal; L. Kaur

2008-01-01

291

Actuator usage and fault tolerance of the James Webb Space Telescope optical element mirror actuators  

Science.gov (United States)

The James Webb Space Telescope (JWST) telescope's secondary mirror and eighteen primary mirror segments are each actively controlled in rigid body position via six hexapod actuators. The mirrors are stowed to the mirror support structure to survive the launch environment and then must be deployed 12.5 mm to reach the nominally deployed position before the Wavefront Sensing & Control (WFS&C) alignment and phasing process begins. The actuation system is electrically, but not mechanically redundant. Therefore, with the large number of hexapod actuators, the fault tolerance of the OTE architecture and WFS&C alignment process has been carefully considered. The details of the fault tolerance will be discussed, including motor life budgeting, failure signatures, and motor life.

Barto, A.; Acton, D. S.; Finley, P.; Gallagher, B.; Hardy, B.; Knight, J. S.; Lightsey, P.

2012-09-01

292

Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing  

Directory of Open Access Journals (Sweden)

The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on its data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-Complete problem, and it becomes more challenging when failures of tasks are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In this paper, the proposed Fault Tolerant Workflow Scheduling algorithm (FTWS) provides fault tolerance by using replication and resubmission of tasks based on the priority of the tasks. The replication of tasks depends on a heuristic metric which is calculated by finding the tradeoff between the replication factor and the resubmission factor. The heuristic metric is considered because replication alone may lead to resource wastage and resubmission alone may increase makespan. Tasks are prioritized based on the criticality of the task, which is calculated using parameters such as out degree, earliest deadline and high resubmission impact. Priority helps in meeting the deadline of a task and thereby reducing wastage of resources. FTWS schedules workflows within a deadline even in the presence of failures without using any history of information. The experiments were conducted in a simulated cloud environment by scheduling workflows in the presence of randomly generated failures. The experimental results of the proposed work demonstrate an effective success rate in spite of various failures.

Jayadivya S K; Jaya Nirmala S; Mary Saira Bhanu S

2012-01-01
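
The two mechanisms named in the FTWS abstract above, resubmission and replication, can be illustrated with a minimal sketch. This is not the FTWS algorithm: the abstract does not give the heuristic metric or the priority formula, so none is reproduced here. The names `flaky`, `run_with_resubmission` and `run_with_replication` are hypothetical, and replicas are run one after another purely to keep the example deterministic.

```python
from typing import Callable, List, Optional

Task = Callable[[int], Optional[str]]  # takes an attempt number, returns a result or None on failure


def flaky(fail_until: int) -> Task:
    """Hypothetical task that fails on every attempt numbered below fail_until."""
    return lambda attempt: "done" if attempt >= fail_until else None


def run_with_resubmission(task: Task, max_attempts: int) -> Optional[str]:
    """Resubmission: rerun the same task after each failure (costs time, i.e. makespan)."""
    for attempt in range(max_attempts):
        result = task(attempt)
        if result is not None:
            return result
    return None


def run_with_replication(replicas: List[Task]) -> Optional[str]:
    """Replication: launch several copies and accept the first success (costs resources)."""
    for replica in replicas:
        result = replica(0)  # each replica gets a single attempt
        if result is not None:
            return result
    return None


resubmitted = run_with_resubmission(flaky(2), max_attempts=3)   # succeeds on the third attempt
gave_up = run_with_resubmission(flaky(2), max_attempts=2)       # attempt budget too small
replicated = run_with_replication([flaky(1), flaky(0)])         # second replica succeeds
```

The sketch only shows why the abstract treats the two as a tradeoff: resubmission spends extra attempts in sequence, while replication spends extra resources up front.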

293

Robust and Fault Tolerant Control of CD-players  

DEFF Research Database (Denmark)

Several new standards have emerged recently in the area of portable optical data storage media and more are on their way. In addition to the well known Compact Disc (CD), portable optical media now also feature media for video storage (DVDs) and general data storage media for computer purposes (CD-ROMs). DVDs can be two-sided with multiple layers, allowing read, write and rewrite operations. Most significantly in this context, the new media typically have much higher physical data densities. This constitutes a significant challenge in terms of playability (the ability to reproduce the information from non-ideal discs in non-ideal circumstances), which is the main topic of this Ph.D. thesis. There are three important contributions to the technical field of study treated in the thesis. It is known that the specific characteristics of CD drives vary from unit to unit. Traditionally the parameter estimation is performed in closed loop, probably because open loop estimation has been stated to be very difficult or even impossible. A novel method, which requires an additional current measurement, is presented in this work, where parameter estimation is accomplished in open loop in a simple and reliable way. The second main contribution is related to robust control. Usually, the nominal and uncertainty models are assumed to be known and the designer is limited to specifying the performance requirements. In a more realistic situation, the designer may only have a set of complex points in the Nyquist plane from several worst-case plants as a result of measurement experiments. In the thesis a deterministic method is proposed, which generates a nominal and uncertainty model based on the set of complex points in a less conservative way than conventional methods. Finally, the third main contribution is to be found in the fault-diagnosis and fault-tolerant control fields.
One of the main challenges in the positioning control of the focus point in CD-players is to handle two types of disturbances with conflicting requirements in an effective way. While a high bandwidth is desired to better suppress shocks, a low bandwidth is preferred in the presence of surface defects. Traditionally, a simple defect detector is employed to deal with this trade-off. In this work, two fault diagnosis schemes are suggested which are able not only to detect but also to separate, to a certain extent, the characteristics of the signals originated by the surface defects. Furthermore, two fault-tolerant control schemes are proposed such that the mentioned trade-off is handled in a more efficient way.

Vidal, Enrique Sanchez

2003-01-01

294

A Survey of NASA and Military Standards on Fault Tolerance and Reliability Applied to Robotics  

UK PubMed Central (United Kingdom)

There is currently increasing interest and activity in the area of reliability and fault tolerance for robotics. This paper discusses the application of Standards in robot reliability, and surveys the literature of relevant existing standards. A bibliography of relevant Military and NASA standards for reliability and fault tolerance is included.

Joseph R. Cavallaro; Ian D. Walker

295

Second-order sliding mode fault-tolerant control of heat recovery steam generator boiler in combined cycle power plants  

International Nuclear Information System (INIS)

Power generation plants are intrinsically complex systems due to their numerous internal components. Higher energy efficiency in power plants is now achieved through employing combined cycles. In this article, an adaptive robust Sliding Mode Controller (SMC) is designed to overcome the faults in Heat Recovery Steam Generator boilers (HRSG boilers) as one of the main parts of a combined cycle plant. On condition that a fault occurs in the HRSG boiler, the control system must be able to reconfigure its parameters to maintain the admissible thresholds in dynamic variables such as drum pressure, steam temperature, and drum water level. To achieve good performance for the boiler, the proposed adaptive robust SMC shall conquer the effects of faults and uncertainties by estimating their upper bounds adaptively, and force the outputs of the multivariable boiler to track the outputs of a desired multivariable reference model. Manipulating a suitable control input and using second-order sliding mode control strategy, the output tracking error slides to zero on a PID sliding surface. Besides tracking, the controlled boiler tolerates faults in system matrix, faults in input matrix, and external disturbance signal. Numerical simulations confirm the effectiveness of the proposed FTC (Fault-Tolerant Control) system for an uncertain non-minimum phase HRSG boiler. Highlights: • This paper proposes a PID-based adaptive second-order sliding mode controller (SMC). • SMC is robust to actuator and sensor faults and tracks outputs of a reference system. • SMC is used in fault tolerant control of heat recovery steam generator boilers. • Boiler and reference system have different numbers of states and inputs. • Performance of SMC is investigated with different fault scenarios in simulations.

2013-01-10

296

Design Approach for Fault Tolerance Algorithm in FPGA Architecture with BIST in Hardware Controller  

Directory of Open Access Journals (Sweden)

Full Text Available Redundancy-based hardening techniques are applied at the pre-synthesis or synthesis level. To increase fault-tolerance capabilities with algorithms able to reduce the sensitive configuration memory bits of FPGAs, we use the BIST method. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, there is a great need to recover faulty hardware and return it to full functionality quickly and efficiently. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing, which leads to higher defect densities, and more sophisticated techniques at increased costs are required to avoid defects. In this work we present a solution in which the configuration bit-stream of the FPGA is modified by a hardware controller that is present on the chip itself. The technique uses a redundant device to replace a faulty device and increases the yield.

Shweta S. Meshram; Sanjay O. Dahad; Ujwala A. Belorkar

2011-01-01

297

Cluster-based architecture for fault-tolerant quantum computation  

CERN Multimedia

We present a detailed description of an architecture for fault-tolerant quantum computation, which is based on the cluster model of encoded qubits. In this cluster-based architecture, concatenated computation is implemented in a quite different way from the usual circuit-based architecture where physical gates are recursively replaced by logical gates with error-correction gadgets. Instead, some relevant cluster states, say fundamental clusters, are recursively constructed through verification and postselection in advance for the higher-level one-way computation, which namely provides error-precorrection of gate operations. A suitable code such as the Steane seven-qubit code is adopted for transversal operations. This concatenated construction of verified fundamental clusters has a simple transversal structure of logical errors, and achieves a high noise threshold ~ 3 % for computation by using appropriate verification procedures. Since the postselection is localized within each fundamental cluster with the h...

Fujii, Keisuke

2009-01-01

298

Fault-Tolerant Energy-Efficient Tree in Dynamic WSNs  

Directory of Open Access Journals (Sweden)

Full Text Available Broadcasting is of major importance in Wireless Sensor Networks (WSNs). In effect, the sink node has to periodically collect data from the environment supervised by sensors. To perform this operation, it sends requests to all nodes. Furthermore, WSNs have a dynamic behaviour due to their evolution. At any time, a node can be withdrawn from the network due to exhausted energy or a node problem. In fact, WSNs are prone to failures such as software or hardware malfunctioning, exhaustion of energy, wireless interference and environmental hazards. Thus, an appropriate broadcasting method should take this aspect into consideration and use the least possible amount of energy to accomplish the task. In this paper, a robust tree-based scheme is proposed which is called Robust Tree Broadcasting (RTB). The new scheme has a load-balanced behaviour which induces an efficient use of energy. In addition, RTB has a high-quality fault tolerant performance.

Tarek Moulahi; Ahmed Almuhirat; Lamri Laouamer

2013-01-01

299

Adaptive Fault Tolerant Routing Algorithm for Tree-Hypercube Multicomputer  

Directory of Open Access Journals (Sweden)

Full Text Available A connected tree-hypercube with faulty links and/or nodes is called an injured tree-hypercube. To enable any non-faulty node to communicate with any other non-faulty node in an injured tree-hypercube, the information on component failures has to be made available to non-faulty nodes so that they can route messages around the faulty components. We propose an adaptive fault tolerant routing algorithm for an injured tree-hypercube which requires each node to know only the condition of its own links. This routing algorithm is shown to be capable of routing messages successfully in an injured tree-hypercube as long as the number of faulty components (links and/or nodes) is equal to d (depth).

Qatawneh Mohammad

2006-01-01

300

A Fault Tolerant, Area Efficient Architecture for Shor's Factoring Algorithm  

CERN Document Server

We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom CAD flow produces detailed layouts of the physical components and utilizes simulation to analyze circuits in terms of area, latency, and success probability. We introduce a metric, called ADCR, which is the probabilistic equivalent of the classic Area-Delay product. Our error correction optimization can reduce ADCR by an order of magnitude or more. Contrary to conventional wisdom, we show that the area of an optimized quantum circuit is not dominated exclusively by error correction. Further, our adder evaluation shows that quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex. We conclude with what we believe is one of the most accurate estimates of the area and latency required for 1024-bit Shor's factorizat...

Whitney, Mark G; Patel, Yatish; Kubiatowicz, John

2009-01-01

 
 
 
 
301

Fault Tolerant Distributed and Fixed Hierarchical Mobile IP  

Directory of Open Access Journals (Sweden)

Full Text Available For several mobility management protocols proposed for IP-based mobile networks, the fault tolerance of mobility agents is a primary requirement to sustain continuous service availability to the mobile hosts. For a localized or micro-mobility management solution, the local mobility agent, i.e. the gateway, is a single point of failure because it is responsible for enforcing the signaling and data packets in its domain. Such failures may severely disrupt the communications among the failure-affected users. The problem becomes even more severe for mobility agents in a distributed mobility management scheme with overlapping registration areas. This paper proposes a fault tolerance scheme for Distributed and Fixed Hierarchical Mobile IP (DFHMIP) and evaluates its performance in terms of data transmission cost and blocking probability.

Paramesh C. Upadhyay; Sudarshan Tiwari

2010-01-01

302

Experimental magic state distillation for fault-tolerant quantum computing.  

UK PubMed Central (United Kingdom)

Any physical quantum device for quantum information processing (QIP) is subject to errors in implementation. In order to be reliable and efficient, quantum computers will need error-correcting or error-avoiding methods. Fault-tolerance achieved through quantum error correction will be an integral part of quantum computers. Of the many methods that have been discovered to implement it, a highly successful approach has been to use transversal gates and specific initial states. A critical element for its implementation is the availability of high-fidelity initial states, such as |0⟩ and the 'magic state'. Here, we report an experiment, performed in a nuclear magnetic resonance (NMR) quantum processor, showing sufficient quantum control to improve the fidelity of imperfect initial magic states by distilling five of them into one with higher fidelity.

Souza AM; Zhang J; Ryan CA; Laflamme R

2011-01-01

303

Fault-Tolerant Target Localization in Sensor Networks  

Directory of Open Access Journals (Sweden)

Full Text Available Fault-tolerant target detection and localization is a challenging task in collaborative sensor networks. This paper introduces our exploratory work toward identifying the targets in sensor networks with faulty sensors. We explore both spatial and temporal dimensions for data aggregation to decrease the false alarm rate and improve the target position accuracy. To filter out extreme measurements, the median of all readings in a close neighborhood of a sensor is used to approximate its local observation of the targets. A sensor whose observation is a local maximum computes a position estimate at each epoch. Results from multiple epochs are combined to further decrease the false alarm rate and improve the target localization accuracy. Our algorithms have low computation and communication overheads. A simulation study demonstrates the validity and efficiency of our design.

Min Ding; Fang Liu; Andrew Thaeler; Dechang Chen; Xiuzhen Cheng

2007-01-01
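
The spatial filtering step described in the abstract above (median over a close neighborhood, then selection of local maxima) can be sketched as follows. This is only an illustration under stated assumptions: sensors are placed on a line and indexed 0..n-1, a neighborhood is every sensor within `radius` positions, and a detection `threshold` is added to suppress background readings. None of these choices, nor the function names, come from the paper.

```python
from statistics import median


def neighborhood(i, n, radius=1):
    """Indices of the closed neighborhood of sensor i on a line of n sensors (assumed layout)."""
    return range(max(0, i - radius), min(n, i + radius + 1))


def median_filter(readings, radius=1):
    """Replace each reading by the median over its neighborhood to suppress extreme values."""
    n = len(readings)
    return [median([readings[j] for j in neighborhood(i, n, radius)]) for i in range(n)]


def local_maxima(values, threshold, radius=1):
    """Sensors above the (assumed) threshold whose value is not exceeded by any neighbor."""
    n = len(values)
    return [
        i for i, v in enumerate(values)
        if v > threshold and all(v >= values[j] for j in neighborhood(i, n, radius))
    ]


# Sensor 2 is faulty and reports an extreme value; the real target is near sensor 6.
readings = [0, 0, 50, 0, 0, 5, 6, 5, 0]
filtered = median_filter(readings)

raw_candidates = local_maxima(readings, threshold=1)       # includes the faulty sensor
filtered_candidates = local_maxima(filtered, threshold=1)  # clustered around the target
```

On the raw readings the faulty sensor is reported as a candidate; after median filtering the isolated spike disappears and only sensors near the target remain, which is the false-alarm reduction the abstract refers to.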

304

Performance and fault tolerance improvements in the inverse augmented data manipulator network  

Energy Technology Data Exchange (ETDEWEB)

The inverse augmented data manipulator (IADM) is a multistage interconnection network based on the augmented data manipulator (ADM) and Feng's data manipulator. It is designed to be used in large-scale parallel/distributed processing systems for communication among processors, memories, and other system devices. Two aspects of IADM network design are discussed: performance and fault tolerance. A single stage look-ahead scheme for predicting blockage is presented to enhance performance. One method of adding some links to the network to enable it to tolerate one link failure is described. A different method of adding links is shown which both improves performance and allows the network to tolerate two switching element or two link failures. Included is a new routing tag scheme which accommodates the new links. 27 references.

McMillen, R.J.; Siegel, H.J.

1982-01-01

305

Novel approach to fault-tolerant logic and yield enhancement  

Energy Technology Data Exchange (ETDEWEB)

A design technique for improving the functional reliability of a gate is proposed, in which a plurality of conventional logic circuits (gates) are used so as to give redundancy to the logic circuit itself. The gate with redundancy designed on the basis of the proposed technique is called the fault-tolerant gate (FTG) in this paper. The FTG has a recovery function with respect to a wider variety of faults. It is much more powerful than that offered by TMR (triple modular redundancy) circuits. Therefore, highly reliable logic circuits can be realized, and when the concept of FTGs is applied to VLSI chips the production yield must be enhanced. This paper is divided into three parts. In the first part, concrete methods to realize FTGs are described. The second part proves that the reliability of the gates can be improved by employing the concept of FTGs. In the last part, it is shown that the FTG contributes to the yield enhancement of VLSI chips. 13 references.

Takefuji, Y.; Adachi, Y.; Aiso, H.

1982-01-01
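
For readers unfamiliar with the TMR baseline that the abstract above compares against, a minimal sketch of a bitwise majority voter is given here. It illustrates conventional triple modular redundancy only, not the proposed FTG, whose construction the abstract does not describe; the function name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three module outputs: each result bit is set if at least two inputs agree on 1."""
    return (a & b) | (a & c) | (b & c)


correct = 0b1010

# One faulty module: the two healthy modules outvote it.
single_fault = tmr_vote(correct, correct, 0b0110)

# Two modules with the same wrong bit: the voter is outvoted and the error passes through.
double_fault = tmr_vote(correct, 0b1011, 0b1011)
```

The second case shows the limit of plain TMR: it masks a fault in one module but not coincident faults in two.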

306

Fly-By-Light/Power-By-Wire Fault-Tolerant  

UK PubMed Central (United Kingdom)

The design and development of a fault-tolerant fiber-optic backplane to demonstrate feasibility of such architecture is presented. The simulation results of test cases on the backplane in the advent of induced faults are presented, and the fault recovery capability of the architecture is demonstrated. The architecture was designed, developed, and implemented using the Very High Speed Integrated Circuits (VHSIC) Hardware Description Language (VHDL). The architecture was synthesized and implemented in hardware using Field Programmable Gate Arrays (FPGA) on multiple prototype boards. Acknowledgments: I would like to acknowledge my gratitude to Dr. Jerry H. Tucker of NASA Langley Research Center for his guidance during the development process. I would like to acknowledge my appreciation to Dr. Celeste M. Belcastro of NASA Langley Research Center for her recommendations. I would also like to acknowledge my appreciation to Dr. Paul S. Miner of NASA Langley Research Center for his helpful comments in earlier version of this report. Lastly, I would like to acknowledge my appreciation to Wilfredo Torres-Pomales of NASA Langley Research Center for his review and helpful comments of the final version of this report.

307

Policy Specification for Non-Local Fault Tolerance  

UK PubMed Central (United Kingdom)

The services provided by critical infrastructure systems are essential to the operation of modern society. These systems include the financial payments system, transportation systems, military command and control systems, the electric power grid, and telecommunications systems including the Internet. Widespread failure of any of these systems might result in severe financial loss or perhaps human injury. Critical infrastructure systems rely heavily on distributed information systems for operation. These information systems must therefore be dependable; that is, they must "deliver service that can justifiably be trusted." Traditional dependability alone does not provide a rich enough model to deal with the faults in large, critical distributed systems operating in hostile environments. These systems require not simply dependability but instead require survivability. Informally, survivability is when a system has "the ability to continue to provide service (possibly degraded or different) in a given environment when various events cause major damage to the system or its operating environment."

E. Varner; Anita K. Jones; David Evans (committee Chair; Richard W. Miksad (dean

308

ROBUST FAULT TOLERANT CONTROL WITH SENSOR FAULTS FOR A FOUR-ROTOR HELICOPTER  

Directory of Open Access Journals (Sweden)

Full Text Available This paper considers the control problem for an underactuated quadrotor UAV system in the presence of sensor faults. Dynamic modelling of the quadrotor, taking into account various physical phenomena which can influence the dynamics of a flying structure, is presented. Subsequently, a new control strategy based on a robust integral backstepping approach using sliding mode and taking into account the sensor faults is developed. Lyapunov-based stability analysis shows that the proposed control strategy keeps the closed-loop dynamics of the quadrotor UAV stable even after the occurrence of sensor failures. Numerical simulation results are provided to show the good tracking performance of the proposed control laws.

Hicham Khebbache; Belkacem Sait; Fouad Yacef

2012-01-01

309

Data Structures: Sequence Problems, Range Queries, and Fault Tolerance  

DEFF Research Database (Denmark)

The focus of this dissertation is on algorithms, in particular data structures that give provably efficient solutions for sequence analysis problems, range queries, and fault tolerant computing. The work presented in this dissertation is divided into three parts. In Part I we consider algorithms for a range of sequence analysis problems that have arisen from applications in pattern matching, bioinformatics, and data mining. On a high level, each problem is defined by a function and some constraints, and the job at hand is to locate subsequences that score high with this function and are not invalidated by the constraints. Many variants and similar problems have been proposed, leading to several different approaches and algorithms. We consider problems where the function is the sum of the elements in the sequence and the constraints only bound the length of the subsequences considered. We give optimal algorithms for several variants of the problem based on a simple idea and classic algorithms and data structures. In Part II we consider range query data structures. This is a category of problems where the task is to preprocess an input sequence using as little time and space as possible such that one can efficiently compute a certain function on the elements in a given query subsequence. There are many types of functions that have been considered in connection with input from different sources. The input could be IP data sorted by IP address, real estate prices sorted by zip code, advertising cost sorted by time, etc. We consider data structures for two classic statistics functions, namely median and mode. Finally, Part III investigates fault tolerant algorithms and data structures. This deals with the trend of avoiding elaborate error checking and correction circuitry that would impose non-negligible costs in terms of hardware performance and money in the design of today's high speed memory technologies.
Hardware, power failures, and environmental conditions such as cosmic rays and alpha particles can all alter the memory in unpredictable ways. In applications where large memory capacities are needed at low cost, it makes sense to assume that the algorithms themselves are in charge of dealing with memory faults. We investigate searching, sorting and counting algorithms and data structures that provably return sensible information in spite of memory corruptions.

Jørgensen, Allan Grønlund

2010-01-01
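
One common building block for algorithms that must cope with memory corruption, as discussed in Part III of the abstract above, is to store a critical value in several copies and read it back by majority. The sketch below assumes an upper bound `delta` on the number of corrupted cells, so that 2·delta + 1 copies always leave a correct majority. It is a generic illustration with a hypothetical class name, not code or a specific construction taken from the dissertation.

```python
from collections import Counter


class ResilientCell:
    """Stores one value in 2*delta + 1 copies so it survives up to delta corrupted copies."""

    def __init__(self, value, delta):
        self.delta = delta
        self.copies = [value] * (2 * delta + 1)

    def write(self, value):
        # Overwrite every copy; this also clears earlier corruptions.
        self.copies = [value] * (2 * self.delta + 1)

    def read(self):
        # With at most delta corrupted copies, the original value still holds a strict majority.
        return Counter(self.copies).most_common(1)[0][0]


cell = ResilientCell(42, delta=2)   # five copies
cell.copies[0] = 7                  # simulated memory corruption
cell.copies[3] = -1                 # second corruption, still within the bound
recovered = cell.read()
```

The cost is a factor of 2·delta + 1 in space and read time per protected value, which is why such replication is typically reserved for a small number of critical variables rather than whole data sets.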

310

Fault Diagnosis for Electrical Distribution Systems using Structural Analysis  

DEFF Research Database (Denmark)

Fault-tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the design of efficient diagnostic algorithms, which is a prerequisite for fault-tolerant control of power distribution. Diagnosis in a grid depends on available analytic redundancies, and hence on network topology. When topology changes, due to earlier fault(s) or caused by maintenance, analytic redundancy relations (ARR) are likely to change. The algorithms used for diagnosis may need to change accordingly, and finding efficient methods for ARR generation is essential to employ fault-tolerant methods in the grid. Structural analysis (SA) is based on graph-theoretical results that make it possible to find analytic redundancies in large sets of equations from the structure (topology) of the equations alone. A salient feature is automated generation of redundancy relations. The method is indeed feasible in electrical networks, where circuit theory and network topology together formulate the constraints that define a structure graph. This paper shows how three-phase networks are modelled and analysed using structural methods, and it extends earlier results by showing how physical faults can be identified such that adequate remedial actions can be taken. The paper illustrates a feasible modelling technique for structural analysis of power systems, it demonstrates detection and isolation of failures in a network, and shows how typical faults are diagnosed. Nonlinear fault simulations illustrate the results.

Knüppel, Thyge; Blanke, Mogens

2013-01-01

311

Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging  

Science.gov (United States)

By developing Radiation Hardening by Software (RHBSW) techniques leveraged from the High Performance Computing community, our work seeks to deliver radiation tolerant, high performance System on a Chip (SoC) processors to the remote sensing community. This SoC architecture is uniquely suited to handle both high performance signal processing tasks and autonomous agent processing. This allows situational awareness to be developed in-situ, resulting in a 10-100x decrease in processing latency, which directly translates into more science experiments conducted per day and a more thorough, timely analysis of captured data. With the increase in the amount of computational throughput made possible by commodity high performance processors and low overhead fault tolerance, new applications can be considered for on-board processing. A high performance and low overhead fault tolerance strategy targeting scientific applications on the SpaceCube 1.0 platform has been enhanced with initial results showing an order of magnitude increase in Mean Time Between Data Error and a complete elimination of processor hangs. Initial study of representative Hyperspectral applications also proves promising due to high levels of data parallelism and fine grained parallelism achievable within FPGA System on a Chip architectures enabled by our RHBSW techniques. To demonstrate the kinds of capabilities these fault tolerance approaches yield, the team focused on applications representative of the Decadal Survey HyspIRI mission, which uses high throughput Thermal Infrared Scanner (132 Mbps) and Hyperspectral Visible ShortWave InfraRed (804 Mbps) instruments, while having only a 15 Mbps downlink channel. This mission provides a great many use scenarios for onboard processing, from high compression algorithms, to pre-processing and selective download of high priority images, to full on-board classification.
This paper focuses on recent efforts which revolve around developing a fault emulator for the embedded PowerPC within Xilinx V4FX devices, validating the RHBSW techniques developed in the prior year, and initial performance results on a representative autonomous Hyperspectral application. In the future, fault analysis data will be refined and correlated between software fault emulation, laser testing, and space based results. This project will also deliver expected performance results on an optimized, representative Hyperspectral imaging application demonstrating autonomous operations.

French, M.; Walters, J.; Zick, K.

2011-12-01

312

A model-based approach for fault-tolerant control  

DEFF Research Database (Denmark)

A model-based controller architecture for fault-tolerant control (FTC) is presented in this paper. The controller architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. The FTC architecture consists of two central parts, a fault detection and isolation (FDI) part and a controller reconfiguration part. The theoretical basis for the architecture will be given, followed by an investigation of the single parts in the architecture. Finally, system interconnection will be considered with respect to the described controller architecture.

Niemann, Hans Henrik

2010-01-01

313

Observer-based Fault Detection and Isolation for Nonlinear Systems  

DEFF Research Database (Denmark)

With the rise in automation, the increase in fault detection, isolation, and reconfiguration is inevitable. Interest in fault detection and isolation (FDI) for nonlinear systems has grown significantly in recent years. The design of FDI is motivated by the need for knowledge about occurring faults in fault-tolerant control systems (FTC systems). The idea of FTC systems is to detect, isolate, and handle faults in such a way that the systems can still perform in a required manner. One prefers reduced performance after occurrence of a fault to the shutdown of (sub-)systems. Hence, the idea of fault-tolerance can be applied to ordinary industrial processes that are not categorized as high risk applications, but where high availability is desirable. The quality of fault-tolerant control is totally dependent on the quality of the underlying algorithms. They detect possible faults, and later reconfigure control software to handle the effects of the particular fault event. In the past mainly linear FDI methods were developed, but as most industrial plants show nonlinear behavior, nonlinear methods for fault diagnosis could probably perform better. This thesis considers the design of FDI for nonlinear systems. It consists of four different contributions. First, it presents a review of the idea and the theory behind the geometric approach for FDI. Starting from the original solution for linear systems up to the latest results for input-affine systems, the theory and solutions are described. Then the geometric approach is applied to a nonlinear ship propulsion system benchmark. The calculations and application results are presented in detail to give an illustrative example. The obtained subsystems are considered for the design of nonlinear observers in order to obtain FDI. Additionally, an adaptive nonlinear observer design is given for comparison. The simulation results are used to discuss different aspects of the geometric approach, e.g.
the possibility to use it as a general approach. The third contribution considers stability analysis of observers used for FDI. It gives proofs of stability for the observers designed for the ship propulsion system. Furthermore, it stresses the importance of the time-variant character of the linearization along a trajectory. It leads to a different stability analysis than for linearization at one operation point. Finally, the preliminary concept of (actuator) fault-output decoupling is described. It is a new idea based on the solution of the input-output decoupling problem. The idea is to include FDI considerations already during the control design.

Lootsma, T.F.

2001-01-01

314

Fault-tolerant topology of a grid-connected PV inverter coupled by a Scott transformer  

Energy Technology Data Exchange (ETDEWEB)

A grid-connected photovoltaic (PV) generator is mainly based on power electronics equipment, which is considered the most vulnerable part of a PV system. In order to increase the reliability of a modular grid-connected PV panel, a solution using a Scott transformer is presented to reduce the number of switches and to continuously operate the PV system in case of switch failures of the power converter. The three-phase type PV inverter is analysed in normal and fault operation. The simulation has shown that fault tolerance can be achieved with the proposed system configuration to give a redundancy of power switches in an integrated power electronic module. (orig.)

Mai, ThuanDat; Driesen, Johan [K.U. Leuven, ESAT/ELECTA, Heverlee (Belgium); Cheng, Yonghua [Vlaamse Instelling voor Technologisch Onderzoek (VITO), Mol (Belgium)

2012-07-01

315

Declarative Specification of Fault Tolerant Auction Protocols: The English Auction Case Study  

DEFF Research Database (Denmark)

Auction mechanisms are nowadays widely used in electronic commerce Web sites for buying and selling items among different users. The increasing importance of auction protocols in the negotiation phase is not limited to online marketplaces. In fact, the wide applicability of auctions as resource-allocation and negotiation mechanisms has also led to a great deal of interest in auctions within the agent community. A challenging issue for agents operating in open Multiagent Systems (such as the emerging semantic Web infrastructure) concerns the specification of declarative communication rules which could be published and shared, allowing agents to dynamically engage in well-known and trusted negotiation protocols. To cope with real-world applications, these rules should also specify fault tolerant patterns of interaction, enabling negotiating agents to interact with each other while tolerating failures, for instance terminating an auction process even if some bidding agents dynamically crash. In this paper, we propose an approach to specify fault tolerant auction protocols in open and dynamic environments by means of communication rules dealing with crash failures of agents. We illustrate these concepts with a case study on the specification of an English Auction protocol that tolerates crashes of bidding agents, and we discuss its properties.

Dragoni, Nicola; Gaspari, Mauro

2012-01-01

316

A Case-Study in Component-Based Mechanical Verification of Fault-Tolerant Programs  

UK PubMed Central (United Kingdom)

In this paper, we present a case study to demonstrate that the decomposition of a fault-tolerant program into its components is useful in its mechanical verification. More specifically, we discuss our experience in using the theorem prover PVS to verify Dijkstra's token ring program in a component-based manner. We also demonstrate the advantages of component-based mechanical verification. Keywords: Component-based verification, Fault tolerance, Program decomposition, Mechanical verification, Self-stabilization. 1 Introduction. In this paper, we argue that the decomposition of a fault-tolerant program into its components is beneficial in its mechanical verification, and that such a decomposition admits reuse of the proofs for other fault-tolerant programs as well as the variations of the given fault-tolerant program. Arora and Kulkarni [3] have shown that a fault-tolerant program can be decomposed into a fault-intolerant program and a set of `tolerance'-components, namely detectors and...

Sandeep S. Kulkarni; John Rushby; Natarajan Shankar

317

Design and Analysis of Software fault-Tolerant techniques for Softcore processors in reliable SRAM based FPGA  

Directory of Open Access Journals (Sweden)

Full Text Available This paper discusses high-level techniques for designing fault tolerant systems in SRAM-based FPGAs without modifying the FPGA architecture. Triple Modular Redundancy (TMR) has been successfully applied in FPGAs to mitigate transient faults, which are likely to occur in space applications. However, TMR comes with high area and power dissipation penalties. The new technique proposed in this paper was specifically developed for FPGAs to cope with transient faults in the user combinational and sequential logic, while also reducing pin count, area and power dissipation. The methodology was validated by fault injection experiments on an emulation board. We present some fault coverage results and a comparison with the TMR approach.

Vatsya Tiwari; Prof. Pratap Singh Patwal

2011-01-01
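
The TMR baseline mentioned in the record above can be illustrated with a bitwise majority voter. This is a minimal sketch of generic TMR voting only, not the paper's new technique; the function names `tmr_vote` and `flip_bit` and the sample word are assumptions made for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of the same word."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a transient fault by inverting one bit of a word."""
    return word ^ (1 << bit)


golden = 0b10110010                      # hypothetical fault-free value
upset = flip_bit(golden, 3)              # one copy suffers a bit flip

masked = tmr_vote(golden, upset, golden)  # a single upset is outvoted
defeated = tmr_vote(upset, upset, golden)  # the same bit hit in two copies wins the vote
```

A single upset is masked because the two healthy copies outvote it; when the same bit is corrupted in two copies, the voter returns the wrong value, which is one reason TMR designs are usually paired with scrubbing or repair.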

318

LQCD workflow execution framework: Models, provenance and fault-tolerance  

Science.gov (United States)

Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic that enables a dynamic management design that reduces manual administrative workload and increases cluster productivity.

Piccoli, Luciano; Dubey, Abhishek; Simone, James N.; Kowalkowlski, James B.

2010-04-01
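
The reflex-engine idea in the record above (rules that must hold, a state change and a mitigation action when they do not, and upward propagation of monitoring information) can be sketched as a small state machine. Everything here is hypothetical: the class names `Rule` and `ReflexEngine`, the state labels, and the `disk_free_gb` health parameter are assumptions, not the framework's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Health = Dict[str, float]  # hypothetical snapshot of monitored parameters


@dataclass
class Rule:
    name: str
    holds: Callable[[Health], bool]  # condition that must be true
    mitigation: str                  # label of the reflex action to take


@dataclass
class ReflexEngine:
    name: str
    rules: List[Rule]
    parent: Optional["ReflexEngine"] = None
    state: str = "NOMINAL"
    log: List[str] = field(default_factory=list)

    def observe(self, health: Health) -> None:
        """Check every rule; on violation change state, act, and escalate."""
        violated = [r for r in self.rules if not r.holds(health)]
        if not violated:
            self.state = "NOMINAL"
            return
        self.state = "MITIGATING"
        for rule in violated:
            self.log.append(f"{self.name}: {rule.name} violated -> {rule.mitigation}")
            if self.parent is not None:
                self.parent.notify(self.name, rule.name)

    def notify(self, child: str, rule_name: str) -> None:
        """Receive a report propagated upwards from a child engine."""
        self.log.append(f"{self.name}: child {child} reported {rule_name}")
        self.state = "DEGRADED"


cluster = ReflexEngine("cluster", rules=[])
node = ReflexEngine(
    "node-17",
    rules=[Rule("disk_free", lambda h: h["disk_free_gb"] > 5.0, "pause new jobs")],
    parent=cluster,
)
node.observe({"disk_free_gb": 12.0})  # rule holds, engine stays NOMINAL
node.observe({"disk_free_gb": 1.5})   # rule violated, reflex fires and escalates
```

The sketch keeps the mitigation as a logged label; a real engine would presumably run an action and use timed transitions, which are omitted here.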

319

Reconfigurable control system design for fault diagnosis and accommodation.  

UK PubMed Central (United Kingdom)

The growing demand in system reliability and survivability under failures has urged ever-increasing research effort on the development of fault diagnosis and accommodation. In this paper, the on-line fault tolerant control problem for dynamic systems under unanticipated failures is investigated from a realistic point of view without any specific assumption on the type of system dynamical structure or failure scenarios. The sufficient conditions for system on-line stability under catastrophic failures have been derived using the discrete-time Lyapunov stability theory. Based upon the existing control theory and the modern computational intelligence techniques, an on-line fault accommodation control strategy is proposed to deal with the desired trajectory-tracking problems for systems suffering from various unknown and unanticipated catastrophic component failures. Theoretical analysis indicates that the control problem of interest can be solved on-line without a complete realization of the unknown failure dynamics provided an on-line estimator satisfies certain conditions. Through the on-line estimator, effective control signals to accommodate the dynamic failures can be computed using only the partially available information of the faults. Several on-line simulation studies have been presented to demonstrate the effectiveness of the proposed strategy. To investigate the feasibility of using the developed technique for unanticipated fault accommodation in hardware under the real-time environment, an on-line fault tolerant control test bed has been constructed to validate the proposed technology. Both on-line simulations and the real-time experiment show encouraging results and promising futures of on-line real-time fault tolerant control based solely upon insufficient information of the system dynamics and the failure dynamics.

Ho LW; Yen GG

2002-12-01

320

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

International Nuclear Information System (INIS)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup

1995-01-01

321

Fault-Tolerant Transmission Protocol for Distant Agricultural Image Acquisition  

Directory of Open Access Journals (Sweden)

Full Text Available To address the high cost of GPRS communication and the limited transmission distance of WiFi, this paper designs a transmission scheme for distant agricultural image acquisition based on digital transmission radio. However, most current digital transmission radios are designed for transmitting small amounts of data. Digital transmission radio offers a greater transmission distance, but signal interference increases heavily when it is used for image transmission. A fault-tolerant transmission protocol for agricultural images (FTTP-AI) based on digital transmission radio was therefore designed. Packet verification is used to reduce the data errors caused by signal interference, while timeout retransmission and lost-packet retransmission are used to overcome packet loss. Experiments showed that FTTP-AI could successfully send agricultural images from the field to a remote computer center. With FTTP-AI, the accuracy rate of data transmission was up to 99.2%, the success rate of image transmission was up to 95.8%, and the cost-free transmission distance reached several kilometers. The scheme can reliably satisfy the requirement for low-cost transmission of distant agricultural images.

Jian Chen; Deqin Xiao; Dongmin Liu; Xiaoqing Jiang

2013-01-01
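
The combination of packet verification and retransmission described in the record above can be sketched as a stop-and-wait sender with a checksum. This is not the FTTP-AI packet format, which the abstract does not give; the 2-byte sequence header, the CRC-32 trailer, the chunk size and the scripted channel are all assumptions for illustration.

```python
import zlib
from typing import Callable, Iterable, Optional, Tuple

Channel = Callable[[bytes], Optional[bytes]]  # returns None to model a timeout


def make_packet(seq: int, payload: bytes) -> bytes:
    """Frame = 2-byte sequence number + payload + CRC-32 of both."""
    body = seq.to_bytes(2, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def parse_packet(packet: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (seq, payload) if the checksum verifies, else None."""
    if len(packet) < 6:
        return None
    body, crc = packet[:-4], packet[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


def send_image(data: bytes, channel: Channel, chunk: int = 4, max_tries: int = 5) -> bytes:
    """Send data chunk by chunk, retransmitting lost or corrupted packets."""
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    received = {}
    for seq, payload in enumerate(chunks):
        for _ in range(max_tries):
            delivered = channel(make_packet(seq, payload))
            parsed = parse_packet(delivered) if delivered is not None else None
            if parsed is not None and parsed[0] == seq:
                received[seq] = parsed[1]  # receiver would acknowledge here
                break
        else:
            raise TimeoutError(f"packet {seq} not delivered after {max_tries} tries")
    return b"".join(received[s] for s in range(len(chunks)))


def scripted_channel(faults: Iterable[str]) -> Channel:
    """Deterministic test channel: each send consumes one scripted fault."""
    script = iter(faults)

    def channel(packet: bytes) -> Optional[bytes]:
        fault = next(script, "ok")
        if fault == "lose":
            return None
        if fault == "corrupt":
            return bytes([packet[0] ^ 0xFF]) + packet[1:]
        return packet

    return channel


result = send_image(b"field-image-bytes", scripted_channel(["ok", "lose", "corrupt", "ok"]))
```

In the scripted run, the second packet is lost once and corrupted once before being delivered on the third attempt, so the reassembled bytes match the input.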

322

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

Energy Technology Data Exchange (ETDEWEB)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

Parker, L.E.

1995-02-01

323

Researching on Distribution Network Fault Location System  

Directory of Open Access Journals (Sweden)

Full Text Available In this study, we propose a method that uses rough set theory to locate distribution network faults in a WEBGIS environment. Following the tree structure of the distribution network, the fault complaint information from users in each area is taken as the condition attributes and the fault elements as the decision attribute, so that the decision table is formed automatically. The decision table is reduced with the rough set method to derive its minimal reduction for fault diagnosis and to obtain the minimal diagnostic rules, which guarantees the objectivity of the rules. Even when the fault complaint call information is incomplete, the method still locates faults rapidly and accurately, and thus has good fault tolerance. The reduction process for the distribution network fault diagnosis decision table is programmed in C# and combined with the WEBGIS platform, making full use of internal network database resources to achieve simple, rapid fault diagnosis and visualized fault location. The results show that the method is feasible and effective.

Yan Li; Yu Guo; Dening Zhang

2013-01-01
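
The decision-table reduction described in the record above can be illustrated with a brute-force search for minimal reducts. This is a sketch under stated assumptions, written in Python rather than the authors' C#: the three-area feeder, the attribute names and the fault labels are invented, and the table is assumed to be consistent (no two identical complaint patterns with different faults).

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple[Dict[str, int], str]  # (condition attributes, decision attribute)


def consistent(rows: Sequence[Row], attrs: Sequence[str]) -> bool:
    """True if rows that agree on attrs always share the same decision."""
    seen: Dict[Tuple[int, ...], str] = {}
    for cond, decision in rows:
        key = tuple(cond[a] for a in attrs)
        if seen.setdefault(key, decision) != decision:
            return False
    return True


def minimal_reducts(rows: Sequence[Row], attributes: Sequence[str]) -> List[Set[str]]:
    """Smallest attribute subsets that still separate all decisions."""
    for size in range(1, len(attributes) + 1):
        found = [set(c) for c in combinations(attributes, size) if consistent(rows, c)]
        if found:
            return found
    return []


# Hypothetical radial feeder: section F1 feeds areas 1-3, F2 feeds areas 2-3,
# F3 feeds area 3 only. A value of 1 means users in that area complained.
table: List[Row] = [
    ({"area1": 1, "area2": 1, "area3": 1}, "F1"),
    ({"area1": 0, "area2": 1, "area3": 1}, "F2"),
    ({"area1": 0, "area2": 0, "area3": 1}, "F3"),
]

reducts = minimal_reducts(table, ["area1", "area2", "area3"])
```

For this toy table the complaints from area 3 carry no discriminating information, so the only minimal reduct is the pair of attributes for areas 1 and 2. The exhaustive search is exponential in the number of attributes and is meant only to show the idea.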

324

Separation of Fault Tolerance and Non-Functional Concerns: Aspect Oriented Patterns and Evaluation  

Directory of Open Access Journals (Sweden)

Full Text Available Dependable computer based systems employing fault tolerance and robust software development techniques demand additional error detection and recovery related tasks. This results in tangling of core functionality with these cross cutting non-functional concerns. In this regard current work identifies these dependability related non-functional and cross-cutting concerns and proposes design and implementation solutions in an aspect oriented framework that modularizes and separates them from core functionality. The degree of separation has been quantified using software metrics. A Lego NXT Robot based case study has been completed to evaluate the proposed design framework.

Kashif Hameed; Rob Williams; Jim Smith

2010-01-01

325

LQCD workflow execution framework: Models, provenance and fault-tolerance  

International Nuclear Information System (INIS)

Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic that enables a dynamic management design that reduces manual administrative workload and increases cluster productivity.

2010-04-01

326

FAULT TOLERANT NANO-MEMORY WITH FAULT SECURE ENCODER AND DECODER  

Directory of Open Access Journals (Sweden)

Full Text Available Traditionally, memory cells were the only circuitry susceptible to transient faults, and the supporting circuitry around the memory was assumed to be fault-free. Memory cells have been protected from soft errors for more than a decade; due to the increase in soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks have become susceptible to soft errors as well and must also be protected. This paper presents a new approach to designing fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. We further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. We prove that Euclidean Geometry Low-Density Parity-Check (EG-LDPC) codes have the fault-secure detector capability.

VIJAYKUMAR.K; SAKE POTHALAIAH; Dr. K ASHOK BABU

2011-01-01

327

Single event upset injection simulation and fault-tolerant design for image compression applications  

Science.gov (United States)

This paper describes a single event upset (SEU) fault injection framework. Based on assumptions about SEU effects and SEU distribution, the quantitative relationship between measured data and the simulation model is investigated. By adjusting some parameters of the simulation-based framework, its results can be brought close to published data and to some accelerated radiation experiments. Furthermore, the sensitivity of the JPEG2000-based hardware architecture to SEUs can be determined. Taking hardware resources and operating frequencies into account, fault-tolerant techniques can be applied to the more sensitive parts, which shows the framework's effectiveness in fault-tolerant design for image compression applications.

Guo, Jie; Li, Yunsong; Liu, Kai; Lei, Jie; Wu, Chengke

2012-10-01

328

A hybrid framework for design and analysis of fault-tolerant architectures for nanoscale molecular crossbar memories.  

Energy Technology Data Exchange (ETDEWEB)

It is anticipated that self-assembled ultra-dense nanomemories will be more susceptible to manufacturing defects and transient faults than conventional CMOS-based memories, thus the need exists for fault-tolerant memory architectures. The development of such architectures will require intense analysis in terms of achievable performance measures - power dissipation, area, delay and reliability. In this paper, we propose and develop a hybrid automation framework, called HMAN, that aids the design and analysis of fault-tolerant architectures for nanomemories. Our framework can analyze memory architectures at two different levels of the design abstraction, namely the system and circuit levels. To the best of our knowledge, this is the first such attempt at analyzing memory systems at different levels of abstraction and then correlating the different performance measures to provide the system designers guidelines for designing a robust nanomemory. We also illustrate the application of our framework to self-assembled crossbar architectures by analyzing a hierarchical fault-tolerant crossbar-based memory architecture that we have developed, and comparing this with existing crossbar architectures.

Graham, P. S. (Paul S.); Gokhale, M. (Maya); Bhaduri, D. (Debayan); Shukla, S. K. (Sandeep K.); Coker, D. (Deji); Taylor, V. (Valerie)

2005-01-01

329

Clustering and fault tolerance for target tracking using wireless sensor networks  

International Nuclear Information System (INIS)

Over the last few years, the deployment of WSNs (Wireless Sensor Networks) has been fostered in diverse applications. WSNs have great potential in a variety of domains ranging from scientific experiments to commercial applications. Because WSNs are deployed in dynamic and unpredictable environments, they must be able to cope with a variety of faults. This paper proposes an energy-aware fault-tolerant clustering protocol for target tracking applications, termed the FTTT (Fault Tolerant Target Tracking) protocol. The identification of RNs (Redundant Nodes) makes SN (Sensor Node) fault tolerance plausible, and the clustering supports recovery of sensors supervised by a faulty CH (Cluster Head). The FTTT protocol reduces energy consumption in two steps: first, by identifying RNs in the network; second, by restricting the number of SNs sending data to the CH. Simulations validate the scalability and low power consumption of the FTTT protocol in comparison with the LEACH protocol. (author)

2012-01-01

330

High available and fault tolerant mobile communications infrastructure  

DEFF Research Database (Denmark)

High availability is a key requirement in mobile communication systems, especially when they are used for mission-critical services such as public safety, e.g. police, ambulance and fire services. A failure in the fixed network infrastructure that provides services to mobile users can affect a large number of users and risk loss of lives. The fixed infrastructure of a mobile communication system has characteristics, for example architectural complexity, real-time peer-to-peer communication and performance requirements, that make existing failure recovery techniques, such as those using rollback or replication, inapplicable. This dissertation presents a novel failure recovery approach based on a behavioral model of the communication protocols. The new recovery method is able to deal with software and hardware faults and is particularly suitable for mobile communications infrastructure. The method enables the faulty applications in the infrastructure to quickly and effectively resume their services to their mobile clients with no or minimal loss of work after failure. In our approach, we do not assume a specific fault behavior, for example fail-stop or transient behavior, as is the case for many recovery techniques. In addition, the method does not require any modification to mobile clients. The Communicating Extended Finite State Machine (CEFSM) is used to model the behavior of the infrastructure applications. The model-based recovery scheme is integrated in the application and uses the client/server model to save the application state information on stable storage during failure-free execution and to retrieve it when needed during recovery. When and what information is saved or retrieved is determined by the behavioral model of the application. To practically evaluate and demonstrate the effectiveness of our method, we developed as a case study an experimental testbed for the TETRA (TErrestrial Trunked Radio) packet data network. The testbed works as a distributed system and can run various communication scenarios between the fixed network infrastructure and its mobile users. We closely followed the TETRA standard specifications in our implementation of the communication protocols in order to get a testbed system that operates as the real system with respect to message exchange and timing. The experimental results showed that, using our method, the faulty infrastructure application can resume its service immediately after its restart and restores its pre-failure service performance level in less than a minute. The failure-free overhead incurred by the method is relatively low, and is experimentally found to be less than 5% in the conducted experiments.

Beiroumi, Mohammad Zib

2006-01-01

331

An arc fault detection system  

Energy Technology Data Exchange (ETDEWEB)

An arc fault detection system for use on ungrounded or high-resistance-grounded power distribution systems is provided which can be retrofitted outside electrical switchboard circuits having limited space constraints. The system includes a differential current relay that senses a current differential between current flowing from secondary windings located in a current transformer coupled to a power supply side of a switchboard, and a total current induced in secondary windings coupled to a load side of the switchboard. When such a current differential is experienced, a current travels through an operating coil of the differential current relay, which in turn opens an upstream circuit breaker located between the switchboard and a power supply to remove the supply of power to the switchboard.

Jha, Kamal N.

1997-12-01
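
The differential principle in the record above (compare the supply-side secondary current with the total of the load-side secondary currents and trip on a mismatch) reduces to a simple comparison. This is a numerical sketch only; the function name, the pickup threshold of 0.5 A and the sample currents are assumptions, and the actual device is an electromechanical relay, not software.

```python
from typing import Sequence


def differential_trip(supply_secondary_a: float,
                      load_secondary_a: Sequence[float],
                      pickup_a: float = 0.5) -> bool:
    """Return True when the supply/load current mismatch exceeds the pickup.

    With no fault inside the switchboard the supply-side current equals the
    sum of the load-side currents, so the differential is close to zero.
    """
    differential = abs(supply_secondary_a - sum(load_secondary_a))
    return differential > pickup_a


healthy = differential_trip(10.0, [4.0, 6.0])   # currents balance, no trip
arcing = differential_trip(10.0, [4.0, 3.0])    # 3 A unaccounted for, trip
```

The pickup threshold stands in for the relay's sensitivity setting and would in practice be chosen to ride through measurement error in the current transformers.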

332

Robot-borne fault tolerant calculators for nuclear use  

International Nuclear Information System (INIS)

The use of robots has become a necessity in the civil nuclear industry. The electronic systems of such robots must tolerate the effects of cumulative ionizing radiation dose. Today's objective is to reach a dose resistance of 3 kGy. The difficulties and costs of on-site maintenance make it necessary to guarantee at least one functioning mode in the event of system failure. To improve the behaviour of robot-borne systems, the CEA Department for Nuclear Engineering Studies (DEIN) has developed a method for selecting industrial electronic components and has built computer architectures that remove the dependence on some parameters sensitive to cumulative dose. This paper presents the MICADO and CADMOS architectures developed at the DEIN. (J.S.). 15 refs., 5 figs.

1995-01-01

333

Testing and fault tolerance of multistage interconnection networks  

Energy Technology Data Exchange (ETDEWEB)

Test length is independent of network size in this simple, straightforward methodology for testing MINs (multistage interconnection networks). It requires only four test sequences for single-fault diagnosis. 26 references.

Agrawal, D.P.

1982-04-01

334

Built-in self-test resources for fault-tolerant VLSI environments  

Energy Technology Data Exchange (ETDEWEB)

Chip-level built-in self-test (BIST) techniques were developed to enhance testability at the manufacturing stage and have also been extended to the system level. In this extended capacity, BIST features can be used to increase the reliability and dependability of the functional system. This dissertation investigates the modeling and analysis of built-in self-test resources in a fault-tolerant VLSI environment. The BIST circuitry is utilized as a fault-detection mechanism for gracefully degrading systems employing dynamic redundancy. Specific models are evaluated under the constraints of fixed test time and fixed resource assumptions. For a fixed-test-time model, the additional overhead of distributed BIST techniques, as compared to centralized schemes, is justified when area-utilization measures are considered. These measures assess system maintenance and test cost beyond the initial BIST hardware overhead penalty. The specific BIST technique employed (centralized or distributed) is also shown to have a significant influence upon instantaneous and cumulative reward measures. It is shown that optimal system performability can be achieved by dedicating an appropriate fraction of the VLSI real estate to maintenance resources.

Muha, D.C.

1988-01-01

335

Automatic fault monitoring system using a microcomputer  

Energy Technology Data Exchange (ETDEWEB)

This paper describes an automatic fault monitoring system using a microcomputer. The hardware is based on the Z80 processor and the software on the programming language PEARL. The microcomputer system is able to detect faults in the equipment during data acquisition and provides useful information for the subsequent data evaluation. The results of two applications are discussed.

Besold, R.; Baran, R.; Hofmann, A.; Holleczek, P.; Mueller, R.

1986-09-20

336

Neotectonics of Panama. I. Major fault systems  

Energy Technology Data Exchange (ETDEWEB)

The direction and rate of relative plate motion across the Caribbean-Nazca boundary in Panama are poorly known. This lack of understanding can be attributed to diffuse seismicity, a lack of well-constrained focal mechanisms from critical areas, and dense tropical vegetation. In order to better understand the relation of plate motions to major fault systems in Panama, the authors have integrated geologic, remote sensing, earthquake and UTIG marine seismic reflection data. Three areas of recent faulting can be distinguished in Panama and its shelf areas: ZONE 1 of eastern Panama consists of a 70 km wide zone of 3 discrete left-lateral strike-slip faults (Sanson Hills, Jaque River, Sambu) which strike N40W and can be traced as continuous features for distances of 100-150 km; ZONE 2 in central Panama consists of a diffuse zone of discontinuous normal(.) faults which range in strike from N40E to N70E; ZONE 3 in western Panama consists of a 60 km wide zone of 2 discrete, left-lateral(.) strike-slip faults which strike N60W and can be traced as continuous features for distances of 150 km. ZONE 3 faults appear to be continuous with faults bounding the forearc Teraba Trough of Costa Rica. The relation of faults of ZONE 3 to faults of ZONE 2 and a major fault bounding the southern Panama shelf is unclear.

Corrigan, J.; Mann, P.

1985-01-01

337

Implementation of Fault-tolerant Quantum Logic Gates via Optimal Control  

CERN Document Server

The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. In particular, we can find solutions that implement all of the elementary logic gates in a fixed amount of time with limited control resources for the five-qubit stabilizer code. Most importantly, logic gates that are extremely difficult to implement using conventional techniques even for ideal systems, such as the T-gate for the five-qubit stabilizer code, do not appear to pose a problem for optimal control.

Nigmatullin, R

2009-01-01

338

Design of Fault-Tolerant and Dynamically-Reconfigurable Microfluidic Biochips  

CERN Multimedia

Microfluidics-based biochips are soon expected to revolutionize clinical diagnosis, DNA sequencing, and other laboratory procedures involving molecular biology. Most microfluidic biochips are based on the principle of continuous fluid flow and they rely on permanently-etched microchannels, micropumps, and microvalves. We focus here on the automated design of "digital" droplet-based microfluidic biochips. In contrast to continuous-flow systems, digital microfluidics offers dynamic reconfigurability; groups of cells in a microfluidics array can be reconfigured to change their functionality during the concurrent execution of a set of bioassays. We present a simulated annealing-based technique for module placement in such biochips. The placement procedure not only addresses chip area, but it also considers fault tolerance, which allows a microfluidic module to be relocated elsewhere in the system when a single cell is detected to be faulty. Simulation results are presented for a case study involving the polymeras...

Su, Fei

2011-01-01

339

A Secure and Fault-tolerant framework for Mobile IPv6 based networks  

Directory of Open Access Journals (Sweden)

Full Text Available Mobile IPv6 will be an integral part of the next generation Internet protocol. The importance of mobility in the Internet keeps increasing. The current specification of Mobile IPv6 does not provide proper support for reliability in the mobile network, and there are other problems associated with it. In this paper, we propose the “Virtual Private Network (VPN) based Home Agent Reliability Protocol (VHAHA)” as a complete system architecture and extension to Mobile IPv6 that supports reliability and offers solutions to the security problems found in the Mobile IP registration part. The key features of this protocol over other protocols are: better survivability, transparent failure detection and recovery, reduced complexity of the system and workload, secure data transfer and improved overall performance. Keywords: Mobility Agents; VPN; VHAHA; Fault-tolerance; Reliability; Self-certified keys; Confidentiality; Authentication; Attack prevention

Rathi S; Thanuskodi K

2009-01-01

340

Implementation of fault-tolerant quantum logic gates via optimal control  

International Nuclear Information System (INIS)

The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. In particular, we can find solutions that implement all of the elementary logic gates in a fixed amount of time with limited control resources for the five-qubit stabilizer code. Most importantly, logic gates that are extremely difficult to implement using conventional techniques even for ideal systems, such as the T-gate for the five-qubit stabilizer code, do not appear to pose a problem for optimal control.

2009-01-01

341

A Fault Tolerant Congestion Aware Routing Protocol for Mobile Adhoc Networks  

Directory of Open Access Journals (Sweden)

Full Text Available Problem statement: The performance of ad hoc routing protocols degrades significantly when there are faulty nodes in the network. Congestion causes packet losses and bandwidth degradation, and thus time and energy are wasted during recovery. The fault tolerant congestion aware routing protocol addresses these problems by exploiting the network redundancy through multipath routing. Approach: In this study, a fault tolerant congestion aware multipath routing protocol is proposed to reduce route breakages and congestion losses. The AOMDV protocol is used as a base for the multipath routing. The proposed scheme enables more nodes to salvage a dropped packet. Results: Simulation results show that the proposed protocol achieves better throughput and packet delivery ratio with reduced delay, packet drop and energy. Conclusion: The congestion control technique proposed in this study proactively detects node level and link level congestion and performs congestion control using the fault-tolerant multiple paths.

G. Rajkumar; K. Duraiswamy

2012-01-01

342

Fault-Tolerant Storage of Quantum Information by Large Block Codes  

Science.gov (United States)

An important issue in the implementation of a quantum computer is to protect quantum information from decoherence. Concatenated quantum codes and topological quantum codes are extensively studied for fault-tolerant quantum computation. However, there is not much research on large block codes in any fault-tolerant scheme. Here we propose a method for storage of quantum information by a large block code, which has a high code rate and high distance. To access or protect the quantum information stored in a large block code requires only the fault-tolerant implementation of the gates from the Clifford group. We derive the lifetime of the quantum information stored in a large block code by CSS code construction.

Lai, Ching-Yi; Brun, Todd

2013-03-01

343

Universal fault-tolerant quantum computation with only transversal gates and error correction.  

UK PubMed Central (United Kingdom)

Transversal implementations of encoded unitary gates are highly desirable for fault-tolerant quantum computation. Though transversal gates alone cannot be computationally universal, they can be combined with specially distilled resource states in order to achieve universality. We show that "triorthogonal" stabilizer codes, introduced for state distillation by Bravyi and Haah [Phys. Rev. A 86, 052329 (2012)], admit transversal implementation of the controlled-controlled-Z gate. We then construct a universal set of fault-tolerant gates without state distillation by using only transversal controlled-controlled-Z, transversal Hadamard, and fault-tolerant error correction. We also adapt the distillation procedure of Bravyi and Haah to Toffoli gates, improving on existing Toffoli distillation schemes.

Paetznick A; Reichardt BW

2013-08-01

344

Implementation of the Six Channel Redundancy to achieve fault tolerance in testing of satellites  

Directory of Open Access Journals (Sweden)

Full Text Available This paper aims to implement the six channel redundancy to achieve fault tolerance in testing of satellites with acoustic spectrum. We mainly focus here on achieving fault tolerance. An immediate application is the microphone data acquisition and to do analysis at the Acoustic Test Facility (ATF) centre, National Aerospace Laboratories. It has an 1100 cubic meter reverberation chamber in which a maximum sound pressure level of 157 dB is generated. The six channel Redundancy software with fault tolerant operation is devised and developed. The data are applied to program written in C language. The program is run using the Code Composer Studio by accepting the inputs. This is tested with the TMS 320C 6727 DSP, Pro Audio Development Kit (PADK).

H S Aravinda; H D Maheshappa; Ranjan Moodithaya

2010-01-01

345

Toward a Scalable Method for Quantifying Aspects of Fault Tolerance, Software Assurance, and Computer Security  

UK PubMed Central (United Kingdom)

Quantitative assessment tools are urgently needed in the areas of fault tolerance, software assurance, and computer security. Assessment methods typically employed in various combinations are fault injection, formal verification, and testing. However, these methods are expensive because they are labor-intensive, with costs scaling at least linearly with the number of software modules tested. Additionally, they are subject to human lapses and oversights because they require two different representations for each system, and then base results on a direct or an indirect representation comparison.

Philip Koopman

346

Performance Analysis of Fault-Tolerant Irregular Baseline Multistage Interconnection Network  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper an attempt has been made to analyze the characteristics of a new class of irregular fault-tolerant multistage interconnection network named the irregular modified augmented baseline network (IMABN). IMABN can provide ‘full access’ capability in the presence of multiple faults. Permutation passability and bandwidth analysis show that the proposed IMABN achieves a significant improvement over the Modified Augmented Baseline Network (MABN).

Mamta Ghai; Vinay Chopra; Karamjit Kaur Cheema

2010-01-01

347

Fault tolerant control of wind turbines using unknown input observers  

DEFF Research Database (Denmark)

This paper presents a scheme for accommodating faults in the rotor and generator speed sensors in a wind turbine. These measured values are important both for the wind turbine controller as well as the supervisory control of the wind turbine. The scheme is based on unknown input observers, which are also used to detect and isolate these faults. The scheme is tested on a known benchmark for FDI and FTC of wind turbines. Tests on this benchmark model show a clear potential of the proposed scheme.

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01

348

Fault-tolerant quantum computing with spins using the conditional Faraday rotation  

CERN Multimedia

We propose a fault-tolerant scheme for deterministic quantum computing with spins that is based on a quantum teleportation scheme using the conditional Faraday rotation. The phase gate between two sets of noninteracting quantum dots, embedded in microcavities inside a photonic crystal, is mediated by single photons, which yields a Faraday rotation rate high enough for gate operation times of 100 ps. Using sets of quantum dots and error correction codes makes our scheme fault-tolerant. Single-qubit operations on encoded qubits can be implemented by means of the optical Stark effect combined with the optical RKKY interaction.

Leuenberger, M N

2004-01-01

349

Evaporator unit as a benchmark for plug and play and fault tolerant control  

DEFF Research Database (Denmark)

This paper presents a challenging industrial benchmark for implementation of control strategies under realistic working conditions. The developed control strategies should perform in a plug & play manner, i.e. adapt to varying working conditions, optimize their performance, and provide fault tolerance. A fault tolerant strategy is needed to deal with a faulty sensor measurement of the evaporation pressure. The design and algorithmic challenges in the control of an evaporator include: unknown model parameters, large parameter variations, varying loads, and external discrete phenomena such as compressor switch on/off or abrupt change in compressor speed.

Izadi-Zamanabadi, Roozbeh; Vinther, Kasper

2012-01-01

350

Fault Tolerant Variable Block Carry Skip Logic (VBCSL) using Parity Preserving Reversible Gates  

CERN Multimedia

Reversible logic design has become one of the promising research directions in low power dissipating circuit design in the past few years and has found application in low power CMOS design, digital signal processing and nanotechnology. This paper presents efficient design approaches for fault tolerant carry skip adders (FTCSAs) and compares those designs with the existing ones. Variable block carry skip logic (VBCSL) using the fault tolerant full adders (FTFAs) has also been developed. The designs are minimized in terms of hardware complexity, gate count, constant inputs and garbage outputs. In addition, a technology independent evaluation of the proposed designs clearly demonstrates their superiority over the existing counterparts.

Islam, Md Saiful; Begum, Zerina; Hafiz, Mohd Zulfiquar

2010-01-01

351

Fault Tolerant Control for automated Managed Pressure Drilling  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In the last years the cost for rig rental has become very high. To reduce total cost it has been necessary to increase the cost efficiency when drilling to improve profit. One way to improve profit is to reduce the nonproductive time related to faults and failures which could occur during drilling. ...

Eilerås, Jarle

352

Mechanical Verification of a Generalized Protocol for Byzantine Fault Tolerant Clock Synchronization  

UK PubMed Central (United Kingdom)

Schneider [Sch87] generalizes a number of protocols for Byzantine fault-tolerant clock synchronization and presents a uniform proof for their correctness. We present a mechanical verification of Schneider's protocol leading to several significant clarifications and revisions. The verification was carried out with the Ehdm system [RvHO91] developed at the SRI Computer Science Laboratory. The mechanically checked proofs include the verification that the egocentric mean function used in Lamport and Melliar-Smith's Interactive Convergence Algorithm [LMS85] satisfies the requirements of Schneider's protocol. Our mechanical verification raises a number of issues regarding the verification of fault-tolerant, distributed, real-time protocols that are germane to the design of a special-purpose logic for such problems. This work was supported by NASA Contract NAS1-18226. John Rushby, Friedrich von Henke, Fred Schneider, and Rick Butler provided considerable guidance and encouragement. I also th...

Natarajan Shankar

353

A Fault Tolerant, Dynamic and Low Latency BDII Architecture for Grids  

CERN Document Server

The current BDII model relies on information gathering from agents that run on each core node of a Grid. This information is then published into a Grid-wide information resource known as the Top BDII. The Top-level BDIIs are typically updated in cycles of a few minutes each. A new BDII architecture is proposed and described in this paper, based on the hypothesis that only a few attribute values change in each BDII information cycle and that consequently it may not be necessary to update every parameter in a cycle. It has been demonstrated that significant performance gains can be achieved by exchanging only the information about records that changed during a cycle. Our investigations have led us to implement a low latency and fault tolerant BDII system that involves only minimal data transfer and facilitates secure transactions in a Grid environment.

Osman, Asif; Batool, Naheed; McClatchey, Richard

2012-01-01
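
The idea in the abstract above, exchanging only the records that changed during a cycle, can be illustrated with a minimal sketch. The record layout and the function names `diff_records` and `apply_delta` are hypothetical and are not taken from the BDII software or from the paper; the sketch only shows how a per-cycle delta could be computed and applied.

```python
from typing import Dict, List, Tuple

# Hypothetical record type: attribute name -> value, keyed by a record id.
Record = Dict[str, str]


def diff_records(previous: Dict[str, Record],
                 current: Dict[str, Record]) -> Tuple[Dict[str, Record], List[str]]:
    """Return (changed_or_new_records, removed_record_ids) for one cycle."""
    changed = {key: rec for key, rec in current.items()
               if previous.get(key) != rec}
    removed = [key for key in previous if key not in current]
    return changed, removed


def apply_delta(cache: Dict[str, Record],
                changed: Dict[str, Record],
                removed: List[str]) -> Dict[str, Record]:
    """Apply a delta to a copy of the top-level cache and return the copy."""
    merged = dict(cache)
    merged.update(changed)
    for key in removed:
        merged.pop(key, None)
    return merged
```

Only `changed` and `removed` would need to travel over the network in each cycle; unchanged records stay in the receiver's cache.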

354

Fault-Tolerant Robot Programming through Simulation with Realistic Sensor Models  

Directory of Open Access Journals (Sweden)

Full Text Available We introduce a simulation system for mobile robots that allows a realistic interaction of multiple robots in a common environment. The simulated robots are closely modeled after robots from the EyeBot family and have an identical application programmer interface. The simulation supports driving commands at two levels of abstraction as well as numerous sensors such as shaft encoders, infrared distance sensors, and compass. Simulation of on-board digital cameras via synthetic images allows the use of image processing routines for robot control within the simulation. Specific error models for actuators, distance sensors, camera sensor, and wireless communication have been implemented. Progressively increasing error levels for an application program allows for testing and improving its robustness and fault-tolerance.

Thomas Braeunl; Andreas Koestler; Axel Waggershauser

2008-01-01

355

Theory of Decoherence-Free Fault-Tolerant Universal Quantum Computation  

CERN Multimedia

Universal quantum computation on decoherence-free subspaces and subsystems (DFSs) is examined with particular emphasis on using only physically relevant interactions. A necessary and sufficient condition for the existence of decoherence-free (noiseless) subsystems in the Markovian regime is derived here for the first time. A stabilizer formalism for DFSs is then developed which allows for the explicit understanding of these in their dual role as quantum error correcting codes. Conditions for the existence of Hamiltonians whose induced evolution always preserves a DFS are derived within this stabilizer formalism. Two possible collective decoherence mechanisms arising from permutation symmetries of the system-bath coupling are examined within this framework. It is shown that in both cases universal quantum computation which always preserves the DFS (*natural fault-tolerant computation*) can be performed using only two-body interactions. This is in marked contrast to standard error correcting codes, where all kn...

Kempe, J; Lidar, D A; Whaley, K B; Kempe, Julia; Bacon, David; Lidar, Daniel A.

2001-01-01

356

Fault tolerance in collaborative sensor networks for target detection  

UK PubMed Central (United Kingdom)

Collaboration in sensor networks must be fault tolerant due to the harsh environmental conditions in which such networks can be deployed. This paper focuses on finding algorithms for collaborative target detection that are efficient in terms of communication cost, precision, accuracy, and number of faulty sensors tolerable in the network. Two algorithms, namely value fusion and decision fusion, are identified first. When comparing their performance and communication overhead, decision fusion is found to become superior to value fusion as the ratio of faulty sensors increases.

Thomas Clouqueur; Kewal K. Saluja; Parameswaran Ramanathan
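
The contrast between the two fusion styles named in the abstract above can be sketched as follows. This is an illustrative simplification (plain averaging and a simple majority vote) and not necessarily the exact algorithms of the paper; the function names, thresholds and the `quorum` parameter are assumptions.

```python
from statistics import mean
from typing import Sequence


def value_fusion(readings: Sequence[float], threshold: float) -> bool:
    """Fuse raw readings first (here: plain average), then decide once."""
    return mean(readings) >= threshold


def decision_fusion(readings: Sequence[float],
                    local_threshold: float,
                    quorum: float = 0.5) -> bool:
    """Each sensor decides locally; the fused result is a majority vote."""
    votes = [reading >= local_threshold for reading in readings]
    return sum(votes) > quorum * len(votes)
```

With one sensor stuck at a large value and no target present, for example `[1.0, 1.2, 0.9, 1.1, 50.0]` and a threshold of 5.0, the average is dragged above the threshold while the majority vote is not. This is only one failure mode and says nothing about the paper's quantitative comparison.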

357

Posbist fault tree analysis of coherent systems  

International Nuclear Information System (INIS)

When the failure probability of a system is extremely small or necessary statistical data from the system is scarce, it is very difficult or impossible to evaluate its reliability and safety with conventional fault tree analysis (FTA) techniques. New techniques are needed to predict and diagnose such a system's failures and evaluate its reliability and safety. In this paper, we first provide a concise overview of FTA. Then, based on the posbist reliability theory, event failure behavior is characterized in the context of possibility measures and the structure function of the posbist fault tree of a coherent system is defined. In addition, we define the AND operator and the OR operator based on the minimal cut of a posbist fault tree. Finally, a model of posbist fault tree analysis (posbist FTA) of coherent systems is presented. The use of the model for quantitative analysis is demonstrated with a real-life safety system.

2004-01-01

358

Posbist fault tree analysis of coherent systems  

Energy Technology Data Exchange (ETDEWEB)

When the failure probability of a system is extremely small or necessary statistical data from the system is scarce, it is very difficult or impossible to evaluate its reliability and safety with conventional fault tree analysis (FTA) techniques. New techniques are needed to predict and diagnose such a system's failures and evaluate its reliability and safety. In this paper, we first provide a concise overview of FTA. Then, based on the posbist reliability theory, event failure behavior is characterized in the context of possibility measures and the structure function of the posbist fault tree of a coherent system is defined. In addition, we define the AND operator and the OR operator based on the minimal cut of a posbist fault tree. Finally, a model of posbist fault tree analysis (posbist FTA) of coherent systems is presented. The use of the model for quantitative analysis is demonstrated with a real-life safety system.

Huang, H.-Z.; Tong Xin; Zuo, Ming J

2004-05-01
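
As a rough illustration of evaluating a fault tree with possibility measures, the sketch below uses the standard possibility-theory operators: `min` across the basic events of a minimal cut set (AND) and `max` across cut sets (OR), assuming non-interactive events. The operators defined in the paper may differ in detail; the function name and data layout are hypothetical.

```python
from typing import Dict, Iterable, Set


def top_event_possibility(cut_sets: Iterable[Set[str]],
                          possibility: Dict[str, float]) -> float:
    """Possibility of the top event from minimal cut sets.

    AND within a cut set is taken as min, OR across cut sets as max
    (an assumption based on standard possibility theory).
    """
    cut_sets = list(cut_sets)
    if not cut_sets:
        raise ValueError("at least one minimal cut set is required")
    return max(min(possibility[event] for event in cut) for cut in cut_sets)
```

For cut sets `{A, B}` and `{C}` with failure possibilities 0.3, 0.8 and 0.1, the first cut set contributes 0.3 and the second 0.1, so the top event possibility under these assumptions is 0.3.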

359

Design and evaluation of fault-tolerant VLSI/WSI processor arrays. Final technical report, 1 July 1985-31 December 1987  

Energy Technology Data Exchange (ETDEWEB)

This document is the final report of work performed under the project entitled Design and Evaluation of Fault-Tolerant VLSI/WSI Processor Arrays supported by the Innovative Science and Technology Office of the Strategic Defense Initiative Organization and administered through the Office of Naval Research under Contract No. 00014-85-k-0588. With the concurrence of Dr. Clifford Lau, the Scientific Officer for this project, this final report consists of reprints of publications reporting work performed under the project. In the attached list of publications are papers where fault-tolerant systems for processor arrays are proposed and studied. Studies on algorithmic and software aspects relevant to the systems are also reported, as well as hardware and reconfigurability issues for fault-tolerant processor arrays.

Fortes, J.A.

1987-12-31

360

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

Directory of Open Access Journals (Sweden)

Full Text Available This paper presents a study on the development of a walking gait for fault-tolerant locomotion in unstructured environments. The fault-tolerant gait for adaptive locomotion fulfills stability conditions in the presence of a fault event (locked joints or sensor failure) that would otherwise prevent a robot from achieving stable locomotion over uneven terrain. To accomplish this, a fault-tolerant gait based on force-position control is proposed for a hexapod robot to enable stable walking with a joint failure. Furthermore, we extend our proposed fault detection and diagnosis (FDD) method to deal with the critical failure of the angular rate sensors responsible for the attitude control of the robot over uneven terrain. A performance analysis of straight-line walking is carried out, which shows that the proposed FDD-based gait is capable of generating an adaptive walking pattern during joint or sensor failures. The performance of the proposed control is established using dynamic simulations and real-world experiments on a prototype hexapod robot.

Umar Asif

2012-01-01

361

FAULT TOLERANT MODULAR LINEAR TRANSVERSE FLUX RELUCTANCE MACHINES  

Directory of Open Access Journals (Sweden)

Full Text Available This paper deals with two types of variable reluctance machines, both having linear movement. These are in fact modular structures that can operate even if different kinds of faults occur. One of the presented machines is a linear transverse flux reluctance machine, the other a tubular transverse flux reluctance machine. Both have the same operating principle and, as will be shown, many of the considerations made here are valid for both machines.

Vasile IANCU; Dan-Cristian POPA; Loránd SZABÓ

2009-01-01

362

Experimenting with Component-Based Middleware for Adaptive Fault Tolerant Computing  

CERN Multimedia

This short paper describes early experiments to validate the capabilities of a component-based platform to observe and control a software architecture in the small. This is part of a whole process for resilient computing, i.e. targeting the adaptation of fault-tolerance mechanisms at runtime.

Stoicescu, Miruna; Roy, Matthieu

2012-01-01

363

Comparison between different model of hexapod robot in fault-tolerant gait  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper presents a gait analysis of the equilateral hexagonal model of hexapod robot. Mathematical analysis has been made on mobility, fault-tolerance, and stability. A comparison with the rectangular model of hexapod robot is also given, and it has shown that the hexagonal model shows better tur...

Chu, SKK; Pang, GKH

364

A Fault-Tolerant Routing Protocol for Mobile Ad Hoc Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Multi-hop mobile ad hoc networks (MANETs) consist of nodes and links that are vulnerable to frequent failures. In order to provide fault-tolerance in the network, it is important that the routing protocols take into consideration the redundancy in terms of multiple paths (ideally, disjoi...

Rana Ejaz Ahmed

365

BATRUN DPS (Batch after twilight running-distributed processing system): multi-cell, fault-tolerant, parallel, batch processing for Monte Carlos simulations  

International Nuclear Information System (INIS)

This paper discusses the design of the BATRUN Distributed Processing System (DPS). In contrast to a dedicated cluster of workstations, the scheduling in BATRUN DPS must ensure that only the idle cycles are used for distributed computing and that the local users, when they are operating, have full control of their machines. BATRUN DPS has several unique features: a group-based scheduling policy to ensure execution priority based on ownership of machines, and a multi-cell distributed design to eliminate a single point of failure as well as to ensure scalability. The implementation of the system is based on multi-threading and remote procedure call mechanisms. (author)

1996-01-01

366

BATRUN DPS (Batch after twilight running-distributed processing system): multi-cell, fault-tolerant, parallel, batch processing for Monte Carlos simulations  

Energy Technology Data Exchange (ETDEWEB)

This paper discusses the design of the BATRUN Distributed Processing System (DPS). In contrast to a dedicated cluster of workstations, the scheduling in BATRUN DPS must ensure that only the idle cycles are used for distributed computing and that the local users, when they are operating, have full control of their machines. BATRUN DPS has several unique features: a group-based scheduling policy to ensure execution priority based on ownership of machines, and a multi-cell distributed design to eliminate a single point of failure as well as to ensure scalability. The implementation of the system is based on multi-threading and remote procedure call mechanisms. (author)

Tandiary, Fred; Kothari, Suraj C.; Dixit, Ashish [Iowa State Univ. of Science and Technology, Ames, IA (United States). Dept. of Computer Sciences; Anderson, E. Walter [Iowa State Univ. of Science and Technology, Ames, IA (United States). Dept. of Physics and Astronomy

1996-07-01

367

An Adaptive Fault-Tolerant Event Detection Scheme for Wireless Sensor Networks  

Directory of Open Access Journals (Sweden)

Full Text Available In this paper, we present an adaptive fault-tolerant event detection scheme for wireless sensor networks. Each sensor node detects an event locally in a distributed manner by using the sensor readings of its neighboring nodes. Confidence levels of sensor nodes are used to dynamically adjust the threshold for decision making, resulting in consistent performance even with increasing number of faulty nodes. In addition, the scheme employs a moving average filter to tolerate most transient faults in sensor readings, reducing the effective fault probability. Only three bits of data are exchanged to reduce the communication overhead in detecting events. Simulation results show that event detection accuracy and false alarm rate are kept very high and low, respectively, even in the case where 50% of the sensor nodes are faulty.

Sung-Jib Yim; Yoon-Hwa Choi

2010-01-01
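
Two ingredients mentioned in the abstract above, a moving average filter on a node's own readings and neighbor decisions weighted by confidence, can be sketched roughly as follows. The class and function names, the window size and the way confidence enters the vote are assumptions made for illustration; the paper's actual threshold adjustment rule is not reproduced here.

```python
from collections import deque
from typing import Sequence


class SensorNode:
    """Smooths its own readings with a moving average before deciding."""

    def __init__(self, window: int = 3, threshold: float = 0.5) -> None:
        self.history = deque(maxlen=window)  # keeps the last `window` readings
        self.threshold = threshold

    def local_decision(self, reading: float) -> bool:
        self.history.append(reading)
        return sum(self.history) / len(self.history) >= self.threshold


def weighted_vote(decisions: Sequence[bool],
                  confidences: Sequence[float]) -> bool:
    """Report an event if confident neighbors outweigh the rest.

    The effective decision threshold is half of the total confidence,
    so it moves as the confidence levels of neighbors change.
    """
    total = sum(confidences)
    in_favor = sum(c for d, c in zip(decisions, confidences) if d)
    return total > 0 and in_favor > total / 2
```

A single transient spike in a node's readings is averaged away by the filter, and neighbors whose confidence has dropped have little influence on the vote.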

368

FTEAP: A Fault Tolerant Energy Adaptive and Power Aware Clustering Protocol for Wireless Sensor Networks  

Directory of Open Access Journals (Sweden)

Full Text Available Progress in wireless communication has made possible the development of low-cost wireless sensor networks. Wireless sensor networks (WSNs) with a large number of small sensor nodes can be used for monitoring and controlling the physical environment from a remote location with better accuracy. Because of the shared wireless communication medium and deployment in harsh surroundings, such a network is fault prone. This paper introduces a new fault-tolerant protocol to reduce overall power consumption and maximize the network lifetime in a wireless sensor network. The proposed FTEAP protocol aims to decrease the consumption of network resources in each round of data communication and aggregation. It is a fault-tolerant technique that guarantees trustworthy communication between sensor nodes and the base station by selecting the strongest node as cluster-head and electing a reserve cluster-head. Simulation results demonstrate that the proposed algorithm has higher efficiency and can achieve better network lifetime and energy consumption.

Mehdi Golsorkhtabar Amiri

2010-01-01

369

A Fuzzy Approach for Component Selection amongst Different Versions of Alternatives for a Fault Tolerant Modular Software System under Recovery Block Scheme Incorporating Build-or-Buy Strategy  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Software projects generally have to deal with producing and managing large and complex software products. As the functionality of computer operations become more essential and yet more critical, there is a great need for the development of modular software system. Component-Based Software Engineerin...

P. C. Jha; Ritu Arora; U. Dinesh Kumar

370

Review of Fault Diagnosis in Control Systems  

Directory of Open Access Journals (Sweden)

Full Text Available The major achievements in research on fault detection in control systems are reviewed in detail from three aspects: signal threshold based approaches, signal model based approaches and process model based approaches. Particular emphasis is put on the process model based approaches, which use the closed-loop monitoring information in control systems to establish quantitative and qualitative process models, detecting and then isolating the main failures in sensors, actuators, and the controlled process. The corresponding application problems are stated with the aim of achieving a low false alarm rate and a low missed alarm rate. As a result of the growing demand for higher performance, efficiency, reliability, sensitivity and rapidity in fault diagnosis systems, robust fault detection in the transition process, knowledge acquisition for quantitative and qualitative diagnosis based on processing history data, and hybrid intelligent fault diagnosis system architectures deserve deeper research.

Aishe Shui; Lichuan Liu; Weihong Fang; Shenglin Li; Changhua Xie; Haitao Zhang

2012-01-01

371

A Fuzzy Approach for Component Selection amongst Different Versions of Alternatives for a Fault Tolerant Modular Software System under Recovery Block Scheme Incorporating Build-or-Buy Strategy  

Directory of Open Access Journals (Sweden)

Full Text Available Software projects generally have to deal with producing and managing large and complex software products. As the functionality of computer operations becomes more essential and yet more critical, there is a great need for the development of modular software systems. Component-Based Software Engineering is concerned with composing, selecting and designing components to satisfy a set of requirements while minimizing cost and maximizing reliability of the software system. This paper discusses a fuzzy approach for component selection using a “Build-or-Buy” strategy in designing a software structure. We introduce a framework that helps developers decide whether to buy or build components. If a commercial off-the-shelf (COTS) component is selected, then different versions are available for each alternative of a module and only one version will be selected. If a component is built in-house, then the alternative of a module is selected. Numerical illustrations are provided to demonstrate the model developed.

P. C. Jha; Ritu Arora; U. Dinesh Kumar

2011-01-01

372

A Study on Reliability Improvement of a Fault Tolerant Digital Governor  

Energy Technology Data Exchange (ETDEWEB)

In this paper, a fault tolerant digital governor is designed to provide uninterrupted control and to improve the reliability of the control system. The designed digital governor has a duplex I/O module and a triplex CPU module, together with a 2-out-of-3 voting algorithm and self-diagnostic ability. The processor module of the system (SIDG-3000) is based on a 32-bit industrial microprocessor, which guarantees high quality of the module, and the SRAM for data and the SRAM for commands are separated. The processor module also includes an inter-process communication function and a power back-up function (SRAM for back-up). System reliability is estimated using a Markov process model. It is shown that the reliability of the triplex system over the mission time can be dramatically improved compared with a single control system. The designed digital governor system is applied after modelling the steam turbine generator system of Buk-Cheju Thermal Power Plant. Simulation is carried out to prove the effectiveness of the designed digital governor system. (author). 12 refs., 8 figs.

Shin, M.C. [Namseoul University, Cheonan (Korea); Jeon, I.Y.; Cho, S.H.; Lee, S.G.; Kim, Y.S. [Korea Maritime University, Pusan (Korea)

2002-05-01
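
The 2-out-of-3 voting mentioned in the abstract above can be sketched as a small majority voter. The names `vote_2oo3` and `tmr_reliability` are hypothetical and do not come from the SIDG-3000 system. The reliability helper uses the textbook formula for three independent, identical modules with a perfect voter and no repair, which is a simpler assumption than the Markov model used in the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote_2oo3(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value agreed on by at least two of three modules.

    Returns None when all three outputs disagree (no majority).
    """
    if len(outputs) != 3:
        raise ValueError("exactly three module outputs are required")
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


def tmr_reliability(r: float) -> float:
    """Reliability of a triplex system: at least 2 of 3 modules work.

    Assumes independent modules with equal reliability r and a perfect
    voter: R = 3r^2 - 2r^3.
    """
    return 3 * r**2 - 2 * r**3
```

A single faulty CPU output is masked by the other two, and under the stated assumptions a module reliability of 0.9 gives a triplex reliability of about 0.972.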

373

LOW POWER FAULT TOLERANT SBOX DESIGN FOR XTS-AES ENCRYPTION  

Directory of Open Access Journals (Sweden)

Full Text Available This paper discusses a low power fault tolerant S-Box design for the XTS-AES algorithm, also called the P1619 crypto core, a hard disk encryption standard algorithm developed by SISWG (Security in Storage Working Group). Faults are injected by fault injection circuits, which model hardware failures in the S-Box transformation in every round during the circular shift operation for a block size of 128 bits. The technique applied to correct the fault is component reusability, which requires no extra overhead components or spare circuits. The design has been synthesized in Cadence 90 nm technology with a clock frequency of about 1700 MHz; the cell area obtained is 256980 µm² and the power consumption is 20198.53 µW.

Arun Kumar.P; Pandian.P; Raja Paul Peringham

2013-01-01

374

A Multiple Fault Tolerant Approach with Improved Performance in Cluster Computing  

Directory of Open Access Journals (Sweden)

Full Text Available In the case of multiple node failures, performance is very low compared to a single node failure. Failures of nodes in cluster computing can be tolerated by multiple fault tolerant computing. In this paper, we propose a multiple fault tolerant technique with improved failure detection and performance. Failure detection is done by an improved adaptive heartbeat-based algorithm to improve the degree of confidence and accuracy. Failure recovery is based on reassignment of load with a rank-based algorithm. Performance is achieved by distributing the load among all available nodes with a dynamic rank-based balancing algorithm. The dynamic ranking algorithm is a low-overhead algorithm for reassigning tasks uniformly among all available nodes. Message logging is used to recover from message loss.

Sanjay Bansal; Sanjeev Sharma; Rajiv Gandhi Prodhyogiki Vishwavidya

2011-01-01
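
The abstract above does not give the details of its adaptive heartbeat algorithm, so the following is only a generic sketch of the idea: the timeout adapts to recently observed heartbeat inter-arrival times instead of being fixed. The class name, the window size, the factor `k` and the `margin` are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveHeartbeatDetector:
    """Suspect a node when its heartbeat is later than an adaptive timeout."""

    def __init__(self, window=10, k=3.0, margin=0.1, initial_timeout=1.0):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times
        self.k = k                             # weight on observed jitter
        self.margin = margin                   # fixed safety margin (seconds)
        self.initial_timeout = initial_timeout
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def timeout(self):
        """Current timeout: mean interval + k * std deviation + margin."""
        if len(self.intervals) < 2:
            return self.initial_timeout
        return mean(self.intervals) + self.k * pstdev(self.intervals) + self.margin

    def suspected(self, now):
        """True if no heartbeat has arrived within the adaptive timeout."""
        if self.last_arrival is None:
            return False
        return now - self.last_arrival > self.timeout()
```

A node with regular heartbeats gets a tight timeout and is suspected quickly after it stops; a node with jittery heartbeats gets a longer timeout, which trades detection time for fewer false suspicions.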

375

Active fault diagnosis in closed-loop uncertain systems  

DEFF Research Database (Denmark)

Fault diagnosis of parametric faults in closed-loop uncertain systems by using an auxiliary input vector is considered in this paper, i.e. active fault diagnosis (AFD). The active fault diagnosis is based directly on the so-called fault signature matrix, related to the YJBK (Youla, Jabr, Bongiorno and Kucera) parameterization. Conditions are given for exact detection and isolation of parametric faults in closed-loop uncertain systems.

Niemann, Hans Henrik

2006-01-01

376

Harmonic Analysis and Fault-Tolerant Capability of a Semi-12-Phase Permanent-Magnet Synchronous Machine Used for EVs  

Directory of Open Access Journals (Sweden)

Full Text Available This paper deals with a fault-tolerant semi-12-phase permanent-magnet synchronous machine (PMSM) used for electric vehicles. High fault tolerance and low torque ripple are achieved by employing fractional slot concentrated windings (FSCWs) and open windings. Excessive magnetomotive force (MMF) harmonic components can lead to thermal demagnetization of rotor magnets as well as high core loss. An improved all-teeth-wound winding disposition that changes the winding factor of each harmonic is applied to suppress harmonics. A relatively large slot leakage inductance that limits the short-circuit current (SCC) induced in the short-circuited winding is proposed to deal with short-circuit faults. Fault-tolerant controls with up to two phases open-circuited are investigated in this paper based on keeping the same torque-producing MMF. The fault-tolerant control strategies corresponding to each faulty mode are studied and compared to ensure high performance operation.

Ping Zheng; Fan Wu; Yi Sui; Pengfei Wang; Yu Lei; Haipeng Wang

2012-01-01

377

Scope of Reversible Engineering at Gate-Level : Fault - Tolerant Combinational Adders  

Directory of Open Access Journals (Sweden)

Full Text Available Reversible engineering has been one of the thrust areas ensuring the continual process of innovation trends that explore and sustain the resources of nature. Reversible engineering is used in many fields such as quantum computing, low power CMOS design, nanotechnology, optical information processing, digital signal processing and cryptography. These are the digital domain implementations of reversible and fault-tolerant logic gates. Any arbitrary Boolean function can be synthesized by using the proposed parity preserving reversible gates. Not only is error detection inherent in the proposed high speed adders at their output side, but any fault that affects no more than a single signal is also detectable. The fault tolerant reversible full adder circuits are realized by using two IG gates only. The derived fault tolerant full adder is used for designing other arithmetic-logic circuits by using it as a fundamental building block. The proposed reversible gate is designed to have low hardware complexity and to be efficient in terms of gate count, garbage outputs and constant inputs. In this paper, we design a BCD adder using carry select logic, and carry-select and bypass adders using FG gates and newly designed TG gates.

M.Bharathi; K.Neelima

2012-01-01
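
The two properties the abstract above relies on, reversibility and parity preservation, can be checked exhaustively for small gates. The sketch below does not reproduce the paper's IG or TG gates, whose definitions are not given here. It uses three standard textbook gates as stand-ins (the Feynman gate, the Feynman double gate and the Fredkin gate), and the helper names are hypothetical.

```python
from itertools import product


def feynman(a, b):
    """Feynman (controlled-NOT) gate: (a, b) -> (a, a XOR b)."""
    return (a, a ^ b)


def feynman_double(a, b, c):
    """Feynman double gate: (a, b, c) -> (a, a XOR b, a XOR c)."""
    return (a, a ^ b, a ^ c)


def fredkin(a, b, c):
    """Fredkin (controlled-swap) gate: swap b and c when a is 1."""
    return (a, c, b) if a else (a, b, c)


def is_reversible(gate, n):
    """A gate is reversible if it maps the 2**n inputs to 2**n distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n


def is_parity_preserving(gate, n):
    """Input parity equals output parity for every input pattern."""
    return all(
        sum(bits) % 2 == sum(gate(*bits)) % 2
        for bits in product((0, 1), repeat=n)
    )
```

All three gates are reversible, but only the Feynman double gate and the Fredkin gate preserve parity. With a parity-preserving gate, a fault that flips a single signal changes the output parity, which is why such a fault is detectable at the outputs.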

378

More Improvement by Helping Ant to Fault-Tolerant Heuristic Routing Algorithm in Mesh Networks  

Directory of Open Access Journals (Sweden)

Full Text Available Routing with fault-tolerant mechanisms has a crucial effect on the fast exchange of information in a variety of networks, including mesh networks. This study attempts to choose an optimal path in terms of fault tolerance to transmit messages from source to destination while taking into account faulty nodes in such mesh networks. In this study, we take advantage of the ant colony optimization algorithm to propose Adaptive Heuristic Routing algorithms for this problem. We use color pheromone ants to overcome the problem of fail-recover behavior of network components. The proposed method is compared with a fault-tolerant routing algorithm in mesh networks using the balanced ring. Simulation results show that this method reacts quickly to network faults, while in each time step the data can choose the optimal path to reach its destination. In this study, we improve the performance of the proposed method using update ants to inform other nodes about the discovered shortest path. Simulation results show that the proposed method dramatically increases the efficiency of the routing mechanism in mesh networks.

Alireza Soleimany; Somayeh Azmoodeh

2013-01-01

379

Study of a Nine-Phase Fault Tolerant Permanent Magnet Starter-Alternator  

Directory of Open Access Journals (Sweden)

Full Text Available The paper presents a study on a nine-phase permanent magnet synchronous starter-alternator for automotive applications, analyzing different converter topologies, detailing the simulation programs and discussing the results in different operating conditions, from entire healthy machine to several faulted phases. The comparison between the two converter topologies controlling the multiphase machine highlights the increased fault tolerance, hence the reliability of such starter-alternator structures. Nevertheless, the article is within the area of interest of nowadays research in the field of automotive applications. Replacing two machines (the starter and the alternator) of a vehicle is the motivation for designing and studying starter-alternator structures.

RUBA Mircea; SURDU Felicia; SZABÓ Loránd

2011-01-01

380

Study of Intelligence Diagnosis System for Wind Turbine Gearbox Fault  

Directory of Open Access Journals (Sweden)

Full Text Available Based on the current application and maintenance situation of wind turbine gearboxes, this paper analyzes a remote fault diagnosis system for monitoring fault occurrence and diagnosing faults, which combines an Expert System and Artificial Neural Networks. A practical fault instance of a wind turbine gearbox is analyzed to define the expert system’s knowledge base structure and reasoning method. Its feasibility in fault diagnosis is verified using Matlab.

Xiang Ling; Cui Wei

2013-01-01

 
 
 
 
381

Fault-tolerant quantum computation with a soft-decision decoder for error correction and detection by teleportation  

Science.gov (United States)

Fault-tolerant quantum computation with quantum error-correcting codes has been considerably developed over the past decade. However, there are still difficult issues, particularly on the resource requirement. For further improvement of fault-tolerant quantum computation, here we propose a soft-decision decoder for quantum error correction and detection by teleportation. This decoder can achieve almost optimal performance for the depolarizing channel. Applying this decoder to Knill's C4/C6 scheme for fault-tolerant quantum computation, which is one of the best schemes so far and relies heavily on error correction and detection by teleportation, we dramatically improve its performance. This leads to substantial reduction of resources.

Goto, Hayato; Uchikawa, Hironori

2013-01-01

382

Theoretical Investigation of Fault System Evolution  

Science.gov (United States)

In the past several years new results have been obtained which strongly indicate that earthquake fault systems are highly correlated. One example is the work of Bowman et al. [1], which indicates a correlated region, approximately ten times the size of the region of slip, in the vicinity of the ruptured fault which affects the rupture. A second example is the work of Rundle et al. [2], which indicates a region of approximately the same size as found in reference 1 in which precursory activity or quiescence can be used to forecast events on a fault system. In this talk we will present results of a theoretical analysis of the Langevin equation [3] that describes earthquake faults. This analysis indicates that earthquake fault systems are highly correlated and are similar to systems in condensed matter such as a free quantum particle in a random potential [4]. [1] D. D. Bowman, G. Ouillon, C. G. Sammis, A. Sornette and D. Sornette, "An Observational test of the Critical Earthquake Concept", J. Geophys. Res. 103, 24,359-24,372. [2] W. Klein, M. Anghel, C. D. Ferguson, J. B. Rundle and J. S. sa Martins, "Statistical Analysis of a Model for Earthquake Faults with Long Range Stress Transfer", in "Geocomplexity and the Physics of Earthquakes", J. B. Rundle, D. L. Turcotte and W. Klein eds. (AGU Monograph 120). [3] J. B. Rundle, W. Klein, K. Tiampo and S. Gross, "Dynamics of Seismicity Patterns in Systems of Earthquake Faults", in "Geocomplexity and the Physics of Earthquakes", J. B. Rundle, D. L. Turcotte and W. Klein eds. (AGU Monograph 120). [4] W. Klein, J. B. Rundle and K. F. Tiampo (in preparation).

Klein, W.; Rundle, J. B.; Tiampo, K. F.

2002-12-01

383

Multi-star multi-phase winding for a high power naval propulsion machine with low ripple torques and high fault tolerant ability  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, an original multi-phase Surface Mounted Permanent Magnet (SMPM) Machine designed for naval propulsion is proposed. The design objective of this high power low speed machine is twofold: to enhance the fault tolerance capability of the system and to optimize the quality of the torque by...

SCUILLER, Franck; Charpentier, Jean-Frederic; SEMAIL, Eric

384

Felhantering med Operatoersanpassat Funktionsoevervakningssystem (Fault Management with an Operator-Friendly Fault Identification System).  

Science.gov (United States)

In the report an experimental study of operators' fault management with the help of two fault identification systems is described. The study was conducted in SAAB Avionics flight simulator T3Sim. The two systems studied were the current fault identificatio...

M. Caster; S. Magnusson

2002-01-01

385

Selection of a Checkpoint Interval in Coordinated Checkpointing Protocol for Fault Tolerant Open MPI  

Directory of Open Access Journals (Sweden)

Full Text Available The goal of this paper is to address the selection of an efficient checkpoint interval which reduces the total overhead cost due to the checkpointing and restarting of applications in a distributed system environment. A coordinated checkpointing rollback recovery protocol is used for making the application programs fault tolerant on a stand-alone system under no-load conditions using BLCR and Open MPI at system level. We present an experimental study in which we use the optimum checkpoint interval determined by an existing model to compare the performance of the coordinated checkpointing protocol using two types of checkpoint intervals, namely fixed and incremental checkpoint intervals. We measured the checkpoint cost, rollback cost and total cost of overheads caused by these two methods. Failures are simulated using the Poisson distribution with one failure per hour, and the inter-arrival time between failures follows an exponential distribution. We have observed from the results that the rollback overhead and the total cost of overheads due to checkpointing the application are much higher with the incremental checkpoint interval method than with the fixed checkpoint interval method. Hence, we conclude that the fixed checkpoint interval method is more efficient, as it considerably reduces the rollback overhead and the total cost of overheads.

Mallikarjuna Shastry P.M; K. Venkatesh

2010-01-01
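
The abstract above does not say which existing model supplies the optimum checkpoint interval. As an illustration only, the sketch below assumes Young's first-order approximation, T = sqrt(2 · C · MTBF), and generates failure times with exponential inter-arrival times as the abstract describes (one failure per hour on average). The function names and the two-second checkpoint cost are hypothetical.

```python
import math
import random


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's first-order optimum checkpoint interval, in seconds.

    checkpoint_cost_s: time to write one checkpoint.
    mtbf_s: mean time between failures.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def failure_times(rate_per_s, horizon_s, seed=0):
    """Failure instants of a Poisson process up to horizon_s.

    Inter-arrival times are drawn from an exponential distribution
    with the given rate; the seed makes the run repeatable.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            return times
        times.append(t)


if __name__ == "__main__":
    mtbf = 3600.0  # one failure per hour, as in the abstract
    print(young_interval(2.0, mtbf))            # assumed 2 s checkpoint cost
    print(failure_times(1.0 / mtbf, 8 * mtbf))  # an 8-hour simulated run
```

A fixed-interval scheme would checkpoint every `young_interval(...)` seconds; an incremental scheme would vary the spacing over the run, which is the comparison the study carries out experimentally.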

386

Fault diagnosis and accommodation of a three-tank system based on analytical redundancy.  

UK PubMed Central (United Kingdom)

This paper investigates the application of a fault diagnosis and accommodation method to a real system composed of three tanks. The performance of a closed-loop system can be altered by the occurrence of faults which can, in some circumstances, cause serious damage to the system. The research goal is to prevent system deterioration by developing a controller that has some capability to compensate for faults, that is, fault accommodation or fault-tolerant control. In this paper, a two-step scheme composed of a fault detection, isolation and estimation module, and a control compensation module is presented. The main contribution is to develop a unique structured residual generator able to isolate and estimate both sensor and actuator faults. This estimation is of paramount importance to compensate for these faults and to preserve the system performance. The application of this method to the three-tank system gives encouraging results, which are presented and commented on for various kinds of faults.

Theilliol D; Noura H; Ponsart JC

2002-07-01

387

An Adaptive Fault-Tolerant Communication Scheme for Body Sensor Networks  

CERN Multimedia

A high degree of reliability for critical data transmission is required in body sensor networks (BSNs). However, BSNs are usually vulnerable to channel impairments due to body fading effect and RF interference, which may potentially cause data transmission to be unreliable. In this paper, an adaptive and flexible fault-tolerant communication scheme for BSNs, namely AFTCS, is proposed. AFTCS adopts a channel bandwidth reservation strategy to provide reliable data transmission when channel impairments occur. In order to fulfill the reliability requirements of critical sensors, fault-tolerant priority and queue are employed to adaptively adjust the channel bandwidth allocation. Simulation results show that AFTCS can alleviate the effect of channel impairments, while yielding lower packet loss rate and latency for critical sensors at runtime.

Wu, Guowei; Xia, Feng; Xu, Zichuan; 10.3390/s101109590

2010-01-01

388

Fault-tolerant mechanism of both job execution and file transfer for integrated nuclear energy simulation  

International Nuclear Information System (INIS)

By integrating several simulation codes, each of which simulates a physical process or a part of a nuclear energy facility, large-scale and detailed simulations can be carried out. Such integrated simulations require several weeks or months of CPU time. To cope with unscheduled outages of computers or the network, we have developed a fault-tolerant mechanism for cooperative execution of the codes. The mechanism covers abnormal termination of jobs on supercomputers and errors in file transfers. When a computer suffers an unexpected outage, the mechanism tries to submit the simulation job to an alternative computer. Furthermore, by comparing the size of the files before and after transfer, the mechanism detects transfer errors. Because the fault-tolerant mechanism links the jobs to the file transfers, the execution order of the codes can be decided by defining the file flow. Therefore we can operate integrated simulations in which the codes are executed sequentially or concurrently. (author)

2010-01-01
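
The two recovery actions described in the abstract above, detecting a bad transfer by comparing file sizes and resubmitting a job to an alternative computer, can be sketched as follows. This is a local stand-in, not the authors' mechanism: `shutil.copyfile` takes the place of a real network transfer, and `submit` is a hypothetical callable supplied by the caller that raises `RuntimeError` when a host is unavailable.

```python
import os
import shutil


def transfer_and_verify(src, dst):
    """Copy src to dst and report whether the sizes match afterwards."""
    shutil.copyfile(src, dst)
    return os.path.getsize(src) == os.path.getsize(dst)


def submit_with_failover(job, hosts, submit):
    """Try each host in order; return (host, result) for the first success.

    `submit(job, host)` is assumed to raise RuntimeError on an outage.
    """
    for host in hosts:
        try:
            return host, submit(job, host)
        except RuntimeError:
            continue  # host is down: fall through to the next one
    raise RuntimeError("job could not be submitted to any host")
```

A size comparison catches truncated transfers cheaply but not corrupted content of the same length; a checksum would be a stricter check, at the cost of reading each file in full.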

389

Fault Tolerant Modular Linear Motor for Safe-Critical Automated Industrial Applications  

Directory of Open Access Journals (Sweden)

Full Text Available In various safe-critical industrial, medical and defence applications the translational movements are performed by linear motors. In such applications both the motor and its power converter should be fault tolerant. To fulfil this requirement, redesigned motor structures with novel phase connections must be used. In the paper a modular double salient permanent magnet linear motor is studied. Its phases are split into independent channels. The study of the fault tolerant capability of the linear motor was performed via co-simulation, using the Flux-to-Simulink Technology. The conclusions of the paper could help users to select the optimal linear motor topology for their particular application, as a function of the required mean traction force and its acceptable ripple.

Loránd SZABÓ; Mircea RUBA; Ernő KOVÁCS; Viktor FÜVESI

2009-01-01

390

Fault tolerant deterministic quantum communications using GHZ states over collective-noise channels  

Science.gov (United States)

This study proposes two new coding functions for a GHZ state and a GHZ-like state, respectively. Based on these coding functions, two fault tolerant deterministic quantum communication (DQC) protocols are proposed. Each of the new DQC protocols is robust against one kind of collective noise: collective-dephasing noise and collective-rotation noise, respectively. The sender can use the proposed coding functions to encode his/her message, and the receiver can perform the Bell measurement to obtain the sender's message. In comparison to the existing fault tolerant DQC protocols over collective-noise channels, the proposed protocols provide the best qubit efficiency. Moreover, the proposed protocols are also free from ordinary eavesdropping and information leakage.

Yang, Chun-Wei; Tsai, Chia-Wei; Hwang, Tzonelih

2013-09-01

391

FATAL+: A Self-Stabilizing Byzantine Fault-tolerant Clocking Scheme for SoCs  

CERN Document Server

We present the concept and implementation of a self-stabilizing Byzantine fault-tolerant distributed clock generation scheme for multi-synchronous GALS architectures in critical applications. It combines a variant of a recently introduced self-stabilizing algorithm for generating low-frequency, low-accuracy synchronized pulses with a simple non-stabilizing high-frequency, high-accuracy clock synchronization algorithm. We provide thorough correctness proofs and a performance analysis, which use methods from fault-tolerant distributed computing research but also address hardware-related issues like metastability. The algorithm, which consists of several concurrent communicating asynchronous state machines, has been implemented in VHDL using Petrify in conjunction with some extensions, and synthesized for an Altera Cyclone FPGA. An experimental validation of this prototype has been carried out to confirm the skew and clock frequency bounds predicted by the theoretical analysis, as well as the very short stabiliz...

Dolev, Danny; Lenzen, Christoph; Posch, Markus; Schmid, Ulrich; Steininger, Andreas

2012-01-01

392

Synaptic Weight Noise During MLP Learning Enhances Fault-Tolerance, Generalisation and Learning Trajectory  

UK PubMed Central (United Kingdom)

We analyse the effects of analog noise on the synaptic arithmetic during MultiLayer Perceptron training, by expanding the cost function to include noise-mediated penalty terms. Predictions are made in the light of these calculations which suggest that fault tolerance, generalisation ability and learning trajectory should be improved by such noise-injection. Extensive simulation experiments on two distinct classification problems substantiate the claims. The results appear to be perfectly general for all training schemes where weights are adjusted incrementally, and have wide-ranging implications for all applications, particularly those involving "inaccurate" analog neural VLSI. 1 Introduction. This paper demonstrates both by consideration of the cost function and the learning equations, and by simulation experiments, that injection of random noise onto MLP weights during learning enhances fault-tolerance without additional supervision. We also show that the nature of the hidden node s...

Alan F. Murray; Peter J. Edwards

393

A Fault Tolerant Trajectory Clustering (FTTC) for selecting cluster heads in Wireless Sensor Networks  

CERN Multimedia

Wireless sensor networks (WSNs) suffer from the hot spot problem, where the sensor nodes closest to the base station need to relay more packets than the nodes farther away from the base station. Thus, the lifetime of a sensor network depends on these closest nodes. Clustering methods are used to extend the lifetime of a wireless sensor network. Current clustering algorithms usually utilize two techniques: selecting cluster heads with more residual energy, and rotating cluster heads periodically to distribute the energy consumption among nodes in each cluster and lengthen the network lifetime. Most of the algorithms use random selection for selecting the cluster heads. Here, we propose a Fault Tolerant Trajectory Clustering (FTTC) technique for selecting the cluster heads in WSNs. Our algorithm selects the cluster heads based on traffic and rotates them periodically. It provides the first Fault Tolerant Trajectory based clustering technique for selecting the cluster heads and to extenuate the hot spot proble...

Munaga, Hazarath; Venkateswarlu, N B

2011-01-01

394

On the Mobility and Fault Tolerance of Closed Chain Manipulators with Passive Joints  

Directory of Open Access Journals (Sweden)

Full Text Available A systematic analysis of the mobility of closed chain manipulators with passive joints is presented. The main observation in this paper is that the mobility of the manipulator, considering the passive joints only, should always be zero. Further, for the manipulator to be fault tolerant, the mobility should remain zero when actuator failure occurs for an arbitrary joint. We present a simple and rigorous approach to the problem of finding the smallest set of active joints for which the manipulator remains equilibrated with respect to free swinging joint failure in any joint. Several examples of how to choose the active joints for different mechanisms to guarantee that the manipulator is equilibrated and fault tolerant are presented.

Pål J. From; Jan T. Gravdahl

2008-01-01

395

ENERGY CONSERVED FAULT TOLERANT CLUSTERS WITH QoS ROUTING IN WIRELESS AD HOC NETWORK  

Directory of Open Access Journals (Sweden)

Full Text Available Wireless networks are becoming increasingly popular and can be employed in a wide range of real-world applications, so it is necessary to provide good Quality of Service (QoS) for distributing video, voice and data. To give different priorities to different types of applications, QoS provides numerous mechanisms, and it is chiefly used in fields such as defense and the military. A MANET, a wireless ad hoc network, may not be able to provide good QoS because it operates in an infrastructure-less environment. Because the nodes move independently, the topology of the network changes frequently. Numerous routing protocols are available to provide QoS for MANETs; however, in all those techniques the path link or communication link overhead increases, which tends to shorten the lifetime of the network. To extend the lifetime of the network and to reduce path link failures, in this work we provide a novel technique which preserves the energy level of the network by balancing both energy level and mobility rate. The proposed work provides QoS in MANETs by clustering the network based on energy consumption, fault tolerance rate, and mobility rate. The clustering approach considers only local fault tolerance rather than global fault tolerance to detect and process the mobility rate of the nodes in the network. This energy conserved clustering for QoS in MANETs extends the life span of both the nodes and the network. An experimental evaluation is carried out to estimate the performance of the proposed Energy Conserved Fault-tolerant Clusters with QoS Routing in wireless ad-hoc networks [ECFCR] in terms of communication overhead, recovery time and failure rate.

Dr. Thangaraj. P, Renuka. M, Dr.S.N. Sivanandam

2012-01-01

396

Fault-tolerant corrector/detector chip for high-speed data processing  

Energy Technology Data Exchange (ETDEWEB)

An internally fault-tolerant data error detection and correction integrated circuit device (10) and a method of operating same. The device functions as a bidirectional data buffer between a 32-bit data processor and the remainder of a data processing system, and a 32-bit datum is provided with a relatively short eight bits of data-protecting parity. The 32 bits of data and eight bits of parity are partitioned into eight 4-bit nibbles and two 4-bit nibbles, respectively. For data flowing towards the processor the data and parity nibbles are checked in parallel and in a single operation employing a dual orthogonal basis technique. The dual orthogonal basis increases the efficiency of the implementation. Any one of ten (eight data, two parity) nibbles is correctable if erroneous, or two different erroneous nibbles are detectable. For data flowing away from the processor the appropriate parity nibble values are calculated and transmitted to the system along with the data. The device regenerates parity values for data flowing in either direction and compares regenerated to generated parity with a totally self-checking equality checker. As such, the device is self-validating and enabled to both detect and indicate an occurrence of an internal failure. A generalization of the device to protect 64-bit data with 16-bit parity to protect against byte-wide errors is also presented.

Andaleon, David D. (San Ramon, CA); Napolitano, Jr., Leonard M. (Danville, CA); Redinbo, G. Robert (Davis, CA); Shreeve, William O. (Fayetteville, NY)

1994-01-01

397

Formal Analysis of Fault-tolerant Algorithm in the Time-triggered Architecture  

Directory of Open Access Journals (Sweden)

Full Text Available Time-Triggered architecture (TTA) provides a computing infrastructure for the design and implementation of dependable distributed systems. The core building block of the TTA is the communication protocol TTP/C. This protocol has been designed to provide no faulty nodes. TTP/C integrates a set of fault-tolerant services such as message transmission, clock synchronization and the Group Membership Protocol (GMP). The GMP protocol ensures that each TTA node maintains a private membership set, which records all the nodes that are believed to be nonfaulty. In the GMP protocol previously studied in the literature, any detected faulty node is immediately excluded from the group. This gradual exclusion process risks invalidating the protocol after N-3 successive failures if faulty node reintegration is not implemented. Our contribution in this paper is to remedy this serious problem. Node reintegration increases system survivability by allowing a (recovering) transiently-faulty node to rejoin a group. Our proposed algorithm, devoted to node reintegration inside the group membership protocol, is formally specified and verified using a diagrammatic representation. The verification of the proposal has been checked with the well known PVS theorem prover.

Aliouat Zibouda; Aliouat Makhlouf; Batouche Chawki

2007-01-01

398

ALLIANCE: An architecture for fault tolerant, cooperative control of heterogeneous mobile robots  

Energy Technology Data Exchange (ETDEWEB)

This research addresses the problem of achieving fault tolerant cooperation within small- to medium-sized teams of heterogeneous mobile robots. The author describes a novel behavior-based, fully distributed architecture, called ALLIANCE, that utilizes adaptive action selection to achieve fault tolerant cooperative control in robot missions involving loosely coupled, largely independent tasks. The robots in this architecture possess a variety of high-level functions that they can perform during a mission, and must at all times select an appropriate action based on the requirements of the mission, the activities of other robots, the current environmental conditions, and their own internal states. Since such cooperative teams often work in dynamic and unpredictable environments, the software architecture allows the team members to respond robustly and reliably to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. After presenting ALLIANCE, the author describes in detail experimental results of an implementation of this architecture on a team of physical mobile robots performing a cooperative box pushing demonstration. These experiments illustrate the ability of ALLIANCE to achieve adaptive, fault-tolerant cooperative control amidst dynamic changes in the capabilities of the robot team.

Parker, L.E.

1995-02-01

399

A master system for power system fault phenomena  

Energy Technology Data Exchange (ETDEWEB)

This report covers the following: a real time digital simulator; a remote measuring, analyzing and reproducing system for power system fault data; a power system reduction method program using EMTP; and a test system for protection devices. (author). 22 refs., 38 figs.

Yoo, Myung Ho; Jang, Sang Ho; Hong, Joon Hee; Min, Wan Ki; Yoo, Chang Hwan [Korea Electric Power Corp. (KEPCO), Taejon (Korea, Republic of). Research Center

1995-12-31

400

FT-GReLoSSS: a Skeletal-based approach towards application parallelization and low-overhead fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

FT-GReLoSSS (FTG) is a C++/MPI framework to ease the development of fault-tolerant parallel applications belonging to a SPMD family termed GReLoSSS. The originality of FTG is to rely on the MoLOToF programming model principles to facilitate the addition of an efficient checkpoint-based fault toleran...

Makassikis, Constantinos; Vialle, Stéphane; Warin, Xavier

 
 
 
 
401

A Fuzzy-Based Strategy to Improve Control Reconfiguration Performance of a Sensor Fault-Tolerant Induction Motor Propulsion  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This short paper deals with the transition performance improvement of a sensor fault-tolerant controller devoted to automotive applications. Indeed, improvements are brought over a previously developed technique that exhibits abrupt changes in the torque if a sensor fault is detected and after a tran...

Tabbache, Bekheira; Benbouzid, Mohamed; Kheloui, Abdelaziz; Bourgeot, Jean-Matthieu

402

Multiprocessor architecture ensures fault-tolerant transaction processing  

Energy Technology Data Exchange (ETDEWEB)

A recent entry in the expanding market for continuous-processing systems is a fail-safe computer comprised of tightly-coupled general-purpose processors and specialised input/output processors. The processors in Synapse Computer Corp.'s n+1 online transaction-processing system use a proprietary nonwrite-through cache memory and can access reconfigurable, shared main memory over dual 32-Mbyte-per-second buses. Access protection is achieved by integrating the relational database management system, the transaction processing manager and the synthesis operating system into a set of protection spheres. Synchronisation of the database and transaction processing systems provides automatic application checkpointing and recoverability.

Inselberg, A.D.

1983-04-01

403

Guidelines for system modeling: fault tree analysis  

Energy Technology Data Exchange (ETDEWEB)

This document, a set of guidelines for system modeling by Fault Tree Analysis (FTA), is intended to help the analyst construct fault trees at the level of Capability Category II of the ASME PRA standard. In particular, it provides the essential, basic guidelines and related material to support revision of the Ulchin 3 and 4 PSA model for the risk monitor within Capability Category II of the ASME PRA standard. Normally the main objective of system analysis is to assess the reliability of the systems modeled in Event Tree Analysis (ETA). A variety of analytical techniques can be used for system analysis; this procedures guide uses FTA. FTA represents the failure logic of plant systems deductively using AND, OR and NOT gates. The fault tree should reflect all possible failure modes that may contribute to system unavailability, including contributions from mechanical failures of components, Common Cause Failures (CCFs), human errors, and outages for testing and maintenance. This document identifies and describes the definitions and general procedures of FTA and the essential, basic guidelines for revising the fault trees. Accordingly, the guidelines can bring the FTA up to the level of Capability Category II of the ASME PRA standard.

Lee, Yoon Hwan; Yang, Joon Eon; Kang, Dae Il; Hwang, Mee Jeong

2004-07-01
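
The gate logic described in this record (failure logic built deductively from AND, OR and NOT gates over basic events) can be illustrated with a minimal Python sketch. The tree below is hypothetical and is not taken from the Ulchin 3 and 4 PSA model: the names `Basic`, `Gate`, `failed`, `pump_a`, `pump_b`, `ccf` and `human_error` are assumptions chosen only to show mechanical failures, a common cause failure and a human error feeding one top event.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Basic:
    """A basic event, e.g. a component failure (hypothetical names)."""
    name: str


@dataclass(frozen=True)
class Gate:
    """A logic gate over child nodes; kind is 'AND', 'OR' or 'NOT'."""
    kind: str
    inputs: Tuple["Node", ...]


Node = Union[Basic, Gate]


def failed(node: Node, state: Dict[str, bool]) -> bool:
    """Return True if the event represented by `node` occurs.

    `state` maps each basic event name to True (event occurred)
    or False (event did not occur).
    """
    if isinstance(node, Basic):
        return state[node.name]
    values = [failed(child, state) for child in node.inputs]
    if node.kind == "AND":
        return all(values)
    if node.kind == "OR":
        return any(values)
    if node.kind == "NOT":
        if len(values) != 1:
            raise ValueError("NOT gate takes exactly one input")
        return not values[0]
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical top event: the system is unavailable if both redundant
# pumps fail, or a common cause failure occurs, or an operator errs.
top = Gate("OR", (
    Gate("AND", (Basic("pump_a"), Basic("pump_b"))),
    Basic("ccf"),
    Basic("human_error"),
))

all_ok = {"pump_a": False, "pump_b": False, "ccf": False, "human_error": False}
```

For instance, `failed(top, {**all_ok, "pump_a": True})` evaluates to False because the redundant pump still works, while setting `ccf` alone to True makes the top event occur.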

404

ROSE::FTTransform - A Source-to-Source Translation Framework for Exascale Fault-Tolerance Research  

Energy Technology Data Exchange (ETDEWEB)

Exascale computing systems will require sufficient resilience to tolerate numerous types of hardware faults while still assuring correct program execution. Such extreme-scale machines are expected to be dominated by processors driven at lower voltages (near the minimum 0.5 volts for current transistors). At these voltage levels, the rate of transient errors increases dramatically due to the sensitivity to transient and geographically localized voltage drops on parts of the processor chip. To achieve power efficiency, these processors are likely to be streamlined and minimal, and thus they cannot be expected to handle transient errors entirely in hardware. Here we present an open, compiler-based framework to automate the armoring of High Performance Computing (HPC) software to protect it from these types of transient processor errors. We develop an open infrastructure to support research work in this area, and we define tools that, in the future, may provide more complete automated and/or semi-automated solutions to support software resiliency on future exascale architectures. Results demonstrate that our approach is feasible, pragmatic in how it can be separated from the software development process, and reasonably efficient (0% to 30% overhead for the Jacobi iteration on common hardware; and 20%, 40%, 26%, and 2% overhead for a randomly selected subset of benchmarks from the Livermore Loops [1]).

Lidman, J; Quinlan, D; Liao, C; McKee, S

2012-03-26
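
The abstract does not state which transformations the framework applies when armoring code. As an illustration only, one widely used software-level defence against transient errors is redundant execution with majority voting. The Python sketch below shows that idea at run time rather than as a source-to-source translation, and every name in it (`run_with_voting`, `make_flaky_sum`) is hypothetical and not part of ROSE.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def run_with_voting(compute: Callable[[], Hashable], copies: int = 3) -> Hashable:
    """Run `compute` several times and return the majority result.

    Raises RuntimeError when no value is produced by a strict majority
    of the runs, i.e. the error could be detected but not corrected.
    """
    results = [compute() for _ in range(copies)]
    value, count = Counter(results).most_common(1)[0]
    if count <= copies // 2:
        raise RuntimeError("no majority among redundant results")
    return value


def make_flaky_sum(data: Sequence[int], faulty_call: int) -> Callable[[], int]:
    """Build a summation whose `faulty_call`-th invocation is corrupted.

    The corruption simulates a transient processor error; it is an
    assumption made for this demonstration only.
    """
    calls = {"n": 0}

    def compute() -> int:
        calls["n"] += 1
        total = sum(data)
        if calls["n"] == faulty_call:
            total += 1  # simulated transient error
        return total

    return compute


# The second of three redundant runs is corrupted; voting masks it.
voted = run_with_voting(make_flaky_sum([1, 2, 3], faulty_call=2))
```

The cost of this scheme is roughly one extra execution per redundant copy, which is why selective protection of only the vulnerable code regions is attractive when overhead matters.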

405

Fault tree analysis of refinery utility systems  

Energy Technology Data Exchange (ETDEWEB)

This paper presents a reliability analysis of selected petroleum refinery utility systems. Fault tree analysis is used to identify system design and operation contributions to refinery unavailability. Recommendations for utility systems that will increase refinery productivity are also discussed. High plant productivity is an objective of process plant design and operation. Availability is often used to measure plant productivity. Plant availability can be improved by reducing the number of process shutdowns or minimizing the duration of shutdowns. One method to improve availability is to determine the systems that contribute most to plant unavailability and then use fault tree analysis to determine system design and operating weaknesses. Then, resources can be allocated for cost-effective improvements to those systems that contribute most to plant unavailability. This study indicates the need for developing emergency procedures to mitigate the effects of utility system failures on refinery availability. Concise, well-designed emergency procedures can significantly improve plant productivity without major capital expense.

Arendt, J.S.

1983-01-01
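
The relationship this record describes between availability, the number of shutdowns and their duration can be made concrete with a short Python sketch. It assumes the common definition of availability as the fraction of a period spent operating. The function name `availability` and all figures are hypothetical and do not come from the paper.

```python
def availability(period_hours: float, shutdowns: int,
                 mean_shutdown_hours: float) -> float:
    """Fraction of the period during which the plant is operating.

    Downtime is modelled as (number of shutdowns) x (mean duration),
    so reducing either factor raises availability.
    """
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    downtime = shutdowns * mean_shutdown_hours
    if downtime > period_hours:
        raise ValueError("downtime exceeds the period")
    return 1.0 - downtime / period_hours


# Hypothetical year of operation (8760 h) with four shutdowns.
baseline = availability(8760, shutdowns=4, mean_shutdown_hours=21.9)
# Halving the number of shutdowns, or their duration, has the same effect.
fewer = availability(8760, shutdowns=2, mean_shutdown_hours=21.9)
shorter = availability(8760, shutdowns=4, mean_shutdown_hours=10.95)
```

Under this simple model the two levers are interchangeable, which is consistent with the record's point that emergency procedures, which mainly shorten shutdowns, can improve productivity without major capital expense.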