WorldWideScience
 
 
1

Fault Tolerant Real Time Systems  

CERN Document Server

Real time systems are systems in which there is a commitment for timely response by the computer to external stimuli. Real time applications have to function correctly even in the presence of faults. Fault tolerance can be achieved by hardware, software, or time redundancy. Safety-critical applications have strict time and cost constraints, which means that not only must faults be tolerated but the constraints must also be satisfied. Deadline scheduling means that the task with the earliest required response time is processed first. The most common scheduling algorithms are Rate Monotonic (RM) and Earliest Deadline First (EDF). This paper deals with the interaction between the fault tolerant strategy and the EDF real time scheduling strategy.

Persya, A Christy

2010-01-01
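
The earliest-deadline-first rule described in the abstract above can be illustrated with a small simulation. This is only a sketch, not the paper's method: it assumes a single processor, unit time slots, preemption at slot boundaries, and every job needing at least one slot. The `Job` class, the `edf_schedule` function, and the tuple layout `(name, release, wcet, deadline)` are hypothetical names chosen for the example.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    deadline: int                            # absolute deadline, the only sort key
    name: str = field(compare=False)
    remaining: int = field(compare=False)    # execution time still needed


def edf_schedule(jobs, horizon):
    """Simulate preemptive EDF on one processor in unit time slots.

    jobs: list of (name, release, wcet, deadline) tuples (assumed layout).
    Returns (timeline, missed): the job run in each slot (None when idle)
    and the names of jobs that did not finish by their deadline.
    """
    pending = sorted(jobs, key=lambda j: j[1])   # order by release time
    ready, timeline, missed = [], [], []
    nxt = 0
    for t in range(horizon):
        # Admit every job released at or before the current slot.
        while nxt < len(pending) and pending[nxt][1] <= t:
            name, _release, wcet, deadline = pending[nxt]
            heapq.heappush(ready, Job(deadline, name, wcet))
            nxt += 1
        # Unfinished jobs whose deadline has passed are recorded and dropped.
        while ready and ready[0].deadline <= t:
            missed.append(heapq.heappop(ready).name)
        if not ready:
            timeline.append(None)
            continue
        job = ready[0]                           # earliest deadline runs
        job.remaining -= 1
        timeline.append(job.name)
        if job.remaining == 0:
            heapq.heappop(ready)
    # Jobs still unfinished at the horizon with an expired deadline.
    missed.extend(job.name for job in ready if job.deadline <= horizon)
    return timeline, missed


timeline, missed = edf_schedule([("A", 0, 2, 5), ("B", 0, 1, 2), ("C", 1, 2, 4)], 6)
print(timeline)  # ['B', 'C', 'C', 'A', 'A', None]
print(missed)    # []
```

Under these assumptions, time redundancy could be modelled by releasing a re-execution job when a fault is detected and checking whether the deadlines still hold.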

2

Fault tolerant control for switched linear systems  

CERN Document Server

This book presents up-to-date research and novel methodologies on fault diagnosis and fault tolerant control for switched linear systems. It provides a unified yet neat framework of filtering, fault detection, fault diagnosis and fault tolerant control of switched systems. It can therefore serve as a useful textbook for senior and/or graduate students who are interested in knowing the state-of-the-art of filtering, fault detection, fault diagnosis and fault tolerant control areas, as well as recent advances in switched linear systems.  

Du, Dongsheng; Shi, Peng

2015-01-01

3

Fault tolerant system performance modeling  

Science.gov (United States)

A discrete event simulation tool for the performance modelling of fault tolerant systems is proposed. The technique can be used to specify candidate architectures and to accurately predict their performance in the early stages of design. The use of performance modelling in conjunction with some form of structured methodology (such as the IAPSA II prevalidation methodology) is shown to reduce the cost of developing system architectures. Use of the performance model during the operational life of the system provides a means of easily evaluating any additional system requirements.

Strickland, Michael J.; Palumbo, Daniel L.

1988-01-01

4

Fault-Tolerant UAV Flight Control System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main focus of this master's thesis is fault-tolerant control systems (FTCSs) for unmanned aerial vehicles (UAVs). The goals are to develop an automatic-flight control system (AFCS) with fault detection and isolation (FDI) and a reconfiguration mechanism for accommodation of faults. The literature study reviews methods for fault-tolerant control and also discusses important faults and failures related to UAVs. The FTCS is implemented in MATLAB Simulink with a nonlinear model of the Ces...

Dybsjord, Kerrin Andre

2013-01-01

5

Reconfigurable fault tolerant avionics system  

Science.gov (United States)

This paper presents the design of a reconfigurable avionics system based on a modern Static Random Access Memory (SRAM)-based Field Programmable Gate Array (FPGA) to be used in future generations of nano satellites. A major concern in satellite systems, and especially nano satellites, is to build robust systems with low-power consumption profiles. The system is designed to be flexible by providing the capability of reconfiguring itself based on its orbital position. As Single Event Upsets (SEUs) do not have the same severity and intensity in all orbital locations, reaching their maximum at the South Atlantic Anomaly (SAA) and the polar cusps, the system does not have to be fully protected all the time in its orbit. An acceptable level of protection against high-energy cosmic rays and charged particles roaming in space is provided within the majority of the orbit through software fault tolerance. Checkpointing and rollback, together with control flow assertions, are used for that level of protection. In the minority part of the orbit where severe SEUs are expected to exist, a reconfiguration of the system FPGA is initiated in which the processor systems are triplicated and protection through Triple Modular Redundancy (TMR) with feedback is provided. This technique of reconfiguring the system according to the level of threat expected from SEU-induced faults helps in reducing the average dynamic power consumption of the system to one-third of its maximum. This technique can be viewed as smart protection through system reconfiguration. The system is built on the commercial version of the (XC5VLX50) Xilinx Virtex5 FPGA on bulk silicon with 324 IO. Simulations of orbit SEU rates were carried out using the SPENVIS web-based software package.

Ibrahim, M. M.; Asami, K.; Cho, Mengu
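
The Triple Modular Redundancy with feedback mentioned in the abstract above can be sketched in software as a majority vote over three replica outputs, with the voted value written back to every replica. This is an illustrative sketch only; the paper implements TMR in FPGA logic, and the function names `tmr_vote` and `tmr_step` are assumptions made for the example.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Majority vote over three replica outputs.

    Returns (value, faulty), where faulty lists the indices of replicas
    that disagree with the majority. Raises ValueError if all three differ.
    """
    outputs = (a, b, c)
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty


def tmr_step(states, step):
    """Run one step on three replica states, vote, and feed the result back.

    Writing the voted state into all replicas keeps a single upset from
    persisting into later steps (the 'feedback' part of the scheme).
    """
    results = [step(s) for s in states]
    voted, faulty = tmr_vote(*results)
    return [voted, voted, voted], faulty


# Replica 2 holds a corrupted state; the vote masks it and repairs it.
states, faulty = tmr_step([4, 4, 5], lambda x: x + 1)
print(states, faulty)  # [5, 5, 5] [2]
```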

6

Software fault tolerance in computer operating systems  

Science.gov (United States)

This chapter provides data and analysis of the dependability and fault tolerance for three operating systems: the Tandem/GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Based on measurements from these systems, basic software error characteristics are investigated. Fault tolerance in operating systems resulting from the use of process pairs and recovery routines is evaluated. Two levels of models are developed to analyze error and recovery processes inside an operating system and interactions among multiple instances of an operating system running in a distributed environment. The measurements show that the use of process pairs in Tandem systems, which was originally intended for tolerating hardware faults, allows the system to tolerate about 70% of defects in system software that result in processor failures. The loose coupling between processors which results in the backup execution (the processor state and the sequence of events occurring) being different from the original execution is a major reason for the measured software fault tolerance. The IBM/MVS system fault tolerance almost doubles when recovery routines are provided, in comparison to the case in which no recovery routines are available. However, even when recovery routines are provided, there is almost a 50% chance of system failure when critical system jobs are involved.

Iyer, Ravishankar K.; Lee, Inhwan

1994-01-01

7

Energy-efficient fault-tolerant systems  

CERN Document Server

This book describes the state-of-the-art in energy efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing and includes discussion of both circuit and system-level approaches. Readers will be enabled to meet the conflicting design objectives of energy efficiency and fault-tolerance for reliability, given the up-to-date techniques presented.

Mathew, Jimson; Pradhan, Dhiraj K

2013-01-01

8

Embedded and cooperative control for fault tolerant systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The work presented in this thesis focuses on fault tolerance in the case of linear systems. Digital communication tools are used in the context of the implementation of an architecture for fault tolerant control of complex systems. Cooperation between the control and diagnosis blocks ensures tolerance to certain types of faults which affect the system. Control is traditionally carried out from a central computer that collects all information gathered on the process...

Menighed, Kamel

2010-01-01

9

Software fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. This paper surveys various software fault tolerance techniques and methodologies. They fall into two groups: single-version and multi-version software fault tolerance techniques. It is expected that software fault tolerance research will benefit from this research...

Kazinov, Tofik Hasanaga; Mostafa, Jalilian Shahrukh

2009-01-01

10

From fault classification to fault tolerance for multi-agent systems  

CERN Document Server

Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

Potiron, Katia; Taillibert, Patrick

2013-01-01

11

Fault-Tolerant Onboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been established. The basic idea is to convert the given system into a fault-tolerant system and to improve multi-sensor data fusion for the particular system. The background of the project is the SeaSense system, which has been installed on several container ships and navy vessels. The SeaSense system provides a crude and simple estimation of the actual sea state (Hs and Tz), information about the longitudinal hull girder loading, seakeeping performance of the ship, and decision support on how to operate the ship within acceptable limits. The system is able to identify critical forthcoming events and to give advice regarding speed and course changes to decrease the wave-induced loads. The SeaSense system is based on the combined use of a mathematical model and measurements from a set of sensors. The overall dependability of a shipboard monitoring and decision support system such as the SeaSense system can be improved using fault-tolerant techniques (Fault Diagnosis and System Re-design) and a Sensor Fusion Quality (SFQ) test. Fault diagnosis means detecting the presence of faults in the system. When sea state estimation is conducted by a ship-wave buoy analogy, the best solution is achieved when a set of three different ship responses is used. Faulty signals should be discarded from the sea state estimation procedure if possible; if not, the fault should be estimated. The fault diagnosis can be divided into three steps: fault detection, fault isolation and fault estimation. Fault detection means deciding whether or not a fault has occurred. This step determines the time at which the system is subjected to the given fault. Fault isolation finds the component in which a fault has occurred. This step determines the location of the fault. Fault estimation provides an estimate of the magnitude of a fault. A supervisory function determines the severity of the fault once its origin has been isolated and its magnitude estimated. Fault-tolerant Sensor Fusion means that the monitoring and decision support system can accommodate faults so that the overall system continues to satisfy its goal, while in the absence of a fault the system should be able to provide the most accurate information using the SFQ test.

Lajic, Zoran

2010-01-01
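
The three diagnosis steps named in the abstract above (detection, isolation, estimation), followed by discarding faulty signals before fusion, can be sketched for a set of redundant estimates of one quantity. This is a minimal sketch under stated assumptions and not the SeaSense or SFQ procedure: it assumes each ship response yields its own estimate of the same value, uses the median as the reference, and applies a fixed threshold. The function name `diagnose`, the response names, and the numbers are hypothetical.

```python
from statistics import median


def diagnose(readings, threshold):
    """Detect, isolate and estimate faults among redundant estimates.

    readings: dict mapping a source name to its estimate of the same quantity.
    threshold: largest deviation from the median still treated as healthy.
    """
    reference = median(readings.values())
    residuals = {name: value - reference for name, value in readings.items()}
    # Detection and isolation: any source whose residual exceeds the threshold.
    faults = {name: r for name, r in residuals.items() if abs(r) > threshold}
    # Accommodation: fuse only the sources that passed the check.
    healthy = [v for name, v in readings.items() if name not in faults]
    fused = sum(healthy) / len(healthy) if healthy else reference
    return {
        "detected": bool(faults),      # has a fault occurred?
        "isolated": sorted(faults),    # which sources are faulty?
        "estimates": faults,           # estimated fault magnitude per source
        "fused": fused,                # value from the remaining sources
    }


# Hypothetical wave-height estimates (metres) from three ship responses.
result = diagnose({"heave": 3.1, "pitch": 2.9, "roll": 6.0}, threshold=0.5)
print(result["isolated"])  # ['roll']
```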

12

Active Fault Tolerant Control of Livestock Stable Ventilation System  

DEFF Research Database (Denmark)

Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems of livestock buildings in three steps: • Deriving a model for the climate control system of a pig stable. • Designing an active fault diagnosis (AFD) algorithm for different kinds of fault. • Designing a fault tolerant control scheme for the climate control system. In the first step, a conceptual multi-zone model for climate control of a livestock building is derived. In the next step, two methods for active fault diagnosis are proposed. The AFD methods excite the system by injecting a so-called excitation input. Two different algorithms, the EKF and a new adaptive filter, are used to detect the faults. The fault tolerant controller (FTC) is based on a switching scheme between a set of predefined passive fault tolerant controllers (PFTCs). In the FTC part of the thesis, first a PFTC based on state feedback is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. Then the PFTC problem is reformulated as a feasibility problem for a set of linear matrix inequalities (LMIs).

Gholami, Mehdi

2011-01-01

13

H infinity Integrated Fault Estimation and Fault Tolerant Control of Discrete-time Piecewise Linear Systems  

DEFF Research Database (Denmark)

In this paper we consider the problem of fault estimation and accommodation for discrete time piecewise linear systems. A robust fault estimator is designed to estimate the fault such that the estimation error converges to zero and the H∞ performance of the fault estimation is minimized. Then, the estimate of the fault is used to compensate for the effect of the fault. Hence, using the estimate of the fault, a fault tolerant controller using a piecewise linear static output feedback is designed such that it stabilizes the system and provides an upper bound on the H∞ performance of the faulty system. Sufficient conditions for the existence of the robust fault estimator and fault tolerant controller are derived in terms of linear matrix inequalities. Upper bounds on the H∞ performance can be minimized by solving convex optimization problems with linear matrix inequality constraints. The efficiency of the method is demonstrated by means of a numerical example.

Tabatabaeipour, Seyed Mojtaba; Bak, Thomas

2012-01-01

14

Measurement and analysis of operating system fault tolerance  

Science.gov (United States)

This paper demonstrates a methodology to model and evaluate the fault tolerance characteristics of operational software. The methodology is illustrated through case studies on three different operating systems: the Tandem GUARDIAN fault-tolerant system, the VAX/VMS distributed system, and the IBM/MVS system. Measurements are made on these systems for substantial periods to collect software error and recovery data. In addition to investigating basic dependability characteristics such as major software problems and error distributions, we develop two levels of models to describe error and recovery processes inside an operating system and on multiple instances of an operating system running in a distributed environment. Based on the models, reward analysis is conducted to evaluate the loss of service due to software errors and the effect of the fault-tolerance techniques implemented in the systems. Software error correlation in multicomputer systems is also investigated.

Lee, I.; Tang, D.; Iyer, R. K.

1992-01-01

15

Fault tolerant hypercube computer system architecture  

Science.gov (United States)

A fault-tolerant multiprocessor computer system of the hypercube type comprising a hierarchy of computers of like kind which can be functionally substituted for one another as necessary is disclosed. Communication between the working nodes is via one communications network while communications between the working nodes and watch dog nodes and load balancing nodes higher in the structure is via another communications network separate from the first. A typical branch of the hierarchy reporting to a master node or host computer comprises, a plurality of first computing nodes; a first network of message conducting paths for interconnecting the first computing nodes as a hypercube. The first network provides a path for message transfer between the first computing nodes; a first watch dog node; and a second network of message connecting paths for connecting the first computing nodes to the first watch dog node independent from the first network, the second network provides an independent path for test message and reconfiguration affecting transfers between the first computing nodes and the first switch watch dog node. There is additionally, a plurality of second computing nodes; a third network of message conducting paths for interconnecting the second computing nodes as a hypercube. The third network provides a path for message transfer between the second computing nodes; a fourth network of message conducting paths for connecting the second computing nodes to the first watch dog node independent from the third network. 
The fourth network provides an independent path for test message and reconfiguration affecting transfers between the second computing nodes and the first watch dog node; and a first multiplexer disposed between the first watch dog node and the second and fourth networks for allowing the first watch dog node to selectively communicate with individual ones of the computing nodes through the second and fourth networks; as well as, a second watch dog node operably connected to the first multiplexer whereby the second watch dog node can selectively communicate with individual ones of the computing nodes through the second and fourth networks. The branch is completed by a first load balancing node; and a second multiplexer connected between the first load balancing node and the first and second watch dog nodes, allowing the first load balancing node to selectively communicate with the first and second watch dog nodes.

Madan, Herb S. (inventor); Chow, Edward (inventor)

1989-01-01

16

Safety Reliability Enhancement in Fault tolerant Automotive Embedded System  

Directory of Open Access Journals (Sweden)

Reliability is control and prevention of failures to reduce failure and improve operations by enhancing performance with system-level analysis and modelling are needed not only for predictability and comparability when partitioning end-to-end functions at design time levels of reliability. Reliability numbers by themselves will not motivate improvements, performance of two fault tolerant mechanisms dealing with repairable and non-repairable components that have failed. The improvement in the reliability and safety of a system with repairable components with respect to the fault tolerant systems under study corresponds to a flexible arrangement of fault tolerant units (FTUs). SFAS (Safety Fault tolerant Automotive Systems) and ECU are compared to achieve effective results. Reliability principles are discussed which assist system improvement for reducing the high unreliability. CAN controllers are used in automotive fault tolerant embedded systems. The existing reliability enhancement models emphasize various redundancy techniques in both hardware and software without focusing on a formal way of minimizing recovery time from the affected or degraded states in automotive systems.

Balachandra Pattanaik

2013-01-01

17

Fault tolerant digital control systems for boiling water reactors  

International Nuclear Information System (INIS)

In a Boiling Water Reactor nuclear power plant, the power generation control function is divided into several systems, each system controlling only a part of the total plant. Presently, each system is controlled by conventional analog or digital logic circuits with little interaction for coordinated control. The advent of microprocessors has allowed the development of distributed fault-tolerant digital controls. The objective is to replace these conventional controls with fault-tolerant digital controls connected together with digital communication links to form a fully integrated nuclear power plant control system

18

Data-driven design of fault diagnosis and fault-tolerant control systems  

CERN Document Server

Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

Ding, Steven X

2014-01-01

19

Programs For Modeling Fault-Tolerant Computing Systems  

Science.gov (United States)

Padé Approximation with Scaling (PAWS) and Scaling Taylor Exponential Matrix (STEM) computer programs are software tools for design and validation. Provide flexible, user-friendly, language-based interface for input of Markov mathematical models describing behaviors of fault-tolerant computer systems. Markov models include both recovery from faults via reconfiguration and behaviors of such systems when faults occur. PAWS and STEM produce exact solutions of probability of system failure and provide conservative estimate of number of significant digits in solution. Written in PASCAL and FORTRAN.

Butler, Ricky W.

1991-01-01
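
The kind of computation the abstract above describes, obtaining the probability of system failure from a Markov model through a matrix exponential, can be sketched with a scaled Taylor series. This is not the PAWS or STEM code: the three-state model (a duplex system without repair or reconfiguration), the failure rate, and the names `expm_taylor` and `failure_probability` are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_taylor(m, terms=20, squarings=10):
    """Approximate exp(m) by a Taylor series on m / 2**squarings,
    then square the result `squarings` times (scaling and squaring)."""
    n = len(m)
    scale = 2 ** squarings
    scaled = [[x / scale for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term becomes scaled**k / k!
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def failure_probability(lam, t):
    """P(system failed by time t) for two units in parallel, each failing
    at rate lam. States: 0 = both up, 1 = one up, 2 = failed (absorbing)."""
    q = [[-2 * lam, 2 * lam, 0.0],
         [0.0, -lam, lam],
         [0.0, 0.0, 0.0]]
    p = expm_taylor([[x * t for x in row] for row in q])
    return p[0][2]


# For this simple model the closed form is (1 - exp(-lam * t)) ** 2.
print(failure_probability(1e-3, 100.0))
```

A model with recovery via reconfiguration, as the abstract mentions, would add transitions back toward healthier states in the generator matrix `q`.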

20

Design of fault tolerant control system for steam generator using  

Energy Technology Data Exchange (ETDEWEB)

A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks the controller-induced and sensor-induced performances to identify which part, a controller or a sensor, is faulty. To analyze controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. The fuzzy weighting modulator generates an output signal compensated for a faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effects of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases further. 2 refs., 9 figs., 1 tab. (Author)

Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

1998-12-31

 
 
 
 
21

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The electric vehicle drive system is a multi-variable system with a complex and changeable running environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to the position where a fault occurs, and the possible consequences and causes of each failure are analyzed. Considering hardware limitations and the requirement to preserve system performance as far as possible, a passive software redundancy fault-tolerant strategy is put forward, give an e...

Zhang Liwei; Huang Xianjin; Yang Yannan; Xu Chen; Liu Jie

2013-01-01

22

A fault tolerant process control system for PHWRs  

International Nuclear Information System (INIS)

Many computer control applications have stringent requirements for continued correct operation of the control system in the presence of internal faults. There is undoubtedly a great need in the power industry, particularly in the nuclear segment, for reliable control and monitoring systems. One must guarantee safe control and avoid false alarms that can spuriously cause plant shutdown when it is not necessary. This paper discusses the issues involved in the design of fault tolerant systems and describes the features of a fault-tolerant process control system being developed for use in future PHWRs. The processes selected for this computer control application include primary coolant and steam generator pressures. (author). 8 refs., 1 fig

23

Industrial Cost-Benefit Assessment for Fault-tolerant Control Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Economic aspects are decisive for industrial acceptance of research concepts including the promising ideas in fault tolerant control. Fault tolerance is the ability of a system to detect, isolate and accommodate a fault, such that simple faults in a sub-system do not develop into failures at a system level. In a design phase for an industrial system, possibilities span from fail safe design where any single point failure is accommodated by hardware, over fault-tolerant design where selected f...

Thybo, C.; Blanke, M.

1998-01-01

24

A Fault Tolerant System for an Integrated Avionics Sensor Configuration  

Science.gov (United States)

An aircraft sensor fault tolerant system methodology for the Transport Systems Research Vehicle in a Microwave Landing System (MLS) environment is described. The fault tolerant system provides reliable estimates in the presence of possible failures both in ground-based navigation aids and in on-board flight control and inertial sensors. Sensor failures are identified by utilizing the analytic relationships between the various sensors arising from the aircraft point mass equations of motion. The estimation and failure detection performance of the software implementation (called FINDS) of the developed system was analyzed on a nonlinear digital simulation of the research aircraft. Simulation results showing the detection performance of FINDS, using a dual redundant sensor complement, are presented for bias, hardover, null, ramp, increased noise and scale factor failures. In general, the results show that FINDS can distinguish between normal operating sensor errors and failures while providing an excellent detection speed for bias failures in the MLS, indicated airspeed, attitude and radar altimeter sensors.

Caglayan, A. K.; Lancraft, R. E.

1984-01-01

25

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Problem statement: The method proposed in the present study deals with fault tolerant control systems by using so-called decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the knowledge of the author there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of sliding surface, which should be in piecewise slidi...

Rafi Youssef; Hui Peng

2010-01-01

26

Fault Tolerance by Replication in Parallel System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper the author concentrates on the architecture of a cluster computer and its operation in the context of parallel paradigms. The author is keenly interested in guaranteeing that a node works efficiently and that the data on it is available at any time to run the task in parallel. Applications may face resource faults during execution. The application must dynamically do something to prepare for, and recover from, the expected failure. Typi...

Madhavi Vaidya

2011-01-01

27

A Game-theoretic Approach for Synthesizing Fault-Tolerant Embedded Systems  

CERN Document Server

In this paper, we present an approach for fault-tolerant synthesis by combining predefined patterns for fault-tolerance with algorithmic game solving. A non-fault-tolerant system, together with the relevant fault hypothesis and fault-tolerant mechanism templates in a pool, is translated into a distributed game, and we perform an incomplete search of strategies to cope with undecidability. The result of the game is translated back to executable code concretizing fault-tolerant mechanisms using constraint solving. The overall approach is implemented in a prototype tool chain and is illustrated using examples.

Cheng, Chih-Hong; Knoll, Alois; Buckl, Christian

2010-01-01

28

Safety Reliability Enhancement in Fault tolerant Automotive Embedded System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Reliability is control and prevention of failures to reduce failure and improve operations by enhancing performance with system-level analysis and modelling are needed not only for predictability and comparability when partitioning end-to-end functions at design time levels of reliability. Reliability numbers by themselves will not motivate improvements, performance of two fault tolerant mechanisms dealing with repairable and non-repairable components that have failed. The improvement in the ...

Balachandra Pattanaik; Chandrasekaran, S.

2013-01-01

29

Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology  

Directory of Open Access Journals (Sweden)

The electric vehicle drive system is a multi-variable system with a complex and changeable running environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to the position where a fault occurs, and the possible consequences and causes of each failure are analyzed. Considering hardware limitations and the requirement to preserve system performance as far as possible, a passive software redundancy fault-tolerant strategy is put forward, and an example is given to analyze the pros and cons of this method.

Zhang Liwei

2013-09-01

30

A Fault Tolerant Mobile Agent Information Retrieval System  

Directory of Open Access Journals (Sweden)

Problem statement: Most information retrieval systems used only client-server architectures. The client-server model, though powerful, had some limitations. In a mobile computing environment, which has both wired and wireless networks with limited communication capabilities, the performance of the system was very low. Approach: Mobile agents are considered a suitable technology for developing applications such as information retrieval systems for mobile computing environments. Mobile agents are autonomous and dynamic entities that can migrate between various nodes in the network. They offer many advantages over traditional design methodologies, such as reduction in network load, overcoming network latency, and disconnected operation. Since mobile agents do not need continuous communication with the mobile host, they are not affected by sudden disconnection of the wireless network or by the mobile host being turned off to save power. In order to get the complete benefit of a mobile agent system, the system must be fault tolerant. In the context of mobile agents, fault-tolerance prevents a partial or complete loss of the agent. Results: Our system in a mobile computing environment ensured that the agent arrived at its destination with its result, and the performance of the system improved through a reduction in response time. The system also allowed more requests to be sent by creating many mobile agents without affecting the performance. Conclusion: Our research compared the performance of the client-server architecture and the fault tolerant mobile agent information retrieval system and showed that our system overcame the limitations faced by the client-server architecture. The system can also be extended to ad hoc networks.

R. Punithavathi

2010-01-01

31

Fault Tolerant Operation in Aero Engine Using Distributed Computation System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The paper presents fault tolerant operation in an aero engine based on real-time systems, which are built for a very small set of mission-critical applications such as spacecraft, avionics and other distributed control systems. Modern software deals with external interfaces and has to consider various timing implications. The platform is based on C and developed using the Keil MDK tool, with a targeted deadline of 100 milliseconds at a baud rate of 500 kbps. CAN interface executes the ...

Neela A G; Prabhu, Shobha S.; Channabassappa Baligar

2014-01-01

32

Fault Tolerant Software: a Multi Agent System Solution  

DEFF Research Database (Denmark)

Development of highly dependable systems remains a labour-intensive task. This paper explores recent advances in adapting the software agent architecture to control applications with a view to dependability issues. Multiple agent systems theory is reviewed, giving methods to supervise such systems. Software ageing is shown to be the most common problem and rejuvenation its countermeasure. The paper shows how an agent population can be monitored and how faulty agents can be isolated and reloaded in a healthy state, hence rejuvenated. The aim is to propose an architecture as a basis for the design of control software able to tolerate faults and residual bugs without the need for maintenance stops.

Caponetti, Fabio; Bergantino, Nicola

2009-01-01

33

A Ship Propulsion System Model for Fault-tolerant Control  

DEFF Research Database (Denmark)

This report presents a propulsion system model for a low speed marine vehicle, which can be used as a test benchmark for Fault-Tolerant Control purposes. The benchmark offers realistic and challenging problems relevant to both the FDI and the (autonomous) supervisory control areas. The propulsion system model is presented in two versions: the first consists of one engine and one propeller, and the other consists of two engines and their corresponding propellers placed in parallel in the ship. The corresponding programs have been developed and are available.

Izadi-Zamanabadi, Roozbeh; Blanke, M.

1998-01-01

34

Implementation of FMFRS (Fault Tolerant Most Fitting Resource Scheduling) algorithm in Real time system  

Directory of Open Access Journals (Sweden)

In a computational Grid, fault tolerance is an imperative issue to be considered during job scheduling. Due to the widespread use of resources, systems are highly prone to errors and failures. Hence fault tolerance plays a key role in the grid in avoiding unreliability. The two main techniques for implementing fault tolerance in a grid environment are checkpointing and replication. This paper proposes a real-time approach to a replication technique, named FMFRS (Fault Tolerant Most Fitting Resource Scheduling), to improve the fault tolerance of the fittest resource scheduling algorithm. The proposed method improves fault tolerance by scheduling the job with the fittest resource scheduling algorithm in coordination with job replication when the resource has low reliability, checking parameters such as fault tolerance capacity and node reliability. Based on the reliability index of the resource, the resource is identified as critical.

Harkiran Kaur

2013-08-01

35

Ship Propulsion System as a Benchmark for Fault-Tolerant Control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerant control combines fault detection and isolation techniques with supervisory control to achieve autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade, the extension to fault-tolerant control is a fairly new area. The paper presents a ship propulsion system as a benchmark that should be useful as a platform for development of new ideas and comparison of methods. The benchmark has t...

Izadi-zamanabadi, Roozbeh; Blanke, M.

2005-01-01

36

Fault Tolerant Operation in Aero Engine Using Distributed Computation System  

Directory of Open Access Journals (Sweden)

The paper presents fault tolerant operation in an aero engine based on real-time systems, which are built for a very small set of mission-critical applications such as spacecraft, avionics and other distributed control systems. Modern software deals with external interfaces and has to consider various timing implications. The platform is based on C and developed using the Keil MDK tool, with a targeted deadline of 100 milliseconds at a baud rate of 500 kbps. The CAN interface handles transportation and communication; an interface cable is used for serial communication between the Digital Electronic Control Unit (DECU) and the host to transfer data to the pilot Online Monitoring System, which is based on Laboratory Virtual Instrument Engineering Workbench (LabVIEW 7.1). Fault diagnosis typically assumes a sufficiently large fault signature and enough time for a reliable decision to be reached. However, for a class of safety-critical faults on commercial aircraft engines, prompt detection within a millisecond range is paramount to allow accommodation that averts undesired engine behavior. At the same time, false positives must be avoided to prevent inappropriate control action.

Neela A G

2014-04-01

37

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis  

Science.gov (United States)

Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

1992-01-01

38

Application-Transparent Fault Tolerance in Distributed Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a new software architecture in which all concepts necessary to achieve fault tolerance can be added to an application automatically without any source code changes. As a case study, we consider the problem of providing a reliable service despite node failures by executing a group of replicated servers. Replica creation and management as well as failure detection and recovery are performed automatically by a separate fault tolerance layer (ft-layer) which is inserted between...

Becker, Thomas

1999-01-01

39

Diagnostic software and fault tolerant microprocessor based system architectures  

International Nuclear Information System (INIS)

In numerous industrial applications including power generation, the availability of electronic systems to perform the tasks assigned has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost-effective microprocessor-based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic and depends greatly on the structure of major components and systems external to the electronic system itself, as well as on the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed by describing recent examples of fault tolerant microprocessor-based system architectures, which include the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilizes a unique internal architecture that addresses the reliability, availability and communications issues while improving maintainability and man-machine interfaces.

40

Model Driven Configuration of Fault Tolerance Solutions for Component-Based Software System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault tolerance is very important for complex component-based software systems, but its configuration is complicated and challenging. In this paper, we propose a model driven approach to semi-automatic configuration of fault tolerance solutions. At design time, a set of reusable fault tolerance solutions are modeled as architecture styles, with the key properties verified by model checking. At runtime, the runtime software architecture of the target system is automatically constructed by th...

Wu, Yihan; Huang, Gang; Song, Hui; Zhang, Ying

2012-01-01

 
 
 
 
41

Application-level fault tolerance in real-time embedded systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Critical real-time embedded systems need to make use of fault tolerance techniques to cope with operation time errors, either in hardware or software. Fault tolerance is usually applied by means of redundancy and diversity. Redundant hardware implies the establishment of a distributed system executing a set of fault tolerance strategies by software, and may also employ some form of diversity, by using different variants or versions for the same processing. This work proposes and evaluates ...

Afonso, Francisco; Silva, Carlos A.; Tavares, Adriano; Montenegro, Sérgio

2008-01-01

42

Fault Tolerance in Real Time Multiprocessors - Embedded Systems  

CERN Document Server

All real-time tasks that are critical by nature have to complete their execution before their deadlines, even in the presence of faults. The most popular real-time task assignment algorithms are First Fit (FF), Best Fit (BF) and Bin Packing (BP). The common task scheduling algorithms are Rate Monotonic (RM), Earliest Deadline First (EDF), etc. All the current approaches deal with either fault tolerance or criticality in real time. In this paper we propose an integrated approach with a new algorithm, called SASA (Sorting And Sequential Assignment), which maps the real-time task assignment with task schedule and fault tolerance.

Persya, A Christy

2010-01-01
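
As a rough illustration of how an assignment heuristic and a scheduling policy named in the abstract above can be combined, the following Python sketch pairs First Fit with the classic EDF utilization test. It is not the SASA algorithm, whose details the abstract does not give. The `Task` type, the function name `first_fit_edf` and the assumption that each task's deadline equals its period are introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: float    # worst-case execution time
    period: float  # period, assumed equal to the relative deadline

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


def first_fit_edf(tasks, num_processors):
    """Assign each task to the first processor that stays EDF-schedulable.

    Under preemptive EDF with deadlines equal to periods, one processor is
    schedulable exactly when its total utilization does not exceed 1.
    Returns one list of task names per processor, or raises ValueError
    if some task fits on no processor.
    """
    loads = [0.0] * num_processors
    assignment = [[] for _ in range(num_processors)]
    for task in tasks:
        for cpu in range(num_processors):
            # Small tolerance guards against floating-point round-off.
            if loads[cpu] + task.utilization <= 1.0 + 1e-9:
                loads[cpu] += task.utilization
                assignment[cpu].append(task.name)
                break
        else:
            raise ValueError(f"task {task.name} does not fit on any processor")
    return assignment
```

A fault-tolerant variant would typically reserve slack on each processor for re-execution or backup copies; that extension is left out of this sketch.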

43

Piecewise Sliding Mode Decoupling Fault Tolerant Control System  

Directory of Open Access Journals (Sweden)

Problem statement: The method proposed in the present study deals with fault tolerant control systems using decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the knowledge of the author, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of the sliding surface, which should be partitioned into piecewise sliding surfaces so that PwLTool can be applied; its purpose in our case is to delimit the regions where sliding mode occurs. Results: We obtain a simple linearized model, selected in those regions, which can depict the complex system. Conclusion: We implement this new design scenario with the three-tank water level system as an example and, since we are interested in networked control systems, we believe that this kind of controller implementation will not be affected by network delays.

Rafi Youssef

2010-01-01

44

Fault-Tolerant Control using Adaptive Time-Frequency Method in Bearing Fault Detection for DFIG Wind Energy System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

With the advances of power electronic technology, doubly-fed induction generators (DFIG) have increasingly drawn the interest of wind turbine industries. To ensure the reliable operation and power quality of wind power systems, the fault-tolerant control for DFIG is studied in this paper. The fault-tolerant controller is designed to maintain acceptable performance during bearing fault conditions. Based on measured motor current data, an adaptive statistical time-frequency method is then used to...

Korkua, Suratsavadee Koonlaboon

2015-01-01

45

A fault tolerant superheat control strategy for supermarket refrigeration systems  

DEFF Research Database (Denmark)

In this paper, a fault tolerant control (FTC) strategy is proposed for evaporator superheat control in supermarket refrigeration systems. Conventional control uses a pressure sensor and a temperature sensor for this purpose; however, the pressure sensor can fail. A contingency control strategy, based on a maximum slope-seeking control method and only a single temperature sensor, is developed to drive the evaporator outlet temperature to a level that gives a suitable superheat of the refrigerant. The FTC strategy requires no a priori system knowledge or additional hardware and functions in a plug & play fashion. The strategy is outlined by means of procedural steps as well as a flow chart that also illustrates the process of automatic tuning of the maximum slope-seeking controller. Test results are furthermore presented for a display case in a full scale CO2 supermarket refrigeration system.

Vinther, Kasper; Izadi-Zamanabadi, Roozbeh

2013-01-01

46

Reliability modeling of digital component in plant protection system with various fault-tolerant techniques  

International Nuclear Information System (INIS)

Highlights:
• Integrated fault coverage is introduced to reflect the characteristics of fault-tolerant techniques in the reliability model of the digital protection system in NPPs.
• The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation.
• With integrated fault coverage, the unavailability of a repairable component of the DPS can be estimated.
• The newly developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis.
• The reliability model makes it possible to confirm changes in unavailability according to variation of diverse factors.

Abstract: With the improvement of digital technologies, the digital protection system (DPS) incorporates multiple sophisticated fault-tolerant techniques (FTTs) in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is a vital factor of an FTT in reliability. However, fault detection coverage alone is insufficient to reflect the effects of various FTTs in a reliability model. To reflect the characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of an FTT from detection to fail-safe generation. A model has been developed to estimate the unavailability of a repairable component of the DPS using the integrated fault coverage. The newly developed model can quantify unavailability under a diversity of conditions. Sensitivity studies are performed to ascertain the important variables that affect the integrated fault coverage and unavailability.

47

Energy-Aware Synthesis of Fault-Tolerant Schedules for Real-Time Distributed Embedded Systems  

DEFF Research Database (Denmark)

This paper presents a design optimisation tool for distributed embedded real-time systems that 1) decides the mapping and fault-tolerance policy and generates a fault-tolerant schedule, 2) is targeted at hard real-time systems, 3) has a hard reliability goal, 4) generates a static schedule for processes and messages, 5) provides fault tolerance for k transient/soft faults, 6) optimises for minimal energy consumption while considering the impact of lowering voltages on the probability of faults, and 7) uses a constraint logic programming (CLP) based implementation.

Poulsen, Kåre Harbo; Pop, Paul

2007-01-01

48

Fault diagnosis and fault-tolerant control strategies for non-linear systems analytical and soft computing approaches  

CERN Document Server

This book presents selected fault diagnosis and fault-tolerant control strategies for non-linear systems in a unified framework. In particular, starting from advanced state estimation strategies up to modern soft computing, the discrete-time description of the system is employed. Part I of the book presents original research results regarding state estimation and neural networks for robust fault diagnosis. Part II is devoted to the presentation of integrated fault diagnosis and fault-tolerant systems. It starts with a general fault-tolerant control framework, which is then extended by introducing robustness with respect to various uncertainties. Finally, it is shown how to implement the proposed framework for fuzzy systems described by the well-known Takagi–Sugeno models. This research monograph is intended for researchers, engineers, and advanced postgraduate students in control and electrical engineering, computer science, as well as mechanical and chemical engineering.

Witczak, Marcin

2014-01-01

49

Evaluation of digital fault-tolerant architectures for nuclear power plant control systems  

Energy Technology Data Exchange (ETDEWEB)

Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those that are not fault tolerant is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants. 7 refs., 4 tabs.

Battle, R.E.

1990-01-28
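
The triple-modular-redundant arrangement evaluated in the record above can be illustrated with a short Python sketch. It shows a majority voter and the textbook reliability formula for three independent channels, under the assumption of statistically independent channel failures (which the abstract itself warns is violated by common cause failures). The function names and the `voter` parameter are illustrative and do not come from the report.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of channels, else None.

    `outputs` must be a non-empty sequence of hashable channel outputs.
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


def tmr_reliability(r, voter=1.0):
    """Reliability of three independent channels of reliability r behind a voter.

    The channels succeed if at least two of them work:
    3 r^2 (1 - r) + r^3 = 3 r^2 - 2 r^3.
    The `voter` factor models a single point of failure in the output stage.
    """
    return voter * (3 * r**2 - 2 * r**3)
```

For a channel reliability of 0.9 the formula gives about 0.972 with a perfect voter. Any imperfect voter multiplies that figure down, which is one way to see why single points of failure in the output modules can dominate system unreliability.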

50

Evaluation of digital fault-tolerant architectures for nuclear power plant control systems  

International Nuclear Information System (INIS)

Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and also dual synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those that are not fault tolerant is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants. 7 refs., 4 tabs.

51

Conception and Implementation of an Agreement Protocol for Fault-Tolerant Automotive Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Safety-relevant automotive systems have particularly high requirements for fault-tolerance, especially in the absence of a mechanical backup, such as for X-by-Wire systems. The replication of components, called structural redundancy, is very often a way to ensure that these systems are free from single points of failure and, hence, fault-tolerant. However, the use of redundancies also implies undesirable effects which make the masking out of faults difficult. Agreement protocols are protocol-...

Limam, Mourad

2005-01-01

52

Passive Fault Tolerant Control of Piecewise Affine Systems Based on H Infinity Synthesis  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper we design a passive fault tolerant controller against actuator faults for discrete-time piecewise affine (PWA) systems. By using dissipativity theory and H-infinity analysis, fault tolerant state feedback controller design is expressed as a set of Linear Matrix Inequalities (LMIs). In the current paper, the PWA system switches not only due to the state but also due to the control input. The method is applied to a large scale livestock ventilation model.

Gholami, Mehdi; Cocquempot, Vincent; Schiøler, Henrik; Bak, Thomas

2011-01-01

53

Optimal structure of fault-tolerant software systems  

International Nuclear Information System (INIS)

This paper considers software systems consisting of fault-tolerant components. These components are built from functionally equivalent but independently developed versions characterized by different reliability and execution time. Because of hardware resource constraints, the number of versions that can run simultaneously is limited. The expected system execution time and its reliability (defined as the probability of obtaining the correct output within a specified time) strictly depend on the parameters of the software versions and the sequence of their execution. The system structure optimization problem is formulated, in which one has to choose software versions for each component and find the sequence of their execution in order to achieve the greatest system reliability subject to cost constraints. The versions are to be chosen from a list of available products. Each version is characterized by its reliability, execution time and cost. The suggested optimization procedure is based on an algorithm for determining the system execution time distribution that uses the moment generating function approach, and on a genetic algorithm. Both N-version programming and the recovery block scheme are considered within a universal model. An illustrative example is presented.
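
To make the dependence of reliability and execution time on the version sequence concrete, here is a simplified Python sketch of a single recovery block. It is not the moment generating function procedure of the paper. It assumes a perfect acceptance test, independent version failures, fixed (non-random) execution times and strictly sequential execution; the function name `recovery_block` and its argument layout are hypothetical.

```python
def recovery_block(versions, deadline):
    """Evaluate versions run one after another until one passes the acceptance test.

    versions: list of (reliability, execution_time) pairs in execution order.
    Returns (reliability, expected_time): the probability that a correct
    output is produced no later than `deadline`, and the expected time
    spent, counting every version that is actually attempted.
    """
    reliability = 0.0
    expected_time = 0.0
    p_reached = 1.0  # probability that all earlier versions failed
    elapsed = 0.0    # time consumed once this version has been run
    for r, t in versions:
        elapsed += t
        expected_time += p_reached * t
        if elapsed <= deadline:
            reliability += p_reached * r
        p_reached *= 1.0 - r
    return reliability, expected_time
```

With versions `(0.9, 2)` and `(0.8, 3)` and a deadline of 4 time units, running the faster version first gives a reliability of 0.9, while the reverse order gives 0.8, because the second attempt then finishes too late to count.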

54

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Embedded Systems with Checkpointing and Replication  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present an approach to the synthesis of fault-tolerant hard real-time systems for safety-critical applications. We use checkpointing with rollback recovery and active replication for tolerating transient faults. Processes and communications are statically scheduled. Our synthesis approach decides the assignment of fault-tolerance policies to processes, the optimal placement of checkpoints and the mapping of processes to processors such that multiple transient faults are tolerated and the t...

Pop, Paul; Izosimov, Viacheslav; Eles, Petru; Peng, Zebo

2008-01-01
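
The trade-off behind checkpoint placement mentioned in the record above can be sketched with a deliberately simple cost model, which is an assumption made here and not necessarily the model used by the authors. A process with worst-case execution time `c` is split into `n` equal segments, saving each checkpoint costs `overhead`, and each of `k` transient faults forces one segment to be re-executed after a `recovery` delay. The worst case is then W(n) = c + n·overhead + k·(c/n + recovery), which is minimised over real n at n = sqrt(k·c/overhead).

```python
import math


def worst_case_time(c, n, k, overhead, recovery):
    """Worst-case length of a process split into n checkpointed segments."""
    return c + n * overhead + k * (c / n + recovery)


def best_checkpoint_count(c, k, overhead, recovery):
    """Return the integer n >= 1 that minimises worst_case_time.

    W(n) is convex for n > 0, so the integer optimum is the floor or the
    ceiling of the continuous stationary point sqrt(k * c / overhead).
    Requires overhead > 0.
    """
    n_real = math.sqrt(k * c / overhead)
    candidates = {max(1, math.floor(n_real)), max(1, math.ceil(n_real))}
    # Ties are broken in favour of fewer checkpoints.
    return min(candidates, key=lambda n: (worst_case_time(c, n, k, overhead, recovery), n))
```

Too few checkpoints make every re-execution long, while too many make the fault-free run slow; under this model the recovery delay shifts the worst case but not the optimal count.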

55

Designing fault-tolerant distributed archives for picture archiving and communication systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Purpose: Distributed archives in a picture archiving and communication system (PACS) environment can provide added fault tolerance and fail-over capability, as well as increased load capacity at a more economical price than traditional "high-availability" systems. Systems can be configured with varying levels of fault tolerance, depending on the amount of redundancy desired. There is, however, a direct correlation between the level of hardware redundancy and cost to implement. This present...

Mendenhall, Rebecca; Dewey, Matt; Soutar, Ian

2001-01-01

56

Proactive Service Migration for Long-Running Byzantine Fault Tolerant Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, we describe a novel proactive recovery scheme based on service migration for long-running Byzantine fault tolerant systems. Proactive recovery is an essential method for ensuring long term reliability of fault tolerant systems that are under continuous threats from malicious adversaries. The primary benefit of our proactive recovery scheme is a reduced vulnerability window. This is achieved by removing the time-consuming reboot step from the critical path of p...

Zhao, Wenbing

2008-01-01

57

Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report  

Energy Technology Data Exchange (ETDEWEB)

The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a system-wide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults has typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis and making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and on using fault information exchange and coordination between MPI and the HPC system software stack, from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.

Lumsdaine, Andrew

2013-03-08

58

Fault tolerant control design of nonlinear systems using LMI gain synthesis  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, an active Fault Tolerant Control (FTC) strategy is developed for nonlinear systems described by multiple linear models to prevent system deterioration through the synthesis of adapted controllers. Assuming that Fault Detection and Isolation (FDI) and estimation are available, an appropriate combination of predesigned gains is synthesized. The main contribution concerns the design of state feedback gains through LMI both in fault-free and faulty cases in order to p...

Rodrigues, Mickael; Theilliol, Didier; Sauter, Dominique

2005-01-01

59

Towards Fault-Tolerant Quantum Computation with Higher-Dimensional Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main focus of this thesis is to explore the advantages of using higher-dimensional quantum systems (qudits) as building blocks for fault-tolerant quantum computation. In particular, we investigate the two main essential ingredients of many state-of-the-art fault-tolerant schemes [133], which are magic state distillation and topological error correction. The theory for both of these components is well established for the qubit case, but little has been known for the generalised qudit case....

Anwar, H.

2014-01-01

60

Application-driven co-design of fault-tolerant industrial systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper presents a novel methodology for the HW/SW co-design of fault tolerant embedded systems that pursues the mitigation of radiation-induced upset events (which are a class of Single Event Effects - SEEs) on critical industrial applications. The proposal combines the flexibility and low cost of Software Implemented Hardware Fault Tolerance (SIHFT) techniques with the high reliability of selective hardware replication. The co-design flow is supported by a hardening platform that compris...

Restrepo Calle, Felipe; Martínez Álvarez, Antonio; Guzmán Miranda, Hipólito; Palomo Pinto, Francisco Rogelio; Cuenca Asensi, Sergio

2010-01-01

 
 
 
 
61

Industrial Cost-Benefit Assessment for Fault-tolerant Control Systems  

DEFF Research Database (Denmark)

Economic aspects are decisive for industrial acceptance of research concepts, including the promising ideas in fault tolerant control. Fault tolerance is the ability of a system to detect, isolate and accommodate a fault, such that simple faults in a sub-system do not develop into failures at a system level. In the design phase of an industrial system, possibilities span from fail-safe design, where any single point failure is accommodated by hardware, through fault-tolerant design, where selected faults are handled without extra hardware, to fault-ignorant design, where no extra precaution is taken against failure. The paper describes the assessments needed to find the right path for new industrial designs. The economic decisions in the design phase are discussed: cost of different failures, profits associated with available benefits, and investments needed for development and life-time support. The objective of this paper is to help, in the early product development stage, to find the economically most suitable scheme. A salient result is that with increased customer awareness of total cost of ownership, new products can benefit significantly from applying fault tolerant control principles.

Thybo, C.; Blanke, M.

1998-01-01

62

Reliability Evaluation Methodologies of Fault Tolerant Techniques of Digital I and C Systems in Nuclear Power Plants  

International Nuclear Information System (INIS)

Since the reactor protection system was converted from analog to digital, the digital reactor protection system has four redundant channels, each with several modules. Because the DPPS uses complex components, various fault-tolerant techniques are needed to improve its availability and reliability. Several studies have tried to quantify the effects of fault-tolerant techniques; however, these effects have not yet been properly considered in most fault tree models. The fault-tolerant techniques used in digital systems in NPPs should be reflected in fault tree analysis in order to obtain lower system unavailability and a more reliable PSA. When fault-tolerant techniques are modeled in a fault tree, the categorization of modules by the technique that detects their faults, the fault coverage, the detection period and the fault recovery should be considered. Further work will concentrate on various aspects of fault tree modeling, identifying other important factors and developing a new theory for constructing the fault tree model.

63

Fault-diagnosis applications. Model-based condition monitoring. Actuators, drives, machinery, plants, sensors, and fault-tolerant systems  

Energy Technology Data Exchange (ETDEWEB)

Supervision, condition-monitoring, fault detection, fault diagnosis and fault management play an increasing role for technical processes and vehicles in order to improve reliability, availability, maintenance and lifetime. For safety-related processes, fault-tolerant systems with redundancy are required in order to reach comprehensive system integrity. This book is a sequel to the book "Fault-Diagnosis Systems" published in 2006, where the basic methods were described. After a short introduction to fault-detection and fault-diagnosis methods, the book shows how these methods can be applied to a selection of 20 real technical components and processes as examples, such as: electrical drives (DC, AC); electrical actuators; fluidic actuators (hydraulic, pneumatic); centrifugal and reciprocating pumps; pipelines (leak detection); industrial robots; machine tools (main and feed drive, drilling, milling, grinding); heat exchangers. Realized fault-tolerant systems for electrical drives, actuators and sensors are also presented. The book describes why and how the various signal-model-based and process-model-based methods were applied and which experimental results could be achieved. In several cases a combination of different methods was most successful. The book is intended for graduate students of electrical, mechanical and chemical engineering and computer science, and for engineers. (orig.)

Isermann, Rolf [Technische Univ. Darmstadt (DE). Inst. fuer Automatisierungstechnik (IAT)]

2011-07-01

64

Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.  

Science.gov (United States)

In this paper, fault-tolerant consensus in multi-agent system using distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for multi-agent system is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis. PMID:25415998

Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

2014-11-14

65

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

DEFF Research Database (Denmark)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety system makes a close-down of the process. A general development methodology is given in the thesis that carries the control system designer through the steps necessary to consider fault handling in an early design phase. It was shown how an existing control loop with interface to the plant wide control system could be extended with three additional modules to obtain fault tolerance: Fault detection and isolation, remedial action decision, and reconfiguration. The integration of these modules in software was considered. The general methodology covered the analysis, design, and implementation of fault tolerant control systems on an overall level. Two detailed studies were presented, one on fault detection and isolation design and one on design of the decision logic. Two application case studies were used to emphasize practical aspects of both the development methodology and the detailed studies. One was an electro-mechanical actuator in a position control loop for a diesel engine speed governor where the purpose was to avoid a total close-down in case of the most likely faults. The second was a fault tolerant attitude control system for a micro satellite where the operation of the system is mission critical. The purpose was to avoid hazardous effects from faults and maintain operation if possible.
A method was introduced that, after a systematic examination of possible component failures, enables analysis of the relationship between failures and their consequences for the system's operation. This fault propagation analysis is based on coarse models of the subsystems describing the reaction to faults, as for example a variable being zero, low or high. Examples were given that illustrate how such models can be established by simple means, and yet provide important information when combined into a complete system. A special achievement was a method to determine how control loops behave in case of faults. This is not straightforward as the system behaviour depends on the character of the feedback. One of the detailed studies was the design of the decision logic in fault handling, realized as state-event machines. Guidelines for the design were provided, based on experience from the two case studies. Methods for verifying correct operation of the decision logic were described, where a completeness check against the fault propagation analysis is able to guarantee coverage of all considered faults. The usage of software tools to support the development process was illustrated with an off-the-shelf product for constraint logic solving and state-event machine analysis. The coarse system models and the decision logic were analyzed with the tool-box and it was shown how an easy analysis could be performed to verify correctness and completeness of the fault handling design. Experience from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in the speed governor actuator setup. One was a position sensor fault and the second was an actuator current fault.
The sensor fault detection was trivial, whereas the actuator fault was more challenging. The study demonstrated that many existing methods have a potential to detect and isolate the two faults, but also that the research field still misses a systematic approach to handle realistic problems such as low sampling rate and nonlinear characteristics of the system

Bøgh, S.A.

1997-01-01

66

A Log-Scaling Fault Tolerant Agreement Algorithm for a Fault Tolerant MPI  

Energy Technology Data Exchange (ETDEWEB)

The lack of fault tolerance is becoming a limiting factor for application scalability in HPC systems. The MPI does not provide standardized fault tolerance interfaces and semantics. The MPI Forum's Fault Tolerance Working Group is proposing a collective fault tolerant agreement algorithm for the next MPI standard. Such algorithms play a central role in many fault tolerant applications. This paper combines a log-scaling two-phase commit agreement algorithm with a reduction operation to provide the necessary functionality for the new collective without any additional messages. Error handling mechanisms are described that preserve the fault tolerance properties while maintaining overall scalability.

Hursey, Joshua J [ORNL; Naughton, III, Thomas J [ORNL; Vallee, Geoffroy R [ORNL; Graham, Richard L [ORNL

2011-01-01
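
The abstract above combines a two-phase commit agreement with a reduction over a logarithmic-depth structure. The following is a minimal sketch of that general idea only, not the authors' algorithm and not an MPI interface: it simulates, in one process, votes being AND-reduced up an implicit binary tree of ranks and the decision being broadcast back down. The function name `tree_agree`, the binary-tree layout and the absence of any failure handling are all assumptions made for illustration.

```python
def tree_agree(votes):
    """Two-phase agreement over an implicit binary tree of ranks 0..n-1.

    Phase 1 (reduce): each rank combines its own vote with its children's
    results using logical AND and passes the result to its parent.
    Phase 2 (broadcast): the root's decision is sent back down the tree.
    Returns (decisions, rounds), where rounds counts tree levels traversed.
    """
    n = len(votes)
    if n == 0:
        raise ValueError("need at least one participant")

    def reduce_up(rank):
        # Returns (combined vote, height of the subtree rooted at rank).
        result, height = votes[rank], 0
        for child in (2 * rank + 1, 2 * rank + 2):
            if child < n:
                child_vote, child_height = reduce_up(child)
                result = result and child_vote
                height = max(height, child_height + 1)
        return result, height

    decision, height = reduce_up(0)
    decisions = [decision] * n      # phase 2: every rank learns the result
    return decisions, 2 * height    # up the tree, then back down
```

Because the vote travels with the reduction, the number of communication rounds in this sketch grows with the tree height, roughly the base-2 logarithm of the number of ranks, rather than with the number of ranks itself.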

67

Fault-tolerant interconnection network and image-processing applications for the PASM parallel processing system  

International Nuclear Information System (INIS)

The demand for very high speed data processing coupled with falling hardware costs has made large-scale parallel and distributed computer systems both desirable and feasible. Two modes of parallel processing are single instruction stream-multiple data stream (SIMD) and multiple instruction stream-multiple data stream (MIMD). PASM, a partitionable SIMD/MIMD system, is a reconfigurable multimicroprocessor system being designed for image processing and pattern recognition. An important component of these systems is the interconnection network, the mechanism for communication among the computation nodes and memories. Assuring high reliability for such complex systems is a significant task. Thus, a crucial practical aspect of an interconnection network is fault tolerance. In answer to this need, the Extra Stage Cube (ESC), a fault-tolerant, multistage cube-type interconnection network, is defined. The fault tolerance of the ESC is explored for both single and multiple faults, routing tags are defined, and consideration is given to permuting data and partitioning the ESC in the presence of faults. The ESC is compared with other fault-tolerant multistage networks. Finally, reliability of the ESC and an enhanced version of it are investigated

68

System Diagnosis and Fault Tolerance for Distributed Computing System: A Review  

Digital Repository Infrastructure Vision for European Research (DRIVER)

An adaptive system-diagnosis fault tolerance method for distributed systems is described. The system comprises a network of N nodes, where N is an integer greater than or equal to 3, and each node is able to execute an algorithm to communicate with the network. A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. As computer network is a collection of...

Nilotpal Baruah; Saikia, Dr Lakshmi P.; Hemachandran, Dr K.

2013-01-01

69

A novel mathematical setup for fault tolerant control systems with state-dependent failure process  

Science.gov (United States)

In this paper, we consider a fault tolerant control system (FTCS) with state-dependent failures and provide a tractable mathematical model to handle the state-dependent failures. By assuming abrupt changes in system parameters, we use a jump process to model the failure process and the fault detection and isolation (FDI) process. In particular, we assume that the failure rates of the failure process vary according to which set the state of the system belongs to.

Chitraganti, S.; Aberkane, S.; Aubrun, C.

2014-12-01
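
To make the notion of a state-dependent failure rate in the abstract above concrete, here is a small simulation sketch. It is not the model from the paper: the scalar dynamics, the two modes, the threshold that splits the state space into two sets, and the name `simulate` are all hypothetical, and the FDI process is left out. The only feature it illustrates is that the jump rate from the nominal to the faulty mode depends on which set the current state lies in.

```python
import random

def simulate(x0, steps, dt, rate_inside, rate_outside, threshold, seed=0):
    """Euler simulation of a scalar loop with a state-dependent failure rate.

    Mode 0 is nominal (stable); mode 1 is faulty (unstable) and absorbing.
    The failure rate is rate_inside while |x| <= threshold, else rate_outside.
    Returns the list of (x, mode) samples.
    """
    rng = random.Random(seed)
    a = {0: -1.0, 1: 0.5}      # hypothetical closed-loop dynamics per mode
    x, mode, path = x0, 0, []
    for _ in range(steps):
        path.append((x, mode))
        if mode == 0:
            rate = rate_inside if abs(x) <= threshold else rate_outside
            # Probability of a jump in [t, t + dt) is approximately rate * dt.
            if rng.random() < rate * dt:
                mode = 1
        x += a[mode] * x * dt
    return path
```

With a larger rate outside the threshold than inside it, a trajectory that starts far from the origin is more likely to fail early, which is the coupling between state and failure process that a constant-rate Markov jump model cannot express.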

70

Distributed Fault-Tolerant Avionic Systems - A Real-Time Perspective  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper examines the problem of introducing advanced forms of fault-tolerance via reconfiguration into safety-critical avionic systems. This is required to enable increased availability after fault occurrence in distributed integrated avionic systems (compared to static federated systems). The approach taken is to identify a migration path from current architectures to those that incorporate re-configuration to a lesser or greater degree. Other challenges identified includ...

Burke, Michael; Audsley, Neil

2010-01-01

71

The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications  

Energy Technology Data Exchange (ETDEWEB)

Exascale targeted scientific applications must be prepared for a highly concurrent computing environment where failure will be a regular event during execution. Natural and algorithm-based fault tolerance (ABFT) techniques can often manage failures more efficiently than traditional checkpoint/restart techniques alone. Central to many petascale applications is an MPI standard that lacks support for ABFT. The Run-Through Stabilization (RTS) proposal, under consideration for MPI 3, allows an application to continue execution when processes fail. The requirements of scalable, fault tolerant MPI implementations and applications will stress the capabilities of many system services. System services must evolve to efficiently support such applications and libraries in the presence of system component failures. This paper discusses how the RTS proposal impacts system services, highlighting specific requirements. Early experimentation results from Cray systems at ORNL using prototype MPI and runtime implementations are presented. Additionally, this paper outlines fault tolerance techniques targeted at leadership class applications.

Graham, Richard L [ORNL; Hursey, Joshua J [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Boehm, Swen [ORNL

2012-01-01

72

Preface of the special issue on Advances in Control and Fault-Tolerant Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Today's automatic control systems are of high degrees of integration, complexity, embedding and networking of heterogeneous entities. This trend is driven by the industrial needs for achieving new technical performance and meeting additional performance demands. A most critical and important issue surrounding the design and operation of complex automatic systems is the application of Fault Detection and Isolation and Fault-Tolerant Control (FDI/FTC) technology, aiming at guaranteeing high sys...

Korbicz, Jozef; Maquin, Didier; Theilliol, Didier

2012-01-01

73

Fault Tolerant Control Systems : a Development Method and Real-Life Case Study  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety ...

Bøgh, S. A.

2005-01-01

74

Reliability Monitoring of Fault Tolerant Control Systems with Demonstration on an Aircraft Model  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper proposes a reliability monitoring scheme for active fault tolerant control systems using a stochastic modeling method. The reliability index is defined based on system dynamical responses and a safety region; the plant and controller are assumed to have a multiple regime model structure, and a semi-Markov model is built for reliability evaluation based on the safety behavior of each regime model estimated by using Monte Carlo simulation. Moreover, the history data of fault detectio...

Hongbin Li; Qing Zhao; Zhenyu Yang

2008-01-01

75

Fault tolerant control based on active fault diagnosis  

Digital Repository Infrastructure Vision for European Research (DRIVER)

An active fault diagnosis (AFD) method will be considered in this paper in connection with a Fault Tolerant Control (FTC) architecture based on the YJBK parameterization of all stabilizing controllers. The architecture consists of a fault diagnosis (FD) part and a controller reconfiguration (CR) part. The FTC architecture can be applied for additive faults, parametric faults, and for system structural changes. Only parametric faults will be considered in this paper. The...

Niemann, Hans Henrik

2006-01-01

76

Fault detection and fault tolerant control of a smart base isolation system with magneto-rheological damper  

International Nuclear Information System (INIS)

Fault detection and isolation (FDI) in real-time systems can provide early warnings for faulty sensors and actuator signals to prevent events that lead to catastrophic failures. The main objective of this paper is to develop FDI and fault tolerant control techniques for base isolation systems with magneto-rheological (MR) dampers. Thus, this paper presents a fixed-order FDI filter design procedure based on linear matrix inequalities (LMI). The necessary and sufficient conditions for the existence of a solution for detecting and isolating faults using the H∞ formulation are provided in the proposed filter design. Furthermore, an FDI-filter-based fuzzy fault tolerant controller (FFTC) for a base isolation structure model was designed to preserve the pre-specified performance of the system in the presence of various unknown faults. Simulation and experimental results demonstrated that the designed filter can successfully detect and isolate faults from displacement sensors and accelerometers while maintaining excellent performance of the base isolation technology under faulty conditions

77

Design of fault tolerant control system for steam generator using fuzzy logic  

International Nuclear Information System (INIS)

A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks the controller-induced and sensor-induced performances to identify which part, a controller or a sensor, is faulty. In order to analyze the controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for the sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. Each fuzzy weighting modulator generates an output signal compensated for a faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effect of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases even more

78

Fault tolerance control of phase current in permanent magnet synchronous motor control system  

Science.gov (United States)

As photoelectric tracking systems move from earth-based platforms to all kinds of moving platforms (plane-based, ship-based, car-based, satellite-based and missile-based), a fault tolerant control system for the phase current sensors is studied in order to detect and handle phase current sensor failures on a moving platform. By using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, failure detection and fault control of the three phase current sensors are achieved. Fault tolerance can be maintained with one, two or three failed sensors. The reason why this method leaves an error between the current used by the fault tolerant control and the actual phase current is analyzed, and a solution to reduce the error is provided. An experiment based on a permanent magnet synchronous motor system is conducted, and the method is proven to be capable of detecting phase current sensor failures effectively and precisely while simultaneously providing fault tolerant control. With this method, even if all three phase current sensors malfunction, the moving platform can still work by reconstructing the phase currents of the motor.

Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

2014-08-01
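
The reconstruction mentioned in the abstract above rests on a standard property of a two-level three-phase inverter: during each active switching state, the DC-link current equals one phase current or its negative. The sketch below illustrates that mapping only; it is not the authors' implementation. It assumes a three-wire motor, so that ia + ib + ic = 0, and ideal samples taken during two different active vectors of one PWM period. The names `ACTIVE_STATE_MAP` and `reconstruct_phase_currents` are hypothetical.

```python
# DC-link current seen during each active switching state (Sa, Sb, Sc),
# expressed as (phase index, sign): idc = sign * i[phase].
ACTIVE_STATE_MAP = {
    (1, 0, 0): (0, +1),   # idc =  ia
    (1, 1, 0): (2, -1),   # idc = -ic
    (0, 1, 0): (1, +1),   # idc =  ib
    (0, 1, 1): (0, -1),   # idc = -ia
    (0, 0, 1): (2, +1),   # idc =  ic
    (1, 0, 1): (1, -1),   # idc = -ib
}

def reconstruct_phase_currents(samples):
    """samples: iterable of (switching_state, idc) taken in one PWM period.

    Returns (ia, ib, ic), using ia + ib + ic = 0 for the unmeasured phase.
    Zero vectors (0,0,0) and (1,1,1) carry no information and are ignored.
    """
    known = {}
    for state, idc in samples:
        if state in ACTIVE_STATE_MAP:
            phase, sign = ACTIVE_STATE_MAP[state]
            known[phase] = sign * idc
    if len(known) < 2:
        raise ValueError("need samples from two different active vectors")
    currents = [known.get(k) for k in range(3)]
    if None in currents:
        missing = currents.index(None)
        currents[missing] = -sum(c for c in currents if c is not None)
    return tuple(currents)
```

For example, with ia = 3, ib = -1 and ic = -2, the DC-link sensor reads 3 during state (1, 0, 0) and 2 during state (1, 1, 0), and the function recovers all three values. In practice the active vectors can be too short to sample reliably, which is one plausible source of the reconstruction error the abstract refers to.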

79

Stability Guaranteed Active Fault-Tolerant Control of Networked Control Systems  

Directory of Open Access Journals (Sweden)

Stability-guaranteed active fault-tolerant control against actuator failures and plant uncertainties in networked control systems (NCSs) is addressed. A detailed design procedure is formulated as a convex optimization problem which can be efficiently solved by existing software. An illustrative example is given to show the efficiency of the proposed method for network-based control of uncertain systems.

Shanbin Li

2008-03-01

80

AN ARCHITECTURE FOR ACTIVE FAULT TOLERANT CONTROL SYSTEMS - APPLICATION TO A LARGE TRANSPORT AIRCRAFT  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis discusses the design of an active Fault Tolerant Control (FTC) strategy for improvement of the operational control capability of the safety critical systems. The FTC strategy works in such a way that once a fault is detected by the Fault Detection and Isolation (FDI) unit, a compensation loop is activated for safe recovery. A key feature of the proposed strategy is that the design of the FTC loop is done independently of the nominal control law already in place. For a given applic...

Cieslak, Jérôme

2007-01-01

 
 
 
 
81

A Fault tolerant Control Supervisory System development Procedure for Small Satellites : The AAUSAT-II case  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The paper presents a stepwise procedure to develop a fault tolerant control system for small satellites. The procedure is illustrated through implementation on the AAUSAT-II spacecraft. As is shown, the presented procedure requires expertise from several disciplines, which is nevertheless necessary for obtaining a complete and consistent solution.

Izadi-zamanabadi, Roozbeh; Larsen, Jesper Abildgaard

2007-01-01

82

A Fault tolerant Control Supervisory System development Procedure for Small Satellites : The AAUSAT-II case  

DEFF Research Database (Denmark)

The paper presents a stepwise procedure to develop a fault tolerant control system for small satellites. The procedure is illustrated through implementation on the AAUSAT-II spacecraft. As is shown, the presented procedure requires expertise from several disciplines, which is nevertheless necessary for obtaining a complete and consistent solution.

Izadi-Zamanabadi, Roozbeh; Larsen, Jesper Abildgaard

83

BYZANTINE FAULT TOLERANCE MODEL FOR SOAP FAULTS  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The proposed model configures a Byzantine Fault Tolerance mechanism for every SOAP fault message that is transmitted. Reliability and availability are major requirements of Web services since they operate in a distributed environment. One of the reliability issues is handling faults. Faults occur in all the phases of Service Oriented Architecture, i.e. during publishing, discovery, composition, binding, and execution. These faults may lead to service downtime, behaves abnormally, an...

Ramachandran, V.; Murugan, S.

2012-01-01

84

Fault Tolerant Computer Architecture  

CERN Document Server

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

Sorin, Daniel

2009-01-01

85

Reconfigurable fault-tolerant multiprocessor system for real-time control  

Energy Technology Data Exchange (ETDEWEB)

Real-time control applications place stringent constraints on the computers controlling them, since the failure of a computer could result in costly damage and even loss of human lives. Fault-tolerant computers, therefore, have always been in high demand in critical avionic and aerospace applications. However, the use of redundancy techniques to achieve fault tolerance in industrial applications has only recently become feasible due to the rapid decrease in cost and increase in performance of microprocessors. As more and more robots are being built to replace human beings in dangerous and difficult tasks, the need for a reliable computer for robotics control increases. This need, in particular, motivated the research described in this dissertation - the design and implementation of a reconfigurable fault-tolerant multiprocessor system (the FREMP system). The FREMP system consists of four processing units (PUs) and three common parallel buses. Each PU is a combination of an Intel 86/30 single board computer and a custom fault detection/masking circuit board (FDM board). A hardware/software combined scheme was devised to detect faults and correct errors. This scheme has been shown to be more efficient than software voting while maintaining the flexibility of software approaches. Time-frame scheduling was adopted to schedule tasks for execution.

Kao, M.L.

1986-01-01

86

A Fault Tolerant Colored Petri Net Model for Flexible Manufacturing Systems  

Scientific Electronic Library Online (English)

This paper introduces an approach based on Colored Petri Nets (CPN) to systematically introduce fault-tolerance in the design of a supervisor for a Flexible Manufacturing System (FMS). The system is modeled by means of Place/Transition nets and then is structurally reduced, resulting in a CPN that is independent of a specific production route. The introduction of fault tolerance in the design of such a supervisor considers both forward recovery and backward recovery. For forward recovery we anticipate faults in resources in a production route and reschedule the production routes for production orders before the faulty resource is reached. The backward recovery is considered at the level of a resource in such a way that when a faulty resource is fixed, the operation restarts on the last consistent operation executed

Barros, Tomaz C.; Figueiredo, Jorge C.A. de; Perkusich, Angelo

1997-11-01

87

A Hybrid Real-time Fault-tolerant Scheduling Algorithm for Partial Reconfigurable System  

Directory of Open Access Journals (Sweden)

A partial reconfigurable system is an architecture consisting of general purpose processors and FPGAs, in which the FPGAs can be reconfigured at run-time. On this architecture, software tasks and hardware tasks, executed on the processor and the FPGA respectively, co-exist. In this paper, a real-time fault-tolerant scheduling algorithm is proposed to schedule software/hardware hybrid tasks. In the algorithm, the sufficient condition for schedulable hybrid tasks is derived from analyzing system operation conditions when the first deadline is missed, and rollback/recovery and TMR approaches are used to schedule software subtasks and hardware subtasks, respectively, for fault tolerance. The experimental results demonstrate that all deadlines of accepted hybrid tasks are met and that the processor’s utilization ratio is increased greatly compared with that of the existing approaches when multiple faults occur.

Jinyong Yin

2012-11-01
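
The abstract above pairs two classic mechanisms: triple modular redundancy (TMR) for hardware subtasks and rollback/recovery for software subtasks. The sketch below shows both in their simplest generic form, independent of the scheduling algorithm in the paper. The function names, the acceptance check `is_valid` and the retry budget are assumptions made for illustration.

```python
from collections import Counter

def tmr_vote(results):
    """Majority vote over three replica outputs; raises if no two agree."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three replica results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica is faulty")
    return value

def run_with_rollback(task, checkpoint, is_valid, max_retries=2):
    """Re-execute task from the saved checkpoint until its result passes
    the acceptance check or the retry budget is exhausted.

    task is called as task(checkpoint, attempt) and must not modify
    the checkpoint. Returns (result, attempt index that succeeded).
    """
    for attempt in range(max_retries + 1):
        result = task(checkpoint, attempt)
        if is_valid(result):
            return result, attempt
    raise RuntimeError("task failed after rollback retries")
```

The trade-off behind the pairing is visible here: TMR masks a single fault with no extra latency but triples the resources, while rollback uses no extra resources in the fault-free case but adds re-execution time, which a real-time scheduler has to budget for.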

88

Diagnosis and Fault-tolerant Control  

DEFF Research Database (Denmark)

The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. Design methods for diagnostic systems and fault-tolerant controllers are presented for processes that are described by analytical models, by discrete-event models or that can be dealt with as quantised systems. Four case studies on pilot processes show the applicability of the presented methods. The theoretical results are illustrated by two running examples which are used throughout the book. The book addresses engineering students, engineers in industry and researchers who wish to get a survey over the variety of approaches to process diagnosis and fault-tolerant control.

Blanke, Mogens; Kinnaert, Michel

2003-01-01

89

Fault-Tolerant Heat Exchanger  

Science.gov (United States)

A compact, lightweight heat exchanger has been designed to be fault-tolerant in the sense that a single-point leak would not cause mixing of heat-transfer fluids. This particular heat exchanger is intended to be part of the temperature-regulation system for habitable modules of the International Space Station and to function with water and ammonia as the heat-transfer fluids. The basic fault-tolerant design is adaptable to other heat-transfer fluids and heat exchangers for applications in which mixing of heat-transfer fluids would pose toxic, explosive, or other hazards: Examples could include fuel/air heat exchangers for thermal management on aircraft, process heat exchangers in the cryogenic industry, and heat exchangers used in chemical processing. The reason this heat exchanger can tolerate a single-point leak is that the heat-transfer fluids are everywhere separated by a vented volume and at least two seals. The combination of fault tolerance, compactness, and light weight is implemented in a unique heat-exchanger core configuration: Each fluid passage is entirely surrounded by a vented region bridged by solid structures through which heat is conducted between the fluids. Precise, proprietary fabrication techniques make it possible to manufacture the vented regions and heat-conducting structures with very small dimensions to obtain a very large coefficient of heat transfer between the two fluids. A large heat-transfer coefficient favors compact design by making it possible to use a relatively small core for a given heat-transfer rate. Calculations and experiments have shown that in most respects, the fault-tolerant heat exchanger can be expected to equal or exceed the performance of the non-fault-tolerant heat exchanger that it is intended to supplant (see table). The only significant disadvantages are a slight weight penalty and a small decrease in the mass-specific heat transfer.

Izenson, Michael G.; Crowley, Christopher J.

2005-01-01

90

Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems  

DEFF Research Database (Denmark)

This paper proposes a new diagnosis method for an open-switch fault and a fault-tolerant control strategy for T-type three-level inverter systems. The location of the faulty switch can be identified from the average of the normalized phase current and the change of the neutral-point voltage. The proposed fault-tolerant strategy is explained by dividing it into two cases: the faulty condition of the half-bridge switches and that of the neutral-point switches. The performance of the T-type inverter system improves considerably with the proposed fault tolerant algorithm when a switch fails. The proposed method does not require additional components or complex calculations. Simulation and experimental results verify the feasibility of the proposed fault diagnosis and fault-tolerant control strategy.

Choi, Uimin; Lee, Kyo Beum

2014-01-01

91

Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network  

Directory of Open Access Journals (Sweden)

Mobile agents are computer programs that act autonomously on behalf of a user or owner and travel through a network of heterogeneous machines. Fault tolerance is important along their itinerary. In this paper, existing fault tolerance methods for mobile agents, which are considered in a linear network topology, are described. In these methods, three types of agents cooperate to detect and recover from server and agent failures: the actual agent, which performs programs for its owner; the witness agent, which monitors the actual agent and the witness agent after itself; and the probe, which is sent on behalf of the witness agent to recover the actual agent or the witness agent. The communication mechanism in these methods is message passing between the agents. We introduce our witness agent approach for fault tolerant mobile agent systems in a Two Dimensional Mesh (2D-Mesh) network. Our approach minimizes witness dependency in this network, and we then present its algorithm.

Ahmad Rostami

2010-09-01

92

Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems  

Directory of Open Access Journals (Sweden)

Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and flexibility are imposed on these systems. In this article, we present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs) in combination with CPUs, we allow tasks implemented in hardware or software to migrate from one node to another. Moreover, if not enough hardware/software resources are available, functionality can be migrated from hardware to software or vice versa. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Besides the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

Jürgen Teich

2006-06-01

93

Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems  

Directory of Open Access Journals (Sweden)

Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and flexibility are imposed on these systems. In this article, we present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs) in combination with CPUs, we allow tasks implemented in hardware or software to migrate from one node to another. Moreover, if not enough hardware/software resources are available, functionality can be migrated from hardware to software or vice versa. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Besides the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

Streichert Thilo

2006-01-01

94

To err is robotic, to tolerate immunological: fault detection in multirobot systems.  

Science.gov (United States)

Fault detection and fault tolerance represent two of the most important and largely unsolved issues in the field of multirobot systems (MRS). Efficient, long-term operation requires an accurate, timely detection, and accommodation of abnormally behaving robots. Most existing approaches to fault-tolerance prescribe a characterization of normal robot behaviours, and train a model to recognize these behaviours. Behaviours unrecognized by the model are consequently labelled abnormal or faulty. MRS employing these models do not transition well to scenarios involving temporal variations in behaviour (e.g., online learning of new behaviours, or in response to environment perturbations). The vertebrate immune system is a complex distributed system capable of learning to tolerate the organism's tissues even when they change during puberty or metamorphosis, and to mount specific responses to invading pathogens, all without the need of a genetically hardwired characterization of normality. We present a generic abnormality detection approach based on a model of the adaptive immune system, and evaluate the approach in a swarm of robots. Our results reveal the robust detection of abnormal robots simulating common electro-mechanical and software faults, irrespective of temporal changes in swarm behaviour. Abnormality detection is shown to be scalable in terms of the number of robots in the swarm, and in terms of the size of the behaviour classification space. PMID:25642825

Tarapore, Danesh; Lima, Pedro U; Carneiro, Jorge; Christensen, Anders Lyhne

2015-01-01

95

Architectures for fault-tolerant spacecraft computers  

Science.gov (United States)

This paper summarizes the results of a long-term research program in fault-tolerant computing for spacecraft on-board processing. In response to changing device technology this program has progressed from the design of a fault-tolerant uniprocessor to the development of fault-tolerant distributed computer systems. The unusual requirements of spacecraft computing are described along with the resulting real-time computer architectures. The following aspects of these designs are discussed: (1) architectural features to minimize complexity in the distributed computer system, (2) fault-detection and recovery, (3) techniques to enhance reliability and testability, and (4) design approaches for LSI implementation.

Rennels, D. A.

1978-01-01

96

Guaranteed Cost Fault-tolerant Controller Design of Networked Control Systems under Variable-period Sampling  

Directory of Open Access Journals (Sweden)

This study investigates the problem of integrity against actuator failures for networked control systems under variable-period sampling. Assuming that the interval between any two consecutive sampling instants is less than a given bound, the input delay approach is used to transform the networked control system under variable-period sampling into a continuous-time networked control system with time-varying delays. The existence conditions of a guaranteed cost fault-tolerant control law are then established using Lyapunov stability theory combined with Linear Matrix Inequalities (LMIs). Furthermore, the guaranteed cost fault-tolerant controller gain and the minimum guaranteed cost can be obtained by solving a minimization problem. A numerical simulation example demonstrates that the conclusions are feasible and effective. The proposed control method handles both variable-period sampling and actuator failures, which meets the requirements of industrial networked control systems.

Xuan Li

2009-01-01

97

Fault-Tolerant Identification in Wireless Sensor Networks for Maximizing System Lifetime  

Directory of Open Access Journals (Sweden)

Wireless Sensor Networks (WSNs) are used by many applications such as security, command and control, and surveillance monitoring. In all such applications, the main function of the WSN is sensing and retrieving data. Many WSN systems are query based: they respond within a stipulated time based on the user's query. However, a WSN is prone to sensor faults, which make it unreliable and lower the network energy level, reducing the lifetime of the network. To overcome this, fault tolerance mechanisms can be used to improve reliability, with failed nodes found and recovered by cluster heads. This paper presents an algorithm that can effectively increase the lifetime of a WSN while satisfying the QoS requirements of the application. The algorithm is adaptive and fault tolerant. It uses path and source redundancy and is based on hop-by-hop data delivery. Empirical simulation results revealed that the proposed system is feasible. The system also proposes the authentication of all kinds of identified faults and provides quality service. It increases the data flow and reduces the faults.

Middela Shailaja

2012-09-01

98

Fault tolerant safety related computer based process control system for TAPP- 3 and 4  

International Nuclear Information System (INIS)

Computer based control systems for safety related applications in nuclear power plants have to meet not only the functional, performance and interface requirements, but also regulatory requirements such as enhanced reliability, safety and security. While meeting these stringent requirements, such computer based systems also need to ensure high availability. Availability of these safety related systems has a direct influence on commercial operation of the NPP and on the availability of several megawatts of electrical power to the national grid. Several design features such as fault tolerance, on-line diagnostics and self-supervision are to be incorporated in the computer system architecture, hardware design and software design to meet high reliability and high availability criteria. Reactor Control Division (RCnD) has designed and developed the 'Dual Processor Hot Standby' (DPHS) fault tolerant architecture, which not only meets the safety requirements but also provides very high availability. The fault tolerant features of the DPHS architecture and the design of the Process Control System based on the DPHS architecture (DPHS-PCS) for TAPP-3 and 4 are highlighted in this paper. DPHS-PCS for Tarapur Atomic Power Project (TAPP) -3 and 4 regulates Primary Heat Transport (PHT) system pressure, Pressuriser pressure, Pressuriser level, Bleed condenser pressure, Bleed condenser level and Steam generator pressure. (author)
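
The hot-standby idea named in the entry above can be illustrated with a minimal sketch. This is not the Dual Processor Hot Standby design itself, whose internals the abstract does not describe. The class names, the `healthy` flag standing in for on-line diagnostics, and the proportional gain are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    """Hypothetical processor; `healthy` stands in for on-line diagnostics."""
    name: str
    healthy: bool = True


class HotStandbyPair:
    """Two processors: one active, one standby that is ready to take over."""

    def __init__(self, primary: Processor, standby: Processor) -> None:
        self.active = primary
        self.standby = standby

    def step(self, setpoint: float, measurement: float):
        # Switch over if the active unit fails its self-check
        # and the standby unit is still healthy.
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
        if not self.active.healthy:
            raise RuntimeError("both processors have failed")
        # Illustrative proportional control law (the gain 0.5 is an assumption).
        output = 0.5 * (setpoint - measurement)
        return self.active.name, output
```

In this sketch the control output is unchanged by a switchover; only the unit that computes it changes, which is the availability property a hot-standby arrangement aims for.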

99

FTOS-Verify: Analysis and Verification of Non-Functional Properties for Fault-Tolerant Systems  

CERN Document Server

The focus of the tool FTOS is to alleviate designers' burden by offering code generation for non-functional aspects including fault-tolerance mechanisms. One crucial aspect in this context is to ensure that user-selected mechanisms for the system model are sufficient to resist faults as specified in the underlying fault hypothesis. In this paper, formal approaches in verification are proposed to assist the claim. We first raise the precision of FTOS into pure mathematical constructs, and formulate the deterministic assumption, which is necessary as an extension of Giotto-like systems (e.g., FTOS) to equip with fault-tolerance abilities. We show that local properties of a system with the deterministic assumption will be preserved in a modified synchronous system used as the verification model. This enables the use of techniques known from hardware verification. As for implementation, we develop a prototype tool called FTOS-Verify, deploy it as an Eclipse add-on for FTOS, and conduct several case studies.

Cheng, Chih-Hong; Esparza, Javier; Knoll, Alois

2009-01-01

100

Parallel, fault-tolerant control and diagnostics system for feedwater regulation in PWRS  

International Nuclear Information System (INIS)

The feasibility of software based fault-tolerant feedwater flow control system has been investigated in this study. Although the architecture is not dedicated to a particular task, steam generator water level and differential pressure controllers will be discussed in this paper. In addition to parallel control and diagnostics techniques, an application of artificial neural networks for feedwater flow rate monitoring (to address venturi fouling) is also studied

101

An optimal redundancy allocation method for the preliminary design of fault-tolerant aircraft systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis provides a methodological contribution to architecture selection in the preliminary design of fault-tolerant aircraft systems. Therefore, an existing analytical model for safety and reliability analysis has been enhanced by including aspects of redundancy allocation using three conditioned multi-objective optimization algorithms. As objective functions safety and reliability of different failure conditions of complex logics can be considered, beside additional contradictory, summa...

Raksch, Christian

2013-01-01

102

A Study on Fault-Tolerant Software Architecture for COTS-Based Dependable System  

International Nuclear Information System (INIS)

Recently, with the rapid development of digital computers and information processing technologies, nuclear instrument and control (I and C) systems that need safety-critical functions have adopted digital technologies. The use of commercial off-the-shelf (COTS) software in safety-critical systems has also increased, for reasons such as economic efficiency and technical problems. However, it requires a considerable integration effort and raises software quality and safety issues. COTS software is usually provided as a black box that cannot be modified. The biggest problem when integrating such a product into dependable systems is the reliability of the COTS software. There is no guarantee that the software will perform its function correctly. It may have bugs or unidentified components. Recently, software verification and validation (V and V) has been accepted as a way to assure the dependability of newly developed safety-critical nuclear I and C software. However, because of the limitations of COTS software, software V and V cannot be applied as rigorously as to newly developed software. Considerable attention has been given to describing software architectures with respect to their dependability properties. In this paper, we present a fault-tolerant software architecture using the C2 architectural style. The remainder of the paper is organized as follows: Section 2 discusses background work on COTS software in nuclear I and C, software fault tolerance and the C2 architectural style. Section 3 describes the architecture for fault-tolerant COTS-based software. Finally, we discuss the conclusion and future work.

103

Plan for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System  

Science.gov (United States)

This report presents the plan for the characterization of the effects of high intensity radiated fields on a prototype implementation of a fault-tolerant data communication system. Various configurations of the communication system will be tested. The prototype system is implemented using off-the-shelf devices. The system will be tested in a closed-loop configuration with extensive real-time monitoring. This test is intended to generate data suitable for the design of avionics health management systems, as well as redundancy management mechanisms and policies for robust distributed processing architectures.

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

2008-01-01

104

Robust fault tolerant control based on sliding mode method for uncertain linear systems with quantization.  

Science.gov (United States)

This paper is concerned with robust fault-tolerant compensation control for uncertain linear systems subject to both state and input signal quantization. By combining a novel matrix full-rank factorization technique with sliding surface design, the total failure of certain actuators can be coped with under a special actuator redundancy assumption. In order to compensate for quantization errors, an adjustment range of quantization sensitivity for a dynamic uniform quantizer is given through the flexible choice of design parameters. Compared with existing results, the derived inequality condition leads to stronger fault tolerance ability and a much wider scope of applicability. With a static adjustment policy of quantization sensitivity, an adaptive sliding mode controller is then designed to maintain the sliding mode, where the gain of the nonlinear unit vector term is updated automatically to compensate for the effects of actuator faults, quantization errors, exogenous disturbances and parameter uncertainties without the need for a fault detection and isolation (FDI) mechanism. Finally, the effectiveness of the proposed design method is illustrated via a structural-acoustic model of a rocket fairing. PMID:23701895

Hao, Li-Ying; Yang, Guang-Hong

2013-09-01

105

Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems  

DEFF Research Database (Denmark)

Fluid power systems have been in use since 1795 with the first hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are fluid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by means of hydraulic fluid under pressure. With the development of computing power and control techniques during the last few decades, they are used increasingly in many industrial fields which require high actuation forces within limited space. However, despite numerous attractive properties, hydraulic systems are always subject to potential leakages in their components, friction variation in their hydraulic actuators and deficiency in their sensors. These violations of normal behaviour reduce the system performances and can lead to system failure if they are not detected early and handled. Moreover, the task of controlling electro hydraulic systems for high performance operations is challenging due to the highly nonlinear behaviour of such systems and the large amount of uncertainties present in their models. This thesis focuses on nonlinear adaptive fault-tolerant control for a representative electro hydraulic servo controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identifies the model parameters using experimental data from a test bed by analysing both the time response to standard input signals and the variation of the outputs with different excitation frequencies. The thesis also presents a model that accurately describes the static and dynamic normal behaviour of the system. Further, in this thesis, a fault detector is designed and implemented on the test bed that successfully diagnoses internal or external leakages, friction variations in the actuator or faults related to pressure sensors.
The presented algorithm uses the position and pressure measurements to detect and isolate faults, avoiding missed detections and false alarms. The thesis also develops a high performance adaptive nonlinear controller for the hydraulic system which outperforms comparable linear controllers widely used in the industry. Because of the controller adaptivity, uncertainties in the model parameters can be handled. Moreover, special attention is given to reducing the complexity of the controller in order to demonstrate its real-time implementation. Finally the thesis combines the techniques developed in fault detection and nonlinear control in order to develop an active fault-tolerant controller for electro hydraulic servo systems. In order to maintain overall service and performances as high as possible when a potential fault occurs, the fault-tolerant controlled system prognoses the fault and changes its controller parameters or structure. The consequences of an unexpected fault are avoided, high availability is ensured and the overall safety in electro hydraulic servo systems is increased.

Choux, Martin

2011-01-01

106

Fault tolerance improvement for queuing systems under stress load  

International Nuclear Information System (INIS)

Various kinds of queuing information systems (exchange auction systems, web servers, SCADA) face unpredictable situations during operation, when the information flow that must be analyzed and processed rises sharply. Such stress load situations often require human (dispatcher's or administrator's) intervention, which is why the time of the first denial of service is extremely important. A common queuing system architecture is described. Existing approaches to computing resource management are considered. A new late-first-denial-of-service resource management approach is proposed.

107

Active Fault Tolerant Control of Livestock Stable Ventilation System  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems for livestock buildings in three steps: Deriving a model for the climate cont...

Gholami, Mehdi

2011-01-01

108

Fault tolerant control for Takagi-Sugeno nonlinear systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A first contribution of this thesis is to propose a systematic procedure to deal with the state and parameter estimation for nonlinear time-varying systems. It consists in transforming the original system into a T-S model with unmeasurable premise variables using the sector nonlinearity transformation. Then a joint state and parameter observer is designed and the convergence conditions of the joint state and parameter estimation errors are established. The second contribution of this thesis i...

Bezzaoucha, Souad

2013-01-01

109

Fault-tolerant Agreement in Synchronous Message-passing Systems  

CERN Document Server

The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an

Raynal, Michel

2010-01-01

110

The Isis project: Fault-tolerance in large distributed systems  

Science.gov (United States)

This final status report covers activities of the Isis project during the first half of 1992. During the report period, the Isis effort has achieved a major milestone in its effort to redesign and reimplement the Isis system using Mach and Chorus as target operating system environments. In addition, we completed a number of publications that address issues raised in our prior work; some of these have recently appeared in print, while others are now being considered for publication in a variety of journals and conferences.

Birman, Kenneth P.; Marzullo, Keith

1993-01-01

111

Fault-Tolerant Process Control Methods and Applications  

CERN Document Server

Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure an efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes which explicitly deal with actuator/controller failures and sensor faults and data losses. Specifically, the book puts forward: · a framework for detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; · controller reconfiguration and safe-parking-based fault-handling methodologies; · integrated-data- and model-based fault-detection and isolation and fault-tolerant control methods; · methods for handling sensor faults and data losses; and · ...

Mhaskar, Prashant; Christofides, Panagiotis D

2013-01-01

112

Fault tolerant computer control for a Maglev transportation system  

Science.gov (United States)

Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements of the Maglev system. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.

Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George

1994-01-01

113

Fault Tolerant Neural Network for ECG Signal Classification Systems  

Directory of Open Access Journals (Sweden)

The aim of this paper is to apply a new robust hardware Artificial Neural Network (ANN) to ECG classification systems. This ANN includes a penalization criterion which improves performance in terms of robustness. Specifically, in this method, the ANN weights are normalized using the auto-prune method. Simulations performed on the MIT-BIH ECG signals have shown that significant robustness improvements are obtained with respect to potential hardware artificial neuron failures. Moreover, we show that the proposed design achieves better generalization performance than the standard back-propagation algorithm.

MERAH, M.

2011-08-01

114

Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants  

Energy Technology Data Exchange (ETDEWEB)

Recently, traditional analog-based safety-related instrumentation and control (I and C) systems in nuclear power plants (NPPs) have been replaced with modern digital-based systems. Due to the digitalization of nuclear I and C systems, the safety assessment has become a major issue, as it is crucial to the system's reliability. In the safety assessment of the digitalized system, evaluation of error detection coverage and fault-tolerance are critical factors. For the evaluation, we use C++ based hardware description instead of a board with integrated circuit components. We select the digital plant protection system (DPPS) in NPPs as a target system. Permanent fault is used as a possible fault in the system and some error detection methods are used to detect errors. From the experiment, we confirmed that the proposed approach can evaluate the error detection coverage and the fault-tolerance of DPPS in NPPs.

Lee, Jun Seok [Center for Advanced Reactor Research, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: wahrheit@kaist.ac.kr; Kim, Man Cheol [Center for Advanced Reactor Research, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: charleskim@kaist.ac.kr; Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)]. E-mail: phseong@kaist.ac.kr; Kang, Hyun Gook [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 150 Deokjin-dong, Yuseong-gu, Daejeon 305-353 (Korea, Republic of)]. E-mail: hgkang@kaeri.re.kr; Jang, Seung Cheol [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 150 Deokjin-dong, Yuseong-gu, Daejeon 305-353 (Korea, Republic of)]. E-mail: scjang@kaeri.re.kr

2006-04-15

115

The BTeV DAQ and Trigger System - Some throughput, usability and fault tolerance aspects  

Energy Technology Data Exchange (ETDEWEB)

As presented at the last CHEP conference, the BTeV triggering and data collection pose a significant challenge in construction and operation, generating 1.5 Terabytes/second of raw data from over 30 million detector channels. We report on facets of the DAQ and trigger farms. We report on the current design of the DAQ, especially its partitioning features to support commissioning of the detector. We are exploring collaborations with computer science groups experienced in fault tolerant and dynamic real-time and embedded systems to develop a system to provide the extreme flexibility and high availability required of the heterogeneous trigger farm (approximately ten thousand DSPs and commodity processors). We describe directions in the following areas: system modeling and analysis using the Model Integrated Computing approach to assist in the creation of domain-specific modeling, analysis, and program synthesis environments for building complex, large-scale computer-based systems; System Configuration Management to include compilable design specifications for configurable hardware components, schedules, and communication maps; Runtime Environment and Hierarchical Fault Detection/Management--a system-wide infrastructure for rapidly detecting, isolating, filtering, and reporting faults which will be encapsulated in intelligent active entities (agents) to run on DSPs, L2/3 processors, and other supporting processors throughout the system.

Erik Edward Gottschalk et al.

2001-08-20

116

Model Checking a Byzantine-Fault-Tolerant Self-Stabilizing Protocol for Distributed Clock Synchronization Systems  

Science.gov (United States)

This report presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV) [SMV]. The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space. Also, additional innovative state space reduction techniques are introduced that can be used in future verification efforts applied to this and other protocols.

Malekpour, Mahyar R.

2007-01-01
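
The configuration in the entry above (4 nodes, at most one Byzantine-faulty) matches the classical requirement of n ≥ 3f + 1 nodes to tolerate f Byzantine faults without authenticated messages. Relating the report to that bound is an assumption made here, and the function names below are hypothetical. A minimal sketch of the arithmetic:

```python
def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1 (classical bound, no authentication)."""
    if n < 1:
        raise ValueError("need at least one node")
    return (n - 1) // 3


def tolerates(n: int, f: int) -> bool:
    """True if n nodes satisfy the n >= 3f + 1 bound for f Byzantine faults."""
    return n >= 3 * f + 1
```

Under this bound, four nodes is the smallest system that can tolerate a single Byzantine fault, which also keeps the state space small for model checking.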

117

State of the art on fault-tolerant real time distributed systems  

International Nuclear Information System (INIS)

The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies more and more stringent requirements as to communication system reliability. While an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. It is for this reason that, in the context of preparation for the future PWR 2000 standardized nuclear plants, it would seem appropriate to take a look at fault-tolerant communication systems. Since C and I type applications (in the control room) are divided between several computers and are required to contend with extremely severe time constraints, EDF has undertaken investigation of fault-tolerant, real time distributed systems. This paper summarizes the state of the art in the field as it appears from discussion with computer manufacturers, academics and research workers on related projects. The results obtained were then used to determine trends as to ''promising'' solutions. The paper concludes with recommended study programs for the PCC department of EDF/R and DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

118

Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper we are interested in mixed-criticality embedded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS parameters determine the quality...

Saraswat, Prabhat Kumar; Pop, Paul; Madsen, Jan

2010-01-01

119

Modeling and Verification for Timing Satisfaction of Fault-Tolerant Systems with Finiteness  

CERN Document Server

The increasing use of model-based tools enables further use of formal verification techniques in the context of distributed real-time systems. To avoid state explosion, it is necessary to construct a verification model that focuses on the aspects under consideration. In this paper, we discuss how we construct a verification model for timing analysis in distributed real-time systems. We (1) give observations concerning the restrictions of timed automata for modeling these systems, (2) formulate mathematical representations of how to perform model-to-model transformation to derive verification models from system models, and (3) propose some theoretical criteria for reducing the model size. The latter is particularly important because, for the verification of complex systems, an efficient model reflecting the properties of the system under consideration is as important as the verification algorithm itself. Finally, we present an extension of the model-based development tool FTOS, designed to develop fault-tolerant system...

Cheng, Chih-Hong; Esparza, Javier; Knoll, Alois

2009-01-01

120

Economical and Fault-Tolerant Load Balancing in Distributed Stream Processing Systems  

Science.gov (United States)

We present an economical and fault-tolerant load balancing strategy (EFTLBS) based on an operator replication mechanism and a load shedding method that fully utilizes the network resources to realize continuous and highly available data stream processing without dynamic operator migration over wide area networks. In this paper, we first design an economical operator distribution (EOD) plan based on a bin-packing model under the constraints of each stream's bandwidth as well as each server's CPU capacity. Next, we devise a super-operator (SO) that load balances multi-degree operator replicas. Moreover, to improve the fault tolerance of the system, we color the SOs based on a coloring bin-packing (CBP) model that assigns peer operator replicas to different servers. To minimize the effects of input rate bursts upon the system, we take advantage of a load shedding method while keeping the QoS guarantees made by the system based on the SO scheme and the CBP model. Finally, we substantiate the utility of our work through experiments on ns-3.

Xiao, Fuyuan; Kitasuka, Teruaki; Aritsugi, Masayoshi
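
The placement idea in this abstract (pack operator replicas onto servers under a CPU limit while keeping peer replicas on different servers) can be illustrated with a minimal Python sketch. It is not the authors' EOD or CBP algorithm: it uses a plain first-fit-decreasing heuristic, ignores the stream bandwidth constraint, and the names `Server` and `place_replicas` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    cpu_capacity: float
    cpu_used: float = 0.0
    # Ids of the logical operators that already have a replica on this server.
    operators: set = field(default_factory=set)


def place_replicas(replicas, servers):
    """First-fit-decreasing placement with replica anti-affinity.

    replicas: list of (operator_id, cpu_load); peer replicas share an id.
    servers:  list of Server objects, updated in place.
    Returns a list of (operator_id, server_index) in placement order.
    Raises ValueError when a replica fits on no server.
    """
    placement = []
    # Place the heaviest replicas first (classic first-fit-decreasing order).
    for op_id, load in sorted(replicas, key=lambda r: r[1], reverse=True):
        for idx, server in enumerate(servers):
            fits = server.cpu_used + load <= server.cpu_capacity
            # Anti-affinity: never co-locate two replicas of one operator.
            if fits and op_id not in server.operators:
                server.cpu_used += load
                server.operators.add(op_id)
                placement.append((op_id, idx))
                break
        else:
            raise ValueError(f"no feasible server for a replica of {op_id!r}")
    return placement
```

With two servers of capacity 1.0 and two replicas each of a 0.5-load and a 0.4-load operator, every operator ends up with one replica per server, so a single server failure leaves one live replica of each.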

121

Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems  

DEFF Research Database (Denmark)

In this paper we are interested in mixed-criticality embedded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS parameters determine the quality of service (QoS) of soft tasks. Transient faults are tolerated using checkpointing with rollback recovery. For tolerating permanent faults in processors, we use task migration, i.e., restarting the safety-critical tasks on other processors. We propose a Greedy-based online heuristic for the migration of safety-critical tasks, in response to permanent faults, and the adjustment of CBS parameters on the target processors, such that the faults are tolerated, the deadlines for the hard real-time tasks are satisfied and the QoS for soft tasks is maximized. The proposed online adaptive approach has been evaluated using several synthetic benchmarks and a real-life case study.

Saraswat, Prabhat Kumar; Pop, Paul

2009-01-01
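
To make the migration step described in this abstract concrete, here is a small Python sketch of one possible greedy rule. It is an illustration under stated assumptions, not the heuristic from the paper: each processor runs EDF with implicit deadlines, so hard-task utilization plus total CBS bandwidth must not exceed 1, and soft-task bandwidth is simply trimmed to whatever remains. The function name and the dictionary layout are hypothetical.

```python
def migrate_tasks(failed_tasks, processors):
    """Greedily re-host hard tasks from a permanently failed processor.

    failed_tasks: dict task_name -> utilization (WCET / period).
    processors:   dict proc_name -> {"hard": float, "cbs": float}, holding the
                  hard-task utilization and the total CBS bandwidth of each
                  surviving processor; updated in place.
    Returns dict task_name -> chosen processor.
    """
    assignment = {}
    # Handle the most demanding tasks first.
    ordered = sorted(failed_tasks.items(), key=lambda kv: kv[1], reverse=True)
    for task, util in ordered:
        # Greedy choice: the survivor with the lowest hard-task load.
        target = min(processors, key=lambda p: processors[p]["hard"])
        state = processors[target]
        if state["hard"] + util > 1.0:
            raise RuntimeError(f"cannot place hard task {task!r}")
        state["hard"] += util
        # Assumed adjustment: cap CBS bandwidth at the remaining capacity so
        # that hard utilization + CBS bandwidth stays within 1 under EDF.
        state["cbs"] = min(state["cbs"], 1.0 - state["hard"])
        assignment[task] = target
    return assignment
```

The sketch favors hard deadlines unconditionally; a real heuristic would also weigh how much soft-task QoS each candidate placement costs.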

122

Observer-based Guaranteed Cost Fault-tolerant Controller Design for Networked Control Systems  

Directory of Open Access Journals (Sweden)

The problem of integrity against sensor failures for networked control systems based on a state observer is studied. Assuming that the time delay is more than one sampling period, the system is modeled as a discrete-time system with parametric uncertainties. Based on the model, the state observer is designed and, according to possible sensor failures, an augmented mathematical model for the networked control systems based on the state observer is developed. Then, in terms of the given quadratic performance index function, the integrity condition of the system is given and the designs for the guaranteed cost fault-tolerant controller and observer are presented, using the cooperative design approach of the controller and observer and the approach of bilinear matrix inequalities. An example is given to show the effectiveness of the method.

Xuan Li

2011-01-01

123

Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems  

Directory of Open Access Journals (Sweden)

Industrial distributed networked control systems use different communication networks to exchange information at different levels of criticality. Real-time control, fault diagnosis (FDI) and Fault Tolerant Networked Control (FTNC) systems place some of the most stringent demands on data exchange in the communication networks of these networked control systems (NCS). When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of spatially distributed, network-connected sensors and actuators. To solve this issue, an FTNC platform and toolbox are presented in this paper using simple and verifiable principles coming mainly from a decentralized design based on causal modelling partitioning of the NCS and distributed computing using the multi-agent systems paradigm, allowing the use of agents with well-established FTC methodologies or new ones developed taking into account the NCS specificities. The multi-agent platform and toolbox for FTNC systems have been built in the Matlab/Simulink environment, which is nowadays the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

Mário J. G. C. Mendes

2009-04-01

124

GRID COMPUTING AND FAULT TOLERANCE APPROACH  

Directory of Open Access Journals (Sweden)

Grid computing is a means of allocating the computational power of a large number of computers to a complex, difficult computation or problem. Grid computing is a distributed computing paradigm that differs from traditional distributed computing in that it is aimed toward large-scale systems that may even span organizational boundaries. This paper proposes a method to achieve maximum fault tolerance in a Grid environment by taking reliability into account and combining a replication approach with a checkpointing approach. Fault tolerance is an important property for large-scale computational grid systems, where geographically distributed nodes cooperate to execute a task. To achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly utilized techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The overhead largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of grid resources.

Pankaj Gupta

2011-10-01
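
The dependence of checkpointing overhead on the interval length, noted in this abstract, can be illustrated with a first-order model. The Python sketch below uses Young's classical approximation for the interval, which is a well-known textbook result and not something taken from this paper. The function names are hypothetical, and the model assumes a checkpoint cost much smaller than the mean time between failures (MTBF).

```python
import math


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order estimate of a good checkpoint interval.

    Both arguments are in the same time unit; the result is sqrt(2 * C * M).
    """
    if checkpoint_cost <= 0 or mtbf <= 0:
        raise ValueError("checkpoint_cost and mtbf must be positive")
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(interval, checkpoint_cost, mtbf):
    """Approximate fraction of run time lost to fault tolerance.

    First term: time spent writing checkpoints (one per interval).
    Second term: expected rework, about half an interval per failure.
    """
    return checkpoint_cost / interval + interval / (2.0 * mtbf)
```

For a checkpoint cost of 2 time units and an MTBF of 100, the estimate is an interval of 20 units with roughly 20% overhead; halving or doubling the interval raises the modeled overhead to about 25%.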

125

Backstepping decentralized fault tolerant control for reconfigurable modular robots  

Directory of Open Access Journals (Sweden)

To handle actuator faults in reconfigurable modular robots, a backstepping decentralized fault tolerant control (DFTC) algorithm is proposed. The reconfigurable robot system is divided into a set of interconnected subsystems. The fault tolerant controller is designed based on the backstepping method.

Jinbao He

2013-07-01

126

A Fault-Tolerant Emergency-Aware Access Control Scheme for Cyber-Physical Systems  

CERN Document Server

Access control is an issue of paramount importance in cyber-physical systems (CPS). In this paper, an access control scheme, namely FEAC, is presented for CPS. FEAC can not only provide the ability to control access to data in normal situations, but also adaptively assign emergency roles and permissions to specific subjects and inform subjects without explicit access requests to handle emergency situations in a proactive manner. In FEAC, emergency-group and emergency-dependency are introduced. Emergencies are processed in sequence within the group and in parallel among groups. A priority and dependency model called PD-AGM is used to select the optimal response-action execution path, aiming to eliminate all emergencies that have occurred within the system. Fault-tolerant access control policies are used to address failure in emergency management. A case study of a hospital medical care application shows the effectiveness of FEAC.

Wu, Guowei; Xia, Feng; Yao, Lin

2012-01-01

127

Low cost management of replicated data in fault-tolerant distributed systems  

Science.gov (United States)

Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. How this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism is also discussed.

Joseph, Thomas A.; Birman, Kenneth P.

1990-01-01

128

The MAFT architecture for distributed fault tolerance  

Energy Technology Data Exchange (ETDEWEB)

This paper describes the Multicomputer Architecture for Fault-Tolerance (MAFT), a distributed system designed to provide extremely reliable computation in real-time control systems. MAFT is based on the physical and functional partitioning of executive functions from application functions. The implementation of the executive functions in a special-purpose hardware processor allows the fault-tolerance functions to be transparent to the application programs and minimizes overhead. Byzantine Agreement and Approximate Agreement algorithms are employed for critical system parameters. MAFT supports the use of multiversion hardware and software to tolerate built-in or generic faults. Graceful degradation and restoration of the application workload is permitted in response to the exclusion and readmission of nodes, respectively.

Kieckhafer, R.M.; Walter, C.J.; Finn, A.M.; Thambidurai, P.M.

1988-04-01
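
The abstract names Approximate Agreement without describing the algorithm used in MAFT. As a generic illustration only, the Python sketch below shows one standard building block for such schemes, the fault-tolerant midpoint: each node sorts the values it received, discards the f lowest and f highest, and takes the midpoint of what remains. The function name and the requirement of at least 3f + 1 values are assumptions of this sketch, not details from the paper.

```python
def fault_tolerant_midpoint(values, max_faulty):
    """Midpoint of the received values after trimming f extremes per side.

    values:     readings received from all nodes (own value included).
    max_faulty: f, the assumed maximum number of arbitrarily faulty nodes.
    """
    if max_faulty < 0:
        raise ValueError("max_faulty must be non-negative")
    if len(values) < 3 * max_faulty + 1:
        raise ValueError("need at least 3f + 1 values")
    ordered = sorted(values)
    # Dropping f values at each end removes any outlier a faulty node sent.
    trimmed = ordered[max_faulty:len(ordered) - max_faulty]
    return (trimmed[0] + trimmed[-1]) / 2.0
```

With four readings of 10.0, 10.2, 9.9 and 500.0 and f = 1, the wild value 500.0 is discarded and the result is about 10.1, which lies within the range of the correct readings.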

129

Fault tolerant operation of switched reluctance machine  

Science.gov (United States)

The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in the industrial sector, advancement in the efficiency of the electric drive system is of vital importance. An adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronic drives. Industry has witnessed tremendous growth in ASDS applications, not only as a driving force but also as an electric auxiliary system replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance, especially for aerospace and automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, the SRM is not free of faults. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on transient and steady state performance of the SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis.
In order to improve the SRM power and torque capacity under faults, maximum torque per ampere excitation is conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same Root Mean Square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for the combination of physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single- and multiple-phase faults is proposed and validated through experiments. Continuous position sensorless operation with seamless transition between various numbers of faulted phases is achieved.

Wang, Wei

130

Design of fault tolerant control system for individual blade control helicopters  

Science.gov (United States)

This dissertation presents the development of a fault tolerant control scheme for helicopters fitted with individually controlled blades. This novel approach attempts to improve fault tolerant capabilities of helicopter control system by increasing control redundancy using additional actuators for individual blade input and software re-mixing to obtain nominal or close to nominal conditions under failure. An advanced interactive simulation environment has been developed including modeling of sensor failure, swashplate actuator failure, individual blade actuator failure, and blade delamination to support the design, testing, and evaluation of the control laws. This simulation environment is based on the blade element theory for the calculation of forces and moments generated by the main rotor. This discretized model allows for individual blade analysis, which in turn allows measuring the consequences of a stuck blade, or loss of the surface area of the blade itself, with respect to the dynamics of the whole helicopter. The control laws are based on non-linear dynamic inversion and artificial neural network augmentation, which is a mix of linear and nonlinear methods that compensates for model inaccuracies due to linearization or failure. A stability analysis based on the Lyapunov function approach has shown that bounded tracking error is guaranteed, and under specific circumstances, global stability is guaranteed as well. An analysis over the degrees of freedom of the mechanical system and its impact over the helicopter handling qualities is also performed to measure the degree of redundancy achieved with the addition of individual blade actuators as compared to a classic swashplate helicopter configuration. 
Mathematical analysis and numerical simulation, using reconfiguration of the individual blade control under failure have shown that this control architecture can potentially improve the survivability of the aircraft and reduce pilot workload under failure conditions.

Tamayo, Sergio

131

Fault Tolerant Homopolar Magnetic Bearings  

Science.gov (United States)

Magnetic suspensions (MS) satisfy the long life and low loss conditions demanded by satellite and ISS based flywheels used for Energy Storage and Attitude Control (ACESE) service. This paper summarizes the development of a novel MS that improves reliability via fault tolerant operation. Specifically, flux coupling between poles of a homopolar magnetic bearing is shown to deliver desired forces even after termination of coil currents to a subset of failed poles. Linear, coordinate decoupled force-voltage relations are also maintained before and after failure by bias linearization. Current distribution matrices (CDM) which adjust the currents and fluxes following a pole set failure are determined for many faulted pole combinations. The CDMs and the system responses are obtained utilizing 1D magnetic circuit models with fringe and leakage factors derived from detailed, 3D, finite element field models. Reliability results are presented vs. detection/correction delay time and individual power amplifier reliability for 4, 6, and 7 pole configurations. Reliability is shown for two success criteria, i.e. (a) no catcher bearing contact following pole failures and (b) re-levitation off of the catcher bearings following pole failures. An advantage of the method presented over other redundant operation approaches is a significantly reduced requirement for backup hardware such as additional actuators or power amplifiers.

Li, Ming-Hsiu; Palazzolo, Alan; Kenny, Andrew; Provenza, Andrew; Beach, Raymond; Kascak, Albert

2003-01-01

132

Laboratory test methodology for evaluating the effects of electromagnetic disturbances on fault-tolerant control systems  

Science.gov (United States)

Control systems for advanced aircraft, especially those with relaxed static stability, will be critical to flight and will, therefore, have very high reliability specifications which must be met for adverse as well as nominal operating conditions. Adverse conditions can result from electromagnetic disturbances caused by lightning, high energy radio frequency transmitters, and nuclear electromagnetic pulses. Tools and techniques must be developed to verify the integrity of the control system in adverse operating conditions. The most difficult and elusive perturbations to computer based control systems caused by an electromagnetic environment (EME) are functional error modes that involve no component damage. These error modes, collectively known as upset, can occur simultaneously in all of the channels of a redundant control system and are software dependent. A methodology is presented for performing upset tests on a multichannel control system, and considerations are discussed for the design of upset tests to be conducted in the lab on fault tolerant control systems operating in a closed loop with a simulated plant.

Belcastro, Celeste M.

1989-01-01

133

Design of an active fault tolerant control and polytopic unknown input observer for systems described by a multi-model representation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, an active Fault Tolerant Control (FTC) strategy is developed for systems described by multiple linear models, to prevent system deterioration through the synthesis of adapted controllers. First, a Polytopic Unknown Input Observer is synthesized to provide actuator fault estimation. The actuator fault estimation is used in an FTC scheme which schedules some predefined state feedback gains. These gains are computed through LMIs both in fault-free and faulty cases in order to prese...

Rodrigues, Mickael; Theilliol, Didier; Sauter, Dominique

2005-01-01

134

High Speed, High Temperature, Fault Tolerant Operation of a Combination Magnetic-Hydrostatic Bearing Rotor Support System for Turbomachinery  

Science.gov (United States)

Closed loop operation of a single, high temperature magnetic radial bearing to 30,000 RPM (2.25 million DN) and 540 C (1000 F) is discussed. Also, high temperature, fault tolerant operation for the three axis system is examined. A novel, hydrostatic backup bearing system was employed to attain high speed, high temperature, lubrication free support of the entire rotor system. The hydrostatic bearings were made of a high lubricity material and acted as journal-type backup bearings. New, high temperature displacement sensors were successfully employed to monitor shaft position throughout the entire temperature range and are described in this paper. Control of the system was accomplished through a stand alone, high speed computer controller and it was used to run both the fault-tolerant PID and active vibration control algorithms.

Jansen, Mark; Montague, Gerald; Provenza, Andrew; Palazzolo, Alan

2004-01-01

135

Fault Tolerant External Memory Algorithms  

DEFF Research Database (Denmark)

Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano. However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable. We show a lower bound on the number of I/Os required for any deterministic dictionary that is resilient to memory faults. We design a static and a dynamic deterministic dictionary with optimal query performance as well as an optimal sorting algorithm and an optimal priority queue. Finally, we consider scenarios where only cells in memory or only cells on disk are corruptible and separate randomized and deterministic dictionaries in the latter.

Jørgensen, Allan Grønlund; Brodal, Gerth Stølting

2009-01-01

136

Fault-Tolerant, Real-Time, Multi-Core Computer System  

Science.gov (United States)

A document discusses a fault-tolerant, self-aware, low-power, multi-core computer for space missions with thousands of simple cores, achieving speed through concurrency. The proposed machine decides how to achieve concurrency in real time, rather than depending on programmers. The driving features of the system are simple hardware that is modular in the extreme, with no shared memory, and software with significant runtime reorganizing capability. The document describes a mechanism for moving ongoing computations and data that is based on a functional model of execution. Because there is no shared memory, the processor connects to its neighbors through a high-speed data link. Messages are sent to a neighbor switch, which in turn forwards that message on to its neighbor until reaching the intended destination. Except for the neighbor connections, processors are isolated and independent of each other. The processors on the periphery also connect chip-to-chip, thus building up a large processor net. There is no particular topology to the larger net, as a function at each processor allows it to forward a message in the correct direction. Some chip-to-chip connections are not necessarily nearest neighbors, providing short cuts for some of the longer physical distances. The peripheral processors also provide the connections to sensors, actuators, radios, science instruments, and other devices with which the computer system interacts.

Gostelow, Kim P.

2012-01-01

137

Research on fault diagnose and fault tolerant control of steam generator based on strong tracking filter  

International Nuclear Information System (INIS)

In order to further improve the safety of nuclear power plants, based on the nonlinear system with stochastic noise, the strong tracking filter is used to evaluate the sensor fault bias of steam generator control system and reconstruct the sensors output to implement the fault tolerant control. The simulation results demonstrate that this method can evaluate the time-varying sensor fault bias effectively and has great fault tolerant ability, and the methodology employing the strong tracking filter for steam generator fault tolerant control design is effective. (authors)

138

Fault tolerant sequential control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Due to an increasing number of functions and process steps, modern mechatronic assembly lines are becoming more and more complex. High precision systems in particular face the conflict between extended availability requirements and system complexity, bearing in mind the related economic pressure. Besides the increasing demands on productivity, there are also rising demands on availability and reliability. Hence a systematic approach to manufacturing and assembly plants is necessary. To meet this challe...

Neugebauer, Reimund; Barthel, S.; Richter, M.

2010-01-01

139

Task Mapping and Bandwidth Reservation for Mixed Hard/Soft Fault-Tolerant Embedded Systems  

DEFF Research Database (Denmark)

In this paper we are interested in mixed hard/soft real-time fault-tolerant applications mapped on distributed heterogeneous architectures. We use the Earliest Deadline First (EDF) scheduling for the hard real-time tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The bandwidth reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation, such that soft task overruns do not affect the timing guarantees of hard tasks. Transient faults in hard tasks are tolerated using checkpointing with rollback recovery. We have proposed a Tabu Search-based approach for task mapping and CBS bandwidth reservation, such that the deadlines for the hard tasks are satisfied, even in the case of transient faults, and the QoS for the soft tasks is maximized. Researchers have used fixed execution time models, such as the worst-case execution times for hard tasks and average execution times for soft tasks. However, we show that by using stochastic execution times for soft tasks, significant improvements can be obtained. The proposed strategy has been evaluated using an extensive set of benchmarks.

Saraswat, Prabhat Kumar; Pop, Paul

2010-01-01
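
The interplay between checkpointing with rollback recovery and the EDF/CBS guarantees in this abstract can be sketched numerically. The Python example below is a simplified model, not the analysis from the paper: it assumes equally spaced checkpoints, that each fault forces re-execution of at most one segment plus a fixed recovery overhead, and single-processor EDF with implicit deadlines. All function and parameter names are hypothetical.

```python
import math


def worst_case_time(wcet, checkpoints, faults, overhead, recovery):
    """Worst-case execution time of a hard task with rollback recovery.

    The task is split into `checkpoints` equal segments, each followed by a
    checkpoint costing `overhead`; every one of the `faults` transient faults
    re-executes one segment and adds a `recovery` cost.
    """
    segment = wcet / checkpoints
    return wcet + checkpoints * overhead + faults * (segment + recovery)


def best_checkpoint_count(wcet, faults, overhead):
    """Integer checkpoint count minimizing worst_case_time in this model."""
    if faults == 0:
        return 1  # without faults, extra checkpoints only add overhead
    ideal = math.sqrt(faults * wcet / overhead)  # real-valued minimizer
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}
    return min(candidates,
               key=lambda n: worst_case_time(wcet, n, faults, overhead, 0.0))


def edf_admissible(hard_tasks, cbs_bandwidths):
    """EDF test: hard utilization plus CBS bandwidth must not exceed 1.

    hard_tasks:     list of (worst-case time including recovery, period).
    cbs_bandwidths: list of server bandwidths (budget / server period).
    """
    hard_util = sum(time / period for time, period in hard_tasks)
    return hard_util + sum(cbs_bandwidths) <= 1.0
```

For a task with a WCET of 60, two tolerated faults and a checkpoint overhead of 1.2, the model suggests 10 checkpoints; with a recovery cost of 0.5 the worst case is 85, which at a period of 200 leaves room for a CBS bandwidth of 0.5 but not 0.6.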

140

Fault tolerant control - a residual based set-up  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A new set-up for fault tolerant control (FTC) for stable systems is presented in this paper. The new set-up is based on a simple implementation of the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. This implementation of the YJBK parameterization will allow a direct and simple reconfiguration of the feedback controller. Another central part of fault tolerant control is fault diagnosis. The controller implementation can be applied directly in connection with both passiv...

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2010-01-01

141

A Concept for fault tolerant controllers  

DEFF Research Database (Denmark)

This paper describes a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture allows the controller to be changed online in the case of faults in the system. In the described FTC concept, a safe mode controller is applied as the basic feedback controller. A controller for normal operation with high performance is obtained by including certain YJBK parameters (transfer functions) in the controller. This allows a fast switch from normal operation to safe mode operation in case of critical faults in the system. The described FTC architecture allows the different feedback controllers to apply different sets of sensors and actuators.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01

142

Actuator fault diagnosis and fault-tolerant control: Application to the quadruple-tank process  

Science.gov (United States)

The paper focuses on an important problem in modern control systems, namely robust fault-tolerant control. In particular, the problem is oriented towards a practical application to the quadruple-tank process. The proposed approach starts with a general description of the system and the fault-tolerant strategy, which is composed of a suitably integrated fault estimator and robust controller. The subsequent part of the paper is concerned with the design of the robust controller as well as the proposed fault-tolerant control scheme. To confirm the effectiveness of the proposed approach, the final part of the paper presents experimental results for the considered quadruple-tank process.

Buciakowski, Mariusz; de Rozprza-Faygel, Michał; Ochałek, Joanna; Witczak, Marcin

2014-12-01

143

Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems  

Directory of Open Access Journals (Sweden)

The paper analyses and experimentally verifies the deployment of Ethernet based network technology to enable fault tolerant and timely exchange of data among a number of high voltage protective relays. These relays use a proprietary serial communication line to exchange data in real time on the state of their high voltage circuitry, facilitating fast protection switching in case of critical failures. The digital serial signal is first fed into a PCM multiplexer, where it is mapped to the corresponding E1 (2 Mbit/s) time division multiplexed signal. The resulting E1 frames are then packetized and sent through the Ethernet control LAN to the opposite PCM demultiplexer, where the same processing is done in reverse, finally sending the signal into the opposite protective relay. The challenge of this setup is to assure very timely delivery of the control information between protective relays even in the case of potential failures of the Ethernet network itself. The tolerance of the Ethernet network to faults is assured using the widespread per-VLAN Rapid Spanning Tree Protocol, potentially extended by 1+1 PCM protection as a valuable option.

Leos Bohac

2013-01-01

144

Evaluation of Simple Causal Message Logging for Large-Scale Fault Tolerant HPC Systems  

Energy Technology Data Exchange (ETDEWEB)

The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will make available clusters with millions of processors. In those machines, mean time between failures will range from a few minutes to a few tens of minutes, making the crash of a processor the common case, instead of a rarity. Parallel applications running on those large machines will need to simultaneously survive crashes and maintain high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back in case of a failure. Incorporating some form of message logging will provide a framework where only a subset of processors is rolled back after a crash. In this paper, we discuss why a simple causal message logging protocol seems a promising alternative to provide fault tolerance in large supercomputers. As opposed to pessimistic message logging, it has low latency overhead, especially in collective communication operations. Besides, it saves messages when more than one thread is running per processor. Finally, we demonstrate that a simple causal message logging protocol has a faster recovery and a low performance penalty when compared to checkpoint/restart. Running NAS Parallel Benchmarks (CG, MG and BT) on 1024 processors, simple causal message logging has a latency overhead below 5%.

Bronevetsky, G; Meneses, E; Kale, L V

2011-02-25

145

Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants  

International Nuclear Information System (INIS)

Fault-tolerant real-time computer (FT-RTC) systems are widely used to ensure safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Design requirements for such systems include high reliability, availability, and computational capability for measurement via sensors, control action via actuators, data communication, and human interface via keyboard or display. All these attributes of FT-RTC systems must be implemented using the best known methods, such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure, and diagnostic features to validate system operation. In this context, the system designer must select an efficient and highly reliable diversified bus architecture in order to realize a fault-tolerant system design. This paper presents a comparative study of the CompactPCI bus and Versa Module Eurocard (VME) bus architectures for designing FT-RTC systems with a switch over logic system (SOLS) for NPP. (author)

146

Fault tolerant control - a residual based set-up  

DEFF Research Database (Denmark)

A new set-up for fault tolerant control (FTC) for stable systems is presented in this paper. The new set-up is based on a simple implementation of the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. This implementation of the YJBK parameterization allows a direct and simple reconfiguration of the feedback controller. Another central part of fault tolerant control is fault diagnosis. The controller implementation can be applied directly in connection with both passive fault diagnosis (PFD) and active fault diagnosis (AFD). The presented FTC set-up is investigated with respect to sensor reconfiguration. Actuator reconfiguration can be dealt with in a similar way.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2009-01-01

147

Fault Tolerant Magnetic Bearing for Turbomachinery  

Science.gov (United States)

NASA Glenn Research Center (GRC) has developed a Fault-Tolerant Magnetic Bearing Suspension rig to enhance bearing system safety. The rig successfully demonstrated that using only two active poles out of the eight redundant poles of each radial bearing (that is, with 12 of the 16 poles dead) levitated the rotor and spun it without loss of stability or desired position up to the maximum allowable speed of 20,000 rpm. In this paper, it is demonstrated that as long as the sum of the force vectors of the attracting poles and the rotor weight is zero, a fault-tolerant magnetic bearing system maintains the rotor at the desired position without losing stability, even at the maximum rotor speed. A proportional-integral-derivative (PID) controller generated autonomous corrective actions, with no operator input, for the fault situations without losing load capacity in terms of rotor position. This paper also deals with a centralized modal controller to better control the dynamic behavior over system modes.

Choi, Benjamin; Provenza, Andrew

2001-01-01

148

R2PC: fault-tolerance made easy  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault-tolerance is a concept that is becoming more and more important as computers are increasingly being used in application areas such as process control, air-traffic control and communication systems. However, the construction of fault-tolerant software remains a very difficult task, as it requires extensive knowledge and experience on the part of the designers of the system. The basics of the Remote Procedure Call (RPC) protocol and its many variants are a fundamental mechanism that p...

Manso, Oscar

1999-01-01

149

Multiagent-Based Fault Tolerance Management for Robustness  

Science.gov (United States)

Despite the use of software engineering best practices and tools, it would be very risky to assume that the software that is developed today is fault-free. Moreover, we have to consider the fact that the software could face unexpected situations not considered during its design. Robustness is a highly desirable and sometimes indispensable software requirement, especially for critical systems, where the consequences of a system failure can be catastrophic. This chapter outlines existing fault tolerance techniques, followed by a discussion of the potential that multiagent systems have to enhance the design of robust, fault-tolerant systems, thereby improving large-scale, critical, and complex system reliability.

Gutierrez, Rosa Laura Zavala; Huhns, Michael

150

Fault Tolerant Environment in web crawler Using Hardware Failure Detection  

Directory of Open Access Journals (Sweden)

Fault Tolerant Environment is a complete programming environment for the reliable execution of distributed application programs. Fault Tolerant Distributed Environment encompasses all aspects of modern fault-tolerant distributed computing. The built-in, user-transparent error detection mechanism covers processor node crashes and transient hardware failures. The mechanism also integrates user-assisted error checks into the system failure model. The nucleus non-blocking checkpointing mechanism, combined with a novel low-overhead roll-forward recovery scheme, delivers an efficient, low-overload backup and recovery mechanism for distributed processes. Fault Tolerant Distributed Environment also provides a means of remote automatic process allocation on distributed system nodes. If recovery is not possible, a new microrebooting approach can be used to restore the system to a stable state.

Anup Garje, Prof. Bhavesh Patel, Dr. B. B. Mesharm

2012-06-01

151

Fault Tolerant Control with Additive Compensation for Faults in an Automotive Damper  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A novel Fault-Tolerant Controller is proposed for an automotive suspension system based on a Quarter of Vehicle (QoV) model. The design is divided into a robust Linear Parameter-Varying controller used to isolate vibrations from external disturbances and a compensation mechanism used to accommodate actuator faults. The compensation mechanism is based on a robust fault detection and estimation scheme that reconstructs a fault on the semi-active damper; this information is used to re...

Tudón-Martínez, Juan; Varrier, Sébastien; Morales Menendez, Ruben; Ramirez Mendoza, Ricardo; Koenig, Damien; Martinez Molina, John Jairo; Sename, Olivier

2013-01-01

152

Fault-tolerant almost exact state transmission.  

Science.gov (United States)

We show that a category of one-dimensional XY-type models may enable high-fidelity quantum state transmissions, regardless of details of coupling configurations. This observation leads to a fault-tolerant design of a state transmission setup. The setup is fault-tolerant, with specified thresholds, against engineering failures of coupling configurations, fabrication imperfections or defects, and even time-dependent noises. We propose an experimental implementation of the fault-tolerant scheme using hard-core bosons in one-dimensional optical lattices. PMID:24185259

Wang, Zhao-Ming; Wu, Lian-Ao; Modugno, Michele; Yao, Wang; Shao, Bin

2013-01-01

153

Fault-tolerant logics for FPGA linux  

International Nuclear Information System (INIS)

The increasing use of SRAM-based reconfigurable architectures in important areas of research and development (such as particle accelerators and space applications) brings new effects that are currently only partially addressed. A well-known but nevertheless important problem of such systems is their susceptibility to radiation, which increases with particle flux and energy. According to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware, using spatial and temporal redundancy. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, starting with the FPGA bit layer and ending in the software application layer, to obtain maximum radiation tolerance in systems running FPGA Linux in radiation-susceptible environments. Only the collaboration of all these layers can provide an adequate level of data security and process integrity.

154

Fault-Tolerant Identification in Wireless Sensor Networks for Maximizing System Lifetime  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Wireless Sensor Networks (WSNs) are used by many applications such as security, command and control, and surveillance monitoring. In all such applications, the main function of the WSN is sensing and retrieving data. Many WSN systems are query based: they respond within a stipulated time based on the user's query word. However, a WSN is not fully reliable and may contain faulty sensors, so the network energy level goes down, which reduces the lifetime of the network. To ...

Middela Shailaja; AnandaRaj S.P; Poornima.S

2012-01-01

155

Fault Tolerant Architecture for Telecom Wireless CORBA  

Directory of Open Access Journals (Sweden)

To allow a non-mobile ORB to interoperate with CORBA objects and clients running on a mobile terminal, the OMG has specified Wireless Access and Terminal Mobility in CORBA. Fault tolerance is specified in the common core of the CORBA specification, but it is intended for wired networks. This study proposes a fault-tolerant architecture for Telecom wireless CORBA based on replication and checkpointing of objects. The storage available at the Access Bridge is used to log messages and entity states of objects on behalf of mobile terminals. Logging and recovery infrastructures are placed on each Access Bridge to implement fault tolerance for Telecom wireless CORBA. The Logging Mechanism records each message in a log, from which the Recovery Mechanism can retrieve it during recovery. The performance analysis shows that the proposed architecture keeps the loss of computation caused by a fault of the server object low. The proposed architecture is a graceful extension of the original wired Fault Tolerant CORBA and cooperates seamlessly with the published CORBA specifications.

Zhenpeng Xu

2013-01-01

156

Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning  

Science.gov (United States)

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.

Goldstein, David

1991-01-01

157

Fault tolerant programmable digital attitude control electronics study  

Science.gov (United States)

The attitude control electronics mechanization study to develop a fault tolerant autonomous concept for a three axis system is reported. Programmable digital electronics are compared to general purpose digital computers. The requirements, constraints, and tradeoffs are discussed. It is concluded that: (1) general fault tolerance can be achieved relatively economically, (2) recovery times of less than one second can be obtained, (3) the number of faulty behavior patterns must be limited, and (4) adjoined processes are the best indicators of faulty operation.

Sorensen, A. A.

1974-01-01

158

On Fault Tolerance of Resources in Computational Grids  

Directory of Open Access Journals (Sweden)

Grid computing, or the computational grid, is a vast research field in academia as well as in industry. A computational grid provides resource sharing through multi-institutional virtual organizations for dynamic problem solving. Heterogeneous resources from different administrative domains are virtually distributed across different networks in computational grids. Thus any type of failure can occur at any point in time, and a job running in a grid environment might fail. Hence fault tolerance is an important and challenging issue in grid computing, as the dependability of individual grid resources cannot be guaranteed. A fault-tolerant system is necessary to make computational grids more effective and reliable. The objective of this paper is to review existing fault tolerance techniques applicable to grid computing. The paper presents the state of the art of various fault tolerance techniques and a comparative study of the existing algorithms.

Arindam Das

2012-10-01

159

Dynamic Fault Tolerance in Desktop Grids Based On Reliability  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fault tolerance is important for guaranteeing reliable execution of tasks in computational desktop grid environments, where execution failures are frequently expected; it requires efficient fault-tolerant strategies able to deal effectively with resource failures and/or unplanned periods of unavailability. In this paper we present a Dynamic Fault Tolerant strategy that, rather than just tolerating faults as done by traditional fault-tolerant schedulers, exploits the info...

Geeta Arora; Dr. Shaveta Rani; Dr. Paramjit Singh

2013-01-01

160

A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems  

Scientific Electronic Library Online (English)

In this work, a model is proposed for assessing the availability of fault-tolerant systems, based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for solving the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms, which are inverted by the Gaussian quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an application example in the context of fault-tolerant systems.

Márcio das Chagas, Moura; Enrique López, Droguett.

2008-08-01

 
 
 
 
161

SABRE: a bio-inspired fault-tolerant electronic architecture  

International Nuclear Information System (INIS)

As electronic devices become increasingly complex, ensuring their reliable, fault-free operation is becoming correspondingly more challenging. It can be observed that, in spite of their complexity, biological systems are highly reliable and fault tolerant. Hence, we are motivated to take inspiration from biological systems in the design of electronic ones. In SABRE (self-healing cellular architectures for biologically inspired highly reliable electronic systems), we have designed a bio-inspired fault-tolerant hierarchical architecture for this purpose. As in biology, the foundation for the whole system is cellular in nature, with each cell able to detect faults in its operation and trigger intra-cellular or extra-cellular repair as required. At the next level in the hierarchy, arrays of cells are configured and controlled as function units in a transport triggered architecture (TTA), which is able to perform partial-dynamic reconfiguration to rectify problems that cannot be solved at the cellular level. Each TTA is, in turn, part of a larger multi-processor system which employs coarser grain reconfiguration to tolerate faults that cause a processor to fail. In this paper, we describe the details of operation of each layer of the SABRE hierarchy, and how these layers interact to provide a high systemic level of fault tolerance. (paper)

162

Simulation modeling based method for choosing an effective set of fault tolerance mechanisms for real-time avionics systems  

Science.gov (United States)

In this paper, the reliability allocation problem (RAP) for real-time avionics systems (RTAS) is considered. The proposed method for solving this problem consists of two steps: (i) creation of an RTAS simulation model at the necessary level of abstraction and (ii) application of a metaheuristic algorithm to find an optimal solution (i.e., to choose an optimal set of fault tolerance techniques). Whenever the algorithm needs to measure the execution time of some software components, simulation modeling is applied. The simulation modeling procedure itself consists of two steps: automatic construction of a simulation model of the RTAS configuration, and running this model in a simulation environment to measure the required time. This method was implemented as an experimental software tool. The tool works in cooperation with the DYANA simulation environment. The results of experiments with the implemented method are presented. Finally, future plans for development of the presented method and tool are briefly described.

Bakhmurov, A. G.; Balashov, V. V.; Glonina, A. B.; Pashkov, V. N.; Smeliansky, R. L.; Volkanov, D. Yu.

2013-12-01

163

On the Fault Tolerance and Hamiltonicity of the Optical Transpose Interconnection System of Non-Hamiltonian Base Graphs  

CERN Document Server

Hamiltonicity is an important property in parallel and distributed computation. The existence of a Hamiltonian cycle allows efficient emulation of distributed algorithms on a network wherever such an algorithm exists for linear arrays and rings, and can ensure deadlock freedom in some routing algorithms in hierarchical interconnection networks. Hamiltonicity can also be used to construct independent spanning trees and leads to the design of fault-tolerant protocols. The Optical Transpose Interconnection System, or OTIS (also referred to as a two-level swapped network), is a widely studied interconnection network topology that is popular due to its high degree of scalability, regularity, modularity and packageability. Surprisingly, to our knowledge, only one strong result is known regarding the Hamiltonicity of OTIS: OTIS graphs built from Hamiltonian base graphs are Hamiltonian. In this work we consider the Hamiltonicity of OTIS networks built on non-Hamiltonian base graphs and answer some important questions. First, we prove tha...

Ghosh, Esha; Rangan, C Pandu

2011-01-01
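
To make the OTIS construction in the entry above concrete, here is a minimal Python sketch. It assumes the usual swapped-network definition, which the abstract does not spell out: node (g, p) is linked to (g, q) whenever (p, q) is an edge of the base graph, and to (p, g) whenever g differs from p. The function names are illustrative, and the brute-force cycle check is only practical for very small graphs.

```python
from itertools import permutations


def otis_edges(n, base_edges):
    """Edge set of the OTIS (swapped) network over a base graph on nodes 0..n-1."""
    edges = set()
    for g in range(n):
        for p, q in base_edges:
            # Intra-group links: each group g holds a copy of the base graph.
            edges.add(frozenset({(g, p), (g, q)}))
    for g in range(n):
        for p in range(g + 1, n):
            # Inter-group "transpose" links between (g, p) and (p, g).
            edges.add(frozenset({(g, p), (p, g)}))
    return edges


def has_hamiltonian_cycle(nodes, edges):
    """Brute-force check; exponential, so only for very small graphs."""
    nodes = list(nodes)
    if len(nodes) < 3:
        return False
    first, rest = nodes[0], nodes[1:]
    for order in permutations(rest):
        cycle = (first,) + order + (first,)
        if all(frozenset({u, v}) in edges for u, v in zip(cycle, cycle[1:])):
            return True
    return False


# Base graph: the path 0 - 1 - 2, which has no Hamiltonian cycle.
path_edges = [(0, 1), (1, 2)]
otis_nodes = [(g, p) for g in range(3) for p in range(3)]
otis_path = otis_edges(3, path_edges)
```

In this particular example node (0, 0) has a single neighbour, so the OTIS network over the 3-node path cannot contain a Hamiltonian cycle. This says nothing about the conditions studied in the paper itself.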

164

Fault-tolerant quantum dynamical decoupling.  

Science.gov (United States)

Dynamical decoupling pulse sequences have been used to extend coherence times in quantum systems ever since the discovery of the spin-echo effect. Here we introduce a method of recursively concatenated dynamical decoupling pulses, designed to overcome both decoherence and operational errors. This is important for coherent control of quantum systems such as quantum computers. For bounded-strength, non-Markovian environments, such as for the spin-bath that arises in electron- and nuclear-spin based solid-state quantum computer proposals, we show that it is strictly advantageous to use concatenated pulses, as opposed to standard periodic dynamical decoupling pulse sequences. Namely, the concatenated scheme is both fault tolerant and superpolynomially more efficient, at equal cost. We derive a condition on the pulse noise level below which concatenation is guaranteed to reduce decoherence. PMID:16383882

Khodjasteh, K; Lidar, D A

2005-10-28
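
The recursive concatenation mentioned in the entry above can be sketched in Python. The abstract does not state the recursion, so the rule used here, p_{n+1} = p_n X p_n Z p_n X p_n Z with p_0 a free-evolution interval, is an assumption taken from the form commonly quoted for concatenated dynamical decoupling. The symbols 'f', 'X' and 'Z' are illustrative labels, not an API.

```python
def cdd_sequence(level):
    """Return the level-n concatenated sequence as a list of symbols.

    'f' stands for one free-evolution interval; 'X' and 'Z' stand for
    ideal pi pulses about the corresponding axes (assumed notation).
    """
    if level < 0:
        raise ValueError("level must be non-negative")
    if level == 0:
        return ["f"]
    inner = cdd_sequence(level - 1)
    # Assumed recursion: p_{n+1} = p_n X p_n Z p_n X p_n Z
    return inner + ["X"] + inner + ["Z"] + inner + ["X"] + inner + ["Z"]


def count_free_intervals(level):
    """Number of free-evolution intervals in the level-n sequence (4**level)."""
    return cdd_sequence(level).count("f")
```

Under this rule the number of free-evolution intervals grows as 4^n with the concatenation level n, which is why comparisons against periodic sequences are made at equal cost.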

165

SIFT - Multiprocessor architecture for Software Implemented Fault Tolerance flight control and avionics computers  

Science.gov (United States)

A brief description of a SIFT (Software Implemented Fault Tolerance) Flight Control Computer, with emphasis on implementation, is presented. A multiprocessor system that relies on software-implemented fault detection and reconfiguration algorithms is described. A high level of reliability and fault tolerance is achieved by the replication of computing tasks among processing units.

Forman, P.; Moses, K.

1979-01-01

166

Computer aided reliability, availability, and safety modeling for fault-tolerant computer systems with commentary on the HARP program  

Science.gov (United States)

Many of the most challenging reliability problems of our present decade involve complex distributed systems such as interconnected telephone switching computers, air traffic control centers, aircraft and space vehicles, and local area and wide area computer networks. In addition to the challenge of complexity, modern fault-tolerant computer systems require very high levels of reliability, e.g., avionic computers with MTTF goals of one billion hours. Most analysts find that it is too difficult to model such complex systems without computer aided design programs. In response to this need, NASA has developed a suite of computer aided reliability modeling programs beginning with CARE 3 and including a group of new programs such as: HARP, HARP-PC, Reliability Analysts Workbench (Combination of model solvers SURE, STEM, PAWS, and common front-end model ASSIST), and the Fault Tree Compiler. The HARP program is studied and how well the user can model systems using this program is investigated. One of the important objectives will be to study how user friendly this program is, e.g., how easy it is to model the system, provide the input information, and interpret the results. The experiences of the author and his graduate students who used HARP in two graduate courses are described. Some brief comparisons were made with the ARIES program which the students also used. Theoretical studies of the modeling techniques used in HARP are also included. Of course no answer can be any more accurate than the fidelity of the model, thus an Appendix is included which discusses modeling accuracy. A broad viewpoint is taken and all problems which occurred in the use of HARP are discussed. Such problems include: computer system problems, installation manual problems, user manual problems, program inconsistencies, program limitations, confusing notation, long run times, accuracy problems, etc.

Shooman, Martin L.

1991-01-01

167

Electronic Power Switch for Fault-Tolerant Networks  

Science.gov (United States)

Power field-effect transistors reduce energy waste and simplify interconnections. Current switch containing power field-effect transistor (PFET) placed in series with each load in fault-tolerant power-distribution system. If system includes several loads and supplies, switches placed in series with adjacent loads and supplies. System of switches protects against overloads and losses of individual power sources.

Volp, J.

1987-01-01

168

FAULT TOLERANCE IN FPGA THROUGH KING SHIFTING  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A wide range of fault tolerance methods for FPGAs have been proposed. Approaches range from simple architectural redundancy to fully on-line adaptive implementations. The homogeneous structure of field programmable gate arrays (FPGAs) suggests that defect tolerance can be achieved by shifting the configuration data inside the FPGA. All methods and schemes are qualitatively compared and some particularly promising approaches are highlighted. The applications of these methods also ...

Sharma, S.; Kshirsagar, R. V.

2012-01-01

169

Design of Fault Tolerant Reversible Multiplier  

Directory of Open Access Journals (Sweden)

In recent years, reversible logic has emerged as a promising technology with applications in low-power CMOS, quantum computing, nanotechnology, and optical computing. The classical gates such as AND, OR, and EXOR are not reversible. This paper proposes a novel 4x4-bit reversible fault-tolerant multiplier circuit that multiplies two 4-bit numbers. It is faster and has lower hardware complexity than existing designs, and it is also better than its existing counterparts in terms of delay and power. The design is based on two concepts: the partial products are generated in parallel using Fredkin gates, and the addition is then performed by a reversible parallel adder built from IG gates. The paper thus provides a starting point for building more complex systems that can execute more complicated operations using reversible logic.

H. P. Sinha

2012-01-01
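
As a small illustration of the partial-product step described in the entry above, the Python sketch below models a Fredkin (controlled-swap) gate and uses it to form the sixteen AND terms of a 4x4 multiplication. It is a behavioural sketch only: the IG-gate adder is not modelled (ordinary integer addition stands in for it), and the function names are illustrative.

```python
from itertools import product


def fredkin(c, x, y):
    """Controlled swap: if c is 1, x and y are exchanged; c passes through."""
    return (c, y, x) if c else (c, x, y)


def and_via_fredkin(a, b):
    """With the third input tied to constant 0, the third output is a AND b."""
    return fredkin(a, b, 0)[2]


def partial_products(a, b, width=4):
    """Rows of the partial-product array, one Fredkin gate per AND term."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> j) & 1 for j in range(width)]
    return [[and_via_fredkin(a_bits[i], b_bits[j]) for i in range(width)]
            for j in range(width)]


def multiply(a, b, width=4):
    """Sum the weighted partial products (plain addition replaces the adder)."""
    rows = partial_products(a, b, width)
    return sum(bit << (i + j)
               for j, row in enumerate(rows)
               for i, bit in enumerate(row))


# The gate maps the 8 input patterns onto 8 distinct outputs (reversible)
# and leaves the XOR of its three lines unchanged (parity preserving).
all_inputs = list(product((0, 1), repeat=3))
is_reversible = len({fredkin(*bits) for bits in all_inputs}) == 8
is_parity_preserving = all(
    (c ^ x ^ y) == (o[0] ^ o[1] ^ o[2])
    for (c, x, y) in all_inputs
    for o in [fredkin(c, x, y)]
)
```

Parity preservation is the property usually meant when a reversible gate is called fault tolerant, since a single-bit error at the output then shows up as a parity mismatch.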

170

Fault-tolerant holonomic quantum computation  

CERN Document Server

We explain how to combine holonomic quantum computation (HQC) with fault tolerant quantum error correction. This establishes the scalability of HQC, putting it on equal footing with other models of computation, while retaining the inherent robustness the method derives from its geometric nature.

Oreshkov, Ognyan; Lidar, Daniel A

2008-01-01

171

Ranking Components using FTCloud for Fault-Tolerant Cloud Applications  

Directory of Open Access Journals (Sweden)

Building highly reliable cloud applications is a challenging and critical research problem. The FTCloud framework is introduced to address this issue in the cloud environment. FTCloud is a component-ranking-based framework for building fault-tolerant cloud applications. It consists of two algorithms. FTCloud1 uses component invocation structures and invocation frequencies to find significant components. FTCloud2 fuses the system structure information with component characteristics to identify the significant components in a cloud application. This paper proposes the VM Restart technique, a fault-tolerant strategy based on cloud features that increases the reliability of cloud applications. Experimental results show the effectiveness of FTCloud: by tolerating faults in a small part of the most significant components, the reliability of a cloud application can be greatly improved.

Ms. V. Asha Judi

2014-03-01
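
The abstract above says only that FTCloud1 ranks components from invocation structures and frequencies; it does not give the algorithm. One plausible reading, sketched below in Python purely as an assumption, is a random-walk (PageRank-style) score over a graph whose edge weights are invocation counts. The component names, the damping factor and the function name are all hypothetical.

```python
def rank_components(invocations, damping=0.85, iterations=100):
    """Score components by a weighted random walk over the invocation graph.

    invocations maps (caller, callee) -> invocation count. This is an
    illustrative stand-in, not the published FTCloud1 algorithm.
    """
    nodes = sorted({name for pair in invocations for name in pair})
    n = len(nodes)
    out_total = {name: 0.0 for name in nodes}
    for (caller, _callee), count in invocations.items():
        out_total[caller] += count

    score = {name: 1.0 / n for name in nodes}
    for _ in range(iterations):
        # Components that invoke nothing spread their score uniformly.
        dangling = sum(score[name] for name in nodes if out_total[name] == 0)
        nxt = {name: (1.0 - damping) / n + damping * dangling / n
               for name in nodes}
        for (caller, callee), count in invocations.items():
            if count > 0:
                nxt[callee] += damping * score[caller] * count / out_total[caller]
        score = nxt
    return score


# Hypothetical invocation counts for a four-component application.
example = {
    ("web", "db"): 10,
    ("api", "db"): 20,
    ("web", "api"): 5,
    ("batch", "db"): 3,
}
scores = rank_components(example)
most_significant = max(scores, key=scores.get)
```

In this toy graph every component ultimately invokes `db`, so it receives the largest score; a fault-tolerance budget would then be spent on the top-ranked components first.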

172

Reconfigurable Fault Tolerance for FPGAs  

Science.gov (United States)

The invention allows a field-programmable gate array (FPGA) or similar device to be efficiently reconfigured in whole or in part to provide higher capacity, non-redundant operation. The redundant device consists of functional units such as adders or multipliers, configuration memory for the functional units, a programmable routing method, configuration memory for the routing method, and various other features such as block RAM, I/O (random access memory, input/output) capability, dedicated carry logic, etc. The redundant device has three identical sets of functional units and routing resources and majority voters that correct errors. The configuration memory may or may not be redundant, depending on need. For example, SRAM-based FPGAs will need some type of radiation-tolerant configuration memory, or they will need triple-redundant configuration memory. Flash or anti-fuse devices will generally not need redundant configuration memory. Some means of loading and verifying the configuration memory is also required. These are all components of the pre-existing redundant FPGA. This innovation modifies the voter to accept a MODE input, which specifies whether ordinary voting is to occur, or if redundancy is to be split. Generally, additional routing resources will also be required to pass data between sections of the device created by splitting the redundancy. In redundancy mode, the voters produce an output corresponding to the two inputs that agree, in the usual fashion. In the split mode, the voters select just one input and convey this to the output, ignoring the other inputs. In a dual-redundant system (as opposed to triple-redundant), instead of a voter, there is some means to latch or gate a state update only when both inputs agree. In this case, the invention would require modification of the latch or gate so that it would operate normally in redundant mode, and would separately latch or gate the inputs in non-redundant mode.

Shuler, Robert, Jr.

2010-01-01
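
The voter behaviour described in the entry above can be sketched in a few lines of Python. The mode names and the `select` parameter are assumptions introduced for illustration; the abstract states only that a MODE input chooses between ordinary voting and split operation, in which one input is passed through.

```python
REDUNDANT = "redundant"
SPLIT = "split"


def voter(a, b, c, mode=REDUNDANT, select=0):
    """Model of a triple-redundancy voter with a MODE input.

    In redundant mode each output bit is the majority of the three input
    bits. In split mode the voter forwards one chosen input unchanged and
    ignores the other two, so the three lanes can carry different work.
    """
    if mode == REDUNDANT:
        return (a & b) | (a & c) | (b & c)
    if mode == SPLIT:
        return (a, b, c)[select]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `voter(0b1010, 0b1010, 0b0110)` masks the disagreeing third lane and yields `0b1010`, while `voter(1, 2, 3, mode=SPLIT, select=2)` simply forwards the third lane.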

173

Concepts and Methods in Fault-tolerant Control  

DEFF Research Database (Denmark)

Faults in automated processes will often cause undesired reactions and shut-down of a controlled plant, and the consequences could be damage to technical parts of the plant, to personnel or to the environment. Fault-tolerant control combines diagnosis with control methods to handle faults in an intelligent way. The aim is to prevent simple faults from developing into serious failures, and hence to increase plant availability and reduce the risk of safety hazards. Fault-tolerant control merges several disciplines into a common framework to achieve these goals. The desired features are obtained through on-line fault diagnosis, automatic condition assessment and calculation of appropriate remedial actions to avoid certain consequences of a fault. The envelope of possible remedial actions is very wide. Sometimes a simple remedy suffices, such as replacing a measurement from a faulty sensor by an estimate. In other situations, complex reconfiguration or on-line controller redesign is required. This paper gives an overview of recent tools to analyze and explore the structure and other fundamental properties of an automated system such that any inherent redundancy in the controlled process can be fully utilized to maintain availability, even though faults may occur.

Blanke, Mogens; Staroswiecki, M.

2001-01-01

174

SMaRtLight: A Practical Fault-Tolerant SDN Controller  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The increase in the number of SDN-based deployments in production networks is triggering the need to consider fault-tolerant designs of controller architectures. Commercial SDN controller solutions incorporate fault tolerance, but there has been little discussion in the SDN literature on the design of such systems and the tradeoffs involved. To fill this gap, we present a by-construction design of a fault-tolerant controller, and materialize it by proposing and formalizing a...

Botelho, Fábio; Bessani, Alysson; Ramos, Fernando M. V.; Ferreira, Paulo

2014-01-01

175

A New Checkpoint Approach for Fault Tolerance in Grid Computing  

Directory of Open Access Journals (Sweden)

Computational and service grids are used to solve large-scale scientific applications using grid resources. The main focus is on fault identification and fault rectification (fault tolerance) using checkpoint approaches. To achieve fault tolerance, a checkpoint approach can be used. Job checkpointing is one of the most commonly used techniques for providing fault tolerance in computational grids. The effectiveness of checkpointing depends on the choice of the checkpoint interval. A common technique for fault tolerance is dynamically adapting the checkpoint, in which all the failure information is maintained in the Grid Information Server. This requires a separate server for storage, which increases the execution time. The main goal of the checkpoint approach is to minimize the overall execution time in the grid system. In this work, fault tolerant scheduling is achieved using a kernel-level checkpoint. In case of resource failure, the Fault Index Based Rescheduling (FIBR) algorithm is used to reschedule the jobs to some other available resources. This ensures that the job is executed with minimized execution time.

Gokuldev S

2013-06-01
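
The abstract above notes that the effectiveness of checkpointing depends on the checkpoint interval. As a rough illustration only, the sketch below uses Young's classical first-order approximation, which is a swapped-in textbook rule and not the kernel-level or FIBR method of the paper. The function names and the example numbers are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order estimate of the checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint (assumed constant).
    mtbf_s: mean time between failures of the resource (assumed known).
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order overhead: checkpoint cost per interval plus expected rework."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    cost, mtbf = 50.0, 10_000.0  # illustrative values, not from the paper
    best = young_interval(cost, mtbf)
    for interval in (best / 2, best, best * 2):
        print(interval, overhead_fraction(interval, cost, mtbf))
```

Under this model, checkpointing too often wastes time writing checkpoints and checkpointing too rarely wastes time recomputing lost work; the estimate balances the two terms.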

176

Control switching in high performance and fault tolerant control  

DEFF Research Database (Denmark)

The problem of reliability in high performance control and in fault tolerant control is considered in this paper. A feedback controller architecture for high performance and fault tolerance is considered. The architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. By using the nominal controller in the architecture as a simple and robust controller, it is possible to use the YJBK transfer function for optimization of the closed-loop performance. This can be done both in connection with normal operation of the system and in connection with faults in the system. The architecture will also allow changing the applied sensors and/or actuators when switching between different controllers. This switching becomes particularly simple for open-loop stable systems.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2010-01-01

177

Design of Test Articles and Monitoring System for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System  

Science.gov (United States)

This report describes the design of the test articles and monitoring systems developed to characterize the response of a fault-tolerant computer communication system when stressed beyond the theoretical limits for guaranteed correct performance. A high-intensity radiated electromagnetic field (HIRF) environment was selected as the means of injecting faults, as such environments are known to have the potential to cause arbitrary and coincident common-mode fault manifestations that can overwhelm redundancy management mechanisms. The monitors generate stimuli for the systems-under-test (SUTs) and collect data in real-time on the internal state and the response at the external interfaces. A real-time health assessment capability was developed to support the automation of the test. A detailed description of the nature and structure of the collected data is included. The goal of the report is to provide insight into the design and operation of these systems, and to serve as a reference document for use in post-test analyses.

Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

2008-01-01

178

Proactive Fault Tolerance Using Preemptive Migration  

Energy Technology Data Exchange (ETDEWEB)

Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail. This paper provides a foundation for proactive FT by defining its architecture and classifying implementation options. This paper further relates prior work to the presented architecture and classification, and discusses the challenges ahead for needed supporting technologies.

Engelmann, Christian [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Scott, Stephen L [ORNL

2009-01-01

179

Error Mitigation of Point-to-Point Communication for Fault-Tolerant Computing  

Science.gov (United States)

Fault tolerant systems require the ability to detect and recover from physical damage caused by the hardware's environment, faulty connectors, and system degradation over time. This ability applies to military, space, and industrial computing applications. The integrity of Point-to-Point (P2P) communication, between two microcontrollers for example, is an essential part of fault tolerant computing systems. In this paper, different methods of fault detection and recovery are presented and analyzed.

Akamine, Robert L.; Hodson, Robert F.; LaMeres, Brock J.; Ray, Robert E.

2011-01-01

180

Steps toward fault-tolerant quantum chemistry.  

Energy Technology Data Exchange (ETDEWEB)

Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use of a task-based model that uses a data-centric approach to allocate work to different processes as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program - matrix multiplication. Also, we discuss the extensions that would be required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead one must recognize that these systems will be hierarchical in structure and prone to constant faults, and that global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication, reducing the network load, does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determines the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources.
The complexity of these algorithms means that MPI alone is insufficient to achieve parallel scaling; QC developers have been forced to use alternative approaches to achieve scalability and would be receptive to radical shifts in the programming paradigm. Initial work in adapting the simplest QC method, Hartree-Fock, to this new programming model indicates that the approach is beneficial for QC applications. However, the advantages of being able to scale to exascale computers are greatest for the computationally most expensive algorithms; within QC these are the high-accuracy coupled-cluster (CC) methods. Parallel coupled-cluster programs are available; however, they are based on the conventional MPI paradigm. Much of the effort is spent handling the complicated data dependencies between the various processors, especially as the size of the problem becomes large. The current paradigm will not survive the move to exascale computers. Here we discuss the initial steps toward designing and implementing a CC method within this model. First, we introduce the general concepts behind a CC method, focusing on the aspects that make these methods difficult to parallelize with conventional techniques. Then we outline what is the computational core of the CC method - a matrix multiply - within the task-based approach that the FAST-OS project is designed to take advantage of. Finally we outline the general setup to implement the simplest CC method in this model, linearized CC doubles (LinCC).

Taube, Andrew Garvin

2010-05-01

 
 
 
 
181

Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System  

Directory of Open Access Journals (Sweden)

In this study, we presented a method for distributing data storage by using a residue number system for mobile systems and wireless networks based on the peer-to-peer paradigm. Generally, a redundant residue number system is capable of error detection and correction. In the proposed method, we made a new system by mixing the Redundant Residue Number System (RRNS), Multi Level Residue Number System (ML RNS) and Multiple Valued Logic (MVL RNS), which was well suited to parallel, carry-free, high-speed arithmetic and supported secure data communication. In addition, it had the ability to detect and correct errors. In comparison to other number systems, it showed many improvements in data security, error detection and correction, and speed of storage and retrieval.

A. Barati

2008-01-01
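
The abstract above relies on the error-detection property of a redundant residue number system. The sketch below illustrates that general property only; it is not the mixed RRNS / ML RNS / MVL RNS scheme of the paper. The moduli (3, 5, 7 as information moduli and 11, 13 as redundant moduli) and all function names are assumptions chosen for illustration.

```python
from math import prod

INFO_MODULI = (3, 5, 7)         # assumed information moduli
REDUNDANT_MODULI = (11, 13)     # assumed redundant moduli (larger than the others)
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)  # valid values are 0 .. 104


def encode(x: int) -> list[int]:
    """Represent x by its residues with respect to every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in MODULI]


def crt(residues: list[int], moduli: tuple[int, ...]) -> int:
    """Chinese remainder reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # modular inverse, Python 3.8+
    return x % total


def decode(residues: list[int]) -> tuple[int, bool]:
    """Return (value, ok); ok is False when the value leaves the legitimate range."""
    x = crt(residues, MODULI)
    return x, x < LEGIT_RANGE


if __name__ == "__main__":
    word = encode(52)
    print(decode(word))          # consistent residues decode inside the range
    word[1] = (word[1] + 2) % 5  # corrupt a single residue
    print(decode(word))          # reconstructed value falls outside the range
```

With these moduli, any single corrupted residue moves the reconstructed value out of the legitimate range, so the error is detected; locating and correcting it would need an additional step that is not shown here.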

182

Algorithm-dependent fault tolerance for distributed computing  

Energy Technology Data Exchange (ETDEWEB)

Large-scale distributed systems assembled from commodity parts, like CPlant, have become common tools in the distributed computing world. Because of their size and diversity of parts, these systems are prone to failures. Applications that are being run on these systems have not been equipped to efficiently deal with failures, nor is there vendor support for fault tolerance. Thus, when a failure occurs, the application crashes. While most programmers make use of checkpoints to allow for restarting of their applications, this is cumbersome and incurs substantial overhead. In many cases, there are more efficient and more elegant ways in which to address failures. The goal of this project is to develop a software architecture for the detection of and recovery from faults in a cluster computing environment. The detection phase relies on the latest techniques developed in the fault tolerance community. Recovery is being addressed in an application-dependent manner, thus allowing the programmer to take advantage of algorithmic characteristics to reduce the overhead of fault tolerance. This architecture will allow large-scale applications to be more robust in high-performance computing environments that are comprised of clusters of commodity computers such as CPlant and SMP clusters.

P. D. Hough; M. E. Goldsby; E. J. Walsh

2000-02-01

183

Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Fluid power systems have been in use since 1795 with the first hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are fluid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by the means of hydraulic fluid under pressure. With the development of computing power and control techniques during the last few decad...

Choux, Martin; Blanke, Mogens; Hovland, Geir

2011-01-01

184

Cooperative Fault Tolerant Distributed Computing  

Energy Technology Data Exchange (ETDEWEB)

HARNESS was proposed as a system that combined the best of emerging technologies found in current distributed computing research and commercial products into a very flexible, dynamically adaptable framework that could be used by applications to allow them to evolve and better handle their execution environment. The HARNESS system was designed using the considerable experience from previous projects such as PVM, MPI, IceT and CUMULVS. As such, the system was designed to avoid the common problems found in these systems: it has no single point of failure and can survive machine, node and software failures. Additional features included improved inter-component connectivity, with full support for dynamic downloading of additional components at run-time, thus reducing the pressure on application developers to build in all the libraries they need in advance.

Fagg, Graham E.

2006-03-15

185

Highly Reliable Fault Tolerant Technique for Safety Critical Applications  

Directory of Open Access Journals (Sweden)

This paper presents a highly reliable fault tolerant technique for safety critical applications using the Five Modular Redundancy method. In high radiation environments such as spacecraft and nuclear thermal plants, single event upsets (SEUs) are likely to degrade system operation. They cause single bit flips in the sequential elements of electronic components in the system. If these systems are not provided with fault tolerance, there is a high chance of obtaining a false response. To avoid this problem the system is made redundant and a roll-forward recovery mechanism is used to increase the overall reliability. A scan cell design is employed to shift out the internal states of all the flip-flops during the comparison and recovery process. The proposed method is designed in Verilog HDL using the Xilinx ISE simulator.

Nanditha S

2014-05-01
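
The core of five-modular redundancy, as described in the abstract above, is a majority vote over five replicas. The sketch below models only that voting step in software, as a bitwise majority over five register copies. It does not model the paper's roll-forward recovery or scan cell design, and the function name and the 8-bit width are assumptions.

```python
def vote_5mr(replicas: list[int], width: int = 8) -> int:
    """Bitwise majority vote over exactly five replica words.

    Each output bit is 1 when at least three of the five replicas have
    that bit set, so up to two faulty replicas per bit are masked.
    """
    if len(replicas) != 5:
        raise ValueError("five-modular redundancy needs exactly five replicas")
    voted = 0
    for bit in range(width):
        ones = sum((word >> bit) & 1 for word in replicas)
        if ones >= 3:
            voted |= 1 << bit
    return voted


if __name__ == "__main__":
    good = 0b1011_0010
    copies = [good] * 5
    copies[1] ^= 0b0000_0100  # simulated single event upset in replica 1
    copies[3] ^= 0b1000_0000  # simulated single event upset in replica 3
    print(bin(vote_5mr(copies)), vote_5mr(copies) == good)
```

Compared with triple modular redundancy, which masks one faulty replica per bit, five replicas mask two at the cost of additional hardware.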

186

Fault-tolerant battery system employing intra-battery network architecture  

Science.gov (United States)

A distributed energy storing system employing a communications network is disclosed. A distributed battery system includes a number of energy storing modules, each of which includes a processor and communications interface. In a network mode of operation, a battery computer communicates with each of the module processors over an intra-battery network and cooperates with individual module processors to coordinate module monitoring and control operations. The battery computer monitors a number of battery and module conditions, including the potential and current state of the battery and individual modules, and the conditions of the battery's thermal management system. An over-discharge protection system, equalization adjustment system, and communications system are also controlled by the battery computer. The battery computer logs and reports various status data on battery level conditions which may be reported to a separate system platform computer. A module transitions to a stand-alone mode of operation if the module detects an absence of communication connectivity with the battery computer. A module which operates in a stand-alone mode performs various monitoring and control functions locally within the module to ensure safe and continued operation.

Hagen, Ronald A. (Stillwater, MN); Chen, Kenneth W. (Fair Oaks, CA); Comte, Christophe (Montreal, CA); Knudson, Orlin B. (Vadnais Heights, MN); Rouillard, Jean (Saint-Luc, CA)

2000-01-01

187

A novel implementation of supervisory based Fault Tolerant Control  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this note, we discuss a performance-based supervisor for fault tolerant control (FTC). We conduct the notion of stability, which is often misinterpreted in our approach with the classical arbitrary switching control. Moreover, we only deal with the trajectories generated by the system in real time and we do not have any access to plant parameters, states of the system etc. Thus, we clearly distinguish this notion as employed in the two approaches. A novel switching logic is also propose...

Jain, Tushar; Yamé, Joseph Julien; Sauter, Dominique

2011-01-01

188

Fault Tolerant Weighted Voting Algorithms  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Computer networks are now necessities of modern organisations and network security has become a major concern for them. In this paper we have proposed a holistic approach to network security with a hybrid model that includes an Intrusion Detection System (IDS) to detect network attacks and a survivability model to assess the impacts of undetected attacks. A neural network-based IDS has been proposed, where the learning mechanism for the neural network is evolved using genetic algorithm. Then ...

Azad Azadmanesh; Alireza Farahani; Lotfi Najjar

2008-01-01

189

Fault Tolerant Distributed Middleware for VLAB  

Science.gov (United States)

With increasingly large storage media, fast processors and improved data-collecting instruments, the datasets in scientific fields are growing at an exponential rate. How to analyze, visualize and manipulate those datasets (geographically distributed in most cases) easily and efficiently within a collaborative environment is rather challenging. We address this problem through NaradaBrokering (NB), a unique and flexible middleware application program interface (API) (http://www.naradabrokering.org, [1]). Topics, rather than IP addresses and hostnames, are used to locate remote services and support collaboration. In our framework, the underlying hardware, middleware, system load, or resource availability is transparent to the end users. Web Services are the key components within this framework that enable building of loosely coupled applications. Furthermore, multiple service providers can provide identical services. Our system routes client requests to the best qualified service provider according to some default or user-defined conditions. This not only provides a desired Quality of Service (QOS), but also acts as a load balancing mechanism to better distribute the workload across available services. We have deployed NB between Florida State University, the University of Minnesota and Indiana University, and installed multiple instances of a wavelet service. We will demonstrate fault tolerance with respect to the faulty nodes in the NB network and faulty Wavelet service providers. Two users sharing an identical view through an applet will illustrate the collaborative nature of our system. [1] S. Pallickara and G. Fox, "NaradaBrokering: A Middleware Framework and Architecture for Enabling Durable Peer-to-Peer Grid", in Proceedings of ACM/IFIP/USENIX International Middleware Conference Middleware-2003. pp 41-61, (2003)

Lu, Z.; Bollig, E. F.; Erlebacher, G.; Gardgil, H.; Yuen, D.; Pierce, M.; Pallickara, S.

2005-12-01

190

Fault tolerance in Hadoop MapReduce implementation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This document reports the advances on exploring and understanding the fault tolerance mechanisms in Hadoop MapReduce. A description of the current fault tolerance features existing in Hadoop is provided, along with a review of related works on the topic. Finally, the document describes some relevant proposals about fault tolerance worth considering to implement in Hadoop within the PERMARE project in order to provide support for pervasive computing environments.

Cogorno, Matías; Rey, Javier; Nesmachnow, Sergio

2013-01-01

191

Diagnosis and Fault-tolerant Control for Ship Station Keeping  

DEFF Research Database (Denmark)

This paper addresses the design process of diagnosis and fault-tolerant control when a system should operate despite multiple failures in sensors or actuators. Graph-theory based analysis of system structure is demonstrated to be a unique design methodology that can cope with the diagnosis design for systems of high complexity, and also analyse the cases of cascaded or multiple faults. The paper takes as an example a ship with two CP propellers, rudders and a bow thruster as actuators, and instrumentation with a suite of global position sensors, inertial navigation units and conventional gyro units to provide ship motion information. A salient feature of the design method is the ability to analyse cases where faults have occurred and easily determine where in the faulty system diagnosability and controllability are retained.

Blanke, Mogens

2005-01-01

192

Design and Verification of Fault-Tolerant Components  

DEFF Research Database (Denmark)

We present a systematic approach to design and verification of fault-tolerant components with real-time properties as found in embedded systems. A state machine model of the correct component is augmented with internal transitions that represent hypothesized faults. Also, constraints on the occurrence or timing of faults are included in this model. This model of a faulty component is then extended with fault detection and recovery mechanisms, again in the form of state machines. Desired properties of the component are model checked for each of the successive models. The models can be made relatively detailed such that they can serve directly as blueprints for engineering, and yet be amenable to exhaustive verification. The approach is illustrated with a design of a triple modular fault-tolerant system that is a real case we received from our collaborators in the aerospace field. We use UPPAAL to model and check this design. Model checking uses concrete parameters, so we extend the result with parametric analysis using abstractions of the automata in a rigorous verification.

Zhang, Miaomiao; Liu, Zhiming

2009-01-01

193

A novel adaptive switching function on fault tolerable sliding mode control for uncertain stochastic systems.  

Science.gov (United States)

A novel switching function based on an optimization strategy for the sliding mode control (SMC) method has been provided for uncertain stochastic systems subject to actuator degradation such that the closed-loop system is globally asymptotically stable with probability one. In previous research, the focus for the sliding surface has been on proportional or proportional-integral functions of the states. In this research, a degree of freedom that depends on the designer's choice is used to meet certain objectives. In the design of the switching function, there is a parameter which the designer can regulate for specified objectives. A sliding-mode controller is synthesized to ensure the reachability of the specified switching surface, despite actuator degradation and uncertainties. Finally, the simulation results demonstrate the effectiveness of the proposed method. PMID:24954808

Zahiripour, Seyed Ali; Jalali, Ali Akbar

2014-09-01

194

Fault tolerant attitude control for small unmanned aircraft systems equipped with an airflow sensor array.  

Science.gov (United States)

Inspired by sensing strategies observed in birds and bats, a new attitude control concept of directly using real-time pressure and shear stresses has recently been studied. It was shown that with an array of onboard airflow sensors, small unmanned aircraft systems can promptly respond to airflow changes and improve flight performances. In this paper, a mapping function is proposed to compute aerodynamic moments from the real-time pressure and shear data in a practical and computationally tractable formulation. Since many microscale airflow sensors are embedded on the small unmanned aircraft system surface, it is highly possible that certain sensors may fail. Here, an adaptive control system is developed that is robust to sensor failure as well as other numerical mismatches in calculating real-time aerodynamic moments. The advantages of the proposed method are shown in the following simulation cases: (i) feedback pressure and wall shear data from a distributed array of 45 airflow sensors; (ii) 50% failure of the symmetrically distributed airflow sensor array; and (iii) failure of all the airflow sensors on one wing. It is shown that even if 50% of the airflow sensors have failures, the aircraft is still stable and able to track the attitude commands. PMID:25405953

Shen, H; Xu, Y; Dickinson, B T

2014-12-01

195

Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems  

CERN Document Server

Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are communication abstractions that allow the processes to communicate consistently (namely the register abstraction

Raynal, Michel

2010-01-01

196

Analysis of Fault Tolerant Techniques in Secure Mobile Agent Paradigm  

Directory of Open Access Journals (Sweden)

Over the past few years, network domains and mobile agent technology have been among the fastest growing and emerging trends. However, the technology faces certain challenges and problems in meeting bandwidth requirements. Moreover, it suffers from reliability issues such as security and fault tolerance. During agent migration along an itinerary from one server to another, a common issue is a server crash or agent crash. The parameters used for the evaluation of the various techniques are agent centric, system centric, fault type, coordination performance analysis, central management and adaptive. The advantages of each mechanism are also described.

Parul Arora

2014-05-01

197

Fault tolerant multiphase electrical drives: the impact of design  

Science.gov (United States)

This paper deals with fault tolerant multiphase electrical drives. The quality of the torque of a vector-controlled Permanent Magnet (PM) Synchronous Machine supplied by a multi-leg Voltage Source Inverter (VSI) is examined in normal operation and when one or two phases are open-circuited. It is then deduced that a seven-phase machine is a good compromise allowing high torque-to-volume density and easy control with smooth torque in fault operation. Experimental results confirm the predicted characteristics. This article has been submitted as part of “IET Colloquium on Reliability in Electromagnetic Systems”, 24 and 25 May 2007, Paris

Semail, E.; Kestelyn, X.; Locment, F.

2008-08-01

198

Fault-Tolerant Routing in Butterfly Networks  

Directory of Open Access Journals (Sweden)

This research shows that Butterfly networks can be fault-tolerant using the Masked Interval Routing Scheme (MIRS). The MIRS was introduced with the aim of compressing the routing tables in a network. It was shown that MIRS could drastically reduce interval information stored in networks such as globe and hypercube graphs, compared to the classical Interval Routing Scheme (IRS). In Butterfly graphs of O(N) vertices the number of intervals per edge goes down from ? in IRS to O(log N) in MIRS. This research shows that MIRS may be advantageously used in Butterfly networks, proving that optimal routing with one interval per edge is still possible with a harmless subset of faulty vertices. This research gives an optimal algorithm to reconfigure the intervals in the presence of faults.

Mohammed H. Mahafzah

2010-01-01

199

Synthesis of Fault Tolerant Reversible Logic Circuits  

CERN Document Server

Reversible logic is emerging as an important research area having its application in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 universal reversible logic gate, IG. It is a parity preserving reversible logic gate, that is, the parity of the inputs matches the parity of the outputs. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. Finally, it is shown how a fault tolerant reversible full adder circuit can be realized using only two IGs. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.

Islam, Md Saiful; Begum, Zerina; Hafiz, Mohd Zulfiquar; Mahmud, Abdullah Al; 10.1109/CAS-ICTD.2009.4960883

2010-01-01
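
The abstract above does not give the truth table of the IG gate, so the sketch below illustrates the parity-preserving property with the well-known Fredkin (controlled-swap) gate instead, and contrasts it with the Toffoli gate, which is reversible but not parity preserving. The helper names are hypothetical.

```python
from itertools import product


def fredkin(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Controlled swap: b and c are exchanged when the control a is 1."""
    return (a, c, b) if a else (a, b, c)


def toffoli(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Controlled-controlled NOT: c is flipped when a and b are both 1."""
    return (a, b, c ^ (a & b))


def parity(bits) -> int:
    return sum(bits) % 2


def is_reversible(gate, n: int) -> bool:
    """A gate is reversible when distinct inputs give distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n)}
    return len(outputs) == 2 ** n


def is_parity_preserving(gate, n: int) -> bool:
    """Input parity equals output parity for every input pattern."""
    return all(parity(bits) == parity(gate(*bits))
               for bits in product((0, 1), repeat=n))


def single_fault_detected(gate, bits, faulty_output: int) -> bool:
    """Flip one output line and check that the parity comparison flags it."""
    out = list(gate(*bits))
    out[faulty_output] ^= 1
    return parity(bits) != parity(out)


if __name__ == "__main__":
    print(is_reversible(fredkin, 3), is_parity_preserving(fredkin, 3))
    print(is_reversible(toffoli, 3), is_parity_preserving(toffoli, 3))
    print(single_fault_detected(fredkin, (1, 0, 1), faulty_output=2))
```

Because a parity-preserving gate keeps input and output parity equal, flipping any single output line breaks that equality, which is why a fault confined to one signal shows up in a parity comparison.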

200

Byzantine Fault Tolerance for Nondeterministic Applications  

CERN Document Server

All practical applications contain some degree of nondeterminism. When such applications are replicated to achieve Byzantine fault tolerance (BFT), their nondeterministic operations must be sanitized to ensure replica consistency. To the best of our knowledge, only two types of replica nondeterminism have been studied under the Byzantine fault model, which we refer to as wrappable nondeterminism and verifiable pre-determinable nondeterminism. The wrappable nondeterminism is a type of nondeterminism that can be controlled using an infrastructure-provided or application-provided wrapper function, without explicit inter-replica coordination. For example, information such as hostnames, process ids, file descriptors, etc. can be determined group-wise. The verifiable pre-determinable nondeterminism is a type of nondeterminism whose values can be independently chosen by the primary replica and verified by other replicas prior to the execution of a client's request, such as the operation to retrieve the local clock v...

Zhao, W

2007-01-01

 
 
 
 
201

ACID Support and Fault-Tolerant Database Systems on Cloud: A Review  

Directory of Open Access Journals (Sweden)

Cloud computing represents a different way to architect and remotely manage computing resources. One has only to establish an account with Microsoft or Amazon or Google to begin building and deploying application systems into a cloud. These systems can be, but certainly are not restricted to being, simplistic. Some applications require HTTP services, some require a relational database, and some might require web service infrastructure and message queues. With clouds, IT-related applications can be provided as a service, which can be accessed through the internet. There are platforms on the cloud which provide scalability and high availability for web applications, but at the same time there are problems related to data consistency, and in case of server failures this becomes a major problem in applications related to payment services. Data needs to be properly managed in a cloud environment, and to achieve proper transaction processing and consistency, RDBMS techniques such as ACID transactions should be used. Web services in Azure ensure application availability by replicating stored data at least three times and offer optional geolocation of replicas in separate Microsoft data centres to provide disaster recovery services. Azure storage services provide scalable persistent storage of structured tables, blobs and queues.

Pratiyush Guleria

2012-10-01

202

Fault Detection for Shipboard Monitoring and Decision Support Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.

Lajic, Zoran; Nielsen, Ulrik Dam

2008-01-01
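
The abstract above names the generalized likelihood ratio (GLR) algorithm applied to residuals. A minimal sketch of that idea follows, assuming zero-mean Gaussian residuals with known standard deviation and a fault that shows up as a mean shift of unknown size. The function names, window length and threshold are illustrative assumptions and are not taken from the paper.

```python
def glr_statistic(residuals, sigma, window):
    """GLR statistic for a mean shift of unknown size and unknown onset time.

    Assumes residuals are zero-mean Gaussian with standard deviation sigma
    when no fault is present. Only the last `window` samples are searched.
    """
    recent = residuals[-window:]
    best = 0.0
    running_sum = 0.0
    # Walk backwards so running_sum always covers the n most recent samples.
    for n, r in enumerate(reversed(recent), start=1):
        running_sum += r
        best = max(best, running_sum ** 2 / (2.0 * sigma ** 2 * n))
    return best


def detect_fault(residuals, sigma=1.0, window=20, threshold=10.0):
    """Return the index of the first sample at which the GLR statistic
    exceeds the threshold, or None if it never does."""
    for k in range(1, len(residuals) + 1):
        if glr_statistic(residuals[:k], sigma, window) > threshold:
            return k - 1
    return None


if __name__ == "__main__":
    # Hypothetical residual sequence: fault-free, then a constant bias of 2.0.
    data = [0.0] * 50 + [2.0] * 50
    print(detect_fault(data))
```

The threshold trades detection delay against false alarms; in practice it would be tuned on recorded fault-free residuals.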

203

Fault Detection for Shipboard Monitoring and Decision Support Systems  

DEFF Research Database (Denmark)

In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.

Lajic, Zoran; Nielsen, Ulrik Dam

2009-01-01

204

H∞ Fault Tolerant Control of WECS Based on the PWA Model  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main contribution of this paper is the development of H∞ fault tolerant control for a wind energy conversion system (WECS) based on the stochastic piecewise affine (PWA) model. In this paper the normal and fault stochastic PWA models for WECS including multiple working points at different wind speeds are established. A reliable piecewise linear quadratic regulator state feedback is designed for the fault tolerant actuator and sensor. A sufficient condition for the existence of the pass...

Yun-Tao Shi; Qi Kou; De-Hui Sun; Zheng-Xi Li; Shu-Juan Qiao; Yan-Jiao Hou

2014-01-01

205

Fault handling schemes in electronic systems with specific application to radiation tolerance and VLSI design  

Science.gov (United States)

Naturally occurring space radiation particles can produce transient and permanent changes in the electrical properties of electronic devices and systems. In this work, the transient radiation effects on DRAM and CMOS SRAM were considered, and the effect of total ionizing dose radiation on the switching times of CMOS logic gates was investigated. Effects of transient radiation on the column and cell of a MOS dynamic memory cell were simulated using SPICE. It was found that the critical charge of the bitline was higher than that of the cell, and that the critical charge of the combined cell-bitline depended on the gate voltage of the access transistor. The results indicate that the rise time of CMOS logic gates increases, while the fall time decreases, with an increase in total ionizing dose radiation. Also, by increasing the size of the P-channel transistor with respect to that of the N-channel transistor, the propagation delay of a CMOS logic gate can be made to decrease with, or be independent of, an increase in total ionizing dose radiation. Furthermore, a method was developed for replacing the polysilicon feedback resistance of SRAMs with a switched capacitor network. A switched capacitor SRAM was implemented using MOS technology and was found to have a very large critical charge. The results of this work indicate that switched capacitor SRAM is a viable alternative to SRAM with polysilicon feedback resistance.

Attia, John Okyere

1993-01-01

206

Superior model for fault tolerance computation in designing nano-sized circuit systems  

Science.gov (United States)

As CMOS technology scales down to nanometre dimensions, reliability becomes a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate the reliability of nano-electronic circuits. Computing reliability becomes troublesome and time consuming because the computational complexity builds up with circuit size. Being able to measure reliability quickly and accurately is therefore fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper first looks into the development of an automated reliability evaluation tool based on generalizations of the Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed up reliability analysis for a very large number of nano-electronic circuits. Second, using the developed tool, the paper presents a comparative study of reliability computation and evaluation by the PGM and BDEC models for different implementations of circuits with the same functionality. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same-functionality circuits with respect to gate error increases, the reliability measure by BDEC tends to be lower than that by PGM. The lower reliability measure by BDEC is explained in the paper using the distribution of different signal input patterns over time for the same-functionality circuits. Simulation results show that the reliability measure by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero, and the probability of error on signal lines.

Singh, N. S. S.; Asirvadam, V. S.; Muthuvalu, M. S.

2014-10-01
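
To make the gate-level reliability idea in the abstract above more concrete, here is a minimal sketch of one common formulation of a probabilistic gate model: each gate flips its ideal output with probability eps, and signal probabilities are propagated gate by gate. It assumes statistically independent gate inputs (so it ignores reconvergent fan-out), and it is not the Matlab tool or the generalized models described in the paper. All names are hypothetical.

```python
def noisy(p_ideal, eps):
    """Probability that a gate output is 1 when the fault-free gate would
    output 1 with probability p_ideal and the gate flips with probability eps."""
    return (1.0 - eps) * p_ideal + eps * (1.0 - p_ideal)


def nand(pa, pb, eps):
    """Noisy NAND: pa and pb are the probabilities that each input is 1.
    Inputs are assumed independent."""
    return noisy(1.0 - pa * pb, eps)


def two_level_nand(pa, pb, pc, pd, eps):
    """Probability that NAND(NAND(a, b), NAND(c, d)) outputs 1."""
    return nand(nand(pa, pb, eps), nand(pc, pd, eps), eps)


if __name__ == "__main__":
    # With all inputs at logic 1 the fault-free output is 1, so the value
    # printed here is the probability that the output is correct.
    print(two_level_nand(1.0, 1.0, 1.0, 1.0, eps=0.05))
```

Even in this toy circuit the result depends on the input probabilities and the topology as well as on the gate error rate.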

207

Checkpoint-based Intelligent Fault tolerance For Cloud Service Providers  

Directory of Open Access Journals (Sweden)

With the increasing demand for and benefits of cloud computing infrastructure, real time computing can be performed on cloud infrastructure. A real time system can take advantage of the intensive computing capabilities and scalable virtualized environment of cloud computing to execute real time tasks. In most real time cloud applications, processing is done on remote cloud computing nodes, so there is a greater chance of errors due to the undetermined latency and loose control over the computing nodes. On the other hand, most real time systems are also safety critical and should be highly reliable. There is therefore an increased requirement for fault tolerance to achieve reliability for real time computing on cloud infrastructure. This paper proposes a smart checkpoint infrastructure for virtualized service providers and a fault tolerance model for real time cloud computing. The checkpoints are stored in a Hadoop Distributed File System. This allows task execution to resume faster after a node crash and increases the fault tolerance of the system, since checkpoints are distributed and replicated across all the nodes of the provider. The paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure. One advantage of cloud computing is the dynamicity of resource provisioning. The architecture makes use of this advantage by enabling dynamic run-time modifications of replication groups.

Rejin Paul

2012-12-01
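
The checkpoint-and-resume behaviour described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: it writes checkpoints to the local file system rather than to a Hadoop Distributed File System, the task is a toy accumulation loop, and every name (`save_checkpoint`, `load_checkpoint`, `run_task`, `crash_at`) is hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically: write a temp file, then rename it over
    the old checkpoint so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run_task(path, total_steps, checkpoint_every, crash_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.
    `crash_at` simulates a node crash just before that step executes."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated node crash")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

After a simulated crash, calling `run_task` again with the same path resumes from the last saved step instead of from step 0, so only the work done since the last checkpoint is repeated.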

208

Strategic Planning for Fault-Tolerant Internet Connectivity Using Basic Fault-Tolerant Architectural Design as Platform  

Directory of Open Access Journals (Sweden)

The focus of this study is to provide Internet connectivity without interruption even in the presence of faults or failures, thereby enhancing the performance of Internet services. To achieve this, the deployment and redeployment of faulty component(s) are done using the Basic Fault-Tolerant (BFT) architectural design. A framework to provide enhanced performance in terms of confidentiality, integrity and availability in clusters is suggested using BFT, considering all sources of vulnerabilities including operating system/software, communication hardware, user-level communication and network protocols.

O.O. Adeosun

2008-01-01

209

Robustness and fault tolerance make brains harder to study  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Abstract Brains increase the survival value of organisms by being robust and fault tolerant. That is, brain circuits continue to operate as the organism needs, even when the circuit properties are significantly perturbed. Kispersky and colleagues, in a recent paper in Neural Systems & Circuits, have found that Granger Causality analysis, an important method used to infer circuit connections from the behavior of neurons within the circuit, is defeated by the mechanisms that ...

Stevens Charles F; Srinivasan Shyam

2011-01-01

210

Rule-based fault diagnosis of hall sensors and fault-tolerant control of PMSM  

Science.gov (United States)

Hall sensors are widely used for estimating the rotor phase of permanent magnet synchronous motors (PMSM). Rotor position is an essential parameter of the PMSM control algorithm, so Hall sensor faults are very dangerous. However, there is scarcely any research focusing on fault diagnosis and fault-tolerant control of Hall sensors used in PMSM. From this standpoint, the Hall sensor faults which may occur during PMSM operation are theoretically analyzed. According to the analysis results, a fault diagnosis algorithm for Hall sensors, based on three rules, is proposed to classify the fault phenomena accurately. Rotor phase estimation algorithms based on one or two Hall sensor(s) are used to build the fault-tolerant control algorithm. The fault diagnosis algorithm can detect 60 Hall fault phenomena in total, and all detections can be completed within 1/138 of a rotor rotation period. The fault-tolerant control algorithm achieves smooth torque production, that is, the same control effect as the normal control mode (with three Hall sensors). Finally, a PMSM bench test verifies the accuracy and rapidity of the fault diagnosis and fault-tolerant control strategies. The fault diagnosis algorithm detects all Hall sensor faults promptly, and the fault-tolerant control algorithm allows the PMSM to tolerate failure of one or two Hall sensor(s). In addition, the transitions between healthy control and fault-tolerant control conditions are smooth, without additional noise or harshness. The proposed algorithms can deal with Hall sensor faults of PMSM in real applications and can be used to realize fault diagnosis and fault-tolerant control of PMSM.

Song, Ziyou; Li, Jianqiu; Ouyang, Minggao; Gu, Jing; Feng, Xuning; Lu, Dongbin

2013-07-01
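
The abstract above does not state its three diagnosis rules, so the following is not the authors' algorithm. It is a minimal sketch of two generic plausibility checks often applied to three Hall sensors spaced 120 electrical degrees apart: the codes 000 and 111 never occur in healthy operation, and consecutive healthy codes are neighbours in a six-step sequence. The bit order and the names are assumptions.

```python
# Six-step Hall code sequence for one rotation direction, assuming sensors
# spaced 120 electrical degrees apart. Each neighbouring pair differs in
# exactly one bit. The bit-to-sensor assignment is hypothetical.
SEQUENCE = [0b101, 0b100, 0b110, 0b010, 0b011, 0b001]


def check_hall(prev_code, curr_code):
    """Classify a Hall reading given the previous one.

    Returns "invalid_state" for 000 or 111, "invalid_transition" when the
    code skips a step in the sequence, and "ok" otherwise.
    """
    if curr_code not in SEQUENCE:
        return "invalid_state"
    if prev_code not in SEQUENCE:
        # Previous reading was already invalid; the transition cannot be judged.
        return "ok"
    step = (SEQUENCE.index(curr_code) - SEQUENCE.index(prev_code)) % 6
    # 0 = no change, 1 = one step forward, 5 = one step backward.
    return "ok" if step in (0, 1, 5) else "invalid_transition"


if __name__ == "__main__":
    print(check_hall(0b101, 0b100))  # neighbouring codes
    print(check_hall(0b101, 0b111))  # all sensors high
    print(check_hall(0b101, 0b010))  # skipped steps
```

A real diagnosis scheme would also need timing information to decide which sensor failed, which this sketch does not attempt.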

211

A Framework-Based Approach for Fault-Tolerant Service Robots  

Directory of Open Access Journals (Sweden)

Recently the component-based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support a fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with a fault tolerance function, we argue that the framework-based fault tolerance approach is feasible and even has many benefits, including that: 1) system integrators can build fault-tolerant applications from non-fault-aware components; 2) the constraints of the components and the operating environment, which cannot be anticipated easily at the time of component development, can be considered at the time of integration; 3) consistency in system reliability can be obtained in spite of diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator, and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault-tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real-time performance of the framework-based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

Heejune Ahn

2012-11-01

212

Fault Detection and Isolation and Fault Tolerant Control of Wind Turbines Using Set-Valued Observers  

DEFF Research Database (Denmark)

Research on wind turbine Operations & Maintenance (O&M) procedures is critical to the expansion of Wind Energy Conversion systems (WEC). In order to reduce O&M costs and increase the lifespan of the turbine, we study the application of Set-Valued Observers (SVO) to the problem of Fault Detection and Isolation (FDI) and Fault Tolerant Control (FTC) of wind turbines, by taking advantage of the recent advances in SVO theory for model invalidation. A simple wind turbine model is presented along with possible faulty scenarios. The FDI algorithm is built on top of the described model, taking into account process disturbances, uncertainty and sensor noise. The FTC strategy takes advantage of the proposed FDI algorithm, enabling the controller reconfiguration shortly after fault events. Additionally, a robust controller is designed so as to increase the wind turbine's performance during low severity faults. Finally, the FDI algorithm is assessed within a publicly available benchmark model, using Monte-Carlo simulation runs.

Casau, Pedro; Rosa, Paulo Andre Nobre

2012-01-01

213

On fault-tolerance with noisy and slow measurements  

CERN Document Server

We show that the threshold error rates of preparation and measurement for fault tolerant quantum computing can be improved considerably. By removing the dependence on measurements & feedback in quantum error correction using gadgets based on coherent feedback, one can circumvent the problem of noisy and slow measurements present in many physical systems. We develop the method for the Bacon-Shor code, and show fault-tolerant universal quantum computing is achievable when gate error rates are below p_{(g)thresh} = 2.89 x 10^{-5} and, assuming the gate error rate is below the threshold, measurement and preparation error rates can be as high as p_{(p,m)thresh}~7%.

Paz-Silva, Gerardo A; Twamley, Jason

2010-01-01

214

Fault Tolerance in ZigBee Wireless Sensor Networks  

Science.gov (United States)

Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

2011-01-01

215

Fault-tolerance techniques for SRAM-based FPGAs  

CERN Document Server

Fault-tolerance in integrated circuits is no longer the exclusive concern of space designers or highly-reliable applications engineers. Today, designers of many next-generation products must cope with reduced noise margins. The continuous evolution of fabrication technology of semiconductor components – shrinking transistor geometry, power supply, speed, and logic density – has significantly reduced the reliability of very deep submicron integrated circuits, in face of various internal and external sources of noise. Field Programmable Gate Arrays (FPGAs), customizable by SRAM cells, are the latest advance in the integrated circuit evolution: millions of memory cells to implement the logic, embedded memories, routing, and embedded microprocessors cores. These re-programmable systems-on-chip platforms must be fault-tolerant to cope with current requirements.

Kastensmidt, Fernanda Lima; Reis, Ricardo

2006-01-01

216

Beam dynamics calculations for fault-tolerance  

International Nuclear Information System (INIS)

The European Transmutation Demonstration requires a high-power proton accelerator operating in CW mode. This accelerator is also expected to have a very limited number of unexpected beam interruptions per year. To reach such an ambitious goal, reliability-oriented design practices need to be followed from the early stage of component design, and fault-tolerance capabilities have to be introduced to the maximum extent. The goal of this document is to investigate in more detail the fault-tolerance capability of the XT-ADS linac. Previous analysis indicates that, if nothing is done, a cavity failure leads in nearly all cases to a complete beam loss, due to the non-relativistic varying velocity of the particles. To avoid such a total beam loss, some kind of retuning has to be performed to compensate for the lack of acceleration due to the faulty cavity. Fast failure recovery scenarios have to be identified and developed to ensure that such retuning can be performed in less than 1 second. Two ways are investigated. The first is to stop the beam to achieve the retuning (Scenario 1). The other is to perform the retuning without stopping the beam (Scenario 2). The present analysis demonstrates, from the beam dynamics point of view, that a fast retuning procedure can be envisaged without stopping the beam (Scenario 2). Nevertheless, Scenario 2 implies stringent specifications, especially on: - the fault detection time, which has to be extremely short (order of magnitude: 100 μs), and - the margins required on the accelerating field and RF power, which are higher than in Scenario 1.

217

Analysis of a cascaded multilevel inverter with fault-tolerant control  

Directory of Open Access Journals (Sweden)

Cascaded multilevel inverters are widely used in industry for speed control of induction motors. Even though the converters’ operation is highly reliable, several faults can occur, leading to poor motor performance or even causing the whole system to stop. It is desirable to keep the system operational when a failure occurs, even in a degraded mode, so implementing fault-tolerant systems is a good choice. This paper presents a general strategy for fault-tolerant control in a 7-level cascaded multilevel inverter (the faults are in semiconductor devices); the paper includes simulation and experimental results to validate the method.

Jesús Aguayo Alquicira

2011-08-01

218

Active Fault Tolerant Control for Ultrasonic Piezoelectric Motor  

Science.gov (United States)

Ultrasonic piezoelectric motor technology is an important system component in integrated mechatronic devices working under extreme operating conditions. Due to these constraints, robustness and performance of the control interfaces should be taken into account in the motor design. In this paper, we apply a new architecture for fault tolerant control using Youla parameterization to an ultrasonic piezoelectric motor. The distinguishing feature of the proposed controller architecture is that it shows structurally how the controller design for performance and robustness may be done separately, which has the potential to overcome the conflict between performance and robustness in the traditional feedback framework. The fault tolerant control architecture includes two parts: one for performance and the other for robustness. The controller design works in such a way that the feedback control system is solely controlled by the proportional plus double-integral (PI2) performance controller for a nominal model without disturbances, and the H∞ robustification controller is only activated in the presence of uncertainties or external disturbances. The simulation results demonstrate the effectiveness of the proposed fault tolerant control architecture.

Boukhnifer, Moussa

2012-07-01

219

Fault Detection and Isolation and Fault Tolerant Control of Wind Turbines Using Set-Valued Observers  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Research on wind turbine Operations & Maintenance (O&M) procedures is critical to the expansion of Wind Energy Conversion systems (WEC). In order to reduce O&M costs and increase the lifespan of the turbine, we study the application of Set-Valued Observers (SVO) to the problem of Fault Detection and Isolation (FDI) and Fault Tolerant Control (FTC) of wind turbines, by taking advantage of the recent advances in SVO theory for model invalidation. A simple wind turbine model is presented along w...

Casau, Pedro; Rosa, Paulo Andre Nobre; Tabatabaeipour, Seyed Mojtaba; Silvestre, Carlos

2012-01-01

220

A New Checkpoint Approach for Fault Tolerance in Grid Computing  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Computational and service grids are used to solve large-scale scientific applications using grid resources. The main focus is on fault identification and fault rectification (fault tolerance) using checkpoint approaches. In order to achieve fault tolerance, a checkpoint approach can be used. Job checkpointing is one of the most commonly utilized techniques for providing fault tolerance in computational grids. The effectiveness of checkpointing depends on the choice of the checkpoint interval. A common t...

Gokuldev S; Valarmathi M

2013-01-01

 
 
 
 
221

Hypothetical Scenario Generator for Fault-Tolerant Diagnosis  

Science.gov (United States)

The Hypothetical Scenario Generator for Fault-tolerant Diagnostics (HSG) is an algorithm being developed in conjunction with other components of artificial-intelligence systems for automated diagnosis and prognosis of faults in spacecraft, aircraft, and other complex engineering systems. By incorporating prognostic capabilities along with advanced diagnostic capabilities, these developments hold promise to increase the safety and affordability of the affected engineering systems by making it possible to obtain timely and accurate information on the statuses of the systems and predicting impending failures well in advance. The HSG is a specific instance of a hypothetical-scenario generator that implements an innovative approach for performing diagnostic reasoning when data are missing. The special purpose served by the HSG is to (1) look for all possible ways in which the present state of the engineering system can be mapped with respect to a given model and (2) generate a prioritized set of future possible states and the scenarios of which they are parts.

James, Mark

2007-01-01

222

On the Practicality of `Practical' Byzantine Fault Tolerance  

CERN Document Server

Byzantine Fault Tolerant (BFT) systems are considered by the systems research community to be state of the art with regards to providing reliability in distributed systems. BFT systems provide safety and liveness guarantees with reasonable assumptions, amongst a set of nodes where at most f nodes display arbitrarily incorrect behaviors, known as Byzantine faults. Despite this, BFT systems are still rarely used in practice. In this paper we describe our experience, from an application developer's perspective, trying to leverage the publicly available and highly-tuned PBFT middleware (by Castro and Liskov), to provide provable reliability guarantees for an electronic voting application with high security and robustness needs. We describe several obstacles we encountered and drawbacks we identified in the PBFT approach. These include some that we tackled, such as lack of support for dynamic client management and leaving state management completely up to the application. Others still remaining include the lack of...

Chondros, Nikos; Roussopoulos, Mema

2011-01-01
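
The abstract above refers to tolerating at most f Byzantine nodes. The standard sizing used by PBFT-style protocols is n = 3f + 1 replicas with quorums of 2f + 1, which guarantees that any two quorums share at least f + 1 replicas and therefore at least one correct replica. The helper names below are hypothetical, and the sketch covers only this arithmetic, not the protocol itself.

```python
def replicas_needed(f):
    """Minimum number of replicas to tolerate f Byzantine faults."""
    return 3 * f + 1


def quorum_size(f):
    """Quorum size used with n = 3f + 1 replicas."""
    return 2 * f + 1


def min_quorum_overlap(f):
    """Smallest possible intersection of two quorums when n = 3f + 1."""
    return 2 * quorum_size(f) - replicas_needed(f)


def max_faults(n):
    """Largest f that a group of n replicas can tolerate."""
    return (n - 1) // 3


if __name__ == "__main__":
    for f in range(1, 4):
        print(f, replicas_needed(f), quorum_size(f), min_quorum_overlap(f))
```

Note that adding a fifth or sixth replica to a four-replica group does not raise the number of tolerated faults; the next improvement comes at seven replicas.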

223

Guaranteed Cost Fault-tolerant Control of Networked Control Systems with Short Output Delay and Short Control Delay Based on State Observer  

Directory of Open Access Journals (Sweden)

Supposing that the sensor and controller nodes were time-driven and the actuator node was event-driven, the problem of integrity against sensor failures for the networked control systems with short output delay and short control delay was discussed based on observer. The state observer of the system according to the time-delay compensation strategy was designed. Then, considering possible sensor failures, an augmented mathematical model for the networked control systems based on observer was developed. In terms of the given quadratic performance index function, the integrity condition of the system was given and the designs for guaranteed cost fault-tolerant controller and observer were presented respectively by using the cooperative design approach of the controller and observer and the approach of bilinear matrix inequalities. Finally, a numerical simulation example demonstrated the conclusions are feasible and effective. The proposed control method meets the requirements in industrial networked control systems.

Xiaomao Huang

2013-04-01

224

Fault diagnosis and fault-tolerant control and guidance for aerospace vehicles from theory to application  

CERN Document Server

Fault Diagnosis and Fault-Tolerant Control and Guidance for Aerospace demonstrates the attractive potential of recent developments in control for resolving such issues as improved flight performance, self-protection and extended life of structures. Importantly, the text deals with a number of practically significant considerations: tuning, complexity of design, real-time capability, evaluation of worst-case performance, robustness in harsh environments, and extensibility when development or adaptation is required. Coverage of such issues helps to draw the advanced concepts arising from academic research back towards the technological concerns of industry. Initial coverage of basic definitions and ideas and a literature review gives way to a treatment of important electrical flight control system failures: the oscillatory failure case, runaway, and jamming. Advanced fault detection and diagnosis for linear and nonlinear systems are described. Lastly recovery strategies appropriate to remaining actuator/sensor/c...

Zolghadri, Ali; Cieslak, Jerome; Efimov, Denis; Goupil, Philippe

2014-01-01

225

A Reflective Object-Oriented Architecture for Developing Fault-Tolerant Software  

Scientific Electronic Library Online (English)

This paper proposes a reflective object-oriented architecture for developing fault-tolerant software. Reflective object-oriented programming promotes a modular structuring of systems by means of a new dimension of modularization: the separation between base-level objects and meta-level objects. This [...] property allows the creation of metaobjects responsible for managing tasks of application objects located at the base level. In the context of this work, computational reflection is applied to implement various strategies of fault tolerance at the meta-level in a transparent manner for the application programmer, that is, without interfering with the original structure of application objects that require fault tolerance facilities. The use of the proposed architecture has the following advantages: (i) it separates the concerns related to the application domain from those related to the implementation of fault-tolerant mechanisms; (ii) it promotes code reuse of fault-tolerance mechanisms; (iii) it allows application programmers to use the most adequate fault-tolerance strategy for their implementation; and (iv) it provides a design that is more adaptable, flexible and easier to extend than traditional designs for developing fault-tolerant software. Our reflective architecture is composed of three levels, and is based on the abstraction of object groups.

Buzato, Luiz E.; Rubira, Cecília M. F.; Lisboa, Maria Lúcia B.

1997-11-01

226

Fault Tolerant Circuit Design Using Evolutionary Algorithms  

Directory of Open Access Journals (Sweden)

With the rapid development of semiconductor technology and the increasing proliferation of emission sources, digital circuits are frequently used in harsh electromagnetic environments. Electrostatic Discharge (ESD) interference is gradually gaining prominence, resulting in performance degradation, malfunctions and disturbances in component- or system-level applications. Conventional solutions to this problem are shielding, filtering and grounding. This paper presents an evolvable hardware (EHW) platform for the automated design and adaptation of a motor control circuit. The platform uses EHW to automate the configuration of an FPGA dedicated to the implementation of the motor control circuit. The ability of the platform to adapt to a certain number of faults was investigated by introducing single logic unit faults and multiple logic unit faults. Results show that the functionality of the circuit can be recovered through evolution. They also show that the placement of faulty units affects the ability of the GA to evolve a correct circuit, and that the evolutionary recovery ability of the circuit decreases as the number of faulty units increases.

Hui-Cong Wu

2014-01-01

227

A Test Generation Framework for Distributed Fault-Tolerant Algorithms  

Science.gov (United States)

Heavyweight formal methods such as theorem proving have been successfully applied to the analysis of safety critical fault-tolerant systems. Typically, the models and proofs performed during such analysis do not inform the testing process of actual implementations. We propose a framework for generating test vectors from specifications written in the Prototype Verification System (PVS). The methodology uses a translator to produce a Java prototype from a PVS specification. Symbolic (Java) PathFinder is then employed to generate a collection of test cases. A small example is employed to illustrate how the framework can be used in practice.

Goodloe, Alwyn; Bushnell, David; Miner, Paul; Pasareanu, Corina S.

2009-01-01

228

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network is used to realize the on-line fault identification and the weighting matrix computation. The fault accommodation subsyste...

Yongsheng Yang; Qian Liu; Daqi Zhu

2008-01-01

229

Fault tolerant wind speed estimator used in wind turbine controllers  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Advanced control schemes can be used to optimize energy production and cost of energy in modern wind turbines. These control schemes most often rely on wind speed estimations. These designs of wind speed estimators are, however, not designed to be fault tolerant towards faults in the used sensors. In this paper a fault tolerant wind speed estimator is designed based on a set of unknown input observers, each designed to the different sets of non-faulty sensors. Faults in the rotor, generator a...

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01

230

Hybrid fault tolerance techniques to detect transient faults in embedded processors  

CERN Document Server

This book describes fault tolerance techniques based on software and hardware to create hybrid techniques. They are able to reduce overall performance degradation and increase error detection when associated with applications implemented in embedded processors. Coverage begins with an extensive discussion of the current state of the art in fault tolerance techniques. The authors then discuss the best trade-off between software-based and hardware-based techniques and introduce novel hybrid techniques. Proposed techniques increase existing fault detection rates up to 100%, while maintaining low performance overheads in area and application execution time. • Discusses the effects of radiation on modern integrated circuits; • Provides a comprehensive overview of state-of-the-art fault tolerance techniques based on software, hardware, and hybrid techniques; • Introduces novel hybrid fault tolerance techniques for reconfigurable FPGAs and ASICs; • Performs fault injection campaigns by simulation, bitstream ...

Azambuja, José Rodrigo; Becker, Jürgen

2014-01-01

231

Fault Tolerance In Grid Computing: State of the Art and Open Issues  

Directory of Open Access Journals (Sweden)

Fault tolerance is an important property for large scale computational grid systems, where geographically distributed nodes co-operate to execute a task. In order to achieve a high level of reliability and availability, the grid infrastructure should be foolproof and fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing. Commonly utilized techniques for providing fault tolerance are job checkpointing and replication. Both techniques mitigate the amount of work lost due to changing system availability but can introduce significant runtime overhead. The overhead largely depends on the length of the checkpointing interval and the chosen number of replicas, respectively. In the case of complex scientific workflows, where tasks execute in a well-defined order, reliability is another major challenge because of the unreliable nature of the grid resources.

Ritu Garg

2011-02-01
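
The dependence of checkpointing overhead on the interval length, noted in the abstract above, can be illustrated with Young's first-order approximation of the optimal interval, sqrt(2 * C * MTBF). This formula is not taken from the paper; it is a standard rule of thumb swapped in for illustration, and the function names and the sample checkpoint cost and mean time between failures are assumptions.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimal checkpoint interval."""
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def overhead_fraction(interval_s: float, checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Approximate fraction of time lost: checkpoint writes plus expected rework."""
    return checkpoint_cost_s / interval_s + interval_s / (2.0 * mtbf_s)


if __name__ == "__main__":
    # Hypothetical values: a 60 s checkpoint and one failure per day on average.
    cost, mtbf = 60.0, 24 * 3600.0
    best = young_interval(cost, mtbf)
    for interval in (best / 4, best, best * 4):
        lost = overhead_fraction(interval, cost, mtbf)
        print(f"interval={interval:8.0f} s  overhead={lost:.4f}")
```

Checkpointing much more or much less often than this interval increases the estimated overhead, which is the trade-off the abstract describes.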

232

Wireless Fault-Tolerant Controllers in Cascaded Industrial Workcells Using Wi-Fi and Ethernet  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A Wireless Networked Control System using 802.11b is used to model fault-tolerance at the controller level of an industrial workcell. The fault-tolerance study in this paper presents the cascading of two independent workcells where each controller must be able to handle the load of both cells in case of failure of the other one. The intercommunication is completely wirele...

Refaat, Tarek K.; Daoud, Ramez M.; Amer, Hassanein H.

2013-01-01

233

Fault tolerant and lifetime control architecture for autonomous vehicles  

Science.gov (United States)

Increased vehicle autonomy, survivability and utility can provide an unprecedented impact on mission success and are one of the most desirable improvements for modern autonomous vehicles. We propose a general architecture of intelligent resource allocation, reconfigurable control and system restructuring for autonomous vehicles. The architecture is based on fault-tolerant control and lifetime prediction principles, and it provides improved vehicle survivability, extended service intervals, greater operational autonomy through lower rate of time-critical mission failures and lesser dependence on supplies and maintenance. The architecture enables mission distribution, adaptation and execution constrained on vehicle and payload faults and desirable lifetime. The proposed architecture will allow managing missions more efficiently by weighing vehicle capabilities versus mission objectives and replacing the vehicle only when it is necessary.

Bogdanov, Alexander; Chen, Yi-Liang; Sundareswaran, Venkataraman; Altshuler, Thomas

2008-04-01

234

Fault Tolerant Neuro-Robust Position Control of DC Motors  

Directory of Open Access Journals (Sweden)

DC motors are widely used in industries such as mechanics, robotics, and aerospace engineering. In this paper, we present a high performance control method for position control of DC motors. A fault-tolerant control model is also addressed and combined with the neuro-robust control approach. It is shown that, with the proposed control algorithms, external disturbances and coupled dynamics inherent in the system are effectively compensated using a neural network unit, for which no analytical estimate of the upper bound on the reconstruction error and uncertainties is needed. Simulations on various flight conditions also confirm the effectiveness of the proposed methods.

Ran Zhang

2011-10-01

235

Fault Tolerant Strategy for Semi-Active Suspensions with LPV Accommodation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A novel fault tolerant strategy to compensate multiplicative actuator faults (damper oil leakages) in a semi-active suspension system is proposed. The lack of damping force caused by a faulty damper is compensated by the remaining three healthy semi-active dampers. Once a faulty damper is detected and isolated by a Fault Detection and Isolation strategy based on parity space, an estimator is activated to compute the missing damping force to compensate. In order to ...

Tudón-Martínez, Juan; Varrier, Sébastien; Sename, Olivier; Morales Menendez, Ruben; Martinez Molina, John Jairo; Dugard, Luc

2013-01-01

236

A Reliable and Fault Tolerant Routing for Optical WDM Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In optical WDM networks, since each lightpath can carry a huge amount of traffic, failures may seriously damage the end user applications. Hence fault tolerance becomes an important issue on these networks. The lightpath that carries traffic during normal operation is called the primary path. The traffic is rerouted on a backup path in case of a failure. In this paper we propose to design a reliable and fault tolerant routing algorithm for establishing primary and backup pat...

Ramesh, G.; Sundaravadivelu, S.

2009-01-01
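
The primary/backup idea in the abstract above can be sketched as follows. This is not the authors' routing algorithm (the abstract is truncated before it is described); it is a simple remove-and-repeat heuristic, assuming an undirected topology given as a hypothetical adjacency dictionary. The heuristic can miss a link-disjoint pair that exists in some topologies, where a method such as Suurballe's algorithm would find one.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """BFS shortest path by hop count, skipping undirected links in `banned`."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parent and frozenset((node, nxt)) not in banned:
                parent[nxt] = node
                queue.append(nxt)
    return None


def primary_and_backup(adj, src, dst):
    """Return (primary, backup); backup shares no link with primary, or is None."""
    primary = shortest_path(adj, src, dst)
    if primary is None:
        return None, None
    used = frozenset(frozenset(pair) for pair in zip(primary, primary[1:]))
    backup = shortest_path(adj, src, dst, banned=used)
    return primary, backup


if __name__ == "__main__":
    # Hypothetical four-node ring: A-B-C-D-A.
    ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
    print(primary_and_backup(ring, "A", "C"))
```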

237

Fault Tolerant Reconfigurable Control of a Water Delivery Canal - Actuators Faults  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This work addresses the problem of designing fault tolerant controllers for a water delivery canal that tackle actuators faults. The type of faults considered consists of blocking of one of the gates. The detection of the fault is made by comparing the gate position command with the actual (measured) gate position. Both centralized and distributed controllers are made for local upstream water level control. Centralized controllers are multivariable LQG-LTR controllers that use a model of the ...

Sampaio, Inês; Lemos, João; Rato, Luís; Rijo, Manuel

2012-01-01
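
The detection rule in the abstract above, comparing the gate position command with the measured gate position, can be sketched as a residual check. The tolerance, the persistence count, and all names are assumptions made for illustration; the paper's actual thresholds are not given in the abstract.

```python
def detect_blocked_gate(commands, measurements, tolerance=0.02, persistence=3):
    """Return the sample index at which a fault is declared, or None.

    A fault is declared once |command - measurement| has exceeded `tolerance`
    for `persistence` consecutive samples (both values are hypothetical).
    """
    run = 0
    for k, (command, measured) in enumerate(zip(commands, measurements)):
        if abs(command - measured) > tolerance:
            run += 1
            if run >= persistence:
                return k
        else:
            run = 0
    return None


if __name__ == "__main__":
    commanded = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
    # Hypothetical measurement: the gate blocks at an opening of 0.30.
    measured = [0.10, 0.20, 0.30, 0.30, 0.30, 0.30]
    print(detect_blocked_gate(commanded, measured))
```

Requiring several consecutive violations is one way to avoid declaring a fault on a single noisy sample, at the cost of a short detection delay.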

238

A universal, fault-tolerant, non-linear analytic network for modeling and fault detection  

International Nuclear Information System (INIS)

The similarities and differences of a universal network to normal neural networks are outlined. The description and application of a universal network is discussed by showing how a simple linear system is modeled by normal techniques and by universal network techniques. A full implementation of the universal network as universal process modeling software on a dedicated computer system at EBR-II is described and example results are presented. It is concluded that the universal network provides different feature recognition capabilities than a neural network and that the universal network can provide extremely fast, accurate, and fault-tolerant estimation, validation, and replacement of signals in a real system

239

Coordinated Fault-Tolerance for High-Performance Computing Final Project Report  

Energy Technology Data Exchange (ETDEWEB)

With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. 
This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerant linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.

Panda, Dhabaleswar Kumar [The Ohio State University]; Beckman, Pete

2011-07-01
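
The publish-subscribe pattern behind the fault information exchange described above can be sketched with a toy in-process event bus. This is not the FTB API: the class, method names, event type, and payload fields are all hypothetical and serve only to show how several software components could react to one published fault event.

```python
from collections import defaultdict


class FaultEventBus:
    """Toy in-process publish-subscribe bus (hypothetical; not the FTB API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        """Register callback(source, payload) for one event type."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type, source, payload):
        """Deliver the event to every subscriber; return the delivery count."""
        delivered = 0
        for callback in list(self._subscribers.get(event_type, [])):
            callback(source, payload)
            delivered += 1
        return delivered


if __name__ == "__main__":
    bus = FaultEventBus()
    actions = []
    # Two independent components subscribe to the same fault event.
    bus.subscribe("node_failure", lambda src, p: actions.append(f"scheduler: drain {p['node']}"))
    bus.subscribe("node_failure", lambda src, p: actions.append(f"checkpointer: restart ranks on {p['node']}"))
    bus.publish("node_failure", "monitor", {"node": "n042"})
    print("\n".join(actions))
```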

240

Reliable Energy Efficient Fault Tolerant Clustering in Wireless Sensor Network  

Directory of Open Access Journals (Sweden)

This paper proposes a Reliable, Energy Efficient, Fault Tolerant (REEFT) clustering algorithm for aggregating sensor measurements in a Wireless Sensor Network (WSN). It is a hierarchical algorithm in which energy efficiency is achieved by constructing static clusters with a reliable cluster head based on distance. The lifetime of the WSN is improved by addressing important issues in WSNs: the distribution of clusters, the optimal number of clusters and of nodes in a cluster, and the optimal duration of the clustering cycle. The algorithm also includes a fault tolerance feature to tolerate Cluster Head (CH) failure and improve the packet delivery ratio. The algorithm was tested using simulations and its performance improvements were analyzed.

L. Venkatesan

2014-01-01

 
 
 
 
241

A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites  

International Nuclear Information System (INIS)

This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device

242

Algorithm-based fault tolerance applied to P2P computing networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach for fault-tolerance in High Performance Computing systems. Our contribution is in two directions: first, instead of restricting to 2D checksums that tolerate only a small number of node failures, we propose to base disk-less checkpointing on linear codes to tolerate potentially a large number of faults. Then, we compare and analyse the u...

Roche, Thomas; Roch, Jean-louis; Cunche, Mathieu

2009-01-01
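
The checksum idea that the abstract above generalises can be sketched in its simplest form: one spare node holds the element-wise sum of all data blocks, so a single lost block can be rebuilt from the survivors. This is a deliberately reduced illustration with hypothetical names, integer data, and a single failure; it is neither the 2D checksum scheme nor the linear-code construction proposed in the paper.

```python
def make_checksum(blocks):
    """Element-wise sum of all nodes' blocks, held by a spare checksum node."""
    return [sum(column) for column in zip(*blocks)]


def recover_block(surviving_blocks, checksum):
    """Rebuild the single lost block as checksum minus the surviving blocks."""
    if not surviving_blocks:
        return list(checksum)
    partial = [sum(column) for column in zip(*surviving_blocks)]
    return [c - p for c, p in zip(checksum, partial)]


if __name__ == "__main__":
    # Hypothetical in-memory state of three compute nodes.
    blocks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    checksum = make_checksum(blocks)
    survivors = [blocks[0], blocks[2]]  # node 1 has failed
    print(recover_block(survivors, checksum))
```

A single sum tolerates only one lost block, which is the limitation that motivates coding over several independent checksums.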

243

Robust and fault tolerant control of modular and reconfigurable robots  

Science.gov (United States)

Modular and reconfigurable robots have been one of the main areas of robotics research in recent years due to their wide range of applications, especially in the aerospace sector. Dynamic control of manipulators can be performed using joint torque sensing with little information about the link dynamics. From the modular robot perspective, this advantage offered by the torque sensor can be used to enhance the modularity of the control system. Although known modular robots boast novel and diverse mechanical designs of their joint modules, they still require the whole robot dynamic model for motion control, so the modularity offered on the mechanical side does not offer any advantage in the control design. In this work, a modular distributed control technique is formulated for modular and reconfigurable robots that can instantly adapt to robot reconfigurations. Under this control methodology, a modular and reconfigurable robot is stabilized joint by joint, and modules can be added or removed without the need to re-tune the controller. Model uncertainties associated with load and links are compensated by the use of joint torque sensors. Other model uncertainties at each joint module are compensated by a decomposition based robust controller for each module. The proposed distributed control technique offers a 'modular' approach, featuring a unique joint-by-joint control synthesis of the joint modules. Fault tolerance and fault detection are formulated as a decentralized control problem for modular and reconfigurable robots in this thesis work. The modularity of the system is exploited to derive a strategy dependent only on a single joint module, while eliminating the need for the motion states of other joint modules. While the traditional fault tolerant and detection schemes are suitable for robots with the whole dynamic model, this proposed technique is ideal for modular and reconfigurable robots because of its modular nature. 
The proposed methods have been investigated with simulations and experimentally tested using a 3-DOF modular and reconfigurable robot.

Abdul, Sajan

244

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to realize the fault identification of the thruster. The fault accommodation unit is based on direct calculations of moment...

Qian Liu; Daqi Zhu

2010-01-01

245

Active and Passive Fault-Tolerant LPV Control of Wind Turbines  

DEFF Research Database (Denmark)

This paper addresses the design and comparison of active and passive fault-tolerant linear parameter-varying (LPV) controllers for wind turbines. The considered wind turbine plant model is characterized by parameter variations along the nominal operating trajectory and includes a model of an incipient fault in the pitch system. We propose the design of an active fault-tolerant controller (AFTC) based on an existing LPV controller design method and extend this method to apply for the design of a passive fault-tolerant controller (PFTC). Both controllers are based on output feedback and are scheduled on the varying parameter to manage the parameter-varying nature of the model. The PFTC only relies on measured system variables and an estimated wind speed, while the AFTC also relies on information from a fault diagnosis system. Consequently, the optimization problem involved in designing the PFTC is more difficult to solve, as it involves solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs). Simulation results show the performance of the active fault-tolerant control system to be slightly superior to that of the passive fault-tolerant control system.

Sloth, Christoffer; Esbensen, Thomas

2010-01-01

246

Fault-tolerant computer architecture based on INMOS transputer processor  

Science.gov (United States)

Redundant processing has been used for several years in mission flight systems. In these systems, more than one processor performs the same task at the same time, but only one processor is actually in control. A fault-tolerant computer architecture based on the features provided by INMOS Transputers is presented. The Transputer architecture provides several communication links that allow data and command communication with other Transputers without the use of a bus. Additionally, the Transputer allows the use of parallel processing to increase the system speed considerably. The processor architecture consists of three processors working in parallel, keeping all the processors at the same operational level, with only one processor in actual control of the process. The design allows each Transputer to perform a test on the other two Transputers and report the operating condition of the neighboring processors. A graphic display was developed to facilitate the identification of any problem by the user.

Ortiz, Jorge L.

1987-01-01
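
The arrangement in the abstract above, three processors that test one another while only one is in control, can be sketched as a small selection rule. The rule used here is an assumption, not the paper's design: a processor is declared failed only when every peer reports that its test failed, which is safe if at most one processor is faulty and a faulty processor may report anything. All names are hypothetical.

```python
def failed_processors(reports):
    """reports[i][j] is True when processor i's test of processor j passed.

    Diagonal entries are ignored. A processor is declared failed only when
    all of its peers report a failed test (assumed single-fault rule).
    """
    n = len(reports)
    failed = set()
    for j in range(n):
        votes = [reports[i][j] for i in range(n) if i != j]
        if not any(votes):
            failed.add(j)
    return failed


def select_controller(current, reports):
    """Keep the current controller if healthy, else hand over to the first healthy one."""
    failed = failed_processors(reports)
    if current not in failed:
        return current
    for candidate in range(len(reports)):
        if candidate not in failed:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical reports: processor 0 is faulty and wrongly accuses processor 1.
    reports = [
        [None, False, True],
        [False, None, True],
        [False, True, None],
    ]
    print(failed_processors(reports), select_controller(0, reports))
```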

247

Block QCA Fault-Tolerant Logic Gates  

Science.gov (United States)

Suitably patterned arrays (blocks) of quantum-dot cellular automata (QCA) have been proposed as fault-tolerant universal logic gates. These block QCA gates could be used to realize the potential of QCA for further miniaturization, reduction of power consumption, increase in switching speed, and increased degree of integration of very-large-scale integrated (VLSI) electronic circuits. The limitations of conventional VLSI circuitry, the basic principle of operation of QCA, and the potential advantages of QCA-based VLSI circuitry were described in several NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855) Vol. 27, No. 1 (January 2003), page 32; Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35; and Hybrid VLSI/QCA Architecture for Computing FFTs (NPO-20923), which follows this article. To recapitulate the principle of operation (greatly oversimplified because of the limitation on space available for this article): A quantum-dot cellular automaton contains four quantum dots positioned at or between the corners of a square cell. The cell contains two extra mobile electrons that can tunnel (in the quantum-mechanical sense) between neighboring dots within the cell. The Coulomb repulsion between the two electrons tends to make them occupy antipodal dots in the cell. For an isolated cell, there are two energetically equivalent arrangements (denoted polarization states) of the extra electrons. The cell polarization is used to encode binary information. Because the polarization of a nonisolated cell depends on Coulomb-repulsion interactions with neighboring cells, universal logic gates and binary wires could be constructed, in principle, by arraying QCA of suitable design in suitable patterns. 
Heretofore, researchers have recognized two major obstacles to realization of QCA-based logic gates: One is the need for (and the difficulty of attaining) operation of QCA circuitry at room temperature or, for that matter, at any temperature above a few Kelvins. It has been theorized that room-temperature operation could be made possible by constructing QCA as molecular-scale devices. However, in approaching the lower limit of miniaturization at the molecular level, it becomes increasingly imperative to overcome the second major obstacle, which is the need for (and the difficulty of attaining) high precision in the alignments of adjacent QCA in order to ensure the correct interactions among the quantum dots.

Firjany, Amir; Toomarian, Nikzad; Modarres, Katayoon

2003-01-01

248

Fault tolerant vector control of induction motor drive  

Science.gov (United States)

For electric drives used in technical facilities of hazardous industries, such as nuclear, military, and chemical plants, increasing resiliency and survivability is an urgent task. The paper presents the construction principle of a fault-tolerant vector control system for an asynchronous electric drive. It shows how the operability of a three-phase induction motor drive is restored in emergency mode using a two-phase vector control system. It describes how a simulation model of the unbalanced asynchronous drive in emergency mode is formed. The modeling uses a coordinate transformation that supports emergency operation of the unbalanced drive. Results of modeling the transient caused by loss of a stator phase are given. During the loss of a supply phase, the induction motor cannot maintain a circular rotating field in the air gap and restore its performance at rated torque and speed.

Odnokopylov, G.; Bragin, A.

2014-10-01

249

Fault tolerant strategies for automated operation of nuclear reactors  

International Nuclear Information System (INIS)

This paper introduces an automatic control system incorporating a number of verification, validation, and command generation tasks within a fault-tolerant architecture. The integrated system utilizes recent methods of artificial intelligence such as neural networks and fuzzy logic control. Furthermore, advanced signal processing and nonlinear control methods are also included in the design. The primary goal is to create an on-line capability to validate signals, analyze plant performance, and verify the consistency of commands before control decisions are finalized. The application of this approach to the automated startup of the Experimental Breeder Reactor-II (EBR-II) is performed using a validated nonlinear model. The simulation results show that the advanced concepts have the potential to improve plant availability and safety

250

RSFTS: RULE-BASED SEMANTIC FAULT TOLERANT SCHEDULING FOR CLOUD ENVIRONMENT  

Directory of Open Access Journals (Sweden)

Cloud computing has emerged as one of the latest technologies for delivering on-demand sophisticated services over the Internet. To make effective use of the tremendous capabilities of the cloud, efficient scheduling algorithms are required. In large scale systems, fault tolerance is a very critical issue, since the cloud resources are extensively disseminated among diverse locations. This leads to a higher probability of failures while solving huge problems, so the cloud service reliability could be relatively low. Therefore, providing an effective fault tolerance technique for a cloud system is mandatory. This paper introduces an efficient and reliable Rule-based Semantic Fault Tolerant Scheduling (RSFTS) technique for the cloud environment. The overall system is described semantically to assign resources based on a set of semantic rules. The proposed technique could achieve maximum reliability, availability and high efficiency.

Pandeeswari R

2013-02-01

251

RSFTS: RULE-BASED SEMANTIC FAULT TOLERANT SCHEDULING FOR CLOUD ENVIRONMENT  

Directory of Open Access Journals (Sweden)

Cloud computing has emerged as one of the latest technologies for delivering on-demand sophisticated services over the Internet. To make effective use of the tremendous capabilities of the cloud, efficient scheduling algorithms are required. In large scale systems, fault tolerance is a very critical issue, since the cloud resources are extensively disseminated among diverse locations. This leads to a higher probability of failures while solving huge problems, so the cloud service reliability could be relatively low. Therefore, providing an effective fault tolerance technique for a cloud system is mandatory. This paper introduces an efficient and reliable Rule-based Semantic Fault Tolerant Scheduling (RSFTS) technique for the cloud environment. The overall system is described semantically to assign resources based on a set of semantic rules. The proposed technique could achieve maximum reliability, availability and high efficiency.

Pandeeswari R

2013-03-01

252

Fault tolerant control for induction motors using sliding mode observers  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper a fault tolerant control design based on a sliding mode observer for induction motors is proposed. First, a direct field oriented controller based on backstepping technique is designed in order to steer the flux and speed variables to their desired references and to compensate the load disturbance. Second, a sliding mode observer is designed in order to detect and reconstruct the faults and also to estimate the flux. Then, additional control laws based on the estimates of the fa...

Djeghali, Nadia; Ghanes, Malek; Djennoune, Said; Barbot, Jean-pierre

2011-01-01

253

A benchmark for fault tolerant flight control evaluation:  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A large transport aircraft simulation benchmark (REconfigurable COntrol for Vehicle Emergency Return - RECOVER) has been developed within the GARTEUR (Group for Aeronautical Research and Technology in Europe) Flight Mechanics Action Group 16 (FM-AG(16)) on Fault Tolerant Control (2004-2008) for the integrated evaluation of fault detection and identification (FDI) and reconfigurable flight control strategies. The benchmark includes a suitable set of assessment criteria and failure cases, bas...

Smaili, H.; Breeman, J.; Lombaerts, T.; Stroosma, O.

2013-01-01

254

The Fibonacci scheme for fault-tolerant quantum computation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We rigorously analyze Knill's Fibonacci scheme for fault-tolerant quantum computation, which is based on the recursive preparation of Bell states protected by a concatenated error-detecting code. We prove lower bounds on the threshold fault rate of $0.67 \times 10^{-3}$ for adversarial local stochastic noise, and $1.25 \times 10^{-3}$ for independent depolarizing noise. In contrast to other schemes with comparable proved accuracy thresholds, the Fibonacci scheme has a significantly...

Aliferis, Panos; Preskill, John

2008-01-01

255

Improvement of Matrix Converter Drive Reliability by Online Fault Detection and a Fault-Tolerant Switching Strategy.  

DEFF Research Database (Denmark)

The matrix converter system is becoming a very promising candidate to replace the conventional two-stage ac/dc/ac converter, but system reliability remains an open issue. The most common reliability problem is that a bidirectional switch has an open-switch fault during operation. In this paper, a matrix converter driving a speed-controlled permanent-magnet synchronous motor is examined under a single open-switch fault. First, a new fault-detection method is proposed using only the motor currents. Second, a novel fault-tolerant switching strategy is presented. By treating the matrix converter as a two-stage rectifier/inverter, existing modulation techniques for the inverter stage can be reused, whereas the rectifier stage is modified by control to counteract the fault. However, the proposed techniques require no additional hardware devices or circuit modifications to the matrix converter. Experimental results show that the proposed method can maintain the motor speed with a maximum ripple of 2%—a fivefold improvement over the uncompensated system. The proposed method therefore offers a very economical and effective solution for the matrix converter fault tolerance problem.

Nguyen-Duy, Khiem; Liu, Tian-Hua

2011-01-01

256

A Fault Tolerant Resource Allocation Architecture for Mobile Grid  

Directory of Open Access Journals (Sweden)

Problem statement: In order to achieve a high level of reliability and availability, the grid infrastructure should be fault tolerant. Since the failure of resources affects job execution fatally, a fault tolerance service is essential to satisfy QoS requirements in grid computing with respect to mobile nodes. Approach: We propose a fault tolerant technique for improving reliability in a mobile grid environment that takes node mobility into account. The cluster head and monitoring agent were designed to address both resource and network failures and to provide recovery techniques for overcoming the faults. Results: The proposed model achieves an identifiable performance improvement when compared to the previous model (HRAA). Using simulation results, we analyze the effect of node and link failures on parameters such as delivery ratio, throughput and delay against the rate of success. Conclusion: The proposed fault tolerant approach checks for the availability of the nodes with the least workload for transferring the executed job to the cluster head, providing an alternate path in case of failure and thereby enhancing the reliability of the grid environment.

P. T. Vanathi

2012-01-01

257

Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements  

DEFF Research Database (Denmark)

This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modified particle filter. This particle filter is able to run in an asynchronous manner to accommodate the measurement drop out problem, and it overcomes the measurement outliers by switching observation models. Simulations with experimental data show that this fault tolerant navigation system can accurately estimate the ROV kinematic states, even when sensor failures appear frequently.

Zhao, Bo; Blanke, Mogens

2012-01-01

258

Fault-Tolerant Quantum Computation via Exchange interactions  

CERN Document Server

Quantum computation can be performed by encoding logical qubits into the states of two or more physical qubits, and controlling a single effective exchange interaction and possibly a global magnetic field. This "encoded universality" paradigm offers potential simplifications in quantum computer design since it does away with the need to perform single-qubit rotations. Here we show that encoded universality schemes can be combined with quantum error correction. In particular, we show explicitly how to perform fault-tolerant leakage correction, thus overcoming the main obstacle to fault-tolerant encoded universality.

Mohseni, M

2005-01-01

259

New Limits on Fault-Tolerant Quantum Computation  

CERN Document Server

We show that quantum circuits cannot be made fault-tolerant against a depolarizing noise level of approximately 45%, thereby improving on a previous bound of 50% (due to Razborov). Our precise quantum circuit model enables perfect gates from the Clifford group (CNOT, Hadamard, S, X, Y, Z) and arbitrary additional one-qubit gates that are subject to that much depolarizing noise. We prove that this set of gates cannot be universal for arbitrary (even classical) computation, from which the upper bound on the noise threshold for fault-tolerant quantum computation follows.

Buhrman, Harry; Cleve, Richard; Laurent, Monique; Linden, Noah; Schrijver, Alexander; Unger, Falk

2006-01-01

260

Approximating Fault-Tolerant Group-Steiner Problems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, we initiate the study of designing approximation algorithms for Fault-Tolerant Group-Steiner (FTGS) problems. The motivation is to protect the well-studied group-Steiner networks from edge or vertex failures. In Fault-Tolerant Group-Steiner problems, we are given a graph with edge- (or vertex-) costs, a root vertex, and a collection of subsets of vertices called groups. The objective is to find a minimum-cost subgraph that has two edge- (or vertex-...

Khandekar, Rohit; Kortsarz, Guy; Nutov, Zeev

2009-01-01

 
 
 
 
261

A Fault Tolerance Management Framework for Wireless Sensor Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Wireless Sensor Networks (WSNs) have the potential of significantly enhancing our ability to monitor and interact with our physical environment. Realizing a fault tolerant operation is critical to the success of WSNs. The main challenge is providing fault tolerance (FT) while conserving the limited resources ...

Hesham El-Sayed; Adnan Agbariax; Mohamed Eltoweissy; Iman Salehy

2007-01-01

262

Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. The fault detection filter uses the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of fa...

Minupriya A; Kanthalakshmi, S.; Manikandan, V.

2012-01-01

263

Ethernet Implementation of Fault Tolerant Train Network for Entertainment and Mixed Control Traffic  

Directory of Open Access Journals (Sweden)

This paper studies the integration of the control system and entertainment on board train wagons. Both the control and entertainment loads are implemented on top of Gigabit Ethernet, each with a dedicated controller/server. The control load has mixed sampling periods. It is proven that this system can tolerate the failure of one controller in one wagon. In a two-wagon scenario, fault tolerance at the controller level is studied, and simulation results show that the system can tolerate the failure of 3 controllers. The system is successful in meeting the packet end-to-end delay with zero packet loss in all OPNET simulated scenarios. The maximum permissible entertainment load is determined for the fault tolerant scenarios.

Tarek K. Refaat

2013-01-01

264

Production of Reliable Flight Crucial Software: Validation Methods Research for Fault Tolerant Avionics and Control Systems Sub-Working Group Meeting  

Science.gov (United States)

The state of the art in the production of crucial software for flight control applications was addressed. The association between reliability metrics and software is considered. Thirteen software development projects are discussed. A short term need for research in the areas of tool development and software fault tolerance was indicated. For the long term, research in formal verification or proof methods was recommended. Formal specification and software reliability modeling were recommended as topics for both short and long term research.

Dunham, J. R. (editor); Knight, J. C. (editor)

1982-01-01

265

Design of Fault Tolerant Network Interfaces for NoCs  

DEFF Research Database (Denmark)

Networks-on-Chip (NoCs) appeared as a strategy to deal with the communication requirements of complex IP-based System-on-Chips. As the complexity of designs increases and the technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the NoC components increases. This paper focuses on the study and evaluation of techniques for increasing reliability and resilience of Network Interfaces (NIs). NIs act as interfaces between IP cores and the communication infrastructure; faulty behavior in them could therefore affect the overall system. In this work, we propose a functional fault model for the NI components, and we present a two-level fault tolerant solution that can be employed for mitigating the effects of both single-event upset soft errors and hard errors on the NI. Experiments show that with a limited overhead we can obtain a significant reliability of the NI, while saving up to 83% in area with respect to a standard Triple Modular Redundancy implementation, as well as a significant energy reduction.

Fiorin, Leandro; Micconi, Laura

2011-01-01

266

FAULT TOLERANCE USING CREDENTIALS MANAGEMENT IN ONLINE TRANSACTION APPLICATION  

Directory of Open Access Journals (Sweden)

Web applications play a vital role in the IT field in satisfying web customers, who depend on online transaction processing systems. A web application has various forms that together give a complete service to the customer. These forms offer options that are used to satisfy the customer's needs because of the appeal of web sites in the global market. Because of the single sign-on property, traditional web pages are closed from the current session whenever the customer selects an improper option. Selecting an option that is not suitable for the current session thus leads to a reliability problem. If the same user needs the same service again, he has to navigate from the home page to the required page, which places an extra burden on the customer. The customer session should be maintained properly so that customer satisfaction with the online web application is retained. The existing system classifies users by their access level and their fault level. The main objective of the proposed work is to manage credentials at all levels in order to keep a valuable customer in the current session for a long period of access. Credential management and session management are used to manage multilevel credentials from the web client to the web resource level and vice versa. The options selected by the customer can be classified based on the fault and the type of access. The credential management also performs the maintenance process for fixing the fault tolerance level of the web user. A complete log is recorded to trace the overall process in online transaction processing.

L. Javid Ali

2014-07-01

267

Design Approach for Fault Tolerance Algorithm in FPGA Architecture with BIST in Hardware Controller  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Redundancy based hardening techniques are applied at the pre-synthesis or synthesis level. To provide solutions for increasing the fault-tolerance capabilities with algorithms able to reduce sensitive configuration memory bits of FPGAs, we use the BIST method. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In a...

Meshram, Shweta S.; Dahad, Sanjay O.; Belorkar, Ujwala A.

2011-01-01

268

A novel supervisory-based fault tolerant control: application to hydraulic process  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper, we demonstrate a performance-based supervisory approach to achieve fault tolerance that does not require any explicit fault-diagnosis module. Moreover, in our real-time approach the information about the plant is unavailable. The time-valued trajectories generated by the system determine the behavior of the plant-working mode. These trajectories are supposed to follow a certain desired behavior. Therefore, the trajectories when does not belong to that desired behavior assumes t...

Jain, Tushar; Yamé, Joseph Julien; Sauter, Dominique

2011-01-01

269

Fault tolerance and reliability in integrated ship control : the ATOMOS concept  

DEFF Research Database (Denmark)

Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding reliability issues. A generic framework for simulation of network traffic under fault conditions is suggested and the first practical experiences from a prototype implementation are reported.

Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh

2002-01-01

270

Fault tolerant computer for nuclear power plant applications  

International Nuclear Information System (INIS)

A quadruply redundant synchronous fault tolerant processor (FTP) is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. The hardware architecture of this processor is described and certain issues unique to quadruply redundant computers are discussed.

271

Fault-tolerant quantum computing with color codes  

CERN Document Server

We present and analyze protocols for fault-tolerant quantum computing using color codes. We present circuit-level schemes for extracting the error syndrome of these codes fault-tolerantly. We further present an integer-program-based decoding algorithm for identifying the most likely error given the syndrome. We simulated our syndrome extraction and decoding algorithms against three physically-motivated noise models using Monte Carlo methods, and used the simulations to estimate the corresponding accuracy thresholds for fault-tolerant quantum error correction. We also used a self-avoiding walk analysis to lower-bound the accuracy threshold for two of these noise models. We present and analyze two architectures for fault-tolerantly computing with these codes: one in which 2D arrays of qubits are stacked atop each other and one in a single 2D substrate. Our analysis demonstrates that color codes perform slightly better than Kitaev's surface codes when circuit details are ignored. When these details are considered, w...

Landahl, Andrew J; Rice, Patrick R

2011-01-01

272

A Redundant Communication Approach to Scalable Fault Tolerance in PGAS Programming Models  

Energy Technology Data Exchange (ETDEWEB)

Recent trends in high-performance computing point towards increasingly large machines with millions of processing, storage, and networking elements. Unfortunately, the reliability of these machines is inversely proportional to their size, resulting in a system-wide mean-time-between-failures (MTBF) ranging from a few days to a few hours. As such, for long-running applications, the ability to efficiently recover from frequent failures is essential. Traditional forms of fault tolerance, such as checkpoint/restart, suffer from performance issues related to limited I/O and memory bandwidth. In this paper, we present a fault-tolerance mechanism that reduces the cost of failure recovery by maintaining shadow data structures and performing redundant remote memory accesses. We present results from a computational chemistry application running at scale to show that our techniques provide applications with a high degree of fault tolerance and low (2%-4%) overhead for 2048 processors.

Ali, Nawab; Krishnamoorthy, Sriram; Govind, Niranjan; Palmer, B. J.

2011-02-09

273

Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder  

CERN Document Server

Reversible logic is emerging as an important research area with applications in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used as the fundamental building block for designing other arithmetic logic circuits. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than existing counterparts.
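The parity-preservation property mentioned in this abstract can be checked by brute force over a gate's truth table. The sketch below does not implement the paper's IG gate, whose definition is not given in the abstract; it uses the well-known 3*3 Fredkin (controlled-swap) gate as a stand-in, and contrasts it with the Toffoli gate, which is reversible but not parity preserving. All function names are illustrative.

```python
from itertools import product


def fredkin(c, a, b):
    """Controlled swap: if c is 1, swap a and b; otherwise pass them through."""
    return (c, b, a) if c else (c, a, b)


def toffoli(a, b, c):
    """Controlled-controlled NOT: flip c when both a and b are 1."""
    return (a, b, c ^ (a & b))


def parity(bits):
    """XOR of all bits in the tuple."""
    return sum(bits) % 2


def is_parity_preserving(gate, width):
    """True if input parity equals output parity for every input pattern."""
    return all(parity(x) == parity(gate(*x))
               for x in product((0, 1), repeat=width))


def is_reversible(gate, width):
    """True if the gate maps the 2**width inputs onto 2**width distinct outputs."""
    outputs = {gate(*x) for x in product((0, 1), repeat=width)}
    return len(outputs) == 2 ** width


if __name__ == "__main__":
    for name, gate in (("Fredkin", fredkin), ("Toffoli", toffoli)):
        print(name, is_reversible(gate, 3), is_parity_preserving(gate, 3))
```

In a circuit built only from parity-preserving gates, a fault that flips a single signal changes the output parity relative to the input parity, which is why such a fault is detectable at the primary outputs.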

Islam, Md Saiful; 10.3329/jbas.v32i2.2431

2010-01-01

274

Passive fault tolerant control of a double inverted pendulum - a case study  

DEFF Research Database (Denmark)

A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the YJBK parameterization, which requires the nominal controller to be implemented in observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

Niemann, Hans Henrik; Stoustrup, Jakob

2005-01-01

275

Passive Fault tolerant Control of an Inverted Double Pendulum : A Case Study Example  

DEFF Research Database (Denmark)

A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the Youla parameterization, which requires the nominal controller to be implemented in the observer based form. The proposed method is applied to a double inverted pendulum system, for which an H∞ controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

Niemann, H.; Stoustrup, Jakob

2003-01-01

276

An Active Fault-Tolerant Control Method of Unmanned Underwater Vehicles with Continuous and Uncertain Faults  

Directory of Open Access Journals (Sweden)

This paper introduces a novel thruster fault diagnosis and accommodation system for open-frame underwater vehicles with abrupt faults. The proposed system consists of two subsystems: a fault diagnosis subsystem and a fault accommodation subsystem. In the fault diagnosis subsystem, an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network is used to realize on-line fault identification and weighting matrix computation. The fault accommodation subsystem uses a control algorithm based on the weighted pseudo-inverse to solve the control allocation problem. To illustrate the effectiveness of the proposed method, a simulation example under multiple uncertain abrupt faults is given in the paper.
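The weighted pseudo-inverse allocation named in this abstract can be illustrated for the simplest case: one force axis and a diagonal weighting matrix. The sketch below minimizes the weighted sum of squared thruster commands subject to producing the demanded force, which for a single-row configuration matrix reduces W^-1 B^T (B W^-1 B^T)^-1 tau to a scalar formula. The thruster layout, the weights, and the function name are assumptions for illustration; in the paper the weights come from the neural-network diagnosis stage, which is not reproduced here.

```python
def allocate(b, w, tau):
    """Weighted pseudo-inverse allocation for a single force axis.

    b   : per-thruster contribution to the axis (one row of the configuration matrix)
    w   : positive per-thruster weights (diagonal of W); a larger weight
          penalizes use of that thruster, e.g. after a diagnosed fault
    tau : demanded generalized force on the axis

    Returns the commands u that minimize sum(w_i * u_i**2)
    subject to sum(b_i * u_i) == tau.
    """
    denom = sum(bi * bi / wi for bi, wi in zip(b, w))
    return [(bi / wi) * tau / denom for bi, wi in zip(b, w)]


if __name__ == "__main__":
    b = [1.0, 1.0, 1.0, 1.0]
    healthy = allocate(b, [1.0, 1.0, 1.0, 1.0], 8.0)
    # Hypothetical fault on thruster 4: its weight is raised so it is used less.
    degraded = allocate(b, [1.0, 1.0, 1.0, 4.0], 8.0)
    print(healthy)
    print(degraded)
```

With equal weights the demand is shared equally; raising the weight of one thruster shifts load to the others while the total force still meets the demand.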

Yongsheng Yang

2008-11-01

277

Fault tolerant, reliable and scalable scientific ballooning control software  

Science.gov (United States)

The Universal Balloon Control Software package (UBCS) was first designed and developed for the ATIC experiment in 1997 and has evolved over the years into a highly reliable and adaptable control system. The system has logged thousands of hours of operation time on ATIC with few reboots and has been adapted for the HASP balloon payload which has had two successful flights in 2006 and 2007. The goal was to develop a UBCS that was fault tolerant and auto-recoverable while at the same time extremely reliable and scalable. In order to meet these goals, we designed a modular software system where each process was able to run in parallel with other processes on the same or different CPUs. These modular processes needed to be relatively independent, so that one process did not rely on another in order to function. We chose QNX 4.25 as the operating system because of its multi-tasking abilities and the level of abstraction offered in communication between processes. Another key component in the UBCS, called the Buffer Process Group (BPG), was developed to de-couple processes from one another allowing each to operate independently. The BPG is a client/server process data port with a standardized interface allowing any given server to load records for access by an independent client at any given time. The BPG is capable of handling many data servers and clients simultaneously. Examples of data servers are the data acquisition process and housekeeping processes and examples of data clients are the archive process, the down link telemetry processes and the ground display processes. Together, the BPG process and the QNX 4.25 OS allow the UBCS to meet all of its design goals. In particular they allow the system to be highly fault tolerant and recoverable. A monitoring process is able to restart failed processes and reboot the computers on which they reside, if necessary.
This allows the UBCS to recover from software errors or bugs as well as hardware glitches such as temporary power problems or single event upsets. During the presentation we will discuss in more detail how this software design is applicable to many different platforms and our plans for evolving the software package for future balloon experiments.
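The monitoring behaviour described in this abstract (restart a failed process, escalate to a reboot if needed) can be sketched as a heartbeat supervisor. This is an in-process simulation under stated assumptions, not UBCS code and not a QNX API: the heartbeat mechanism, the timeout, the restart limit, and every name below are hypothetical, and "restart" and "reboot" are only recorded in a log.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    last_heartbeat: float = 0.0
    restarts: int = 0


@dataclass
class Monitor:
    timeout: float          # seconds of silence before a worker counts as failed
    max_restarts: int       # restarts allowed before escalating to a reboot
    workers: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def register(self, name, now):
        self.workers[name] = Worker(name, last_heartbeat=now)

    def heartbeat(self, name, now):
        self.workers[name].last_heartbeat = now

    def check(self, now):
        """Restart silent workers; escalate to a reboot after repeated failures."""
        for w in self.workers.values():
            if now - w.last_heartbeat <= self.timeout:
                continue
            if w.restarts >= self.max_restarts:
                self.log.append(("reboot", w.name))
                w.restarts = 0
            else:
                w.restarts += 1
                self.log.append(("restart", w.name))
            # Treat the recovery action as a fresh start of the worker.
            w.last_heartbeat = now


if __name__ == "__main__":
    m = Monitor(timeout=5.0, max_restarts=1)
    m.register("daq", 0.0)
    m.register("archive", 0.0)
    m.heartbeat("daq", 4.0)
    m.check(6.0)
    m.check(12.0)
    print(m.log)
```

Keeping the monitor's only coupling to a worker as a timestamp mirrors the abstract's emphasis on processes that do not depend on one another to function.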

Stewart, Michael F.; Ellison, Steven B.; Isbert, Joachim; Granger, Doug; Guzik, T. Gregory; Wefel, John P.

278

Fault Tolerant Control of Wind Turbines : A benchmark model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

279

Fault Tolerant Wind Farm Control : a Benchmark Model  

DEFF Research Database (Denmark)

This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

Odgaard, Peter Fogh; Stoustrup, Jakob

2013-01-01

280

Fault Tolerant Control Using Proportional-Integral-Derivative Controller Tuned by Genetic Algorithm  

Directory of Open Access Journals (Sweden)

Problem statement: The growing demand for reliability, maintainability and survivability in industrial processes has drawn significant research in the fault detection and fault tolerant control domain. A fault is usually defined as an unexpected change in a system, such as a component malfunction or a variation in operating conditions, which tends to degrade the overall system performance. The purpose of fault detection is to detect these malfunctions so that proper action can be taken to prevent faults from developing into a total system failure. Approach: In this study an effective integrated fault detection and fault tolerant control scheme was developed for a class of LTI systems. The scheme was based on a Kalman filter for simultaneous state and fault parameter estimation, and on statistical decisions for fault detection and activation of controller reconfiguration. Proportional-Integral-Derivative (PID) control schemes continue to provide the simplest yet effective solutions to most control engineering applications today. Determination or tuning of the PID parameters continues to be important as these parameters have a great influence on the stability and performance of the control system. In this study a genetic algorithm (GA) was proposed to tune the PID controller. Results: The results reflect that the proposed scheme improves the performance of the process in terms of time domain specifications, robustness to parametric changes and optimum stability. Also, a comparison with the conventional Ziegler-Nichols method proves the superiority of the GA based system. Conclusion: This study demonstrates the effectiveness of the genetic algorithm in tuning a PID controller with optimum parameters. It is, moreover, proved to be robust to variations in plant dynamic characteristics and disturbances, assuring a parameter-insensitive operation of the process.
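A rough sketch of GA-based PID tuning follows. It is not the authors' algorithm or plant: the first-order plant, the integral-of-absolute-error cost, the gain bounds, and the GA operators (elitist truncation selection, uniform crossover, Gaussian mutation) are all assumptions chosen to keep the example short, and the Kalman-filter fault detection stage is omitted.

```python
import random

# Assumed search bounds for (Kp, Ki, Kd).
BOUNDS = [(0.0, 10.0), (0.0, 10.0), (0.0, 0.5)]


def step_cost(gains, dt=0.01, steps=500, tau=1.0):
    """Integral of absolute error for a unit step on the plant tau*dy/dt = u - y."""
    kp, ki, kd = gains
    y = 0.0
    integral = 0.0
    prev_err = 1.0  # equals the first error, which avoids a derivative kick
    cost = 0.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        u = kp * err + ki * integral + kd * (err - prev_err) / dt
        prev_err = err
        y += dt * (u - y) / tau  # forward-Euler plant update
        cost += abs(err) * dt
    return cost


def tune(generations=20, pop_size=20, seed=0):
    """Return the best (Kp, Ki, Kd) found by a small elitist GA."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=step_cost)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = []
            for (lo, hi), ga, gb in zip(BOUNDS, a, b):
                # Uniform crossover followed by Gaussian mutation, clipped to bounds.
                g = rng.choice((ga, gb)) + rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(min(hi, max(lo, g)))
            children.append(child)
        pop = elite + children
    return min(pop, key=step_cost)


if __name__ == "__main__":
    best = tune()
    print(best, step_cost(best))
```

Because the better half of each generation is carried over unchanged, the best cost never gets worse from one generation to the next.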

S. Kanthalakshmi

2011-01-01

 
 
 
 
281

Fault Tolerant Multiphase Electrical Drives: The Impact of Design  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper deals with fault tolerant multiphase electrical drives. The quality of the torque of a vector-controlled Permanent Magnet (PM) Synchronous Machine supplied by a multi-leg Voltage Source Inverter (VSI) is examined in normal operation and when one or two phases are open-circuited. It is then deduced that a seven-phase machine is a good compromise allowing high torque-to-volume density and easy control with smooth torque in fault operation. Experimental results confirm the predicted c...

Semail, Eric; Kestelyn, Xavier; Locment, Fabrice

2008-01-01

282

On Permutation Capabilities of Fault Tolerant Multistage Interconnection Networks  

Directory of Open Access Journals (Sweden)

This paper presents an analysis of the permutation capabilities of fault tolerant [1] Multistage Interconnection Networks. I have examined some popular networks that are irregular in nature [11]: FT (Four Tree) [8], MFT (Modified Four Tree) [2], PHI (Phi Network) [11], NFT (New Four Tree) [4], IFT (Improved Four Tree) [5], IASN (Irregular Augmented Shuffle) [14] and IIASN (Improved Irregular Augmented Shuffle) [3]. Permutation capabilities are measured on an incremental and identical basis by introducing various faults at different stages of the networks.

Sandeep Sharma

2012-11-01

283

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper encompasses a study on the development of a walking gait for fault tolerant locomotion in unstructured environments. The fault tolerant gait for adaptive locomotion fulfills stability conditions in the face of a fault event (locked joints or sensor failure) that prevents a robot from realizing stable locomotion over uneven terrains. To accomplish this feat, a fault tolerant gait based on force-position control is proposed in this paper for a hexapod robot to enable stable walking with...

Umar Asif

2012-01-01

284

Lightweight storage and overlay networks for fault tolerance.  

Energy Technology Data Exchange (ETDEWEB)

The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.

Oldfield, Ron A.

2010-01-01

285

Evaluation Of Fault-Tolerant Policies Using Simulation  

Energy Technology Data Exchange (ETDEWEB)

Various mechanisms for fault-tolerance (FT) are used today in order to reduce the impact of failures on application execution. In the case of system failure, standard FT mechanisms are checkpoint/restart (for reactive FT) and migration (for pro-active FT). However, each of these mechanisms creates an overhead on application execution, an overhead that becomes critical, for instance, on large-scale systems where previous studies have shown that applications may spend more time checkpointing state than performing useful work. In order to decrease this overhead, researchers try both to optimize existing FT mechanisms and to implement new FT policies, for instance by combining reactive and pro-active approaches in order to decrease the number of checkpoints that must be performed during the application's execution. However, no solutions currently exist that enable the evaluation of these FT approaches through simulation; instead, experiments must be run on real platforms. This increases complexity and limits experimentation with alternate solutions. This paper presents a simulation framework that evaluates different FT mechanisms and policies. The framework uses system failure logs for the simulation with a default behavior based on logs taken from the ASCI White at Lawrence Livermore National Laboratory. We evaluate the accuracy of our simulator by comparing simulated results with those taken from experiments done on a 32-node compute cluster. Such a simulator can therefore be used to develop new FT policies and/or to tune existing policies.
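The idea of evaluating a checkpoint/restart policy against a failure log can be sketched in a few lines. This is not the framework described in the abstract: the model (fixed checkpoint interval, fixed checkpoint and restart costs, failures given as wall-clock timestamps, failures during a restart ignored) and all names are simplifying assumptions.

```python
def simulate(work, interval, ckpt_cost, restart_cost, failures):
    """Return the wall-clock completion time of a job under checkpoint/restart.

    work         : total useful compute time the job needs
    interval     : useful work done between consecutive checkpoints
    ckpt_cost    : time to write one checkpoint
    restart_cost : time to restart from the last checkpoint after a failure
    failures     : wall-clock times at which the system fails (the "failure log")
    """
    failures = sorted(failures)
    now = 0.0      # current wall-clock time
    saved = 0.0    # useful work protected by the last checkpoint
    i = 0          # index of the next failure in the log
    while saved < work:
        segment = min(interval, work - saved)
        # The final segment finishes the job, so it needs no checkpoint.
        cost = segment + (ckpt_cost if saved + segment < work else 0.0)
        end = now + cost
        if i < len(failures) and failures[i] < end:
            # Failure inside the segment: its work is lost, then we restart.
            now = failures[i] + restart_cost
            i += 1
            # Simplification: failures that occur during the restart are skipped.
            while i < len(failures) and failures[i] < now:
                i += 1
        else:
            saved += segment
            now = end
    return now


if __name__ == "__main__":
    log = [40.0]
    print(simulate(100, 25, 5, 10, log))   # periodic checkpoints
    print(simulate(100, 100, 5, 10, log))  # no intermediate checkpoints
```

Replaying the same log under different intervals or costs gives a quick way to compare policies before trying them on a real platform.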

Tikotekar, Anand A [ORNL; Vallee, Geoffroy R [ORNL; Naughton, III, Thomas J [ORNL; Scott, Stephen L [ORNL

2007-01-01

286

Design of a fault-tolerant controller for the SP-100 space reactor  

International Nuclear Information System (INIS)

The control system of an SP-100 space reactor is a key element of space reactor design to meet the space mission requirements of safety, reliability, and life expectancy. In this work, a fault-tolerant controller (FTC) is developed to control the thermoelectric (TE) power in the SP-100 space reactor. A fault-tolerant controller makes the control system stable and retains acceptable performance even under system faults. The objectives of the proposed model predictive controller are to minimize both the difference between the predicted TE power and the desired power, and the variation of control drum angle that adjusts the control reactivity. Also, the objectives are subject to constraints of maximum and minimum control drum angle and maximum drum angle variation speed. The model predictive controller incorporates a fault detection and diagnostics algorithm so that the controller can work properly even under input and output measurement faults. A lumped parameter simulation model of the SP-100 nuclear space reactor is used to verify the proposed controller design. Simulation results show that the TE generator power level, regulated by the proposed controller, could track the target power level effectively even under measurement faults, satisfying all control constraints. (authors)

287

Compilation and Synthesis for Fault-Tolerant Digital Microfluidic Biochips  

DEFF Research Database (Denmark)

Microfluidic-based biochips are replacing the conventional biochemical analyzers, by integrating all the necessary functions for biochemical analysis using microfluidics. The digital microfluidic biochips (DMBs) manipulate discrete amounts of fluids of nanoliter volume, named droplets, on an array of electrodes to perform operations such as dispensing, transport, mixing, split, dilution and detection. Researchers have proposed compilation approaches, which, starting from a biochemical application and a biochip architecture, determine the allocation, resource binding, scheduling, placement and routing of the operations in the application. During the execution of a bioassay, operations could experience transient faults, thus impacting negatively the correctness of the application. We have proposed both offline (design time) and online (runtime) recovery strategies. The online recovery strategy decides the introduction of the redundancy required for fault-tolerance. We consider both time redundancy, i.e., re-executing erroneous operations, and space redundancy, i.e., creating redundant droplets for fault-tolerance. Error recovery is performed such that the number of transient faults tolerated is maximized and the timing constraints of the biochemical application are satisfied. Previous work has assumed that the biochip architecture is given, and most approaches consider a rectangular shape for the electrode array, where operations execute on rectangular “modules” formed of electrodes. However, non-regular application-specific architectures are common in practice. Hence, we have proposed an approach to the synthesis of application-specific architectures, such that the cost is minimized and the timing constraints of the application are satisfied. We propose an algorithm to build a library of non-regular modules for a given application-specific architecture, so that the area of a non-regular application-specific biochip can be used effectively.
During fabrication, DMBs can be affected by permanent faults, which may lead to the failure of the application. Our approach introduces redundant electrodes to synthesize fault-tolerant architectures aiming at increasing the yield of DMBs. We also propose a method to estimate, at design time, the application completion time in case of permanent faults in order to verify if an application can be successfully run on the architecture. The proposed approaches were evaluated using several real-life case studies and synthetic benchmarks.

Alistar, Mirela

2014-01-01

288

Logic Synthesis for Fault-Tolerant Quantum Computers  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Efficient constructions for quantum logic are essential since quantum computation is experimentally challenging. This thesis develops quantum logic synthesis as a paradigm for reducing the resource overhead in fault-tolerant quantum computing. The model for error correction considered here is the surface code. After developing the theory behind general logic synthesis, the resource costs of magic-state distillation for the $T = \exp(i \pi (I-Z)/8)$ gate are quantitatively an...

Jones, N. Cody

2013-01-01

289

FAULT TOLERANT SCHEDULING STRATEGY FOR COMPUTATIONAL GRID ENVIRONMENT  

Directory of Open Access Journals (Sweden)

Computational grids have the potential for solving large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient scheduling of jobs and providing fault tolerance in a reliable manner. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the Minimum Total Time to Release (MTTR) job scheduling algorithm. TTR includes the service time of the job, the waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the TTR by selecting a computational resource based on job requirements, job characteristics and hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state using a checkpoint file from another grid resource. A critical aspect of automatic recovery is the availability of checkpoint files. A strategy to increase the availability of checkpoints is replication. A Replica Resource Selection Algorithm (RRSA) is proposed to provide a Checkpoint Replication Service (CRS). The Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and NWS (Network Weather Service) are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid more trustworthy.

MALARVIZHI NANDAGOPAL

2010-09-01
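
The resource selection rule summarized in the abstract above (pick the resource with the smallest Total Time to Release) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `Resource` class, its field names and the example numbers are hypothetical, and TTR is assumed to be the plain sum of the components the abstract lists (service time, queue waiting time, input and output transfer time).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Resource:
    """Hypothetical per-resource estimates for one job, all in seconds."""
    name: str
    service_time: float      # estimated execution time of the job
    queue_wait: float        # estimated waiting time in the resource queue
    input_transfer: float    # time to stage input data to the resource
    output_transfer: float   # time to return output data from the resource


def total_time_to_release(r: Resource) -> float:
    """TTR, assumed here to be the sum of the listed components."""
    return r.service_time + r.queue_wait + r.input_transfer + r.output_transfer


def select_resource(resources: Iterable[Resource]) -> Resource:
    """Return the resource with the minimum TTR for the job."""
    candidates = list(resources)
    if not candidates:
        raise ValueError("no candidate resources")
    return min(candidates, key=total_time_to_release)


if __name__ == "__main__":
    pool = [
        Resource("a", 100, 50, 10, 5),
        Resource("b", 120, 5, 10, 5),
        Resource("c", 90, 80, 20, 10),
    ]
    best = select_resource(pool)
    print(best.name, total_time_to_release(best))
```

Note that the fastest machine is not necessarily chosen: a slower resource with a short queue can have the lower TTR.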

290

Analysis and design of Fault-Tolerant drives  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The field of fault-tolerant applications is among the most exciting and potentially innovative areas of modern electrical motor research, offering design freedom and room to explore new solutions. The cost of permanent magnets and drives makes it possible to develop new solutions, in particular surface-mounted permanent magnet machines with fractional-slot windings and reluctance motors assisted by permanent magnets. The reliability of these machines allows these motors to be applied in crit...

Dai Pre, Michele

2008-01-01

291

Resource optimization for fault-tolerant quantum computing  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this thesis we examine a variety of techniques for reducing the resources required for fault-tolerant quantum computation. First, we show how to simplify universal encoded computation by using only transversal gates and standard error correction procedures, circumventing existing no-go theorems. We then show how to simplify ancilla preparation, reducing the cost of error correction by more than a factor of four. Using this optimized ancilla preparation, we develop improve...

Paetznick, Adam

2014-01-01

292

New Limits on Fault-Tolerant Quantum Computation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We show that quantum circuits cannot be made fault-tolerant against a depolarizing noise level of approximately 45%, thereby improving on a previous bound of 50% (due to Razborov). Our precise quantum circuit model enables perfect gates from the Clifford group (CNOT, Hadamard, S, X, Y, Z) and arbitrary additional one-qubit gates that are subject to that much depolarizing noise. We prove that this set of gates cannot be universal for arbitrary (even classical) computation, fr...

Buhrman, Harry; Cleve, Richard; Laurent, Monique; Linden, Noah; Schrijver, Alexander; Unger, Falk

2006-01-01

293

Automatic Abstraction and Fault Tolerance in Cortical Microarchitectures  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Recent advances in the neuroscientific understanding of the brain are bringing about a tantalizing opportunity for building synthetic machines that perform computation in ways that differ radically from traditional Von Neumann machines. These brain-like architectures, which are premised on our understanding of how the human neocortex computes, are highly fault-tolerant, averaging results over large numbers of potentially faulty components, yet manage to solve very difficult problems more reli...

Hashmi, Atif; Berry, Hugues; Temam, Olivier; Lipasti, Mikko

2011-01-01

294

Bayesian reliability assessment of legacy safety-critical systems upgraded with fault-tolerant off-the-shelf software  

International Nuclear Information System (INIS)

This paper presents a new way of applying Bayesian assessment to systems that consist of many components. Full Bayesian inference with such systems is problematic, because it is computationally hard and, far more seriously, one needs to specify a multivariate prior distribution with many counterintuitive dependencies between the probabilities of component failures. The approach taken here is one of decomposition. The system is decomposed into partial views of the system, or parts thereof, with different degrees of detail, and then a mechanism for propagating the knowledge obtained with the more refined views back to the coarser views is applied (recalibration of coarse models). The paper describes the recalibration technique and then evaluates the accuracy of recalibrated models numerically on contrived examples using two techniques, u-plot and prequential likelihood, developed by others for software reliability growth models. The results indicate that the recalibrated predictions are often more accurate than the predictions obtained with the less detailed models, although this is not guaranteed. The techniques used to assess the accuracy of the predictions are accurate enough for one to be able to choose the model giving the most accurate prediction.

295

A model–based approach to fault–tolerant control  

DEFF Research Database (Denmark)

A model-based controller architecture for Fault-Tolerant Control (FTC) is presented in this paper. The controller architecture is based on a general controller parameterization. The FTC architecture consists of two main parts, a Fault Detection and Isolation (FDI) part and a controller reconfiguration part. The theoretical basis for the architecture is given followed by an investigation of the single parts in the architecture. It is shown that the general controller parameterization is central in connection with both fault diagnosis as well as controller reconfiguration. Especially in relation to the controller reconfiguration part, the application of controller parameterization results in a systematic technique for switching between different controllers. This also allows controller switching using different sets of actuators and sensors.

Niemann, Hans Henrik

2012-01-01

296

A Reliable and Fault Tolerant Routing for Optical WDM Networks  

CERN Document Server

In optical WDM networks, since each lightpath can carry a huge amount of traffic, failures may seriously damage end-user applications. Hence fault tolerance becomes an important issue in these networks. The lightpath which carries traffic during normal operation is called the primary path. The traffic is rerouted on a backup path in case of a failure. In this paper we propose a reliable and fault tolerant routing algorithm for establishing primary and backup paths. To establish the primary path, this algorithm uses load balancing in which link cost metrics are estimated based on the current load of the links. In backup path setup, the source sends a small fraction of probe packets along the existing paths and calculates the blocking probability from the feedback received from the destination. It then selects the optimal lightpath with the lowest blocking probability. Based on the simulation results, we show that the reliable and fault tolerant routing algorithm reduces the blocking ...

Ramesh, G

2009-01-01
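
The backup-path step described in the abstract above (probe the existing paths, estimate a blocking probability from the destination's feedback, pick the path with the lowest value) could look roughly like the sketch below. It is an assumption-laden illustration, not the paper's algorithm: the abstract does not say how blocking probability is computed, so here it is assumed to be the fraction of probe packets that were not acknowledged, and the path identifiers and function names are hypothetical.

```python
from typing import Dict, Tuple


def blocking_probability(probes_sent: int, probes_delivered: int) -> float:
    """Assumed estimator: fraction of probe packets that did not get through."""
    if probes_sent <= 0:
        raise ValueError("probes_sent must be positive")
    if not 0 <= probes_delivered <= probes_sent:
        raise ValueError("probes_delivered must be between 0 and probes_sent")
    return 1.0 - probes_delivered / probes_sent


def select_backup_path(feedback: Dict[str, Tuple[int, int]], primary: str) -> str:
    """Pick the non-primary path with the lowest estimated blocking probability.

    feedback maps a path id to (probes_sent, probes_delivered).
    """
    estimates = {
        path: blocking_probability(sent, delivered)
        for path, (sent, delivered) in feedback.items()
        if path != primary
    }
    if not estimates:
        raise ValueError("no candidate backup paths")
    return min(estimates, key=estimates.get)


if __name__ == "__main__":
    feedback = {"p1": (100, 99), "p2": (100, 90), "p3": (100, 97)}
    print(select_backup_path(feedback, primary="p1"))
```

Excluding the primary path from the candidates is a design choice made here so that the backup never coincides with the path it is meant to protect.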

297

Unconstrained and Constrained Fault-Tolerant Resource Allocation  

CERN Document Server

First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objective is to allocate facilities from sites to satisfy $\mathcal{R}$ at a minimum total cost. Next, we introduce the Constrained Fault-Tolerant Resource Allocation (CFTRA) problem. It differs from UFTRA in that the number of resources available at each site $i$ is limited by $R_{i}$. Both problems are practical extensions of the classical Fault-Tolerant Facility Location (FTFL) problem \cite{Jain00FTFL}. For instance, their solutions provide optimal resource allocation (w.r.t. enterprises) and leasing (w.r.t. clients) strategies for the cont...

Liao, Kewen

2011-01-01

298

Lecture Notes : Practical Approach to Reliability, Safety, and Active Fault-tolerance  

DEFF Research Database (Denmark)

"The fundamental objective of the combined safety and reliability assessment is to identify critical items in the design and the choice of equipment that may jeopardize safety or availability, and thereby to provide arguments for the selection between different options for the system." Achieving safety and reliability has for decades been one of the prime objectives of system designers when designing safety-critical systems. With growing environmental awareness, concerns, and demands, the scope of the design of reliable (and safe) systems has been extended even to small components such as sensors and actuators. In the past, the normal procedure to address the higher demand for reliability was to add hardware redundancy, which in turn increases the production and maintenance costs. Active fault-tolerant design is an attempt to achieve higher redundancy while minimizing the costs. In chapter 2, reliability and safety related issues are considered and described. The idea of introducing this chapter is to provide an overview of the concepts and methods used for reliability and safety assessment. The focus in chapter 3 is on the fault-tolerance concept. The types of possible faults in components and the customary methods for applying redundancy are described. Finally, the chapter is wrapped up by considering and describing the main subject, which is a formal and consistent procedure to design active fault-tolerant systems.

Izadi-Zamanabadi, Roozbeh

2000-01-01

299

Blue Waters Petascale Workshop Series - Fault Tolerance for Extreme-Scale Computing Workshop  

Science.gov (United States)

The purpose of this workshop is to discuss fault-tolerance (FT) on large systems for running large, possibly long-running applications. The main point of the workshop is to have systems people, middleware people (including FT experts), and apps people talk about the issues and figure out what needs to be done, mostly at the middleware and app levels, to run such apps on the coming petascale systems, without having faults that cause large numbers of application failures. The workshop was held during March 19-20, 2009.

300

Empirical Study of FFANN Tolerance to Weight Stuck at Max/Min Fault  

Directory of Open Access Journals (Sweden)

The fault tolerance property of artificial neural networks has been investigated with reference to the hardware model of artificial neural networks. A weight fault is an important fault, since it breaks the link between two nodes. In this paper three types of weight faults are explained. Experiments have been performed to demonstrate the fault tolerance behavior of feedforward artificial neural networks for the weight-stuck-MAX/MIN fault. The effect of the weight-stuck-MAX/MIN fault on a trained network is analyzed in this paper. The obtained results suggest that networks are not fault tolerant to this type of fault.

Amit Prakash Singh

2010-04-01
 
301

Task-based Dynamic Fault Tolerance for Humanoid Robot Applications and Its Hardware Implementation  

Directory of Open Access Journals (Sweden)

This paper presents a new fault tolerance scheme suitable for humanoid robot applications. In the future, various tasks ranging from daily chores to safety-related tasks will be carried out by individual humanoid robots. If the importance of the tasks is different, the required dependability will vary accordingly. Therefore, for mobile humanoid robots operating under power constraints, fault tolerance that dynamically changes based on the importance of the tasks is desirable because fault-tolerant designs involving hardware redundancy are power intensive. In the proposed fault tolerance scheme, a duplex computer system switches between hot standby and cold standby according to each individual task. However, in mobile humanoid robots, a safety issue arises when cold standby is used for the standby computer unit. Since an unpowered unit cannot immediately start to operate, a biped-walking robot falls down when failover occurs during cold standby. This paper proposes a safety failover method to resolve this issue and describes the hardware design of the safety failover subsystem.

Masayuki Murakami

2008-08-01

302

Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization  

Directory of Open Access Journals (Sweden)

In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. Fault detection filters use the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns is used to derive the probability that a fault is present. The fault accommodation scheme is carried out using the Generalized Internal Model Control (GIMC) architecture, also known as modified Youla parameterization. In addition, performance indices are evaluated to indicate that the resulting fault tolerant scheme can detect, identify and accommodate actuator and sensor faults under additive faults. A DC motor example is used to demonstrate the proposed scheme.

Minupriya A

2012-06-01

303

Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments  

Directory of Open Access Journals (Sweden)

A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit, an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to identify thruster faults. The fault accommodation unit is based on direct calculations of moment, and the result of fault identification is used to find the solution of the control allocation problem. The approach addresses continuous fault identification for the UV. Experimental results are provided to illustrate the performance of the proposed method in uncertain continuous-fault situations.

Qian Liu

2010-02-01

304

Fault diagnosis of nuclear logging system  

International Nuclear Information System (INIS)

To diagnose and promptly remove faults in a nuclear logging system, fault diagnosis methods based on fault trees, fuzzy logic and expert systems are presented. The real-world examples given show that the fault diagnosis method can satisfy the needs of fault diagnosis and removal in the field. The direction of future development of fault diagnosis in logging systems is also given. (authors)

305

Unitary reflection groups for quantum fault tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper explores the representation of quantum computing in terms of unitary reflections (unitary transformations that leave invariant a hyperplane of a vector space). The symmetries of qubit systems are found to be supported by Euclidean real reflections (i.e., Coxeter groups) or by specific imprimitive reflection groups, introduced (but not named) in a recent paper [Planat M and Jorrand Ph 2008, J Phys A: Math Theor 41, 182001]. The automorphisms of multiple qubit systems are...

Planat, Michel; Kibler, Maurice

2010-01-01

306

Sensor and sensorless fault tolerant control for induction motors using a wavelet index.  

Science.gov (United States)

Fault Tolerant Control (FTC) systems are crucial in industry to ensure safe and reliable operation, especially of motor drives. This paper proposes the use of multiple controllers for a FTC system of an induction motor drive, selected based on a switching mechanism. The system switches between sensor vector control, sensorless vector control, closed-loop voltage by frequency (V/f) control and open loop V/f control. Vector control offers high performance, while V/f is a simple, low cost strategy with high speed and satisfactory performance. The faults dealt with are speed sensor failures, stator winding open circuits, shorts and minimum voltage faults. In the event of compound faults, a protection unit halts motor operation. The faults are detected using a wavelet index. For the sensorless vector control, a novel Boosted Model Reference Adaptive System (BMRAS) to estimate the motor speed is presented, which reduces tuning time. Both simulation results and experimental results with an induction motor drive show the scheme to be a fast and effective one for fault detection, while the control methods transition smoothly and ensure the effectiveness of the FTC system. The system is also shown to be flexible, reverting rapidly back to the dominant controller if the motor returns to a healthy state. PMID:22666016

Gaeid, Khalaf Salloum; Ping, Hew Wooi; Khalid, Mustafa; Masaoud, Ammar

2012-01-01

307

Fault Tolerant Circuit Design Using Evolutionary Algorithms  

Digital Repository Infrastructure Vision for European Research (DRIVER)

With the rapid development of semiconductor technology and the increasing proliferation of emission sources, digital circuits are frequently used in harsh electromagnetic environments. Electrostatic Discharge (ESD) interference is gradually gaining prominence, resulting in performance degradation, malfunctions and disturbances in component- or system-level applications. Conventional solutions to such problems are shielding, filtering and grounding. This paper presents an ...

Hui-Cong Wu

2014-01-01

308

An Adaptive Job Scheduling with efficient Fault Tolerance Strategy in Computational Grid  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Grid computing is an emerging technology which has the potential to solve large-scale scientific problems in an integrated heterogeneous environment. However, in the grid computing environment there are certain aspects which reduce the efficiency of the system. Scheduling jobs to the best-suited resources and achieving load balancing and fault tolerance are the key aspects for improving efficiency and exploiting the capabilities of emergent computational systems. Because of dynamic and ...

Gokuldev, S.; Radhakrishnan, R.

2014-01-01

309

Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on their data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-complete problem, and it becomes more challenging when task failures are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In thi...

Jayadivya S K; Jaya Nirmala S; Mary Saira Bhanu S

2012-01-01

310

Byzantine Fault Tolerance of Regenerating Codes  

CERN Document Server

Recent years have witnessed a slew of coding techniques custom designed for networked storage systems. Network coding inspired regenerating codes are the most prolifically studied among these new age storage centric codes. A lot of effort has been invested in understanding the fundamental achievable trade-offs of storage and bandwidth usage to maintain redundancy in presence of different models of failures, showcasing the efficacy of regenerating codes with respect to traditional erasure coding techniques. For practical usability in open and adversarial environments, as is typical in peer-to-peer systems, we need however not only resilience against erasures, but also from (adversarial) errors. In this paper, we study the resilience of generalized regenerating codes (supporting multi-repairs, using collaboration among newcomers) in the presence of two classes of Byzantine nodes, relatively benign selfish (non-cooperating) nodes, as well as under more active, malicious polluting nodes. We give upper bounds on t...

Oggier, Frédérique

2011-01-01

311

Selecting Fault Tolerant Styles for Third-Party Components with Model Checking Support  

Digital Repository Infrastructure Vision for European Research (DRIVER)

To build highly available or reliable applications out of unreliable third-party components, some software-implemented fault-tolerant mechanisms are introduced to gracefully deal with failures in the components. In this paper, we address an important issue in the approach: how to select the most suitable fault-tolerant mechanisms for a given application in a specific context. To alleviate the difficulty in the selection, these mechanisms are abstracted as Fault-tolerant styles (FTSs) at first...

Li, Junguo; Chen, Xiangping; Huang, Gang; Hong, Mei; Chauvel, Franck

2009-01-01

312

Fault Tolerant Electrical Machines. State of the Art and Future Directions  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Electrical engineering has recently seen a successful expansion in the area of fault tolerant electrical machines. To achieve fault tolerance, researchers have tried various geometries and different electrical drives. When new designs are to be developed, knowledge of the current state of the work is urgently needed. The paper summarizes the most important information on these topics. Both fault tolerant machine and drive structures were taken into account....

Ruba, Mircea; Szabó, Loránd

2008-01-01

313

Multiplexed redundant execution: A technique for efficient fault tolerance in chip multiprocessors  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Continued CMOS scaling is expected to make future microprocessors susceptible to transient faults, hard faults, manufacturing defects and process variations, causing fault tolerance to become important even for general-purpose processors targeted at the commodity market. To mitigate the effect of decreased reliability, a number of fault-tolerant architectures have been proposed that exploit the natural coarse-grained redundancy available in chip multiprocessors (CMPs). These architectures exec...

Subramanyan, Pramod; Singh, Virendra; Saluja, Kewal K.; Larsson, Erik

2010-01-01

314

Fault Tolerant Electrical Machines. State of the Art and Future Directions  

Directory of Open Access Journals (Sweden)

Electrical engineering has recently seen a successful expansion in the area of fault tolerant electrical machines. To achieve fault tolerance, researchers have tried various geometries and different electrical drives. When new designs are to be developed, knowledge of the current state of the work is urgently needed. The paper summarizes the most important information on these topics. Both fault tolerant machine and drive structures were taken into account. The paper also presents a new idea for a fault tolerant switched reluctance machine having a special winding. The future tasks to be performed are also mentioned in the paper.

Mircea RUBA

2008-05-01

315

Gain-Scheduled Fault Tolerance Control Under False Identification  

Science.gov (United States)

An active fault tolerant control (FTC) law is generally sensitive to false identification since the control gain is reconfigured for fault occurrence. In the conventional FTC law design procedure, dynamic variations due to false identification are not considered. In this paper, an FTC synthesis method is developed in order to consider possible variations of closed-loop dynamics under false identification into the control design procedure. An active FTC synthesis problem is formulated into an LMI optimization problem to minimize the upper bound of the induced-L2 norm which can represent the worst-case performance degradation due to false identification. The developed synthesis method is applied for control of the longitudinal motions of FASER (Free-flying Airplane for Subscale Experimental Research). The designed FTC law of the airplane is simulated for pitch angle command tracking under a false identification case.

Shin, Jong-Yeob; Belcastro, Christine (Technical Monitor)

2006-01-01

316

Robust fault-tolerant control for a biped robot using a recurrent cerebellar model articulation controller.  

Science.gov (United States)

A design technique of a recurrent cerebellar model articulation controller (RCMAC)-based fault-tolerant control (FTC) system is investigated to rectify the nonlinear faults of a biped robot. The proposed RCMAC-based FTC (RCFTC) scheme contains two components: 1) an online fault estimation module based on an RCMAC is used to provide approximation information for any nonnominal behavior due to the system failure and modeling error of the biped robot; and 2) a controller module consisting of a computed torque controller and a robust FTC is utilized to achieve FTC. In the controller module, the computed torque controller reveals a basic stabilizing controller to stabilize the system, and the robust FTC is utilized to compensate for the effects of the system failure so as to achieve fault accommodation. The adaptive laws of the RCFTC system are rigorously established based on the Lyapunov function, so that the stability of the system can be guaranteed. Finally, two simulation cases of a biped robot are presented to illustrate the effectiveness of the proposed design method. Simulation results show that the RCFTC system can effectively recover the control performance for the system in the presence of the nonlinear faults and modeling uncertainties. PMID:17278565

Lin, Chih-Min; Chen, Chiu-Hsiung

2007-02-01

317

Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework  

Directory of Open Access Journals (Sweden)

MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data, which has been widely applied in data-intensive systems. In a cloud environment, node and task failure is no longer accidental but a common feature of large-scale systems. In the MapReduce framework, although the rescheduling-based fault-tolerant method is simple to implement, it fails to fully consider the location of distributed data and the computation and storage overhead. Thus, a single node failure will increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT) is proposed, which takes both task and node failure into consideration. Preliminary experiments show that, with less storage and network overhead, CROFT significantly reduces the completion time when failures occur, and that the overall performance of MapReduce can be improved by at least 30% over the original mechanism in Hadoop.

Yang Liu

2013-09-01

318

2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.  

Energy Technology Data Exchange (ETDEWEB)

This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that cause an application to crash/halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. 
There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.
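
The checkpoint/restart approach described in this abstract (redundancy in time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the pickle-based file format, and the `fail_at` parameter that simulates a crash are hypothetical and are not taken from the workshop report.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the target path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # the old checkpoint stays valid until the rename


def load_checkpoint(path, default):
    """Return the most recent checkpoint, or `default` if none exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run(path, total_steps, interval, fail_at=None):
    """Sum 0..total_steps-1, checkpointing every `interval` steps.

    `fail_at` injects a simulated crash (hypothetical, illustration only).
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated fault")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % interval == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

After a crash, calling `run` again with the same path resumes from the last saved step, so only the work done since that checkpoint is repeated. The cost of each `save_checkpoint` call is the I/O concern the abstract raises.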

Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

2009-02-01

319

Decoherence-Free Subspaces for Multiple-Qubit Errors (II) Universal, Fault-Tolerant Quantum Computation  

CERN Document Server

Decoherence-free subspaces (DFSs) shield quantum information from errors induced by the interaction with an uncontrollable environment. Here we study a model of correlated errors forming an Abelian subgroup (stabilizer) of the Pauli group (the group of tensor products of Pauli matrices). Unlike previous studies of DFSs, this type of errors does not involve any spatial symmetry assumptions on the system-environment interaction. We solve the problem of universal, fault-tolerant quantum computation on the associated class of DFSs.

Lidar, D A; Kempe, J; Whaley, K B; Lidar, Daniel A.; Bacon, David; Kempe, Julia

2001-01-01

320

RSFTS: RULE-BASED SEMANTIC FAULT TOLERANT SCHEDULING FOR CLOUD ENVIRONMENT  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Cloud computing has emerged as one of the latest technologies for delivering on-demand sophisticated services over the Internet. To make effective use of the tremendous capabilities of the cloud, efficient scheduling algorithms are required. In large-scale systems, fault tolerance is a critical issue, since cloud resources are widely dispersed across diverse locations. This leads to a higher probability of failures while solving huge problems, thus the cloud ser...

Pandeeswari R; Mohamadi Begum. Y

2013-01-01

321

A Model for Variation- and Fault-Tolerant Digital Logic using Self-Assembled Nanowire Architectures  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Reconfiguration has been used for both defect- and fault-tolerant nanoscale architectures with regular structure. Recent advances in self-assembled nanowires have opened doors to a new class of electronic devices with irregular structure. For such devices, reservoir computing has been shown to be a viable approach to implement computation. This approach exploits the dynamical properties of a system rather than specifics of its structure. Here, we extend a model of reservoir ...

Goudarzi, Alireza; Lakin, Matthew R.; Stefanovic, Darko; Teuscher, Christof

2014-01-01

322

Design and Analysis of Linear Fault-Tolerant Permanent-Magnet Vernier Machines  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is manufactured only from iron. Hence, the proposed machine is well suited to long-stroke applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed mach...

Liang Xu; Jinghua Ji; Guohai Liu; Yi Du; Hu Liu

2014-01-01

323

An Efficient Fault Tolerant Approach to Resource Discovery in P2P Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Current research into resource discovery in peer-to-peer networks is largely focussed on the use of Distributed Hash Tables and multi-layered topologies. In this paper we present a resource discovery system capable of resolving keyword-value pair queries, based on a two-layered Chord ring architecture. We show how the base topology augmented with shortcuts between layers and selection of ring member nodes provides performance benefits and the potential for increased fault tolerance over oth...

Salter, J.; Antonopoulos, N.

2004-01-01

324

The use of hybrid automata for fault-tolerant vibration control for parametric failures  

Science.gov (United States)

The purpose of this work is to make use of hybrid automata for vibration control reconfiguration under system failures. Fault detection and isolation (FDI) filters are used to monitor an active vibration control system. When system failures occur (specifically parametric faults) the FDI filters detect and identify the specific failure. In this work we are specifically interested in parametric faults such as changes in system physical parameters; however, this approach works equally well with additive faults such as sensor or actuator failures. The FDI filter output is used to drive a hybrid automaton, which selects the appropriate controller and FDI filter from a library. The hybrid automaton also implements switching between controllers and filters in order to maintain optimal performance under faulty operating conditions. The biggest challenge in developing this system is managing the switching and maintaining stability during the discontinuous switches. Therefore, in addition to vibration control, the stability associated with switching compensators and FDI filters is studied. Furthermore, the performance of two types of FDI filters is compared: filters based on parameter estimation methods and so-called "Beard-Jones" filters. Finally, these simulations help in understanding the use of hybrid automata for fault-tolerant control.
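
The selection mechanism described in this abstract, where the FDI output drives an automaton that picks a controller and filter from a library, might be sketched as follows. This is a minimal sketch under stated assumptions: the mode names, the proportional gains, and the `HybridSupervisor` class are hypothetical and do not come from the paper, and the stability of switching (the paper's main concern) is not addressed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Mode:
    controller: Callable[[float], float]  # maps a measurement to a control input
    fdi_filter: str                       # label of the FDI filter paired with it


class HybridSupervisor:
    """Discrete mode logic: the FDI decision selects the active mode."""

    def __init__(self, library: Dict[str, Mode], initial: str = "nominal"):
        self.library = library
        self.mode = initial
        self.switch_count = 0

    def step(self, fdi_output: str, measurement: float) -> float:
        # Switch only when the FDI output names a different, known mode.
        if fdi_output in self.library and fdi_output != self.mode:
            self.mode = fdi_output
            self.switch_count += 1
        return self.library[self.mode].controller(measurement)


# Hypothetical library: one nominal mode and one mode for a parametric fault.
library = {
    "nominal": Mode(lambda y: -2.0 * y, "filter_nominal"),
    "stiffness_loss": Mode(lambda y: -3.5 * y, "filter_stiffness_loss"),
}
supervisor = HybridSupervisor(library)
```

An unrecognized FDI output leaves the current mode in place, which is one simple design choice among several; a real design would also need dwell-time or similar conditions to keep switching stable.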

Byreddy, Chakradhar; Frampton, Kenneth D.; Yongmin, Kim

2006-03-01

325

Actuator usage and fault tolerance of the James Webb Space Telescope optical element mirror actuators  

Science.gov (United States)

The James Webb Space Telescope (JWST) secondary mirror and eighteen primary mirror segments are each actively controlled in rigid body position via six hexapod actuators. The mirrors are stowed to the mirror support structure to survive the launch environment and then must be deployed 12.5 mm to reach the nominally deployed position before the Wavefront Sensing & Control (WFS&C) alignment and phasing process begins. The actuation system is electrically, but not mechanically, redundant. Therefore, with the large number of hexapod actuators, the fault tolerance of the Optical Telescope Element (OTE) architecture and WFS&C alignment process has been carefully considered. The details of the fault tolerance will be discussed, including motor life budgeting, failure signatures, and motor life.

Barto, A.; Acton, D. S.; Finley, P.; Gallagher, B.; Hardy, B.; Knight, J. S.; Lightsey, P.

2012-09-01

326

Robust and Fault-Tolerant Linear Parameter-Varying Control of Wind Turbines  

DEFF Research Database (Denmark)

High performance and reliability are required for wind turbines to be competitive within the energy market. To capture their nonlinear behavior, wind turbines are often modeled using parameter-varying models. In this paper we design and compare multiple linear parameter-varying (LPV) controllers, designed using a proposed method that allows the inclusion of both faults and uncertainties in the LPV controller design. We specifically consider a 4.8 MW, variable-speed, variable-pitch wind turbine model with a fault in the pitch system. We propose the design of a nominal controller (NC), handling the parameter variations along the nominal operating trajectory caused by nonlinear aerodynamics. To accommodate the fault in the pitch system, an active fault-tolerant controller (AFTC) and a passive fault-tolerant controller (PFTC) are designed. In addition to the nominal LPV controller, we also propose a robust controller (RC). This controller is able to take into account model uncertainties in the aerodynamic model. The controllers are based on output feedback and are scheduled on an estimated wind speed to manage the parameter-varying nature of the model. Furthermore, the AFTC relies on information from a fault diagnosis system. The optimization problems involved in designing the PFTC and RC are based on solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs) due to unmeasured parameter variations. Consequently, they are more difficult to solve. The paper presents a procedure, where the BMIs are rewritten into two necessary LMI conditions, which are solved using a two-step procedure. Simulation results show the performance of the LPV controllers to be superior to that of a reference controller designed based on classical principles.

Sloth, Christoffer; Esbensen, Thomas

2011-01-01

327

Evaluation of Fault Detection Coverage of Digital I and C Systems  

Energy Technology Data Exchange (ETDEWEB)

In fault tolerance evaluation, fault detection coverage is a crucial factor. Fault detection coverage is the ability to detect errors that are caused by faults in a system. If faults are not detected by a given detection algorithm, the system could fail. Evaluating the fault detection coverage of a fault-tolerant technique is important for the safety analysis of digital systems. Digital I and C systems have a greater variety of fault-tolerant techniques than conventional analog I and C systems. Even though these fault-tolerant techniques are designed to ensure and improve the safety of systems, their effects have not yet been properly considered in most system probabilistic safety assessment (PSA) models. There have been several studies of the reliability of digital systems. However, systematic frameworks or reasonable models for obtaining the reliability of digital systems by considering the effects of fault-tolerant techniques have not been proposed. Therefore, it is necessary to develop an evaluation method reflecting the features of digital I and C systems. An evaluation method for the fault detection coverage of digital I and C systems is proposed in this work. The proposed method quantifies the fault detection coverage based on fault injection experiments. Even though fault injection experiments have several limitations, such as fault injection into only memory and registers, the method has the advantage that the actual system behavior in response to faults can be observed. A more accurate reliability evaluation of digital I and C systems can be expected from the experimental results.
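
As a rough illustration of how fault detection coverage could be quantified from fault injection results, the sketch below computes the fraction of error-causing injected faults that were detected. The record layout (`error`, `detected`) and the sample data are hypothetical, and the abstract does not state the exact estimator used in the paper, so this ratio is only an assumed reading of its definition.

```python
def detection_coverage(records):
    """Fraction of injected faults that caused an error and were detected.

    Each record is a dict with two booleans:
      "error":    the injected fault produced an error in the system
      "detected": a fault-tolerant technique flagged that error
    Faults that caused no error are excluded from the denominator.
    """
    effective = [r for r in records if r["error"]]
    if not effective:
        raise ValueError("no injected fault produced an error")
    detected = sum(1 for r in effective if r["detected"])
    return detected / len(effective)


# Hypothetical results of four injections into memory or registers.
sample = [
    {"error": True, "detected": True},
    {"error": True, "detected": False},
    {"error": False, "detected": False},  # fault was masked, no error
    {"error": True, "detected": True},
]
coverage = detection_coverage(sample)
```

With this sample, three of the four injected faults cause an error and two of those are detected, giving a coverage of 2/3.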

Lee, Seung Jun; Jung, Wondea [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

2013-10-15

328

Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing  

Directory of Open Access Journals (Sweden)

The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on its data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-complete problem, and it becomes more challenging when task failures are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In this paper, the proposed Fault Tolerant Workflow Scheduling algorithm (FTWS) provides fault tolerance by using replication and resubmission of tasks based on the priority of the tasks. The replication of tasks depends on a heuristic metric which is calculated by finding the tradeoff between the replication factor and the resubmission factor. The heuristic metric is considered because replication alone may lead to resource wastage and resubmission alone may increase makespan. Tasks are prioritized based on the criticality of the task, which is calculated using parameters such as out-degree, earliest deadline, and high resubmission impact. Priority helps in meeting the deadline of a task and thereby reducing wastage of resources. FTWS schedules workflows within a deadline even in the presence of failures without using any history of information. The experiments were conducted in a simulated cloud environment by scheduling workflows in the presence of randomly generated failures. The experimental results of the proposed work demonstrate an effective success rate in spite of various failures.

Jayadivya S K

2012-06-01

329

Fault-Tolerant Control of Wind Turbines using a Takagi-Sugeno Sliding Mode Observer  

Science.gov (United States)

In this paper, observer-based fault-tolerant control schemes for actuator and sensor faults are implemented within dynamic wind turbine simulations. The faults are directly reconstructed by means of a Takagi-Sugeno sliding mode observer. As simulation models, both a reduced-order model with 4 degrees of freedom and the aero-elastic code FAST by NREL are used. A fault-tolerant control scheme is set up by subtracting the reconstructed fault from the faulty control signal or sensor value, respectively. With these fault compensation schemes, the corrected controller behaviour is close to the fault-free case. The global stability of the controller in the full-load region in the presence of faults and with active fault compensation is shown by analysing the derivative of an appropriate Lyapunov function.
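
The compensation step described in this abstract, subtracting the reconstructed fault from the faulty signal, can be sketched for the actuator case. The additive fault model, the function names, and the numbers are assumptions for illustration only; the Takagi-Sugeno sliding mode observer that produces the fault estimate is not modelled here.

```python
def compensate(signal, fault_estimate):
    """Subtract the reconstructed (estimated) fault from a signal."""
    return signal - fault_estimate


def apply_actuator_fault(command, fault):
    """Assumed additive actuator fault: the plant sees command + fault."""
    return command + fault


# Hypothetical numbers: desired control input, true fault, observer estimate.
u_nominal = 10.0
f_true = -2.0
f_hat = -1.9

u_uncompensated = apply_actuator_fault(u_nominal, f_true)
u_compensated = apply_actuator_fault(compensate(u_nominal, f_hat), f_true)
```

Under this additive model the remaining deviation from the desired input equals the estimation error `f_true - f_hat`, so the compensated behaviour approaches the fault-free case as the fault reconstruction improves.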

Georg, Sören; Schulte, Horst

2014-06-01

330

Dust-Tolerant Intelligent Electrical Connection System  

Science.gov (United States)

Faults in wiring systems are a serious concern for the aerospace and aeronautic (commercial, military, and civilian) industries. Circuit failures and vehicle accidents have occurred and have been attributed to faulty wiring created by open and/or short circuits. Often, such circuit failures occur due to vibration during vehicle launch or operation. Therefore, developing non-intrusive fault-tolerant techniques is necessary to detect circuit faults and automatically route signals through alternate recovery paths while the vehicle or lunar surface systems equipment is in operation. Electrical connector concepts combining dust mitigation strategies and cable diagnostic technologies have significant application for lunar and Martian surface systems, as well as for dusty terrestrial applications. The dust-tolerant intelligent electrical connection system has several novel concepts and unique features. It combines intelligent cable diagnostics (health monitoring) and automatic circuit routing capabilities into a dust-tolerant electrical umbilical. It retrofits a clamshell protective dust cover to an existing connector for reduced gravity operation, and features a universal connector housing with three styles of dust protection: inverted cap, rotating cap, and clamshell. It uses a self-healing membrane as a dust barrier for electrical connectors where required, while also combining lotus leaf technology for applications where a dust-resistant coating providing low surface tension is needed to mitigate Van der Waals forces, thereby disallowing dust particle adhesion to connector surfaces. It also permits using a ruggedized iris mechanism with an embedded electrodynamic dust shield as a dust barrier for electrical connectors where required.

Lewis, Mark; Dokos, Adam; Perotti, Jose; Calle, Carlos; Mueller, Robert; Bastin, Gary; Carlson, Jeffrey; Townsend, Ivan, III; Immer, Chirstopher; Medelius, Pedro

2012-01-01

331

Second-order sliding mode fault-tolerant control of heat recovery steam generator boiler in combined cycle power plants  

International Nuclear Information System (INIS)

Power generation plants are intrinsically complex systems due to their numerous internal components. Higher energy efficiency in power plants is now achieved through employing combined cycles. In this article, an adaptive robust Sliding Mode Controller (SMC) is designed to overcome faults in Heat Recovery Steam Generator (HRSG) boilers, one of the main parts of a combined cycle plant. When a fault occurs in the HRSG boiler, the control system must be able to reconfigure its parameters to keep dynamic variables such as drum pressure, steam temperature, and drum water level within admissible thresholds. To achieve good performance for the boiler, the proposed adaptive robust SMC must overcome the effects of faults and uncertainties by estimating their upper bounds adaptively, and force the outputs of the multivariable boiler to track the outputs of a desired multivariable reference model. By manipulating a suitable control input and using a second-order sliding mode control strategy, the output tracking error slides to zero on a PID sliding surface. Besides tracking, the controlled boiler tolerates faults in the system matrix, faults in the input matrix, and an external disturbance signal. Numerical simulations confirm the effectiveness of the proposed FTC (Fault-Tolerant Control) system for an uncertain non-minimum phase HRSG boiler. Highlights: This paper proposes a PID-based adaptive second-order sliding mode controller (SMC). SMC is robust to actuator and sensor faults and tracks outputs of a reference system. SMC is used in fault tolerant control of a heat recovery steam generator boiler. Boiler and reference system have different numbers of states and inputs. Performance of SMC is investigated with different fault scenarios in simulations.

332

Fault-tolerant multipath routing scheme for energy efficient wireless sensor networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The main challenge in wireless sensor networks is to improve the fault tolerance of each node and also provide an energy efficient, fast data routing service. In this paper we propose an energy efficient node fault diagnosis and recovery scheme for wireless sensor networks, referred to as the fault tolerant multipath routing scheme for energy efficient wireless sensor networks (FTMRS). The FTMRS is based on a multipath data routing scheme. One shortest path is used for main data routing in FTMRS ...

Prasenjit Chanak; Tuhina Samanta; Indrajit Banerjee

2013-01-01

333

ROBUST FAULT TOLERANT CONTROL WITH SENSOR FAULTS FOR A FOUR-ROTOR HELICOPTER  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper considers the control problem for an underactuated quadrotor UAV system in the presence of sensor faults. A dynamic model of the quadrotor is presented that takes into account various physical phenomena which can influence the dynamics of a flying structure. Subsequently, a new control strategy based on a robust integral backstepping approach using sliding mode and taking into account the sensor faults is developed. Lyapunov based stability analysis shows that the proposed control strat...

Fouad Yacef; Belkacem Sait; Hicham Khebbache

2012-01-01

334

Design Approach for Fault Tolerance Algorithm in FPGA Architecture with BIST in Hardware Controller  

Directory of Open Access Journals (Sweden)

Redundancy based hardening techniques are applied at the pre-synthesis or synthesis level. To provide solutions for increasing fault-tolerance capabilities with algorithms able to reduce the sensitive configuration memory bits of FPGAs, we use the built-in self-test (BIST) method. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing, which leads to higher defect densities; more sophisticated techniques, at increased cost, are required to avoid defects. In this work we present a solution in which the configuration bit-stream of the FPGA is modified by a hardware controller that is present on the chip itself. The technique uses a redundant device to replace a faulty device and increases the yield.

Shweta S. Meshram

2011-05-01

335

Time Delay Fault Tolerant Controller for Actuator Failures during Aircraft Autolanding  

Science.gov (United States)

A time delay control methodology is adopted to cope with degraded control performance due to control surface damage of unmanned aerial vehicles, especially in the case of the automatic landing phase. It is a crucial challenge to maintain consistent control performance even under fault environments such as stuck and/or incipient actuator faults. Flight control systems designed using conventional feedback control methods in such cases may result in unsatisfactory performance, and even worse, may not guarantee the closed-loop stability, which is fatal for aircraft in the state of auto-landing. To overcome the shortfalls of the conventional approach, the time delay control scheme is adopted. This scheme is known to be robust against disturbance, model uncertainties and so on. Motivated by the fact that the abrupt and/or incipient actuator faults focused on in this paper could be considered as model uncertainties, we consider the application of the time delay controller to designing a fault tolerant control system. To show the effectiveness of the time delay control method, a nonlinear 6-DOF simulation is performed under model uncertainties and wind disturbances, and control performance is compared with that of conventional controllers in the case of multiple and single actuator faults.

Lee, Jangho; Choi, Hyoung Sik; Lee, Sangjong; Kim, Eung Tai; Shin, Dongho

336

Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging  

Science.gov (United States)

By developing Radiation Hardening by Software (RHBSW) techniques leveraged from the High Performance Computing community, our work seeks to deliver radiation tolerant, high performance System on a Chip (SoC) processors to the remote sensing community. This SoC architecture is uniquely suited to both handle high performance signal processing tasks, as well as autonomous agent processing. This allows situational awareness to be developed in-situ, resulting in a 10-100x decrease in processing latency, which directly translates into more science experiments conducted per day and a more thorough, timely analysis of captured data. With the increase in the amount of computational throughput made possible by commodity high performance processors and low overhead fault tolerance, new applications can be considered for on-board processing. A high performance and low overhead fault tolerance strategy targeting scientific applications on the SpaceCube 1.0 platform has been enhanced with initial results showing an order of magnitude increase in Mean Time Between Data Error and a complete elimination of processor hangs. Initial study of representative Hyperspectral applications also proves promising due to high levels of data parallelism and fine grained parallelism achievable within FPGA System on a Chip architectures enabled by our RHBSW techniques. To demonstrate the kinds of capabilities these fault tolerance approaches yield, the team focused on applications representative of the Decadal Survey HyspIRI mission, which uses high throughput Thermal Infrared Scanner (132 Mbps) and Hyperspectral Visible ShortWave InfraRed (804 Mbps) instruments, while having only a 15 Mbps downlink channel. This mission provides a great many use scenarios for onboard processing, from high compression algorithms, to pre-processing and selective download of high priority images, to full on-board classification.
This paper focuses on recent efforts which revolve around developing a fault emulator for the embedded PowerPC within Xilinx V4FX devices, validating the RHBSW techniques developed in the prior year, and initial performance results on a representative autonomous Hyperspectral application. In the future, fault analysis data will be refined and correlated between software fault emulation, laser testing, and space based results. This project will also deliver expected performance results on an optimized, representative Hyperspectral imaging application demonstrating autonomous operations.
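
The data rates quoted in this abstract give a sense of why on-board processing is attractive. The short calculation below uses only the three figures from the abstract; treating both instruments as running at full rate at the same time, with a continuously available downlink, is a simplifying assumption and not a statement about the actual mission operations.

```python
# Instrument and downlink rates quoted in the abstract, in Mbps.
TIR_MBPS = 132       # Thermal Infrared Scanner
VSWIR_MBPS = 804     # Hyperspectral Visible ShortWave InfraRed instrument
DOWNLINK_MBPS = 15   # downlink channel

# Assumption: both instruments produce data simultaneously at full rate.
total_instrument_mbps = TIR_MBPS + VSWIR_MBPS

# Factor by which on-board processing would need to reduce the data volume
# for the raw stream to fit the downlink under that assumption.
required_reduction = total_instrument_mbps / DOWNLINK_MBPS

print(f"combined instrument rate: {total_instrument_mbps} Mbps")
print(f"required reduction factor: {required_reduction:.1f}x")
```

Under these assumptions the combined rate is 936 Mbps, about 62 times the downlink capacity, which is the gap that compression, selective download, or on-board classification would have to close.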

French, M.; Walters, J.; Zick, K.

2011-12-01

337

Fault tolerant small satellite attitude control using adaptive non-singular terminal sliding mode  

Science.gov (United States)

The Attitude Control System (ACS) plays a pivotal role in the overall performance of the spacecraft on orbit; therefore, it is vitally important to design the control system for rapid response, high control precision, and insensitivity to external perturbations. First, this paper proposes two adaptive nonlinear control algorithms based on sliding mode control (SMC), which are designed for a small satellite attitude control system. The nonlinear dynamics describing the attitude of a small satellite is considered in a circular reference orbit, and the stability of the closed-loop system in the presence of external perturbations is investigated. Then, in order to account for accidental or degradation faults in satellite actuators, fault-tolerant control schemes are presented. Hence, two adaptive fault-tolerant control laws (continuous sliding mode control and non-singular terminal sliding mode control) are developed by adopting the nonlinear analytical model to describe the system, which can guarantee global asymptotic convergence of the attitude control error in the presence of unknown external perturbations. The nonlinear hyperplane based terminal sliding mode is introduced into the control law design; therefore, the system convergence performance improves and the control error converges in "finite time". As a result, the study focuses on the non-singular terminal sliding mode control, and the continuous sliding mode control is used for comparison. Meanwhile, an adaptive fuzzy algorithm is proposed to suppress the chattering phenomenon. Moreover, several numerical examples are presented to demonstrate the efficacy of the proposed controllers by correcting for the external perturbations. Simulation results confirm that the suggested methodologies yield high control precision.
In addition, actuator degradation, actuator stuck and actuator failure for a period of time are simulated to demonstrate the fault recovery capability of the fault tolerant controllers. The numerical results clearly demonstrate the good performance of the adaptive non-singular terminal control in the event of an actuator fault compared with the continuous sliding mode control.

Cao, Lu; Chen, XiaoQian; Sheng, Tao

2013-06-01

338

Subaru FATS (fault tracking system)  

Science.gov (United States)

The Subaru Telescope requires a fault tracking system to record the problems and questions that staff experience during their work, and the solutions provided by technical experts to these problems and questions. The system records each fault and routes it to a pre-selected 'solution-provider' for each type of fault. The solution provider analyzes the fault and writes a solution that is routed back to the fault reporter and recorded in a 'knowledge-base' for future reference. The specifications of our fault tracking system were unique. (1) Dual language capacity -- Our staff speak both English and Japanese. Our contractors speak Japanese. (2) Heterogeneous computers -- Our computer workstations are a mixture of SPARCstations, Macintosh and Windows computers. (3) Integration with prime contractors -- Mitsubishi and Fujitsu are primary contractors in the construction of the telescope. In many cases, our 'experts' are our contractors. (4) Operator scheduling -- Our operators spend 50% of their work-month operating the telescope, the other 50% is spent working day shift at the base facility in Hilo, or day shift at the summit. We plan for 8 operators, with a frequent rotation. We need to keep all operators informed on the current status of all faults, no matter the operator's location.

Winegar, Tom W.; Noumaru, Junichi

2000-07-01

339

Effect estimation of an automatic periodic tests in NPP digital I and C systems by fault injections  

International Nuclear Information System (INIS)

As digital technologies have improved, new NPPs have adopted various kinds of digital systems, including digital I and C systems, for safer and more efficient operations. The development of a methodology for the probabilistic safety assessment (PSA) of digital I and C systems is a critical issue because conventional PSA techniques cannot adequately evaluate all features of digital systems. In fact, digital I and C systems have a greater variety of fault-tolerant techniques, including automatic inspection functions, than conventional analog I and C systems. Even though these fault-tolerant techniques in digital I and C systems are designed to ensure and improve the safety of systems, their effects have not yet been properly considered in most system PSA models. Therefore, it is necessary to develop an evaluation method which can describe the features of digital I and C systems. In this work, a method to quantify the error coverage with consideration of duplicated effects of fault-tolerant techniques in digital I and C systems is suggested using fault injection experiments. Among the issues to be solved in order to obtain accurate reliability of digital I and C systems, this work focused on excluding duplicated effects when various fault-tolerant techniques are implemented simultaneously. In order to exclude duplicated effects, exact definitions of the relations between faults and fault-tolerant techniques are required. In this work, the relations between faults and fault-tolerant techniques are defined using fault injection experiments.
As an application, the independent fault coverage of each fault-tolerant technique in an AI module and the overall fault coverage were identified using the proposed methods, and the experiment showed reasonable results.

340

Fault detection in photovoltaic systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This master’s thesis concerns three different areas in the field of fault detection in photovoltaic systems. Previous studies have concerned homogeneous systems with a large set of parameters being observed, while this study is focused on a more restrictive case. The first problem is to discover immediate faults occurring in solar panels. A new online algorithm is developed based on similarity measures within a single installation. It performs reliably and is able to detect all significant f...

Nilsson, David

2014-01-01

341

Observer-Based Fault Estimation and Accommodation for Dynamic Systems  

CERN Document Server

Due to the increasing security and reliability demands of actual industrial process control systems, the study of fault diagnosis and fault tolerant control of dynamic systems has received considerable attention. Fault accommodation (FA) is one of the effective methods that can be used to enhance system stability and reliability, so it has been widely investigated in depth and has become a hot topic in recent years. Fault detection is used to monitor whether a fault occurs, which is the first step in FA. On the basis of fault detection, fault estimation (FE) is utilized to determine online the magnitude of the fault, which is a very important step because the additional controller is designed using the fault estimate. Compared with fault detection, the design difficulties of FE increase considerably, so research on FE and accommodation is very challenging. Although there have been advancements reported on FE and accommodation for dynamic systems, the common methods at the present stage have design difficulties, whi...

Zhang, Ke; Shi, Peng

2013-01-01

342

Fault-tolerant topology of a grid-connected PV inverter coupled by a Scott transformer  

Energy Technology Data Exchange (ETDEWEB)

A grid-connected photovoltaic (PV) generator is mainly based on power electronic equipment, which is considered the most vulnerable part of a PV system. In order to increase the reliability of modular grid-connected PV panels, a solution using a Scott transformer is presented to reduce the number of switches and to keep the PV system operating continuously in case of switch failures in the power converter. The three-phase PV inverter is analysed in normal and fault operation. The simulation has shown that fault tolerance can be achieved with the proposed system configuration, which provides redundancy of power switches in an integrated power electronic module. (orig.)

Mai, ThuanDat; Driesen, Johan [K.U. Leuven, ESAT/ELECTA, Heverlee (Belgium); Cheng, Yonghua [Vlaamse Instelling voor Technologisch Onderzoek (VITO), Mol (Belgium)

2012-07-01

343

Active Fault Isolation in MIMO Systems  

DEFF Research Database (Denmark)

Active fault isolation of parametric faults in closed-loop MIMO systems is considered in this paper. The fault isolation consists of two steps. The first step is group-wise fault isolation. Here, a group of faults is isolated from other possible faults in the system. The group-wise fault isolation is based directly on the input/output signals applied for the fault detection. It is guaranteed that the fault group includes the fault that had occurred in the system. The second step is individual fault isolation in the fault group. Both types of isolation are obtained by applying dedicated auxiliary inputs and the associated residual outputs.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2014-01-01

344

A Self-Stabilizing Byzantine-Fault-Tolerant Clock Synchronization Protocol  

Science.gov (United States)

This report presents a rapid Byzantine-fault-tolerant self-stabilizing clock synchronization protocol that is independent of application-specific requirements. It is focused on clock synchronization of a system in the presence of Byzantine faults after the cause of any transient faults has dissipated. A model of this protocol is mechanically verified using the Symbolic Model Verifier (SMV) [SMV] where the entire state space is examined and proven to self-stabilize in the presence of one arbitrary faulty node. Instances of the protocol are proven to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period. This protocol does not rely on assumptions about the initial state of the system other than the presence of a sufficient number of good nodes. All timing measures of variables are based on the node's local clock, and no central clock or externally generated pulse is used. The Byzantine faulty behavior modeled here is a node with arbitrarily malicious behavior that is allowed to influence other nodes at every clock tick. The only constraint is that the interactions are restricted to defined interfaces.

Malekpour, Mahyar R.

2009-01-01

345

Efficient and flexible fault tolerance and migration of scientific simulation using CUMULVS  

Energy Technology Data Exchange (ETDEWEB)

Many practical scientific applications would benefit from a simple checkpointing mechanism to provide automatic restart or recovery in response to faults and failures. CUMULVS is a middleware infrastructure for interacting with parallel scientific simulations to support online visualization and computational steering. The base CUMULVS system has been extended to provide a user-level mechanism for collecting checkpoints in a parallel simulation program. Via the same interface that CUMULVS uses to identify and describe data fields for visualization and parameters for steering, the user application can select the minimal program state necessary to restart or migrate an application task. The CUMULVS run-time system uses this information to efficiently recover fault-tolerant applications by restarting failed tasks. Application tasks can also be migrated -- even across heterogeneous architecture boundaries -- to achieve load balancing or to improve the task's locality with a required resource. This paper describes the CUMULVS interface for checkpointing, the issues faced in utilizing this interface when developing fault-tolerant and migrating applications, and the direction of future research in this area.

Kohl, J.A.; Papadopoulos, P.M.

1998-05-01
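
The idea of letting the application name the minimal state needed for restart can be sketched as follows. This is a generic illustration using Python's standard `pickle` module. It is not the CUMULVS interface, and the `Task` class, its fields and `make_checkpointer` are hypothetical names chosen for the example.

```python
import pickle


class Task:
    """Toy simulation task; only `step` and `total` are needed to restart."""

    def __init__(self):
        self.step = 0
        self.total = 0.0
        self.scratch = []  # recomputed every step, so never checkpointed

    def advance(self):
        self.scratch = [self.step] * 3
        self.total += sum(self.scratch)
        self.step += 1


def make_checkpointer(task, fields=("step", "total")):
    """Return (save, restore) functions covering only the named fields."""

    def save() -> bytes:
        return pickle.dumps({name: getattr(task, name) for name in fields})

    def restore(blob: bytes) -> None:
        for name, value in pickle.loads(blob).items():
            setattr(task, name, value)

    return save, restore


# Reference run without any failure.
reference = Task()
for _ in range(5):
    reference.advance()

# Run that "fails" after three steps and is restarted from the checkpoint.
first = Task()
save, _ = make_checkpointer(first)
for _ in range(3):
    first.advance()
blob = save()

restarted = Task()  # stands in for a fresh task on another host
_, restore = make_checkpointer(restarted)
restore(blob)
for _ in range(2):
    restarted.advance()

print(reference.total, restarted.total)
```

Because the scratch buffer is rebuilt on every step, leaving it out of the checkpoint keeps the saved state small without changing the result of the restarted run.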

346

Testability and Fault Tolerance for Emerging Nanoelectronic Memories

Digital Repository Infrastructure Vision for European Research (DRIVER)

Emerging nanoelectronic memories such as Resistive Random Access Memories (RRAMs) are possible candidates to replace the conventional memory technologies such as SRAMs, DRAMs and flash memories in future computer systems. Despite their advantages such as enormous storage capacity, low-power per unit device and reduced manufacturing difficulties, these emerging memories are expected to suffer from high manufacturing defect densities (reducing their quality) and in-field fault rates including c...

Haron, N. Z. B.

2012-01-01

347

A model-based approach for fault-tolerant control  

DEFF Research Database (Denmark)

A model-based controller architecture for fault-tolerant control (FTC) is presented in this paper. The controller architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. The FTC architecture consists of two central parts: a fault detection and isolation (FDI) part and a controller reconfiguration part. The theoretical basis for the architecture is given, followed by an investigation of the individual parts of the architecture. Finally, system interconnection is considered with respect to the described controller architecture.

Niemann, Hans Henrik

2010-01-01

348

ROBUST FAULT TOLERANT CONTROL WITH SENSOR FAULTS FOR A FOUR-ROTOR HELICOPTER  

Directory of Open Access Journals (Sweden)

This paper considers the control problem for an underactuated quadrotor UAV system in the presence of sensor faults. A dynamic model of the quadrotor is presented that takes into account various physical phenomena which can influence the dynamics of a flying structure. Subsequently, a new control strategy based on a robust integral backstepping approach using sliding mode and taking into account the sensor faults is developed. Lyapunov-based stability analysis shows that the proposed control strategy keeps the closed-loop dynamics of the quadrotor UAV stable even after sensor failures occur. Numerical simulation results are provided to show the good tracking performance of the proposed control laws.

Fouad Yacef

2012-03-01

349

Optimal Configuration of Fault-Tolerance Parameters for Distributed Server Access  

DEFF Research Database (Denmark)

Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as enhanced name servers. Such architectures provide an increased number of redundancy configuration choices. The influence of a (wide area) network connection can be quite significant and induce trade-offs between dependability and user-perceived performance. This paper develops a quantitative stochastic model using stochastic activity networks (SAN) for the evaluation of performance and dependability metrics of a generic transaction-based service implemented on a distributed replication architecture. The composite SAN model can be easily adapted to a wide range of client-server applications deployed in replicated server architectures. In order to obtain insight into the system behaviour, a set of relevant environment parameters and controllable fault-tolerance parameters are chosen and the dependability/performance trade-off is evaluated.

Daidone, Alessandro; Renier, Thibault

2013-01-01

350

Fault Diagnosis for Electrical Distribution Systems using Structural Analysis  

DEFF Research Database (Denmark)

Fault-tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the design of efficient diagnostic algorithms, which is a prerequisite for fault-tolerant control of power distribution. Diagnosis in a grid depends on available analytic redundancies, and hence on network topology. When topology changes, due to earlier fault(s) or caused by maintenance, analytic redundancy relations (ARR) are likely to change. The algorithms used for diagnosis may need to change accordingly, and finding efficient methods for ARR generation is essential to employ fault-tolerant methods in the grid. Structural analysis (SA) is based on graph-theoretical results that make it possible to find analytic redundancies in large sets of equations from the structure (topology) of the equations alone. A salient feature is automated generation of redundancy relations. The method is indeed feasible in electrical networks where circuit theory and network topology together formulate the constraints that define a structure graph. This paper shows how three-phase networks are modelled and analysed using structural methods, and it extends earlier results by showing how physical faults can be identified such that adequate remedial actions can be taken. The paper illustrates a feasible modelling technique for structural analysis of power systems, it demonstrates detection and isolation of failures in a network, and shows how typical faults are diagnosed. Nonlinear fault simulations illustrate the results.

Knüppel, Thyge; Blanke, Mogens

2014-01-01

351

Universal Fault Tolerant Quantum Computation on a Class of Decoherence-Free Subspaces Without Spatial Symmetry  

CERN Document Server

Decoherence-free subspaces (DFSs) are constructed without the assumption of spatially symmetric system-bath coupling. Instead the underlying assumption is that subgroups of the full Pauli group of errors are responsible for the decoherence. The corresponding decoherence-free states can protect quantum information in the presence of multiple-qubit errors, and are stabilizer codes. It is shown how to perform universal fault tolerant quantum computation on this class of DFSs. This is the first demonstration that it is possible to use only one- and two-body quantum gates to perform full-blown quantum computation on a class of DFSs, with a finite number of measurements.

Lidar, D A; Kempe, J; Whaley, K B; Lidar, Daniel A.; Bacon, David; Kempe, Julia

2001-01-01

352

Are the Assumptions of Fault-Tolerant Quantum Error Correction Internally Consistent?  

CERN Document Server

We critically examine the internal consistency of a set of minimal assumptions entering the theory of fault-tolerant quantum error correction for Markovian noise. We point out that these assumptions may not be mutually consistent in light of rigorous formulations of the Markovian approximation. Namely, Markovian dynamics requires either the singular coupling limit (high temperature), or the weak coupling limit (weak system-bath interaction). The former is incompatible with the assumption of a constant and fresh supply of cold ancillas, while the latter is inconsistent with fast gates. We discuss ways to resolve these inconsistencies.

Alicki, R; Zanardi, P

2005-01-01

353

Fault Tolerant Magnetic Bearing Testing and Conical Magnetic Bearing Development for Extreme Temperature Environments  

Science.gov (United States)

During the six month tenure of the grant, activities included continued research of hydrostatic bearings as a viable backup-bearing solution for a magnetically levitated shaft system in extreme temperature environments (1000 F), developmental upgrades of the fault-tolerant magnetic bearing rig at the NASA Glenn Research Center, and assisting in the development of a conical magnetic bearing for extreme temperature environments, particularly turbomachinery. It leveraged work from the ongoing Smart Efficient Components (SEC) and the Turbine-Based Combined Cycle (TBCC) program at NASA Glenn Research Center. The effort was useful in providing technology for more efficient and powerful gas turbine engines.

Keith, Theo G., Jr.; Clark, Daniel

2004-01-01

354

P2P-MPI : A fault-tolerant Message Passing Interface Implementation for Grids  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This thesis aims to demonstrate that message-passing parallel programs can be deployed onto large, heterogeneous distributed systems. This work consists in the design and development of a proof-of-concept middleware named P2P-MPI, released under a public license. P2P-MPI alleviates this task by proposing a peer-to-peer based platform in which available resources are dynamically discovered upon job requests, and by providing a fault-tolerant message-passing library for Java programs. The motiv...

Rattanapoka, Choopan

2008-01-01

355

Separation of Fault Tolerance and Non-Functional Concerns: Aspect Oriented Patterns and Evaluation  

Directory of Open Access Journals (Sweden)

Dependable computer based systems employing fault tolerance and robust software development techniques demand additional error detection and recovery related tasks. This results in tangling of core functionality with these cross-cutting non-functional concerns. In this regard, the current work identifies these dependability related non-functional and cross-cutting concerns and proposes design and implementation solutions in an aspect oriented framework that modularizes and separates them from core functionality. The degree of separation has been quantified using software metrics. A Lego NXT Robot based case study has been completed to evaluate the proposed design framework.

Kashif Hameed

2010-04-01

356

Fault-tolerant permanent-magnet synchronous machine drives -- Fault detection and isolation, control reconfiguration and design considerations  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The need for efficiency, reliability and continuous operation has led over the years to the development of fault-tolerant electrical drives for various industrial purposes and for transport applications. Permanent-magnet synchronous machines have also been gaining interest due to their high torque-to-mass ratio and high efficiency, which make them a very good candidate to reduce the weight and volume of the equipment. In this work, a multidisciplinary approach for the design of fault-tole...

Meinguet, Fabien

2012-01-01

357

Design of Parity Preserving Logic Based Fault Tolerant Reversible Arithmetic Logic Unit  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Reversible Logic is gaining significant consideration as the potential logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on physical entropy. Fault Tolerant reversible logic is one class of reversible logic that maintains the parity of the input and the outputs. Significant contributions have been made in the literature towards the design of fault tolerant reversible logic gate structures and arithmetic units, however, th...

Rakshith Saligram; Shrihari Shridhar Hegde; Kulkarni, Shashidhar A.; Bhagyalakshmi, H. R.; Venkatesha, M. K.

2013-01-01

358

A Byzantine resilient fault tolerant computer for nuclear power plant applications  

International Nuclear Information System (INIS)

A quadruply redundant synchronous fault tolerant processor, capable of tolerating Byzantine faults, is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. This paper describes the hardware architecture of this processor and discusses certain issues unique to quadruply redundant computers

359

Comparison between different model of hexapod robot in fault-tolerant gait  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper presents a gait analysis of the equilateral hexagonal model of hexapod robot. Mathematical analysis has been made on mobility, fault-tolerance, and stability. A comparison with the rectangular model of hexapod robot is also given, and it has shown that the hexagonal model shows better turning ability, a higher margin of stability during the fault-tolerant gait, and greater stride length in certain conditions.

Chu, Skk; Pang, Gkh

2002-01-01

360

Fault-Tolerant Data Sharing for High-level Grid Programming: A Hierarchical Storage Architecture  

Digital Repository Infrastructure Vision for European Research (DRIVER)

Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier software architecture which supports transparent, fault-tolerant, grid-level data sharing in the ASSIS...

Aldinucci, Marco; Antoniu, Gabriel; Danelutto, Marco; Jan, Mathieu

2006-01-01

 
 
 
 
361

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

International Nuclear Information System (INIS)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup

362

ALLIANCE: An architecture for fault tolerant multi-robot cooperation  

Energy Technology Data Exchange (ETDEWEB)

ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

Parker, L.E.

1995-02-01

363

Verification of a Byzantine-Fault-Tolerant Self-stabilizing Protocol for Clock Synchronization  

Science.gov (United States)

This paper presents the mechanical verification of a simplified model of a rapid Byzantine-fault-tolerant self-stabilizing protocol for distributed clock synchronization systems. This protocol does not rely on any assumptions about the initial state of the system except for the presence of sufficient good nodes, thus making the weakest possible assumptions and producing the strongest results. This protocol tolerates bursts of transient failures, and deterministically converges within a time bound that is a linear function of the self-stabilization period. A simplified model of the protocol is verified using the Symbolic Model Verifier (SMV). The system under study consists of 4 nodes, where at most one of the nodes is assumed to be Byzantine faulty. The model checking effort is focused on verifying correctness of the simplified model of the protocol in the presence of a permanent Byzantine fault as well as confirmation of claims of determinism and linear convergence with respect to the self-stabilization period. Although model checking results of the simplified model of the protocol confirm the theoretical predictions, these results do not necessarily confirm that the protocol solves the general case of this problem. Modeling challenges of the protocol and the system are addressed. A number of abstractions are utilized in order to reduce the state space.

Malekpour, Mahyar R.

2008-01-01
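
The verified configuration in this record (4 nodes, at most one Byzantine faulty) matches the classical bound for unauthenticated Byzantine agreement and clock synchronization, which requires n >= 3f + 1 nodes to tolerate f arbitrary faults. The short sketch below only encodes that classical bound; the function names are hypothetical and it says nothing about the protocol's internals.

```python
def max_byzantine_faults(n_nodes: int) -> int:
    """Largest f such that n_nodes >= 3 * f + 1 (classical bound)."""
    if n_nodes < 1:
        raise ValueError("a system needs at least one node")
    return (n_nodes - 1) // 3


def min_nodes(n_faults: int) -> int:
    """Smallest system size that satisfies n >= 3 * f + 1."""
    if n_faults < 0:
        raise ValueError("fault count cannot be negative")
    return 3 * n_faults + 1


for n in (3, 4, 7, 10):
    print(n, "nodes tolerate", max_byzantine_faults(n), "Byzantine fault(s)")
```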

364

High Speed Operation and Testing of a Fault Tolerant Magnetic Bearing  

Science.gov (United States)

Research activities undertaken to upgrade the fault-tolerant facility, continue testing high-speed fault-tolerant operation, and assist in the commission of the high temperature (1000 degrees F) thrust magnetic bearing are described. The fault-tolerant magnetic bearing test facility was upgraded to operate to 40,000 RPM. The necessary upgrades included new state-of-the-art position sensors with high frequency modulation and new power edge filtering of amplifier outputs. A comparison study of the new sensors and the previous system was done as well as a noise assessment of the sensor-to-controller signals. Also a comparison study of power edge filtering for amplifier-to-actuator signals was done; this information is valuable for all position sensing and motor actuation applications. After these facility upgrades were completed, the rig is believed to have capabilities for 40,000 RPM operation, though this has yet to be demonstrated. Other upgrades included verification and upgrading of safety shielding, and upgrading control algorithms. The rig will now also be used to demonstrate motoring capabilities and control algorithms are in the process of being created. Recently an extreme temperature thrust magnetic bearing was designed from the ground up. The thrust bearing was designed to fit within the existing high temperature facility. The retrofit began near the end of the summer, 04, and continues currently. Contract staff authored a NASA-TM entitled "An Overview of Magnetic Bearing Technology for Gas Turbine Engines", containing a compilation of bearing data as it pertains to operation in the regime of the gas turbine engine and a presentation of how magnetic bearings can become a viable candidate for use in future engine technology.

DeWitt, Kenneth; Clark, Daniel

2004-01-01

365

An Adaptive Job Scheduling with efficient Fault Tolerance Strategy in Computational Grid  

Directory of Open Access Journals (Sweden)

Grid computing is an emerging technology which has the potential to solve large scale scientific problems in an integrated heterogeneous environment. However, in the grid computing environment there are certain aspects which reduce the efficiency of the system. Scheduling the jobs to the best suited resources and achieving load balancing and fault tolerance are the key aspects to improve the efficiency and to exploit the capabilities of emergent computational systems. Because of the dynamic and distributed nature of the grid, the traditional methodologies of scheduling are inefficient for the effective utilization of the available resources. In this paper, an efficient adaptive job scheduling algorithm is proposed to improve the efficiency of the grid system for a large number of tasks. Moreover, the proposed adaptive job scheduling, combined with a fault tolerance strategy based on checkpointing, improves the overall computation time even in the worst scenario under the heterogeneous grid environment. The simulation results illustrate that the proposed strategy effectively schedules the grid jobs with more than 10% increase in overall performance, thus minimizing the overall execution time.

S. Gokuldev

2014-08-01

366

A Self-Stabilizing Hybrid-Fault Tolerant Synchronization Protocol  

Science.gov (United States)

In this report we present a strategy for solving the Byzantine general problem for self-stabilizing a fully connected network from an arbitrary state and in the presence of any number of faults with various severities including any number of arbitrary (Byzantine) faulty nodes. Our solution applies to realizable systems, while allowing for differences in the network elements, provided that the number of arbitrary faults is not more than a third of the network size. The only constraint on the behavior of a node is that the interactions with other nodes are restricted to defined links and interfaces. Our solution does not rely on assumptions about the initial state of the system and no central clock nor centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. We also present a mechanical verification of a proposed protocol. A bounded model of the protocol is verified using the Symbolic Model Verifier (SMV). The model checking effort is focused on verifying correctness of the bounded model of the protocol as well as confirming claims of determinism and linear convergence with respect to the self-stabilization period. We believe that our proposed solution solves the general case of the clock synchronization problem.

Malekpour, Mahyar R.

2014-01-01

367

Transient Faults in Computer Systems  

Science.gov (United States)

A powerful technique particularly appropriate for the detection of errors caused by transient faults in computer systems was developed. The technique can be implemented in either software or hardware; the research conducted thus far primarily considered software implementations. The error detection technique developed has the distinct advantage of having provably complete coverage of all errors caused by transient faults that affect the output produced by the execution of a program. In other words, the technique does not have to be tuned to a particular error model to enhance error coverage. Also, the correctness of the technique can be formally verified. The technique uses time and software redundancy. The foundation for an effective, low-overhead, software-based certification trail approach to real-time error detection resulting from transient fault phenomena was developed.

Masson, Gerald M.

1993-01-01

368

A hybrid framework for design and analysis of fault-tolerant architectures for nanoscale molecular crossbar memories.  

Energy Technology Data Exchange (ETDEWEB)

It is anticipated that self assembled ultra-dense nanomemories will be more susceptible to manufacturing defects and transient faults than conventional CMOS-based memories, thus the need exists for fault-tolerant memory architectures. The development of such architectures will require intense analysis in terms of achievable performance measures - power dissipation, area, delay and reliability. In this paper, we propose and develop a hybrid automation framework, called HMAN, that aids the design and analysis of fault-tolerant architectures for nanomemories. Our framework can analyze memory architectures at two different levels of the design abstraction, namely the system and circuit levels. To the best of our knowledge, this is the first such attempt at analyzing memory systems at different levels of abstraction and then correlating the different performance measures to provide the system designers guidelines for designing a robust nanomemory. We also illustrate the application of our framework to self-assembled crossbar architectures by analyzing a hierarchical fault-tolerant crossbar-based memory architecture that we have developed, and comparing this with existing crossbar architectures.

Graham, P. S. (Paul S.); Gokhale, M. (Maya); Bhaduri, D. (Debayan); Shukla, S. K. (Sandeep K.); Coker, D. (Deji); Taylor, V. (Valerie)

2005-01-01

369

Parameter Estimation Analysis for Hybrid Adaptive Fault Tolerant Control  

Science.gov (United States)

Research efforts have increased in recent years toward the development of intelligent fault tolerant control laws, which are capable of helping the pilot to safely maintain aircraft control at post failure conditions. Researchers at West Virginia University (WVU) have been actively involved in the development of fault tolerant adaptive control laws in all three major categories: direct, indirect, and hybrid. The first implemented design to provide adaptation was a direct adaptive controller, which used artificial neural networks (NNs) to generate augmentation commands in order to reduce the modeling error. Indirect adaptive laws were implemented in another controller, which utilized online PID to estimate and update the controller parameters. Finally, a new controller design was introduced, which integrated both direct and indirect control laws. This controller is known as the hybrid adaptive controller. This last design outperformed the two earlier designs, requiring less NN effort and achieving better tracking quality. The performance of online PID plays an important role in the quality of the hybrid controller; therefore, the quality of the estimation is of great importance. Unfortunately, PID is not perfect and the online estimation process has some inherent issues; the online PID estimates are primarily affected by delays and biases. In order to ensure that reliable estimates are passed to the controller, the estimator needs some time to converge. Moreover, the estimator will often converge to a biased value. This thesis conducts a sensitivity analysis for the estimation issues, delay and bias, and their effect on the tracking quality. In addition, the performance of the hybrid controller as compared to the direct adaptive controller is explored. For this purpose, a simulation environment in MATLAB/SIMULINK has been created. 
The simulation environment is customized to provide the user with the flexibility to add different combinations of biases and delays to the explored derivatives. Biases were considered in the range -500% to 500% and delays in the range 0.5 to 40 seconds. The stability and control derivatives considered in this research effort are a combination of decoupled derivatives in the three channels, longitudinal, lateral, and directional. Numerous simulation scenarios and flight conditions are considered to provide more credibility to the obtained results. In addition, a statistical analysis has been conducted to assess the results. The performance of the control laws has been evaluated in terms of the integral of the error in tracking the three desired angular rates, pitch, roll, and yaw. In addition, the effort of the neural networks exerted to compensate for tracking errors is considered in the analysis as well. The results show that in order to obtain reliable estimates for the investigated derivatives, the estimator needs to generate values with less than five seconds delay. In addition, derivatives estimates are within 50% or -15% off the exact values. Moreover, the importance of updating derivatives depends on the maneuver scenario and the flight condition. The estimation process at quasi-steady state conditions provides reliable estimates as opposed to estimation during fast dynamic changes; also, the estimation process has better performance at large rate of change of derivatives values.

Eshak, Peter B.

370

Clustering and fault tolerance for target tracking using wireless sensor networks  

International Nuclear Information System (INIS)

Over the last few years, the deployment of WSNs (Wireless Sensor Networks) has been fostered in diverse applications. WSNs have great potential for a variety of domains ranging from scientific experiments to commercial applications. Because WSNs are deployed in dynamic and unpredictable environments, they have to cope with a variety of faults. This paper proposes an energy-aware fault-tolerant clustering protocol for target tracking applications, termed the FTTT (Fault Tolerant Target Tracking) protocol. The identification of RNs (Redundant Nodes) makes SN (Sensor Node) fault tolerance feasible, and the clustering supports recovery of sensors supervised by a faulty CH (Cluster Head). The FTTT protocol reduces energy consumption in two steps: first, by identifying RNs in the network; second, by restricting the number of SNs sending data to the CH. Simulations validate the scalability and low power consumption of the FTTT protocol in comparison with the LEACH protocol. (author)

371

Design of Parity Preserving Logic Based Fault Tolerant Reversible Arithmetic Logic Unit  

Directory of Open Access Journals (Sweden)

Reversible logic is gaining significant consideration as a potential logic design style for implementation in modern nanotechnology and quantum computing, with minimal impact on physical entropy. Fault tolerant reversible logic is one class of reversible logic that maintains the parity of the inputs and the outputs. Significant contributions have been made in the literature towards the design of fault tolerant reversible logic gate structures and arithmetic units; however, few efforts have been directed towards the design of fault tolerant reversible ALUs. The Arithmetic Logic Unit (ALU) is the prime performing unit in any computing device and it has to be made fault tolerant. In this paper we aim to design one such fault tolerant reversible ALU, constructed using parity preserving reversible logic gates. The designed ALU can generate up to seven arithmetic operations and four logical operations.
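The parity-preserving property mentioned in this abstract can be checked exhaustively for small gates. The sketch below is only an illustration and does not reproduce the paper's ALU: it uses the Fredkin gate and the Feynman double gate, two gates well known from the reversible-logic literature (whether the paper uses them is an assumption), and contrasts them with the Toffoli gate, which is reversible but does not preserve parity.

```python
from itertools import product

def fredkin(a, b, c):
    """Controlled swap: when a is 1, b and c trade places."""
    return (a, c, b) if a else (a, b, c)

def feynman_double(a, b, c):
    """Feynman double gate: a passes through and is XORed into b and c."""
    return (a, a ^ b, a ^ c)

def toffoli(a, b, c):
    """Controlled-controlled NOT: c is flipped when both a and b are 1."""
    return (a, b, c ^ (a & b))

def parity(bits):
    """XOR of all bits in the tuple."""
    result = 0
    for bit in bits:
        result ^= bit
    return result

def is_reversible(gate):
    """A 3-bit gate is reversible when its 8 inputs map to 8 distinct outputs."""
    inputs = list(product((0, 1), repeat=3))
    return len({gate(*bits) for bits in inputs}) == len(inputs)

def is_parity_preserving(gate):
    """Input parity must equal output parity for every input combination."""
    return all(parity(bits) == parity(gate(*bits))
               for bits in product((0, 1), repeat=3))

for gate in (fredkin, feynman_double, toffoli):
    print(gate.__name__, is_reversible(gate), is_parity_preserving(gate))
```

Because a parity-preserving circuit keeps input and output parity equal, a single-bit fault anywhere in the circuit shows up as a parity mismatch at the outputs, which is the sense in which such designs are called fault tolerant.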

Rakshith Saligram

2013-07-01

372

A Modified Fault Tolerant Location-Based Service Discovery Protocol for Vehicular Networks  

Directory of Open Access Journals (Sweden)

In recent years, advances in vehicular networks have attracted special attention from researchers. Lately two types of applications have gained popularity: road safety and driving comfort. Reliable data transfer in the city environment is hard to accomplish due to the presence of noise and obstacles. In addition, transient or permanent faults of vehicles or roadside routers (road components) are unavoidable, so a fault tolerant algorithm is needed to overcome such failures. Although fault-tolerant techniques bring more efficiency and reliability to service discovery in vehicular networks, very few service discovery algorithms have considered them. In this paper we improve one of these algorithms, the Fault-Tolerant Location-Based Vehicular Service Discovery Protocol (FLocVSD), to make it more reliable.

Saeed Fathi Ghiri

2012-08-01

373

Group-based Scheduling Algorithm for Fault Tolerance in Mobile Grid  

Science.gov (United States)

Mobile Grid is a branch of Grid computing where the infrastructure includes mobile devices. Because mobile devices are resource-constrained, mobile Grid should provide new scheduling strategies considering its environment. This paper presents a group-based fault tolerance scheduling algorithm. The algorithm classifies mobile devices into several groups considering characteristic parameters of mobile Grid. Then, it uses an adaptive replication algorithm for enduring faults in an active manner. The experimental results show that our scheduling algorithm provides a superior performance in terms of execution times to the one without considering grouping and fault tolerance. Throughout the experiments, we found that the active fault tolerance (i.e., replication) is essential to improving performance in mobile Grid.

Lee, Jonghyuk; Choi, Sungjin; Suh, Taeweon; Yu, Heonchang; Gil, Joonmin

374

Fault-Tolerance and Load-Balance Tradeoff in a Distributed Storage System / Estudio de la interdependencia entre tolerancia a fallas y balance de carga en un sistema de almacenamiento distribuido  

Scientific Electronic Library Online (English)

SciELO Mexico | Language: English. Abstract in English: In recent years distributed storage systems have been the object of increasing interest by the research community. They promise improvements in information availability, security and integrity. Nevertheless, at this point in time, there is no predominant approach, but rather a wide spectrum of proposals in the literature. In this paper we report our findings with a combination of redundancy techniques intended to simultaneously provide fault tolerance and load balance in a small-scale distributed storage system. Based on our analysis, we provide general guidelines for system designers and developers under similar conditions.

Moisés, Quezada Naquid; Ricardo, Marcelín Jiménez; Miguel, López Guerrero.

2010-12-01

375

Design and modelling of permanent magnet machine's windings for fault-tolerant applications  

Digital Repository Infrastructure Vision for European Research (DRIVER)

The research described in this thesis focuses on the mitigation of inter-turn short-circuit (SC) faults in Fault tolerant Permanent Magnet (FT-PM) machines. An analytical model is proposed to evaluate the inter-turn SC fault current accounting for the location in the slot of the short-circuited turn(s). As a mitigation strategy to SC faults at the design stage, a winding arrangement called VSW (Vertically placed Strip Winding) is proposed and analysed. The proposed analytical model is benchma...

Arumugam, Puvaneswaran

2013-01-01

376

Fault Tolerance Implementation within SRAM Based FPGA Designs based upon Single Event Upset Occurrence Rates  

Science.gov (United States)

Emerging technology is enabling the design community to consistently expand the amount of functionality that can be implemented within Integrated Circuits (ICs). As the number of gates placed within an FPGA increases, the complexity of the design can grow exponentially. Consequently, creating reliable circuits has become an incredibly difficult task. In order to ease the complexity of design completion, the commercial design community has developed a very rigid (but effective) design methodology based on synchronous circuit techniques. In order to create faster, smaller and lower power circuits, transistor geometries and core voltages have decreased. In environments that contain ionizing energy, such a combination will increase the probability of Single Event Upsets (SEUs) and will consequently affect the state space of a circuit. In order to combat the effects of radiation, the aerospace community has developed several "Hardened by Design" (fault tolerant) design schemes. This paper will address design mitigation schemes targeted for SRAM Based FPGA CMOS devices. Because some mitigation schemes may be overzealous (too much power, area, complexity, etc.), the designer should be conscious that system requirements can ease the amount of mitigation necessary for acceptable operation. Therefore, various degrees of Fault Tolerance will be demonstrated along with an analysis of their effectiveness.

Berg, Melanie

2006-01-01

377

Minimum sliding mode error feedback control for fault tolerant reconfigurable satellite formations with J2 perturbations  

Science.gov (United States)

Minimum Sliding Mode Error Feedback Control (MSMEFC) is proposed to improve the control precision of spacecraft formations based on the conventional sliding mode control theory. This paper proposes a new approach to estimate and offset the system model errors, which include various kinds of uncertainties and disturbances, as well as smoothes out the effect of nonlinear switching control terms. To facilitate the analysis, the concept of equivalent control error is introduced, which is the key to the utilization of MSMEFC. A cost function is formulated on the basis of the principle of minimum sliding mode error; then the equivalent control error is estimated and fed back to the conventional sliding mode control. It is shown that the sliding mode after the MSMEFC will approximate to the ideal sliding mode, resulting in improved control performance and quality. The new methodology is applied to spacecraft formation flying. It guarantees global asymptotic convergence of the relative tracking error in the presence of J2 perturbations. In addition, some fault tolerant situations such as thruster failure for a period of time, thruster degradation and so on, are also considered to verify the effectiveness of MSMEFC. Numerical simulations are performed to demonstrate the efficacy of the proposed methodology to maintain and reconfigure the satellite formation with the existence of initial offsets and J2 perturbation effects, even in the fault-tolerant cases.

Cao, Lu; Chen, Xiaoqian; Misra, Arun K.

2014-03-01

378

New Design for Quantum Dots Cellular Automata to obtain Fault Tolerant Logic Gates  

Energy Technology Data Exchange (ETDEWEB)

In this paper, we analyze fault tolerance properties of the Majority Gate, as the main logic gate for implementation with Quantum dots Cellular Automata (QCA), in terms of fabrication defect. Our results demonstrate the poor fault tolerance properties of the conventional design of Majority Gate and thus the difficulty in its practical application. We propose a new approach to the design of QCA-based Majority Gate by considering two-dimensional arrays of QCA cells rather than a single cell for the design of such a gate. We analyze fault tolerance properties of such Block Majority Gates in terms of inputs misalignment and irregularity and defect (missing cells) in assembly of the array. We present simulation results based on semiconductor implementation of QCA with an intermediate dimensional dot of about 5 nm in size as opposed to magnetic dots of greater than 100 nm or molecular dots of 2-5A. Our results clearly demonstrate the superior fault tolerance properties of the Block Majority Gate and its greater potential for a practical realization. We also show the possibility of designing fault tolerant QCA circuits by using Block Majority Gates.

Fijany, Amir; Toomarian, Benny N. [California Institute of Technology, Jet Propulsion Laboratory (United States)

2001-02-15

379

New Design for Quantum Dots Cellular Automata to obtain Fault Tolerant Logic Gates  

International Nuclear Information System (INIS)

In this paper, we analyze fault tolerance properties of the Majority Gate, as the main logic gate for implementation with Quantum dots Cellular Automata (QCA), in terms of fabrication defect. Our results demonstrate the poor fault tolerance properties of the conventional design of Majority Gate and thus the difficulty in its practical application. We propose a new approach to the design of QCA-based Majority Gate by considering two-dimensional arrays of QCA cells rather than a single cell for the design of such a gate. We analyze fault tolerance properties of such Block Majority Gates in terms of inputs misalignment and irregularity and defect (missing cells) in assembly of the array. We present simulation results based on semiconductor implementation of QCA with an intermediate dimensional dot of about 5 nm in size as opposed to magnetic dots of greater than 100 nm or molecular dots of 2-5A. Our results clearly demonstrate the superior fault tolerance properties of the Block Majority Gate and its greater potential for a practical realization. We also show the possibility of designing fault tolerant QCA circuits by using Block Majority Gates.

380

High available and fault tolerant mobile communications infrastructure  

DEFF Research Database (Denmark)

High availability is a key requirement in mobile communication systems, especially when they are used for mission-critical services such as public safety, e.g. police, ambulance and fire services. A failure in the fixed network infrastructure that provides services to mobile users can affect a large number of users and risk loss of lives. The fixed infrastructure of a mobile communication system has distinct characteristics, for example architectural complexity, real-time peer-to-peer communication and performance requirements, that make existing failure recovery techniques, such as those using rollback or replication, inapplicable. This dissertation presents a novel failure recovery approach based on a behavioral model of the communication protocols. The new recovery method is able to deal with software and hardware faults and is particularly suitable for mobile communications infrastructure. The method enables the faulty applications in the infrastructure to quickly and effectively resume their services to their mobile clients with no or minimal loss of work after failure. In our approach, we do not assume a specific fault behavior, for example fail-stop or transient behavior, as is the case for many recovery techniques. In addition, the method does not require any modification to mobile clients. The Communicating Extended Finite State Machine (CEFSM) is used to model the behavior of the infrastructure applications. The model-based recovery scheme is integrated in the application and uses the client/server model to save the application state information to stable storage during failure-free execution and to retrieve it when needed during recovery. When and what information is to be saved or retrieved is determined by the behavioral model of the application. To practically evaluate and demonstrate the effectiveness of our method, we developed as a case study an experimental testbed for the TETRA (TErrestrial Trunked Radio) packet data network.
The testbed works as a distributed system and can run various communication scenarios between the fixed network infrastructure and its mobile users. We thoroughly followed the TETRA standard specifications in our implementation of the communication protocols in order to get a testbed system that operates as the real system with respect to message exchange and timing. The experimental results showed that by using our method the faulty infrastructure application can immediately resume its service after its restart and in less than a minute, it restores its service performance level prior to the failure. The failure-free overhead incurred by the method is relatively low, and is experimentally found to be less than 5% in the conducted experiments.

Beiroumi, Mohammad Zib

2006-01-01

 
 
 
 
381

Solar Dynamic Power System Fault Diagnosis  

Science.gov (United States)

The objective of this research is to conduct various fault simulation studies for diagnosing the type and location of faults in the power distribution system. Different types of faults are simulated at different locations within the distribution system and the faulted waveforms are monitored at measurable nodes such as at the output of the DDCUs. These fault signatures are processed using feature extractors such as FFT and wavelet transforms. The extracted features are fed to a clustering based neural network for training and subsequent testing using previously unseen data. Different load models consisting of constant impedance and constant power are used for the loads. Open circuit faults and short circuit faults are studied. It is concluded from the present studies that using features extracted from wavelet transforms gives better success rates during ANN testing. The trained ANNs are capable of diagnosing fault types and approximate locations in the solar dynamic power distribution system.

Momoh, James A.; Dias, Lakshman G.

1996-01-01

382

ECFS: A decentralized, distributed and fault-tolerant FUSE filesystem for the LHCb online farm  

Science.gov (United States)

The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics. In order to filter out the "bad events" a large farm of x86 servers (~2000 nodes) has been put in place. These servers boot from and run from NFS; however, they use their local disks to temporarily store data that cannot be processed in real time ("data-deferring"). These events are subsequently processed when there is no live data coming in. The effective CPU power is thus greatly increased. This gain in CPU power depends critically on the availability of the local disks. For cost and power reasons, mirroring (RAID-1) is not used, leading to a lot of operational headaches with failing disks, disk errors, or server failures induced by faulty disks. To mitigate these problems and increase the reliability of the LHCb farm, while at the same time keeping cost and power consumption low, an extensive study of existing highly available and distributed file systems has been done. While many distributed file systems provide reliability by "file replication", none of the evaluated ones supports erasure algorithms. A decentralised, distributed and fault-tolerant "write once read many" file system has been designed and implemented as a proof of concept; its main goals are to provide fault tolerance without using file replication techniques, which are expensive in terms of disk space, and to provide a unique namespace. This paper describes the design and the implementation of the Erasure Codes File System (ECFS) and presents the specialised FUSE interface for Linux. Depending on the encoding algorithm, ECFS will use a certain number of target directories as a backend to store the segments that compose the encoded data. When the target directories are mounted via nfs/autofs, ECFS will act as a file system over network/block-level RAID over multiple servers.
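To make the contrast between erasure coding and replication concrete, here is a minimal sketch of the simplest possible erasure code: k data segments plus one XOR parity segment, which survives the loss of any single segment at a storage cost of (k+1)/k instead of the 2x of mirroring. The abstract does not say which encoding algorithms ECFS uses, so the scheme and the function names `encode` and `decode` are illustrative assumptions, not the ECFS implementation.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data, k):
    """Split data into k zero-padded segments and append one XOR parity segment."""
    seg_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(seg_len * k, b"\0")
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = bytes(seg_len)
    for segment in segments:
        parity = xor_bytes(parity, segment)
    return segments + [parity]

def decode(segments, original_len):
    """Rebuild the data when at most one segment is lost (marked as None)."""
    missing = [i for i, segment in enumerate(segments) if segment is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost segment")
    segments = list(segments)
    if missing:
        present = [s for s in segments if s is not None]
        rebuilt = bytes(len(present[0]))
        for segment in present:
            rebuilt = xor_bytes(rebuilt, segment)
        segments[missing[0]] = rebuilt
    # Drop the parity segment and the zero padding.
    return b"".join(segments[:-1])[:original_len]

data = b"deferred event data"
stored = encode(data, 4)   # in ECFS terms, one segment per target directory
stored[2] = None           # simulate one failed disk
print(decode(stored, len(data)))
```

Tolerating more than one simultaneous failure needs a stronger code, such as Reed-Solomon, at the price of more parity segments.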

Rybczynski, Tomasz; Bonaccorsi, Enrico; Neufeld, Niko

2014-06-01

383

High-Order Sliding Mode Control of a DFIG-Based Wind Turbine for Power Maximization and Grid Fault Tolerance  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This paper deals with power extraction maximization and grid fault tolerance of a Doubly-Fed Induction Generator (DFIG)-based Wind Turbine (WT). These variable speed systems have several advantages over the traditional wind turbine operating methods, such as the reduction of the mechanical stress and an increase in the energy capture. To fully exploit this latest advantage, many efforts have been made to develop Maximum Power Point Tracking (MPPT) control schemes. In this context, this paper ...

Beltran, Brice; Benbouzid, Mohamed; Ahmed-ali, Tarek

2009-01-01

384

Fault-Tolerance through Message-logging and Check-pointing: Disaster Recovery for CORBA-based Distributed Bank Servers  

Digital Repository Infrastructure Vision for European Research (DRIVER)

This report presents results of our endeavor towards developing a failure-recovery variant of a CORBA-based bank server that provides fault tolerance features through message logging and checkpoint logging. In this group of projects, three components were developed to satisfy the requirements: 1) a message-logging protocol for the branch servers of the distributed banking system to log required information; 2) a recovery module that restarts the bank server using the message...

Vassev, Emil; Nguyen, Que Thu Dung; Kuang, Heng

2009-01-01

385

Design of Fault-Tolerant and Dynamically-Reconfigurable Microfluidic Biochips  

CERN Document Server

Microfluidics-based biochips are soon expected to revolutionize clinical diagnosis, DNA sequencing, and other laboratory procedures involving molecular biology. Most microfluidic biochips are based on the principle of continuous fluid flow and they rely on permanently-etched microchannels, micropumps, and microvalves. We focus here on the automated design of "digital" droplet-based microfluidic biochips. In contrast to continuous-flow systems, digital microfluidics offers dynamic reconfigurability; groups of cells in a microfluidics array can be reconfigured to change their functionality during the concurrent execution of a set of bioassays. We present a simulated annealing-based technique for module placement in such biochips. The placement procedure not only addresses chip area, but it also considers fault tolerance, which allows a microfluidic module to be relocated elsewhere in the system when a single cell is detected to be faulty. Simulation results are presented for a case study involving the polymeras...

Su, Fei

2011-01-01

386

Implementation of the Six Channel Redundancy to achieve fault tolerance in testing of satellites  

CERN Document Server

This paper aims to implement six-channel redundancy to achieve fault tolerance in the testing of satellites with an acoustic spectrum. We mainly focus here on achieving fault tolerance. An immediate application is microphone data acquisition and analysis at the Acoustic Test Facility (ATF) centre, National Aerospace Laboratories. It has an 1100 cubic meter reverberation chamber in which a maximum sound pressure level of 157 dB is generated. Six-channel redundancy software with fault tolerant operation is devised and developed. The data are fed to a program written in C, which is run in Code Composer Studio and accepts the inputs. This is tested with the TMS320C6727 DSP Pro Audio Development Kit (PADK).

Aravinda, H S; Moodithaya, Ranjan

2010-01-01

387

Machine-checked proofs of the design and implementation of a fault-tolerant circuit  

Science.gov (United States)

A formally verified implementation of the 'oral messages' algorithm of Pease, Shostak, and Lamport is described. An abstract implementation of the algorithm is verified to achieve interactive consistency in the presence of faults. This abstract characterization is then mapped down to a hardware level implementation which inherits the fault-tolerant characteristics of the abstract version. All steps in the proof were checked with the Boyer-Moore theorem prover. A significant result is the demonstration of a fault-tolerant device that is formally specified and whose implementation is proved correct with respect to this specification. A significant simplifying assumption is that the redundant processors behave synchronously. A mechanically checked proof that the oral messages algorithm is 'optimal', in the sense that no algorithm which achieves agreement via similar message passing can tolerate a larger proportion of faulty processors, is also described.
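For readers unfamiliar with the oral messages algorithm, the recursion can be simulated in a few lines. The sketch below follows the textbook formulation OM(m) for synchronous processors; it is not the verified circuit from the paper. The traitor behaviour in `send` (a faulty sender transmits a bit that depends on the receiver) and the tie-breaking `DEFAULT` are assumptions chosen to keep the simulation deterministic.

```python
from collections import Counter

DEFAULT = 0  # value chosen when there is no strict majority (assumption)

def send(sender, receiver, value, traitors):
    """A loyal sender relays the value; a traitor sends a receiver-dependent bit."""
    return receiver % 2 if sender in traitors else value

def majority(values):
    """Strict majority of the values, or DEFAULT on a tie."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT
    return counts[0][0]

def om(m, commander, lieutenants, value, traitors):
    """Run OM(m) and return {lieutenant: decided value}."""
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant j acts as commander in OM(m-1), relaying what it received.
    relayed = {}
    for j in lieutenants:
        others = [l for l in lieutenants if l != j]
        relayed[j] = om(m - 1, j, others, received[j], traitors)
    decided = {}
    for i in lieutenants:
        votes = [received[i]] + [relayed[j][i] for j in lieutenants if j != i]
        decided[i] = majority(votes)
    return decided

# Four processors tolerate one fault with OM(1).
print(om(1, 0, [1, 2, 3], 1, traitors={3}))  # faulty lieutenant
print(om(1, 0, [1, 2, 3], 1, traitors={0}))  # faulty commander
```

With a faulty lieutenant, the loyal lieutenants still adopt the loyal commander's value; with a faulty commander, all lieutenants still agree with each other. These are the two interactive consistency conditions, and the optimality result cited in the abstract is that this needs more than 3m processors to tolerate m faults.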

Bevier, William R.; Young, William D.

1990-01-01

388

A Fault Tolerant Congestion Aware Routing Protocol for Mobile Adhoc Networks  

Directory of Open Access Journals (Sweden)

Problem statement: The performance of ad hoc routing protocols degrades significantly when there are faulty nodes in the network. Congestion causes packet losses and bandwidth degradation, and time and energy are wasted during recovery. The fault tolerant congestion aware routing protocol addresses these problems by exploring the network redundancy through multipath routing. Approach: In this study, a fault tolerant congestion aware multipath routing protocol is proposed to reduce route breakages and congestion losses. The AOMDV protocol is used as a base for the multipath routing. The proposed scheme enables more nodes to salvage a dropped packet. Results: Simulation results show that the proposed protocol achieves better throughput and packet delivery ratio with reduced delay, packet drop and energy. Conclusion: The congestion control technique proposed in this study proactively detects node level and link level congestion and performs congestion control using the fault-tolerant multiple paths.

K. Duraiswamy

2012-01-01

389

Evolutionary Based Techniques for Fault Tolerant Field Programmable Gate Arrays  

Science.gov (United States)

The use of SRAM-based Field Programmable Gate Arrays (FPGAs) is becoming more and more prevalent in space applications. Commercial-grade FPGAs are potentially susceptible to permanently debilitating Single-Event Latchups (SELs). Repair methods based on Evolutionary Algorithms may be applied to FPGA circuits to enable successful fault recovery. This paper presents the experimental results of applying such methods to repair four commonly used circuits (quadrature decoder, 3-by-3-bit multiplier, 3-by-3-bit adder, 4-to-7 decoder) into which a number of simulated faults have been introduced. The results suggest that evolutionary repair techniques can improve the process of fault recovery when used instead of or as a supplement to Triple Modular Redundancy (TMR), which is currently the predominant method for mitigating FPGA faults.
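Triple Modular Redundancy, the baseline this abstract compares against, amounts to running three copies of a circuit and taking a bitwise majority vote. The following sketch models it in software for a 3-by-3-bit multiplier, one of the circuits named above; the stuck-at-one fault model and the helper names are illustrative assumptions, not the paper's experimental setup.

```python
def tmr_vote(a, b, c):
    """Bitwise majority: each output bit takes the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)

def multiplier_3x3(x, y):
    """Reference 3-by-3-bit multiplier (operands masked to 3 bits)."""
    return (x & 0b111) * (y & 0b111)

def with_stuck_at_one(circuit, bit):
    """Wrap a circuit so that one output bit is permanently stuck at 1."""
    def faulty(*args):
        return circuit(*args) | (1 << bit)
    return faulty

def tmr(replicas, *args):
    """Evaluate three replicas and vote on their outputs."""
    a, b, c = (replica(*args) for replica in replicas)
    return tmr_vote(a, b, c)

replicas = [multiplier_3x3, with_stuck_at_one(multiplier_3x3, 0), multiplier_3x3]
print(tmr(replicas, 6, 5))  # the single faulty replica is outvoted
```

The vote masks a fault in any one replica but does not repair it, so a second permanent fault in another replica can defeat the scheme; that limitation is the motivation for the repair-based approach the abstract describes.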

Larchev, Gregory V.; Lohn, Jason D.

2006-01-01

390

Study on the Fault-Tolerance Concept of the Five-Phase Permanent Magnet Synchronous Generator  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In this paper an investigation of the fault tolerance capability of the five-phase permanent magnet synchronous generator is presented. The electric machine, which has a lap stator winding and surface permanent magnets, has been designed for islanded use. The study takes into consideration open-circuit faults. Operation was analyzed under healthy, one-phase open-circuited and two-phase open-circuited (adjacent and non-adjacent) conditions. T...

Livadaru, L.; Munteanu, A.; Simion, A.; Virlan, B.; Benelghali, S.

2014-01-01

391

Fault Tolerant Attitude Control for Flexible Satellite with Uncertainties and Actuator Saturation  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A novel fault tolerant control scheme, using model-based control and time delay control theories, is proposed for flexible satellites with uncertainties and actuator saturation and the stability condition of the scheme is analysed. The moment-of-inertia uncertainty, actuator fault uncertainty, space environment disturbances and the actuator saturation are analysed. The computable control torques, including the space environmental torques, reaction wheel dynamics and the known flexible ...

Qiang Meng; Tao Zhang; Da-chuan Li; Jie-mei Liang; Bo Liu; Jing-yan Song

2013-01-01

392

A Modified Fault Tolerant Location-Based Service Discovery Protocol for Vehicular Networks  

Digital Repository Infrastructure Vision for European Research (DRIVER)

In recent years, advances in vehicular networks have attracted special attention from researchers. Lately two types of applications have gained popularity: road safety and driving comfort. Reliable data transfer in the city environment is hard to accomplish due to the presence of noise and obstacles. In addition, transient or permanent faults of vehicles or roadside routers (road components) are unavoidable, so we need a fault tolerant algorithm to overcome such failures. Although utilizin...

Saeed Fathi Ghiri; Morteza Rahmani; Hassan Almasi

2012-01-01

393

The Rainbow Skip Graph: A Fault-Tolerant Constant-Degree P2P Relay Structure  

Digital Repository Infrastructure Vision for European Research (DRIVER)

We present a distributed data structure, which we call the rainbow skip graph. To our knowledge, this is the first peer-to-peer data structure that simultaneously achieves high fault tolerance, constant-sized nodes, and fast update and query times for ordered data. It is a non-trivial adaptation of the SkipNet/skip-graph structures of Harvey et al. and Aspnes and Shah, so as to provide fault-tolerance as these structures do, but to do so using constant-sized nodes, as in the...

Goodrich, Michael T.; Nelson, Michael J.; Sun, Jonathan Z.

2009-01-01

394

Evaporator unit as a benchmark for plug and play and fault tolerant control  

DEFF Research Database (Denmark)

This paper presents a challenging industrial benchmark for implementation of control strategies under realistic working conditions. The developed control strategies should perform in a plug & play manner, i.e. adapt to varying working conditions, optimize their performance, and provide fault tolerance. A fault tolerant strategy is needed to deal with a faulty sensor measurement of the evaporation pressure. The design and algorithmic challenges in the control of an evaporator include: unknown model parameters, large parameter variations, varying loads, and external discrete phenomena such as compressor switch on/off or abrupt change in compressor speed.

Izadi-Zamanabadi, Roozbeh; Vinther, Kasper

2012-01-01

395

Fault Tolerant Variable Block Carry Skip Logic (VBCSL) using Parity Preserving Reversible Gates  

CERN Document Server

Reversible logic design has become one of the promising research directions in low power dissipating circuit design in the past few years and has found application in low power CMOS design, digital signal processing and nanotechnology. This paper presents efficient design approaches for fault tolerant carry skip adders (FTCSAs) and compares those designs with existing ones. Variable block carry skip logic (VBCSL) using fault tolerant full adders (FTFAs) has also been developed. The designs are minimized in terms of hardware complexity, gate count, constant inputs and garbage outputs. In addition, a technology independent evaluation of the proposed designs clearly demonstrates their superiority over the existing counterparts.

Islam, Md Saiful; Begum, Zerina; Hafiz, Mohd Zulfiquar

2010-01-01

396

Active fault detection in MIMO systems  

DEFF Research Database (Denmark)

The focus in this paper is on active fault detection (AFD) for MIMO systems with parametric faults. The problem of designing auxiliary inputs for the detection of parametric faults is investigated. An analysis of the design of auxiliary inputs is given, based on analytic transfer functions from the auxiliary input to the residual outputs. The analysis is based on a singular value decomposition of these transfer functions. Based on this analysis, it is possible to design the auxiliary inputs as well as the associated residual vector with respect to every single parametric fault in the system, such that these faults can be detected.

Niemann, Hans Henrik; Poulsen, Niels Kjølstad

2014-01-01

397

Fault trees for diagnosis of system fault conditions  

International Nuclear Information System (INIS)

Methods for generating repair checklists on the basis of fault tree logic and probabilistic importance are presented. A one-step-ahead optimization procedure, based on the concept of component criticality, minimizing the expected time to diagnose system failure is outlined. Options available to the operator of a nuclear power plant when system fault conditions occur are addressed. A low-pressure emergency core cooling injection system, a standby safeguard system of a pressurized water reactor power plant, is chosen as an example illustrating the methods presented.
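As a rough illustration of how a checklist can be ordered by probabilistic importance, the sketch below ranks the components of a tiny fault tree by criticality importance, taken here as the Birnbaum measure scaled by the component's failure probability and divided by the top-event probability. The tree, the component names and the failure probabilities are all hypothetical, and the record's one-step-ahead optimization of diagnosis time is not reproduced.

```python
from itertools import product

# Hypothetical failure probabilities, for illustration only.
FAIL_PROB = {"valve": 0.01, "pump_a": 0.1, "pump_b": 0.2}

def top_event(state):
    """Hypothetical tree: the system fails if the valve fails OR both pumps fail."""
    return state["valve"] or (state["pump_a"] and state["pump_b"])

def top_probability(fixed=None):
    """P(top event) by enumerating independent component states; `fixed` pins some."""
    fixed = fixed or {}
    free = [c for c in FAIL_PROB if c not in fixed]
    total = 0.0
    for combo in product((False, True), repeat=len(free)):
        state = dict(fixed, **dict(zip(free, combo)))
        weight = 1.0
        for component, failed in zip(free, combo):
            weight *= FAIL_PROB[component] if failed else 1 - FAIL_PROB[component]
        if top_event(state):
            total += weight
    return total

def birnbaum(component):
    """Change in top-event probability between the component failed and working."""
    return top_probability({component: True}) - top_probability({component: False})

def checklist():
    """Components sorted by criticality importance, most suspect first."""
    p_top = top_probability()
    scores = {c: birnbaum(c) * FAIL_PROB[c] / p_top for c in FAIL_PROB}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

for component, score in checklist():
    print(component, round(score, 3))
```

In this toy tree the two pumps receive the same score, because they can only fail the system jointly, and both rank above the valve: given that the system has failed, a double pump failure is the more likely explanation.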

398

Fault tolerant attitude sensing and force feedback control for unmanned aerial vehicles  

Science.gov (United States)

Two aspects of an unmanned aerial vehicle are studied in this work. One is fault tolerant attitude determination and the other is to provide force feedback to the joy-stick of the UAV so as to prevent faulty inputs from the pilot. Determination of attitude plays an important role in control of aerial vehicles. One way of defining the attitude is through Euler angles. These angles can be determined based on the measurements of the projections of the gravity and earth magnetic fields on the three body axes of the vehicle. Attitude determination in unmanned aerial vehicles poses additional challenges due to limitations of space, payload, power and cost. These limitations leave almost no room for bulky sensors or extra backup sensor hardware, and therefore little margin for sensor faults. In the face of these limitations, this study proposes fault tolerant computation of Euler angles by utilizing multiple different computation methods, with each method utilizing a different subset of the available sensor measurement data. Twenty-five such methods are presented in this document. The capability of computing the Euler angles in multiple ways provides the diversified redundancy required for fault tolerance. The proposed approach can identify certain sets of sensor failures and even separate the reference fields from the disturbances. A bank-to-turn maneuver of the NASA GTM UAV is used to demonstrate the fault tolerance provided by the proposed method as well as to demonstrate the method of determining the correct Euler angles despite interferences by inertial acceleration disturbances. Attitude computation is essential for stability. But as of today most UAVs are commanded remotely by human pilots. While basic stability control is entrusted to the machine or the on-board automatic controller, overall guidance is usually with humans. It is therefore the pilot who sets the command/references through a joy-stick.
While this is a good compromise between complete automation and complete human control, it still poses some unique challenges. Pilots of manned aircraft are present inside the cockpit of the aircraft they fly and thus have a better feel of the flying environment and also the limitations of the flight. The same might not be true for UAV pilots stationed on the ground. A major handicap is that visual feedback is the only one available for the UAV pilot. An additional parameter like force feedback on the remote control joy-stick can help the UAV pilot to physically feel the limitation of the safe flight envelope. This can make the flying itself easier and safer. A method proposed here is to design a joy-stick assembly with an additional actuator. This actuator is controlled so as to generate a force feedback on the joy-stick. The control developed for this system is such that the actuator allows free movement for the pilot as long as the UAV is within the safe flight envelope. On the other hand, if it is outside this safe range, the actuator opposes the pilot's applied torque and prevents him/her from giving erroneous commands to the UAV.
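The abstract states that Euler angles can be determined from the projections of the gravity and magnetic fields on the body axes. The following is a minimal sketch of one conventional way to do that, not one of the twenty-five methods of the thesis. It assumes a North-East-Down reference frame, a yaw-pitch-roll rotation order, a magnetic field with no east component, and no inertial acceleration; the function names `ned_to_body` and `euler_from_fields` are hypothetical.

```python
import math

def ned_to_body(v, roll, pitch, yaw):
    """Project a North-East-Down vector onto the body axes (yaw, then pitch, then roll)."""
    x, y, z = v
    cy, sy = math.cos(yaw), math.sin(yaw)
    x, y = cy * x + sy * y, -sy * x + cy * y      # rotate about the down axis
    cp, sp = math.cos(pitch), math.sin(pitch)
    x, z = cp * x - sp * z, sp * x + cp * z       # rotate about the new y axis
    cr, sr = math.cos(roll), math.sin(roll)
    y, z = cr * y + sr * z, -sr * y + cr * z      # rotate about the new x axis
    return (x, y, z)

def euler_from_fields(g_body, m_body):
    """Recover (roll, pitch, yaw) from body-axis gravity and magnetic field projections.

    Assumes |pitch| < 90 degrees and that g_body contains gravity only.
    """
    gx, gy, gz = g_body
    mx, my, mz = m_body
    roll = math.atan2(gy, gz)
    pitch = math.atan2(-gx, math.hypot(gy, gz))
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    # Undo roll and pitch so the magnetic vector lies in the local level frame.
    mh_x = cp * mx + sp * (sr * my + cr * mz)
    mh_y = cr * my - sr * mz
    yaw = math.atan2(-mh_y, mh_x)
    return roll, pitch, yaw

# Round trip: synthesize body-axis measurements from known angles, then recover them.
true_angles = (0.2, -0.3, 1.0)                     # radians, illustrative values
g_b = ned_to_body((0.0, 0.0, 9.81), *true_angles)  # gravity points down in NED
m_b = ned_to_body((0.2, 0.0, 0.4), *true_angles)   # made-up field: north and down parts
print(euler_from_fields(g_b, m_b))
```

The round trip should recover the original angles to within rounding error. Redundancy of the kind the abstract describes would come from computing the same angles through several such formulas, each using a different subset of the six measurements, and comparing the results.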

Jagadish, Chirag

399

Error-detection-based quantum fault tolerance against discrete Pauli noise  

Digital Repository Infrastructure Vision for European Research (DRIVER)

A quantum computer -- i.e., a computer capable of manipulating data in quantum superposition -- would find applications including factoring, quantum simulation and tests of basic quantum theory. Since quantum superpositions are fragile, the major hurdle in building such a computer is overcoming noise. Developed over the last couple of years, new schemes for achieving fault tolerance based on error detection, rather than error correction, appear to tolerate as much as 3-6% ...

Reichardt, Ben W.

2006-01-01

400

Theory of Decoherence-Free Fault-Tolerant Universal Quantum Computation  

CERN Document Server

Universal quantum computation on decoherence-free subspaces and subsystems (DFSs) is examined with particular emphasis on using only physically relevant interactions. A necessary and sufficient condition for the existence of decoherence-free (noiseless) subsystems in the Markovian regime is derived here for the first time. A stabilizer formalism for DFSs is then developed which allows for the explicit understanding of these in their dual role as quantum error correcting codes. Conditions for the existence of Hamiltonians whose induced evolution always preserves a DFS are derived within this stabilizer formalism. Two possible collective decoherence mechanisms arising from permutation symmetries of the system-bath coupling are examined within this framework. It is shown that in both cases universal quantum computation which always preserves the DFS (*natural fault-tolerant computation*) can be performed using only two-body interactions. This is in marked contrast to standard error correcting codes, where all kn...

Kempe, J; Lidar, D A; Whaley, K B; Kempe, Julia; Bacon, David; Lidar, Daniel A.

2001-01-01

401

Fault-Tolerant Robot Programming through Simulation with Realistic Sensor Models  

Directory of Open Access Journals (Sweden)

We introduce a simulation system for mobile robots that allows a realistic interaction of multiple robots in a common environment. The simulated robots are closely modeled after robots from the EyeBot family and have an identical application programmer interface. The simulation supports driving commands at two levels of abstraction as well as numerous sensors such as shaft encoders, infrared distance sensors, and compass. Simulation of on-board digital cameras via synthetic images allows the use of image processing routines for robot control within the simulation. Specific error models for actuators, distance sensors, camera sensor, and wireless communication have been implemented. Progressively increasing the error levels applied to an application program allows its robustness and fault tolerance to be tested and improved.

Axel Waggershauser

2008-11-01

402

Fault-Tolerant Dissipative Preparation of Atomic Quantum Registers with Fermions  

CERN Document Server

We propose a fault tolerant loading scheme to produce an array of fermions in an optical lattice of the high fidelity required for applications in quantum information processing and the modelling of strongly correlated systems. A cold reservoir of fermions plays a dual role as a source of atoms to be loaded into the lattice via a Raman process and as a heat bath for sympathetic cooling of lattice atoms. Atoms are initially transferred into an excited motional state in each lattice site, and then decay to the motional ground state, creating particle-hole pairs in the reservoir. Atoms transferred into the ground motional level are no longer coupled back to the reservoir, and doubly occupied sites in the motional ground state are prevented by Pauli blocking. This scheme has strong conceptual connections with optical pumping, and can be extended to load high-fidelity patterns of atoms.

Griessner, A; Jaksch, D; Zoller, P

2005-01-01

403

ADHOCFTSIM: A Simulator of Fault Tolerence In the AD-HOC Networks  

Directory of Open Access Journals (Sweden)

The flexibility and diversity of Wireless Mobile Networks offer many opportunities that are not always taken into account by existing distributed systems. In particular, the proliferation of mobile users and the use of mobile Ad-Hoc networks promote the formation of collaborative groups to share resources. We propose a solution for the management of fault tolerance in Ad-Hoc networks, combining the functions needed to improve the availability of data. Our contribution takes into account the characteristics of mobile terminals in order to reduce the consumption of critical resources such as energy, and to minimize the loss of information. Our solution is based on the formation of clusters, each of which is managed by a leader node. The solution is mainly composed of four sub-services, namely: prediction, replication, management of nodes in the cluster, and supervision. We have shown, using several sets of simulations, that the benefit of our solution is twofold: it minimizes energy consumption, which increases the lifetime of the network, and it deals better with lost requests.

Esma Insaf Djebbar

2010-11-01

404

Fault Isolation in Distributed Embedded Systems  

Digital Repository Infrastructure Vision for European Research (DRIVER)

To improve safety, reliability, and efficiency of automotive vehicles and other technical applications, embedded systems commonly use fault diagnosis consisting of fault detection and isolation. Since many systems are constructed as distributed embedded systems including multiple control units, it is necessary to perform global fault isolation using for example a central unit. However, the drawbacks with such a centralized method are the need of a powerful diagnostic unit and the sensitivity ...

Biteus, Jonas

2007-01-01

405

Sensor and Actuator Fault-Hiding Reconfigurable Control Design for a Four-Tank System Benchmark  

DEFF Research Database (Denmark)

Fault detection and compensation play a key role in fulfilling high demands for performance and security in today's technological systems. In this paper, a fault-hiding (i.e., tolerant) control scheme that detects and compensates for actuator and sensor faults in a four-tank system benchmark is introduced. Faults are modeled as a drastic gain loss in actuators (i.e., pumps) and in sensor measurements (i.e., level detection) which could lead to a large loss in the nominal performance. A configurable decentralized Proportional Integral (PI) controller is designed and applied to a Linear Time Invariant (LTI) system where virtual sensors and virtual actuators are used to correct faulty performance through the use of a pre-fault performance. Simulation results showed that the developed approach can handle different types of faults and is able to completely and instantly recover the original system performance/functionality directly after the occurrence of faults.

Hameed, Ibrahim; El-Madbouly, E I

2015-01-01

406

Fault tolerant control of wind turbines using unknown input observers  

DEFF Research Database (Denmark)

This paper presents a scheme for accommodating faults in the rotor and generator speed sensors in a wind turbine. These measured values are important both for the wind turbine controller as well as the supervisory control of the wind turbine. The scheme is based on unknown input observers, which are also used to detect and isolate these faults. The scheme is tested on a known benchmark for FDI and FTC of wind turbines. Tests on this benchmark model show a clear potential of the proposed scheme.

Odgaard, Peter Fogh; Stoustrup, Jakob

2012-01-01

407

Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait  

Directory of Open Access Journals (Sweden)

This paper encompasses a study on the development of a walking gait for fault tolerant locomotion in unstructured environments. The fault tolerant gait for adaptive locomotion fulfills stability conditions in the presence of a fault event (locked joints or sensor failure) that would otherwise prevent a robot from realizing stable locomotion over uneven terrain. To accomplish this, a fault tolerant gait based on force-position control is proposed in this paper for a hexapod robot to enable stable walking with a joint failure. Furthermore, we extend our proposed fault detection and diagnosis (FDD) method to deal with the critical failure of the angular rate sensors responsible for the attitude control of the robot over uneven terrain. A performance analysis of straight-line walking is carried out which shows that the proposed FDD-based gait is capable of generating an adaptive walking pattern during joint or sensor failures. The performance of the proposed control is established using dynamic simulations and real-world experiments on a prototype hexapod robot.

Umar Asif

2012-06-01

408

Fault-tolerant VHDL descriptions: a case study for SEU-tolerant digital library  

Science.gov (United States)

This paper presents a new cost-effective method of designing Single Event Upset (SEU)-tolerant digital systems based on Commercial-Off-The-Shelf (COTS) Field-Programmable-Gate-Array (FPGA) devices. The project was carried out in cooperation between Technical University of Lodz (TUL) and Deutsches Elektronen-Synchrotron (DESY). DESY is a high-energy particle physics research centre, located in Hamburg, Germany, and has been chosen as the home site for a new generation particle collider, the X-Ray Free Electron Laser (X-FEL) accelerator. The need to implement digital control systems inside the accelerator's main tunnel led to a new low-cost hardware approach to designing complex circuits that are reliable with respect to Single Event Effects (SEEs). The goal was to develop a high performance method without modifications in the FPGA architecture and without high area penalties. An SEU-tolerant digital library has been created. Upset detection and mitigation schemes have been implemented in all elements, from basic gates, through combinational and sequential cells, to more sophisticated units such as memory blocks, code converters and arithmetic function cells. The library was described in Very High Speed Integrated Circuit Hardware Description Language (VHDL).
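The abstract does not say which upset detection and mitigation schemes the library uses. As an illustration only, the sketch below models triple modular redundancy with bitwise majority voting, a widely used SEU mitigation technique that is assumed here rather than taken from the paper. It is written in Python instead of VHDL, and the names `vote` and `upset_mask` are hypothetical.

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant register copies."""
    return (a & b) | (a & c) | (b & c)

def upset_mask(a: int, b: int, c: int) -> int:
    """Bits in which at least one copy disagrees with the voted value."""
    v = vote(a, b, c)
    return (a ^ v) | (b ^ v) | (c ^ v)

# Three copies of the same 4-bit word; the third has its top bit flipped.
copies = (0b1011, 0b1011, 0b0011)
print(bin(vote(*copies)))        # voted word
print(bin(upset_mask(*copies)))  # position of the detected upset
```

The voter masks any single-copy upset per bit position, and a non-zero mask signals that a copy should be rewritten with the voted value. Two simultaneous upsets in the same bit position would defeat this scheme.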

Tomczak, M.; Swiercz, B.; Napieralski, A.

2006-10-01

409

Step-by-step magic state encoding for efficient fault-tolerant quantum computation  

Science.gov (United States)

Quantum error correction allows one to make quantum computers fault-tolerant against unavoidable errors due to decoherence and imperfect physical gate operations. However, fault-tolerant quantum computation requires impractically large computational resources for useful applications. This is currently a major obstacle to the realization of a quantum computer. In particular, magic state distillation, which is a standard approach to universality, consumes the most resources in fault-tolerant quantum computation. To address the resource problem, here we propose step-by-step magic state encoding for concatenated quantum codes, where magic states are encoded step by step from the physical level to the logical one. To manage errors during the encoding, we carefully use error detection. Since the sizes of intermediate codes are small, it is expected that the resource overheads will become lower than those of previous approaches based on distillation at the logical level. Our simulation results suggest that the resource requirements for a logical magic state will become comparable to those for a single logical controlled-NOT gate. Thus, the present method opens a new possibility for efficient fault-tolerant quantum computation.

Goto, Hayato

2014-12-01

410

Fault-tolerant scheduling using primary-backup approach for optical grid applications  

Science.gov (United States)

Fault-tolerant scheduling is an important issue for optical grid applications because of a wide range of grid resource failures. To improve the availability of the DAGs (directed acyclic graphs), a primary-backup approach is considered when making DAG scheduling decisions. Experiments demonstrate the effectiveness and the practicability of the proposed scheme.
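The abstract gives no details of the scheduling scheme. The sketch below shows only the general primary-backup idea for a DAG: every task gets two copies placed on different resources, so a single resource failure cannot remove both. It assumes identical resources, that both copies actually run, and that a successor waits for the later copy to finish; optical network constraints are ignored. The function name and data layout are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def schedule_primary_backup(durations, deps, n_resources):
    """Greedy list scheduling with one primary and one backup copy per task.

    durations: {task: run time}; deps: {task: set of predecessor tasks}.
    Returns {task: {"primary": (resource, start, end), "backup": (...)}}.
    """
    if n_resources < 2:
        raise ValueError("primary and backup need two distinct resources")
    free_at = [0.0] * n_resources   # time at which each resource becomes idle
    finish = {}                     # task -> finish time of its later copy
    plan = {}
    for task in TopologicalSorter(deps).static_order():
        ready = max((finish[p] for p in deps.get(task, ())), default=0.0)
        # Choose the two resources on which the task could start earliest.
        ranked = sorted(range(n_resources),
                        key=lambda r: (max(free_at[r], ready), r))
        copies = {}
        for role, res in zip(("primary", "backup"), ranked[:2]):
            start = max(free_at[res], ready)
            free_at[res] = start + durations[task]
            copies[role] = (res, start, free_at[res])
        finish[task] = max(end for _, _, end in copies.values())
        plan[task] = copies
    return plan

# Tiny DAG: task "a" precedes "b" and "c".
plan = schedule_primary_backup(
    durations={"a": 2.0, "b": 3.0, "c": 1.0},
    deps={"a": set(), "b": {"a"}, "c": {"a"}},
    n_resources=3,
)
for task, copies in plan.items():
    print(task, copies)
```

Running both copies is the simplest variant to reason about. A scheme that starts the backup only after the primary fails would use fewer resources, at the cost of a later finish time when a failure does occur.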

Zhu, Min; Xiao, Shilin; Guo, Wei; Wei, Anne; Jin, Yaohui; Hu, Weisheng; Geller, Benoit