WorldWideScience

Sample records for fault tolerant systems

  1. Fault Tolerant Control Systems

    DEFF Research Database (Denmark)

    Bøgh, S.A.

    1997-01-01

    This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety sys...

  2. Fault Tolerant Real Time Systems

    CERN Document Server

    Persya, A Christy

    2010-01-01

    Real time systems are systems in which there is a commitment for timely response by the computer to external stimuli. Real time applications have to function correctly even in the presence of faults. Fault tolerance can be achieved by hardware, software, or time redundancy. Safety-critical applications have strict time and cost constraints, which means that not only must faults be tolerated but the constraints must also be satisfied. Deadline scheduling means that the task with the earliest required response time is processed. The most common scheduling algorithms are Rate Monotonic (RM) and Earliest Deadline First (EDF). This paper deals with the interaction between the fault tolerant strategy and the EDF real time scheduling strategy.
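
    As a rough illustration of how EDF can interact with time redundancy, the sketch below runs a set of jobs in earliest-deadline order and re-executes any job whose first run fails. It is not taken from the paper: the function name, the job tuples, and the assumption that all jobs are released at time 0 on one processor are illustrative only.

```python
import heapq


def edf_with_reexecution(jobs, faulty=frozenset()):
    """Hypothetical sketch: EDF on one processor with re-execution on fault.

    jobs:   list of (name, exec_time, deadline), all released at t = 0.
    faulty: names of jobs whose first run fails and must be run again
            (time redundancy); the re-run keeps the original deadline.
    Returns [(name, finish_time, deadline_met)] in completion order.
    """
    # Heap ordered by deadline, so the earliest deadline is always popped first.
    ready = [(deadline, name, exec_time) for name, exec_time, deadline in jobs]
    heapq.heapify(ready)
    pending_faults = set(faulty)
    t, finished = 0, []
    while ready:
        deadline, name, exec_time = heapq.heappop(ready)
        t += exec_time
        if name in pending_faults:
            # The run failed: spend the time, then queue the job again.
            pending_faults.discard(name)
            heapq.heappush(ready, (deadline, name, exec_time))
            continue
        finished.append((name, t, t <= deadline))
    return finished


jobs = [("A", 2, 10), ("B", 1, 4), ("C", 3, 9)]
print(edf_with_reexecution(jobs))
# [('B', 1, True), ('C', 4, True), ('A', 6, True)]
print(edf_with_reexecution(jobs, faulty={"C"}))
# [('B', 1, True), ('C', 7, True), ('A', 9, True)]
```

    In this example the slack in the schedule absorbs one re-execution of job C without any deadline miss; with tighter deadlines the third field would turn False.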

  3. Fault tolerant control for switched linear systems

    CERN Document Server

    Du, Dongsheng; Shi, Peng

    2015-01-01

    This book presents up-to-date research and novel methodologies on fault diagnosis and fault tolerant control for switched linear systems. It provides a unified yet neat framework of filtering, fault detection, fault diagnosis and fault tolerant control of switched systems. It can therefore serve as a useful textbook for senior and/or graduate students who are interested in knowing the state-of-the-art of filtering, fault detection, fault diagnosis and fault tolerant control areas, as well as recent advances in switched linear systems.  

  4. Soft Computing Approaches To Fault Tolerant Systems

    Directory of Open Access Journals (Sweden)

    Neeraj Prakash Srivastava

    2014-05-01

    We present in this paper an introduction to soft computing techniques for fault tolerant systems, together with the terminology and the different ways of achieving fault tolerance. The paper focuses on the problem of fault tolerance using soft computing techniques. The fundamentals of soft computing approaches and their types are discussed, along with an introduction to fault tolerance. The main objective is to show how to implement soft computing approaches for fault detection, isolation and identification. The paper details soft computing applications, with a wireless sensor network as an example of a fault tolerant system.

  5. Fault Tolerant Quantum Filtering and Fault Detection for Quantum Systems

    OpenAIRE

    Gao, Qing(MOE Key Laboratory of Fundamental Quantities Measurement, School of Physics, Huazhong University of Science and Technology, 430074, Wuhan, Hubei, China); Dong, Daoyi; Petersen, Ian R

    2015-01-01

    This paper aims to determine the fault tolerant quantum filter and fault detection equation for a class of open quantum systems coupled to laser fields and subject to stochastic faults. In order to analyze open quantum systems where the system dynamics involve both classical and quantum random variables, a quantum-classical probability space model is developed. Using a reference probability approach, a fault tolerant quantum filter and a fault detection equation are simultan...

  6. The Low Latency Fault Tolerance System

    OpenAIRE

    Zhao, Wenbing; P. M. Melliar-Smith; L. E. Moser

    2010-01-01

    The Low Latency Fault Tolerance (LLFT) system provides fault tolerance for distributed applications, using the leader-follower replication technique. The LLFT system provides application-transparent replication, with strong replica consistency, for applications that involve multiple interacting processes or threads. The LLFT system comprises a Low Latency Messaging Protocol, a Leader-Determined Membership Protocol, and a Virtual Determinizer Framework. The Low Latency Messag...

  7. Energy-efficient fault-tolerant systems

    CERN Document Server

    Mathew, Jimson; Pradhan, Dhiraj K

    2013-01-01

    This book describes the state-of-the-art in energy efficient, fault-tolerant embedded systems. It covers the entire product lifecycle of electronic systems design, analysis and testing and includes discussion of both circuit and system-level approaches. Readers will be enabled to meet the conflicting design objectives of energy efficiency and fault-tolerance for reliability, given the up-to-date techniques presented.

  8. Fault-tolerant parallel processing system

    Energy Technology Data Exchange (ETDEWEB)

    Harper, R.E.; Lala, J.H.

    1990-03-06

    This patent describes a fault tolerant processing system for providing processing operations, while tolerating f failures in the execution thereof. It comprises: at least (3f + 1) fault containment regions. Each of the regions includes a plurality of processors; network means connected to the processors and to the network means of the others of the fault containment regions; groups of one or more processors being configured to form redundant processing sites at least one of the groups having (2f + 1) processors, each of the processors of a group being included in a different one of the fault containment regions. Each network means of a fault containment region includes means for providing communication operations between the network means and the network means of the others of the fault containment regions, each of the network means being connected to each other network means by at least (2f + 1) disjoint communication paths, a minimum of (f + 1) rounds of communication being provided among the network means of the fault containment regions in the execution of the processing operation; and means for synchronizing the communication operations of the network means with the communications operations of the network means of the other fault containment regions.
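
    The resource counts stated in the abstract all grow linearly with the number of tolerated failures f. The small sketch below simply evaluates those four expressions for a given f; the function and key names are hypothetical and are not part of the patent.

```python
def byzantine_requirements(f):
    """Evaluate the minimum counts quoted in the abstract for tolerating f failures.

    Key names are illustrative; only the formulas come from the abstract.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return {
        "fault_containment_regions": 3 * f + 1,
        "processors_in_redundant_group": 2 * f + 1,
        "disjoint_paths_between_regions": 2 * f + 1,
        "communication_rounds": f + 1,
    }


for f in (1, 2):
    print(f, byzantine_requirements(f))
```

    For f = 1 this gives 4 regions, 3 processors in a redundant group, 3 disjoint paths, and 2 rounds of communication.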

  9. Fault tolerant control of systems with saturations

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik

    2013-01-01

    This paper presents a framework for fault tolerant controllers (FTC) that includes input saturation. The controller architecture known from FTC, which is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization, is extended to handle input saturation. Applying this controller architecture in connection with faulty systems including input saturation gives an additional YJBK transfer function related to the input saturation. In the fault free case, this additional YJBK transfer function can be applied directly for optimizing the feedback loop around the input saturation. In the faulty case, the design problem is a mixed design problem involving both parametric faults and input saturation.

  10. Software engineering of fault tolerant systems

    CERN Document Server

    Pelliccione, P; Muccini, Henry

    2007-01-01

    In architecting dependable systems, fault tolerance is required to improve the overall system robustness. Many methods have been proposed to this end, but the solutions are usually considered late, during the design and implementation phases of the software life-cycle (e.g., Java and Windows NT exception handling), thus reducing the effectiveness of error and fault handling. Since the system design typically models only the normal behaviour of the system while ignoring exceptional ones, the implementation of the system is unable to handle abnormal events. Consequently, the system may fail in unexp

  11. Fault tolerance versus performance metrics for robot systems

    International Nuclear Information System (INIS)

    The incorporation of fault tolerance techniques into robot systems improves the reliability, but also increases the hardware and computational requirements in the overall system. It is not always clear how to evaluate the merit, or 'effectiveness' of different fault tolerance approaches for a given application. In this paper, we present a new set of performance criteria designed to measure and compare the effectiveness of robot fault tolerance strategies. The measures, which are designed to evaluate fault tolerance/performance/cost tradeoffs, can also be used to evaluate pure performance or pure fault tolerance strategies. We show their usefulness using a variety of proposed fault tolerance approaches in the literature, focusing on multiprocessor control architectures

  12. Fault-tolerant Actuator System for Electrical Steering of Vehicles

    DEFF Research Database (Denmark)

    Sørensen, Jesper Sandberg; Blanke, Mogens

    2006-01-01

    Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

  13. Fault-tolerant actuator system for electrical steering of vehicles

    DEFF Research Database (Denmark)

    Thomsen, Jesper Sandberg; Blanke, Mogens

    2006-01-01

    Being critical to the safety of vehicles, the steering system is required to maintain the vehicle's ability to steer until it is brought to a halt, should a fault occur. With electrical steering becoming a cost-effective candidate for electrically powered vehicles, a fault-tolerant architecture is needed that meets this requirement. This paper studies the fault-tolerance properties of an electrical steering system. It presents a fault-tolerant architecture where a dedicated AC motor design used in conjunction with cheap voltage measurements can ensure detection of all relevant faults in the steering system. The paper shows how active control reconfiguration can accommodate all critical faults. The fault-tolerant abilities of the steering system are demonstrated on the hardware of a warehouse truck.

  14. Fault tolerant aggregation for power system services

    DEFF Research Database (Denmark)

    Kosek, Anna Magdalena; Gehrke, Oliver

    2013-01-01

    Exploiting the flexibility in distributed energy resources (DER) is seen as an important contribution to allow high penetrations of renewable generation in electrical power systems. However, the present control infrastructure in power systems is not well suited for the integration of a very large number of small units. A common approach is to aggregate a portfolio of such units together and expose them to the power system as a single large virtual unit. In order to realize the vision of a Smart Grid, concepts for flexible, resilient and reliable aggregation infrastructures are required. This paper presents such a concept while focusing on the aspect of resilience and fault tolerance. The proposed concept makes use of a multi-level election algorithm to transparently manage the addition, removal, failure and reorganization of units. It has been implemented and tested as a proof-of-concept on the distributed smart grid test bed SYSLAB at the Technical University of Denmark.

  15. Comparing Distributed Online Stream Processing Systems Considering Fault Tolerance Issues

    Directory of Open Access Journals (Sweden)

    André Leon Sampaio Gradvohl

    2014-05-01

    This paper presents an analysis of four online stream processing systems (MillWheel, S4, Spark Streaming and Storm) regarding the strategies they use for fault tolerance. Such systems are used to process data streams that can come from different sources such as web sites, sensors, mobile phones or any set of devices that provide real-time high-speed data. Typically, these systems are concerned more with throughput in data processing than with fault tolerance. However, depending on the type of application, fault tolerance should be considered an important feature. The work describes some of the main strategies for fault tolerance (component replication, upstream backup, checkpoint and recovery) and shows how each of the four systems uses these strategies. In the end, the paper discusses the advantages and disadvantages of combining these strategies for fault tolerance in these systems.
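
    To make two of the named strategies concrete, the sketch below combines checkpoint-and-recovery with upstream backup for a toy running-sum operator. It does not reflect how MillWheel, S4, Spark Streaming or Storm implement these strategies; the function name, the `fail_at` parameter, and the in-memory buffer are assumptions for illustration.

```python
def run_with_checkpoints(items, checkpoint_every, fail_at=None):
    """Toy stream operator (running sum) with checkpointing and upstream backup.

    items:            the input stream.
    checkpoint_every: save the operator state after this many items.
    fail_at:          hypothetical index at which the in-memory state is lost
                      just before the item is processed (None = no failure).
    """
    state = {"total": 0}
    checkpoint = dict(state)  # last saved copy of the operator state
    backlog = []              # upstream backup: items sent since the last checkpoint
    for i, x in enumerate(items):
        backlog.append(x)
        if i == fail_at:
            # Failure: the live state is gone. Restore the checkpoint and
            # replay everything the upstream node still holds.
            state = dict(checkpoint)
            for y in backlog:
                state["total"] += y
        else:
            state["total"] += x
        if (i + 1) % checkpoint_every == 0:
            checkpoint = dict(state)
            backlog.clear()   # upstream may discard what the checkpoint covers
    return state["total"]


stream = [1, 2, 3, 4, 5]
print(run_with_checkpoints(stream, checkpoint_every=2))             # 15
print(run_with_checkpoints(stream, checkpoint_every=2, fail_at=2))  # 15
```

    The result is the same with and without the failure because every item not yet covered by a checkpoint is still held upstream and can be replayed.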

  16. Software fault tolerance

    OpenAIRE

    Kazinov, Tofik Hasanaga; Mostafa, Jalilian Shahrukh

    2009-01-01

    Because of our present inability to produce error-free software, software fault tolerance is and will continue to be an important consideration in software systems. The root cause of software design errors is the complexity of the systems. This paper surveys various software fault tolerance techniques and methodologies. They fall into two groups: single-version and multi-version software fault tolerance techniques. It is expected that software fault tolerance research will benefit from this research...

  17. Mine-hoist active fault tolerant control system and strategy

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Z.; Wang, Y.; Meng, J.; Zhao, P.; Chang, Y. [China University of Mining and Technology, Xuzhou (China)] wzjsdstu@163.com

    2005-06-01

    Based on fault diagnosis and fault tolerant technologies, the mine-hoist active fault-tolerant control system (MAFCS) is presented with corresponding strategies, which includes the fault diagnosis module (FDM), the dynamic library (DL) and the fault-tolerant control model (FCM). When a fault is judged from some sensor by the FDM, FCM reconfigures the state of the MAFCS by calling the parameters from all sub libraries in DL, in order to ensure the reliability and safety of the mine hoist. The simulation results show that the MAFCS exhibits a degree of intelligence: it can adopt the corresponding control strategies according to different fault modes, even when there is quite a difference between the real data and the prior fault modes. 7 refs., 5 figs., 1 tab.

  18. From fault classification to fault tolerance for multi-agent systems

    CERN Document Server

    Potiron, Katia; Taillibert, Patrick

    2013-01-01

    Faults are a concern for Multi-Agent Systems (MAS) designers, especially if the MAS are built for industrial or military use because there must be some guarantee of dependability. Some fault classification exists for classical systems, and is used to define faults. When dependability is at stake, such fault classification may be used from the beginning of the system's conception to define fault classes and specify which types of faults are expected. Thus, one may want to use fault classification for MAS; however, From Fault Classification to Fault Tolerance for Multi-Agent Systems argues that

  19. A fault-tolerant software strategy for digital systems

    Science.gov (United States)

    Hitt, E. F.; Webb, J. J.

    1984-01-01

    Techniques developed for producing fault-tolerant software are described. Tolerance is required because of the impossibility of defining fault-free software. Faults are caused by humans and can appear anywhere in the software life cycle. Tolerance is effected through error detection, damage assessment, recovery, and fault treatment, followed by return of the system to service. Multiversion software comprises two or more versions of the software yielding solutions which are examined by a decision algorithm. Errors can also be detected by extrapolation from previous results or by the acceptability of results. Violations of timing specifications can reveal errors, or the system can roll back to an error-free state when a defect is detected. The software, when used in flight control systems, must not impinge on time-critical responses. Efforts are still needed to reduce the costs of developing the fault-tolerant systems.
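
    The multiversion idea described above, several independently written versions whose outputs are examined by a decision algorithm, can be sketched with a simple majority voter. This is a minimal illustration, not the decision algorithm from the report; the function name and the three toy versions are assumptions.

```python
from collections import Counter


def n_version_execute(versions, x):
    """Run every version on x and return the output agreed on by a strict
    majority of all versions. A version that raises contributes no vote.
    Outputs must be hashable for the vote count."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            pass  # a crashed version is treated as a disagreeing one
    if outputs:
        value, count = Counter(outputs).most_common(1)[0]
        if 2 * count > len(versions):
            return value
    raise RuntimeError("no majority among versions")


# Three hypothetical versions of "square a number"; the third has a defect.
versions = [lambda n: n * n, lambda n: n ** 2, lambda n: n * n + 1]
print(n_version_execute(versions, 3))  # 9
```

    With three versions the voter masks one faulty version; if two versions disagree with the third and with each other, no majority exists and the error is reported instead of masked.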

  20. Validated Fault Tolerant Architectures for Space Station

    Science.gov (United States)

    Lala, Jaynarayan H.

    1990-01-01

    Viewgraphs on validated fault tolerant architectures for space station are presented. Topics covered include: fault tolerance approach; advanced information processing system (AIPS); and fault tolerant parallel processor (FTPP).

  1. Fault-Tolerant Onboard Monitoring and Decision Support Systems

    DEFF Research Database (Denmark)

    Lajic, Zoran

    2010-01-01

    The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been established. The basic idea is to convert the given system into a fault-tolerant system and to improve multi-sensor data fusion for the particular system. The background of the project is the SeaSense system, which has been installed on several container ships and navy vessels. The SeaSense system provides a crude and simple estimation of the actual sea state (Hs and Tz), information about the longitudinal hull girder loading, seakeeping performance of the ship, and decision support on how to operate the ship within acceptable limits. The system is able to identify critical forthcoming events and to give advice regarding speed and course changes to decrease the wave-induced loads. The SeaSense system is based on the combined use of a mathematical model and measurements from a set of sensors. The overall dependability of a shipboard monitoring and decision support system such as the SeaSense system can be improved using fault-tolerant techniques (Fault Diagnosis and System Re-design) and a Sensor Fusion Quality (SFQ) test. Fault diagnosis means to detect the presence of faults in the system. In case sea state estimation is conducted by a ship-wave buoy analogy, the best solution is achieved when a set of three different ship responses are used. Faulty signals should be discarded from the procedure for sea state estimation if possible; if not, the fault should be estimated. The fault diagnosis can be divided into three steps: fault detection, fault isolation and fault estimation. Fault detection means to decide whether or not a fault has occurred. This step determines the time at which the system is subjected to the given fault. Fault isolation will find in which component a fault has occurred. This step determines the location of the fault. Fault estimation provides an estimate of the magnitude of a fault. A supervisory function determines the severity of the fault once its origin has been isolated and its magnitude estimated. Fault-tolerant Sensor Fusion means that the monitoring and decision support system can accommodate faults so that the overall system continues to satisfy its goal and, on the other hand, in the absence of a fault, the system should be able to provide the most accurate information using the SFQ test.
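
    The three diagnosis steps (detection, isolation, estimation) can be illustrated with a simple consistency check across redundant estimates of the same quantity. The sketch below uses a median reference and a fixed threshold; it is not the thesis's method or the SFQ test, and the signal names, values and threshold are hypothetical.

```python
from statistics import median


def diagnose(estimates, threshold):
    """Median-based consistency check over redundant estimates.

    estimates: {signal_name: estimate of the same physical quantity}
    threshold: residual magnitude above which a signal is declared faulty
               (an assumed tuning parameter).
    """
    reference = median(estimates.values())
    residuals = {name: value - reference for name, value in estimates.items()}
    # Detection: is any residual too large?  Isolation: which signal?
    # Estimation: the residual itself approximates the fault magnitude.
    faulty = {name: r for name, r in residuals.items() if abs(r) > threshold}
    healthy = [v for name, v in estimates.items() if name not in faulty]
    # Accommodation: discard faulty signals and fuse the remaining ones.
    fused = sum(healthy) / len(healthy) if healthy else reference
    return {"fault_detected": bool(faulty), "faulty": faulty, "fused": fused}


# Hypothetical wave-height estimates derived from three ship responses.
print(diagnose({"heave": 3.0, "pitch": 3.5, "roll": 6.0}, threshold=1.0))
# {'fault_detected': True, 'faulty': {'roll': 2.5}, 'fused': 3.25}
```

    With three signals a single faulty one can be both detected and isolated, which is consistent with the abstract's remark that a set of three different responses gives the best solution.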

  2. Active Fault Tolerant Control of Livestock Stable Ventilation System

    DEFF Research Database (Denmark)

    Gholami, Mehdi

    2011-01-01

    Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system which is stable and is able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems of livestock buildings in three steps: (1) deriving a model for the climate control system of a pig stable; (2) designing an active fault diagnosis (AFD) algorithm for different kinds of fault; (3) designing a fault tolerant control scheme for the climate control system. In the first step, a conceptual multi-zone model for climate control of a livestock building is derived. In the next step, two methods for active fault diagnosis are proposed. The AFD methods excite the system by injecting a so-called excitation input. Two different algorithms, the EKF and a new adaptive filter, are used to detect the faults. The fault tolerant controller (FTC) is based on a switching scheme between a set of predefined passive fault tolerant controllers (PFTCs). In the FTC part of the thesis, first a PFTC based on state feedback is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. Then the PFTC problem is reformulated as a feasibility problem of a set of linear matrix inequalities (LMIs).

  3. Active fault tolerant control design for switched hybrid systems

    OpenAIRE

    Rodrigues, Mickael; Theilliol, Didier; Sauter, Dominique

    2006-01-01

    In this paper, an active Fault Tolerant Control (FTC) strategy is developed for Switched Hybrid Systems. The main contribution concerns the design of a linear Output Feedback dedicated to Switched Hybrid Systems. Based on an available Fault Detection and Isolation (FDI) scheme, the controller redesign is performed on-line through LMI both in fault-free and faulty cases in order to preserve the system closed-loop stability despite actuator failures. The effectiveness and performances of the pro...

  4. Fault Tolerant Controllers for Sampled-data Systems

    DEFF Research Database (Denmark)

    Niemann, H.; Stoustrup, Jakob

    2004-01-01

    A general compensator architecture for fault tolerant control (FTC) for sampled-data systems is proposed. The architecture is based on the YJBK parameterization of all stabilizing controllers, and uses the dual YJBK parameterization to quantify the performance of the fault tolerant system. The FTC architecture is based on a discrete-time nominal feedback controller, with the FTC part also in discrete time. Further, a number of problems for the design of the controller reconfiguration part in...

  5. Fault tolerant oxygen control of a diesel engine air system

    OpenAIRE

    Nitsche, Rainer; Bitzer, Matthias; El Khaldi, Mahmoud; Bloch, Gérard

    2010-01-01

    This paper is devoted to the fault tolerant control problem of a Diesel engine air system having a jammed Exhaust Gas Recirculation (EGR) valve. The fault tolerant control is based on replanning the trajectory in order to track a new controlled variable, which is the oxygen concentration in the intake manifold instead of the fresh air mass flow. The trajectory planning is based on an inverse model approach, utilizing the fundamental thermodynamic relations of the air system.

  6. Fault-tolerant computation with higher-dimensional systems

    Energy Technology Data Exchange (ETDEWEB)

    Gottesman, D.

    1998-07-01

    Instead of a quantum computer where the fundamental units are 2-dimensional qubits, the author can consider a quantum computer made up of d-dimensional systems. There is a straightforward generalization of the class of stabilizer codes to d-dimensional systems, and he will discuss the theory of fault-tolerant computation using such codes. He proves that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d.

  7. Fault-Tolerant Quantum Computation with Higher-Dimensional Systems

    CERN Document Server

    Gottesman, D

    1998-01-01

    Instead of a quantum computer where the fundamental units are 2-dimensional qubits, we can consider a quantum computer made up of d-dimensional systems. There is a straightforward generalization of the class of stabilizer codes to d-dimensional systems, and I will discuss the theory of fault-tolerant computation using such codes. I prove that universal fault-tolerant computation is possible with any higher-dimensional stabilizer code for prime d.
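
    For readers unfamiliar with d-dimensional systems, the standard generalized Pauli operators on which such stabilizer codes are usually built can be written as follows. This is textbook background under the usual conventions, not a quotation from the paper, and the symbols X, Z and \omega are the conventional choices.

```latex
% Generalized Pauli operators on a d-dimensional system (qudit),
% with computational basis |j>, j = 0, ..., d-1, and \omega = e^{2\pi i/d}.
\begin{align}
  X\,|j\rangle &= |j + 1 \bmod d\rangle, \\
  Z\,|j\rangle &= \omega^{j}\,|j\rangle, \\
  Z X &= \omega\, X Z .
\end{align}
% Check of the commutation relation on a basis state:
%   Z X |j> = Z |j+1> = \omega^{j+1} |j+1>,   X Z |j> = \omega^{j} |j+1>.
```

    For d = 2 these reduce to the ordinary Pauli X and Z with \omega = -1, so the qubit stabilizer formalism is recovered as a special case.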

  8. Data-driven design of fault diagnosis and fault-tolerant control systems

    CERN Document Server

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  9. A system architecture for fault tolerance in concurrent software

    Energy Technology Data Exchange (ETDEWEB)

    Ancona, M.; Dodero, G.; Gianuzzi, V. (Univ. of Genova (IT)); Clematis, A. (Italian National Research Council (US)); Fernardez, E.B. (Florida Atlantic Univ. (US))

    1990-10-01

    Robotics, process control, navigational systems, and other critical computer applications demand reliable software. Such systems generally comprise a set of concurrent, cooperating processes. Programmers need an environment that effectively supports the development of fault-tolerant programs. Such a system must let the user select and apply the most suitable fault-tolerant mechanism for the application program; modify that mechanism as required by the program or to experiment with new fault-tolerance schemes; and modify and maintain the application program without interfering much with the recovery structure. The authors propose a system architecture which fulfills these requirements and can support a variety of policies in a structured way. It treats the application program and the recovery software--called the Recovery Metaprogram--separately. The RMP acts like a programmer who monitors and (possibly) modifies the execution of the application program.

  10. Design of fault tolerant control system for steam generator using

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Myung Ki; Seo, Mi Ro [Korea Electric Power Research Institute, Taejon (Korea, Republic of)

    1998-12-31

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The structure of the proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks the controller-induced and sensor-induced performance to identify which part, a controller or a sensor, is faulty. In order to analyze controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. The fuzzy weighting modulator generates an output signal compensated for a faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effect of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases further. 2 refs., 9 figs., 1 tab. (Author)

  11. Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology

    OpenAIRE

    Zhang Liwei; Huang Xianjin; Yang Yannan; Xu Chen; Liu Jie

    2013-01-01

    An electric vehicle drive system is a multi-variable system whose running environment is complex and changeable, so its failure forms are complicated. In this paper, a vehicle fault table is established according to where faults occur, and the possible consequences and causes of each failure are analyzed. Taking into account hardware limitations and the requirement to preserve as much system performance as possible, a passive software redundancy fault-tolerant strategy is put forward, give an exa...

  12. A Review Of Fault Tolerant Scheduling In Multicore Systems

    Directory of Open Access Journals (Sweden)

    Shefali Malhotra

    2015-05-01

    In this paper we discuss various fault tolerant task scheduling algorithms for multi-core systems, based on hardware and on software. The hardware based algorithm is a blend of Triple Modulo Redundancy and Double Modulo Redundancy in which the Architectural Vulnerability Factor is considered while deciding the scheduling, other than the EDF and LLF scheduling algorithms. In most real time systems the dominant part is shared memory. A low overhead software based fault tolerance approach can be implemented at user-space level so that it does not require any changes at application level. Here redundant multi-threaded processes are used. Using those processes we can detect soft errors and recover from them. This method gives a low overhead, fast error detection and recovery mechanism. The overhead incurred by this method ranges from 0 to 18 for selected benchmarks. The Hybrid Scheduling Method is another scheduling approach for real time systems. Dynamic fault tolerant scheduling gives a high feasibility rate, whereas task criticality is used to select the type of fault recovery method in order to tolerate the maximum number of faults.

  13. Fault Tolerance Middleware for a Multi-Core System

    Science.gov (United States)

    Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

    2012-01-01

    Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. 
This provides additional information to the introspection module that it can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.
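
    The introspection flow described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the FTM code: the error kinds, the severity table, the response names, and the escalation rule (rollback first, restart if a rollback was the previous response) are hypothetical stand-ins for the knowledge base, which the abstract does not detail.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Error:
    node: int
    kind: str
    severity: Severity


# Hypothetical error mapper: the abstract does not list error kinds.
SEVERITY_BY_KIND = {
    "parity": Severity.LOW,
    "bad_result": Severity.MEDIUM,
    "node_hang": Severity.HIGH,
}


def make_error(node, kind):
    """Error factory: build an error object and assign its severity."""
    return Error(node, kind, SEVERITY_BY_KIND.get(kind, Severity.MEDIUM))


class Introspection:
    """Toy knowledge base that remembers the last response it was told about."""

    def __init__(self):
        self.last_response = None

    def recommend(self, error):
        if error.severity == Severity.LOW:
            return "log"
        if error.severity == Severity.HIGH:
            return "swap_node"
        # Medium severity: try a rollback first; if a rollback was the
        # previous response and errors persist, escalate to a restart.
        if self.last_response == "rollback":
            return "restart"
        return "rollback"

    def notify(self, response):
        """Feedback from the responder module."""
        self.last_response = response


def handle(introspection, node, kind):
    """One pass: notification -> error object -> recommendation -> feedback."""
    error = make_error(node, kind)
    response = introspection.recommend(error)
    # A real fault manager would invoke the response here.
    introspection.notify(response)
    return response
```

    In this sketch, two consecutive medium-severity errors yield a rollback and then a restart, mirroring the escalation example given in the abstract.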

  14. Co-design of Fault-Tolerant Systems with Imperfect Fault Detection

    OpenAIRE

    Chen, Yi-ching

    2014-01-01

    In recent decades, transient faults have become a critical issue in modern electronic devices. Therefore, many fault-tolerant techniques have been proposed to increase system reliability, such as active redundancy, which can be implemented in both space and time dimensions. The main challenge of active redundancy is to introduce the minimal overhead of redundancy and to schedule the tasks. In many previous works, perfect fault detectors are assumed to simplify the problem. However, the induced reso...

  15. Fault Tolerance by Replication in Parallel System

    Directory of Open Access Journals (Sweden)

    Madhavi Vaidya

    2011-04-01

    In this paper the author concentrates on the architecture of a cluster computer and how it works in the context of parallel paradigms. The author is interested in guaranteeing that a node works efficiently and that the data on it is available at any time so that tasks can run in parallel. Applications may face resource faults during execution, so an application must dynamically prepare for, and recover from, an expected failure. Typically, checkpointing is used to minimize the loss of computation. Checkpointing is a purely local strategy, but it can be very costly. Most checkpointing techniques, however, require central storage for storing checkpoints. This results in a bottleneck and severely limits the scalability of checkpointing, while also proving to be too expensive for dedicated checkpointing networks and storage systems. The author suggests replication as an alternative technique. Replication has been studied for parallel databases in general. The author works on the parallel execution of a task on a node; if the node fails, a self-protecting feature should be turned on. Self-protecting in this context means that computer clusters should detect and handle failures automatically with the help of replication.
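
    The self-protecting behaviour mentioned in this abstract can be illustrated with a very small failover sketch. The function name, the use of `RuntimeError` to signal a node failure, and the node names are assumptions made for the example; the paper's actual mechanism is not described in the abstract.

```python
def run_with_replicas(task, replicas):
    """Try the task on each replica in turn and return (node, result).

    A node failure is modelled, hypothetically, as a RuntimeError raised
    by the task. The first replica that succeeds wins.
    """
    errors = []
    for node in replicas:
        try:
            return node, task(node)
        except RuntimeError as exc:
            # Record the failure and fall through to the next replica.
            errors.append(f"{node}: {exc}")
    raise RuntimeError("all replicas failed: " + "; ".join(errors))
```

    Because each replica already holds the data, no central checkpoint store is involved in this sketch; the cost is the extra storage and work of keeping replicas.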

  16. Industrial Computing Systems: A Case Study of Fault Tolerance Analysis

    OpenAIRE

    Shchurov, Andrey A.

    2015-01-01

    Fault tolerance is a key factor of industrial computing systems design. But in practical terms, these systems, like every commercial product, are under great financial constraints and they have to remain in operational state as long as possible due to their commercial attractiveness. This work provides an analysis of the instantaneous failure rate of these systems at the end of their life-time period. On the basis of this analysis, we determine the effect of a critical incre...

  17. Reliable, fault tolerant control systems for nuclear generating stations

    International Nuclear Information System (INIS)

    Two operational features of CANDU Nuclear Power Stations provide for high plant availability. First, the plant re-fuels on-line, thereby eliminating the need for periodic and lengthy refuelling 'outages'. Second, all plants are controlled by real-time computer systems. Later plants are also protected using real-time computer systems. In the past twenty years, the control systems now operating in 21 plants have achieved an availability of 99.8%, making significant contributions to high CANDU plant capacity factors. This paper describes some of the features that ensure the high degree of system fault tolerance and hence high plant availability. The emphasis will be placed on the fault tolerant features of the computer systems included in the latest reactor design - the CANDU 3 (450MWe). (author)

  18. Fault Analysis and Fault Tolerance of a Base Station System for Mobile Communications

    OpenAIRE

    Kehl, Hubertus

    2007-01-01

    A new approach for fault analysis and the evaluation of fault tolerance (FT) schemes of telecommunication systems is described. In particular mobile base stations are considered as an example for such systems. It is shown that standard metrics for availability and reliability are in general not suitable. A new model is proposed that leads to three different metrics which measure reliability and availability from either the system or the user point of view. These metrics are used to discover t...

  19. Fault-tolerant control systems design and practical applications

    CERN Document Server

    Noura, Hassan; Ponsart, Jean-Christophe; Chamseddine, Abbas

    2009-01-01

    This book reports the development of fault diagnosis and fault-tolerant control methods. It discusses the state-of-the-art in both theory and practice and thoroughly details three real-world industrial applications.

  20. A Ship Propulsion System Model for Fault-tolerant Control

    DEFF Research Database (Denmark)

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    1998-01-01

    This report presents a propulsion system model for a low speed marine vehicle, which can be used as a test benchmark for Fault-Tolerant Control purposes. The benchmark serves the purpose of offering realistic and challenging problems relevant in both FDI and (autonomous) supervisory control area. The propulsion system model is presented in two versions: the first one consists of one engine and one propeller, and the other one consists of two engines and their corresponding propellers placed in pa...

  1. Synthesis of Fault-Tolerant Embedded Systems Using Games: From Theory to Practice

    Science.gov (United States)

    Cheng, Chih-Hong; Rueß, Harald; Knoll, Alois; Buckl, Christian

    In this paper, we present an approach for fault-tolerant synthesis by combining predefined patterns for fault-tolerance with algorithmic game solving. A non-fault-tolerant system, together with the relevant fault hypothesis and fault-tolerant mechanism templates in a pool are translated into a distributed game, and we perform an incomplete search of strategies to cope with undecidability. The result of the game is translated back to executable code concretizing fault-tolerant mechanisms using constraint solving. The overall approach is implemented to a prototype tool chain and is illustrated using examples.

  2. A Game-theoretic Approach for Synthesizing Fault-Tolerant Embedded Systems

    CERN Document Server

    Cheng, Chih-Hong; Knoll, Alois; Buckl, Christian

    2010-01-01

    In this paper, we present an approach for fault-tolerant synthesis by combining predefined patterns for fault-tolerance with algorithmic game solving. A non-fault-tolerant system, together with the relevant fault hypothesis and fault-tolerant mechanism templates in a pool are translated into a distributed game, and we perform an incomplete search of strategies to cope with undecidability. The result of the game is translated back to executable code concretizing fault-tolerant mechanisms using constraint solving. The overall approach is implemented to a prototype tool chain and is illustrated using examples.

  3. Summarize of Electric Vehicle Electric System Fault and Fault-tolerant Technology

    Directory of Open Access Journals (Sweden)

    Zhang Liwei

    2013-09-01

    The electric vehicle drive system is a multi-variable system with a complex and changeable operating environment, so its failure modes are complicated. In this paper, a vehicle fault table is established according to the position in which each fault occurs, and the possible consequences and causes of each failure are analyzed. Taking hardware limitations into account while guaranteeing system performance requirements as far as possible, a passive software-redundancy fault-tolerant strategy is put forward, and an example is given to analyze the pros and cons of this method.

  4. Fault-tolerant Supervisory Control : System Analysis and Logic Design

    DEFF Research Database (Denmark)

    Izadi-Zamanabadi, Roozbeh

    1999-01-01

    The main purpose of this work has been to achieve active fault-tolerance in control systems, defined as a methodology where fault detection and isolation techniques are combined with supervisory control to achieve autonomous accommodation of faults before they develop into failures. The aim of this work has been to develop and employ concepts and methods that are suitable for use in different automation processes, with applicability in various industrial fields. The requirements for high productivity and quality have resulted in the use of additional instrumentation and more sophisticated control algorithms. The drawback is, however, that these control systems have become more vulnerable to even simple faults in instrumentation. On the other hand, due to cost-optimality requirements, an extensive use of hardware redundancy has been prohibited. Nevertheless, the dependency and availability could be increased through enhancing control systems' ability to perform on-line fault detection and reconfiguration when a fault occurs and before a safety system shuts down the entire process. The main contributions of this research effort are development and experimentation with methodologies for systematic analysis of reconfiguration and design of supervisor logic. In addition, useful experience is obtained through implementation of a fault-tolerant control scheme against a simulated ship and its propulsion system. A development methodology, which was suggested in the Control Engineering Department, is extended to cope with the important reconfiguration problem. In order to enable a designer to acquire knowledge about reconfiguration possibilities, the structural analysis method is added as an extension to the existing methodology. This extension builds upon the earlier method where fault propagation and severity analysis are the essential parts. 
Structural analysis (SA) enables the designer to distinguish between the parts of the systems with no redundant information and the parts with possible redundant information. This method, hence, provides the designer with information, which is necessary during the selection of remedial actions. Furthermore, it is shown how sensor information fusion is obtained by using the SA method. The construction of the supervisor's decision logic is essential for the active form of fault-tolerant control. In this regard, two approaches have been presented. The first aims at constructing the decision logic in the form of a "language". This language is obtained as a direct result of the component based approach, presented in this thesis. This approach is based on the definition of a functional component, component placement in a control system hierarchy and the definition of system level hierarchy. The supervisor language includes all valid strings, representing the combination of valid components, that keep the system functional. This approach is simple and can be automated. In the second approach, implementation of supervisor functionality is realized on the basis of an extension to the traditional state-event machines. Due to parallelity (inherent modularity) the supervisor logic is more easily modified, updated, maintained, and tested. A salient feature is that a change in one task only necessitates redesign of essentially one corresponding state-event machine (SEM). A heuristic guideline is provided for designing the logic in the form of SEMs. A ship propulsion system benchmark has been designed and used as a case study. This includes experimentation with the above methodologies and implementation of a fault-tolerant control against the simulation. Four generic faults have been considered. It has been shown how the SA method is easily employed to generate analytical redundancy relations, which in turn are then used for FDI purposes. 
Three different methods are used to generate residuals. These methods are: simple numerical calculation, a non-linear observer, and a Neuro-Fuzzy method. Employment of each method follows the assumption about the available system information. The results show that it is p

  5. Fault-Tolerant Onboard Monitoring and Decision Support Systems

    DEFF Research Database (Denmark)

    Lajic, Zoran

    2010-01-01

    The purpose of this research project is to improve current onboard decision support systems. Special focus is on the onboard prediction of the instantaneous sea state. In this project a new approach to increasing the overall reliability of a monitoring and decision support system has been established. The basic idea is to convert the given system into a fault-tolerant system and to improve multi-sensor data fusion for the particular system. The background of the project is the SeaSense system, w...

  6. Fault-tolerant mechatronic systems. Pt. 2; Fehlertolerante mechatronische Systeme. T. 2

    Energy Technology Data Exchange (ETDEWEB)

    Isermann, R. [Technische Univ. Darmstadt (Germany). Inst. fuer Automatisierungstechnik, FG Regelungstechnik und Prozessautomatisierung

    2007-05-15

    After an introductory part in issue 4/2007 this part 2 then considers some examples of realized, fault-tolerant systems as, e. g., a fault tolerant position sensor for an electrical throttle actuator, a fault-tolerant asynchronous drive with different duplex configurations, and the analytical sensor redundancy for the lateral dynamic behaviour of a passenger car. The applied fault-detection methods and the reconfiguration strategy are described. Herewith it is shown how the function can be maintained with only small disturbances of the operation after appearance of faults. (orig.)

  7. Logical specification and analysis of fault tolerant systems through partial model checking

    OpenAIRE

    Martinelli, Fabio

    2003-01-01

    This paper presents a framework for a logical characterization of fault tolerance and its formal analysis based on partial model checking techniques. The framework requires a fault tolerant system to be modeled using a formal calculus, here the CCS process algebra. To this aim we propose a uniform modeling scheme in which to specify a formal model of the system, its failing behaviour and possibly its fault-recovering procedures. Once a formal model is provided into our scheme, fault tolerance...

  8. System Diagnosis and Fault Tolerance for Distributed Computing System: A Review

    Directory of Open Access Journals (Sweden)

    Nilotpal Baruah

    2013-10-01

    This paper reviews adaptive system diagnosis and fault tolerance methods for distributed systems. The system comprises a network of N nodes, where N is an integer greater than or equal to 3, and each node is able to execute an algorithm to communicate with the network. A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. Because a computer network is a collection of hardware components, faults often occur in either the hardware or the software of the network. To deal with such faults, fault diagnosis and fault tolerance mechanisms must be implemented for the proper functioning of the system. These fault detection and fault tolerance mechanisms are discussed in this paper, along with the kinds of faults and how they occur, with the aim of finding suitable solutions. Various fault-detection mechanisms and fault-tolerance methodologies are studied, and the main goal of the study is to identify automatic fault detection and fault tolerance techniques.

  9. Sliding mode based fault detection, reconstruction and fault tolerant control scheme for motor systems.

    Science.gov (United States)

    Mekki, Hemza; Benzineb, Omar; Boukhetala, Djamel; Tadjine, Mohamed; Benbouzid, Mohamed

    2015-07-01

    The fault-tolerant control problem belongs to the domain of complex control systems in which inter-control-disciplinary information and expertise are required. This paper proposes an improved fault detection, reconstruction and fault-tolerant control (FTC) scheme for motor systems (MS) with typical faults. For this purpose, a sliding mode controller (SMC) with an integral sliding surface is adopted. This controller can make the system output track the desired position reference signal in finite time and obtain a better dynamic response and anti-disturbance performance. However, this controller cannot deal directly with total system failures. Therefore, the adopted SMC is combined with a sliding mode observer (SMO), which is designed to detect and reconstruct the faults on-line and also to give a sensorless control strategy that can achieve tolerance to a wide class of total additive failures. The closed-loop stability is proved using the Lyapunov stability theory. Simulation results in healthy and faulty conditions confirm the reliability of the suggested framework. PMID:25747198

  10. Fault-diagnosis systems an introduction from fault detection to fault tolerance

    CERN Document Server

    Isermann, Rolf

    2006-01-01

    With increasing demands for efficiency and product quality plus progress in the integration of automatic control systems in high-cost mechatronic and safety-critical processes, the field of supervision (or monitoring), fault detection and fault diagnosis plays an important role. The book gives an introduction into advanced methods of fault detection and diagnosis (FDD). After definitions of important terms, it considers the reliability, availability, safety and systems integrity of technical processes. Then fault-detection methods for single signals without models such as limit and trend check

  11. Implementation of FMFRS (Fault Tolerant Most fitting Resource Scheduling algorithm) in Real time system

    Directory of Open Access Journals (Sweden)

    Harkiran Kaur

    2013-08-01

    In a computational Grid, fault tolerance is an imperative issue to be considered during job scheduling. Due to the widespread use of resources, systems are highly prone to errors and failures. Hence fault tolerance plays a key role in the grid to avoid the problem of unreliability. The two main techniques for implementing fault tolerance in a grid environment are checkpointing and replication. This paper proposes a real time approach to a replication technique named FMFRS (Fault Tolerant most fitting resource scheduling algorithm) to improve the fault tolerance of the fittest resource scheduling algorithm. The proposed method improves fault tolerance by scheduling the job with the fittest resource scheduling algorithm in coordination with job replication when the resource has low reliability, and by checking parameters such as Fault Tolerance capacity and the Node's Reliability. Based on the reliability index of the resource, the resource is identified as critical.
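
    The idea of combining a fittest-resource choice with replication on unreliable resources can be sketched as below. This is not the FMFRS algorithm itself, which the abstract does not specify: the definition of "fittest" (smallest spare capacity that still fits the job), the reliability threshold of 0.9, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    capacity: float
    reliability: float  # assumed reliability index in [0, 1]


RELIABILITY_THRESHOLD = 0.9  # hypothetical cut-off for a "critical" resource


def schedule(job_demand, resources, threshold=RELIABILITY_THRESHOLD):
    """Return the resource names that should run the job.

    The first name is the fittest resource; a second name is added as a
    replica when the fittest resource falls below the threshold.
    """
    candidates = sorted(
        (r for r in resources if r.capacity >= job_demand),
        key=lambda r: r.capacity - job_demand,  # smallest spare capacity first
    )
    if not candidates:
        return []
    primary = candidates[0]
    placement = [primary.name]
    if primary.reliability < threshold and len(candidates) > 1:
        # Critical resource: replicate the job on the next-fittest one.
        placement.append(candidates[1].name)
    return placement
```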

  12. Ship Propulsion System as a Benchmark for Fault-Tolerant Control

    DEFF Research Database (Denmark)

    Izadi-Zamanabadi, Roozbeh; Blanke, M.

    1998-01-01

    Fault-tolerant control combines fault detection and isolation techniques with supervisory control to achieve autonomous accommodation of faults before they develop into failures. While fault detection and isolation (FDI) methods have matured during the past decade the extension to fault-tolerant control is a fairly new area. The paper presents a ship propulsion system as a benchmark that should be useful as a platform for development of new ideas and comparison of methods. The benchmark has two ...

  13. Fault Tolerant Software: a Multi Agent System Solution

    DEFF Research Database (Denmark)

    Caponetti, Fabio; Bergantino, Nicola

    2009-01-01

    Development of highly dependable systems remains a labour intensive task. This paper explores recent advances in the adaptation of the software agent architecture for control applications while looking at dependability issues. Multi-agent systems theory will be reviewed, giving methods to supervise such systems. Software ageing is shown to be the most common problem and rejuvenation its countermeasure. The paper will show how an agent population can be monitored, faulty agents isolated and reloaded in a healthy state, hence rejuvenated. The aim is to propose an architecture as basis for the design of control software able to tolerate faults and residual bugs without the need of maintenance stops.
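
    A monitor-isolate-reload loop of the kind this abstract describes might look like the following minimal sketch. The heartbeat mechanism, the timeout value, and the class and function names are assumptions for illustration; the abstract does not say how faulty agents are detected.

```python
class Agent:
    """Toy agent that reports liveness through a heartbeat timestamp."""

    def __init__(self, name):
        self.name = name
        self.last_heartbeat = 0.0
        self.restarts = 0

    def reload(self, now):
        """Rejuvenate: bring the agent back in a fresh, healthy state."""
        self.last_heartbeat = now
        self.restarts += 1


def supervise(agents, now, timeout=2.0):
    """Reload every agent whose heartbeat is older than `timeout` seconds.

    Returns the names of the agents that were rejuvenated.
    """
    rejuvenated = []
    for agent in agents:
        if now - agent.last_heartbeat > timeout:
            agent.reload(now)
            rejuvenated.append(agent.name)
    return rejuvenated
```

    Calling `supervise` periodically reloads only the stale agents, so the rest of the population keeps running without a maintenance stop.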

  14. Reliability Models for Highly Fault-tolerant Storage Systems

    OpenAIRE

    Resch, Jason; Volvovski, Ilya

    2013-01-01

    We found that a reliability model commonly used to estimate Mean-Time-To-Data-Loss (MTTDL), while suitable for modeling RAID 0 and RAID 5, fails to accurately model systems having a fault-tolerance greater than 1. Therefore, to model the reliability of RAID 6, Triple-Replication, or k-of-n systems requires an alternate technique. In this paper, we explore some alternatives, and evaluate their efficacy by comparing their predictions to simulations. Our main result is a new fo...

  15. Fault Tolerant Operation in Aero Engine Using Distributed Computation System

    Directory of Open Access Journals (Sweden)

    Neela A G

    2014-04-01

    The paper presents fault tolerant operation in an aero engine based on real-time systems built for a very small set of mission-critical applications such as spacecraft, avionics and other distributed control systems. Modern software deals with external interfaces and has to consider various timing implications. The platform is based on C and developed using the Keil MDK tool with a targeted deadline of 100 milliseconds at a baud rate of 500 kbps. A CAN interface handles transportation and communication: an interface cable is used for serial communication between the Digital Electronic Control Unit (DECU) and the host to transfer data to the pilot Online Monitoring System, which is based on Laboratory Virtual Instrument Engineering Workbench (LabVIEW 7.1). Fault diagnosis typically assumes a sufficiently large fault signature and enough time for a reliable decision to be reached. However, for a class of safety critical faults on commercial aircraft engines, prompt detection within a millisecond range is paramount to allow accommodation to avert undesired engine behavior. At the same time, false positives must be avoided to prevent inappropriate control action.

  16. Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 2: Army fault tolerant architecture design and analysis

    Science.gov (United States)

    Harper, R. E.; Alger, L. S.; Babikyan, C. A.; Butler, B. P.; Friend, S. A.; Ganska, R. J.; Lala, J. H.; Masotto, T. K.; Meyer, A. J.; Morton, D. P.

    1992-01-01

    Described here is the Army Fault Tolerant Architecture (AFTA) hardware architecture and components and the operating system. The architectural and operational theory of the AFTA Fault Tolerant Data Bus is discussed. The test and maintenance strategy developed for use in fielded AFTA installations is presented. An approach to be used in reducing the probability of AFTA failure due to common mode faults is described. Analytical models for AFTA performance, reliability, availability, life cycle cost, weight, power, and volume are developed. An approach is presented for using VHSIC Hardware Description Language (VHDL) to describe and design AFTA's developmental hardware. A plan is described for verifying and validating key AFTA concepts during the Dem/Val phase. Analytical models and partial mission requirements are used to generate AFTA configurations for the TF/TA/NOE and Ground Vehicle missions.

  17. Fault-diagnosis applications model-based condition monitoring actuators, drives, machinery, plants, sensors, and fault-tolerant systems

    CERN Document Server

    Isermann, Rolf

    2011-01-01

    Supervision, condition-monitoring, fault detection, fault diagnosis and fault management play an increasing role for technical processes and vehicles in order to improve reliability, availability, maintenance and lifetime. For safety-related processes fault-tolerant systems with redundancy are required in order to reach comprehensive system integrity.   This book is a sequel of the book "Fault-Diagnosis Systems" published in 2006, where the basic methods were described. After a short introduction into fault-detection and fault-diagnosis methods the book shows how these methods can be applie

  18. Fault-Tolerant Relative Navigation System (RNS) for Docking Project

    Data.gov (United States)

    National Aeronautics and Space Administration — A method is proposed to develop a sensor fusion process for blending GPS/IMU/EO data for fault tolerant rendezvous and docking of spacecraft. The methodology takes...

  19. Fault-diagnosis systems. An introduction from fault detection to fault tolerance

    Energy Technology Data Exchange (ETDEWEB)

    Isermann, R. [TU Darmstadt (Germany). Fachgebiet Regelungstechnik und Prozessautomatisierung

    2006-07-01

    With increasing demands for efficiency and product quality plus progress in the integration of automatic control systems in high-cost mechatronic and safety-critical processes, the field of supervision (or monitoring), fault detection and fault diagnosis plays an important role. The book gives an introduction into advanced methods of fault detection and diagnosis (FDD). After definitions of important terms, it considers the reliability, availability, safety and systems integrity of technical processes. Then fault-detection methods for single signals without models such as limit and trend checking and with harmonic and stochastic models, such as Fourier analysis, correlation and wavelets are treated. This is followed by fault detection with process models using the relationships between signals such as parameter estimation, parity equations, observers and principal component analysis. The treated fault-diagnosis methods include classification methods from Bayes classification to neural networks with decision trees and inference methods from approximate reasoning with fuzzy logic to hybrid fuzzy-neuro systems. Several practical examples for fault detection and diagnosis of DC motor drives, a centrifugal pump, automotive suspension and tire demonstrate applications. (orig.)
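
    As a rough illustration of the model-free single-signal checks this description mentions, the sketch below flags samples that leave a fixed band (limit check) or change faster than an allowed rate (trend check). The function names and thresholds are hypothetical and are not taken from the book.

```python
def limit_check(samples, low, high):
    """Return the indices of samples outside the band [low, high]."""
    return [i for i, y in enumerate(samples) if y < low or y > high]


def trend_check(samples, dt, max_rate):
    """Return the indices where |y[i] - y[i-1]| / dt exceeds max_rate."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) / dt > max_rate
    ]
```

    A trend check can raise an alarm at the first fast jump even if the signal is still inside its limits, which is why the two checks are often used together.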

  20. Design and analysis of reliable and fault-tolerant computer systems

    CERN Document Server

    Abd-El-Barr, Mostafa

    2006-01-01

    Covering both the theoretical and practical aspects of fault-tolerant mobile systems, and fault tolerance and analysis, this book tackles the current issues of reliability-based optimization of computer networks, fault-tolerant mobile systems, and fault tolerance and reliability of high speed and hierarchical networks. The book is divided into six parts to facilitate coverage of the material by course instructors and computer systems professionals. The sequence of chapters in each part ensures the gradual coverage of issues from the basics to the most recent developments. A useful set of refere

  1. Diagnostic software and fault tolerant microprocessor based system architectures

    International Nuclear Information System (INIS)

    In numerous industrial applications including power generation, the availability of electronic systems to perform the tasks assigned has become a major issue. At the same time, the functional complexity of these systems has increased enormously. Fortunately, the arrival of cost effective microprocessor based hardware has given the system designer a cadre of techniques to ensure the desired degree of system integrity and availability. These include: dynamic redundancy, isolation, functional diversity, built-in self-tests, embedded test subsystems, communications, error checking and error correcting codes, etc. The choice among the available techniques is generally heuristic and depends greatly on the structure of major components and systems external to the electronic system itself as well as the postulated faults and their relative frequency. Indiscriminate use of these techniques will inevitably increase cost and reduce maintainability while actually reducing system availability and reliability. The issues and the application of these techniques are discussed by describing recent examples of fault tolerant microprocessor based system architectures which include the Plant Safety Monitoring System, the EAGLE-21 Process Protection System and the Advanced Rod Position Indication System for pressurized water reactors. Each of these systems utilizes a unique internal architecture that addresses the reliability, availability, and communications issues while improving maintainability and man-machine interfaces.

  2. Disturbance observer based fault estimation and dynamic output feedback fault tolerant control for fuzzy systems with local nonlinear models.

    Science.gov (United States)

    Han, Jian; Zhang, Huaguang; Wang, Yingchun; Liu, Yang

    2015-11-01

    This paper addresses the problems of fault estimation (FE) and fault tolerant control (FTC) for fuzzy systems with local nonlinear models, external disturbances, and sensor and actuator faults, simultaneously. A disturbance observer (DO) and an FE observer are designed simultaneously. Compared with the existing results, the proposed observer has a wider application range. Using the estimation information, a novel fuzzy dynamic output feedback fault tolerant controller (DOFFTC) is designed. The controller can be used for fuzzy systems with unmeasurable local nonlinear models, mismatched input disturbances, and measurement output affected by sensor faults and disturbances. Finally, the simulation shows the effectiveness of the proposed methods. PMID:26456728

  3. Piecewise Sliding Mode Decoupling Fault Tolerant Control System

    Directory of Open Access Journals (Sweden)

    Rafi Youssef

    2010-01-01

    Problem statement: The method proposed in the present study deals with fault tolerant control systems by using decentralized control theory with decoupled sliding mode control, dealing with subsystems instead of the whole system; to the knowledge of the author, there is no known computational algorithm for the decentralized case. Approach: In this study we present a decoupling strategy based on the selection of the sliding surface, which should be partitioned into piecewise sliding surfaces so that PwLTool can be applied, whose purpose in our case is to delimit the regions where sliding mode occurs. Results: We obtain a simple linearized model in those regions that can describe the complex system. Conclusion: With the three-tank water level system as an example, we implement this new design scenario; since we are interested in networked control systems, we believe that this kind of controller implementation will not be affected by network delays.

  4. Distributed Adaptive Fault-Tolerant Control of Uncertain Multi-Agent Systems

    OpenAIRE

    Khalili, Mohsen; Zhang, Xiaodong; Polycarpou, Marios M.; Parisini, Thomas; Cao, Yongcan

    2015-01-01

    This paper presents an adaptive fault-tolerant control (FTC) scheme for a class of nonlinear uncertain multi-agent systems. A local FTC scheme is designed for each agent using local measurements and suitable information exchanged between neighboring agents. Each local FTC scheme consists of a fault diagnosis module and a reconfigurable controller module comprised of a baseline controller and two adaptive fault-tolerant controllers activated after fault detection and after fa...

  5. Fault-tolerance in Two-dimensional Topological Systems

    Science.gov (United States)

    Anderson, Jonas T.

    This thesis is a collection of ideas with the general goal of building, at least in the abstract, a local fault-tolerant quantum computer. The connection between quantum information and topology has proven to be an active area of research in several fields. The introduction of the toric code by Alexei Kitaev demonstrated the usefulness of topology for quantum memory and quantum computation. Many quantum codes used for quantum memory are modeled by spin systems on a lattice, with operators that extract syndrome information placed on vertices or faces of the lattice. It is natural to wonder whether the useful codes in such systems can be classified. This thesis presents work that leverages ideas from topology and graph theory to explore the space of such codes. Homological stabilizer codes are introduced and it is shown that, under a set of reasonable assumptions, any qubit homological stabilizer code is equivalent to either a toric code or a color code. Additionally, the toric code and the color code correspond to distinct classes of graphs. Many systems have been proposed as candidate quantum computers. It is very desirable to design quantum computing architectures with two-dimensional layouts and low complexity in parity-checking circuitry. Kitaev's surface codes provided the first example of codes satisfying this property. They provided a new route to fault tolerance with more modest overheads and thresholds approaching 1%. The recently discovered color codes share many properties with the surface codes, such as the ability to perform syndrome extraction locally in two dimensions. Some families of color codes admit a transversal implementation of the entire Clifford group. This work investigates color codes on the 4.8.8 lattice known as triangular codes. I develop a fault-tolerant error-correction strategy for these codes in which repeated syndrome measurements on this lattice generate a three-dimensional space-time combinatorial structure. 
I then develop an integer program that analyzes this structure and determines the most likely set of errors consistent with the observed syndrome values. I implement this integer program to find the threshold for depolarizing noise on small versions of these triangular codes. Because the threshold for magic-state distillation is likely to be higher than this value and because logical CNOT gates can be performed by code deformation in a single block instead of between pairs of blocks, the threshold for fault-tolerant quantum memory for these codes is also the threshold for fault-tolerant quantum computation with them. Since the advent of a threshold theorem for quantum computers much has been improved upon. Thresholds have increased, architectures have become more local, and gate sets have been simplified. The overhead for magic-state distillation has been studied, but not nearly to the extent of the aforementioned topics. A method for greatly reducing this overhead, known as reusable magic states, is studied here. While examples of reusable magic states exist for Clifford gates, I give strong reasons to believe they do not exist for non-Clifford gates.

  6. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Bo Gyung, E-mail: bogyungkim@kaist.ac.kr [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Kang, Hyun Gook [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Department of Nuclear Engineering, Khalifa University of Science, Technology and Research, Abu Dhabi (United Arab Emirates); Kim, Hee Eun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of); Lee, Seung Jun [Integrated Safety Assessment Team, Korea Atomic Energy Research Institute, 1045, Daedeok-daero, Daejeon 305-353 (Korea, Republic of); Seong, Poong Hyun [Department of Nuclear and Quantum Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-ro, Yuseong-gu, Daejeon 305-701 (Korea, Republic of)

    2013-12-15

    Highlights: • Integrated fault coverage is introduced to reflect the characteristics of fault-tolerant techniques in the reliability model of digital protection systems in nuclear power plants (NPPs). • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation. • With integrated fault coverage, the unavailability of a repairable component of the DPS can be estimated. • The newly developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes in unavailability as diverse factors vary. - Abstract: With the improvement of digital technologies, digital protection systems (DPS) incorporate multiple sophisticated fault-tolerant techniques (FTTs) in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is a vital factor of FTTs in reliability. However, fault detection coverage alone is insufficient to reflect the effects of various FTTs in a reliability model. To reflect the characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of an FTT from detection to fail-safe generation. A model has been developed to estimate the unavailability of a repairable component of the DPS using the integrated fault coverage. The newly developed model can quantify unavailability under a diversity of conditions. Sensitivity studies are performed to ascertain the important variables that affect the integrated fault coverage and unavailability.

  7. Reliability modeling of digital component in plant protection system with various fault-tolerant techniques

    International Nuclear Information System (INIS)

    Highlights: • Integrated fault coverage is introduced to reflect the characteristics of fault-tolerant techniques in the reliability model of digital protection systems in nuclear power plants (NPPs). • The integrated fault coverage considers the process of fault-tolerant techniques from detection to fail-safe generation. • With integrated fault coverage, the unavailability of a repairable component of the DPS can be estimated. • The newly developed reliability model can reveal the effects of fault-tolerant techniques explicitly for risk analysis. • The reliability model makes it possible to confirm changes in unavailability as diverse factors vary. - Abstract: With the improvement of digital technologies, digital protection systems (DPS) incorporate multiple sophisticated fault-tolerant techniques (FTTs) in order to increase fault detection and to help the system safely perform the required functions in spite of the possible presence of faults. Fault detection coverage is a vital factor of FTTs in reliability. However, fault detection coverage alone is insufficient to reflect the effects of various FTTs in a reliability model. To reflect the characteristics of FTTs in the reliability model, integrated fault coverage is introduced. The integrated fault coverage considers the process of an FTT from detection to fail-safe generation. A model has been developed to estimate the unavailability of a repairable component of the DPS using the integrated fault coverage. The newly developed model can quantify unavailability under a diversity of conditions. Sensitivity studies are performed to ascertain the important variables that affect the integrated fault coverage and unavailability.

  8. Reactive system verification case study: Fault-tolerant transputer communication

    Science.gov (United States)

    Crane, D. Francis; Hamory, Philip J.

    1993-01-01

    A reactive program is one which engages in an ongoing interaction with its environment. A system which is controlled by an embedded reactive program is called a reactive system. Examples of reactive systems are aircraft flight management systems, bank automatic teller machine (ATM) networks, airline reservation systems, and computer operating systems. Reactive systems are often naturally modeled (for logical design purposes) as a composition of autonomous processes which progress concurrently and which communicate to share information and/or to coordinate activities. Formal (i.e., mathematical) frameworks for system verification are tools used to increase the users' confidence that a system design satisfies its specification. A framework for reactive system verification includes formal languages for system modeling and for behavior specification and decision procedures and/or proof-systems for verifying that the system model satisfies the system specifications. Using the Ostroff framework for reactive system verification, an approach to achieving fault-tolerant communication between transputers was shown to be effective. The key components of the design, the decoupler processes, may be viewed as discrete-event-controllers introduced to constrain system behavior such that system specifications are satisfied. The Ostroff framework was also effective. The expressiveness of the modeling language permitted construction of a faithful model of the transputer network. The relevant specifications were readily expressed in the specification language. The set of decision procedures provided was adequate to verify the specifications of interest. The need for improved support for system behavior visualization is emphasized.

  9. Evaluation of digital fault-tolerant architectures for nuclear power plant control systems

    International Nuclear Information System (INIS)

    Four fault tolerant architectures were evaluated for their potential reliability in service as control systems of nuclear power plants. The reliability analyses showed that human- and software-related common cause failures and single points of failure in the output modules are dominant contributors to system unreliability. The four architectures are triple-modular-redundant (TMR), both synchronous and asynchronous, and dual, both synchronous and asynchronous. The evaluation includes a review of design features, an analysis of the importance of coverage, and reliability analyses of fault tolerant systems. An advantage of fault-tolerant controllers over those that are not fault tolerant is that fault-tolerant controllers continue to function after the occurrence of most single hardware faults. However, most fault-tolerant controllers have single hardware components that will cause system failure, almost all controllers have single points of failure in software, and all are subject to common cause failures. Reliability analyses based on data from several industries that have fault-tolerant controllers were used to estimate the mean-time-between-failures of fault-tolerant controllers and to predict those failure modes that may be important in nuclear power plants. 7 refs., 4 tabs
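
    As an illustration of the triple-modular-redundant (TMR) idea mentioned in this abstract, the following is a minimal sketch of a majority voter over three redundant channel outputs. The function name `tmr_vote` and the choice to raise an error when no majority exists are assumptions made for this example; they are not taken from the evaluated architectures.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value reported by at least two of three redundant channels.

    Hypothetical helper for illustration only. Raises ValueError when all
    three channels disagree, because no majority exists in that case.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three channels disagree")
    return value


# A single faulty channel is outvoted by the two healthy ones.
masked = tmr_vote(42, 42, 17)

# A common cause failure that corrupts two channels identically is NOT
# masked: the voter returns the shared wrong value.
common_cause = tmr_vote(99, 99, 42)
```

    The second call reflects the abstract's point that voting masks most single hardware faults but offers no protection against common cause failures, and the voter itself remains a single point of failure.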

  10. Fault-tolerant for Electric Vehicles Drive System Sensor Failure

    Directory of Open Access Journals (Sweden)

    Zhang Liwei

    2013-10-01

    When an EV fault occurs, a fault-tolerant method is needed to ensure people's safety. When the current sensor and speed sensor fail, a software fault-tolerant strategy that switches control algorithms can be used. This paper presents a theoretical analysis of switching from the rotor field-oriented vector control algorithm to the open-loop constant V/F control algorithm; a phase angle compensation method is used to reduce the current and torque shock, and simulation is carried out in MATLAB/Simulink.

  11. Advanced information processing system - Status report. [for fault tolerant and damage tolerant data processing for aerospace vehicles

    Science.gov (United States)

    Brock, L. D.; Lala, J.

    1986-01-01

    The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles. The AIPS architecture also has attributes to enhance system effectiveness such as graceful degradation, growth and change tolerance, integrability, etc. Two key building blocks being developed by the AIPS program are a fault and damage tolerant processor and communication network. A proof-of-concept system is now being built and will be tested to demonstrate the validity and performance of the AIPS concepts.

  12. Boolean Logic with Fault Tolerant Coding

    OpenAIRE

    Alagoz, B. Baykant

    2009-01-01

    Error-detecting and error-correcting coding in Hamming space was studied to discover possible fault tolerant coding constellations that can implement Boolean logic with a fault tolerant property. Basic logic operators of Boolean algebra were developed to apply fault tolerant coding in logic circuits. It was shown that applying three-bit fault tolerant codes gives the digital system the ability to auto-recover without need for designing additional-fault ...
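
    The excerpt does not give the paper's actual code constellation. As a hedged sketch of the general idea, the example below assumes the simplest three-bit code, the triple repetition code (0 as 000, 1 as 111), and decodes to the nearest codeword in Hamming distance before applying each logic operator. The names `encode`, `decode`, `ft_and`, `ft_or` and `ft_not` are hypothetical.

```python
def encode(bit):
    """Encode one logical bit as a three-bit repetition codeword."""
    return (bit, bit, bit)


def decode(word):
    """Decode to the nearest codeword in Hamming distance (majority of 3 bits)."""
    return 1 if sum(word) >= 2 else 0


def ft_and(x, y):
    """AND on codewords: correct each input, combine, then re-encode."""
    return encode(decode(x) & decode(y))


def ft_or(x, y):
    """OR on codewords, with the same correct-then-re-encode pattern."""
    return encode(decode(x) | decode(y))


def ft_not(x):
    """NOT on a codeword."""
    return encode(1 - decode(x))


# (1, 0, 1) is a logical 1 with a single flipped bit; the error is corrected
# before the gate is evaluated, so the output is a clean codeword.
recovered = ft_and((1, 0, 1), (1, 1, 1))
```

    Because each operator re-encodes its result, a single bit error on an input does not propagate to the output, which is one way to read the "auto-recovery" property described above. Two flipped bits in the same word exceed what this code can correct.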

  13. Design and Assessment of a Multiple Sensor Fault Tolerant Robust Control System

    Directory of Open Access Journals (Sweden)

    J. Chen

    2008-03-01

    This paper presents an enhanced robust control design structure to realise fault tolerance towards sensor faults, suitable for implementation in multi-input-multi-output (MIMO) systems. The proposed design permits the fault detection and controller elements to be designed on a common mathematical platform, with consideration of stability and of robustness towards uncertainties as well as a multiple-fault environment. This framework can also cater to systems requiring fast responses. A design example is illustrated with a fast, multivariable and unstable system, namely the double inverted pendulum system. Results indicate the potential of this design framework to handle fast systems with multiple sensor faults.

  14. Fault tolerant control for nonlinear systems described by Takagi-Sugeno models

    OpenAIRE

    Kheder, Atef; Ben Othman, Kamel; Benrejeb, Mohamed; Maquin, Didier

    2010-01-01

    In this paper the problem of active fault tolerant control (FTC) in noisy systems is studied. The proposed FTC strategy is based on knowledge of the fault estimate and of the error between the faulty system state and a reference system state. A proportional integral observer is used to estimate the state and the actuator faults. The obtained results are then extended to nonlinear systems described by nonlinear Takagi-Sugeno models. The problem of conception of the proportional integral ...

  15. Transient Fault Tolerance and System Safety Enhancement Based on System Theory

    Directory of Open Access Journals (Sweden)

    Xiongfeng Huang

    2011-10-01

    Transient faults are hard to detect and locate due to their unpredictable nature and short duration, and they are the dominant cause of system failures, which makes it necessary to consider transient fault-tolerant design in the development of modern safety-critical industrial systems. In this paper, an approach based on system theory is proposed to tolerate transient faults in tunnel construction wireless monitoring and control systems (TCWMCS), in which the effects of transient faults are expressed as dysfunctional interactions among software applications. After analyzing the dysfunctional interactions of the system with the operational process model and deducing the causes of dysfunction in the functional control diagram, a safety enhancement approach is proposed for designers, in which effective safety constraints are set up to tolerate the transient faults. The experimental evaluation indicated that the effects of transient faults could be exposed by the causal factors of dysfunctional interactions and that system safety could be enhanced by enforcing appropriate constraints.

  16. A Piecewise Affine Hybrid Systems Approach to Fault Tolerant Satellite Formation Control

    DEFF Research Database (Denmark)

    Grunnet, Jacob Deleuran; Larsen, Jesper Abildgaard

    2008-01-01

    In this paper a procedure for modelling satellite formations including failure dynamics as a piecewise-affine hybrid system is shown. The formulation enables recently developed methods and tools for control and analysis of piecewise-affine systems to be applied, leading to synthesis of fault tolerant controllers and analysis of the system behaviour given possible faults. The method is illustrated using a simple example involving two satellites trying to reach a specific formation despite actuator faults occurring.

  17. FATOMAS - A Fault-Tolerant Mobile Agent System Based on the Agent-Dependent Approach

    OpenAIRE

    Pleisch, Stefan; Schiper, André

    2001-01-01

    Fault tolerance is fundamental to the further development of mobile agent applications. In the context of mobile agents, fault tolerance prevents a partial or complete loss of the agent, i.e., it ensures that the agent arrives at its destination. In this paper, we present FATOMAS, a Java-based fault-tolerant mobile agent system based on an algorithm presented in an earlier paper. In contrast to the standard "place-dependent" architectural approach, FATOMAS uses the novel "agent-d...

  18. Award ER25750: Coordinated Infrastructure for Fault Tolerance Systems Indiana University Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Lumsdaine, Andrew

    2013-03-08

    The main purpose of the Coordinated Infrastructure for Fault Tolerance in Systems initiative has been to conduct research with a goal of providing end-to-end fault tolerance on a systemwide basis for applications and other system software. While fault tolerance has been an integral part of most high-performance computing (HPC) system software developed over the past decade, it has been treated mostly as a collection of isolated stovepipes. Visibility and response to faults have typically been limited to the particular hardware and software subsystems in which they are initially observed. Little fault information is shared across subsystems, allowing little flexibility or control on a system-wide basis and making it practically impossible to provide cohesive end-to-end fault tolerance in support of scientific applications. As an example, consider faults such as communication link failures that can be seen by a network library but are not directly visible to the job scheduler, or consider faults related to node failures that can be detected by system monitoring software but are not inherently visible to the resource manager. If information about such faults could be shared by the network libraries or monitoring software, then other system software, such as a resource manager or job scheduler, could ensure that failed nodes or failed network links were excluded from further job allocations and that further diagnosis could be performed. As a founding member and one of the lead developers of the Open MPI project, our efforts over the course of this project have been focused on making Open MPI more robust to failures by supporting various fault tolerance techniques, and using fault information exchange and coordination between MPI and the HPC system software stack, from the application, numeric libraries, and programming language runtime to other common system components such as job schedulers, resource managers, and monitoring tools.

  19. Diagnosis and fault-tolerant control

    CERN Document Server

    Blanke, Mogens; Lunze, Jan; Staroswiecki, Marcel

    2016-01-01

    Fault-tolerant control aims at a gradual shutdown response in automated systems when faults occur. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults, which bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. It also introduces design methods suitable for diagnostic systems and fault-tolerant controllers for continuous processes that are described by analytical models of discrete-event systems represented by automata. The book is suitable for engineering students, engineers in industry and researchers who wish to get an overview of the variety of approaches to process diagnosis and fault-tolerant contro...

  20. Flexible fault tolerance in configurable middleware for embedded systems

    Energy Technology Data Exchange (ETDEWEB)

    Dorow, Kevin E.

    2003-11-03

    MicroQoSCORBA (MQC) is a middleware platform that focuses on embedded applications by providing a very fine level of configurability of its internal orthogonal components. Using this configurability, a developer can generate a customized middleware instantiation that is tailored to both the requirements and constraints of a specific embedded application and the embedded hardware. One of the key components provided by MQC is a set of fault-tolerant mechanisms, which allow for support of applications that require a higher level of reliability. This document provides a detailed description of the algorithms and protocols selected for these mechanisms, along with a discussion of their implementation and incorporation into the MQC platform.

  1. Application-driven co-design of fault-tolerant industrial systems

    OpenAIRE

    Restrepo Calle, Felipe; Martínez Álvarez, Antonio; Guzmán Miranda, Hipólito; Palomo Pinto, Francisco Rogelio; Cuenca Asensi, Sergio

    2010-01-01

    This paper presents a novel methodology for the HW/SW co-design of fault tolerant embedded systems that pursues the mitigation of radiation-induced upset events (which are a class of Single Event Effects - SEEs) on critical industrial applications. The proposal combines the flexibility and low cost of Software Implemented Hardware Fault Tolerance (SIHFT) techniques with the high reliability of selective hardware replication. The co-design flow is supported by a hardening platform that compris...

  2. Industrial Cost-Benefit Assessment for Fault-tolerant Control Systems

    DEFF Research Database (Denmark)

    Thybo, C.; Blanke, M.

    1998-01-01

    Economic aspects are decisive for industrial acceptance of research concepts, including the promising ideas in fault tolerant control. Fault tolerance is the ability of a system to detect, isolate and accommodate a fault, such that simple faults in a sub-system do not develop into failures at a system level. In the design phase for an industrial system, possibilities span from fail-safe design, where any single point failure is accommodated by hardware, through fault-tolerant design, where selected faults are handled without extra hardware, to fault-ignorant design, where no extra precaution is taken against failure. The paper describes the assessments needed to find the right path for new industrial designs. The economic decisions in the design phase are discussed: cost of different failures, profits associated with available benefits, and investments needed for development and life-time support. The objective of this paper is to help, in the early product development stage, to find the economically most suitable scheme. A salient result is that with increased customer awareness of total cost of ownership, new products can benefit significantly from applying fault tolerant control principles.

  3. Towards fault-tolerant decision support systems for ship operator guidance

    DEFF Research Database (Denmark)

    Nielsen, Ulrik Dam; Lajic, Zoran

    2012-01-01

    Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis when the ship responses are assumed to be linear in the wave excitation. A novel numerical procedure is described for the calculation of residuals using the ship's transfer functions, which correlate the wave excitation and the ship responses. As tests, multiplicative faults have been artificially imposed on full-scale motion measurements, and it is shown that the developed model is able to detect and isolate all faults.
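
    The numerical procedure itself is not given in this abstract. The sketch below is only a simplified illustration of residual-based detection and isolation under the same linearity assumption: each sensor response is taken as r_i = H_i * w for a single real-valued wave amplitude w, so every healthy sensor should imply the same w. The function names, the sensor names, the transfer-function values and the tolerance are all hypothetical.

```python
from itertools import combinations


def wave_estimates(responses, transfer):
    """Back-calculate the wave amplitude implied by each sensor: w_i = r_i / H_i."""
    return {name: responses[name] / transfer[name] for name in responses}


def isolate_fault(responses, transfer, tol=0.05):
    """Return the name of the single faulty sensor, None if all agree,
    or "ambiguous" if the violated residuals do not point at one sensor."""
    est = wave_estimates(responses, transfer)
    # Pairwise residuals are near zero when both sensors in the pair are healthy.
    bad_pairs = [
        (a, b)
        for a, b in combinations(sorted(est), 2)
        if abs(est[a] - est[b]) > tol
    ]
    if not bad_pairs:
        return None
    # A single faulty sensor appears in every violated pair.
    suspects = set(bad_pairs[0])
    for pair in bad_pairs[1:]:
        suspects &= set(pair)
    return suspects.pop() if len(suspects) == 1 else "ambiguous"


# Hypothetical transfer-function magnitudes at one wave frequency.
H = {"heave": 0.8, "pitch": 0.1, "roll": 0.3}
wave = 2.0
healthy = {name: h * wave for name, h in H.items()}

# Impose a multiplicative fault (gain 1.5) on the pitch measurement.
faulty = dict(healthy, pitch=1.5 * healthy["pitch"])
```

    With three sensors, one multiplicative fault violates exactly the two residuals that involve the faulty sensor, which is what makes isolation possible here. A real implementation would work with complex-valued transfer functions across a frequency range and with noisy measurements.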

  4. Towards fault-tolerant decision support systems for ship operator guidance

    International Nuclear Information System (INIS)

    Fault detection and isolation are very important elements in the design of fault-tolerant decision support systems for ship operator guidance. This study outlines remedies that can be applied for fault diagnosis when the ship responses are assumed to be linear in the wave excitation. A novel numerical procedure is described for the calculation of residuals using the ship's transfer functions, which correlate the wave excitation and the ship responses. As tests, multiplicative faults have been artificially imposed on full-scale motion measurements, and it is shown that the developed model is able to detect and isolate all faults.

  5. Fault-diagnosis applications. Model-based condition monitoring. Actuators, drives, machinery, plants, sensors, and fault-tolerant systems

    Energy Technology Data Exchange (ETDEWEB)

    Isermann, Rolf [Technische Univ. Darmstadt (DE). Inst. fuer Automatisierungstechnik (IAT)

    2011-07-01

    Supervision, condition monitoring, fault detection, fault diagnosis and fault management play an increasing role for technical processes and vehicles in order to improve reliability, availability, maintenance and lifetime. For safety-related processes, fault-tolerant systems with redundancy are required in order to reach comprehensive system integrity. This book is a sequel to the book "Fault-Diagnosis Systems" published in 2006, where the basic methods were described. After a short introduction to fault-detection and fault-diagnosis methods, the book shows how these methods can be applied to a selection of 20 real technical components and processes as examples, such as: electrical drives (DC, AC); electrical actuators; fluidic actuators (hydraulic, pneumatic); centrifugal and reciprocating pumps; pipelines (leak detection); industrial robots; machine tools (main and feed drive, drilling, milling, grinding); and heat exchangers. Realized fault-tolerant systems for electrical drives, actuators and sensors are also presented. The book describes why and how the various signal-model-based and process-model-based methods were applied and which experimental results could be achieved. In several cases a combination of different methods was most successful. The book is intended for graduate students of electrical, mechanical and chemical engineering and computer science, and for engineers. (orig.)

  6. Fault Tolerant Software Architectures

    OpenAIRE

    Saridakis, Titos; Issarny, Valérie

    1998-01-01

    Coping explicitly with failures during the conception and the design of software development complicates significantly the designer's job. The design complexity leads to software descriptions difficult to understand, which have to undergo many simplifications until their first functioning version. To support the systematic development of complex, fault tolerant software, this paper proposes a layered framework for the analysis of the fault tolerance software properties, where the top-most lay...

  7. Fault-Tolerant Consensus of Multi-Agent System With Distributed Adaptive Protocol.

    Science.gov (United States)

    Chen, Shun; Ho, Daniel W C; Li, Lulu; Liu, Ming

    2015-10-01

    In this paper, fault-tolerant consensus in multi-agent systems using a distributed adaptive protocol is investigated. Firstly, distributed adaptive online updating strategies for some parameters are proposed based on local information of the network structure. Then, under the online updating parameters, a distributed adaptive protocol is developed to compensate for the fault effects and the uncertainty effects in the leaderless multi-agent system. Based on the local state information of neighboring agents, a distributed updating protocol gain is developed, which leads to a fully distributed continuous adaptive fault-tolerant consensus protocol design for the leaderless multi-agent system. Furthermore, a distributed fault-tolerant leader-follower consensus protocol for multi-agent systems is constructed by the proposed adaptive method. Finally, a simulation example is given to illustrate the effectiveness of the theoretical analysis. PMID:25415998
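
    The adaptive fault-tolerant protocol itself is not specified in this abstract. For orientation only, the sketch below shows the standard fault-free, non-adaptive leaderless consensus update that such protocols extend: each agent moves toward its neighbors' states using only local information. The function names, the three-agent path graph and the step size `eps` are assumptions for this example.

```python
def consensus_step(x, neighbors, eps):
    """One synchronous update: x_i <- x_i + eps * sum_j (x_j - x_i) over neighbors j."""
    return [
        xi + eps * sum(x[j] - xi for j in neighbors[i])
        for i, xi in enumerate(x)
    ]


def run_consensus(x0, neighbors, eps=0.3, steps=100):
    """Iterate the update a fixed number of times and return the final states."""
    x = list(x0)
    for _ in range(steps):
        x = consensus_step(x, neighbors, eps)
    return x


# Undirected path graph 0 - 1 - 2; eps is chosen below 1 / (maximum degree) = 0.5.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
final = run_consensus([0.0, 3.0, 6.0], neighbors)
```

    On a connected undirected graph this update drives all agents to the average of the initial states. It has no mechanism for compensating actuator faults or model uncertainty, which is the gap the adaptive protocol described above is meant to fill.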

  8. Fault Tolerant Control Systems : a Development Method and Real-Life Case Study

    DEFF Research Database (Denmark)

    Bøgh, S.A.

    1997-01-01

    This thesis considered the development of fault tolerant control systems. The focus was on the category of automated processes that do not necessarily comprise a high number of identical sensors and actuators to maintain safe operation, but still have a potential for improving immunity to component failures. It is often feasible to increase availability for these control loops by designing the control system to perform on-line detection and reconfiguration in case of faults before the safety system makes a close-down of the process. A general development methodology is given in the thesis that carried the control system designer through the steps necessary to consider fault handling in an early design phase. It was shown how an existing control loop with interface to the plant wide control system could be extended with three additional modules to obtain fault tolerance: Fault detection and isolation, remedial action decision, and reconfiguration. The integration of these modules in software were considered. The general methodology covered the analysis, design, and implementation of fault tolerant control systems on an overall level. Two detailed studies were presented, one on fault detection and isolation design and one on design of the decision logic. Two application case studies were used to emphasize practical aspects of both the development methodology and the detailed studies. One was an electro-mechanical actuator in a position control loop for a diesel engine speed governor where the purpose was to avoid a total close-down in case of the most likely faults. The second was a fault tolerant attitude control system for a micro satellite where the operation of the system is mission critical. The purpose was to avoid hazardous effects from faults and maintain operation if possible. 
A method was introduced that, after a systematic examination of possible component failures, enables analysis of the relationship between failures and their consequences for the system's operation. This fault propagation analysis is based on coarse models of the subsystems describing the reaction to faults, as for example a variable being zero, low or high. Examples were given that illustrate how such models can be established by simple means, and yet provide important information when combined into a complete system. A special achievement was a method to determine how control loops behave in case of faults. This is not straightforward, as the system behaviour depends on the character of the feedback. One of the detailed studies was the design of the decision logic in fault handling, realized as state-event machines. Guidelines for the design were provided, based on experience from the two case studies. Methods for verifying correct operation of the decision logic were described, where a completeness check against the fault propagation analysis is able to guarantee coverage of all considered faults. The usage of software tools to support the development process was illustrated with an off-the-shelf product for constraint logic solving and state-event machine analysis. The coarse system models and the decision logic were analyzed with the tool-box and it was shown how an easy analysis could be performed to verify correctness and completeness of the fault handling design. Experience from this study highlights requirements for a dedicated software environment for fault tolerant control systems design. The second detailed study addressed the detection of a fault event and determination of the failed component. A variety of algorithms were compared, based on two fault scenarios in the speed governor actuator setup. One was a position sensor fault and the second was an actuator current fault. 
The sensor fault detection was trivial, whereas the actuator fault was more challenging. The study demonstrated that many existing methods have the potential to detect and isolate the two faults, but also that the research field still lacks a systematic approach to handling realistic problems such as low sampling rates and nonlinear characteristics of the system.

  9. Adaptive sensor-fault tolerant control for a class of multivariable uncertain nonlinear systems.

    Science.gov (United States)

    Khebbache, Hicham; Tadjine, Mohamed; Labiod, Salim; Boulkroune, Abdesselem

    2015-03-01

    This paper deals with the active fault tolerant control (AFTC) problem for a class of multiple-input multiple-output (MIMO) uncertain nonlinear systems subject to sensor faults and external disturbances. The proposed AFTC method can tolerate three additive sensor faults (bias, drift and loss of accuracy) and one multiplicative sensor fault (loss of effectiveness). By employing the backstepping technique, a novel adaptive backstepping-based AFTC scheme is developed using the fact that sensor faults and system uncertainties (including external disturbances and unexpected nonlinear functions caused by sensor faults) can be estimated on-line and compensated via robust adaptive schemes. The stability of the closed-loop system is rigorously proven using a Lyapunov approach. The effectiveness of the proposed controller is illustrated by two simulation examples. PMID:25701191

  10. System Diagnosis and Fault Tolerance for Distributed Computing System: A Review

    OpenAIRE

    Nilotpal Baruah; Dr. Lakshmi P. Saikia; Dr. K. Hemachandran

    2013-01-01

    An adaptive system diagnosis fault tolerance method for distributed systems. The system comprises a network of N nodes, where N is an integer greater than or equal to 3, and each node is able to execute an algorithm to communicate with the network. A computer network, often simply referred to as a network, is a collection of hardware components and computers interconnected by communication channels that allow sharing of resources and information. As a computer network is a collection of...

  11. Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid

    OpenAIRE

    Mangesh Ramesh Balpande

    2014-01-01

    A computational grid environment, due to its heterogeneous, autonomous and dynamic nature is prone to different kinds of faults which may lead to delay in completion of job or even execution of job from starting point. Checkpointing mechanism plays a vital role for making grid more reliable, cost effective and efficient. In this paper, we have proposed schemes based on system checkpointing and application checkpointing. Their performance comparison is done based on the empirical study. The AB...

  12. Fault detection and fault tolerant control of a smart base isolation system with magneto-rheological damper

    International Nuclear Information System (INIS)

    Fault detection and isolation (FDI) in real-time systems can provide early warnings for faulty sensors and actuator signals to prevent events that lead to catastrophic failures. The main objective of this paper is to develop FDI and fault tolerant control techniques for base isolation systems with magneto-rheological (MR) dampers. Thus, this paper presents a fixed-order FDI filter design procedure based on linear matrix inequalities (LMI). The necessary and sufficient conditions for the existence of a solution for detecting and isolating faults using the H∞ formulation are provided in the proposed filter design. Furthermore, an FDI-filter-based fuzzy fault tolerant controller (FFTC) for a base isolation structure model was designed to preserve the pre-specified performance of the system in the presence of various unknown faults. Simulation and experimental results demonstrated that the designed filter can successfully detect and isolate faults from displacement sensors and accelerometers while maintaining excellent performance of the base isolation technology under faulty conditions.

  13. Replicated R-Resilient Process Allocation for Load Distribution in Fault Tolerant System

    OpenAIRE

    Jian Wang; Jianling Sun; Xinyu Wang; Hang Chen

    2008-01-01

    Process allocation for load distribution can improve system performance by utilizing resources efficiently. For primary-backup based fault tolerant systems, a classic load-balancing process allocation method (the two-stage allocation algorithm) has been proposed that can balance the load before as well as after a fault occurs. But the two-stage allocation algorithm scales poorly, since its load-balancing performance drops dramatically when each primary process is duplicated more than o...

  14. Checkpointing Based Fault Tolerant Job Scheduling System for Computational Grid

    Directory of Open Access Journals (Sweden)

    Mangesh Ramesh Balpande

    2014-09-01

    A computational grid environment, due to its heterogeneous, autonomous and dynamic nature, is prone to different kinds of faults, which may delay the completion of a job or even force the job to be executed again from its starting point. A checkpointing mechanism plays a vital role in making the grid more reliable, cost effective and efficient. In this paper, we propose schemes based on system checkpointing and application checkpointing. Their performance is compared in an empirical study. The ABSC scheme is suitable for applications where computations are not intense. For computationally intense applications where reliability is more important, the ABAC scheme is more suitable. This scheme may produce slight overheads in fault-free situations, but it is very reliable in faulty situations.

  15. Fault tolerant linear actuator

    Energy Technology Data Exchange (ETDEWEB)

    Tesar, Delbert

    2004-09-14

    In varying embodiments, the fault tolerant linear actuator of the present invention is a new and improved linear actuator with fault tolerance and positional control that may incorporate velocity summing, force summing, or a combination of the two. In one embodiment, the invention offers a velocity summing arrangement with a differential gear between two prime movers driving a cage, which then drives a linear spindle screw transmission. Other embodiments feature two prime movers driving separate linear spindle screw transmissions, one internal and one external, in a totally concentric and compact integrated module.

  16. Novel fault tolerant modular system architecture for I and C applications

    International Nuclear Information System (INIS)

    A novel fault tolerant 3U modular system architecture has been developed for safety related and safety critical I and C systems of the reactor. The design innovatively uses the simplest multi-drop serial bus, the Inter-Integrated Circuit (I2C) bus, for system operation with simplicity, fault tolerance and online maintainability (hot swap). An I2C bus failure mode analysis was done and the system design was hardened against possible failure modes. The system backplane uses only passive components, dual redundant I2C buses, data consistency checks and a geographical addressing scheme to tackle bus lock-ups/stuck buses and bit flips in data transactions. A dual CPU active/standby redundancy architecture with hot swap implements tolerance for CPU software stuck-up conditions and hardware faults. System cards implement hot swap for online maintainability, power supply fault containment, communication bus fault containment, and I/O channel-to-channel isolation and independence. Typical applications as a pure hardwired (without real time software) Core Temperature Monitoring System for FBRs, as a Universal Signal Conditioning System for safety related I and C systems, and as a complete control system for non-nuclear safety systems have also been discussed. (author)

  17. Active fault tolerant control of piecewise affine systems with reference tracking and input constraints

    DEFF Research Database (Denmark)

    Gholami, M.; Cocquempot, V.

    2014-01-01

    An active fault tolerant control (AFTC) method is proposed for discrete-time piecewise affine (PWA) systems. Only actuator faults are considered. The AFTC framework contains a supervisory scheme, which selects a suitable controller from a set of controllers such that the stability and an acceptable performance of the faulty system are maintained. The design of the supervisory scheme is not considered here. The set of controllers is composed of a normal controller for the fault-free case, an active fault detection and isolation controller for isolation and identification of the faults, and a set of passive fault tolerant controller (PFTC) modules designed to be robust against a set of actuator faults. In this research, the piecewise nonlinear model is approximated by a PWA system. The PFTCs are state feedback laws. Each one is robust against a fixed set of actuator faults and is able to track the reference signal while the control inputs are bounded. The PFTC problem is transformed into a feasibility problem of a set of LMIs. The method is applied to a large-scale livestock ventilation model.

  18. An Overview of Checkpointing Techniques for Fault Tolerance in Distributed Computing Systems

    Directory of Open Access Journals (Sweden)

    Jagdish Makhijani; Dr. Anil Rajput

    2012-02-01

    Checkpointing is an important feature in distributed computing systems. It gives fault tolerance without requiring additional effort from the programmer [1]. In order to provide fault tolerance for distributed systems, the checkpointing technique has been widely used, and much research has been performed to reduce the overhead of checkpointing coordination. A checkpoint is a snapshot of the current state of a process. It saves enough information in non-volatile stable storage such that, if the contents of the volatile storage are lost due to process failure, one can reconstruct the process state from the information saved in the non-volatile stable storage [1].
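
    The snapshot-and-restore idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the scheme surveyed in the paper: the function names (`save_checkpoint`, `load_checkpoint`, `run_job`), the JSON state layout and the simulated failure are all hypothetical, and a local file stands in for "non-volatile stable storage".

```python
import json
import os
import tempfile


def save_checkpoint(state, path):
    """Write the state to stable storage atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing the file
    os.replace(tmp_path, path)  # a crash leaves either the old or the new checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f)


def run_job(path, total_steps, fail_at=None):
    """Sum 0..total_steps-1, checkpointing after every step (hypothetical workload)."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated process failure")
        state["acc"] += state["step"]
        state["step"] += 1
        save_checkpoint(state, path)
    return state["acc"]
```

    Calling `run_job` again after a failure resumes from the last saved step instead of restarting from step 0. Checkpointing after every step is the simplest policy; a real system would trade checkpoint frequency against overhead.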

  19. Designing an Adaptive Fault Tolerance Structure in Distributed Real Time Systems

    Directory of Open Access Journals (Sweden)

    N. Mosharraf

    2009-01-01

    In this study, the Fault Tolerance CORBA (FT-CORBA) structure, used for supporting fault tolerant programs, is reviewed together with relatively important parameters, including the replication style and the number of replicas, which play a further role in improving performance and making it adaptive to real time distributed systems. Studying these specifications has yielded a structure adaptive to real time systems with higher performance than the FT-CORBA structure. Finally, the implementation of this structure, the determination of the number of replicas and the object replication style, and the significance of related parameters have been investigated.

  20. Design of fault tolerant control system for steam generator using fuzzy logic

    International Nuclear Information System (INIS)

    A controller and sensor fault tolerant system for a steam generator is designed with fuzzy logic. The proposed fault tolerant redundant system is composed of a supervisor and two fuzzy weighting modulators. The supervisor alternately checks controller-induced and sensor-induced performance to identify which part, a controller or a sensor, is faulty. To analyze controller-induced performance, both the error and the change in error of the system output are chosen as fuzzy variables. The fuzzy logic for sensor-induced performance uses two variables: the deviation between two sensor outputs and its frequency. The fuzzy weighting modulator generates an output signal compensated for a faulty input signal. Simulations show that the proposed fault tolerant control scheme for a steam generator regulates the water level well by suppressing the fault effects of either controllers or sensors. Therefore, by duplicating sensors and controllers with the proposed fault tolerant scheme, the reliability of both the steam generator control and sensor system and the power plant increases further.

  1. Fault Tolerance in a Multi-Layered DRE System: A Case Study

    Directory of Open Access Journals (Sweden)

    Paul Rubel

    2006-09-01

    Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we encountered, some due to the fault tolerance requirements we needed to meet and others due to characteristics of the resource management software. The challenges include the need for extremely rapid recovery; supporting the characteristics of component middleware, including peer-to-peer communication and multi-tiered calling semantics; supporting multiple languages; and the co-existence of replicated and non-replicated elements. Making our multi-layer dynamic resource manager fault-tolerant required simultaneously overcoming all of these challenges, presenting a significant fault tolerance research challenge.

  2. The fault-tolerant multiprocessor computer

    Energy Technology Data Exchange (ETDEWEB)

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.; Butler, R.W.

    1986-01-01

    This book presents studies of two fault-tolerant computer systems designed to meet the extreme reliability requirements for safety-critical functions in advanced NASA vehicles, plus a study of potential architectures for future flight control fault-tolerant systems, which might succeed the current generation of computers. While it is understood that these studies were done for NASA, they also have practical commercial applicability. The fault-tolerant multiprocessor (FTMP) architecture is a high reliability computer concept. The basic organization of the FTMP is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting are employed to detect and correct any single fault. Reconfiguration is then employed to "repair" a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
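
    The voting step described in this abstract (replicated elements, with a vote that masks any single fault) can be illustrated in software. The sketch below is a hedged illustration only: the FTMP votes in hardware, and the function name `majority_vote` and its return convention are assumptions made for this example.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (voted_value, faulty_indices) for a list of redundant outputs.

    Raises ValueError when no value is held by a strict majority, i.e. when
    more replicas disagree than the redundancy level can mask.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority among redundant outputs")
    # Replicas that disagree with the voted value are candidates for repair.
    faulty = [i for i, out in enumerate(outputs) if out != value]
    return value, faulty
```

    With three replicas, `majority_vote([7, 7, 9])` masks the single bad output and reports replica 2 as the one to reconfigure, which mirrors the detect, correct, then repair sequence in the abstract.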

  3. Fault tolerance control of phase current in permanent magnet synchronous motor control system

    Science.gov (United States)

    Chen, Kele; Chen, Ke; Chen, Xinglong; Li, Jinying

    2014-08-01

    As photoelectric tracking systems move from earth-based platforms to moving platforms of all kinds (aircraft, ships, vehicles, satellites and missiles), fault tolerant control of the phase current sensors is studied in order to detect and handle phase current sensor failures on a moving platform. By using a DC-link current sensor and the switching state of the corresponding SVPWM inverter, failure detection and fault tolerant control of the three phase current sensors are achieved. Fault tolerance is maintained with one, two or three failed sensors. The cause of the error that arises under this method between the current used in fault tolerant control and the actual phase current is analyzed, and a solution to reduce the error is provided. An experiment on a permanent magnet synchronous motor system shows that the method detects phase current sensor failures effectively and precisely while simultaneously providing fault tolerant control. With this method, even if all three phase current sensors malfunction, the moving platform can still work by reconstructing the phase currents of the motor.
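
    The reconstruction idea in this abstract can be sketched with the textbook relation between the DC-link current and the inverter switching state. This is not necessarily the authors' exact method. The sketch assumes ideal switches, no dead-time effects, and a motor with an isolated neutral (so ia + ib + ic = 0); the function names and the sample format are hypothetical.

```python
def phase_current_from_dc_link(i_dc, switch_state):
    """Map one DC-link sample to the phase current it reveals.

    switch_state is (Sa, Sb, Sc), with 1 when the upper switch of that leg is on.
    Assumed ideal relation: i_dc = Sa*ia + Sb*ib + Sc*ic.
    Returns (phase_index, current), or None for the zero vectors 000 and 111.
    """
    n_on = sum(switch_state)
    if n_on == 1:  # only one phase is tied to the positive rail
        return switch_state.index(1), i_dc
    if n_on == 2:  # two phases on the rail: i_dc equals minus the third current
        return switch_state.index(0), -i_dc
    return None


def reconstruct_phase_currents(samples):
    """Rebuild [ia, ib, ic] from (i_dc, switch_state) samples of one PWM period."""
    known = {}
    for i_dc, state in samples:
        result = phase_current_from_dc_link(i_dc, state)
        if result is not None:
            phase, current = result
            known[phase] = current
    if len(known) < 2:
        raise ValueError("need samples from two different active vectors")
    currents = [known.get(k) for k in range(3)]
    if None in currents:
        missing = currents.index(None)
        # Isolated neutral: the three phase currents sum to zero.
        currents[missing] = -sum(c for c in currents if c is not None)
    return currents
```

    Two active vectors per PWM period are enough to recover all three currents, which is why reconstruction can continue even when every phase current sensor has failed. In practice, short active vectors make the DC-link sample unreliable, one plausible source of the reconstruction error mentioned in the abstract.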

  4. On-board fault-tolerant SAR processor for spaceborne imaging radar systems

    Science.gov (United States)

    Fang, Wai-Chi; Le, Charles; Taft, Stephanie

    2005-01-01

    A real-time high-performance and fault-tolerant FPGA-based hardware architecture for the processing of synthetic aperture radar (SAR) images has been developed for advanced spaceborne radar imaging systems. In this paper, we present the integrated design approach, from top-level algorithm specifications, system architectures, design methodology, functional verification, performance validation, down to hardware design and implementation.

  5. Energy/Reliability Trade-offs in Fault-Tolerant Event-Triggered Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Gan, Junhe; Gruian, Flavius

    2011-01-01

    This paper presents an approach to the synthesis of low-power fault-tolerant hard real-time applications mapped on distributed heterogeneous embedded systems. Our synthesis approach decides the mapping of tasks to processing elements, as well as the voltage and frequency levels for executing each task, such that transient faults are tolerated, the timing constraints of the application are satisfied, and the energy consumed is minimized. Tasks are scheduled using fixed-priority preemptive scheduling, while replication is used for recovery from multiple transient faults. Addressing energy and reliability simultaneously is especially challenging, since lowering the voltage to reduce the energy consumption has been shown to increase the transient fault rate. We present a Tabu Search-based approach which uses an energy/reliability trade-off model to find reliable and schedulable implementations with limited energy and hardware resources. We evaluated the proposed algorithm using several synthetic and real-life benchmarks.

  6. Formal specification of requirements for analytical redundancy-based fault-tolerant flight control systems

    Science.gov (United States)

    Del Gobbo, Diego

    2000-10-01

    Flight control systems are undergoing a rapid process of automation. The use of Fly-By-Wire digital flight control systems in commercial aviation (Airbus 320 and Boeing FBW-B777) is a clear sign of this trend. The increased automation goes in parallel with an increased complexity of flight control systems with obvious consequences on reliability and safety. Flight control systems must meet strict fault-tolerance requirements. The standard solution to achieving fault tolerance capability relies on multi-string architectures. On the other hand, multi-string architectures further increase the complexity of the system inducing a reduction of overall reliability. In the past two decades a variety of techniques based on analytical redundancy have been suggested for fault diagnosis purposes. While research on analytical redundancy has obtained desirable results, a design methodology involving requirements specification and feasibility analysis of analytical redundancy based fault tolerant flight control systems is missing. The main objective of this research work is to describe within a formal framework the implications of adopting analytical redundancy as a basis to achieve fault tolerance. The research activity involves analysis of the analytical redundancy approach, analysis of flight control system informal requirements, and re-engineering (modeling and specification) of the fault tolerance requirements. The USAF military specification MIL-F-9490D and supporting documents are adopted as source for the flight control informal requirements. The De Havilland DHC-2 general aviation aircraft equipped with standard autopilot control functions is adopted as pilot application. Relational algebra is adopted as formal framework for the specification of the requirements. The detailed analysis and formalization of the requirements resulted in a better definition of the fault tolerance problem in the framework of analytical redundancy. 
Fault tolerance requirements and related certification procedures turned out to be considerably more demanding than those typically adopted in the literature. Furthermore, the research work brought to light important issues in all fields involved in the specification process, namely flight control system requirements, analytical redundancy, and requirements engineering.

  7. Fault-tolerant EQL and MRL rule-based systems

    Energy Technology Data Exchange (ETDEWEB)

    Cheng, A.M.K. [Univ. of Houston, TX (United States)]

    1996-12-31

    Rule-based systems operating in an embedded environment, where internal variables may be corrupted during execution as a result of transient faults, must be able to recover automatically. Given a rule-based program p with bounded response time, the problem is to derive a self-stabilizing program q that implements p with the constraint that q must also have bounded response time. We first present an approach for solving this problem for a class of EQL rule-based programs with bounded response time. Then we extend this transformation approach to make a class of real-time MRL rule-based systems self-stabilizing. As a more expressive superset of EQL, MRL allows existentially quantified as well as universally quantified variables (simple or macro), making it comparable in expressive power to OPS5 and CLIPS.

  8. A Fault tolerant Control Supervisory System development Procedure for Small Satellites: The AAUSAT-II case

    DEFF Research Database (Denmark)

    Izadi-Zamanabadi, Roozbeh; Larsen, Jesper Abildgaard

    The paper presents a stepwise procedure to develop a fault tolerant control system for small satellites. The procedure is illustrated through implementation on the AAUSAT-II spacecraft. As shown, the presented procedure requires expertise from several disciplines that are nevertheless necessary for obtaining a complete and consistent solution.

  9. A Fault tolerant Control Supervisory System development Procedure for Small Satellites: The AAUSAT-II case

    OpenAIRE

    Izadi-Zamanabadi, Roozbeh; Larsen, Jesper Abildgaard

    2007-01-01

    The paper presents a stepwise procedure to develop a fault tolerant control system for small satellites. The procedure is illustrated through implementation on the AAUSAT-II spacecraft. As shown, the presented procedure requires expertise from several disciplines that are nevertheless necessary for obtaining a complete and consistent solution.

  10. Fault Tolerant Computer Architecture

    CERN Document Server

    Sorin, Daniel

    2009-01-01

    For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

  11. Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs

    DEFF Research Database (Denmark)

    Izosimov, Viacheslav; Pop, Paul; Eles, Petru; Peng, Zebo

    2012-01-01

    In this article, we propose a strategy for the synthesis of fault-tolerant schedules and for the mapping of fault-tolerant applications. Our techniques handle transparency/performance trade-offs and use the fault-occurrence information to reduce the overhead due to fault tolerance. Processes and messages are statically scheduled, and we use process reexecution for recovering from multiple transient faults. We propose a fine-grained transparent recovery, where the property of transparency can be se...

  12. A Hybrid Real-time Fault-tolerant Scheduling Algorithm for Partial Reconfigurable System

    Directory of Open Access Journals (Sweden)

    Jinyong Yin

    2012-11-01

    A partially reconfigurable system is an architecture consisting of general purpose processors and FPGAs, in which the FPGA can be reconfigured at run-time. In this architecture, software tasks and hardware tasks, executed on the processor and the FPGA respectively, co-exist. In this paper, a real-time fault-tolerant scheduling algorithm is proposed to schedule software/hardware hybrid tasks. In the algorithm, the sufficient condition for schedulable hybrid tasks is derived by analyzing system operation conditions when the first deadline is missed, and rollback/recovery and TMR approaches are used to schedule software subtasks and hardware subtasks, respectively, for fault tolerance. The experimental results demonstrate that all deadlines of accepted hybrid tasks are met and that the processor’s utilization ratio is greatly increased compared with that of the existing approaches when multiple faults occur.

  13. An Efficient Fault Tolerance System Design for Cmos/Nanodevice Digital Memories

    Directory of Open Access Journals (Sweden)

    D. Kavitha

    2014-11-01

    Targeting future fault-prone hybrid CMOS/nanodevice digital memories, this paper presents two fault tolerance design approaches that integrally address tolerance of defects and transient faults. These two approaches share several key features, including the use of a group of Bose-Chaudhuri-Hocquenghem (BCH) codes for both defect tolerance and transient fault tolerance, and the integration of BCH code selection with dynamic logical-to-physical address mapping. A new model of BCH decoder is proposed to reduce the area and simplify the computational scheduling of both the syndrome and Chien search blocks without parallelism, leading to high throughput. The goal of fault tolerant computing is to improve the dependability of systems, where dependability can be defined as the ability of a system to deliver service at an acceptable level of confidence in either the presence or absence of faults. The results of simulation and implementation using Xilinx ISE software and the LCD screen on the FPGA board are shown at the end.

  14. Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems

    DEFF Research Database (Denmark)

    Choi, Uimin; Lee, Kyo Beum

    2014-01-01

    This paper proposes a new diagnosis method for an open-switch fault and a fault-tolerant control strategy for T-type three-level inverter systems. The location of the faulty switch can be identified from the average of the normalized phase current and the change in the neutral-point voltage. The proposed fault-tolerant strategy is explained for two cases: a fault in the half-bridge switches and a fault in the neutral-point switches. With the proposed fault tolerant algorithm, the performance of the T-type inverter system improves considerably when a switch fails. The proposed method does not require additional components or complex calculations. Simulation and experimental results verify the feasibility of the proposed fault diagnosis and fault-tolerant control strategy.

  15. Tolerance of design faults

    OpenAIRE

    Powell, David; Arlat, Jean; Deswarte, Yves; Kanoun, Karama

    2011-01-01

    The idea that diverse or dissimilar computations could be used to detect errors can be traced back to Dionysius Lardner's analysis of Babbage's mechanical computers in the early 19th century. In the modern era of electronic computers, diverse redundancy techniques were pioneered in the 1970's by Elmendorf, Randell, Avižienis and Chen. Since then, the tolerance of design faults has been a very active research topic, which has had practical impact on real critical applications. In this paper, w...

  16. Stochastic Models for Fault Tolerance

    CERN Document Server

    Wolter, Katinka M

    2010-01-01

    As modern society relies on the fault-free operation of complex computing systems, system fault-tolerance has become an indispensable requirement. Therefore, we need mechanisms that guarantee correct service in cases where system components fail, be they software or hardware elements. Redundancy patterns are commonly used, for either redundancy in space or redundancy in time. Wolter's book details methods of redundancy in time that need to be issued at the right moment. In particular, she addresses the so-called "timeout selection problem", i.e., the question of choosing the right ti

  17. Integrity-Enhancing Replica Coordination for Byzantine Fault Tolerant Systems

    OpenAIRE

    Zhao, Wenbing

    2008-01-01

    Strong replica consistency is often achieved by writing deterministic applications, or by using a variety of mechanisms to render replicas deterministic. There exists a large body of work on how to render replicas deterministic under the benign fault model. However, when replicas can be subject to malicious faults, most of the previous work is no longer effective. Furthermore, the determinism of the replicas is often considered harmful from the security perspective and for m...

  18. Robust fault-tolerant H∞ control of active suspension systems with finite-frequency constraint

    Science.gov (United States)

    Wang, Rongrong; Jing, Hui; Karimi, Hamid Reza; Chen, Nan

    2015-10-01

    In this paper, the robust fault-tolerant (FT) H∞ control problem of active suspension systems with finite-frequency constraint is investigated. A full-car model is employed in the controller design such that the heave, pitch and roll motions can be simultaneously controlled. Both the actuator faults and external disturbances are considered in the controller synthesis. As the human body is more sensitive to vertical vibration in the 4-8 Hz range, robust H∞ control with this finite-frequency constraint is designed. Other performance measures such as suspension deflection and actuator saturation are also considered. As some of the states, such as the sprung mass pitch and roll angles, are hard to measure, a robust H∞ dynamic output-feedback controller with fault tolerant ability is proposed. Simulation results show the performance of the proposed controller.

  19. Fault Tolerance Mobile Agent System Using Witness Agent in 2-Dimensional Mesh Network

    Directory of Open Access Journals (Sweden)

    Ahmad Rostami

    2010-09-01

    Mobile agents are computer programs that act autonomously on behalf of a user or owner and travel through a network of heterogeneous machines. Fault tolerance is important along their itinerary. In this paper, existing fault tolerance methods for mobile agents, which were designed for a linear network topology, are described. In these methods, three types of agents cooperate to detect and recover from server and agent failures: the actual agent, which performs programs for its owner; the witness agent, which monitors the actual agent and the witness agent after itself; and the probe, which is sent from the witness agent's side to recover the actual agent or the witness agent. The communication mechanism in these methods is message passing between the agents. We introduce our witness agent approach for fault tolerant mobile agent systems in a Two Dimensional Mesh (2D-Mesh) network. Our approach minimizes witness dependency in this network, and we then present its algorithm.

  20. A Systematic Approach to Sensitivity Analysis of Fault Tolerant Systems in NMR Architecture

    Directory of Open Access Journals (Sweden)

    Kourosh Aslansefat

    2015-01-01

    A fault tree illustrates the ways in which a system can fail. It shows how combinations of faulty components result in an undesired event in the system. It is used in phases such as the design and operation of industrial systems, and it enables designers to evaluate dependability attributes such as reliability, MTTF and sensitivity. In addition, the fault tree is a systematic method for finding a system's bottlenecks and weak points. In spite of its extensive use in evaluating system reliability, the fault tree is rarely used in calculating sensitivity. In the last decade, little research has been conducted in this field, and the existing methods are neither systematic nor applicable to large-scale systems. This paper provides a systematic method for evaluating system sensitivity through the fault tree. It then introduces the sensitivity of the NMR architecture, one of the common fault tolerance structures used in industry to enhance system reliability, safety and availability. The article presents a comprehensive, parameterized formula for the sensitivity of the NMR structure. The presented method can help engineers who design and operate reliable systems to calculate sensitivity systematically and quickly by means of the fault tree.
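
    To make the notion of NMR sensitivity concrete, the sketch below uses the standard binomial model of N-modular redundancy, not the paper's fault-tree formula, which the abstract does not give. It assumes N independent, identical modules with reliability r, a perfect voter, and system success whenever a strict majority of modules works. The function names are hypothetical.

```python
from math import comb


def nmr_reliability(n, r):
    """Probability that a strict majority of n modules (each with reliability r) works."""
    m = n // 2 + 1  # smallest number of working modules that forms a majority
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(m, n + 1))


def nmr_sensitivity(n, r):
    """Derivative dR/dr of nmr_reliability, obtained by differentiating each term."""
    m = n // 2 + 1
    total = 0.0
    for k in range(m, n + 1):
        c = comb(n, k)
        total += c * k * r ** (k - 1) * (1 - r) ** (n - k)
        if k < n:  # the (1 - r) factor vanishes from the k = n term
            total -= c * (n - k) * r**k * (1 - r) ** (n - k - 1)
    return total
```

    For TMR (n = 3) the reliability reduces to the familiar 3r^2 - 2r^3 and the sensitivity to 6r(1 - r), so a change in module reliability matters most near r = 0.5 and least when the modules are already nearly perfect.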

  1. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Directory of Open Access Journals (Sweden)

    Streichert Thilo

    2006-01-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs) in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Besides the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  2. Modeling and Design of Fault-Tolerant and Self-Adaptive Reconfigurable Networked Embedded Systems

    Directory of Open Access Journals (Sweden)

    Jürgen Teich

    2006-06-01

    Full Text Available Automotive, avionic, or body-area networks are systems that consist of several communicating control units specialized for certain purposes. Typically, different constraints regarding fault tolerance, availability and also flexibility are imposed on these systems. In this article, we will present a novel framework for increasing fault tolerance and flexibility by solving the problem of hardware/software codesign online. Based on field-programmable gate arrays (FPGAs) in combination with CPUs, we allow migrating tasks implemented in hardware or software from one node to another. Moreover, if not enough hardware/software resources are available, the migration of functionality from hardware to software or vice versa is provided. Supporting such flexibility through services integrated in a distributed operating system for networked embedded systems is a substantial step towards self-adaptive systems. Besides the formal definition of methods and concepts, we describe in detail a first implementation of a reconfigurable networked embedded system running automotive applications.

  3. To err is robotic, to tolerate immunological: fault detection in multirobot systems.

    Science.gov (United States)

    Tarapore, Danesh; Lima, Pedro U; Carneiro, Jorge; Christensen, Anders Lyhne

    2015-01-01

    Fault detection and fault tolerance represent two of the most important and largely unsolved issues in the field of multirobot systems (MRS). Efficient, long-term operation requires an accurate, timely detection, and accommodation of abnormally behaving robots. Most existing approaches to fault-tolerance prescribe a characterization of normal robot behaviours, and train a model to recognize these behaviours. Behaviours unrecognized by the model are consequently labelled abnormal or faulty. MRS employing these models do not transition well to scenarios involving temporal variations in behaviour (e.g., online learning of new behaviours, or in response to environment perturbations). The vertebrate immune system is a complex distributed system capable of learning to tolerate the organism's tissues even when they change during puberty or metamorphosis, and to mount specific responses to invading pathogens, all without the need of a genetically hardwired characterization of normality. We present a generic abnormality detection approach based on a model of the adaptive immune system, and evaluate the approach in a swarm of robots. Our results reveal the robust detection of abnormal robots simulating common electro-mechanical and software faults, irrespective of temporal changes in swarm behaviour. Abnormality detection is shown to be scalable in terms of the number of robots in the swarm, and in terms of the size of the behaviour classification space. PMID:25642825

  4. Fault detection and fault-tolerant control using sliding modes

    CERN Document Server

    Alwi, Halim; Tan, Chee Pin

    2011-01-01

    "Fault Detection and Fault-tolerant Control Using Sliding Modes" is the first text dedicated to showing the latest developments in the use of sliding-mode concepts for fault detection and isolation (FDI) and fault-tolerant control in dynamical engineering systems. It begins with an introduction to the basic concepts of sliding modes to provide a background to the field. This is followed by chapters that describe the use and design of sliding-mode observers for FDI using robust fault reconstruction. The development of a class of sliding-mode observers is described from first principles throug

  5. BYZANTINE FAULT TOLERANCE MODEL FOR SOAP FAULTS

    Directory of Open Access Journals (Sweden)

    V. Ramachandran

    2012-04-01

    Full Text Available The proposed model configures a Byzantine fault tolerance mechanism for every SOAP fault message that is transmitted. Reliability and availability are major requirements of Web services, since they operate in a distributed environment. One of the reliability issues is handling faults. Faults occur in all phases of Service Oriented Architecture, i.e. during publishing, discovery, composition, binding, and execution. These faults may lead to service downtime, abnormal behaviour, and incorrect responses. These abnormalities are classified as Byzantine faults in Web services. Even though the SOAP specification provides fault-handling mechanisms, the correctness of the received SOAP fault messages is not known. In this paper, a model is proposed to check the correctness of the received SOAP fault message by incorporating Byzantine agreement for fault tolerance. The existing fault-tolerant mechanism detects server failure and routes the request to the next available server without the knowledge of the client. The proposed model ensures a transparent environment by providing fault-handling information to the client. This is achieved by incorporating an active replication technique.
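
    One common way a client can check a reply under active replication is to accept it only when enough replicas report the same thing. The sketch below shows that generic f + 1 matching-replies rule; it is an assumption for illustration and not the model proposed in the paper. The function name and the fault strings are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def accept_fault_message(replies: Sequence[str], f: int) -> Optional[str]:
    """Return the fault message reported by at least f + 1 replicas, else None.

    Assumption: at most f replicas are Byzantine. Any message reported by
    f + 1 replicas was therefore reported by at least one correct replica.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    if not replies:
        return None
    message, votes = Counter(replies).most_common(1)[0]
    return message if votes >= f + 1 else None


if __name__ == "__main__":
    # Four replicas tolerate one Byzantine replica (n = 3f + 1 with f = 1).
    replies = ["soap:Server", "soap:Server", "soap:Client", "soap:Server"]
    print(accept_fault_message(replies, f=1))
```

    With four replicas and f = 1, a single faulty replica cannot get its message accepted on its own, which is the property the client relies on.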

  6. Guaranteed Cost Fault-tolerant Controller Design of Networked Control Systems under Variable-period Sampling

    Directory of Open Access Journals (Sweden)

    Xuan Li

    2009-01-01

    Full Text Available This study investigates the problem of integrity against actuator failures for networked control systems under variable-period sampling. Assuming that the interval between any two consecutive sampling instants is less than a given bound, the input delay approach is used to transform the networked control systems under variable-period sampling into continuous-time networked control systems with time-varying delays. Existence conditions for a guaranteed cost fault-tolerant control law are then established using Lyapunov stability theory combined with Linear Matrix Inequalities (LMIs). Furthermore, the guaranteed cost fault-tolerant controller gain and the minimal guaranteed cost can be obtained by solving a minimization problem. A numerical simulation example demonstrates that the conclusions are feasible and effective. The proposed control method resolves the problems of variable-period sampling and actuator failures, which meets the requirements of industrial networked control systems.
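
    The input delay approach mentioned in this abstract can be written out in its usual generic form. The symbols below are assumptions introduced for illustration (plant matrices A and B, state-feedback gain K, sampling instants t_k, bound h); the abstract does not give the paper's notation, and the actuator failure model is left out.

```latex
\begin{aligned}
\dot{x}(t) &= A x(t) + B u(t), \qquad u(t) = K x(t_k), \quad t_k \le t < t_{k+1}, \\
\tau(t) &:= t - t_k, \qquad 0 \le \tau(t) < t_{k+1} - t_k < h, \\
\dot{x}(t) &= A x(t) + B K x\bigl(t - \tau(t)\bigr).
\end{aligned}
```

    The last line is a continuous-time system with a bounded time-varying delay, which is the form to which delay-dependent Lyapunov and LMI conditions are typically applied.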

  7. Fault-tolerant multiprocessor computer

    Energy Technology Data Exchange (ETDEWEB)

    Smith, T.B. III; Lala, J.H.; Goldberg, J.; Kautz, W.H.; Melliar-Smith, P.M.; Green, M.W.; Levitt, K.N.; Schwartz, R.L.; Weinstock, C.B.; Palumbo, D.L.

    1986-01-01

    The development and evaluation of fault-tolerant computer architectures and software-implemented fault tolerance (SIFT) for use in advanced NASA vehicles and potentially in flight-control systems are described in a collection of previously published reports prepared for NASA. Topics addressed include the principles of fault-tolerant multiprocessor (FTMP) operation; processor and slave regional designs; FTMP executive, facilities, acceptance-test/diagnostic, applications, and support software; FTMP reliability and availability models; SIFT hardware design; and SIFT validation and verification.

  8. Heterogeneity-aware Fault Tolerance using a Self-Organizing Runtime System

    OpenAIRE

    Kicherer, Mario; Karl, Wolfgang

    2014-01-01

    Due to the diversity and implicit redundancy in terms of processing units and compute kernels, off-the-shelf heterogeneous systems offer the opportunity to detect and tolerate faults during task execution in hardware as well as in software. To automatically leverage this diversity, we introduce an extension of an online-learning runtime system that combines the benefits of the existing performance-oriented task mapping with task duplication, a diversity-oriented mapping stra...

  9. Failure Detection vs. Group Membership in Fault-Tolerant Distributed Systems: Hidden Trade-Offs

    OpenAIRE

    Schiper, A.

    2002-01-01

    Failure detection and group membership are two important components of fault-tolerant distributed systems. Understanding their role is essential when developing efficient solutions, not only in failure-free runs, but also in runs in which processes do crash. While group membership provides consistent information about the status of processes in the system, failure detectors provide inconsistent information. This paper discusses the trade-offs related to the use of these two components, ...

  10. Fault-tolerant unfalsified control for PEM fuel cell systems

    OpenAIRE

    Bianchi, Fernando Daniel; Ocampo-Martínez, Carlos; Kunusch, Cristian; Sánchez Peña, Ricardo Salvador

    2015-01-01

    The article addresses the implementation of a data-driven control strategy in a real test bench based on proton exchange membrane fuel cells (PEMFCs). The proposed control scheme is based on Unfalsified Control (UC), which allows adapting in real-time the control law by evaluating the performance specifications based only on measured input-output data. This approach is especially suitable to deal with non-linearity, model uncertainty and also possible faults that may occur in PEMFCs. The cont...

  11. Fault tolerant synchronization of chaotic systems based on T–S fuzzy model with fuzzy sampled-data controller

    International Nuclear Information System (INIS)

    In this paper the fault tolerant synchronization of two chaotic systems based on a fuzzy model and sampled data is investigated. The problem of fault tolerant synchronization is formulated to study the global asymptotical stability of the error system with the fuzzy sampled-data controller, which contains a state feedback controller and a fault compensator. The synchronization can be achieved whether or not the fault occurs. To investigate the stability of the error system and facilitate the design of the fuzzy sampled-data controller, a Takagi–Sugeno (T–S) fuzzy model is employed to represent the chaotic system dynamics. To acquire good performance and produce a less conservative analysis result, a new parameter-dependent Lyapunov–Krasovskii functional and a relaxed stabilization technique are considered. The stability conditions based on linear matrix inequalities are obtained to achieve the fault tolerant synchronization of the chaotic systems. Finally, a numerical simulation is shown to verify the results. (general)

  12. Parallel, fault-tolerant control and diagnostics system for feedwater regulation in PWRS

    International Nuclear Information System (INIS)

    The feasibility of a software-based fault-tolerant feedwater flow control system has been investigated in this study. Although the architecture is not dedicated to a particular task, steam generator water level and differential pressure controllers are discussed in this paper. In addition to parallel control and diagnostics techniques, an application of artificial neural networks for feedwater flow rate monitoring (to address venturi fouling) is also studied.

  13. Robust fault tolerant control based on sliding mode method for uncertain linear systems with quantization.

    Science.gov (United States)

    Hao, Li-Ying; Yang, Guang-Hong

    2013-09-01

    This paper is concerned with robust fault-tolerant compensation control for uncertain linear systems subject to both state and input signal quantization. By incorporating a novel matrix full-rank factorization technique into the sliding surface design, the total failure of certain actuators can be coped with under a special actuator redundancy assumption. In order to compensate for quantization errors, an adjustment range of quantization sensitivity for a dynamic uniform quantizer is given through the flexible choices of design parameters. Compared with existing results, the derived inequality condition leads to stronger fault tolerance and a much wider scope of applicability. With a static adjustment policy of quantization sensitivity, an adaptive sliding mode controller is then designed to maintain the sliding mode, where the gain of the nonlinear unit vector term is updated automatically to compensate for the effects of actuator faults, quantization errors, exogenous disturbances and parameter uncertainties without the need for a fault detection and isolation (FDI) mechanism. Finally, the effectiveness of the proposed design method is illustrated via a structural-acoustic model of a rocket fairing. PMID:23701895

  14. Fault tolerant, multiplexed control rod position detection and indication system for nuclear power plants

    International Nuclear Information System (INIS)

    The majority of Westinghouse nuclear plants placed in service thus far have incorporated a Rod Position Indication system based upon an analog design philosophy. This system, while meeting all functional and accuracy requirements, has proven somewhat cumbersome, particularly in the area of initial field calibration and maintenance. This paper describes a new Digital Rod Position Indication system (DRPI) developed for use with pressurized water reactors. The system is based upon a digital design philosophy and meets all previous design constraints and environmental requirements. Further, fault tolerance, improved accuracy, reduced interference from adjacent rods and the elimination of adjustments and calibration have been provided.

  15. Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems

    DEFF Research Database (Denmark)

    Choux, Martin

    2011-01-01

    Fluid power systems have been in use since 1795 with the first hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are fluid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by the means of hydraulic fluid under pressure. With the development of computing power and control techniques during the last few decades, they are used increasingly in many industrial fields which require high actuation forces within limited space. However, despite numerous attractive properties, hydraulic systems are always subject to potential leakages in their components, friction variation in their hydraulic actuators and deficiency in their sensors. These violations of normal behaviour reduce the system performances and can lead to system failure if they are not detected early and handled. Moreover, the task of controlling electro hydraulic systems for high performance operations is challenging due to the highly nonlinear behaviour of such systems and the large amount of uncertainties present in their models. This thesis focuses on nonlinear adaptive fault-tolerant control for a representative electro hydraulic servo controlled motion system. The thesis extends existing models of hydraulic systems by considering more detailed dynamics in the servo valve and in the friction inside the hydraulic cylinder. It identifies the model parameters using experimental data from a test bed by analysing both the time response to standard input signals and the variation of the outputs with different excitation frequencies. The thesis also presents a model that accurately describes the static and dynamic normal behaviour of the system.
Further, in this thesis, a fault detector is designed and implemented on the test bed that successfully diagnoses internal or external leakages, friction variations in the actuator or faults related to pressure sensors. The presented algorithm uses the position and pressure measurements to detect and isolate faults, avoiding missed detections and false alarms. The thesis also develops a high performance adaptive nonlinear controller for the hydraulic system which outperforms comparable linear controllers widely used in the industry. Because of the controller adaptivity, uncertainties in the model parameters can be handled. Moreover, special attention is given to reducing the complexity of the controller in order to demonstrate its real-time implementation. Finally, the thesis combines the techniques developed in fault detection and nonlinear control in order to develop an active fault-tolerant controller for electro hydraulic servo systems. In order to maintain overall service and performance as high as possible when a potential fault occurs, the fault-tolerant controlled system prognoses the fault and changes its controller parameters or structure. The consequences of an unexpected fault are avoided, high availability is ensured and the overall safety in electro hydraulic servo systems is increased.

  16. Fault-Tolerant Process Control Methods and Applications

    CERN Document Server

    Mhaskar, Prashant; Christofides, Panagiotis D

    2013-01-01

    Fault-Tolerant Process Control focuses on the development of general, yet practical, methods for the design of advanced fault-tolerant control systems; these ensure an efficient fault detection and a timely response to enhance fault recovery, prevent faults from propagating or developing into total failures, and reduce the risk of safety hazards. To this end, methods are presented for the design of advanced fault-tolerant control systems for chemical processes which explicitly deal with actuator/controller failures and sensor faults and data losses. Specifically, the book puts forward: a framework for detection, isolation and diagnosis of actuator and sensor faults for nonlinear systems; controller reconfiguration and safe-parking-based fault-handling methodologies; integrated data- and model-based fault-detection and isolation and fault-tolerant control methods; methods for handling sensor faults and data losses; and ...

  17. A Fault Tolerant Mobile Agent Information Retrieval System

    OpenAIRE

    R. Punithavathi; K.Duraiswamy

    2010-01-01

    Problem statement: Most of the information retrieval systems used only client-server architectures. The client-server model though powerful, had some limitations. In mobile computing environment which has both wired network and wireless networks with limited communication capabilities, the performance of the system was very low. Approach: Mobile agents are considered a suitable technology to develop applications such as information retrieval system for mobile computing environment. Mobile age...

  18. Performance Evaluation of SDS Algorithm with Fault Tolerance for Distributed System

    Directory of Open Access Journals (Sweden)

    K. Sathiya Bharathi

    2012-07-01

    Full Text Available In the recent past, security-sensitive applications such as electronic transaction processing systems and stock quote update systems, which require high quality of security to guarantee authentication, integrity, and confidentiality of information, have adopted Heterogeneous Distributed Systems (HDS) as their platforms. We systematically design a security-driven scheduling architecture that can dynamically measure the trust level of each node in the system by using differential equations, and introduce SRank to estimate the security overhead of critical tasks using the SDS algorithm. Furthermore, we can achieve high quality of security for applications by using a security-driven scheduling algorithm for DAGs in terms of minimizing the makespan, risk probability, and speedup. In addition, fault tolerance is included using the Security Driven Fault Tolerant Scheduling Algorithm (SDFT) to tolerate N processor failures at one time, and a new global scheduler is introduced to improve the efficiency of the scheduling process. Moreover, SDFT supports a flexible security policy applied to real-time tasks according to their security requirements and considers the effect of security overhead during scheduling. We also observe that the improvement obtained by our algorithm increases as the security-sensitive data of applications increases.

  19. An integrated methodology for the dynamic performance and reliability evaluation of fault-tolerant systems

    International Nuclear Information System (INIS)

    We propose an integrated methodology for the reliability and dynamic performance analysis of fault-tolerant systems. This methodology uses a behavioral model of the system dynamics, similar to the ones used by control engineers to design the control system, but also incorporates artifacts to model the failure behavior of each component. These artifacts include component failure modes (and associated failure rates) and how those failure modes affect the dynamic behavior of the component. The methodology bases the system evaluation on the analysis of the dynamics of the different configurations the system can reach after component failures occur. For each of the possible system configurations, a performance evaluation of its dynamic behavior is carried out to check whether its properties, e.g., accuracy, overshoot, or settling time, which are called performance metrics, meet system requirements. Markov chains are used to model the stochastic process associated with the different configurations that a system can adopt when failures occur. This methodology not only enables an integrated framework for evaluating dynamic performance and reliability of fault-tolerant systems, but also enables a method for guiding the system design process, and further optimization. To illustrate the methodology, we present a case-study of a lateral-directional flight control system for a fighter aircraft
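
    To make the Markov-chain part of this methodology concrete, the sketch below models a much smaller system than the flight control case study: a duplex of two identical components with a constant failure rate. This toy system, the rate value and the function names are assumptions for illustration only. It further assumes that the configurations with two or one working component both meet the performance requirements, so reliability is the probability of being in either of them.

```python
import math
from typing import List


def duplex_state_probabilities(failure_rate: float, t_end: float,
                               steps: int = 100_000) -> List[float]:
    """Integrate the Kolmogorov forward equations with explicit Euler steps.

    States: 0 = both components working, 1 = one working,
    2 = none working (absorbing). No repair is modelled.
    """
    lam = failure_rate
    p = [1.0, 0.0, 0.0]  # the system starts with both components working
    dt = t_end / steps
    for _ in range(steps):
        d0 = -2.0 * lam * p[0]
        d1 = 2.0 * lam * p[0] - lam * p[1]
        d2 = lam * p[1]
        p = [p[0] + dt * d0, p[1] + dt * d1, p[2] + dt * d2]
    return p


def duplex_reliability_closed_form(failure_rate: float, t: float) -> float:
    """Closed-form probability that at least one component still works at time t."""
    return 2.0 * math.exp(-failure_rate * t) - math.exp(-2.0 * failure_rate * t)


if __name__ == "__main__":
    p = duplex_state_probabilities(1e-3, 1000.0)
    print(p[0] + p[1], duplex_reliability_closed_form(1e-3, 1000.0))
```

    In a fuller analysis, each state would first be checked against the performance metrics (accuracy, overshoot, settling time), and only the states that pass would count towards reliability.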

  20. A Robust and Fault-Tolerant Distributed Intrusion Detection System

    CERN Document Server

    Sen, Jaydip

    2011-01-01

    Since it is impossible to predict and identify all the vulnerabilities of a network, and penetration into a system by malicious intruders cannot always be prevented, intrusion detection systems (IDSs) are essential entities for ensuring the security of a networked system. To be effective in carrying out their functions, the IDSs need to be accurate, adaptive, and extensible. Given these stringent requirements and the high level of vulnerabilities of present-day networks, the design of an IDS has become a very challenging task. Although extensive research has been done on intrusion detection in a distributed environment, distributed IDSs suffer from a number of drawbacks, e.g., high rates of false positives, low detection efficiency, etc. In this paper, the design of a distributed IDS is proposed that consists of a group of autonomous and cooperating agents. In addition to its ability to detect attacks, the system is capable of identifying and isolating compromised nodes in the network thereby introduc...

  1. Active Fault Tolerant Control of Livestock Stable Ventilation System

    DEFF Research Database (Denmark)

    Gholami, Mehdi

    2011-01-01

    Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system that is stable and able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems for livestock buildings in three steps: Deriving a model for the climate control...

  2. Fault Tolerant Software: a Multi Agent System Solution

    DEFF Research Database (Denmark)

    Caponetti, Fabio; Bergantino, Nicola; Longhi, Sauro

    2009-01-01

    Development of highly dependable systems remains a labour intensive task. This paper explores recent advances in the adaptation of the software agent architecture for control applications while looking at dependability issues. Multiple agent systems theory will be reviewed, giving methods to supervise it. Software ageing is shown to be the most common problem and rejuvenation its countermeasure. The paper will show how an agent population can be monitored, faulty agents isolated and reloaded in a health...

  3. Active Fault Tolerant Control of Livestock Stable Ventilation System

    OpenAIRE

    Gholami, Mehdi

    2011-01-01

    Modern stables and greenhouses are equipped with different components for providing a comfortable climate for animals and plants. A component malfunction may result in loss of production. Therefore, it is desirable to design a control system that is stable and able to provide an acceptable degraded performance even in the faulty case. In this thesis, we have designed such controllers for climate control systems for livestock buildings in three steps: Deriving a model for the climate cont...

  4. Diagnosis and Tolerant Strategy of an Open-Switch Fault for T-type Three-Level Inverter Systems

    DEFF Research Database (Denmark)

    Choi, Uimin; Lee, Kyo Beum; Blaabjerg, Frede

    2014-01-01

    This paper proposes a new diagnosis method for an open-switch fault and a fault-tolerant control strategy for T-type three-level inverter systems. The location of the faulty switch can be identified by the average of the normalized phase current and the change of the neutral-point voltage. The proposed fault-tolerant strategy is explained in two cases: the faulty condition of the half-bridge switches and that of the neutral-point switches. The performance of the T-type inverter system improves considerab...

  5. The Isis project: Fault-tolerance in large distributed systems

    Science.gov (United States)

    Birman, Kenneth P.; Marzullo, Keith

    1993-01-01

    This final status report covers activities of the Isis project during the first half of 1992. During the report period, the Isis effort has achieved a major milestone in its effort to redesign and reimplement the Isis system using Mach and Chorus as target operating system environments. In addition, we completed a number of publications that address issues raised in our prior work; some of these have recently appeared in print, while others are now being considered for publication in a variety of journals and conferences.

  6. Fault-tolerant Agreement in Synchronous Message-passing Systems

    CERN Document Server

    Raynal, Michel

    2010-01-01

    The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement an
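
    As a small illustration of agreement in a synchronous system with crash failures, the sketch below simulates the classic flooding scheme: every process forwards all values it knows for f + 1 rounds and then decides the minimum. This is a textbook-style sketch and is not taken from the book. The function name, the data layout of the crash schedule and the decision rule (minimum value) are assumptions, as is the requirement that at most f processes crash.

```python
from typing import Dict, Set, Tuple


def flooding_consensus(proposals: Dict[int, int],
                       crashes: Dict[int, Tuple[int, Set[int]]],
                       f: int) -> Dict[int, int]:
    """Simulate f + 1 synchronous rounds of flooding under crash failures.

    proposals maps process id -> proposed value.
    crashes maps process id -> (round in which it crashes, set of receivers
    its last, partial broadcast still reaches). At most f entries are assumed.
    Returns the decision of every process that has not crashed.
    """
    known = {p: {v} for p, v in proposals.items()}
    alive = set(proposals)
    for rnd in range(1, f + 2):
        inbox: Dict[int, Set[int]] = {p: set() for p in alive}
        crashed_now = set()
        for sender in alive:
            if sender in crashes and crashes[sender][0] == rnd:
                receivers = crashes[sender][1] & alive  # partial broadcast
                crashed_now.add(sender)
            else:
                receivers = alive
            for q in receivers:
                inbox[q] |= known[sender]
        alive -= crashed_now
        for p in alive:
            known[p] |= inbox[p]
    return {p: min(known[p]) for p in alive}


if __name__ == "__main__":
    # Process 2 holds the smallest value and crashes in round 1,
    # reaching only process 1; one extra round spreads the value.
    print(flooding_consensus({1: 5, 2: 3, 3: 7, 4: 9}, {2: (1, {1})}, f=1))
```

    The f + 1 rounds guarantee at least one round without a crash, after which all surviving processes hold the same set of values and therefore decide the same minimum.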

  7. Fault tolerant computer control for a Maglev transportation system

    Science.gov (United States)

    Lala, Jaynarayan H.; Nagle, Gail A.; Anagnostopoulos, George

    1994-05-01

    Magnetically levitated (Maglev) vehicles operating on dedicated guideways at speeds of 500 km/hr are an emerging transportation alternative to short-haul air and high-speed rail. They have the potential to offer a service significantly more dependable than air and with less operating cost than both air and high-speed rail. Maglev transportation derives these benefits by using magnetic forces to suspend a vehicle 8 to 200 mm above the guideway. Magnetic forces are also used for propulsion and guidance. The combination of high speed, short headways, stringent ride quality requirements, and a distributed offboard propulsion system necessitates high levels of automation for the Maglev control and operation. Very high levels of safety and availability will be required for the Maglev control system. This paper describes the mission scenario, functional requirements, and dependability and performance requirements of the Maglev command, control, and communications system. A distributed hierarchical architecture consisting of vehicle on-board computers, wayside zone computers, a central computer facility, and communication links between these entities was synthesized to meet the functional and dependability requirements on the maglev. Two variations of the basic architecture are described: the Smart Vehicle Architecture (SVA) and the Zone Control Architecture (ZCA). Preliminary dependability modeling results are also presented.

  8. Diagnosis and Fault-tolerant Control, 2nd edition

    DEFF Research Database (Denmark)

    Blanke, Mogens; Kinnaert, Michel; Lunze, Jan; Starosweicki, Marcel

    2006-01-01

    Fault-tolerant control aims at a graceful degradation of the behaviour of automated systems in case of faults. It satisfies the industrial demand for enhanced availability and safety, in contrast to traditional reactions to faults that bring about sudden shutdowns and loss of availability. The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault throught...

  9. Synchronization of fault-tolerant parallel processing systems

    Energy Technology Data Exchange (ETDEWEB)

    Harper, R.E.; Lala, J.H.

    1990-06-26

    This patent describes a system for synchronizing the operation of redundant processors forming a group thereof, wherein a frame of operation is defined for each processor as a time period during which a selected number of specified processing events occurs. It comprises: means for performing a final one of the specified processing events for identifying the end of a current frame of operation; means responsive to the performing of the final one of the specified processing events by the last-to-perform of the processors for synchronizing the operations of the processors so that all of the processors can subsequently start their next frame of operation at substantially the same time; and each of the processors further includes means responsive to the synchronizing means for performing a first one of the specified processing events to identify the start of the next frame of operation.

  10. Replicated R-Resilient Process Allocation for Load Distribution in Fault Tolerant System

    Directory of Open Access Journals (Sweden)

    Jian Wang

    2008-01-01

    Full Text Available Process allocation for load distribution can improve system performance by utilizing resources efficiently. For primary-backup based fault tolerant systems, a classic load-balancing process allocation method (the two-stage allocation algorithm) has been proposed that can balance the load before as well as after faults occur. However, the two-stage allocation algorithm scales poorly, since its load-balancing performance drops dramatically when each primary process is duplicated more than once (i.e., has more than one backup process). In this study, we present an improved algorithm named RSA (R-Stage Allocation algorithm) that balances the load better no matter how many backup processes each primary process owns. Simulations are also used to compare the proposed algorithm with the two-stage allocation algorithm, and the experimental results show that, when extending to replicated R-Resilient processes, RSA has significantly better load distribution performance than the two-stage allocation algorithm.

  11. Fault tolerant architectures for superconducting qubits

    OpenAIRE

    DiVincenzo, David P.

    2009-01-01

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error correction codes, which I will briefly review, have not significantly changed since their introduction fifteen years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault tolerant operation w...

  12. Analysing Fault Tolerance for Erlang Applications

    OpenAIRE

    Nyström, Jan Henry

    2009-01-01

    Erlang is a concurrent functional language, well suited for distributed, highly concurrent and fault-tolerant software. An important part of Erlang is its support for failure recovery. Fault tolerance is provided by organising the processes of an Erlang application into tree structures. In these structures, parent processes monitor failures of their children and are responsible for their restart. Libraries support the creation of such structures during system initialisation. A technique to aut...
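
    The tree structure described above, in which parents monitor their children and restart them on failure, can be sketched outside Erlang as follows. This is an illustrative Python analogue under stated assumptions, not the Erlang/OTP supervisor API; the `Supervisor` class, its `max_restarts` budget and the `make_flaky` helper are hypothetical names introduced for the example.

```python
class Supervisor:
    """Parent that runs child tasks and restarts any child that fails."""

    def __init__(self, max_restarts=3):
        self.max_restarts = max_restarts   # hypothetical restart budget per child
        self.children = []                 # (name, task) pairs; a task may be a Supervisor
        self.restarts = {}                 # child name -> restarts performed so far

    def add_child(self, name, task):
        self.children.append((name, task))
        self.restarts[name] = 0

    def run(self):
        """Run every child and return {name: result}; re-raise if a child keeps failing."""
        results = {}
        for name, task in self.children:
            while True:
                try:
                    # A nested Supervisor is run recursively, which forms the tree.
                    if isinstance(task, Supervisor):
                        results[name] = task.run()
                    else:
                        results[name] = task()
                    break
                except Exception:
                    if self.restarts[name] >= self.max_restarts:
                        raise              # escalate the failure to our own parent
                    self.restarts[name] += 1
        return results


def make_flaky(failures):
    """Return a task that raises `failures` times and then succeeds."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated crash")
        return "ok"

    return task
```

    A child that crashes twice and then succeeds is restarted by its direct parent without the root noticing; only when the restart budget is exhausted does the failure propagate one level up the tree.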

  13. Evaluation of error detection coverage and fault-tolerance of digital plant protection system in nuclear power plants

    International Nuclear Information System (INIS)

    Recently, traditional analog-based safety-related instrumentation and control (I&C) systems in nuclear power plants (NPPs) have been replaced with modern digital-based systems. Due to the digitalization of nuclear I&C systems, safety assessment has become a major issue, as it is crucial to the system's reliability. In the safety assessment of a digitalized system, error detection coverage and fault tolerance are critical factors to evaluate. For the evaluation, we use a C++ based hardware description instead of a board with integrated circuit components. We select the digital plant protection system (DPPS) in NPPs as the target system. A permanent fault is used as the possible fault in the system, and several error detection methods are used to detect errors. From the experiment, we confirmed that the proposed approach can evaluate the error detection coverage and the fault tolerance of the DPPS in NPPs.

  14. The BTeV DAQ and trigger system - some throughput, usability and fault tolerance aspects

    International Nuclear Information System (INIS)

    As presented at the last CHEP conference, the BTeV triggering and data collection pose a significant challenge in construction and operation, generating 1.5 Terabytes/second of raw data from over 30 million detector channels. The authors report on facets of the DAQ and trigger farms. The authors report on the current design of the DAQ, especially its partitioning features to support commissioning of the detector. The authors are exploring collaborations with computer science groups experienced in fault tolerant and dynamic real-time and embedded systems to develop a system to provide the extreme flexibility and high availability required of the heterogeneous trigger farm (approximately ten thousand DSPs and commodity processors). The authors describe directions in the following areas: system modeling and analysis using the Model Integrated Computing approach to assist in the creation of domain-specific modeling, analysis, and program synthesis environments for building complex, large-scale computer-based systems; System Configuration Management to include compilable design specifications for configurable hardware components, schedules, and communication maps; Runtime Environment and Hierarchical Fault Detection/Management, a system-wide infrastructure for rapidly detecting, isolating, filtering, and reporting faults which will be encapsulated in intelligent active entities (agents) to run on DSPs, L2/3 processors, and other supporting processors throughout the system.

  15. The BTeV DAQ and Trigger System - Some throughput, usability and fault tolerance aspects

    International Nuclear Information System (INIS)

    As presented at the last CHEP conference, the BTeV triggering and data collection pose a significant challenge in construction and operation, generating 1.5 Terabytes/second of raw data from over 30 million detector channels. We report on facets of the DAQ and trigger farms. We report on the current design of the DAQ, especially its partitioning features to support commissioning of the detector. We are exploring collaborations with computer science groups experienced in fault tolerant and dynamic real-time and embedded systems to develop a system to provide the extreme flexibility and high availability required of the heterogeneous trigger farm (approximately ten thousand DSPs and commodity processors). We describe directions in the following areas: system modeling and analysis using the Model Integrated Computing approach to assist in the creation of domain-specific modeling, analysis, and program synthesis environments for building complex, large-scale computer-based systems; System Configuration Management to include compilable design specifications for configurable hardware components, schedules, and communication maps; Runtime Environment and Hierarchical Fault Detection/Management, a system-wide infrastructure for rapidly detecting, isolating, filtering, and reporting faults which will be encapsulated in intelligent active entities (agents) to run on DSPs, L2/3 processors, and other supporting processors throughout the system.

  16. A Study on Fault-Tolerant Software Architecture for COTS-Based Dependable System

    International Nuclear Information System (INIS)

    Recently, with the rapid development of digital computers and information processing technologies, nuclear instrumentation and control (I&C) systems that perform safety-critical functions have adopted digital technologies. The use of commercial off-the-shelf (COTS) software in safety-critical systems has also increased, for reasons such as economic efficiency and technical problems. However, it requires a considerable integration effort and raises software quality and safety issues. COTS software is usually provided as a black box that cannot be modified. The biggest problem when integrating such a product into dependable systems is the reliability of the COTS software: there is no guarantee that the software will perform its function correctly, and it may have bugs or unidentified components. Software verification and validation (V&V) has recently been accepted as a way to assure the dependability of newly developed safety-critical nuclear I&C software. Because of the limitations of COTS software, however, V&V cannot be applied as rigorously as it is to newly developed software. Considerable attention has been paid to describing software architectures with respect to their dependability properties. In this paper, we present a fault-tolerant software architecture using the C2 architectural style. The remainder of the paper is organized as follows: Section 2 discusses background work on COTS software in nuclear I&C, software fault tolerance and the C2 architectural style. Section 3 describes the architecture for fault-tolerant COTS-based software. Finally, we discuss the conclusion and future work.

  17. Fault-tolerant parallel processor

    Energy Technology Data Exchange (ETDEWEB)

    Harper, R.E.; Lala, J.H. (Charles Stark Draper Laboratory, Inc., Cambridge, MA (USA))

    1991-06-01

    This paper addresses issues central to the design and operation of an ultrareliable, Byzantine resilient parallel computer. Interprocessor connectivity requirements are met by treating connectivity as a resource that is shared among many processing elements, allowing flexibility in their configuration and reducing complexity. Redundant groups are synchronized solely by message transmissions and receptions, which also provide input data consistency and output voting. Reliability analysis results are presented that demonstrate the reduced failure probability of such a system. Performance analysis results are presented that quantify the temporal overhead involved in executing such fault-tolerance-specific operations. Empirical performance measurements of prototypes of the architecture are presented. 30 refs.
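
    The output voting mentioned above can be illustrated by a simple majority voter over the values reported by the members of a redundant group. This is a minimal sketch, assuming exact-match comparison of outputs; it covers only the final voting step, not the message exchange that establishes input consistency, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority is required so that a tie never masks a disagreement.
    return value if 2 * count > len(outputs) else None
```

    With three replicas, one faulty output is outvoted: `majority_vote([7, 7, 9])` returns `7`, while three mutually different outputs yield `None`, which a caller would treat as an unmasked fault.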

  18. Fault-tolerant rotary actuator

    Energy Technology Data Exchange (ETDEWEB)

    Tesar, Delbert

    2006-10-17

    A fault-tolerant actuator module, in a single containment shell, containing two actuator subsystems that are either asymmetrically or symmetrically laid out is provided. Fault tolerance in the actuators of the present invention is achieved by the employment of dual sets of equal resources. Dual resources are integrated into single modules, with each having the external appearance and functionality of a single set of resources.

  19. Fault-tolerant control of discrete-time LPV systems using virtual actuators and sensors

    DEFF Research Database (Denmark)

    Tabatabaeipour, S. Mojtaba; Stoustrup, Jakob

    2015-01-01

    This paper introduces a new fault-tolerant control (FTC) method for discrete-time linear parameter varying (LPV) systems. Fault-tolerance is achieved without redesigning the nominal controller by inserting a reconfiguration block between the plant and the nominal controller. The reconfiguration block is realized by a virtual actuator and a virtual sensor. The signals from the faulty system are transformed such that its behavior is similar to that of the nominal system from the viewpoint of the controller. It transforms the controller output for the faulty system preserving the stability and performance. Input-to-state stabilizing LPV gains of the virtual actuator and sensor are obtained by solving LMIs. The gains guarantee the input-to-state stability (ISS) of the closed-loop reconfigured system. Moreover, we obtain performances in terms of the ISS gains for the virtual actuator, the virtual sensor, and their interconnection. Minimizing these performances is formulated as convex optimization problems subject to LMI constraints. The effectiveness of the method is demonstrated via a numerical example and stator current control of an induction motor.

  20. State of the art on fault-tolerant real time distributed systems

    International Nuclear Information System (INIS)

    The integration of new computerized functions in power plant, and especially nuclear power plant, control and instrumentation systems implies increasingly stringent requirements on communication system reliability. While an item of equipment, or even a computer program, can be validated and qualified, no formal qualification procedure is presently imposed on communication networks. This is certainly due to the relative immaturity of these networks, but also to their complexity. For this reason, in the context of preparation for the future PWR 2000 standardized nuclear plants, it seems appropriate to take a look at fault-tolerant communication systems. Since C&I applications (in the control room) are divided between several computers and must contend with extremely severe time constraints, EDF has undertaken an investigation of fault-tolerant, real-time distributed systems. This paper summarizes the state of the art in the field as it appears from discussions with computer manufacturers, academics and research workers on related projects. The results obtained were then used to identify trends toward "promising" solutions. The paper concludes with recommended study programs for the PCC department of EDF/R&DD for the next few years. (author), 9 figs., 10 refs., 2 annexes

  1. A multi-layer robust adaptive fault tolerant control system for high performance aircraft

    Science.gov (United States)

    Huo, Ying

    Modern high-performance aircraft demand advanced fault-tolerant flight control strategies. Not only control effector failures but also aerodynamic failures such as wing-body damage often result in substantially deteriorated performance because of low available redundancy. As a result, the remaining control actuators may yield substantially lower maneuvering capabilities, which do not permit the accomplishment of the aircraft's originally specified mission. The problem is to solve the control reconfiguration on the available control redundancies when a mission modification is needed to save the aircraft. The proposed robust adaptive fault-tolerant control (RAFTC) system consists of a multi-layer reconfigurable flight controller architecture. It contains three layers accounting for different types and levels of failures, including sensor, actuator, and fuselage damage. In nominal operation with possible minor failure(s), a standard adaptive controller achieves the control allocation. This is referred to as the first layer, the controller layer. The performance adjustment is accounted for in the second layer, the reference layer, whose role is to adjust the reference model in the controller design with a degraded transient performance. The uppermost mission adjustment is in the third layer, the mission layer, used when the original mission is not feasible with greatly restricted control capabilities. The modified mission is achieved through the optimization of the command signal, which guarantees the boundedness of the closed-loop signals. The main distinguishing feature of this layer is the mission decision property based on the currently available resources. The contribution of the research is the multi-layer fault-tolerant architecture, which can address the complete range of failure scenarios and their accommodation in practice. Moreover, the emphasis is on the mission design capabilities, which may guarantee the stability of the aircraft with restricted post-failure control capabilities. The implementation issues of the architecture are also addressed, with possible realizations and the feasibility analysis.

  2. Task Migration for Fault-Tolerance in Mixed-Criticality Embedded Systems

    DEFF Research Database (Denmark)

    Saraswat, Prabhat Kumar; Pop, Paul

    2009-01-01

    In this paper we are interested in mixed-criticality embedded applications implemented on distributed architectures. Depending on their time-criticality, tasks can be hard or soft real-time and, regarding safety-criticality, tasks can be fault-tolerant to transient faults, permanent faults, or have no dependability requirements. We use Earliest Deadline First (EDF) scheduling for the hard tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The CBS parameters determine the quality of service (QoS) of soft tasks. Transient faults are tolerated using checkpointing with rollback recovery. For tolerating permanent faults in processors, we use task migration, i.e., restarting the safety-critical tasks on other processors. We propose a greedy-based online heuristic for the migration of safety-critical tasks, in response to permanent faults, and the adjustment of CBS parameters on the target processors, such that the faults are tolerated, the deadlines for the hard real-time tasks are satisfied and the QoS for soft tasks is maximized. The proposed online adaptive approach has been evaluated using several synthetic benchmarks and a real-life case study.
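
    The idea of migrating tasks away from a permanently failed processor while keeping hard deadlines can be sketched with a simple greedy placement. This is not the authors' heuristic, whose details the abstract does not give. It assumes independent periodic tasks with deadlines equal to periods, for which preemptive EDF on one processor is feasible exactly when total utilization does not exceed 1, and it ignores the CBS bandwidth adjustment for soft tasks. The names `utilization` and `migrate_tasks` are hypothetical.

```python
from fractions import Fraction


def utilization(tasks):
    """Total utilization sum(C/T) of a list of (wcet, period) tasks."""
    return sum((Fraction(c, t) for c, t in tasks), Fraction(0))


def migrate_tasks(processors, failed):
    """Greedily move tasks from the failed processor to the least-loaded survivor.

    `processors` maps a processor name to a list of (wcet, period) tasks.
    Returns a new mapping without the failed processor, or None if some task
    cannot be placed without pushing a survivor's utilization above 1.
    """
    survivors = {p: list(ts) for p, ts in processors.items() if p != failed}
    # Placing the heaviest tasks first is an assumption of this sketch.
    orphans = sorted(processors[failed], key=lambda ct: Fraction(*ct), reverse=True)
    for task in orphans:
        if not survivors:
            return None
        target = min(survivors, key=lambda p: utilization(survivors[p]))
        # If the least-loaded survivor cannot host the task, no survivor can.
        if utilization(survivors[target]) + Fraction(*task) > 1:
            return None
        survivors[target].append(task)
    return survivors
```

    Exact fractions are used so that the comparison against the bound of 1 is not disturbed by floating-point rounding.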

  3. Fault tolerant control based on active fault diagnosis

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik

    2005-01-01

    An active fault diagnosis (AFD) method will be considered in this paper in connection with a Fault Tolerant Control (FTC) architecture based on the YJBK parameterization of all stabilizing controllers. The architecture consists of a fault diagnosis (FD) part and a controller reconfiguration (CR) part. The FTC architecture can be applied for additive faults, parametric faults, and for system structural changes. Only parametric faults will be considered in this paper. The main focus in this paper is on the use of the new approach of active fault diagnosis in connection with FTC. The active fault diagnosis approach is based on including an auxiliary input in the system. A fault signature matrix is introduced in connection with AFD, given as the transfer function from the auxiliary input to the residual output. This can be considered as a generalization of the passive fault diagnosis case, where the diagnosis is only based on a residual vector. The fault diagnosis is then derived by on-line tests by using the residual vector.

  4. Diagnosis and Fault-tolerant Control, 3rd Edition

    DEFF Research Database (Denmark)

    Blanke, Mogens; Kinnaert, Michel

    2015-01-01

    The book presents effective model-based analysis and design methods for fault diagnosis and fault-tolerant control. Architectural and structural models are used to analyse the propagation of the fault through the process, to test the fault detectability and to find the redundancies in the process that can be used to ensure fault tolerance. It also introduces design methods suitable for diagnostic systems and fault-tolerant controllers for continuous processes that are described by analytical models of discrete-event systems represented by automata.

  5. Multi-agent Platform and Toolbox for Fault Tolerant Networked Control Systems

    Directory of Open Access Journals (Sweden)

    Mário J. G. C. Mendes

    2009-04-01

    Industrial distributed networked control systems use different communication networks to exchange information of different criticality levels. Real-time control, fault diagnosis (FDI) and Fault Tolerant Networked Control (FTNC) systems place some of the most stringent demands on data exchange in the communication networks of these networked control systems (NCS). When dealing with large-scale complex NCS, designing FTNC systems is a very difficult task due to the large number of sensors and actuators that are spatially distributed and network connected. To solve this issue, an FTNC platform and toolbox are presented in this paper using simple and verifiable principles coming mainly from a decentralized design based on causal modelling partitioning of the NCS and distributed computing using the multi-agent systems paradigm, allowing the use of agents with well-established FTC methodologies or new ones developed taking into account the NCS specificities. The multi-agent platform and toolbox for FTNC systems have been built in the Matlab/Simulink environment, which is currently the scientific benchmark for this kind of research. Although the tests have been performed with a simple case, the results are promising and this approach is expected to succeed with more complex processes.

  6. Fault tolerant integrated inertial navigation/global positioning systems for next generation spacecraft

    Science.gov (United States)

    Miller, Hugh; Hilts, David A.

    The authors address the requirements, benefits, and mitigation of risks to adapt a commercial Hexad fault-tolerant inertial navigation/global positioning system (FT IN/GPS) for use in next-generation spacecraft. Next-generation requirements are examined to determine whether a high production base system can meet autonomous, reliable, and low-cost requirements for future spacecraft. The major benefits are the combining and replacement of functions, the reduction of unscheduled maintenance and operations costs, and a higher probability of mission success. The design, development, and production risks are mitigated by the long-term commercial production schedule for the Boeing 777 air data inertial reference unit (ADIRU) which begins in the mid-1990s. The conclusion is that a strapdown ring laser gyro (RLG) Hexad FT IN/GPS is the preferred integrated navigation and control system for next-generation vehicles.

  7. Backstepping decentralized fault tolerant control for reconfigurable modular robots

    OpenAIRE

    Jinbao He; Xinhua Yi; Zaifei Luo; Guojun Li

    2013-01-01

    For actuator faults in reconfigurable modular robots, a backstepping decentralized fault tolerant control (DFTC) algorithm is proposed. The reconfigurable robot system is divided into a set of interconnected subsystems. The fault tolerant controller is designed based on the backstepping method.

  8. Modular, Fault-Tolerant Electronics Supporting Space Exploration Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Modern electronic systems tolerate only as many point failures as there are redundant system copies, using mere macro-scale redundancy. Fault Tolerant Electronics...

  9. A Constraint Logic Programming Framework for the Synthesis of Fault-Tolerant Schedules for Distributed Embedded Systems

    DEFF Research Database (Denmark)

    Poulsen, Kåre Harbo; Pop, Paul

    2007-01-01

    We present a constraint logic programming (CLP) approach for synthesis of fault-tolerant hard real-time applications on distributed heterogeneous architectures. We address time-triggered systems, where processes and messages are statically scheduled based on schedule tables. We use process re-execution for recovering from multiple transient faults. We propose three scheduling approaches, which each present a trade-off between schedule simplicity and performance, (i) full transparency, (ii) slack sharing and (iii) conditional, and provide various degrees of transparency. We have developed a CLP framework that produces the fault-tolerant schedules, guaranteeing schedulability in the presence of transient faults. We show how the framework can be used to tackle design optimization problems. The proposed approach has been evaluated using extensive experiments.

  10. A Fault-Tolerant Emergency-Aware Access Control Scheme for Cyber-Physical Systems

    CERN Document Server

    Wu, Guowei; Xia, Feng; Yao, Lin

    2012-01-01

    Access control is an issue of paramount importance in cyber-physical systems (CPS). In this paper, an access control scheme, namely FEAC, is presented for CPS. FEAC can not only provide the ability to control access to data in normal situations, but also adaptively assign emergency-role and permissions to specific subjects and inform subjects without explicit access requests to handle emergency situations in a proactive manner. In FEAC, emergency-group and emergency-dependency are introduced. Emergencies are processed in sequence within the group and in parallel among groups. A priority and dependency model called PD-AGM is used to select the optimal response-action execution path, aiming to eliminate all emergencies that occurred within the system. Fault-tolerant access control policies are used to address failure in emergency management. A case study of a hospital medical care application shows the effectiveness of FEAC.

  11. Low cost management of replicated data in fault-tolerant distributed systems

    Science.gov (United States)

    Joseph, Thomas A.; Birman, Kenneth P.

    1990-01-01

    Many distributed systems replicate data for fault tolerance or availability. In such systems, a logical update on a data item results in a physical update on a number of copies. The synchronization and communication required to keep the copies of replicated data consistent introduce a delay when operations are performed. A technique is described that relaxes the usual degree of synchronization, permitting replicated data items to be updated concurrently with other operations, while at the same time ensuring that correctness is not violated. The additional concurrency thus obtained results in better response time when performing operations on replicated data. How this technique performs in conjunction with a roll-back and a roll-forward failure recovery mechanism is also discussed.

  12. Coordinated Fault Tolerance for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Dongarra, Jack; Bosilca, George; et al.

    2013-04-08

    Our work to meet our goal of end-to-end fault tolerance has focused on two areas: (1) improving fault tolerance in various software currently available and widely used throughout the HEC domain and (2) using fault information exchange and coordination to achieve holistic, systemwide fault tolerance and understanding how to design and implement interfaces for integrating fault tolerance features for multiple layers of the software stack—from the application, math libraries, and programming language runtime to other common system software such as jobs schedulers, resource managers, and monitoring tools.

  13. Fault tolerant operation of switched reluctance machine

    Science.gov (United States)

    Wang, Wei

    The energy crisis and environmental challenges have driven industry towards more energy efficient solutions. With nearly 60% of electricity consumed by various electric machines in the industry sector, advancement in the efficiency of the electric drive system is of vital importance. An adjustable speed drive system (ASDS) provides excellent speed regulation and dynamic performance as well as dramatically improved system efficiency compared with conventional motors without electronic drives. Industry has witnessed tremendous growth in ASDS applications, not only as a driving force but also as an electric auxiliary system for replacing bulky and low efficiency auxiliary hydraulic and mechanical systems. With the vast penetration of ASDS, its fault tolerant operation capability is more widely recognized as an important feature of drive performance, especially for aerospace and automotive applications and other industrial drive applications demanding high reliability. The Switched Reluctance Machine (SRM), a low cost, highly reliable electric machine with fault tolerant operation capability, has drawn substantial attention in the past three decades. Nevertheless, the SRM is not free of faults. Certain faults such as converter faults, sensor faults, winding shorts, eccentricity and position sensor faults are commonly shared among all ASDS. In this dissertation, a thorough understanding of various faults and their influence on the transient and steady state performance of the SRM is developed via simulation and experimental study, providing necessary knowledge for fault detection and post-fault management. Lumped parameter models are established for fast real time simulation and drive control. Based on the behavior of the faults, a fault detection scheme is developed for the purpose of fast and reliable fault diagnosis. In order to improve the SRM power and torque capacity under faults, maximum torque per ampere excitation is conceptualized and validated through theoretical analysis and experiments. With the proposed optimal waveform, torque production is greatly improved under the same Root Mean Square (RMS) current constraint. Additionally, position sensorless operation methods under phase faults are investigated to account for the combination of physical position sensor and phase winding faults. A comprehensive solution for position sensorless operation under single and multiple phase faults is proposed and validated through experiments. Continuous position sensorless operation with seamless transition between various numbers of phase faults is achieved.

  14. Robot Position Sensor Fault Tolerance

    Science.gov (United States)

    Aldridge, Hal A.

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. A new method is proposed that utilizes analytical redundancy to allow for continued operation during joint position sensor failure. Joint torque sensors are used with a virtual passive torque controller to make the robot joint stable without position feedback and improve position tracking performance in the presence of unknown link dynamics and end-effector loading. Two Cartesian accelerometer based methods are proposed to determine the position of the joint. The joint specific position determination method utilizes two triaxial accelerometers attached to the link driven by the joint with the failed position sensor. The joint specific method is not computationally complex and the position error is bounded. The system wide position determination method utilizes accelerometers distributed on different robot links and the end-effector to determine the position of sets of multiple joints. The system wide method requires fewer accelerometers than the joint specific method to make all joint position sensors fault tolerant but is more computationally complex and has lower convergence properties. Experiments were conducted on a laboratory manipulator. Both position determination methods were shown to track the actual position satisfactorily. A controller using the position determination methods and the virtual passive torque controller was able to servo the joints to a desired position during position sensor failure.

  15. Quantum Error Correction and Fault-Tolerance

    CERN Document Server

    Gottesman, D

    2005-01-01

    I give an overview of the basic concepts behind quantum error correction and quantum fault tolerance. This includes the quantum error correction conditions, stabilizer codes, CSS codes, transversal gates, fault-tolerant error correction, and the threshold theorem.

  16. Reversible Fault-Tolerant Logic

    CERN Document Server

    Boykin, P O; Roychowdhury, Vwani P.

    2005-01-01

    It is now widely accepted that the CMOS technology implementing irreversible logic will hit a scaling limit beyond 2016, and that the increased power dissipation is a major limiting factor. Reversible computing can potentially require arbitrarily small amounts of energy. Recently several nano-scale devices which have the potential to scale, and which naturally perform reversible logic, have emerged. This paper addresses several fundamental issues that need to be addressed before any nano-scale reversible computing systems can be realized, including reliability and performance trade-offs and architecture optimization. Many nano-scale devices will be limited to only near neighbor interactions, requiring careful optimization of circuits. We provide efficient fault-tolerant (FT) circuits when restricted to both 2D and 1D. Finally, we compute bounds on the entropy (and hence, heat) generated by our FT circuits and provide quantitative estimates on how large can we make our circuits before we lose any advantage ove...

  17. Methods and apparatuses for self-generating fault-tolerant keys in spread-spectrum systems

    Energy Technology Data Exchange (ETDEWEB)

    Moradi, Hussein; Farhang, Behrouz; Subramanian, Vijayarangam

    2015-12-22

    Self-generating fault-tolerant keys for use in spread-spectrum systems are disclosed. At a communication device, beacon signals are received from another communication device and impulse responses are determined from the beacon signals. The impulse responses are circularly shifted to place a largest sample at a predefined position. The impulse responses are converted to a set of frequency responses in a frequency domain. The frequency responses are shuffled with a predetermined shuffle scheme to develop a set of shuffled frequency responses. A set of phase differences is determined as a difference between an angle of the frequency response and an angle of the shuffled frequency response at each element of the corresponding sets. Each phase difference is quantized to develop a set of secret-key quantized phases and a set of spreading codes is developed wherein each spreading code includes a corresponding phase of the set of secret-key quantized phases.
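
    The processing chain in this abstract (circular shift, transform to the frequency domain, shuffle, phase difference, quantization) can be followed step by step in a short sketch. It is only an illustration under stated assumptions: the abstract does not specify the shuffle scheme, the number of quantization levels, or how the spreading codes are built from the quantized phases, so the `shuffle` permutation and `levels` parameter are hypothetical, and the sketch stops at the quantized phases. A naive DFT is used to keep the example free of external libraries.

```python
import cmath
import math


def align(h):
    """Circularly shift h so that its largest-magnitude sample sits at index 0."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    return h[peak:] + h[:peak]


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short sequences."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def quantized_phases(h, shuffle, levels=4):
    """Derive quantized phase differences from one impulse response.

    `shuffle` is a permutation of range(len(h)) assumed known to both devices;
    `levels` is the assumed number of quantization bins.
    """
    freq = dft(align(h))
    shuffled = [freq[i] for i in shuffle]
    step = 2 * math.pi / levels
    key = []
    for a, b in zip(freq, shuffled):
        # Phase difference wrapped into [0, 2*pi].
        diff = (cmath.phase(a) - cmath.phase(b)) % (2 * math.pi)
        # The final modulo guards the case where rounding yields exactly 2*pi.
        key.append(int(diff // step) % levels)
    return key
```

    Because of the alignment step, two devices whose measured impulse responses differ only by a circular shift obtain the same quantized phases. Phase differences that fall close to a bin boundary remain sensitive to measurement noise, which this sketch does not address.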

  18. Methods and apparatuses for self-generating fault-tolerant keys in spread-spectrum systems

    Energy Technology Data Exchange (ETDEWEB)

    Moradi, Hussein; Farhang, Behrouz; Subramanian, Vijayarangam

    2015-12-15

    Self-generating fault-tolerant keys for use in spread-spectrum systems are disclosed. At a communication device, beacon signals are received from another communication device and impulse responses are determined from the beacon signals. The impulse responses are circularly shifted to place a largest sample at a predefined position. The impulse responses are converted to a set of frequency responses in a frequency domain. The frequency responses are shuffled with a predetermined shuffle scheme to develop a set of shuffled frequency responses. A set of phase differences is determined as a difference between an angle of the frequency response and an angle of the shuffled frequency response at each element of the corresponding sets. Each phase difference is quantized to develop a set of secret-key quantized phases and a set of spreading codes is developed wherein each spreading code includes a corresponding phase of the set of secret-key quantized phases.

  19. A Fault-Tolerant Modulation Method to Counteract the Double Open-Switch Fault in Matrix Converter Drive Systems without Redundant Power Devices

    DEFF Research Database (Denmark)

    Chen, Der-Fa; Nguyen-Duy, Khiem

    2012-01-01

    This paper studies the double open-switch fault issue occurring within the conventional matrix converter driving a three-phase permanent-magnet synchronous motor system and proposes a fault-tolerant solution by introducing a revised modulation strategy. In this switching strategy, the rectifier-stage modulation is adjusted based on knowledge of the switching logic of the inverter stage and the operating input voltage sectors. The proposed fault-tolerant method does not rely on the assistance of any redundant power devices or on any reconfiguration of the matrix converter circuit by means of redundant physical connections. It is shown that different locations of the double open switch affect the availability of the revised modulation. The steady state absolute speed error achieved with the proposed method is 4% of the nominal speed. Experiments are performed to demonstrate the efficacy of the proposed methods.

  20. Fault tolerant architectures for superconducting qubits

    CERN Document Server

    DiVincenzo, David P

    2009-01-01

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error correction codes, which I will briefly review, have not significantly changed since their introduction fifteen years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault tolerant operation within reach. It is now understood that two dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation' -- there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be dem...

  1. Early Error Detection for Fault Tolerance Strategies

    OpenAIRE

    ROBERT, THOMAS; Roy, Matthieu; Fabre, Jean-Charles

    2010-01-01

    In this paper we present an integration of early run-time monitors in real-time systems to improve their fault tolerance properties. Early Error Detection is a mechanism that provides a theoretically optimal run-time error detection service, based on a formal specification of an application, e.g., given by a timed automaton. We show how our approach can improve classical fault tolerance strategies by investigating two use-cases, namely for a design pattern that provides several degraded modes ...

  2. Fault tolerance using self-checking building-block computers

    Science.gov (United States)

    Rennels, D. A.

    1978-01-01

    The paper attempts to define and characterize a set of VLSI (very large scale integration) building-block circuits which can be used to combine existing microprocessors and memories into a wide variety of fault-tolerant computing systems. Such VLSI circuits would transform fault-tolerant computing into an off-the-shelf technology and enable its routine use for new applications. The self-checking computer module (SCCM) is the basic component out of which fault-tolerant computer systems are constructed. Several fault-tolerant configurations of SCCM are discussed, including the standby redundant uniprocessor, the voted/hybrid uniprocessor, and the distributed computer network.

  3. Minimization of the effect of fault-clearance period in the fault-tolerant sensing of an intelligent system

    Science.gov (United States)

    Rokonuzzaman, Mohd; Gosine, Raymond G.

    1997-09-01

    An intelligent system (IS) senses, reasons and acts to perform its required tasks. Sensors are used to sense environmental parameters, and through computational intelligence the system understands the situation and takes appropriate steps toward the desired performance. To deploy systems for mission-critical applications, the underlying technology should be able to detect the failure of components as well as to replace faulty components with fault-free ones within a specified time window known as the fault-clearance period. The presence of the fault-clearance period in the perception phase of system operation results in the loss of on-line data from different sensors and the subsequent loss of valuable information about the environmental parameters. After fault-clearance, a repetition of the sensing cycle will not recover the lost data when sensing highly transient non-periodic signals. Moreover, the unpredictable repetition will create significant overhead in satisfying the stringent timing requirements of the system. A new scheme has been developed to minimize the loss of this real-time sensor data during the fault-clearance period. This scheme is based on the restoration of data through parallel sensing. The restoration processes for both dual and triple modular redundancy schemes have been developed. The effects of both hardware and software implementation of voting logic on the performance of the system and the quality of restoration have been shown. It has been shown that this scheme is able to recover most of the data lost during fault-clearance.
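
    As a rough illustration of restoring data through parallel sensing, the sketch below votes over redundant channels and skips any channel that is currently being cleared. It is a simplified stand-in, not the scheme from the paper: the use of a median voter, the `None` marker for a channel in its fault-clearance period, and the function names are all assumptions.

```python
from statistics import median

def restore_sample(readings):
    """Vote over redundant channel readings for one sampling instant.

    A reading of None marks a channel that is unavailable (assumed here to
    mean it is inside its fault-clearance period). With three valid readings
    the median masks one outlier; with two it averages them; with one it
    passes that reading through.
    """
    valid = [r for r in readings if r is not None]
    if not valid:
        return None  # nothing can be restored for this instant
    return median(valid)

def restore_stream(channels):
    """Apply the voter sample by sample across equally long channel logs."""
    return [restore_sample(sample) for sample in zip(*channels)]

# Triple redundancy: channel 3 produces an outlier at t=0, and one channel
# is being cleared at each of t=1 and t=2.
channels = [
    [1.0, None, 3.0],
    [1.1, 2.0, None],
    [9.0, 2.1, 3.1],
]
restored = restore_stream(channels)
```

    A dual-redundancy configuration is the same call with two channel logs; in that case a single outlier cannot be masked, only averaged.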

  4. Simulation Framework for Evaluation of Fault Tolerant Large Dynamic Distributed System

    Directory of Open Access Journals (Sweden)

    Sanjay Bansal

    2012-08-01

    The use of Java based simulators in the design and development of distributed systems for evaluating the dependability of algorithms is valuable because of their efficiency and scalability. They allow realistic simulation scenarios to be designed. In this work, we propose Saturn, a multithreaded, process-oriented simulation framework designed for modeling large scale distributed systems. It provides realistic simulation for a wide range of distributed system technologies. It is an innovative solution to the problem of evaluating dependability characteristics of distributed systems. Our solution is based on several proposed extensions to the simulation model of the MONARC simulation framework. These extensions refer to fault tolerance and system orchestration mechanisms in order to assess the reliability and availability of distributed systems. The extended simulation model includes the necessary components to describe various actual failure situations and provides the mechanism to evaluate different strategies for replication and redundancy procedures as well as security enforcement mechanisms. The simulator also evaluates the major QoS metrics of the heartbeat based adaptive failure detection mechanism.

  5. Parallel fault-tolerant robot control

    Science.gov (United States)

    Hamilton, D. L.; Bennett, J. K.; Walker, I. D.

    1992-01-01

    A shared memory multiprocessor architecture is used to develop a parallel fault-tolerant robot controller. Several versions of the robot controller are developed and compared. A robot simulation is also developed for control observation. Comparison of a serial version of the controller and a parallel version without fault tolerance showed the speedup possible with the coarse-grained parallelism currently employed. The performance degradation due to the addition of processor fault tolerance was demonstrated by comparison of these controllers with their fault-tolerant versions. Comparison of the more fault-tolerant controller with the lower-level fault-tolerant controller showed how varying the amount of redundant data affects performance. The results demonstrate the trade-off between speed performance and processor fault tolerance.

  6. Fault-tolerant Control of Discrete-time LPV systems using Virtual Actuators and Sensors

    DEFF Research Database (Denmark)

    Tabatabaeipour, Mojtaba; Stoustrup, Jakob

    2015-01-01

    This paper proposes a new fault-tolerant control (FTC) method for discrete-time linear parameter varying (LPV) systems using a reconfiguration block. The basic idea of the method is to achieve the FTC goal without re-designing the nominal controller by inserting a reconfiguration block between the plant and the nominal controller. The reconfiguration block is realized by an LPV virtual actuator and an LPV virtual sensor. Its goal is to transform the signals from the faulty system such that its behavior is similar to that of the nominal system from the viewpoint of the controller. Furthermore, it transforms the output of the controller for the faulty system such that the stability and performance goals are preserved. Input-to-state stabilizing LPV gains of the virtual actuator and sensor are obtained by solving linear matrix inequalities (LMIs). We show that separate design of these gains guarantees the input-to-state stability (ISS) of the closed-loop reconfigured system. Moreover, we obtain performances in terms of the ISS gains for the virtual actuator, the virtual sensor and their interconnection. Minimizing these performances is formulated as convex optimization problems subject to LMI constraints. Finally, the effectiveness of the method is demonstrated via a numerical example and stator current control of an induction motor.

  7. Scheduling of Fault-Tolerant Embedded Systems with Soft and Hard Timing Constraints

    DEFF Research Database (Denmark)

    Izosimov, Viacheslav; Pop, Paul

    2008-01-01

    In this paper we present an approach to the synthesis of fault-tolerant schedules for embedded applications with soft and hard real-time constraints. We aim to guarantee the deadlines for the hard processes even in the case of faults, while maximizing the overall utility. We use time/utility functions to capture the utility of soft processes. Process re-execution is employed to recover from multiple faults. A single static schedule computed off-line is not fault tolerant and is pessimistic in terms of utility, while a purely online approach, which computes a new schedule every time a process fails or completes, incurs an unacceptable overhead. Thus, we use a quasi-static scheduling strategy, where a set of schedules is synthesized off-line and, at run time, the scheduler selects the right schedule based on the occurrence of faults and the actual execution times of processes. The proposed schedule synthesis heuristics have been evaluated using extensive experiments.

  8. Fault model development for fault tolerant VLSI design

    Science.gov (United States)

    Hartmann, C. R.; Lala, P. K.; Ali, A. M.; Visweswaran, G. S.; Ganguly, S.

    1988-05-01

    Fault models provide systematic and precise representations of physical defects in microcircuits in a form suitable for simulation and test generation. The current difficulty in testing VLSI circuits can be attributed to the tremendous increase in design complexity and the inappropriateness of traditional stuck-at fault models. This report develops fault models for three different types of common defects that are not accurately represented by the stuck-at fault model. The faults examined in this report are: bridging faults, transistor stuck-open faults, and transient faults caused by alpha particle radiation. A generalized fault model could not be developed for the three fault types. However, microcircuit behavior and fault detection strategies are described for the bridging, transistor stuck-open, and transient (alpha particle strike) faults. The results of this study can be applied to the simulation and analysis of faults in fault tolerant VLSI circuits.

  9. A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems

    OpenAIRE

    Márcio das Chagas Moura; Enrique López Droguett

    2008-01-01

    This work proposes a model for assessing the availability measure of fault tolerant systems based on the integration of continuous time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propaga...

  10. Fault Tolerant External Memory Algorithms

    DEFF Research Database (Denmark)

    Jørgensen, Allan Grønlund; Brodal, Gerth Stølting

    2009-01-01

    Algorithms dealing with massive data sets are usually designed for I/O-efficiency, often captured by the I/O model by Aggarwal and Vitter. Another aspect of dealing with massive data is how to deal with memory faults, e.g. captured by the adversary based faulty memory RAM by Finocchi and Italiano. However, current fault tolerant algorithms do not scale beyond the internal memory. In this paper we investigate for the first time the connection between I/O-efficiency in the I/O model and fault tolerance in the faulty memory RAM, and we assume that both memory and disk are unreliable. We show a lower bound on the number of I/Os required for any deterministic dictionary that is resilient to memory faults. We design a static and a dynamic deterministic dictionary with optimal query performance as well as an optimal sorting algorithm and an optimal priority queue. Finally, we consider scenarios where only cells in memory or only cells on disk are corruptible and separate randomized and deterministic dictionaries in the latter.

  11. Architecture for Intrusion Detection System with Fault Tolerance Using Mobile Agent

    OpenAIRE

    Chintan Bhatt; Asha Koshti; Hemant Agrawal; Zakiya Malek; Bhushan Trivedi

    2011-01-01

    This paper is a survey of the work done on making an IDS fault tolerant. An IDS architecture that uses mobile agents provides higher scalability. The mobile agent uses a platform for detecting intrusions using a filter agent, co-relater agent, interpreter agent and rule database. When the server (IDS Monitor) goes down, other hosts take ownership based on priority. This architecture uses decentralized collection and analysis for identifying intrusion. Rule sets are fed based on user-behaviour or application...

  12. Software-implemented hardware fault tolerance

    CERN Document Server

    Goloubeva, O; Sonza Reorda, M

    2006-01-01

    Addresses the topic of software-implemented hardware fault tolerance (SIHFT), that is, how to deal with faults affecting the hardware by only (or mainly) acting on the software. This book presents the theory behind software-implemented hardware fault tolerance, as well as the practical aspects of putting it to work on real examples.

  13. Fault-tolerant control for current sensors of doubly fed induction generators based on an improved fault detection method

    DEFF Research Database (Denmark)

    Li, Hui; Yang, Chao

    2014-01-01

    Fault-tolerant control of current sensors is studied in this paper to improve the reliability of a doubly fed induction generator (DFIG). A fault-tolerant control system of current sensors is presented for the DFIG, which consists of a new current observer and an improved current sensor fault detection algorithm. The current observer is constructed by using only voltage signals as inputs. The fault detection algorithm is based on the current observer, in which an adaptive threshold and different fault duration times are considered. The performance of the proposed observer, improved fault detection algorithm, and fault-tolerant control system are investigated by simulation. The results indicate that the outputs of the observer and the sensor are highly coherent. The fault detection algorithm can efficiently detect both soft and hard faults in current sensors, and the fault-tolerant control system can effectively tolerate both types of faults. © 2013 Published by Elsevier Ltd. All rights reserved.

  14. Electrical Steering of Vehicles - Fault-tolerant Analysis and Design

    DEFF Research Database (Denmark)

    Blanke, Mogens; Thomsen, Jesper Sandberg

    2006-01-01

    The topic of this paper is systems that need be designed such that no single fault can cause failure at the overall level. A methodology is presented for analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high cost triple redundancy solutions and still meet strict requirements to functional safety. The paper applies graph-based analysis of functional system structure to find a novel fault-tolerant architecture for an electrical steerin...

  15. Fault Tolerance in Distributed Ada 95

    OpenAIRE

    Wolf, Thomas

    1997-01-01

    In this paper we present a project to provide fault tolerance in distributed Ada 95 application by means of replication of partitions. Replication is intended to be largely transparent to an application. A group communication system is used for replica management. We examine some of the possibilities for implementing such a system and highlight some of the difficulties encountered in the context of the programming language Ada 95.

  16. Dynamic and fault-tolerant cluster management

    OpenAIRE

    Gidenstam, Anders; Koldehofe, Boris; Papatriantafilou, Marina; Tsigas, Philippas

    2005-01-01

    Recent decentralised event-based systems have focused on providing event delivery which scales with increasing number of processes. While the main focus of research has been on ensuring that processes maintain only a small amount of information on maintaining membership and routing, an important factor in achieving scalability for event-based peer-to-peer dissemination system is the number of events disseminated at the same time. This work presents a dynamic and fault tolerant cluster managem...

  17. Techniques for modeling the reliability of fault-tolerant systems with the Markov state-space approach

    Science.gov (United States)

    Butler, Ricky W.; Johnson, Sally C.

    1995-01-01

    This paper presents a step-by-step tutorial of the methods and the tools that were used for the reliability analysis of fault-tolerant systems. The approach used in this paper is the Markov (or semi-Markov) state-space method. The paper is intended for design engineers with a basic understanding of computer architecture and fault tolerance, but little knowledge of reliability modeling. The representation of architectural features in mathematical models is emphasized. This paper does not present details of the mathematical solution of complex reliability models. Instead, it describes the use of several recently developed computer programs SURE, ASSIST, STEM, and PAWS that automate the generation and the solution of these models.
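
    To make the Markov state-space idea concrete, the sketch below models a textbook triple modular redundant (TMR) system without repair. This example is an assumption chosen for illustration; it is not taken from the tutorial and does not use SURE, ASSIST, STEM or PAWS. The states are "three good units" and "two good units"; the system fails when a second unit fails. With a constant per-unit failure rate `lam`, the transition rates are `3*lam` and `2*lam`, and the known closed-form reliability is R(t) = 3e^(-2λt) - 2e^(-3λt).

```python
import math

def tmr_reliability_closed_form(lam, t):
    """Closed-form reliability of non-repairable TMR with a perfect voter."""
    return 3 * math.exp(-2 * lam * t) - 2 * math.exp(-3 * lam * t)

def tmr_reliability_markov(lam, t, steps=100_000):
    """Integrate the state equations with forward Euler steps.

    p3: probability that all three units are working
    p2: probability that exactly two units are working
    The failed state absorbs the remaining probability mass.
    """
    dt = t / steps
    p3, p2 = 1.0, 0.0
    for _ in range(steps):
        d3 = -3 * lam * p3                  # any of three units fails
        d2 = 3 * lam * p3 - 2 * lam * p2    # inflow from p3, outflow to failure
        p3 += d3 * dt
        p2 += d2 * dt
    return p3 + p2                          # system works in either state

lam = 1e-3      # hypothetical failure rate per hour
mission = 1000  # hypothetical mission time in hours
numeric = tmr_reliability_markov(lam, mission)
exact = tmr_reliability_closed_form(lam, mission)
```

    For models of realistic size the state space is too large to write by hand, which is the motivation for the automated model generation and solution tools the paper describes.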

  18. Task Mapping and Bandwidth Reservation for Mixed Hard/Soft Fault-Tolerant Embedded Systems

    DEFF Research Database (Denmark)

    Saraswat, Prabhat Kumar; Pop, Paul

    2010-01-01

    In this paper we are interested in mixed hard/soft real-time fault-tolerant applications mapped on distributed heterogeneous architectures. We use the Earliest Deadline First (EDF) scheduling for the hard real-time tasks and the Constant Bandwidth Server (CBS) for the soft tasks. The bandwidth reserved for the servers determines the quality of service (QoS) for soft tasks. CBS enforces temporal isolation, such that soft task overruns do not affect the timing guarantees of hard tasks. Transient faults in hard tasks are tolerated using checkpointing with rollback recovery. We have proposed a Tabu Search-based approach for task mapping and CBS bandwidth reservation, such that the deadlines for the hard tasks are satisfied, even in the case of transient faults, and the QoS for the soft tasks is maximized. Researchers have used fixed execution time models, such as the worst-case execution times for hard tasks and average execution times for soft tasks. However, we show that by using stochastic execution times for soft tasks, significant improvements can be obtained. The proposed strategy has been evaluated using an extensive set of benchmarks.
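
    The interaction between EDF-scheduled hard tasks and CBS bandwidth reservation can be sketched with the standard single-processor utilization test: a set of periodic hard tasks with deadlines equal to periods, plus a set of constant bandwidth servers, is schedulable under EDF when the total utilization does not exceed 1. The sketch below is an assumption-laden illustration, not the paper's Tabu Search approach: it considers one processor only, ignores checkpointing and recovery overheads, and all names and numbers are hypothetical.

```python
from fractions import Fraction

def utilization(pairs):
    """Sum budget/period over (budget, period) pairs using exact arithmetic."""
    return sum((Fraction(c) / Fraction(t) for c, t in pairs), Fraction(0))

def edf_with_cbs_feasible(hard_tasks, servers):
    """Utilization test for EDF hard tasks sharing a processor with CBS servers.

    hard_tasks: (worst-case execution time, period) per hard task
    servers:    (budget, server period) per constant bandwidth server
    """
    return utilization(hard_tasks) + utilization(servers) <= 1

def spare_bandwidth(hard_tasks):
    """Bandwidth left over for soft-task servers after the hard tasks."""
    return 1 - utilization(hard_tasks)

hard = [(1, 4), (2, 10)]                     # utilization 1/4 + 1/5 = 9/20
ok = edf_with_cbs_feasible(hard, [(2, 5)])   # adds 2/5, total 17/20
too_much = edf_with_cbs_feasible(hard, [(3, 5)])  # adds 3/5, total 21/20
left = spare_bandwidth(hard)
```

    In a fault-tolerant setting the hard-task budgets would also have to cover re-execution from checkpoints, which reduces the bandwidth available to the soft tasks; that accounting is deliberately left out here.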

  19. Theory of fault-tolerant quantum computation

    Energy Technology Data Exchange (ETDEWEB)

    Gottesman, D. [California Institute of Technology, Pasadena, California 91125 (United States)]|[Los Alamos National Laboratories, Los Alamos, New Mexico 87545 (United States)

    1998-01-01

    In order to use quantum error-correcting codes to improve the performance of a quantum computer, it is necessary to be able to perform operations fault-tolerantly on encoded states. I present a theory of fault-tolerant operations on stabilizer codes based on symmetries of the code stabilizer. This allows a straightforward determination of which operations can be performed fault-tolerantly on a given code. I demonstrate that fault-tolerant universal computation is possible for any stabilizer code. I discuss a number of examples in more detail, including the five-quantum-bit code. © 1998 The American Physical Society

  20. Efficient Fault-Tolerant Strategy Selection Algorithm in Cloud Computing

    Directory of Open Access Journals (Sweden)

    P.Priyanka

    2014-02-01

    Cloud computing is becoming a mainstream feature of information technology. Enterprises increasingly deploy their software systems in the cloud environment. Cloud applications are usually large scale and contain many distributed cloud components. Building highly reliable cloud applications is a challenging and critical research issue. Information processing systems have increased the significance of correct and continuous operation even in the presence of faulty components. To address this issue, this paper proposes a cloud framework to build fault-tolerant cloud applications. We first propose fault detection algorithms to identify significant components from the huge number of cloud components. Then, we present an efficient fault-tolerance strategy selection algorithm to determine the most suitable fault-tolerance strategy for each significant component. Software fault tolerance is widely adopted to increase the overall system reliability in critical applications. System reliability can be enhanced by employing functionally equivalent components to tolerate component failures. Three well-known fault-tolerance strategies are introduced, with formulas for calculating the failure probabilities of the fault-tolerant modules. Our work will mainly be driven toward the implementation of the framework to measure the strength of the fault tolerance service and to make an in-depth analysis of the cost benefits among all the stakeholders. An algorithm is proposed to automatically determine an efficient fault-tolerance strategy for the significant cloud components. Using real failure traces and a model, we evaluate the proposed resource provisioning policies to determine their performance, cost and cost efficiency. The experimental results show that by tolerating faults of a small part of the most important components, the reliability of cloud applications can be highly improved.

  1. System Wide Joint Position Sensor Fault Tolerance in Robot Systems Using Cartesian Accelerometers

    Science.gov (United States)

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Joint position sensors are necessary for most robot control systems. A single position sensor failure in a normal robot system can greatly degrade performance. This paper presents a method to obtain position information from Cartesian accelerometers without integration. Depending on the number and location of the accelerometers, the proposed system can tolerate the loss of multiple position sensors. A solution technique suitable for real-time implementation is presented. Simulations were conducted using 5 triaxial accelerometers to recover from the loss of up to 4 joint position sensors on a 7 degree of freedom robot moving in general three dimensional space. The simulations show good estimation performance using non-ideal accelerometer measurements.

  2. Robust MPC for actuator-fault tolerance using set-based passive fault detection and active fault isolation

    OpenAIRE

    Xu, Feng; Puig Cayuela, Vicenç; Ocampo-Martínez, Carlos; Olaru, Sorin; Niculescu, Silviu-Iulian

    2014-01-01

    In this paper, an actuator fault-tolerant control (FTC) scheme is proposed, which is based on tube-based model predictive control (MPC) and set-theoretic fault detection and isolation (FDI). As a robust MPC technique, tube-based MPC, can effectively deal with system constraints and uncertainties with relatively low computational complexity. Set-based FDI can robustly detect and isolate actuator faults. Here, fault detection (FD) is passive by invariant sets, while fault isolation (FI) is acti...

  3. A Dynamic Effective Fault Tolerance System in Robotic Manipulator using a Hybrid Neural Network based Controller

    OpenAIRE

    G. Jiji; M.Rajaram

    2014-01-01

    Robot manipulators play an important role in the automobile industry, where they are mainly used in gas welding applications and in the manufacturing and assembly of motor parts. In complex trajectories, the speed of the robot manipulator is affected at each joint. For that reason, it is necessary to analyze the noise and vibration of the robot's joints to predict faults and also to improve the control precision of the robotic manipulator. In this study we will propose a new fault detection system for Robot manipul...

  4. Incorporating fault tolerance in distributed agent based systems by simulating bio-computing model of stress pathways

    Science.gov (United States)

    Bansal, Arvind K.

    2006-05-01

    The bio-computing model of 'Distributed Multiple Intelligent Agents Systems' (BDMIAS) models agents as genes, a cooperating group of agents as operons (commonly regulated groups of genes), and the complex task as a set of interacting pathways such that the pathways involve multiple cooperating operons. The agents (or groups of agents) interact with each other using message passing and pattern based bindings that may reconfigure an agent's function temporarily. In this paper, a technique has been described for incorporating fault tolerance in BDMIAS. The scheme is based upon simulating BDMIAS, exploiting the modeling of biological stress pathways, integration of fault avoidance, and distributed fault recovery of the crashed agents. Stress pathways are latent pathways in biological systems that are triggered very quickly, regulate the complex biological system by temporarily regulating or inactivating the undesirable pathways, and are essential to avoid catastrophic failures. Pattern based interaction between messages and agents allows multiple agents to react concurrently in response to a single condition change represented by a message broadcast. The fault avoidance exploits the integration of the intelligent processing rate control using message based loop feedback and temporary reconfiguration that alters the data flow between functional modules within an agent, and may alter. The fault recovery exploits the concept of semi passive shadow agents (one on the local machine and the other on the remote machine), dynamic polling of machines, logically time stamped messages to avoid message losses, and distributed archiving of the volatile part of agent state on distributed machines. Various algorithms have been described.

  5. Enhancement of Fault Tolerance in Cloud Computing

    OpenAIRE

    Pushpanjali Gupta; Rasmi Ranjan Patra

    2014-01-01

    In recent years researchers are trying to work out scientific applications in cloud so that it decreases the infrastructure cost and increases the span of team and finally innovative ideas towards applications is increased. But the cloud is still not as much reliable, controllable as grid. So in the evolving Cloud computing environment there is a great need of fault tolerance mechanism for the system to work effectively even in the presence of failure. Moreover Big Organizations ar...

  6. A Concept for fault tolerant controllers

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2009-01-01

    This paper describes a concept for fault tolerant controllers (FTC) based on the YJBK (after Youla, Jabr, Bongiorno and Kucera) parameterization. This controller architecture allows the controller to be changed on-line in the case of faults in the system. In the described FTC concept, a safe mode controller is applied as the basic feedback controller. A controller for normal operation with high performance is obtained by including certain YJBK parameters (transfer functions) in the controller. This allows a fast switch from normal operation to safe mode operation in case of critical faults in the system. The described FTC architecture allows the different feedback controllers to apply different sets of sensors and actuators.

  7. Design of an adaptive fault tolerant control: case of sensor faults

    OpenAIRE

    Kheder, Atef; Ben Othman, Kamel; Maquin, Didier; Benrejeb, Mohamed

    2010-01-01

    This paper presents a method for designing a sensor fault tolerant control. The method is presented for the case of linear systems and then for the case of nonlinear systems described by Takagi-Sugeno models. The faults are initially estimated using a proportional integral observer. A mathematical transformation is used to conceive an augmented system in which the sensor fault appears as an unknown input. The synthesized control depends on the estimated faults and the error between the state...

  8. Fault Tolerant Ethernet Based Network for Time Sensitive Applications in Electrical Power Distribution Systems

    Directory of Open Access Journals (Sweden)

    Leos Bohac

    2013-01-01

    The paper analyses and experimentally verifies the deployment of Ethernet based network technology to enable fault tolerant and timely exchange of data among a number of high voltage protective relays. The relays use a proprietary serial communication line to exchange real-time data on the state of their high voltage circuitry, facilitating fast protection switching in case of critical failures. The digital serial signal is first fed into a PCM multiplexer, where it is mapped to the corresponding E1 (2 Mbit/s) time division multiplexed signal. The resulting E1 frames are then packetized and sent through the Ethernet control LAN to the opposite PCM demultiplexer, where the same processing is done in reverse before the signal is sent to the opposite protective relay. The challenge of this setup is to assure very timely delivery of the control information between protective relays even in cases of potential failures of the Ethernet network itself. The tolerance of the Ethernet network to faults is assured using the widespread per-VLAN Rapid Spanning Tree Protocol, potentially extended by 1+1 PCM protection as a valuable option.

  9. Recovery in fault-tolerant distributed microcontrollers

    Science.gov (United States)

    Hwang, Riki I.-Ming

    A critical problem facing both the government and commercial space programs is the need for lower cost, higher performance and lower power consumption for on-board processing. Special radiation-hardened processors have been developed to operate in the space radiation environment, but they are typically one to two orders of magnitude behind the performance of commercial devices, and they consume much more power. Yet there is a need for much greater processing performance in most future space missions. The use of commercial (designated COTS, for Commercial Off-the-Shelf) processors in space has been prevented by the fact that the space radiation environment causes an unacceptably high transient error rate, derailing their computations every few hours [MESS 92]. However, protective redundancy can be employed along with the technology of fault-tolerant computing to automatically recover from such errors and thus enable their use. This thesis focuses on one aspect of this problem: embedded microcontrollers, highly integrated computer systems on a single chip that, not unlike those used in modern automobiles, control the various subsystems that make up a spacecraft. This thesis examines tradeoffs and experiments with design techniques required to implement fault-tolerant distributed networks using embedded microcontroller processing nodes. A new fault-tolerant node architecture was developed that allows differing amounts of redundancy to be employed with minimal design change. This includes a special isolated wire-or output system that allows modules to be powered down to recover from some potentially destructive radiation events (latchup). A novel recovery approach was developed that uses comparison voting for error detection and recovery but also employs a "stable" set of recovery actions to allow recovery if multiple errors or Byzantine behaviors occur. Finally, a redundant intercommunication architecture between embedded processing nodes was developed that provides fault tolerance in communications between them. A testbed has been constructed, a real-time executive has been developed, and a supporting test environment has also been implemented to allow fault-insertion testing of the experimental architecture. Our initial results strongly support the viability of the fault-tolerance approaches we have developed.

  10. Electrical Steering of Vehicles - Fault-tolerant Analysis and Design

    DEFF Research Database (Denmark)

    Blanke, Mogens; Thomsen, Jesper Sandberg

    2006-01-01

    The topic of this paper is systems that need to be designed such that no single fault can cause failure at the overall level. A methodology is presented for the analysis and design of fault-tolerant architectures, where diagnosis and autonomous reconfiguration can replace high-cost triple redundancy solutions and still meet strict requirements for functional safety. The paper applies graph-based analysis of functional system structure to find a novel fault-tolerant architecture for electrical steering, where a dedicated AC-motor design and cheap voltage measurements ensure the ability to detect all relevant faults. The paper shows how active control reconfiguration can accommodate all critical faults, and the fault-tolerant abilities are demonstrated on warehouse truck hardware.

  11. Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

    International Nuclear Information System (INIS)

    Fault-tolerant real-time computer (FT-RTC) systems are widely used to ensure safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Design requirements for such systems include high reliability, availability, and computational ability for measurement via sensors, control action via actuators, data communication, and human interface via keyboard or display. All these attributes of FT-RTC systems must be implemented using best-known methods such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure, and diagnostic features to validate system operation. In this context, the system designer must select an efficient as well as highly reliable diversified bus architecture in order to realize a fault-tolerant system design. This paper presents a comparative study between CompactPCI bus and Versa Module Eurocard (VME) bus architectures for designing FT-RTC systems with a switch over logic system (SOLS) for NPP. (author)

  12. Fault-tolerant architectures for superconducting qubits

    Energy Technology Data Exchange (ETDEWEB)

    DiVincenzo, David P [IBM Research Division, Thomas J Watson Research Center, Yorktown Heights, NY 10598 (United States)], E-mail: divince@watson.ibm.com

    2009-12-15

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'; there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  13. Fault-tolerant architectures for superconducting qubits

    International Nuclear Information System (INIS)

    In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'; there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.

  14. A Dynamic Slack Management Technique for Real-Time Distributed Embedded System with Enhanced Fault Tolerance and Resource Constraints

    Directory of Open Access Journals (Sweden)

    Santhi Baskaran

    2011-01-01

    This project aims to develop a dynamic slack management technique for real-time distributed embedded systems that reduces total energy consumption while satisfying timing, precedence and resource constraints. The proposed Slack Distribution Technique uses a modified Feedback Control Scheduling (FCS) algorithm. This algorithm schedules dependent tasks effectively under precedence and resource constraints. It further minimizes the schedule length and uses the available slack to increase energy efficiency. A fault-tolerant mechanism using a deferred-active-backup scheme increases schedulability and provides reliability to the system.

  15. Fault-Tolerant Precision Formation Guidance for Interferometry Project

    Data.gov (United States)

    National Aeronautics and Space Administration — A methodology is to be developed that will allow the development and implementation of fault-tolerant control system for distributed collaborative spacecraft. The...

  16. Fault Tolerant Environment in web crawler Using Hardware Failure Detection

    Directory of Open Access Journals (Sweden)

    Anup Garje, Prof. Bhavesh Patel, Dr. B. B. Mesharm

    2012-06-01

    Fault Tolerant Environment is a complete programming environment for the reliable execution of distributed application programs. Fault Tolerant Distributed Environment encompasses all aspects of modern fault-tolerant distributed computing. The built-in, user-transparent error detection mechanism covers processor node crashes and transient hardware failures. The mechanism also integrates user-assisted error checks into the system failure model. The core non-blocking checkpointing mechanism, combined with a novel low-overhead roll-forward recovery scheme, delivers an efficient backup and recovery mechanism for distributed processes. Fault Tolerant Distributed Environment also provides a means of remote automatic process allocation on distributed system nodes. In case recovery is not possible, a new microrebooting approach can be used to restore the system to a stable state.

  17. Microcontroller-Based Fault Tolerant Data Acquisition System For Air Quality Monitoring And Control Of Environmental Pollution

    Directory of Open Access Journals (Sweden)

    Tochukwu Chiagunye

    2015-08-01

    The design applied passive fault tolerance to a microcontroller-based data acquisition system to achieve the stated considerations. Redundant sensors and microcontrollers with associated circuitry were designed and implemented to measure pollutant concentrations from chimney vents in two industries. Microsoft Visual Basic was used to develop a data mining tool that implemented an underlying artificial neural network (ANN) model for forecasting pollutant concentrations for future time periods. The feed-forward back-propagation method was used to train the ANN model with a training data set, while a decision tree algorithm was used to select an optimal output result for the model from its two output neurons.

  18. Fault-tolerance of functionally adaptive and robust manipulator

    International Nuclear Information System (INIS)

    Robots are required to have the ability to adapt their function according to the tasks to be carried out in an unexpected environment, and to execute tasks even if a part of the system malfunctions. Fault tolerance is a significant factor of functional adaptability. In this paper, a fault-tolerant control method with a proxy control strategy for a distributed manipulator is proposed. A Byzantine fault model is assumed in the method, wherein the behavior of the faulty part cannot be predicted. The method focuses on malfunction of the CPU (central processing unit), which is the controller of the manipulator. The method consists of procedures for fault detection, localization, containment, system reconfiguration and error recovery. The fault detection procedure is based on communication using shared memory. A voting algorithm for fault location is proposed. The fault-tolerant control method is implemented in a distributed manipulator with modular architecture, called Fun-ARM (functionally adaptive and robust manipulator). A reaching motion experiment with a CPU pseudo fault is shown, and the proposed fault-tolerant control method is verified. (author)

  19. Fault-tolerant logics for FPGA linux

    International Nuclear Information System (INIS)

    The increasing use of SRAM-based reconfigurable architectures in important areas of research and development (such as particle accelerators and space applications) brings new effects that currently receive only partial attention. A well-known but nevertheless important problem of such systems is their susceptibility to radiation, which increases with particle flux and energy. According to current knowledge, errors induced by Single Event Upsets (SEU) and Single Event Transients (SET) are handled exclusively in hardware through spatial and temporal redundancy. Our field of research is to extend conventional fault tolerance to multiple layers of embedded computer systems, from the FPGA bit layer up to the software application layer, to achieve maximum radiation tolerance in systems running FPGA Linux in radiation-susceptible environments. Only a collaboration of all these layers can provide an adequate level of data security and process integrity.

  20. A Dynamic Effective Fault Tolerance System in Robotic Manipulator using a Hybrid Neural Network based Controller

    Directory of Open Access Journals (Sweden)

    G. Jiji

    2014-04-01

    Robot manipulators play an important role in the automobile industry, mainly in gas welding and in the manufacturing and assembly of motor parts. In complex trajectories, the speed of the robot manipulator is affected at each joint. For that reason, it is necessary to analyze the noise and vibration of the robot's joints to predict faults and to improve the control precision of the robotic manipulator. In this study we propose a new fault detection system for robot manipulators. The proposed hybrid fault detection system is based on a fuzzy support vector machine (SVM) and Artificial Neural Networks (ANNs). In this system, decoupled joints are identified and corrected using the fuzzy SVM, with non-linear signals used throughout processing, and the ANNs are used to detect free-swinging and locked joints of the robot. Two types of neural predictors are also employed in the proposed adaptive neural network structure. The simulation results of a hybrid controller demonstrate the feasibility and performance of the methodology.

  1. Fault tolerant control of a three-phase three-wire shunt active filter system based on reliability analysis

    Energy Technology Data Exchange (ETDEWEB)

    Poure, P. [Laboratoire d' Instrumentation Electronique de Nancy LIEN, EA 3440, Nancy-Universite, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France); Weber, P.; Theilliol, D. [Centre de Recherche en Automatique de Nancy UMR 7039, Nancy-Universite, CNRS, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France); Saadate, S. [Groupe de Recherches en Electrotechnique et Electronique de Nancy UMR 7037, Nancy-Universite, CNRS, Faculte des Sciences et Techniques, BP 239, 54506 Vandoeuvre Cedex (France)

    2009-02-15

    This paper deals with fault tolerant three-phase three-wire shunt active filter topologies, for which reliability is very important in industry applications. The determination of the optimal reconfiguration structure among various candidates, with or without redundant components, is discussed based on reliability criteria. First, the reconfiguration of the inverter is detailed and a fast fault diagnosis method for power semiconductor or driver fault detection and compensation is presented. This method avoids false fault detection due to power semiconductor switching. The control architecture and algorithm are studied and a fault tolerant control strategy is considered. Simulation results in open and short circuit cases validate the theoretical study. Finally, the reliability of the studied three-phase three-wire shunt active filter topologies is analyzed to determine the optimal one. (author)

  2. Closed-Loop Evaluation of an Integrated Failure Identification and Fault Tolerant Control System for a Transport Aircraft

    Science.gov (United States)

    Shin, Jong-Yeob; Belcastro, Christine; Khong, Thuan

    2006-01-01

    Formal robustness analysis of aircraft control upset prevention and recovery systems could play an important role in their validation and ultimate certification. Such systems developed for failure detection, identification, and reconfiguration, as well as upset recovery, need to be evaluated over broad regions of the flight envelope or under extreme flight conditions, and should include various sources of uncertainty. To apply formal robustness analysis, formulation of linear fractional transformation (LFT) models of complex parameter-dependent systems is required, which represent system uncertainty due to parameter uncertainty and actuator faults. This paper describes a detailed LFT model formulation procedure from the nonlinear model of a transport aircraft by using a preliminary LFT modeling software tool developed at the NASA Langley Research Center, which utilizes a matrix-based computational approach. The closed-loop system is evaluated over the entire flight envelope based on the generated LFT model which can cover nonlinear dynamics. The robustness analysis results of the closed-loop fault tolerant control system of a transport aircraft are presented. A reliable flight envelope (safe flight regime) is also calculated from the robust performance analysis results, over which the closed-loop system can achieve the desired performance of command tracking and failure detection.

  3. USAGE OF STANDARD PERSONAL COMPUTER PORTS FOR DESIGNING OF THE DOUBLE REDUNDANT FAULT-TOLERANT COMPUTER CONTROL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Rafig SAMEDOV

    2005-01-01

    In this study, standard personal computer ports are investigated for the design of fault-tolerant control systems, different structure versions are designed, and a method for choosing an optimal structure is suggested. First, the Ç?FTYAK system is defined and its working principle is determined. Then the data transmission ports of standard personal computers are classified and analyzed. After that, the structure versions are designed and evaluated according to the data transmission methods used, the number of ports, and the criteria of reliability, performance, truth, control and cost. Finally, a method for choosing the optimal structure version is suggested.

  4. A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems

    Scientific Electronic Library Online (English)

    Moura, Márcio das Chagas; Droguett, Enrique López

    2008-08-01

    In this work, a model is proposed for assessing availability measures of fault tolerant systems, based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.

  5. A continuous-time semi-markov bayesian belief network model for availability measure estimation of fault tolerant systems

    Directory of Open Access Journals (Sweden)

    Márcio das Chagas Moura

    2008-08-01

    In this work, a model is proposed for assessing availability measures of fault tolerant systems, based on the integration of continuous-time semi-Markov processes and Bayesian belief networks. This integration results in a hybrid stochastic model that is able to represent the dynamic characteristics of a system as well as to deal with cause-effect relationships among external factors such as environmental and operational conditions. The hybrid model also allows for uncertainty propagation on the system availability. A numerical procedure is also proposed for the solution of the state probability equations of semi-Markov processes described in terms of transition rates. The numerical procedure is based on the application of Laplace transforms that are inverted by the Gauss quadrature method known as Gauss-Legendre. The hybrid model and numerical procedure are illustrated by means of an example of application in the context of fault tolerant systems.

  6. Enhancement of Fault Tolerance in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Pushpanjali Gupta

    2014-08-01

    In recent years, researchers have been trying to run scientific applications in the cloud in order to decrease infrastructure cost, widen team reach, and foster innovative application ideas. But the cloud is still not as reliable or controllable as the grid. In the evolving cloud computing environment there is therefore a great need for fault tolerance mechanisms that let the system work effectively even in the presence of failures. Moreover, big organizations are opting for hybrid clouds instead of private clouds. In this paper we propose a new framework that makes the cloud usable for scientific applications and makes the public cloud a trustworthy platform. A progressive approach is introduced to achieve high fault tolerance in clouds by enabling a new workflow planning method that balances performance, reliability and cost for critical scientific applications, focusing mainly on the use of distributed resources for workflow execution in serial and concurrent manner.

  7. Practical fault tolerance for quantum circuits

    Science.gov (United States)

    Whitney, Mark Gregory

    Due to very high projected error rates, large scale quantum computers will require substantial fault tolerance just to maintain a minimum level of reliability. We present tools to better analyze the performance of large, fault tolerant quantum computer designs. We find that current uses of quantum error correction are overly conservative in mitigating the impact of gate errors and negligent of other error sources in quantum data communication and memory. We have developed circuit layout heuristics to generate detailed designs in trapped ion quantum computing technology. From these designs, we can extract much more accurate error models for a given application, including all gate, movement and idle errors on qubits. Using these extracted models, our flexible error simulation environment determines the overall failure probability of the design. Included in this simulation environment is a bit-parallel Monte Carlo technique that is 10 times faster than previous fault propagation simulations. This allows us to evaluate the reliability of designs that are an order of magnitude larger, in the same amount of time. Using this analysis framework to verify reliability, we have developed a linear programming-based optimization for error correction which decreases overall circuit resources by an order of magnitude. In some cases, our optimization actually improves overall system reliability by removing error correction. We combine this optimization with judicious quantum error correcting code selection to provide efficient designs for large quantum arithmetic kernels used in Shor's factorization algorithm. We show our optimized designs perform 2x to 100x better than previous works in terms of probabilistic area-delay product. Additionally, the area of our layout for 1024-bit factoring using Shor's algorithm is 64 cm², a substantial improvement compared to the 0.9 m² state-of-the-art design from prior work. A design size reduction by this amount will make fabricating such an application feasible much sooner.

  8. Fault Tolerant Parallel Filters Based On Bch Codes

    Directory of Open Access Journals (Sweden)

    K. Mohana Krishna

    2015-04-01

    Digital filters are used in signal processing and communication systems. In some cases, the reliability of those systems is critical, and fault tolerant filter implementations are needed. Over the years, many techniques that exploit the filters' structure and properties to achieve fault tolerance have been proposed. As technology scales, it enables more complex systems that incorporate many filters. In those complex systems, it is common that some of the filters operate in parallel, for example, by applying the same filter to different input signals. Recently, a simple technique that exploits the presence of parallel filters to achieve multiple fault tolerance has been presented. In this brief, that idea is generalized to show that parallel filters can be protected using Bose–Chaudhuri–Hocquenghem (BCH) codes, in which each filter is the equivalent of a bit in a traditional ECC. This new scheme allows more efficient protection when the number of parallel filters is large.
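
    The idea of treating each parallel filter like a bit in an error-correcting code rests on filter linearity: a redundant filter fed with the sum of the inputs must produce the sum of the outputs. The following minimal sketch illustrates only that principle with a single parity filter, which can detect a fault but not locate or correct it; it is not the BCH scheme of the paper. The names `fir` and `protected_outputs`, the integer test data, and the fault-injection tuple are all assumptions made for illustration.

```python
def fir(h, x):
    """Direct-form FIR filter with zero initial state: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def protected_outputs(h, inputs, fault=None):
    """Run identical filters on several equal-length inputs plus one parity filter.

    fault is an optional (filter_index, sample_index, delta) tuple that
    corrupts one output sample to simulate a hardware error.
    Returns (outputs, sample indices where the parity check fails).
    """
    outputs = [fir(h, x) for x in inputs]
    if fault is not None:
        idx, n, delta = fault
        outputs[idx][n] += delta  # injected error in one filter output
    # The parity filter processes the sample-wise sum of all inputs.
    parity_in = [sum(samples) for samples in zip(*inputs)]
    parity_out = fir(h, parity_in)
    # By linearity, the outputs must sum to the parity filter's output.
    mismatches = [n for n, samples in enumerate(zip(*outputs))
                  if sum(samples) != parity_out[n]]
    return outputs, mismatches


h = [1, 2, 1]  # hypothetical integer coefficients, so equality checks are exact
signals = [[1, 0, 2, 3], [4, 1, 0, 1], [0, 5, 1, 2]]
_, clean = protected_outputs(h, signals)
_, faulty = protected_outputs(h, signals, fault=(1, 2, 5))
print(clean, faulty)
```

    With floating-point coefficients the equality check would need a tolerance, and locating the faulty filter requires several check filters whose input combinations follow the parity-check matrix of a code.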

  9. Application of a fault-tolerant microprocessor-based core-surveillance system in a German fast breeder reactor

    International Nuclear Information System (INIS)

    For the fast breeder reactor KNK II at Karlsruhe, Germany, a microprocessor-based safety shut-down system is built. Analogous to the triple modular instrumentation, it consists of TMR hardware. Functionally it is split into four blocks which operate in cascade-like fashion. The main functions are mean value calculation, current limit control, trend control, and final evaluation. In order to secure correctness, several constructive and analytical methods are applied for fault avoidance, such as formal specification languages, programming guidelines, a software quality assurance plan, validation, verification, and testing. Since additional means for correct and safe operation are still necessary, fault-tolerance and error-detection techniques are applied. These include self-checking programs, plausibility checks, control data, information exchange and control between the redundancies, and especially diversity. This diversity refers to different teams for the different development phases as well as to different tools and environments, such as different programming languages for the application software. Three separate but functionally identical programs will be implemented in Iftran, Pascal and PL/M. These will be used not only during the extensive testing period, but also during final operation.

  10. Design and Analysis of a Fault Tolerant Microprocessor Based on Triple Modular Redundancy Using VHDL

    OpenAIRE

    Deepti Shinghal; Dinesh Chandra

    2011-01-01

    There are numerous real-time and operation-critical systems in which failure is unacceptable at any stage of processing. Examples of such systems include ATMs, satellites, and spacecraft. In this paper a fault tolerant microprocessor is developed using checker units with a fault-secure ALU; the fault-secure ALU was developed using parity prediction logic and the two-rail checker method. Finally, triple modular redundancy is applied to develop a fault tolerant p...
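
    Triple modular redundancy, as named in this abstract, runs three copies of a module and takes a majority vote on their outputs. The sketch below is a software illustration of a bitwise 2-of-3 voter in Python rather than the paper's VHDL design; the 8-bit adder, the stuck-at fault, and the function names are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant module outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_execute(modules, *args):
    """Run three redundant modules; return (voted result, indices that disagree)."""
    outputs = [module(*args) for module in modules]
    voted = majority_vote(*outputs)
    disagreeing = [i for i, out in enumerate(outputs) if out != voted]
    return voted, disagreeing


def alu_add(x, y):
    """Hypothetical fault-free 8-bit adder."""
    return (x + y) & 0xFF


def alu_add_stuck_bit(x, y):
    """The same adder with output bit 4 stuck at 1 (injected fault)."""
    return alu_add(x, y) | 0x10


result, faulty_modules = tmr_execute([alu_add, alu_add_stuck_bit, alu_add], 3, 4)
print(result, faulty_modules)
```

    The voter masks any single faulty module. If two modules fail in the same bit position, the vote is wrong, which is why TMR is usually combined with detection and repair of the disagreeing module.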

  11. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Science.gov (United States)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  12. Learning Fault-tolerant Speech Parsing with SCREEN

    CERN Document Server

    Wermter, S; Wermter, Stefan; Weber, Volker

    1994-01-01

    This paper describes a new approach and a system, SCREEN, for fault-tolerant speech parsing. SCREEN stands for Symbolic Connectionist Robust EnterprisE for Natural language. Speech parsing describes the syntactic and semantic analysis of spontaneous spoken language. The general approach is based on incremental immediate flat analysis, learning of syntactic and semantic speech parsing, parallel integration of current hypotheses, and the consideration of various forms of speech-related errors. The goal of this approach is to explore the parallel interactions between various knowledge sources for learning incremental fault-tolerant speech parsing. This approach is examined in the SCREEN system using various hybrid connectionist techniques. Hybrid connectionist techniques are examined because of their promising properties of inherent fault tolerance, learning, gradedness and parallel constraint integration. The input for SCREEN is hypotheses about recognized words of a spoken utterance potentially analyzed by a spe...

  13. Fault Tolerant Heterogeneous Limited Duplication Scheduling algorithm for Decentralized Grid

    Directory of Open Access Journals (Sweden)

    DR. NITIN

    2013-04-01

    Full Text Available Fault tolerance is one of the most desirable properties in decentralized grid computing systems, where computational resources are geographically distributed. These resources collaborate in order to execute workflow applications as fast as possible. In workflow applications, tasks are dependent on each other, so it becomes vital that scheduling techniques also have some decentralized fault tolerant mechanism. In this paper, we propose a decentralized fault tolerant mechanism which utilizes the checkpoint concept for the Heterogeneous Limited Duplication (HLD) algorithm. HLD is based on task duplication scheduling in a heterogeneous environment. The benefits are twofold. First, if a node failure occurs, the rest of the grid nodes sustain the execution of the application. Second, a smaller application makespan is obtained using the checkpoint concept. Therefore, an application scheduled over decentralized grid systems (which are known for their unreliable behavior) will yield results fast when using the algorithm proposed in this paper.

  14. Interactive animation of fault-tolerant parallel algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Apgar, S.W.

    1992-02-01

    Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user-interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new on-line adversaries as the algorithm executes.

  15. Design Approach for Fault Tolerance in FPGA Architecture

    Directory of Open Access Journals (Sweden)

    Ms. Shweta S. Meshram

    2011-03-01

    Full Text Available Failures of nano-metric technologies owing to defects and shrinking process tolerances give rise to significant challenges for IC testing. In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. Reliability and process variability are serious issues for FPGAs in the future. With advancement in process technology, the feature size is decreasing, which leads to higher defect densities, so more sophisticated techniques at increased costs are required to avoid defects. If nanotechnology fabrication is applied, the yield may go down to zero, as avoiding defects during fabrication will not be a feasible option. Hence, future architectures have to be defect tolerant. In regular structures like FPGAs, redundancy is commonly used for fault tolerance. In this work we present a solution in which the configuration bit-stream of the FPGA is modified by a hardware controller that is present on the chip itself. The technique uses a redundant device to replace a faulty device and increases the yield.

  16. Fault tolerance and reliability in integrated ship control

    DEFF Research Database (Denmark)

    Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh; Schiøler, Henrik

    2002-01-01

    Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding reliability issues. A generic framework for simulation of network traffic under fault conditions is suggested and the first practical experiences from a prototype implementation are reported.

  17. Passive Fault tolerant Control of an Inverted Double Pendulum

    DEFF Research Database (Denmark)

    Niemann, H.; Stoustrup, Jakob

    2003-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the Youla parameterization, which requires the nominal controller to be implemented in the observer based form. The proposed method is applied to a double inverted pendulum system, for which an H∞ controller has been designed and verified in a lab setup. In this case study, the ...

  18. Design methods for fault-tolerant finite state machines

    Science.gov (United States)

    Niranjan, Shailesh; Frenzel, James F.

    1993-01-01

    VLSI electronic circuits are increasingly being used in space-borne applications where high levels of radiation may induce faults, known as single event upsets. In this paper we review the classical methods of designing fault tolerant digital systems, with an emphasis on those methods which are particularly suitable for VLSI-implementation of finite state machines. Four methods are presented and will be compared in terms of design complexity, circuit size, and estimated circuit delay.

  19. Fault-tolerant search algorithms: reliable computation with unreliable information

    CERN Document Server

    Cicalese, Ferdinando

    2013-01-01

    Why a book on fault-tolerant search algorithms? Searching is one of the fundamental problems in computer science. Time and again algorithmic and combinatorial issues originally studied in the context of search find application in the most diverse areas of computer science and discrete mathematics. On the other hand, fault-tolerance is a necessary ingredient of computing. Due to their inherent complexity, information systems are naturally prone to errors, which may appear at any level - as imprecisions in the data, bugs in the software, or transient or permanent hardware failures. This book pr

  20. Concepts and Methods in Fault-tolerant Control

    DEFF Research Database (Denmark)

    Blanke, Mogens; Staroswiecly, M.

    2001-01-01

    Faults in automated processes will often cause undesired reactions and shut-down of a controlled plant, and the consequences could be damage to technical parts of the plant, to personnel or the environment. Fault-tolerant control combines diagnosis with control methods to handle faults in an intelligent way. The aim is to prevent simple faults from developing into serious failures and hence to increase plant availability and reduce the risk of safety hazards. Fault-tolerant control merges several disciplines into a common framework to achieve these goals. The desired features are obtained through on-line fault diagnosis, automatic condition assessment and calculation of appropriate remedial actions to avoid certain consequences of a fault. The envelope of the possible remedial actions is very wide. Sometimes, a simple remedy can be achieved by replacing a measurement from a faulty sensor by an estimate. In yet other situations, complex reconfiguration or on-line controller redesign is required. This paper gives an overview of recent tools to analyze and explore structure and other fundamental properties of an automated system such that any inherent redundancy in the controlled process can be fully utilized to maintain availability, even though faults may occur.

  1. SMaRtLight: A Practical Fault-Tolerant SDN Controller

    OpenAIRE

    Botelho, Fábio; Bessani, Alysson; Ramos, Fernando M. V.; Ferreira, Paulo

    2014-01-01

    The increase in the number of SDN-based deployments in production networks is triggering the need to consider fault-tolerant designs of controller architectures. Commercial SDN controller solutions incorporate fault tolerance, but there has been little discussion in the SDN literature on the design of such systems and the tradeoffs involved. To fill this gap, we present a by-construction design of a fault-tolerant controller, and materialize it by proposing and formalizing a...

  2. Implementations of a four-level mechanical architecture for fault-tolerant robots

    Energy Technology Data Exchange (ETDEWEB)

    Hooper, Richard; Sreevijayan, Dev; Tesar, Delbert; Geisinger, Joseph; Kapoor, Chelan

    1996-09-01

    This paper describes a fault tolerant mechanical architecture with four levels devised and implemented in concert with NASA (Tesar, D. and Sreevijayan, D., Four-level fault tolerance in manipulator design for space operations. In First Int. Symp. Measurement and Control in Robotics (ISMCR '90), Houston, Texas, 20-22 June 1990.) Subsequent work has clarified and revised the architecture. The four levels proceed from fault tolerance at the actuator level, to fault tolerance via in-parallel chains, to fault tolerance using serial kinematic redundancy, and finally to the fault tolerance multiple arm systems provide. This is a subsumptive architecture because each successive layer can incorporate the fault tolerance provided by all layers beneath. For instance a serially-redundant robot can incorporate dual fault-tolerant actuators. Redundant systems provide the fault tolerance, but the guiding principle of this architecture is that functional redundancies actively increase the performance of the system. Redundancies do not simply remain dormant until needed. This paper includes specific examples of hardware and/or software implementation at all four levels.

  3. Control switching in high performance and fault tolerant control

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2010-01-01

    The problem of reliability in high performance control and in fault tolerant control is considered in this paper. A feedback controller architecture for high performance and fault tolerance is considered. The architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. By using the nominal controller in the architecture as a simple and robust controller, it is possible to use the YJBK transfer function for optimization of the closed-loop performance. This can be done both in connection with normal operation of the system and in connection with faults in the system. The architecture will also allow changing the applied sensors and/or actuators when switching between different controllers. This switching becomes particularly simple for open-loop stable systems.

  4. An Approach to Build Software Based on Fault Tolerance Computing Using Uncertainty Factor

    Directory of Open Access Journals (Sweden)

    Mrityunjay Brahma

    2013-12-01

    Full Text Available In this work, we start with an overview of fault tolerance based systems. In the case of design diversity based software fault tolerance systems, we observe that uncertainty remains an important factor. Keeping this factor in mind, we discuss implementing Bayes’ theorem and a probabilistic mathematical model to handle the uncertainty factor. We assume that, once developed, the complete model will give us better efficiency. The rest of this paper deals with other types of fault tolerance systems and their approaches. This part is a kind of literature review, which includes fault tolerant computing schemes that rely on the single-design as well as on the multiple-design. Further, under single-design, we discuss the recovery block, N-version programming and N self-checking programming schemes. Lastly, focusing on multiple-design, we discuss software engineering aspects, error detection mechanisms and fault tolerance by fault injection. The paper ends with a general conclusion.

  5. Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

    Science.gov (United States)

    Smith, T. B., Jr.; Lala, J. H.

    1983-01-01

    The basic organization of the fault tolerant multiprocessor (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting are employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
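
    The voting idea in this abstract can be illustrated with a minimal software sketch of bitwise 2-of-3 majority voting over three replicated words. This is not the FTMP hardware design; the function names `vote3` and `faulty_replica` are hypothetical and chosen only for illustration.

```python
def vote3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit matches at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def faulty_replica(a: int, b: int, c: int):
    """Return the index of the first replica that disagrees with the vote, or None."""
    voted = vote3(a, b, c)
    for index, value in enumerate((a, b, c)):
        if value != voted:
            return index
    return None


good = 0b1011_0010
corrupted = good ^ 0b0000_1000  # a single bit flipped in one replica
result = vote3(good, corrupted, good)            # the flipped bit is outvoted
suspect = faulty_replica(good, corrupted, good)  # index of the dissenting replica
```

    Identifying the dissenting replica is what would let a reconfiguration step retire it, so that a later fault is again a single fault among the remaining replicas.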

  6. Concepts and Methods in Fault-tolerant Control

    DEFF Research Database (Denmark)

    Blanke, Mogens; Staroswiecly, M.; Wu, N.E.

    2001-01-01

    Faults in automated processes will often cause undesired reactions and shut-down of a controlled plant, and the consequences could be damage to technical parts of the plant, to personnel or the environment. Fault-tolerant control combines diagnosis with control methods to handle faults in an intelligent way. The aim is to prevent simple faults from developing into serious failures and hence to increase plant availability and reduce the risk of safety hazards. Fault-tolerant control merges several disci...

  7. Sliding mode fault detection and fault-tolerant control of smart dampers in semi-active control of building structures

    Science.gov (United States)

    Yeganeh Fallah, Arash; Taghikhany, Touraj

    2015-12-01

    Recent decades have witnessed much interest in the application of active and semi-active control strategies for seismic protection of civil infrastructures. However, the reliability of these systems is still in doubt as there remains the possibility of malfunctioning of their critical components (i.e. actuators and sensors) during an earthquake. This paper focuses on the application of the sliding mode method due to the inherent robustness of its fault detection observer and fault-tolerant control. The robust sliding mode observer estimates the state of the system and reconstructs the actuators’ faults which are used for calculating a fault distribution matrix. Then the fault-tolerant sliding mode controller reconfigures itself by the fault distribution matrix and accommodates the fault effect on the system. Numerical simulation of a three-story structure with magneto-rheological dampers demonstrates the effectiveness of the proposed fault-tolerant control system. It was shown that the fault-tolerant control system maintains the performance of the structure at an acceptable level in the post-fault case.

  8. Design of Test Articles and Monitoring System for the Characterization of HIRF Effects on a Fault-Tolerant Computer Communication System

    Science.gov (United States)

    Torres-Pomales, Wilfredo; Malekpour, Mahyar R.; Miner, Paul S.; Koppen, Sandra V.

    2008-01-01

    This report describes the design of the test articles and monitoring systems developed to characterize the response of a fault-tolerant computer communication system when stressed beyond the theoretical limits for guaranteed correct performance. A high-intensity radiated electromagnetic field (HIRF) environment was selected as the means of injecting faults, as such environments are known to have the potential to cause arbitrary and coincident common-mode fault manifestations that can overwhelm redundancy management mechanisms. The monitors generate stimuli for the systems-under-test (SUTs) and collect data in real-time on the internal state and the response at the external interfaces. A real-time health assessment capability was developed to support the automation of the test. A detailed description of the nature and structure of the collected data is included. The goal of the report is to provide insight into the design and operation of these systems, and to serve as a reference document for use in post-test analyses.

  9. Optimized Nanometric Fault Tolerant Reversible BCD Adder

    OpenAIRE

    Majid Haghparast; Masoumeh Shams

    2012-01-01

    In this study a novel nanometric fault tolerant quantum and reversible binary coded decimal adder is proposed. Reversible logic has attracted growing attention in optical information processing, quantum computing, nanotechnology and low power design. A BCD adder is a combinational circuit that can be used for the addition of two numbers in BCD arithmetic. The proposed reversible BCD adder also has the parity preserving property. It is better than all the existing counterparts. The proposed circuit ...

  10. Implementation of Fault Tolerant Method Using BCH Code on FPGA

    Directory of Open Access Journals (Sweden)

    Mahadevaswamy V P

    2012-09-01

    Full Text Available Fault tolerance, or graceful degradation, is the property that enables a system (often computer-based) to continue operating properly in the event of the failure of (or one or more faults within) some of its components. The aim is to design a new 32-bit Arithmetic Logic Unit (ALU) that is secure against many attacks or faults and able to correct any 5-bit fault in any position of its 32-bit input register. This is needed because radiation effects on electronic circuits may invert data bits in registers or memories. If one bit of the main storage system is changed, the mission of the system would be completely different. The main motivation for choosing BCH (Bose, Chaudhuri, and Hocquenghem) codes is that they are able to correct multiple errors and that this class of codes consists of powerful random error correcting cyclic codes. Among methods with an area penalty, a 32-bit fault tolerant ALU using BCH code is a better choice in terms of area than Triple Modular Redundancy (TMR) and Residue code. This is because the fault tolerant method for a 32-bit ALU using TMR with single or triplicated voting needs a single voting scheme or tripled voter and two extra 32-bit ALUs, which increases the hardware overhead by 202% and 208% respectively. The Residue code requires a hardware overhead of 148.9%. In comparison with TMR and Residue code, BCH code needs a hardware overhead of 70 to 75%, which reduces the overall cost and power consumption. Thus the proposed fault tolerant design has lower hardware overhead and multiple error correction when compared to the other techniques.

  11. A Self-Stabilizing Hybrid Fault-Tolerant Synchronization Protocol

    Science.gov (United States)

    Malekpour, Mahyar R.

    2015-01-01

    This paper presents a strategy for solving the Byzantine general problem for self-stabilizing a fully connected network from an arbitrary state and in the presence of any number of faults with various severities including any number of arbitrary (Byzantine) faulty nodes. The strategy consists of two parts: first, converting Byzantine faults into symmetric faults, and second, using a proven symmetric-fault tolerant algorithm to solve the general case of the problem. A protocol (algorithm) is also presented that tolerates symmetric faults, provided that there are more good nodes than faulty ones. The solution applies to realizable systems, while allowing for differences in the network elements, provided that the number of arbitrary faults is not more than a third of the network size. The only constraint on the behavior of a node is that the interactions with other nodes are restricted to defined links and interfaces. The solution does not rely on assumptions about the initial state of the system and no central clock nor centrally generated signal, pulse, or message is used. Nodes are anonymous, i.e., they do not have unique identities. A mechanical verification of a proposed protocol is also presented. A bounded model of the protocol is verified using the Symbolic Model Verifier (SMV). The model checking effort is focused on verifying correctness of the bounded model of the protocol as well as confirming claims of determinism and linear convergence with respect to the self-stabilization period.

  12. Steps toward fault-tolerant quantum chemistry.

    Energy Technology Data Exchange (ETDEWEB)

    Taube, Andrew Garvin

    2010-05-01

    Developing quantum chemistry programs on the coming generation of exascale computers will be a difficult task. The programs will need to be fault-tolerant and minimize the use of global operations. This work explores the use of a task-based model that uses a data-centric approach to allocate work to different processes as it applies to quantum chemistry. After introducing the key problems that appear when trying to parallelize a complicated quantum chemistry method such as coupled-cluster theory, we discuss the implications of that model as it pertains to the computational kernel of a coupled-cluster program - matrix multiplication. Also, we discuss the extensions that would be required to build a full coupled-cluster program using the task-based model. Current programming models for high-performance computing are fault-intolerant and use global operations. Those properties are unsustainable as computers scale to millions of CPUs; instead one must recognize that these systems will be hierarchical in structure, prone to constant faults, and global operations will be infeasible. The FAST-OS HARE project is introducing a scale-free computing model to address these issues. This model is hierarchical and fault-tolerant by design, allows for the clean overlap of computation and communication, reducing the network load, does not require checkpointing, and avoids the complexity of many HPC runtimes. Development of an algorithm within this model requires a change in focus from imperative programming to a data-centric approach. Quantum chemistry (QC) algorithms, in particular electronic structure methods, are an ideal test bed for this computing model. These methods describe the distribution of electrons in a molecule, which determine the properties of the molecule. The computational cost of these methods is high, scaling quartically or higher in the size of the molecule, which is why QC applications are major users of HPC resources.
    The complexity of these algorithms means that MPI alone is insufficient to achieve parallel scaling; QC developers have been forced to use alternative approaches to achieve scalability and would be receptive to radical shifts in the programming paradigm. Initial work in adapting the simplest QC method, Hartree-Fock, to this new programming model indicates that the approach is beneficial for QC applications. However, the advantages to being able to scale to exascale computers are greatest for the computationally most expensive algorithms; within QC these are the high-accuracy coupled-cluster (CC) methods. Parallel coupled-cluster programs are available; however, they are based on the conventional MPI paradigm. Much of the effort is spent handling the complicated data dependencies between the various processors, especially as the size of the problem becomes large. The current paradigm will not survive the move to exascale computers. Here we discuss the initial steps toward designing and implementing a CC method within this model. First, we introduce the general concepts behind a CC method, focusing on the aspects that make these methods difficult to parallelize with conventional techniques. Then we outline what is the computational core of the CC method - a matrix multiply - within the task-based approach that the FAST-OS project is designed to take advantage of. Finally we outline the general setup to implement the simplest CC method in this model, linearized CC doubles (LinCC).

  13. Database mirroring in fault-tolerant continuous technological process control

    Directory of Open Access Journals (Sweden)

    R. Danel

    2015-10-01

    Full Text Available This paper describes the implementations of mirroring technology in selected database systems: Microsoft SQL Server, MySQL and Caché. By simulating critical failures, the systems' behavior and their resilience against failure were tested. The aim was to determine whether database mirroring is suitable for use in continuous metallurgical processes to provide a fault-tolerant solution at affordable cost. Present-day database systems are characterized by high robustness and are resistant to sudden system failure. Database mirroring technologies are reliable and even low-budget projects can be provided with a decent fault-tolerant solution. The database system technologies available for low-budget projects are not suitable for use in real-time systems.

  14. Fault Detection for Shipboard Monitoring and Decision Support Systems

    DEFF Research Database (Denmark)

    Lajic, Zoran; Nielsen, Ulrik Dam

    2009-01-01

    In this paper a basic idea of a fault-tolerant monitoring and decision support system will be explained. Fault detection is an important part of the fault-tolerant design for in-service monitoring and decision support systems for ships. In the paper, a virtual example of fault detection will be presented for a containership with a real decision support system onboard. All possible faults can be simulated and detected using residuals and the generalized likelihood ratio (GLR) algorithm.
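
    A generalized likelihood ratio (GLR) test on residuals can be sketched as follows. The sketch assumes zero-mean Gaussian residuals with known standard deviation `sigma` and a fault that appears as a constant shift of unknown size. The function names, the window length and the threshold are illustrative assumptions and are not taken from the paper. Noise-free toy data is used so the result is easy to check by hand.

```python
def glr_statistic(residuals, sigma):
    """GLR statistic for a mean shift of unknown size in zero-mean Gaussian residuals.

    For each candidate change time k, the best-fitting shift is the sample mean
    of residuals[k:], which gives the log-likelihood ratio
    (sum of residuals[k:])**2 / (2 * sigma**2 * len(residuals[k:])).
    The statistic is the maximum over all k.
    """
    best = 0.0
    tail_sum = 0.0
    n = len(residuals)
    # Walk backwards so that tail_sum always holds sum(residuals[k:]).
    for k in range(n - 1, -1, -1):
        tail_sum += residuals[k]
        best = max(best, tail_sum ** 2 / (2.0 * sigma ** 2 * (n - k)))
    return best


def detect(residuals, sigma, threshold, window=50):
    """Return the first sample index at which the windowed GLR exceeds threshold."""
    for n in range(1, len(residuals) + 1):
        recent = residuals[max(0, n - window):n]
        if glr_statistic(recent, sigma) > threshold:
            return n - 1
    return None


# Toy residual sequence: healthy (zero) for 20 samples, then a bias of 2.0.
toy = [0.0] * 20 + [2.0] * 10
alarm_index = detect(toy, sigma=1.0, threshold=5.0)
```

    In practice the threshold trades false alarms against detection delay: a higher threshold needs more post-fault samples before the statistic crosses it.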

  15. Fault tolerant microcomputer based alarm annunciator for Dhruva reactor

    International Nuclear Information System (INIS)

    The Dhruva alarm annunciator displays the status of 624 alarm points on an array of display windows using the standard ringback sequence. Recognizing the need for a very high availability, the system is implemented as a fault tolerant configuration. The annunciator is partitioned into three identical units; each unit is implemented using two microcomputers wired in a hot standby mode. In the event of one computer malfunctioning, the standby computer takes over control in a bounce-free transfer. The use of microprocessors has helped build flexibility into the system. The system also provides built-in capability to resolve the sequence of occurrence of events and conveys this information to another system for display on a CRT. This report describes the system features, the fault tolerant organisation used and the hardware and software developed for the annunciation function. (author). 8 figs

  16. Highly Reliable Fault Tolerant Technique for Safety Critical Applications

    Directory of Open Access Journals (Sweden)

    Nanditha S

    2014-05-01

    Full Text Available This paper presents a highly reliable fault tolerant technique for safety critical applications using the Five Modular Redundancy method. In high radiation environments like spacecraft and nuclear thermal plants, it is likely that single event upsets (SEUs) degrade the system operation. These cause single bit flips in the sequential elements of electronic components in the system. If these systems are not provided with fault tolerance, there is a high chance of obtaining a false response. In order to avoid this problem, the system is made redundant and a roll-forward recovery mechanism is used to increase the overall reliability. Scan cell design is employed to shift out the internal states of all the flip-flops during the comparison and recovery process. The proposed method is designed using Verilog HDL on the XILINX ISE simulator.
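
    The voting step of a five-modular-redundancy scheme can be sketched in software as a strict-majority vote over five module outputs, which masks up to two disagreeing modules. This is only an illustrative model: the paper's design is in Verilog, its scan-chain comparison and roll-forward recovery are not modeled here, and the name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (value, dissenters), where value is held by a strict majority.

    Raises ValueError when no value reaches a strict majority.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no majority among module outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Five redundant modules; modules 2 and 4 are assumed to be hit by upsets.
module_outputs = [0x5A, 0x5A, 0x1A, 0x5A, 0x7A]
voted, faulty = majority_vote(module_outputs)
```

    The list of dissenters is the information a recovery step would need in order to reload the affected modules with the voted state.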

  17. Dynamic Fault Tolerance in Desktop Grids Based On Reliability

    Directory of Open Access Journals (Sweden)

    Geeta Arora

    2013-10-01

    Full Text Available Fault tolerance is an important issue in guaranteeing reliable execution of tasks in a computational desktop grid environment. Because execution failures are frequently expected there, efficient fault tolerant strategies are required that can effectively deal with resource failures and/or unplanned periods of unavailability. In this paper we present a Dynamic Fault Tolerant strategy that, rather than just tolerating faults as done by traditional fault-tolerant schedulers, exploits information concerning the size of the task, resource speed and resource reliability by maintaining a resource history to improve application performance. The performance of this strategy has been compared via simulation with that attained by a traditional fault-tolerant strategy. Our results, obtained by considering a set of realistic scenarios modeled after real Desktop Grids, show that our approach results in better application performance and resource utilization.
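
    One way to combine task size, resource speed and a recorded reliability history is sketched below. The scoring rule is an assumption made for illustration and is not the paper's strategy. It treats each attempt as succeeding independently with probability equal to a smoothed historical success rate and restarts failed attempts from scratch, so the expected time is the nominal run time divided by that rate. All names (`Resource`, `expected_time`, `pick_resource`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float        # work units per second
    successes: int = 0  # completed tasks in the recorded history
    failures: int = 0   # failed or abandoned tasks in the recorded history

    def reliability(self) -> float:
        # Laplace-smoothed success rate, so new resources are not rated 0 or 1.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def expected_time(task_size: float, resource: Resource) -> float:
    """Expected completion time if a failed attempt is restarted from scratch."""
    return (task_size / resource.speed) / resource.reliability()


def pick_resource(task_size: float, resources):
    """Choose the resource with the smallest expected completion time."""
    return min(resources, key=lambda r: expected_time(task_size, r))


pool = [
    Resource("fast-but-flaky", speed=10.0, successes=2, failures=8),
    Resource("slow-but-steady", speed=4.0, successes=18, failures=0),
]
chosen = pick_resource(100.0, pool)
```

    Under this model a slower resource can win when the faster one has a poor history, which is the kind of trade-off a reliability-aware scheduler is meant to capture.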

  18. Improving Fault Tolerance in Ad-Hoc Networks by Using Residue Number System

    Directory of Open Access Journals (Sweden)

    A. Barati

    2008-01-01

    Full Text Available In this study, we presented a method for distributed data storage using a residue number system for mobile systems and wireless networks based on the peer to peer paradigm. Generally, a redundant residue number system is capable of error detection and correction. In the proposed method, we made a new system by mixing the Redundant Residue Number System (RRNS), the Multi Level Residue Number System (ML RNS) and the Multiple Valued Logic Residue Number System (MVL RNS), which is well suited to parallel, carry free, high speed arithmetic and supports secure data communication. In addition, it has the ability of error detection and correction. In comparison to other number systems, it offers many improvements in data security, error detection and correction, and speed of storage and retrieval.
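
    The error detection and correction capability of a redundant residue number system can be shown with a small sketch. The moduli (3, 5, 7) with redundant moduli (11, 13) are chosen only for illustration and are not taken from the paper, and the multi-level and multiple-valued-logic parts of the proposed system are not modeled. A value in the legitimate range [0, 105) is stored as five residues. A single corrupted residue pushes the reconstructed value outside that range, and dropping the bad residue recovers the original value.

```python
from math import prod

INFO_MODULI = (3, 5, 7)        # legitimate range is [0, 3*5*7) = [0, 105)
REDUNDANT_MODULI = (11, 13)    # extra residues used only for checking
MODULI = INFO_MODULI + REDUNDANT_MODULI
LEGIT_RANGE = prod(INFO_MODULI)


def encode(x):
    """Represent x by its residues modulo every modulus."""
    if not 0 <= x < LEGIT_RANGE:
        raise ValueError("value outside the legitimate range")
    return [x % m for m in MODULI]


def crt(residues, moduli):
    """Chinese remainder reconstruction for pairwise coprime moduli."""
    total = prod(moduli)
    value = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        value += r * partial * pow(partial, -1, m)  # modular inverse (Python 3.8+)
    return value % total


def has_error(residues):
    """A reconstructed value outside the legitimate range signals an error."""
    return crt(residues, MODULI) >= LEGIT_RANGE


def correct_single_error(residues):
    """Drop one residue at a time; the drop that yields a legitimate value wins."""
    for skip in range(len(MODULI)):
        kept_r = [r for i, r in enumerate(residues) if i != skip]
        kept_m = [m for i, m in enumerate(MODULI) if i != skip]
        candidate = crt(kept_r, kept_m)
        if candidate < LEGIT_RANGE:
            return candidate
    return None


word = encode(42)
damaged = list(word)
damaged[1] = (damaged[1] + 1) % MODULI[1]   # corrupt the residue modulo 5
recovered = correct_single_error(damaged)
```

    Because each residue is computed and stored independently, the residues could be spread over different peers, and the loss or corruption of one of them would still leave the value recoverable under these assumptions.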

  19. Nonlinear, Adaptive and Fault-tolerant Control for Electro-hydraulic Servo Systems

    DEFF Research Database (Denmark)

    Choux, Martin

    2011-01-01

    Fluid power systems have been in use since 1795 with the first hydraulic press patented by Joseph Bramah and today form the basis of many industries. Electro hydraulic servo systems are fluid power systems controlled in closed-loop. They transform reference input signals into a set of movements in hydraulic actuators (cylinders or motors) by the means of hydraulic fluid under pressure. With the development of computing power and control techniques during the last few decades, they are used increasingl...

  20. On Fault Tolerance of Resources in Grid Environment

    OpenAIRE

    Minakshi Memoria, Mukesh Yadav

    2013-01-01

    Grid computing, most simply stated, is distributed computing taken to the next evolutionary level. The goal is to create the illusion of a simple yet large and powerful self managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources. However, in the grid computing environment there are certain aspects which reduce efficiency of the system, job scheduling of the resources and fault tolerance are the key aspect to improve the efficien...

  1. Visual Programming of Fault-Tolerant Distributed Applications

    OpenAIRE

    Muganga, B.; Pacull, F.; Mazouni, K. R.; Wolff, A.-D.

    1995-01-01

    The design of fault-tolerant distributed applications is a complex task. In addition to application functionalities, the programmer must consider issues related to both replication and distribution for every application component concerned with fault-tolerance. This paper describes an approach which combines two environments (Specs and Garf) so as to: (1) graphically design applications using high level Petri nets and (2) discharge the programmer of fault-tolerance issues.

  2. Fault tolerance in Hadoop MapReduce implementation

    OpenAIRE

    Cogorno, Matías; Rey, Javier; Nesmachnow, Sergio

    2013-01-01

    This document reports the advances on exploring and understanding the fault tolerance mechanisms in Hadoop MapReduce. A description of the current fault tolerance features existing in Hadoop is provided, along with a review of related works on the topic. Finally, the document describes some relevant proposals about fault tolerance worth considering to implement in Hadoop within the PERMARE project in order to provide support for pervasive computing environments.

  3. Cooperative Fault Tolerant Distributed Computing

    Energy Technology Data Exchange (ETDEWEB)

    Fagg, Graham E.

    2006-03-15

    HARNESS was proposed as a system that combined the best of emerging technologies found in current distributed computing research and commercial products into a very flexible, dynamically adaptable framework that could be used by applications to allow them to evolve and better handle their execution environment. The HARNESS system was designed using the considerable experience from previous projects such as PVM, MPI, IceT and CUMULVS. As such, the system was designed to avoid the common problems found with these current systems: it has no single point of failure and is able to survive machine, node and software failures. Additional features included improved inter-component connectivity, with full support for dynamic downloading of additional components at run-time, thus reducing the pressure on application developers to build in all the libraries they need in advance.

  4. Design and Verification of Fault-Tolerant Components

    DEFF Research Database (Denmark)

    Zhang, Miaomiao; Liu, Zhiming

    2009-01-01

    We present a systematic approach to design and verification of fault-tolerant components with real-time properties as found in embedded systems. A state machine model of the correct component is augmented with internal transitions that represent hypothesized faults. Also, constraints on the occurrence or timing of faults are included in this model. This model of a faulty component is then extended with fault detection and recovery mechanisms, again in the form of state machines. Desired properties of the component are model checked for each of the successive models. The models can be made relatively detailed such that they can serve directly as blueprints for engineering, and yet be amenable to exhaustive verification. The approach is illustrated with a design of a triple modular fault-tolerant system that is a real case we received from our collaborators in the aerospace field. We use UPPAAL to model and check this design. Model checking uses concrete parameters, so we extend the result with parametric analysis using abstractions of the automata in a rigorous verification.
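    As a rough illustration of the kind of component this abstract mentions, the following sketch shows a majority voter for three replicated outputs. It assumes that "triple modular fault-tolerant system" refers to conventional triple modular redundancy with majority voting; it is not the authors' UPPAAL model, and the function names tmr_vote and faulty_replicas are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by at least two of three replicas, or None.

    Assumption: replica outputs are hashable and compared for exact equality.
    """
    if len(outputs) != 3:
        raise ValueError("triple modular redundancy expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    # A single faulty replica is outvoted; three-way disagreement gives no majority.
    return value if count >= 2 else None


def faulty_replicas(outputs: Sequence[Hashable], voted: Hashable) -> List[int]:
    """Indices of replicas whose output disagrees with the voted value."""
    return [i for i, out in enumerate(outputs) if out != voted]
```

    A voter of this kind masks one faulty replica; detecting which replica disagreed (faulty_replicas) is what a recovery mechanism would act on.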

  5. On Fault Tolerance of Resources in Grid Environment

    Directory of Open Access Journals (Sweden)

    Minakshi Memoria, Mukesh Yadav

    2013-01-01

    Grid computing, most simply stated, is distributed computing taken to the next evolutionary level. The goal is to create the illusion of a simple yet large and powerful self managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources. However, in the grid computing environment certain aspects reduce the efficiency of the system; job scheduling of the resources and fault tolerance are the key aspects for improving efficiency and exploiting the capabilities of emergent computational systems. Because of the dynamic and distributed nature of the grid, the traditional methodologies of scheduling are inefficient for the effective utilization of the resources available. The fault tolerance strategy proposed will improve the performance of the overall computational grid environment. In this paper we propose efficient job scheduling, replication and checkpointing to improve the efficiency of the grid environment. The simulation results illustrate that the proposed strategy effectively schedules the grid jobs and reduces the execution time.

  6. System-Level Development of Fault-Tolerant Distributed Aero-Engine Control Architecture Project

    Data.gov (United States)

    National Aeronautics and Space Administration — NASA's vision for an "intelligent engine" will be realized with the development of a truly distributed control system and reliable smart transducer node components;...

  7. A Theory of Fault-Tolerant Quantum Computation

    CERN Document Server

    Gottesman, D

    1998-01-01

    In order to use quantum error-correcting codes to actually improve the performance of a quantum computer, it is necessary to be able to perform operations fault-tolerantly on encoded states. I present a general theory of fault-tolerant operations based on symmetries of the code stabilizer. This allows a straightforward determination of which operations can be performed fault-tolerantly on a given code. I present a number of examples and demonstrate that fault-tolerant universal computation is possible for a great number of different codes, including the five-qubit code.

  8. Design of fault-tolerant inductive position sensor

    International Nuclear Information System (INIS)

    It is desirable for the position sensors used in a magnetic bearing system to provide some degree of fault-tolerance, as the rotor position is necessary for the feedback control to overcome the open-loop instability. In this paper, we propose an inductive position sensor that can cope with a partial fault in the sensor. The sensor has multiple poles which can be combined to sense the in-plane motion of the rotor. When a high-frequency voltage signal drives each pole of the sensor, the resulting current in the sensor coil contains information regarding the rotor position. The signal processing circuit of the sensor extracts this position information. In this paper, we used the magnetic circuit model of the sensor that shows the analytical relationship between the sensor output and the rotor motion. The multi-polar structure of the sensor makes it possible to introduce redundancy which can be exploited for fault-tolerant operation. The proposed sensor is applied to a magnetically levitated turbo-molecular vacuum pump. Experimental results validate the fault-tolerance algorithm.

  9. Byzantine Fault Tolerance for Nondeterministic Applications

    CERN Document Server

    Zhao, W

    2007-01-01

    All practical applications contain some degree of nondeterminism. When such applications are replicated to achieve Byzantine fault tolerance (BFT), their nondeterministic operations must be sanitized to ensure replica consistency. To the best of our knowledge, only two types of replica nondeterminism have been studied under the Byzantine fault model, which we refer to as wrappable nondeterminism and verifiable pre-determinable nondeterminism. The wrappable nondeterminism is a type of nondeterminism that can be controlled using an infrastructure-provided or application-provided wrapper function, without explicit inter-replica coordination. For example, information such as hostnames, process ids, file descriptors, etc. can be determined group-wise. The verifiable pre-determinable nondeterminism is a type of nondeterminism whose values can be independently chosen by the primary replica and verified by other replicas prior to the execution of a client's request, such as the operation to retrieve the local clock v...

  10. A novel adaptive switching function on fault tolerable sliding mode control for uncertain stochastic systems.

    Science.gov (United States)

    Zahiripour, Seyed Ali; Jalali, Ali Akbar

    2014-09-01

    A novel switching function based on an optimization strategy for the sliding mode control (SMC) method has been provided for uncertain stochastic systems subject to actuator degradation such that the closed-loop system is globally asymptotically stable with probability one. Previous research on sliding surfaces has focused on proportional or proportional-integral functions of the states. In this research, a degree of freedom that depends on the designer's choice is used to meet certain objectives. In the design of the switching function, there is a parameter which the designer can regulate for specified objectives. A sliding-mode controller is synthesized to ensure the reachability of the specified switching surface, despite actuator degradation and uncertainties. Finally, the simulation results demonstrate the effectiveness of the proposed method. PMID:24954808

  11. Trail Systems as fault tolerant wires and their use in bio-processors

    OpenAIRE

    Glade, Nicolas; Ben-Amor, Hedi; Bastien, Olivier

    2009-01-01

    Motivated by the idea that one day, probably far in the future, the computers and robots will be architectureless, made of collections of numerous 'intelligent' subsystems or nanomachines able to self-organize each other into computational morphologies with perhaps more computational power than classical electronic-based computers, many studies are burgeoning in different fields (chemistry, biology, condensed matter, quantum physics, ...). Several systems inspired from Nature have indeed been...

  12. Communication and Agreement Abstractions for Fault-Tolerant Asynchronous Distributed Systems

    CERN Document Server

    Raynal, Michel

    2010-01-01

    Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are communication abstractions that allow the processes to communicate consistently (namely the register abstraction

  13. Constrained Fault-Tolerant Resource Allocation

    OpenAIRE

    Liao, Kewen; Shen, Hong; Guo, Longkun

    2012-01-01

    In the Constrained Fault-Tolerant Resource Allocation (FTRA) problem, we are given a set of sites containing facilities as resources, and a set of clients accessing these resources. Specifically, each site i is allowed to open at most R_i facilities with cost f_i for each opened facility. Each client j requires an allocation of r_j open facilities and connecting j to any facility at site i incurs a connection cost c_ij. The goal is to minimize the total cost of this resource...
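    The problem statement above can be made concrete with a small feasibility-and-cost check. This is only a sketch: since the abstract is truncated, it assumes the total cost is the sum of facility opening costs and connection costs, and that a client may be served by several facilities at the same site. The function name ftra_cost and the dictionary layout are hypothetical; the symbols R, f, c and r follow the abstract.

```python
from typing import Dict, Hashable, Tuple

Site = Hashable
Client = Hashable


def ftra_cost(
    opened: Dict[Site, int],                    # facilities opened at each site
    connections: Dict[Tuple[Site, Client], int],  # facilities at site i serving client j
    R: Dict[Site, int],                         # R_i: maximum facilities per site
    f: Dict[Site, int],                         # f_i: cost per opened facility
    c: Dict[Tuple[Site, Client], int],          # c_ij: connection cost
    r: Dict[Client, int],                       # r_j: facilities required by client j
) -> int:
    """Check the stated constraints and return the (assumed) total cost."""
    for site, count in opened.items():
        if not 0 <= count <= R[site]:
            raise ValueError(f"site {site!r} opens {count} facilities, limit is {R[site]}")
    served = {client: 0 for client in r}
    for (site, client), count in connections.items():
        # A client can only use facilities that are actually open at the site.
        if count > opened.get(site, 0):
            raise ValueError(f"client {client!r} uses unopened facilities at {site!r}")
        served[client] += count
    for client, need in r.items():
        if served[client] < need:
            raise ValueError(f"client {client!r} has {served[client]} of {need} facilities")
    opening = sum(f[site] * count for site, count in opened.items())
    connecting = sum(c[key] * count for key, count in connections.items())
    return opening + connecting
```

    With two sites and one client that needs two facilities, the checker lets two candidate allocations be compared directly; it does not search for the minimum itself.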

  14. PAD: A fault tolerance specific circuit

    Science.gov (United States)

    Chavade, J.; Crouzet, Y.

    1982-09-01

    PAD, a fault tolerance specific circuit designed to reduce the number of units needed in a safe operating structure, is based on a general architecture with duplication of central units and a duplication or a coding of memory blocks and of inputs/outputs. The operations performed by PAD are described and presented in the form of a programmable peripheral circuit of the 6800 microprocessor family. Circuit diagrams for PAD, the programming of the circuit and of the mode for executing instructions, and self-checking at the level of PAD are considered.

  15. A lightweight fault-tolerant middleware for a Subaru Telescope second generation observation control system

    Science.gov (United States)

    Jeschke, Eric; Bon, Bruce; Inagaki, Takeshi; Streeper, Sam

    2008-08-01

    Subaru Telescope is developing a second-generation Observation Control System that specifically addresses some of the deficiencies of the current Subaru OCS. Two areas of concern are complexity and failure handling. The current system has over 1000 dedicated OCS processes spread across a dozen hosts and provides nothing in the way of automated failover. Furthermore, manual failover is so fraught with difficulty that it is rarely attempted. Our Generation 2 OCS is written almost entirely in Python and builds upon a Subaru-developed middleware based on the XML-RPC protocol. This framework offers the following benefits: it has very few dependencies outside of standard Python; provides a nearly seamless remote proxy object-oriented interface; provides optional user/password authentication and/or SSL encryption; is extremely simple to use from client applications; is connectionless, and assists transparent failover of communications and services on a cluster of hosts; has reasonable performance for a wide range of needs; allows multiple language bindings; and, for dynamic languages, requires no interface stub files. The "back end" (service side) of the OCS is nearing completion, and has already been used successfully during two separate OCS engineering runs. It comprises only a couple dozen processes, and provides automated failover capabilities on a rack of commodity x86 Linux servers. We provide an overview of the middleware design and its failover capabilities. Some data on the performance of communications using the middleware protocol are included.
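    The idea of transparent failover over a connectionless XML-RPC transport can be sketched with the Python standard library alone. This is not the Subaru middleware, whose API is not described in the abstract; call_with_failover and its connect parameter are hypothetical names, and the sketch assumes every host in the list offers the same service.

```python
import xmlrpc.client
from typing import Any, Callable, Sequence


def call_with_failover(
    urls: Sequence[str],
    method: str,
    *args: Any,
    connect: Callable[[str], Any] = xmlrpc.client.ServerProxy,
) -> Any:
    """Call `method(*args)` on the first host in `urls` that responds.

    Transport-level failures (refused connection, unreachable host, HTTP error)
    move on to the next host. Application-level xmlrpc.client.Fault errors are
    not caught, because another host would likely raise the same fault.
    """
    last_error = None
    for url in urls:
        try:
            proxy = connect(url)  # ServerProxy does not open a connection here
            return getattr(proxy, method)(*args)  # the request is sent on call
        except (OSError, xmlrpc.client.ProtocolError) as exc:
            last_error = exc
    raise ConnectionError(f"all {len(urls)} hosts failed") from last_error
```

    Because each XML-RPC call is an independent HTTP request, a client of this shape needs no session state to switch hosts, which is presumably what makes a connectionless protocol convenient for failover.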

  16. Fault-tolerant distributed mass storage for LHC computing

    CERN Document Server

    Wiebalck, A; Lindenstruth, V; Stinbeck, T M

    2003-01-01

    In this paper we present the concept and first prototyping results of a modular fault-tolerant distributed mass storage architecture for large Linux PC clusters as they are deployed by the upcoming particle physics experiments. The device masquerading technique using an Enhanced Network Block Device (ENBD) enables local RAID over remote disks as the key concept of the ClusterRAID system. The block level interface to remote files, partitions or disks provided by the ENBD makes it possible to use the standard Linux software RAID to add fault-tolerance to the system. Preliminary performance measurements indicate that the latency is comparable to a local hard drive. With four disks, throughput rates of up to 55 MB/s were achieved with first prototypes for a RAID0 setup, and about 40 MB/s for a RAID5 setup. (29 refs).

  17. Checkpoint-based Intelligent Fault tolerance For Cloud Service Providers

    Directory of Open Access Journals (Sweden)

    Rejin Paul

    2012-12-01

    With the increasing demand for and benefits of cloud computing infrastructure, real time computing can be performed on cloud infrastructure. A real time system can take advantage of the intensive computing capabilities and scalable virtualized environment of cloud computing to execute real time tasks. In most real time cloud applications, processing is done on remote cloud computing nodes, so there are more chances of errors due to the undetermined latency and loose control over the computing node. On the other hand, most real time systems are also safety critical and should be highly reliable. There is therefore an increased requirement for fault tolerance to achieve reliability for real time computing on cloud infrastructure. This paper proposes a smart checkpoint infrastructure for virtualized service providers and a fault tolerance model for real time cloud computing. The checkpoints are stored in a Hadoop Distributed File System. This allows resuming a task execution faster after a node crash and increases the fault tolerance of the system, since checkpoints are distributed and replicated in all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure. One advantage of cloud computing is the dynamicity of resource provisioning. Our architecture makes use of this advantage by enabling dynamic run-time modifications of replication groups.
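    The checkpoint-and-resume cycle described above can be sketched in a few lines. This is a simplified illustration, not the paper's infrastructure: a local JSON file stands in for the Hadoop Distributed File System, the task is a toy summation, and the names save_checkpoint, load_checkpoint and run_task are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash never leaves a partial checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic rename over the previous checkpoint


def load_checkpoint(path, default):
    """Return the last saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return default


def run_task(path, n_steps, every=10, crash_at=None):
    """Sum 0..n_steps-1, checkpointing every `every` steps and resuming if possible."""
    state = load_checkpoint(path, {"next": 0, "total": 0})
    for i in range(state["next"], n_steps):
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated node crash")
        state["total"] += i
        state["next"] = i + 1
        if state["next"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

    After a simulated crash, calling run_task again with the same path redoes only the steps since the last checkpoint. The checkpoint interval (every) trades interference with task execution against the amount of work repeated on recovery.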

  18. Superior model for fault tolerance computation in designing nano-sized circuit systems

    Science.gov (United States)

    Singh, N. S. S.; Asirvadam, V. S.; Muthuvalu, M. S.

    2014-10-01

    As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity builds up with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed up the task of reliability analysis for a very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores a comparative study involving reliability computation and evaluation by PGM and BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, the reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lower reliability measure by BDEC is well explained in this paper using the distribution of different signal input patterns over time for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero and the probability of error on signal lines.

  19. Superior model for fault tolerance computation in designing nano-sized circuit systems

    International Nuclear Information System (INIS)

    As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity builds up with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed up the task of reliability analysis for a very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores a comparative study involving reliability computation and evaluation by PGM and BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, the reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lower reliability measure by BDEC is well explained in this paper using the distribution of different signal input patterns over time for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero and the probability of error on signal lines.

  20. Superior model for fault tolerance computation in designing nano-sized circuit systems

    Energy Technology Data Exchange (ETDEWEB)

    Singh, N. S. S., E-mail: narinderjit@petronas.com.my; Muthuvalu, M. S., E-mail: msmuthuvalu@gmail.com [Fundamental and Applied Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak (Malaysia); Asirvadam, V. S., E-mail: vijanth-sagayan@petronas.com.my [Electrical and Electronics Engineering Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak (Malaysia)

    2014-10-24

    As CMOS technology scales nano-metrically, reliability turns out to be a decisive subject in the design methodology of nano-sized circuit systems. As a result, several computational approaches have been developed to compute and evaluate reliability of desired nano-electronic circuits. The process of computing reliability becomes very troublesome and time consuming as the computational complexity builds up with the desired circuit size. Therefore, being able to measure reliability instantly and superiorly is fast becoming necessary in designing modern logic integrated circuits. For this purpose, the paper firstly looks into the development of an automated reliability evaluation tool based on the generalization of Probabilistic Gate Model (PGM) and Boolean Difference-based Error Calculator (BDEC) models. The Matlab-based tool allows users to significantly speed up the task of reliability analysis for a very large number of nano-electronic circuits. Secondly, by using the developed automated tool, the paper explores a comparative study involving reliability computation and evaluation by PGM and BDEC models for different implementations of same functionality circuits. Based on the reliability analysis, BDEC gives exact and transparent reliability measures, but as the complexity of the same functionality circuits with respect to gate error increases, the reliability measure by BDEC tends to be lower than the reliability measure by PGM. The lower reliability measure by BDEC is well explained in this paper using the distribution of different signal input patterns over time for same functionality circuits. Simulation results conclude that the reliability measure by BDEC depends not only on faulty gates but also on circuit topology, the probability of input signals being one or zero and the probability of error on signal lines.

  1. Fault Detection and Isolation and Fault Tolerant Control of Wind Turbines Using Set-Valued Observers

    DEFF Research Database (Denmark)

    Casau, Pedro; Rosa, Paulo Andre Nobre

    2012-01-01

    Research on wind turbine Operations & Maintenance (O&M) procedures is critical to the expansion of Wind Energy Conversion systems (WEC). In order to reduce O&M costs and increase the lifespan of the turbine, we study the application of Set-Valued Observers (SVO) to the problem of Fault Detection and Isolation (FDI) and Fault Tolerant Control (FTC) of wind turbines, by taking advantage of the recent advances in SVO theory for model invalidation. A simple wind turbine model is presented along with possible faulty scenarios. The FDI algorithm is built on top of the described model, taking into account process disturbances, uncertainty and sensor noise. The FTC strategy takes advantage of the proposed FDI algorithm, enabling the controller reconfiguration shortly after fault events. Additionally, a robust controller is designed so as to increase the wind turbine's performance during low severity faults. Finally, the FDI algorithm is assessed within a publicly available benchmark model, using Monte-Carlo simulation runs.

  2. Fault tolerant controller for a class of additive faults: a quasi-continuous high-order sliding mode approach

    Science.gov (United States)

    Dávila, J.; Cieslak, J.; Henry, D.; Zolghadri, A.

    2015-11-01

    In this paper a fault tolerant control strategy that combines the backstepping procedure and the quasi-continuous high-order sliding mode controller is proposed. The fault tolerance principle is based on a hierarchical application of the backstepping methodology ensuring the finite time convergence of the desired system states, in spite of the considered fault situations. The additive effect of the faults and disturbances is canceled out by the hierarchical application of the quasi-continuous controller ensuring fault-tolerance. The effect of Lebesgue measurable noise over the precision of the proposed controller is studied. Simulation results based on a nonlinear model of the F16 jet fighter show the efficiency of the proposed techniques.

  3. A Framework-Based Approach for Fault-Tolerant Service Robots

    Directory of Open Access Journals (Sweden)

    Heejune Ahn

    2012-11-01

    Recently the component-based approach has become a major trend in intelligent service robot development due to its reusability and productivity. The framework in a component-based system should provide essential services for application components. However, to our knowledge the existing robot frameworks do not yet support a fault tolerance service. Moreover, it is often believed that faults can be handled only at the application level. In this paper, by extending the robot framework with the fault tolerance function, we argue that the framework-based fault tolerance approach is feasible and even has many benefits, including that: 1) the system integrators can build fault tolerance applications from non-fault-aware components; 2) the constraints of the components and the operating environment, which cannot be anticipated easily at the time of component development, can be considered at the time of integration; 3) consistency in system reliability can be obtained even in spite of diverse application component sources. In the proposed construction, we build XML rule files defining the rules for probing and determining the fault conditions of each component, contamination cases from a faulty component, and the possible recovery and safety methods. The rule files are established by a system integrator and the fault manager in the framework controls the fault tolerance process according to the rules. We demonstrate that the fault-tolerant framework can incorporate widely accepted fault tolerance techniques. The effectiveness and real-time performance of the framework-based approach and its techniques are examined by testing an autonomous mobile robot in typical fault scenarios.

  4. Analysis of a cascaded multilevel inverter with fault-tolerant control

    Directory of Open Access Journals (Sweden)

    Jesús Aguayo Alquicira

    2011-08-01

    Cascaded multilevel inverters are widely used in industry for speed control of induction motors and, even when the converters' operation is highly reliable, several faults can occur, leading to poor engine performance or even causing the whole system to stop. It is desirable to keep the system operational when a failure occurs, even when degraded, and implementing fault-tolerant systems is thus a good choice. This paper presents a general strategy for fault-tolerant control in a 7-level cascaded multilevel inverter (the faults are in semiconductor devices); the paper includes simulation and experimental results to validate the method.

  5. Fault Tolerance in ZigBee Wireless Sensor Networks

    Science.gov (United States)

    Alena, Richard; Gilstrap, Ray; Baldwin, Jarren; Stone, Thom; Wilson, Pete

    2011-01-01

    Wireless sensor networks (WSN) based on the IEEE 802.15.4 Personal Area Network standard are finding increasing use in the home automation and emerging smart energy markets. The network and application layers, based on the ZigBee 2007 PRO Standard, provide a convenient framework for component-based software that supports customer solutions from multiple vendors. This technology is supported by System-on-a-Chip solutions, resulting in extremely small and low-power nodes. The Wireless Connections in Space Project addresses the aerospace flight domain for both flight-critical and non-critical avionics. WSNs provide the inherent fault tolerance required for aerospace applications utilizing such technology. The team from Ames Research Center has developed techniques for assessing the fault tolerance of ZigBee WSNs challenged by radio frequency (RF) interference or WSN node failure.

  6. Beam dynamics calculations for fault-tolerance

    International Nuclear Information System (INIS)

    The European Transmutation Demonstration requires a high-power proton accelerator operating in CW mode. This accelerator is also expected to have a very limited number of unexpected beam interruptions per year. To reach such an ambitious goal, it is clear that reliability-oriented design practices need to be followed from the early stage of component design and fault-tolerance capabilities have to be introduced to the maximum extent. The goal of this document is precisely to investigate in more detail the fault-tolerance capability of the XT-ADS linac. From previous analysis, it appears that if nothing is done, a cavity's failure leads in nearly all cases to a complete beam loss, due to the non-relativistic varying velocity of the particles. To avoid such a total beam loss, some kind of retuning has to be performed to compensate for the lack of acceleration due to the faulty cavity. We have to identify and develop fast failure recovery scenarios to ensure that such retuning can be performed in less than 1 second. Two ways are investigated. The first way is to stop the beam to achieve the retuning (Scenario 1). The other way is to try to perform the retuning without stopping the beam (Scenario 2). The present analysis demonstrates from the beam dynamics point of view that a fast retuning procedure can be envisaged without stopping the beam (Scenario 2). Nevertheless, this Scenario 2 implies stringent specifications, especially on: - the fault detection time, which has to be extremely short (order of magnitude: 100 µs) and - the margins required from the accelerating field and RF power point of view, which are higher than in Scenario 1.

  7. Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement

    OpenAIRE

    Poonam Saini; Awadhesh K. Singh

    2011-01-01

    Problem statement: Dealing with arbitrary failures effectively, while reaching agreement, remains a major operational challenge in distributed transactions. In the contemporary literature, standard protocols such as Byzantine Fault Tolerant Distributed Commit and Practical Byzantine Fault Tolerance handle the problem to a great extent. However, the limitation of these protocols is that they incur increased message overhead as well as large latency. Approach: To improv...

  8. FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

    Science.gov (United States)

    Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

    1978-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.

  9. Hypothetical Scenario Generator for Fault-Tolerant Diagnosis

    Science.gov (United States)

    James, Mark

    2007-01-01

    The Hypothetical Scenario Generator for Fault-tolerant Diagnostics (HSG) is an algorithm being developed in conjunction with other components of artificial-intelligence systems for automated diagnosis and prognosis of faults in spacecraft, aircraft, and other complex engineering systems. By incorporating prognostic capabilities along with advanced diagnostic capabilities, these developments hold promise to increase the safety and affordability of the affected engineering systems by making it possible to obtain timely and accurate information on the statuses of the systems and predicting impending failures well in advance. The HSG is a specific instance of a hypothetical-scenario generator that implements an innovative approach for performing diagnostic reasoning when data are missing. The special purpose served by the HSG is to (1) look for all possible ways in which the present state of the engineering system can be mapped with respect to a given model and (2) generate a prioritized set of future possible states and the scenarios of which they are parts.

  10. Fault-tolerant quantum computation and communication on a distributed 2D array of small local systems

    Energy Technology Data Exchange (ETDEWEB)

    Fujii, K.; Yamamoto, T.; Imoto, N. [Graduate School of Engineering Science, Osaka University, Toyonaka, Osaka 560-8531 (Japan); Koashi, M. [Photon Science Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8656 (Japan)

    2014-12-04

    We propose a scheme for distributed quantum computation with small local systems connected via noisy quantum channels. We show that the proposed scheme tolerates errors with probabilities ~30% and ~0.1% in quantum channels and local operations, respectively, both of which are improved substantially compared to the previous works.

  11. Fault-tolerant quantum computation and communication on a distributed 2D array of small local systems

    International Nuclear Information System (INIS)

    We propose a scheme for distributed quantum computation with small local systems connected via noisy quantum channels. We show that the proposed scheme tolerates errors with probabilities ~30% and ~0.1% in quantum channels and local operations, respectively, both of which are improved substantially compared to the previous works.

  12. A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration]

    Science.gov (United States)

    Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.

    1978-01-01

    A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.

  13. Fault diagnosis and fault-tolerant control and guidance for aerospace vehicles from theory to application

    CERN Document Server

    Zolghadri, Ali; Cieslak, Jerome; Efimov, Denis; Goupil, Philippe

    2014-01-01

    Fault Diagnosis and Fault-Tolerant Control and Guidance for Aerospace demonstrates the attractive potential of recent developments in control for resolving such issues as improved flight performance, self-protection and extended life of structures. Importantly, the text deals with a number of practically significant considerations: tuning, complexity of design, real-time capability, evaluation of worst-case performance, robustness in harsh environments, and extensibility when development or adaptation is required. Coverage of such issues helps to draw the advanced concepts arising from academic research back towards the technological concerns of industry. Initial coverage of basic definitions and ideas and a literature review gives way to a treatment of important electrical flight control system failures: the oscillatory failure case, runaway, and jamming. Advanced fault detection and diagnosis for linear and nonlinear systems are described. Lastly, recovery strategies appropriate to remaining actuator/sensor/c...

  14. Adaptive Fault Tolerant Execution of Multi-Robot Missions using Behavior Trees

    OpenAIRE

    Colledanchise, Michele; Marzinotto, Alejandro; Dimarogonas, Dimos V.; Ögren, Petter

    2015-01-01

    Multi-robot teams offer possibilities of improved performance and fault tolerance, compared to single robot solutions. In this paper, we show how to realize those possibilities when starting from a single robot system controlled by a Behavior Tree (BT). By extending the single robot BT to a multi-robot BT, we are able to combine the fault tolerant properties of the BT, in terms of built-in fallbacks, with the fault tolerance inherent in multi-robot approaches, in terms of a ...

  15. Optimized Nanometric Fault Tolerant Reversible BCD Adder

    Directory of Open Access Journals (Sweden)

    Majid Haghparast

    2012-01-01

    In this study, a novel nanometric fault tolerant quantum and reversible binary coded decimal (BCD) adder is proposed. Reversible logic has attracted growing attention in optical information processing, quantum computing, nanotechnology and low power design. A BCD adder is a combinational circuit that can be used for the addition of two numbers in BCD arithmetic. The proposed reversible BCD adder also has the parity-preserving property. It is reported to be better than all existing counterparts. The proposed circuit is optimized and is compared with the existing circuits in terms of number of constant inputs, number of garbage outputs, quantum cost and hardware complexity. All of these parameters are improved dramatically. It is to be noted that all the circuits have nanometric scales.
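
    As a point of reference for what a BCD adder computes, the following minimal Python sketch performs conventional (irreversible) BCD addition with the usual add-6 correction. It is not the reversible, parity-preserving circuit proposed in the paper; the function names `bcd_add_digit` and `bcd_add` are hypothetical and chosen only for illustration.

```python
def bcd_add_digit(a, b, carry_in=0):
    """Add two BCD digits (0-9) plus a carry, as one stage of a BCD adder."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits and a 0/1 carry")
    total = a + b + carry_in      # plain binary sum, range 0..19
    if total > 9:                 # sum is not a valid BCD digit
        total += 6                # add 0110 to skip the six unused 4-bit codes
        return total & 0xF, 1     # low nibble is the digit, carry out is 1
    return total, 0


def bcd_add(x_digits, y_digits):
    """Add two equal-length digit lists, most significant digit first."""
    if len(x_digits) != len(y_digits):
        raise ValueError("operands must have the same number of digits")
    carry = 0
    result = []
    # Process from the least significant digit, rippling the decimal carry.
    for a, b in zip(reversed(x_digits), reversed(y_digits)):
        digit, carry = bcd_add_digit(a, b, carry)
        result.append(digit)
    if carry:
        result.append(1)
    return list(reversed(result))
```

    For example, `bcd_add([4, 7], [3, 8])` adds 47 and 38 digit by digit; the low digits sum to 15, which the correction step turns into digit 5 with a carry into the next stage.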

  16. A Reflective Object-Oriented Architecture for Developing Fault-Tolerant Software

    Scientific Electronic Library Online (English)

    Luiz E., Buzato; Cecília M. F., Rubira; Maria Lúcia B., Lisboa.

    1997-11-01

    This paper proposes a reflective object-oriented architecture for developing fault-tolerant software. Reflective object-oriented programming promotes a modular structuring of systems by means of a new dimension of modularization, namely the separation between base-level objects and meta-level objects. This [...] property allows the creation of metaobjects responsible for managing tasks of application objects located at the base level. In the context of this work, computational reflection is applied to implement various strategies of fault tolerance at the meta-level in a transparent manner for the application programmer, that is, without interfering with the original structure of application objects that require fault tolerance facilities. The use of the proposed architecture has the following advantages: (i) separation of concerns, that is, separating the concerns related to the application domain from those related to the implementation of fault-tolerant mechanisms; (ii) it promotes code reuse of fault-tolerance mechanisms; (iii) it allows application programmers to use the most adequate fault-tolerance strategy for their implementation; and (iv) it provides a design that is more adaptable, flexible and easier to extend than traditional designs for developing fault-tolerant software. Our reflective architecture is composed of three levels, and is based on the abstraction of object groups.

  17. Guaranteed Cost Fault-tolerant Control of Networked Control Systems with Short Output Delay and Short Control Delay Based on State Observer

    Directory of Open Access Journals (Sweden)

    Xiaomao Huang

    2013-04-01

    Supposing that the sensor and controller nodes were time-driven and the actuator node was event-driven, the problem of integrity against sensor failures for networked control systems with short output delay and short control delay was discussed based on an observer. The state observer of the system was designed according to the time-delay compensation strategy. Then, considering possible sensor failures, an augmented mathematical model for the observer-based networked control systems was developed. In terms of the given quadratic performance index function, the integrity condition of the system was given, and the designs for the guaranteed cost fault-tolerant controller and observer were presented by using the cooperative design approach of the controller and observer and the approach of bilinear matrix inequalities. Finally, a numerical simulation example demonstrated that the conclusions are feasible and effective. The proposed control method meets the requirements of industrial networked control systems.

  18. Hybrid fault tolerance techniques to detect transient faults in embedded processors

    CERN Document Server

    Azambuja, José Rodrigo; Becker, Jürgen

    2014-01-01

    This book describes fault tolerance techniques based on software and hardware to create hybrid techniques. They are able to reduce overall performance degradation and increase error detection when associated with applications implemented in embedded processors. Coverage begins with an extensive discussion of the current state-of-the-art in fault tolerance techniques. The authors then discuss the best trade-off between software-based and hardware-based techniques and introduce novel hybrid techniques. Proposed techniques increase existing fault detection rates up to 100%, while maintaining low performance overheads in area and application execution time. • Discusses the effects of radiation on modern integrated circuits; • Provides a comprehensive overview of state-of-the-art fault tolerance techniques based on software, hardware, and hybrid techniques; • Introduces novel hybrid fault tolerance techniques for reconfigurable FPGAs and ASICs; • Performs fault injection campaigns by simulation, bitstream ...

  19. An approach to the verification of a fault-tolerant, computer-based reactor safety system: A case study using automated reasoning: Volume 1: Interim report

    International Nuclear Information System (INIS)

    The purpose of this project is to explore the feasibility of automating the verification process for computer systems. The intent is to demonstrate that both the software and hardware that comprise the system meet specified availability and reliability criteria, that is, total design analysis. The approach to automation is based upon the use of Automated Reasoning Software developed at Argonne National Laboratory. This approach is herein referred to as formal analysis and is based on previous work on the formal verification of digital hardware designs. Formal analysis represents a rigorous evaluation which is appropriate for system acceptance in critical applications, such as a Reactor Safety System (RSS). This report describes a formal analysis technique in the context of a case study, that is, demonstrates the feasibility of applying formal analysis via application. The case study described is based on the Reactor Safety System (RSS) for the Experimental Breeder Reactor-II (EBR-II). This is a system where high reliability and availability are tantamount to safety. The conceptual design for this case study incorporates a Fault-Tolerant Processor (FTP) for the computer environment. An FTP is a computer which has the ability to produce correct results even in the presence of any single fault. This technology was selected as it provides a computer-based equivalent to the traditional analog based RSSs. This provides a more conservative design constraint than that imposed by the IEEE Standard, Criteria For Protection Systems For Nuclear Power Generating Stations (ANSI N42.7-1972)

  20. A fault-tolerant attitude control system for a satellite based on fuzzy global sliding mode control algorithm

    Science.gov (United States)

    Liang, Jinjin; Dong, Chaoyang; Wang, Qing

    2008-10-01

    An effective approach for fault diagnosis of aeroengines based on the integration of wavelet analysis and neural networks is presented. The wavelet transform can accurately localize the characteristics of a signal in the time-frequency domain. In view of the relationship between the wavelet transform and exponent theory, the whole and local exponents obtained from wavelet transform coefficients are presented as features for extracting fault signals, which are input into a radial basis function network for fault pattern recognition. The fault diagnosis model of the aeroengine is established, and the improved Levenberg-Marquardt training algorithm is used to carry out network structure and parameter identification. By choosing enough samples to train the fault diagnosis network and feeding the information representing the faults into the neural network, the fault pattern can be determined. The robustness of the wavelet neural network for fault diagnosis is discussed. The practical fault diagnosis for aeroengine vibration proves to be accurate and comprehensive.

  1. Software fault-tolerant distributed applications in LiPS

    OpenAIRE

    Setz, Thomas

    1997-01-01

    This paper illustrates how software fault-tolerant distributed applications are implemented within LIPS version 2.4, a system for distributed computing using idle cycles in networks of workstations. The LIPS system [SR92, SR93, STea94, Set95, SF96, ST96, SL97, Set97] employs the tuple space programming paradigm, as originally used in the LINDA programming language. Applications implemented using this paradigm easily adapt to changes in availability as they occur in workstation networks. In LIPS, ap...
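
    To make the tuple space paradigm mentioned above concrete, here is a small in-process Python sketch of a Linda-style tuple space. It is an illustration only and does not reproduce the LIPS interfaces; the class name `TupleSpace`, the method names `out`, `rd` and `in_`, and the use of `None` as a wildcard are assumptions made for this example.

```python
import threading


class TupleSpace:
    """Minimal Linda-style tuple space (illustrative, single process)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *fields):
        """Deposit a tuple and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(fields))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern acts as a wildcard for that field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def _wait_for(self, pattern, remove, timeout):
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout=timeout
            )
            if not found:
                return None  # timed out without a matching tuple
            tup = self._find(pattern)
            if remove:
                self._tuples.remove(tup)
            return tup

    def rd(self, *pattern, timeout=None):
        """Read a matching tuple without removing it (blocks until found)."""
        return self._wait_for(pattern, remove=False, timeout=timeout)

    def in_(self, *pattern, timeout=None):
        """Withdraw a matching tuple (blocks until found)."""
        return self._wait_for(pattern, remove=True, timeout=timeout)
```

    Because workers only communicate through the shared space, a worker that disappears simply stops withdrawing task tuples, and any other worker can pick up the remaining ones. This decoupling is one reason the paradigm suits workstation networks with changing availability.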

  2. A Study on the Noise Threshold of Fault-tolerant Quantum Error Correction

    CERN Document Server

    Cheng, Y C

    2004-01-01

    Quantum circuits implementing fault-tolerant quantum error correction (QEC) for the three-qubit bit-flip code and the five-qubit code are studied. To describe the effect of noise, we apply a model based on a generalized effective Hamiltonian where the system-environment interactions are taken into account by including stochastic fluctuating terms in the system Hamiltonian. This noise model enables us to investigate the effect of noise in quantum circuits under realistic device conditions and avoid strong assumptions such as maximal parallelism and weak storage errors. Noise thresholds of the QEC codes are calculated. In addition, the effects of imprecision in projective measurements, collective bath, fault-tolerant repetition protocols, and level of parallelism in circuit constructions on the threshold values are also studied with emphasis on determining the optimal design for the fault-tolerant QEC circuit. These results provide insights into the fault-tolerant QEC process as well as useful information for desig...

  3. A New Fault-tolerant Switched Reluctance Motor with reliable fault detection capability

    DEFF Research Database (Denmark)

    Lu, Kaiyuan

    2014-01-01

    Search coils are often used in fault-tolerant drives for reliable fault detection. The search coils occupy extra slot space. They are normally open-circuited and are not used for torque production. This degrades the motor performance and increases the cost and manufacturing complexity. A new Fault-Tolerant Switched Reluctance (FTSR) motor is proposed in this paper. A unique feature of this special design is that it allows use of the unexcited phase coils as search coils for fault detection. Therefore, this new motor has all the advantages of using search coils for reliable fault detection while no extra search coil is actually needed. The motor itself is able to continue to work under any faulted conditions, providing fault-tolerant features. The working principle and performance evaluation of this motor are demonstrated in this paper, and Finite Element Analysis results are provided.

  4. Formal validation of fault-tolerance mechanisms inside GUARDS

    International Nuclear Information System (INIS)

    In this paper we report the experiments carried out during the specification and validation of the fault-tolerance mechanisms developed in the European project Generic Upgradable Architecture for Real-time Dependable Systems (GUARDS). These mechanisms are the components of an architecture developed for embedded safety-critical systems. The validation approach is based on model-checking techniques and exploits the verification methodology supported by the Just Another Concurrency Kit (JACK) environment. The properties that guarantee the desired behaviour of the mechanisms are specified as temporal logic formulae; the JACK model-checker is then used to verify that the behaviour of the mechanisms satisfies such properties also in the presence of faults.

  5. Design of Passive Fault–Tolerant Controllers of a Quadrotor Based on Sliding Mode Theory

    Directory of Open Access Journals (Sweden)

    Merheb Abdel-Razzak

    2015-08-01

    In this paper, sliding mode control is used to develop two passive fault tolerant controllers for an AscTec Pelican UAV quadrotor. In the first approach, a regular sliding mode controller (SMC) augmented with an integrator uses the robustness property of variable structure control to tolerate partial actuator faults. The second approach is a cascaded sliding mode controller with inner and outer SMC loops. In this configuration, faults are tolerated in the fast inner loop controlling the velocity system. The controllers are tuned to find the optimal values of the sliding mode controller gains using the ecological systems algorithm (ESA), a biologically inspired stochastic search algorithm based on the natural equilibrium of animal species. The controllers are tested using SIMULINK in the presence of two different types of actuator faults: partial loss of motor power affecting all the motors at once, and partial loss of motor speed. Results of the quadrotor following a continuous path demonstrated the effectiveness of the controllers, which are able to tolerate a significant number of actuator faults despite the lack of hardware redundancy in the quadrotor system. Tuning the controller using a faulty system further improves its ability to withstand more severe faults. Simulation results show that passive schemes retain their important role in fault tolerant control and are complementary to active techniques.

  6. Hardware and software fault tolerance : adaptive architectures in distributed computing environments

    OpenAIRE

    Di Giandomenico, Felicita; Bondavalli, Andrea; Xu, Jie

    1995-01-01

    This paper discusses the issue of providing tolerance to hardware and software faults in distributed computing environments as well as issues related to efficiency and flexibility. A set of new fault-tolerant architectures is presented, and a detailed dependability analysis of these architectures is performed together with an efficiency and response time evaluation. The proposed architectural solutions are designed mainly for general-purpose distributed computing systems where many unrelated ...

  7. Fusion of Built in Test (BIT) Technologies with Embeddable Fault Tolerant Techniques for Power System and Drives in Space Exploration Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Impact Technologies has proposed development of an effective prognostic and fault accommodation system for critical DC power systems including PV systems. Overall...

  8. Solar system fault detection

    Science.gov (United States)

    Farrington, Robert B. (Wheatridge, CO); Pruett, Jr., James C. (Lakewood, CO)

    1986-01-01

    A fault detecting apparatus and method are provided for use with an active solar system. The apparatus provides an indication as to whether one or more predetermined faults have occurred in the solar system. The apparatus includes a plurality of sensors, each sensor being used in determining whether a predetermined condition is present. The outputs of the sensors are combined in a pre-established manner in accordance with the kind of predetermined faults to be detected. Indicators communicate with the outputs generated by combining the sensor outputs to give the user of the solar system and the apparatus an indication as to whether a predetermined fault has occurred. Upon detection and indication of any predetermined fault, the user can take appropriate corrective action so that the overall reliability and efficiency of the active solar system are increased.
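
    The idea of combining sensor outputs in a pre-established manner to indicate predetermined faults can be sketched in a few lines of Python. The sensor names, fault names and rules below are hypothetical examples invented for illustration; the abstract does not specify which conditions or combinations the apparatus actually uses.

```python
# Each rule maps a fault name to a fixed Boolean combination of sensor
# conditions. All names here are hypothetical, not taken from the patent.
FAULT_RULES = {
    # Collector is hot and the pump is commanded on, yet no flow is seen.
    "pump_failed": lambda s: (
        s["collector_hot"] and s["pump_commanded_on"] and not s["flow_detected"]
    ),
    # Fluid is circulating although the collector is not hot.
    "pump_stuck_on": lambda s: (
        not s["collector_hot"] and s["flow_detected"]
    ),
}


def detect_faults(sensors):
    """Return the sorted names of all predetermined faults that are indicated."""
    return sorted(name for name, rule in FAULT_RULES.items() if rule(sensors))
```

    Calling `detect_faults` with a dictionary of Boolean sensor conditions returns the list of indicated faults, which plays the role of the indicators described in the abstract; an empty list means no predetermined fault is indicated.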

  9. Wind turbine fault detection and fault tolerant control : An enhanced benchmark challenge

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Johnson, Kathryn

    2013-01-01

    In this updated edition of a previous wind turbine fault detection and fault tolerant control challenge, we present a more sophisticated wind turbine model and updated fault scenarios to enhance the realism of the challenge and therefore the value of the solutions. This paper describes the challenge model and the requirements for challenge participants. In addition, it motivates many of the faults by citing publications that give field data from wind turbine control tests.

  10. Tolerance of Radial-Basis Functions Against Stuck-At-Faults

    OpenAIRE

    Eickhoff, Ralf; Rückert, Ulrich

    2005-01-01

    Neural networks are intended to be used in future nanoelectronic systems since neural architectures seem to be robust against malfunctioning elements and noise in their weights. In this paper we analyze the fault-tolerance of Radial Basis Function networks to Stuck-At-Faults at the trained weights and at the output of neurons. Moreover, we determine upper bounds on the mean square error arising from these faults.
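
    A stuck-at fault on a trained weight can be illustrated with a small Python sketch. It assumes a Gaussian RBF network with a single shared width `sigma`, and the function name `rbf_output` and the `stuck` parameter are hypothetical. The bound checked in the usage note is only the elementary pointwise bound that follows from the Gaussian basis function never exceeding 1; it is not the mean square error bound derived in the paper.

```python
import math


def rbf_output(x, centers, weights, sigma, stuck=None):
    """Output of a Gaussian RBF network: sum_i w_i * exp(-|x - c_i|^2 / (2 sigma^2)).

    stuck, if given, is a pair (index, value) that forces weight `index`
    to `value`, modelling a stuck-at fault on that trained weight.
    """
    total = 0.0
    for i, (center, weight) in enumerate(zip(centers, weights)):
        if stuck is not None and stuck[0] == i:
            weight = stuck[1]  # the faulty weight ignores its trained value
        dist2 = sum((xj - cj) ** 2 for xj, cj in zip(x, center))
        total += weight * math.exp(-dist2 / (2.0 * sigma ** 2))
    return total
```

    For any input, the output error caused by sticking weight `i` at value `s` equals `|w_i - s|` times the basis function value, so it can never exceed `|w_i - s|`, and it reaches that value when the input coincides with the center of the affected neuron.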

  11. Fault tolerant task execution through global trajectory planning

    International Nuclear Information System (INIS)

    Whether a task can be completed after a failure of one of the degrees-of-freedom of a redundant manipulator depends on the joint angle at which the failure takes place. It is possible to achieve fault tolerance by globally planning a trajectory that avoids unfavourable joint positions before a failure occurs. In this article, we present a trajectory planning algorithm that guarantees fault tolerance while simultaneously satisfying joint limit and obstacle avoidance requirements

  12. Fault tolerance analysis of the class of rearrangeable interconnection networks

    Energy Technology Data Exchange (ETDEWEB)

    Pakzad, S. (Pennsylvania State Univ., University Park, PA (USA). Dept. of Electrical Engineering)

    1989-08-01

    This paper analyzes the fault tolerance characteristics of a range of rearrangeable β-networks based on the concepts and the framework developed by S. Pakzad and S. Lakshmivarahan. These rearrangeable β-networks include the Benes network, the Waksman network, the Joel network, and the serial network. In addition, this paper presents a comparative analysis of the aforementioned networks according to their hardware cost, performance, and degree of fault tolerance.

  13. Fault Tolerance Structure of Radix 2 Signed Digital Adders

    Directory of Open Access Journals (Sweden)

    Jishun Kuang

    2012-01-01

    In this study, the structure of a fault tolerant adder based on Radix 2 Signed Digital (SD) representation is proposed. The “carry-free” property of the SD adder, which limits the impact of a fault to a few digits, can be used for fault detection based on parity checking under a single-fault assumption. An encoding scheme yields the parity values of the digits involved in the computation, and these parity values can be exploited to check the circuit. An error information register stores the checking results, and the bits of the register indicate whether the corresponding units are faulty. According to the fault type, recomputation or reconfiguration is used for error correction. The hardware overhead of adding fault tolerance is about 120%, and the maximum combinational path delay of the proposed adder remains constant as the operand length increases.

  14. Design of neuro fuzzy fault tolerant control using an adaptive observer

    International Nuclear Information System (INIS)

    New methodologies and concepts are developed in control theory to meet the ever-increasing demands of industrial applications. Fault detection and diagnosis of technical processes have become important in the course of progressive automation in the operation of groups of electric drives. When a group of electric drives is in operation, fault tolerant control becomes complicated, and with multiple motors running, fault detection and diagnosis can prove difficult. Estimation of all states and parameters of all drives is necessary to analyze actuator and sensor faults. To maintain system reliability, detection and isolation of failures should be performed quickly and accurately, and hardware should be properly integrated. A Luenberger full-order observer can be used to estimate all states of the system for the detection of actuator and sensor failures. Due to the insensitivity of the Luenberger observer to system parameter variations, however, state estimation becomes inaccurate when the drive parameters vary. Consequently, the estimation performance deteriorates, making ordinary state observers unsuitable for fault detection. Therefore, an adaptive observer, which can estimate the system states and parameters and detect faults simultaneously, is designed in this paper. In a group of DC drives, some drives may experience parameter variations while others do not, depending on load torque, friction, etc. Estimation of all states and parameters of all drives is therefore carried out using an adaptive observer. If there is any deviation from the estimated values, a fault is taken to have occurred; the nature of the fault, whether sensor fault or actuator fault, is determined by a neural fuzzy network, and the fault tolerant control is reconfigured. Experimental results with the neuro fuzzy system using adaptive observer-based fault tolerant control are good and confirm the characteristics of the proposed approach.

  15. Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

    Energy Technology Data Exchange (ETDEWEB)

    Panda, Dhabaleswar Kumar [The Ohio State University; Beckman, Pete

    2011-07-28

    With the Coordinated Infrastructure for Fault Tolerance Systems (CIFTS, as the original project came to be called) project, our aim has been to understand and tackle the following broad research questions, the answers to which will help the HEC community analyze and shape the direction of research in the field of fault tolerance and resiliency on future high-end leadership systems. Will availability of global fault information, obtained by fault information exchange between the different HEC software on a system, allow individual system software to better detect, diagnose, and adaptively respond to faults? If fault-awareness is raised throughout the system through fault information exchange, is it possible to get all system software working together to provide a more comprehensive end-to-end fault management on the system? What are the missing fault-tolerance features that widely used HEC system software lacks today that would inhibit such software from taking advantage of systemwide global fault information? What are the practical limitations of a systemwide approach for end-to-end fault management based on fault awareness and coordination? What mechanisms, tools, and technologies are needed to bring about fault awareness and coordination of responses on a leadership-class system? What standards, outreach, and community interaction are needed for adoption of the concept of fault awareness and coordination for fault management on future systems? Keeping our overall objectives in mind, the CIFTS team has taken a parallel fourfold approach. Our central goal was to design and implement a light-weight, scalable infrastructure with a simple, standardized interface to allow communication of fault-related information through the system and facilitate coordinated responses. 
    This work led to the development of the Fault Tolerance Backplane (FTB) publish-subscribe API specification, together with a reference implementation and several experimental implementations on top of existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities of representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeted software subsystems included MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerant linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.
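
    The publish-subscribe style of fault information exchange described above can be sketched as a tiny in-process event bus in Python. This is an illustration of the general pattern only and does not reproduce the FTB API specification; the class `FaultEventBus`, its methods, and the event and component names in the usage note are all hypothetical.

```python
from collections import defaultdict


class FaultEventBus:
    """Minimal in-process publish-subscribe bus for fault events (illustrative)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        """Register a callback to be invoked for every event with this name."""
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, source, payload=None):
        """Deliver an event to all subscribers; return how many received it."""
        event = {"name": event_name, "source": source, "payload": payload}
        delivered = 0
        for callback in list(self._subscribers.get(event_name, [])):
            try:
                callback(event)
                delivered += 1
            except Exception:
                # A failing subscriber must not block delivery to the others.
                continue
        return delivered
```

    In this sketch a job scheduler could subscribe to a `"node_failure"` event while an MPI library publishes it, so that neither component needs to know about the other. That decoupling is the property that lets otherwise independent system software share fault information.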

  16. Adaptive Fault Tolerance for Many-Core Based Space-Borne Computing

    Science.gov (United States)

    James, Mark; Springer, Paul; Zima, Hans

    2010-01-01

    This paper describes an approach to providing software fault tolerance for future deep-space robotic NASA missions, which will require a high degree of autonomy supported by an enhanced on-board computational capability. Such systems have become possible as a result of the emerging many-core technology, which is expected to offer 1024-core chips by 2015. We discuss the challenges and opportunities of this new technology, focusing on introspection-based adaptive fault tolerance that takes into account the specific requirements of applications, guided by a fault model. Introspection supports runtime monitoring of the program execution with the goal of identifying, locating, and analyzing errors. Fault tolerance assertions for the introspection system can be provided by the user, domain-specific knowledge, or via the results of static or dynamic program analysis. This work is part of an on-going project at the Jet Propulsion Laboratory in Pasadena, California.

  17. A Benchmark Evaluation of Fault Tolerant Wind Turbine Control Concepts

    Energy Technology Data Exchange (ETDEWEB)

    Odgaard, Peter F.; Stoustrup, Jakob

    2015-05-01

    As the world’s power supply to a larger and larger degree depends on wind turbines, it is consequently increasingly important that these are as reliable and available as possible. Modern fault tolerant control could play a substantial part in increasing the reliability of modern wind turbines. A benchmark model for wind turbine fault detection and isolation and fault tolerant control has previously been proposed. Based on this benchmark, an international competition on wind turbine fault tolerant control was announced. In this article the top three solutions from that competition are presented and evaluated. The analysis shows that especially the winner of the competition shows potential for wind turbine fault tolerant control. In addition to showing good performance, the approach is based on a method that is relevant for industrial usage. It is based on a virtual sensor and actuator strategy, in which the fault accommodation is handled in software sensor and actuator blocks. This means that the wind turbine controller can continue operation as in the fault free case. The other two evaluated solutions show some potential but clearly need improvements.

  18. Hardware and software fault tolerance - A unified architectural approach

    Science.gov (United States)

    Lala, Jaynarayan H.; Alger, Linda S.

    1988-01-01

    The loss of hardware fault tolerance which often arises when design diversity is used to improve the fault tolerance of computer software is considered analytically, and a unified design approach is proposed to avoid the problem. The fundamental theory of fault-tolerant (FT) architectures is reviewed; the current status of design-diversity software development is surveyed; and the FT-processor/attached-processor (FTP/AP) architecture developed by Lala et al. (1986) is described in detail and illustrated with diagrams. FTP/AP is shown to permit efficient implementation of N-version FT software while still tolerating random hardware failures with very high coverage; the reliability is found to be significantly higher than that of conventional majority-vote N-version software.
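
    For readers unfamiliar with the conventional majority-vote N-version software that the abstract uses as its baseline, the following Python sketch shows the basic voting step. It does not model the FTP/AP architecture; the names `majority_vote`, `run_n_versions` and `NoMajorityError` are hypothetical, and the outputs are assumed to be hashable values that can be compared for exact equality.

```python
from collections import Counter


class NoMajorityError(RuntimeError):
    """Raised when no single output is produced by more than half the versions."""


def majority_vote(outputs, n_versions=None):
    """Return the value produced by a strict majority of the N versions."""
    n = len(outputs) if n_versions is None else n_versions
    if not outputs:
        raise NoMajorityError("no version produced an output")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count > n:
        return value
    raise NoMajorityError("versions disagree without a strict majority")


def run_n_versions(versions, x):
    """Run independently written versions on the same input and vote."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            # A crashed version contributes no vote but still counts toward N.
            continue
    return majority_vote(outputs, n_versions=len(versions))
```

    With three versions, one faulty or crashed version is outvoted by the other two; if two versions fail in different ways, the voter raises `NoMajorityError` instead of returning a possibly wrong result.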

  19. Fault-tolerant Control of Unmanned Underwater Vehicles with Continuous Faults: Simulations and Experiments

    OpenAIRE

    Qian Liu; Daqi Zhu

    2010-01-01

    A novel thruster fault diagnosis and accommodation method for open-frame underwater vehicles is presented in the paper. The proposed system consists of two units: a fault diagnosis unit and a fault accommodation unit. In the fault diagnosis unit an ICMAC (Improved Credit Assignment Cerebellar Model Articulation Controllers) neural network information fusion model is used to realize the fault identification of the thruster. The fault accommodation unit is based on direct calculations of moment...

  20. Modelling, simulation, and analysis of fault-tolerant multiprocessor architectures

    Energy Technology Data Exchange (ETDEWEB)

    Spicker, G.

    1989-01-01

    With the new generation of very fast microprocessors and support chips, it is now possible to consider the development of new tightly coupled multiprocessor systems. These systems provide high levels of processing power, very fast I/O response, minimum interrupt latency, high levels of availability, and graceful degradation in the presence of faults. By measuring the bus utilization and the processing power using simulation and analytical multiprocessor models, the significant impact of having large cache memories on the overall system performance is demonstrated. It also shows a limited increase in the line (block) size for systems with large cache sizes. Analytical and experimental performance measurements for two different styles of processing elements clearly show that a RISC-based microprocessor outperforms a CISC-based microprocessor by an average factor of three. The memory requirements are also measured, and indicate a very close relationship with the program under analysis. Using the Hybrid Automated Reliability Predictor (HARP) software package from NASA, a family of curves is developed for general multiprocessor configurations. Combinatorial and Markovian models are also developed to illustrate the sensitivity of the models to changes in the number of redundant elements as well as to changes in various reliability and availability system parameters. A case study with the description of two different types of tightly coupled multiprocessor architectures is presented: (1) the N4 CISC-style multiprocessor system, and (2) the C3/FT RISC-style fault-tolerant multiprocessor system. These systems are currently under development by MODCOMP in Fort Lauderdale.

  1. A Remote Characterization System and a fault-tolerant tracking system for subsurface mapping of buried waste sites

    International Nuclear Information System (INIS)

    This paper describes two closely related projects that will provide new technology for characterizing hazardous waste burial sites. The first project, a collaborative effort by five of the national laboratories, involves the development and demonstration of a remotely controlled site characterization system. The Remote Characterization System (RCS) includes a unique low-signature survey vehicle, a base station, radio telemetry data links, satellite-based vehicle tracking, stereo vision, and sensors for noninvasive inspection of the surface and subsurface. The second project, conducted by the Idaho National Engineering Laboratory (INEL), involves the development of a position sensing system that can track a survey vehicle or instrument in the field. This system can provide coordinate updates at a rate of 200/s with an accuracy better than 0.1% of the distance separating the target and the sensor. It can employ acoustic or electromagnetic signals in a wide range of frequencies and can be operated as a passive or active device.

  2. Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization

    OpenAIRE

    Minupriya A; S.KANTHALAKSHMI; V. Manikandan

    2012-01-01

    In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. The fault detection filter uses the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of fa...

  3. Fault tolerant control of multivariable processes using auto-tuning PID controller.

    Science.gov (United States)

    Yu, Ding-Li; Chang, T K; Yu, Ding-Wen

    2005-02-01

    Fault tolerant control of dynamic processes is investigated in this paper using an auto-tuning PID controller. A fault tolerant control scheme is proposed comprising an auto-tuning PID controller based on an adaptive neural network model. The model is trained online using the extended Kalman filter (EKF) algorithm to learn the post-fault dynamics of the system. Based on this model, the PID controller adjusts its parameters to compensate for the effects of the faults, so that the control performance is recovered from degradation. The auto-tuning algorithm for the PID controller is derived with the Lyapunov method, and therefore the model-predicted tracking error is guaranteed to converge asymptotically. The method is applied to a simulated two-input two-output continuous stirred tank reactor (CSTR) with various faults, which demonstrates the applicability of the developed scheme to industrial processes. PMID:15719931

  4. FPGA-Based, Self-Checking, Fault-Tolerant Computers

    Science.gov (United States)

    Some, Raphael; Rennels, David

    2004-01-01

    A proposed computer architecture would exploit the capabilities of commercially available field-programmable gate arrays (FPGAs) to enable computers to detect and recover from bit errors. The main purpose of the proposed architecture is to enable fault-tolerant computing in the presence of single-event upsets (SEUs). [An SEU is a spurious bit flip (also called a soft error) caused by a single impact of ionizing radiation.] The architecture would also enable recovery from some soft errors caused by electrical transients and, to some extent, from intermittent and permanent (hard) errors caused by aging of electronic components. A typical FPGA of the current generation contains one or more complete processor cores, memories, and high-speed serial input/output (I/O) channels, making it possible to shrink a board-level processor node to a single integrated-circuit chip. Custom, highly efficient microcontrollers, general-purpose computers, custom I/O processors, and signal processors can be rapidly and efficiently implemented by use of FPGAs. Unfortunately, FPGAs are susceptible to SEUs. Prior efforts to mitigate the effects of SEUs have yielded solutions that degrade performance of the system and require support from external hardware and software. In comparison with other fault-tolerant-computing architectures (e.g., triple modular redundancy), the proposed architecture could be implemented with less circuitry and lower power demand. Moreover, the fault-tolerant computing functions would require only minimal support from circuitry outside the central processing units (CPUs) of computers, would not require any software support, and would be largely transparent to software and to other computer hardware. There would be two types of modules: a self-checking processor module and a memory system (see figure). The self-checking processor module would be implemented on a single FPGA and would be capable of detecting its own internal errors.
    It would contain two CPUs executing identical programs in lock step, with comparison of their outputs to detect errors. It would also contain various cache and local memory circuits, communication circuits, and configurable special-purpose processors that would use self-checking checkers. (The basic principle of the self-checking checker method is to utilize logic circuitry that generates error signals whenever there is an error in either the checker or the circuit being checked.) The memory system would comprise a main memory and a hardware-controlled check-pointing system (CPS) based on a buffer memory denoted the recovery cache. The main memory would contain random-access memory (RAM) chips and FPGAs that would, in addition to everything else, implement double-error-detecting and single-error-correcting memory functions to enable recovery from single-bit errors.
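    The single-error-correcting, double-error-detecting memory function mentioned above can be illustrated with a small software sketch. The code below is not taken from the record: it assumes a 4-bit data word protected by an extended Hamming (8,4) code, and the function names `secded_encode` and `secded_decode` are hypothetical. A real memory would use a wider word and implement the logic in hardware.

```python
def secded_encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 follow the usual
    Hamming(7,4) layout, with parity bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]      # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]      # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]      # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2                # makes the total parity even
    return code


def secded_decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double_error'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                    # XOR of the positions of set bits
    parity_violated = sum(code) % 2 == 1
    if syndrome == 0 and not parity_violated:
        status = "ok"
    elif parity_violated:
        # Exactly one bit flipped; syndrome 0 means the overall parity bit itself.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return None, "double_error"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

    Any single flipped bit is repaired, while any two flipped bits are reported rather than silently miscorrected, which is the property the abstract relies on for recovery from single-bit errors.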

  5. An Accurate and Fault-Tolerant Target Positioning System for Buildings Using Laser Rangefinders and Low-Cost MEMS-Based MARG Sensors.

    Science.gov (United States)

    Zhao, Lin; Guan, Dongxue; Jr Landry, René; Cheng, Jianhua; Sydorenko, Kostyantyn

    2015-01-01

    Target positioning systems based on MEMS gyros and laser rangefinders (LRs) have extensive prospects due to their advantages of low cost, small size and easy realization. The target positioning accuracy is mainly determined by the LR's attitude derived by the gyros. However, the attitude error is large due to the inherent noises from isolated MEMS gyros. In this paper, both accelerometer/magnetometer and LR attitude aiding systems are introduced to aid MEMS gyros. A no-reset Federated Kalman Filter (FKF) is employed, which consists of two local Kalman Filters (KF) and a Master Filter (MF). The local KFs are designed by using the Direction Cosine Matrix (DCM)-based dynamic equations and the measurements from the two aiding systems. The KFs can estimate the attitude simultaneously to limit the attitude errors resulting from the gyros. Then, the MF fuses the redundant attitude estimates to yield globally optimal estimates. Simulation and experimental results demonstrate that the FKF-based system can improve the target positioning accuracy effectively and allow for good fault-tolerant capability. PMID:26512672
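    The master-filter step, in which redundant local estimates are fused into one, can be sketched as an information-weighted average. This is a simplified illustration only: it assumes scalar estimates with independent errors, whereas the paper works with DCM-based attitude states, and the function name `fuse_estimates` is hypothetical.

```python
def fuse_estimates(estimates):
    """Fuse local estimates given as (value, variance) pairs.

    Each estimate is weighted by its information (inverse variance), so a
    noisier local filter contributes less. Assumes independent errors.
    """
    if not estimates:
        raise ValueError("at least one local estimate is required")
    total_information = sum(1.0 / variance for _, variance in estimates)
    fused_value = sum(value / variance for value, variance in estimates) / total_information
    return fused_value, 1.0 / total_information


# Two equally trusted local filters: the fused value is their mean and the
# fused variance is half of each local variance.
print(fuse_estimates([(10.0, 4.0), (12.0, 4.0)]))  # (11.0, 2.0)
```

    In such a scheme, fault tolerance comes from being able to drop a local filter that has been flagged as faulty and fuse the remaining estimates.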

  6. Improvement of Matrix Converter Drive Reliability by Online Fault Detection and a Fault-Tolerant Switching Strategy.

    DEFF Research Database (Denmark)

    Nguyen-Duy, Khiem; Liu, Tian-Hua

    2011-01-01

    The matrix converter system is becoming a very promising candidate to replace the conventional two-stage ac/dc/ac converter, but system reliability remains an open issue. The most common reliability problem is that a bidirectional switch has an open-switch fault during operation. In this paper, a matrix converter driving a speed-controlled permanent-magnet synchronous motor is examined under a single open-switch fault. First, a new fault-detection method is proposed using only the motor currents. Second, a novel fault-tolerant switching strategy is presented. By treating the matrix converter as a two-stage rectifier/inverter, existing modulation techniques for the inverter stage can be reused, whereas the rectifier stage is modified by control to counteract the fault. Moreover, the proposed techniques require no additional hardware devices or circuit modifications to the matrix converter. Experimental results show that the proposed method can maintain the motor speed with a maximum ripple of 2%, a fivefold improvement over the uncompensated system. The proposed method therefore offers a very economical and effective solution for the matrix converter fault tolerance problem.

  7. Buffered coscheduling for parallel programming and enhanced fault tolerance

    Science.gov (United States)

    Petrini, Fabrizio (Los Alamos, NM); Feng, Wu-chun (Los Alamos, NM)

    2006-01-31

    A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.

  8. Simulating chemistry efficiently on fault-tolerant quantum computers

    CERN Document Server

    Jones, N Cody; McMahon, Peter L; Yung, Man-Hong; Van Meter, Rodney; Aspuru-Guzik, Alán; Yamamoto, Yoshihisa

    2012-01-01

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. Here we consider methods to make proposed chemical simulation algorithms computationally fast on fault-tolerant quantum computers in the circuit model. Fault tolerance constrains the choice of available gates, so that arbitrary gates required for a simulation algorithm must be constructed from sequences of fundamental operations. We examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay-Kitaev algorithm [C.M. Dawson and M.A. Nielsen, Quantum Inf. Comput., 6:81, 2006]. For a given approximation error $\epsilon$, arbitrary single-qubit gates can be produced fault-tolerantly and using a limited set of gates in time which is $O(\log \epsilon)$ or $O(\log \log \epsilon)$; with sufficient parallel preparation of ancillas, constant average depth is possible using a method w...

  9. Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements

    DEFF Research Database (Denmark)

    Zhao, Bo; Blanke, Mogens

    2012-01-01

    This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modified particle filter. This particle filter is able to run in an asynchronous manner to accommodate the measurement drop out problem, and it overcomes the measurement outliers by switching observation models. Simulations with experimental data show that this fault tolerant navigation system can accurately estimate the ROV kinematic states, even when sensor failures appear frequently.

  10. On fault-tolerant structure, distributed fault-diagnosis, reconfiguration, and recovery of the array processors

    Energy Technology Data Exchange (ETDEWEB)

    Hosseini, S.H.

    1989-07-01

    The increasing need for the design of high-performance computers has led to the design of special purpose computers such as array processors. This paper studies the design of fault-tolerant array processors. First, it is shown how hardware redundancy can be employed in the existing structures in order to make them capable of withstanding the failure of some of the array links and processors. Then distributed fault-tolerance schemes are introduced for the diagnosis of the faulty elements, reconfiguration, and recovery of the array. Fault tolerance is maintained by the cooperation of processors in a decentralized form of control without the participation of any type of hardcore or fault-free central controller such as a host computer.

  11. Fault tolerance in space-based digital signal processing and switching systems: Protecting up-link processing resources, demultiplexer, demodulator, and decoder

    Science.gov (United States)

    Redinbo, Robert

    1994-01-01

    Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit.
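    The idea behind algorithm-based fault tolerance, attaching numerical parity values to the data so that a corrupted result can be detected and located, can be shown with the textbook checksum scheme for matrix multiplication. This is an illustrative stand-in, not the satellite signal-processing design described in the record, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def checked_matmul(a, b, inject=None, tol=1e-9):
    """Multiply a and b with checksum protection.

    A row of column sums is appended to `a` and a column of row sums to `b`.
    The product then carries its own row and column checksums, so a single
    corrupted element shows up as one failing row check and one failing
    column check. `inject=(i, j, delta)` simulates a fault in element (i, j).
    Returns (product, failing_rows, failing_columns).
    """
    a_checked = a + [[sum(col) for col in zip(*a)]]
    b_checked = [row + [sum(row)] for row in b]
    full = matmul(a_checked, b_checked)
    if inject is not None:
        i, j, delta = inject
        full[i][j] += delta
    n, m = len(a), len(b[0])
    failing_rows = [i for i in range(n)
                    if abs(sum(full[i][:m]) - full[i][m]) > tol]
    failing_cols = [j for j in range(m)
                    if abs(sum(full[i][j] for i in range(n)) - full[n][j]) > tol]
    product = [row[:m] for row in full[:n]]
    return product, failing_rows, failing_cols
```

    The intersection of the failing row and failing column locates the corrupted element, which is what makes reconfiguration or correction possible without repeating the whole computation.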

  12. Fault-tolerant Sensor Fusion for Marine Navigation

    DEFF Research Database (Denmark)

    Blanke, Mogens

    2006-01-01

    Reliability of navigation data is critical for steering and manoeuvring control, particularly at high speed or in critical phases of a mission. Should faults occur, faulty instruments need to be autonomously isolated and faulty information discarded. This paper designs a navigation solution where essential navigation information is provided even with multiple faults in instrumentation. The paper proposes a provably correct implementation through auto-generated state-event logics in a supervisory part of the algorithms. Test results from naval vessels document the performance and show events where the fault-tolerant sensor fusion provided uninterrupted navigation data despite temporal instrument defects.

  13. Fault Tolerant Message Efficient Coordinator Election Algorithm in High Traffic Bidirectional Ring Network

    Directory of Open Access Journals (Sweden)

    Danial Rahdari

    2012-12-01

    The use of distributed systems such as the Internet and cloud computing is growing dramatically. A coordinator is crucial in these systems because processes must be coordinated and kept consistent. However, this growth makes the election algorithm more complicated. Many algorithms have been proposed in this area, but the two best known are Bully and Ring. In this paper we propose a fault tolerant coordinator election algorithm for a typical bidirectional ring topology which is twice as fast as the Ring algorithm while passing far fewer messages during the election. A fault tolerance technique is applied that reduces the waiting time for the election to zero.
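    For orientation, the classic unidirectional Ring election that the abstract uses as its baseline can be simulated in a few lines. This sketch is not the proposed bidirectional algorithm, whose details the abstract does not give. The message count assumes one election message per live node plus one coordinator announcement per live node, and the function name `ring_election` is hypothetical.

```python
def ring_election(ids, alive, initiator):
    """Simulate a classic ring election that skips crashed nodes.

    ids[i] is the identifier of the node at ring position i and alive[i]
    tells whether it is running. The election message circulates once,
    collecting the ids of live nodes; the highest id becomes coordinator.
    Returns (coordinator_id, messages_sent).
    """
    if not alive[initiator]:
        raise ValueError("the initiator must be a live node")
    n = len(ids)
    candidates = [ids[initiator]]
    position = (initiator + 1) % n
    while position != initiator:
        if alive[position]:            # crashed nodes are bypassed
            candidates.append(ids[position])
        position = (position + 1) % n
    live_nodes = len(candidates)
    # One election pass and one announcement pass around the live nodes.
    return max(candidates), 2 * live_nodes
```

    With four nodes of which the highest-numbered one has crashed, the election settles on the highest live identifier. The cost grows linearly with ring size, which is the message and time overhead that faster election schemes aim to cut.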

  14. Energy Bounds for Fault-Tolerant Nanoscale Designs

    CERN Document Server

    Marculescu, Diana

    2011-01-01

    The problem of determining lower bounds for the energy cost of a given nanoscale design is addressed via a complexity theory-based approach. This paper provides a theoretical framework that is able to assess the trade-offs existing in nanoscale designs between the amount of redundancy needed for a given level of resilience to errors and the associated energy cost. Circuit size, logic depth and error resilience are analyzed and brought together in a theoretical framework that can be seamlessly integrated with automated synthesis tools and can guide the design process of nanoscale systems comprised of failure prone devices. The impact of redundancy addition on the switching energy and its relationship with leakage energy is modeled in detail. Results show that 99% error resilience is possible for fault-tolerant designs, but at the expense of at least 40% more energy if individual gates fail independently with probability of 1%.
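    To make the redundancy-versus-resilience trade-off concrete, the failure probability of a majority-voted group of replicas can be computed directly. This is a back-of-the-envelope illustration, not the complexity-theoretic energy bound derived in the paper. It assumes independent failures and a perfect voter, and the function name is hypothetical.

```python
from math import comb


def majority_failure_probability(replicas, p_fail):
    """Probability that a majority of `replicas` independent units fail.

    Assumes an odd number of replicas, independent failures with
    probability `p_fail` each, and a fault-free voter.
    """
    needed = replicas // 2 + 1   # failures required to outvote the good units
    return sum(comb(replicas, k) * p_fail ** k * (1 - p_fail) ** (replicas - k)
               for k in range(needed, replicas + 1))


# With a 1% per-unit failure probability, triplication lowers the failure
# probability from 1e-2 to about 2.98e-4, at roughly three times the hardware.
print(majority_failure_probability(3, 0.01))
```

    Each added replica increases switching activity and leakage, so the reliability gain has to be weighed against the energy cost, which is the balance the abstract quantifies.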

  15. Final Project Report. Scalable fault tolerance runtime technology for petascale computers

    Energy Technology Data Exchange (ETDEWEB)

    Krishnamoorthy, Sriram [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Sadayappan, P [Ohio State Univ., Columbus, OH (United States)

    2015-06-16

    With the massive number of components comprising the forthcoming petascale computer systems, hardware failures will be routinely encountered during execution of large-scale applications. Due to the multidisciplinary, multiresolution, and multiscale nature of scientific problems that drive the demand for high-end systems, applications place increasingly differing demands on the system resources: disk, network, memory, and CPU. In addition to MPI, future applications are expected to use advanced programming models such as those developed under the DARPA HPCS program as well as existing global address space programming models such as Global Arrays, UPC, and Co-Array Fortran. While there has been a considerable amount of work on fault tolerant MPI, with a number of strategies and extensions for fault tolerance proposed, virtually none of the advanced models proposed for emerging petascale systems is currently fault aware. To achieve fault tolerance, development of underlying runtime and OS technologies able to scale to the petascale level is needed. This project has evaluated a range of runtime techniques for fault tolerance for advanced programming models.

  16. Design and analysis of linear fault-tolerant permanent-magnet vernier machines.

    Science.gov (United States)

    Xu, Liang; Ji, Jinghua; Liu, Guohai; Du, Yi; Liu, Hu

    2014-01-01

    This paper proposes a new linear fault-tolerant permanent-magnet (PM) vernier (LFTPMV) machine, which can offer high thrust by using the magnetic gear effect. Both the PMs and the windings of the proposed machine are on the short mover, while the long stator is only manufactured from iron. Hence, the proposed machine is very suitable for long stroke system applications. The key of this machine is that the magnetizer splits the two movers with modular and complementary structures. Hence, the proposed machine offers improved symmetrical and sinusoidal back electromotive force waveform and reduced detent force. Furthermore, owing to the complementary structure, the proposed machine possesses favorable fault-tolerant capability, namely, independent phases. In particular, differing from the existing fault-tolerant machines, the proposed machine offers fault tolerance without sacrificing thrust density. This is because neither fault-tolerant teeth nor flux barriers are adopted. The electromagnetic characteristics of the proposed machine are analyzed using the time-stepping finite-element method, which verifies the effectiveness of the theoretical analysis. PMID:24982959

  17. Design of Fault Tolerant Network Interfaces for NoCs

    DEFF Research Database (Denmark)

    Fiorin, Leandro; Micconi, Laura

    2011-01-01

    Networks-on-Chip (NoCs) appeared as a strategy to deal with the communication requirements of complex IP-based System-on-Chips. As the complexity of designs increases and the technology scales down into the deep-submicron domain, the probability of malfunctions and failures in the NoC components increases. This paper focuses on the study and evaluation of techniques for increasing reliability and resilience of Network Interfaces (NIs). NIs act as interfaces between IP cores and the communication infrastructure; faulty behavior in them could therefore affect the overall system. In this work, we propose a functional fault model for the NI components, and we present a two-level fault tolerant solution that can be employed for mitigating the effects of both single-event upset soft errors and hard errors on the NI. Experiments show that with a limited overhead we can obtain a significant reliability of the NI, while saving up to 83% in area with respect to a standard Triple Modular Redundancy implementation, as well as a significant energy reduction.

  18. Strategic Planning for Fault-Tolerant Internet Connectivity Using Basic Fault-Tolerant Architectural Design as Platform

    OpenAIRE

    Adeosun, O.O; E.R. Adagunodo; I.A. Adetunde; T. H. Adeosun

    2008-01-01

    The focus of this study is to provide Internet connectivity without interruption even in the presence of faults/failures, thereby enhancing the performance of Internet services. To achieve this, the deployment and redeployment of faulty component(s) are done using a Basic Fault-Tolerant (BFT) architectural design. A framework to provide enhanced performance in terms of confidentiality, integrity and availability in clusters is suggested using BFT, considering all sources of vulnerabilities incl...

  19. Fault tolerance and reliability in integrated ship control : the ATOMOS concept

    DEFF Research Database (Denmark)

    Nielsen, Jens Frederik Dalsgaard; Izadi-Zamanabadi, Roozbeh

    2002-01-01

    Various strategies for achieving fault tolerance in large scale control systems are discussed. The positive and negative impacts of distribution through network communication are presented. The ATOMOS framework for standardized reliable marine automation is presented along with the corresponding reliability issues. A generic framework for simulation of network traffic under fault conditions is suggested and the first practical experiences from a prototype implementation are reported.

  20. Passive fault tolerant control of a double inverted pendulum - a case study

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Stoustrup, Jakob

    2005-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the YJBK parameterization, which requires the nominal controller to be implemented in observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the f...

  1. Fault Tolerance for Industrial Actuators in Absence of Accurate Models and Hardware Redundancy

    DEFF Research Database (Denmark)

    Papageorgiou, Dimitrios; Blanke, Mogens

    2015-01-01

    This paper investigates Fault-Tolerant Control for closed-loop systems where only coarse models are available and there is a lack of actuator and sensor redundancies. The problem is approached in the form of a typical servomotor in closed loop. A linear model is extracted from input/output data to describe the system over a frequency range. Two methods based on the Kalman Filter and Statistical Change Detection techniques are proposed for detecting degradation faults and component failures, respectively. Finally, a reference correction setup is used to compensate for degradation faults.
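    Statistical change detection on a residual signal can be illustrated with a one-sided CUSUM test, one common technique of this kind. The abstract does not say which detector the authors use, so the choice of CUSUM, the parameter names `drift` and `threshold`, and the function name are all assumptions made for this sketch.

```python
def cusum_first_alarm(residuals, drift, threshold):
    """Return the index of the first CUSUM alarm, or None if there is none.

    The decision statistic accumulates how much each residual exceeds
    `drift` and is clamped at zero, so zero-mean noise does not build up.
    An alarm is raised once the statistic exceeds `threshold`.
    """
    statistic = 0.0
    for index, residual in enumerate(residuals):
        statistic = max(0.0, statistic + residual - drift)
        if statistic > threshold:
            return index
    return None


# The residual jumps from 0 to 1 at sample 5; the detector needs three
# samples of evidence before it crosses the threshold.
print(cusum_first_alarm([0.0] * 5 + [1.0] * 5, drift=0.5, threshold=1.2))  # 7
```

    Raising the threshold lowers the false-alarm rate at the cost of a longer detection delay, which is the usual tuning trade-off for such detectors.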

  2. FAULT TOLERANCE USING CREDENTIALS MANAGEMENT IN ONLINE TRANSACTION APPLICATION

    Directory of Open Access Journals (Sweden)

    L. Javid Ali

    2014-07-01

    Web applications play a vital role in the IT field in satisfying web customers, who depend on online transaction processing systems. A web application presents various forms that together give a complete service to the customer. These forms offer options intended to meet the customer's needs, given the competition among web sites in the global market. Traditional web pages close the current session whenever the customer selects an improper option, because of the single sign-on property. Selecting an option that is not suitable for the current session therefore leads to a reliability problem. If the same user needs the same service again, he has to navigate from the home page to the required page, which places an extra burden on the customer. The customer session should be maintained properly so that customer satisfaction with the online web application is retained. The existing system classifies users by their access level and their fault level. The main objective of the proposed work is to manage credentials at all levels in order to keep the valuable customer in the current session for a long period of access. Credential management and session management are used to manage multilevel credentials from the web client to the web resource level and vice versa. The options selected by the customer can be classified based on the fault and the type of access. Credential management also performs the maintenance process for fixing the fault tolerance level of the web user. A complete log is recorded to trace the overall process in online transaction processing.

  3. Fault detection in nonlinear systems

    OpenAIRE

    Adjallah, Kondo; KRATZ, Frédéric; Maquin, Didier

    1993-01-01

    This communication deals with the problem of fault detection and localization for a wide class of nonlinear systems subjected to bounded nonlinearities. A dedicated nonlinear observer scheme for fault detection and identification of observable systems is proposed.

  4. Fault-tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation

    CERN Document Server

    Dolev, Danny; Lenzen, Christoph; Schmid, Ulrich

    2011-01-01

    Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduced transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This paper is intended to be part of an attempt to overcome this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build a very robust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel Byzantine fault-tolerant self-stabilizing pulse syn...

  5. Fault Tolerant, Radiation Hard DSP Project

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to develop a radiation tolerant/hardened signal processing node, which effectively utilizes state-of-the-art commercial semiconductors plus our...

  6. Fault-model development for fault-tolerant VLSI design. Final technical report, May 1986-May 1987

    Energy Technology Data Exchange (ETDEWEB)

    Hartmann, C.R.; Lala, P.K.; Ali, A.M.; Visweswaran, G.S.; Ganguly, S.

    1988-05-01

    Fault models provide systematic and precise representations of physical defects in microcircuits in a form suitable for simulation and test generation. The current difficulty in testing VLSI circuits can be attributed to the tremendous increase in design complexity and the inappropriateness of traditional stuck-at fault models. This report develops fault models for three different types of common defects that are not accurately represented by the stuck-at fault model. The faults examined in this report are: bridging faults, transistor stuck-open faults, and transient faults caused by alpha-particle radiation. A generalized fault model could not be developed for the three fault types. However, microcircuit behavior and fault-detection strategies are described for the bridging, transistor stuck-open, and transient (alpha-particle strike) faults. The results of this study can be applied to the simulation and analysis of faults in fault-tolerant VLSI circuits.

  7. Survey on SOA Fault Tolerance Techniques

    Directory of Open Access Journals (Sweden)

    Anushka

    2013-06-01

    A service-oriented architecture (SOA) is a software architecture design pattern in which a collection of discrete software modules delivers services. These services collectively provide the complete functionality of a large software application. SOA makes it easy to perform the operations of a large number of computers that are connected over a network. SOA exhibits features such as loose coupling, black-box behavior, and dynamism. SOA can face problems in delivering messages between different services, as well as faults in its components. These faults are diagnosed with various methods such as actuators, different service routing methods, MAPE, etc.

  8. Fault-tolerant three-level inverter

    Science.gov (United States)

    Edwards, John; Xu, Longya; Bhargava, Brij B.

    2006-12-05

    A method for driving a neutral point clamped three-level inverter is provided. In one exemplary embodiment, DC current is received at a neutral point-clamped three-level inverter. The inverter has a plurality of nodes including first, second and third output nodes. The inverter also has a plurality of switches. Faults are checked for in the inverter and predetermined switches are automatically activated responsive to a detected fault such that three-phase electrical power is provided at the output nodes.

  9. The Kaleidoscope switch-a new concept for implementation of a large and fault tolerant ATM switch system

    DEFF Research Database (Denmark)

    Dittmann, Lars

    1997-01-01

    This paper describes a new concept for implementing a large switch network based on smaller modules. The concept is based on an alternative self-routing structure that, due to a point symmetry, allows the bits in the routing tag to be processed in a random order. Among other things, this property provides inherent fault protection and allows a simple implementation of broadcast and multicast. The concept has been implemented as a small prototype that is currently used in a national experimental ATM ne...

  10. Implementation of fault tolerant control for modular multilevel converter using EtherCAT communication

    DEFF Research Database (Denmark)

    Dan Burlacu, Paul; Mathe, Laszlo

    2015-01-01

    The Modular Multilevel Converter (MMC) is a very promising technology these days. It offers fault tolerant capabilities and ensures high efficiency with low output voltage harmonic content, which reduces the required filter size. A disadvantage of the system is that the control becomes more cumbersome due to the high number of submodules employed. A very efficient way to control the MMC is to use a real-time communication platform between the submodules and a central unit. The central unit can then deal with the overall control, and some of the tasks can be distributed to the submodules. This communication platform has to ensure perfect synchronization between the modules, and it should also be fault tolerant. The analysis of an MMC based on EtherCAT is presented in this paper from the implementation and module fault points of view. The experimental tests show that the MMC continues to operate after a communication or module hardware fault has occurred.

  11. Algorithms for testing fault-tolerance of sequenced jobs.

    Czech Academy of Sciences Publication Activity Database

    Chrobak, M.; Hurand, M.; Sgall, Jiří

    2009-01-01

    Vol. 12, No. 5 (2009), pp. 501-515. ISSN 1094-6136 R&D Projects: GA MŠk(CZ) 1M0545; GA AV ČR IAA100190902; GA AV ČR IAA1019401 Keywords: sequencing algorithms * fault-tolerance * dynamic programming Subject RIV: IN - Informatics, Computer Science Impact factor: 1.265, year: 2009

  12. Fault tolerant computer for nuclear power plant applications

    International Nuclear Information System (INIS)

    A quadruply redundant synchronous fault tolerant processor (FTP) is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. The hardware architecture of this processor is described and certain issues unique to quadruply redundant computers are discussed.

  13. Critique of Fault-Tolerant Quantum Information Processing

    OpenAIRE

    Alicki, Robert

    2013-01-01

    This is a chapter in the book Quantum Error Correction, edited by D. A. Lidar and T. A. Brun and published by Cambridge University Press (2013) (http://www.cambridge.org/us/academic/subjects/physics/quantum-physics-quantum-information-and-quantum-computation/quantum-error-correction), presenting the author's view on the feasibility of fault-tolerant quantum information processing.

  14. Fault-tolerant computer study. [logic designs for building block circuits]

    Science.gov (United States)

    Rennels, D. A.; Avizienis, A. A.; Ercegovac, M. D.

    1981-01-01

    A set of building block circuits is described which can be used with commercially available microprocessors and memories to implement fault tolerant distributed computer systems. Each building block circuit is intended for VLSI implementation as a single chip. Several building blocks and associated processor and memory chips form a self checking computer module with self contained input output and interfaces to redundant communications buses. Fault tolerance is achieved by connecting self checking computer modules into a redundant network in which backup buses and computer modules are provided to circumvent failures. The requirements and design methodology which led to the definition of the building block circuits are discussed.

  15. Reversible Logic Synthesis of Fault Tolerant Carry Skip BCD Adder

    CERN Document Server

    Islam, Md Saiful; 10.3329/jbas.v32i2.2431

    2010-01-01

    Reversible logic is emerging as an important research area with applications in diverse fields such as low power CMOS design, digital signal processing, cryptography, quantum computing and optical information processing. This paper presents a new 4*4 parity preserving reversible logic gate, IG. The proposed parity preserving reversible gate can be used to synthesize any arbitrary Boolean function. It makes any fault that affects no more than a single signal readily detectable at the circuit's primary outputs. It is shown that a fault tolerant reversible full adder circuit can be realized using only two IGs. The proposed fault tolerant full adder (FTFA) is used as the fundamental building block for designing other arithmetic logic circuits. It has also been demonstrated that the proposed design offers less hardware complexity and is more efficient in terms of gate count, garbage outputs and constant inputs than the existing counterparts.
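    The parity-preserving property described above can be illustrated with a small sketch. The abstract does not define the IG gate, so the example below swaps in the well-known 3*3 Fredkin (controlled-swap) gate, which is also reversible and parity preserving; the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

def fredkin(a, b, c):
    """Fredkin (controlled-swap) gate: if a is 1, swap b and c."""
    return (a, c, b) if a else (a, b, c)

def parity(bits):
    """Return the XOR of all bits (0 for even parity, 1 for odd)."""
    return sum(bits) % 2

inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*bits) for bits in inputs]

# Reversible: the mapping from inputs to outputs is one-to-one.
is_reversible = len(set(outputs)) == len(inputs)

# Parity preserving: input parity equals output parity for every input.
preserves_parity = all(parity(i) == parity(o) for i, o in zip(inputs, outputs))

def single_bit_fault_detected(bits, fault_position):
    """Flip one output bit and check that the parity mismatch exposes it."""
    out = list(fredkin(*bits))
    out[fault_position] ^= 1
    return parity(bits) != parity(out)
```

    Because input and output parity always match in the fault-free gate, comparing the two parities at the outputs flags any fault that corrupts a single signal, which is the detection property the abstract attributes to parity-preserving gates.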

  16. Particle Filter Based Fault-tolerant ROV Navigation using Hydro-acoustic Position and Doppler Velocity Measurements

    DEFF Research Database (Denmark)

    Zhao, Bo; Blanke, Mogens; Skjetne, Roger

    2012-01-01

    This paper presents a fault tolerant navigation system for a remotely operated vehicle (ROV). The navigation system uses hydro-acoustic position reference (HPR) and Doppler velocity log (DVL) measurements to achieve an integrated navigation. The fault tolerant functionality is based on a modified particle filter. This particle filter is able to run in an asynchronous manner to accommodate the measurement drop out problem, and it overcomes the measurement outliers by switching observation models. Simul...

  17. Combining dynamical decoupling with fault-tolerant quantum computation

    International Nuclear Information System (INIS)

    We study how dynamical decoupling (DD) pulse sequences can improve the reliability of quantum computers. We prove upper bounds on the accuracy of DD-protected quantum gates and derive sufficient conditions for DD-protected gates to outperform unprotected gates. Under suitable conditions, fault-tolerant quantum circuits constructed from DD-protected gates can tolerate stronger noise and have a lower overhead cost than fault-tolerant circuits constructed from unprotected gates. Our accuracy estimates depend on the dynamics of the bath that couples to the quantum computer and can be expressed either in terms of the operator norm of the bath's Hamiltonian or in terms of the power spectrum of bath correlations; we explain in particular how the performance of recursively generated concatenated pulse sequences can be analyzed from either viewpoint. Our results apply to Hamiltonian noise models with limited spatial correlations.

  18. Active fault tolerant control research for nuclear power plant based on BP neural network

    International Nuclear Information System (INIS)

    In view of the sensor fault of nuclear power plant, the sensor was trained by adopting improved back propagation (BP) neural network method, and the dynamic model bank in different states was set up. The system was detected by using BP neural network in real time. When the sensor goes wrong, it will be controlled by reconstruction. Taking pressurizer as the case, a simulation experiment was performed on the nuclear power plant simulator. The results show that the proposed method is valid for the fault tolerant control of sensor faults in nuclear power plant. (authors)

  19. Passive fault tolerant control of a double inverted pendulum - a case study

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Stoustrup, Jakob

    2005-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the YJBK parameterization, which requires the nominal controller to be implemented in observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

  20. Passive Fault tolerant Control of an Inverted Double Pendulum : A Case Study Example

    DEFF Research Database (Denmark)

    Niemann, H.; Stoustrup, Jakob

    2003-01-01

    A passive fault tolerant control scheme is suggested, in which a nominal controller is augmented with an additional block, which guarantees stability and performance after the occurrence of a fault. The method is based on the Youla parameterization, which requires the nominal controller to be implemented in the observer based form. The proposed method is applied to a double inverted pendulum system, for which an H_inf controller has been designed and verified in a lab setup. In this case study, the fault is a degradation of the tacho loop.

  1. Fault Detection, Isolation, and Accommodation for LTI Systems Based on GIMC Structure

    OpenAIRE

    D. U. Campos-Delgado; Palacios, E.; D. R. Espinoza-Trejo

    2008-01-01

    In this contribution, an active fault-tolerant scheme that achieves fault detection, isolation, and accommodation is developed for LTI systems. Faults and perturbations are considered as additive signals that modify the state or output equations. The accommodation scheme is based on the generalized internal model control architecture recently proposed for fault-tolerant control. In order to improve the performance after a fault, the compensation is considered in two steps according with a fau...

  2. Review of fault diagnosis and fault-tolerant control for modular multilevel converter of HVDC

    DEFF Research Database (Denmark)

    Liu, Hui; Loh, Poh Chiang

    2013-01-01

    This review focuses on faults in Modular Multilevel Converter (MMC) for use in high voltage direct current (HVDC) systems by analyzing the vulnerable spots and failure mechanism from device to system and illustrating the control & protection methods under failure condition. At the beginning, several typical topologies of MMC-HVDC systems are presented. Then fault types such as capacitor voltage unbalance, unbalance between upper and lower arm voltage are analyzed and the corresponding fault detection and diagnosis approaches are explained. In addition, more attention is dedicated to control strategies, when running in MMC faults or grid faults. This paper ends up with a discussion of other opportunities for future development.

  3. Fault detection, isolation and reconfiguration in FTMP Methods and experimental results. [fault tolerant multiprocessor]

    Science.gov (United States)

    Lala, J. H.

    1983-01-01

    The Fault-Tolerant Multiprocessor (FTMP) is a highly reliable computer designed to meet a goal of 10 to the -10th failures per hour and built with the objective of flying an active-control transport aircraft. Fault detection, identification, and recovery software is described, and experimental results obtained by injecting faults at the pin level in the FTMP are presented. Over 21,000 faults were injected in the CPU, memory, bus interface circuits, and error detection, masking, and error reporting circuits of one LRU of the multiprocessor. Detection, isolation, and reconfiguration times were recorded for each fault, and the results were found to agree well with earlier assumptions made in reliability modeling.

  4. Multiversion software reliability through fault-avoidance and fault-tolerance

    Science.gov (United States)

    Vouk, Mladen A.; Mcallister, David F.

    1990-01-01

    In this project we have proposed to investigate a number of experimental and theoretical issues associated with the practical use of multi-version software in providing dependable software through fault-avoidance and fault-elimination, as well as run-time tolerance of software faults. In the period reported here we have been working on the following: We have continued collection of data on the relationships between software faults and reliability, and the coverage provided by the testing process as measured by different metrics (including data flow metrics). We continued work on software reliability estimation methods based on non-random sampling, and the relationship between software reliability and code coverage provided through testing. We have continued studying back-to-back testing as an efficient mechanism for removal of uncorrelated faults, and common-cause faults of variable span. We have also been studying back-to-back testing as a tool for improvement of the software change process, including regression testing. We continued investigating existing fault-tolerance models and worked on the formulation of new ones. In particular, we have partly finished evaluation of Consensus Voting in the presence of correlated failures, and are in the process of finishing evaluation of Consensus Recovery Block (CRB) under failure correlation. We find both approaches far superior to commonly employed fixed agreement number voting (usually majority voting). We have also finished a cost analysis of the CRB approach.
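    The contrast between fixed agreement number voting and consensus-style voting can be sketched as follows. This is a minimal illustration, not the authors' implementation: consensus voting is approximated here as plurality voting (the largest agreeing group wins), and returning None on a tie is an assumption made for this sketch. The function names are hypothetical.

```python
from collections import Counter

def majority_vote(outputs):
    """Fixed agreement number voting: accept a value only if more than
    half of the versions produced it; otherwise report no decision."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

def plurality_vote(outputs):
    """Consensus-style voting (simplified): accept the value produced by
    the largest group of agreeing versions, even without a majority.
    A tie between the largest groups yields no decision (assumption)."""
    counts = Counter(outputs).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]
```

    With five versions returning [1, 2, 3, 1, 4], majority voting gives no decision because no value reaches three votes, while the plurality rule still selects 1. This is one way to see why a voter that does not insist on a fixed agreement number can deliver an answer in more situations.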

  5. Fault Tolerant Control Using Proportional-Integral-Derivative Controller Tuned by Genetic Algorithm

    Directory of Open Access Journals (Sweden)

    S. Kanthalakshmi

    2011-01-01

    Problem statement: The growing demand for reliability, maintainability and survivability in industrial processes has drawn significant research in the fault detection and fault tolerant control domain. A fault is usually defined as an unexpected change in a system, such as a component malfunction or a variation in operating conditions, which tends to degrade the overall system performance. The purpose of fault detection is to detect these malfunctions so that proper action can be taken to prevent faults from developing into a total system failure. Approach: In this study an effective integrated fault detection and fault tolerant control scheme was developed for a class of LTI systems. The scheme was based on a Kalman filter for simultaneous state and fault parameter estimation, statistical decisions for fault detection and activation of controller reconfiguration. Proportional-Integral-Derivative (PID) control schemes continue to provide the simplest yet effective solutions to most control engineering applications today. Determination or tuning of the PID parameters continues to be important as these parameters have a great influence on the stability and performance of the control system. In this study a genetic algorithm (GA) was proposed to tune the PID controller. Results: The results show that the proposed scheme improves the performance of the process in terms of time domain specifications, robustness to parametric changes and optimum stability. Also, a comparison with the conventional Ziegler-Nichols method proves the superiority of the GA based system. Conclusion: This study demonstrates the effectiveness of the genetic algorithm in tuning a PID controller with optimum parameters. It is, moreover, shown to be robust to variations in plant dynamic characteristics and disturbances, assuring a parameter-insensitive operation of the process.

  6. Advanced Information Processing System - Fault detection and error handling

    Science.gov (United States)

    Lala, J. H.

    1985-01-01

    The Advanced Information Processing System (AIPS) is designed to provide a fault tolerant and damage tolerant data processing architecture for a broad range of aerospace vehicles, including tactical and transport aircraft, and manned and autonomous spacecraft. A proof-of-concept (POC) system is now in the detailed design and fabrication phase. This paper gives an overview of a preliminary fault detection and error handling philosophy in AIPS.

  7. Fault Tolerant Wind Farm Control : a Benchmark Model

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2013-01-01

    This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

  8. Fault Tolerant Control of Wind Turbines : A benchmark model

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2013-01-01

    This paper presents a test benchmark model for the evaluation of fault detection and accommodation schemes. This benchmark model deals with the wind turbine on a system level, and it includes sensor, actuator, and system faults, namely faults in the pitch system, the drive train, the generator, and the converter system. Since it is a system-level model, converter and pitch system models are simplified because these are controlled by internal controllers working at higher frequencies than the system model. The model represents a three-bladed pitch-controlled variable-speed wind turbine with a nominal power of 4.8 MW. The fault detection and isolation (FDI) problem was addressed by several teams, and five of the solutions are compared in the second part of this paper. This comparison relies on additional test data in which the faults occur in different operating conditions than in the test data used for the FDI design.

  9. Fault-tolerance performance evaluation of fieldbus for NPCS network of KNGR

    International Nuclear Information System (INIS)

    In contrast with conventional fieldbus research, which focuses merely on real-time performance, this study aims to evaluate the real-time performance of the communication system including fault-tolerant mechanisms. Maintaining performance in the presence of recoverable faults is very important because the communication network will be applied to the next generation NPP (Nuclear Power Plant). In order to guarantee the performance of the NPP communication network, the time characteristics of the target system in the presence of recoverable faults should be investigated. If the time characteristics meet the requirements of the system, the faults will be recovered by fieldbus recovery mechanisms and the system will be safe. If the time characteristics cannot meet the requirements, the faults in the fieldbus can propagate to system failure. In this study, for the purpose of investigating the time characteristics of the fieldbus, the recoverable faults are classified, and then the formulas that represent delays including recovery mechanisms and the simulation model are developed. In order to validate the proposed approach, the simulation model is applied to the Korea Next Generation Reactor (KNGR) NSSS Process Control System (NPCS). The results of the simulation provide reasonable delay characteristics of the fault cases with recovery mechanisms. Using the outcome of the simulation and the system requirements, we can also calculate the failure propagation probability from the fieldbus to the outer system.

  10. Improving the Navigability of a Hexapod Robot using a Fault-Tolerant Adaptive Gait

    OpenAIRE

    Umar Asif

    2012-01-01

    This paper encompasses a study on the development of a walking gait for fault tolerant locomotion in unstructured environments. The fault tolerant gait for adaptive locomotion fulfills stability conditions in opposition to a fault (locked joints or sensor failure) event preventing a robot to realize stable locomotion over uneven terrains. To accomplish this feat, a fault tolerant gait based on force-position control is proposed in this paper for a hexapod robot to enable stable walking with a...

  11. Fault Tolerant Multiphase Electrical Drives: The Impact of Design

    OpenAIRE

    SEMAIL, Eric; KESTELYN, Xavier; LOCMENT, Fabrice

    2008-01-01

    This paper deals with fault tolerant multiphase electrical drives. The quality of the torque of a vector-controlled Permanent Magnet (PM) Synchronous Machine supplied by a multi-leg Voltage Source Inverter (VSI) is examined in normal operation and when one or two phases are open-circuited. It is then deduced that a seven-phase machine is a good compromise allowing high torque-to-volume density and easy control with smooth torque in fault operation. Experimental results confirm the predicted c...

  12. Building a fault tolerant application using the GASPI communication layer

    OpenAIRE

    Shahzad, Faisal; Kreutzer, Moritz; Zeiser, Thomas; Machado, Rui; Pieper, Andreas; Hager, Georg; Wellein, Gerhard

    2015-01-01

    It is commonly agreed that highly parallel software on Exascale computers will suffer from many more runtime failures due to the decreasing trend in the mean time to failures (MTTF). Therefore, it is not surprising that a lot of research is going on in the area of fault tolerance and fault mitigation. Applications should survive a failure and/or be able to recover with minimal cost. MPI is not yet very mature in handling failures, the User-Level Failure Mitigation (ULFM) pro...

  13. Fault Diagnosis and Accommodation of LTI systems by modified Youla parameterization

    Directory of Open Access Journals (Sweden)

    Minupriya A

    2012-06-01

    In this paper an Active Fault Tolerant Control (FTC) scheme is proposed for Linear Time Invariant (LTI) systems, which achieves fault diagnosis followed by fault accommodation. The fault diagnosis scheme is carried out in two steps: fault detection followed by fault isolation. The fault detection filter uses the sensor measurements to generate residuals, which have a unique static pattern in response to each fault. Distortion in these static patterns generates the probability of the presence of a fault. The fault accommodation scheme is carried out using the Generalized Internal Model Control (GIMC) architecture, also known as modified Youla parameterization. In addition, performance indices are evaluated to indicate that the resulting fault tolerant scheme can detect, identify and accommodate actuator and sensor faults under additive faults. A DC motor example is considered for the demonstration of the proposed scheme.

  14. Advanced information processing system: Fault injection study and results

    Science.gov (United States)

    Burkhardt, Laura F.; Masotto, Thomas K.; Lala, Jaynarayan H.

    1992-01-01

    The objective of the AIPS program is to achieve a validated fault tolerant distributed computer system. The goals of the AIPS fault injection study were: (1) to present the fault injection study components addressing the AIPS validation objective; (2) to obtain feedback for fault removal from the design implementation; (3) to obtain statistical data regarding fault detection, isolation, and reconfiguration responses; and (4) to obtain data regarding the effects of faults on system performance. The parameters are described that must be varied to create a comprehensive set of fault injection tests, the subset of test cases selected, the test case measurements, and the test case execution. Both pin level hardware faults using a hardware fault injector and software injected memory mutations were used to test the system. An overview is provided of the hardware fault injector and the associated software used to carry out the experiments. Detailed specifications are given of fault and test results for the I/O Network and the AIPS Fault Tolerant Processor, respectively. The results are summarized and conclusions are given.

  15. Lightweight storage and overlay networks for fault tolerance.

    Energy Technology Data Exchange (ETDEWEB)

    Oldfield, Ron A.

    2010-01-01

    The next generation of capability-class, massively parallel processing (MPP) systems is expected to have hundreds of thousands to millions of processors. In such environments, it is critical to have fault-tolerance mechanisms, including checkpoint/restart, that scale with the size of applications and the percentage of the system on which the applications execute. For application-driven, periodic checkpoint operations, the state-of-the-art does not provide a scalable solution. For example, on today's massive-scale systems that execute applications which consume most of the memory of the employed compute nodes, checkpoint operations generate I/O that consumes nearly 80% of the total I/O usage. Motivated by this observation, this project aims to improve I/O performance for application-directed checkpoints through the use of lightweight storage architectures and overlay networks. Lightweight storage provides direct access to underlying storage devices. Overlay networks provide caching and processing capabilities in the compute-node fabric. The combination has the potential to significantly reduce I/O overhead for large-scale applications. This report describes our combined efforts to model and understand overheads for application-directed checkpoints, as well as implementation and performance analysis of a checkpoint service that uses available compute nodes as a network cache for checkpoint operations.
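    The application-directed, periodic checkpoint/restart pattern that this report targets can be sketched in a few lines. The example below is a single-process illustration that writes to a local file; it does not model the lightweight storage or overlay-network cache described in the report. The function names, the state layout, and the `fail_at` failure-injection parameter are assumptions made for this sketch.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write the state to a temporary file, then atomically replace the
    previous checkpoint so a crash never leaves a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(state, handle)
    os.replace(tmp_path, path)

def load_checkpoint(path, default):
    """Return the last saved state, or the default on a fresh start."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)

def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the step indices 0..total_steps-1, checkpointing periodically.
    If fail_at is set, raise at that step to simulate a node failure."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

    After a simulated failure, calling `run` again with the same path resumes from the most recent checkpoint rather than from step 0, so only the work done since that checkpoint is repeated. The checkpoint interval trades I/O volume against the amount of recomputation after a failure, which is the overhead the report sets out to model.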

  16. Fault Tolerant Characteristics of Artificial Neural Network Electronic Hardware

    Science.gov (United States)

    Zee, Frank

    1995-01-01

    The fault tolerant characteristics of analog-VLSI artificial neural network (with 32 neurons and 532 synapses) chips are studied by exposing them to high energy electrons, high energy protons, and gamma ionizing radiations under biased and unbiased conditions. The biased chips became nonfunctional after receiving a cumulative dose of less than 20 krads, while the unbiased chips only started to show degradation with a cumulative dose of over 100 krads. As the total radiation dose increased, all the components demonstrated graceful degradation. The analog sigmoidal function of the neuron became steeper (increase in gain), current leakage from the synapses progressively shifted the sigmoidal curve, and the digital memory of the synapses and the memory addressing circuits began to gradually fail. From these radiation experiments, we can learn how to modify certain designs of the neural network electronic hardware without using radiation-hardening techniques to increase its reliability and fault tolerance.

  17. Fault-Tolerant Quantum Computation With Constant Error Rate

    CERN Document Server

    Aharonov, D; Aharonov, Dorit; Ben-Or, Michael

    1999-01-01

    This paper proves the threshold result, which asserts that quantum computation can be made robust against errors and inaccuracies, when the error rate, $\eta$, is smaller than a constant threshold, $\eta_c$. The result holds for a very general, not necessarily probabilistic noise model, for quantum particles with any number of states, and is also generalized to one dimensional quantum computers with only nearest neighbor interactions. No measurements, or classical operations, are required during the quantum computation. The proceeding version was very succinct, and here we fill in all the missing details, and elaborate on many parts of the proof. In particular, we devote a section to a discussion of universality issues and proofs that the sets of gates that we use are universal. Another section is devoted to a rigorous proof that fault tolerance can be achieved in the presence of general non probabilistic noise. The systematic structure of the fault tolerant procedures for polynomial codes is explained in lengt...

  18. Proactive and Reactive View Change for Fault Tolerant Byzantine Agreement

    Directory of Open Access Journals (Sweden)

    Poonam Saini

    2011-01-01

    Problem statement: Dealing with arbitrary failures effectively, while reaching agreement, remains a major operational challenge in distributed transactions. In the contemporary literature, standard protocols such as Byzantine Fault Tolerant Distributed Commit (BFTDC) and Practical Byzantine Fault Tolerance (PBFT) handle the problem to a large extent. However, the limitation of these protocols is that they incur increased message overhead as well as large latency. Approach: To improve the failure resiliency with minimum execution overhead, we propose two new protocols based on proactive view change and reactive view change. Both approaches have also been analyzed and compared. Results: Our dynamic analysis shows that, in a faulty scenario, the proactive approach is computationally more efficient, with reduced latency, compared to the reactive one. Conclusion/Recommendations: Moreover, unlike PBFT and BFTDC, our agreement protocol runs in two phases, which leads to reduced message overhead and total execution time.

  19. Fault-Tolerant Quantum Computation with Local Gates

    CERN Document Server

    Gottesman, D

    1999-01-01

    I discuss how to perform fault-tolerant quantum computation with concatenated codes using local gates in small numbers of dimensions. I show that a threshold result still exists in three, two, or one dimensions when next-to-nearest-neighbor gates are available, and present explicit constructions. In two or three dimensions, I also show how nearest-neighbor gates can give a threshold result. In all cases, I simply demonstrate that a threshold exists, and do not attempt to optimize the error correction circuit or determine the exact value of the threshold. The additional overhead due to the fault-tolerance in both space and time is polylogarithmic in the error rate per logical gate.

  20. Compilation and Synthesis for Fault-Tolerant Digital Microfluidic Biochips

    DEFF Research Database (Denmark)

    Alistar, Mirela

    2014-01-01

    Microfluidic-based biochips are replacing the conventional biochemical analyzers, by integrating all the necessary functions for biochemical analysis using microfluidics. The digital microfluidic biochips (DMBs) manipulate discrete amounts of fluids of nanoliter volume, named droplets, on an array of electrodes to perform operations such as dispensing, transport, mixing, split, dilution and detection. Researchers have proposed compilation approaches, which, starting from a biochemical application and a biochip architecture, determine the allocation, resource binding, scheduling, placement and routing of the operations in the application. During the execution of a bioassay, operations could experience transient faults, thus impacting negatively the correctness of the application. We have proposed both offline (design time) and online (runtime) recovery strategies. The online recovery strategy decides the introduction of the redundancy required for fault-tolerance. We consider both time redundancy, i.e., re-executing erroneous operations, and space redundancy, i.e., creating redundant droplets for fault-tolerance. Error recovery is performed such that the number of transient faults tolerated is maximized and the timing constraints of the biochemical application are satisfied. Previous work has assumed that the biochip architecture is given, and most approaches consider a rectangular shape for the electrode array, where operations execute on rectangular “modules” formed of electrodes. However, non-regular application-specific architectures are common in practice. Hence, we have proposed an approach to the synthesis of application-specific architectures, such that the cost is minimized and the timing constraints of the application are satisfied. We propose an algorithm to build a library of non-regular modules for a given applicationspecific architecture, so that the area of a non-regular application-specific biochip can be used effectively. 
During fabrication, DMBs can be affected by permanent faults, which may lead to the failure of the application. Our approach introduces redundant electrodes to synthesize fault-tolerant architectures aiming at increasing the yield of DMBs. We also propose a method to estimate, at design time, the application completion time in case of permanent faults in order to verify if an application can be successfully run on the architecture. The proposed approaches were evaluated using several real-life case studies and synthetic benchmarks.
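
    The time-redundancy idea in the abstract above (re-executing an erroneous operation while still meeting the timing constraint) can be illustrated with a minimal sketch. The function name `max_reexecutions` and the numbers are hypothetical, and the sketch assumes a single operation with a fixed duration; the thesis' actual recovery strategies are not reproduced here.

```python
def max_reexecutions(op_duration, finish_time, deadline):
    """Return how many times one operation could be re-executed before the deadline.

    Assumption (not from the source): each re-execution takes the same
    op_duration and nothing else competes for the remaining slack.
    """
    if op_duration <= 0:
        raise ValueError("op_duration must be positive")
    slack = deadline - finish_time
    # Negative slack means the deadline is already missed: no re-execution fits.
    return max(0, int(slack // op_duration))


# Hypothetical numbers: a 3 s mixing operation finishing at t = 10 s, deadline 20 s.
budget = max_reexecutions(op_duration=3, finish_time=10, deadline=20)
```

    With these numbers three re-executions fit into the 10 s of slack; a scheduler could use such a budget to decide between time redundancy and space redundancy.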

  1. Unconstrained and Constrained Fault-Tolerant Resource Allocation

    OpenAIRE

    Liao, Kewen; Shen, Hong

    2011-01-01

    First, we study the Unconstrained Fault-Tolerant Resource Allocation (UFTRA) problem (a.k.a. FTFA problem in \cite{shihongftfa}). In the problem, we are given a set of sites equipped with an unconstrained number of facilities as resources, and a set of clients with set $\mathcal{R}$ as corresponding connection requirements, where every facility belonging to the same site has an identical opening (operating) cost and every client-facility pair has a connection cost. The objec...

  2. Fault-Tolerant Target Localization in Sensor Networks

    OpenAIRE

    Chen Dechang; Ding Min; Liu Fang; Thaeler Andrew; Cheng Xiuzhen

    2007-01-01

    Fault-tolerant target detection and localization is a challenging task in collaborative sensor networks. This paper introduces our exploratory work toward identifying the targets in sensor networks with faulty sensors. We explore both spatial and temporal dimensions for data aggregation to decrease the false alarm rate and improve the target position accuracy. To filter out extreme measurements, the median of all readings in a close neighborhood of a sensor is used to approximate its local o...
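
    The median-based filtering mentioned in the abstract above can be sketched as follows. The function name `neighborhood_median` and the sample readings are hypothetical; the paper's full aggregation scheme may differ.

```python
from statistics import median


def neighborhood_median(readings):
    """Return the median of the readings taken in a sensor's close neighborhood.

    A single faulty sensor reporting an extreme value barely moves the median,
    whereas it would shift the mean substantially.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    return median(readings)


# Hypothetical neighborhood: four healthy sensors and one faulty outlier.
readings = [20.1, 19.8, 20.3, 20.0, 95.0]
estimate = neighborhood_median(readings)
```

    Here the outlier of 95.0 does not affect the estimate, which stays at the middle healthy reading.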

  3. Data Structures: Sequence Problems, Range Queries, and Fault Tolerance

    DEFF Research Database (Denmark)

    Jørgensen, Allan Grønlund

    2010-01-01

    The focus of this dissertation is on algorithms, in particular data structures that give provably efficient solutions for sequence analysis problems, range queries, and fault tolerant computing. The work presented in this dissertation is divided into three parts. In Part I we consider algorithms for a range of sequence analysis problems that have arisen from applications in pattern matching, bioinformatics, and data mining. On a high level, each problem is defined by a function and some constraints an...

  4. Effect Analysis of Faults in Digital I and C Systems of Nuclear Power Plants

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Seung Jun; Jung, Won Dea [KAERI, Dajeon (Korea, Republic of)]; Kim, Man Cheol [Chung-Ang University, Seoul (Korea, Republic of)]

    2014-08-15

    A reliability analysis of digital instrumentation and control (I and C) systems in nuclear power plants has been introduced as one of the important elements of a probabilistic safety assessment because of the unique characteristics of digital I and C systems. Digital I and C systems have various features distinguishable from those of analog I and C systems such as software and fault-tolerant techniques. In this work, the faults in a digital I and C system were analyzed and a model for representing the effects of the faults was developed. First, the effects of the faults in a system were analyzed using fault injection experiments. A software-implemented fault injection technique in which faults can be injected into the memory was used based on the assumption that all faults in a system are reflected in the faults in the memory. In the experiments, the effect of a fault on the system output was observed. In addition, the success or failure in detecting the fault by fault-tolerant functions included in the system was identified. Second, a fault tree model for representing that a fault is propagated to the system output was developed. With the model, it can be identified how a fault is propagated to the output or why a fault is not detected by fault-tolerant techniques. Based on the analysis results of the proposed method, it is possible to not only evaluate the system reliability but also identify weak points of fault-tolerant techniques by identifying undetected faults. The results can be reflected in the designs to improve the capability of fault-tolerant techniques.
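
    A software-implemented fault injection experiment of the kind described above can be sketched in a few lines. Everything here is an illustrative assumption: the memory is a plain `bytearray`, the fault model is a single-bit flip, and a simple additive checksum stands in for the system's fault-tolerant detection function.

```python
def inject_bit_flip(memory, byte_index, bit_index):
    """Flip one bit of a bytearray in place to emulate a memory fault."""
    memory[byte_index] ^= 1 << bit_index


def checksum(memory):
    """Additive checksum modulo 256 (a stand-in for a fault-tolerant function)."""
    return sum(memory) % 256


def run_experiment(memory, flips):
    """Inject (byte_index, bit_index) flips into a copy and report detection."""
    faulty = bytearray(memory)
    reference = checksum(faulty)
    for byte_index, bit_index in flips:
        inject_bit_flip(faulty, byte_index, bit_index)
    return checksum(faulty) != reference


memory = bytearray([0x11, 0x20, 0x30, 0x40])
# One flip changes the sum by +1, so the checksum detects it.
single_detected = run_experiment(memory, [(2, 0)])
# Two flips contribute -1 and +1, cancel out, and go undetected.
double_detected = run_experiment(memory, [(0, 0), (2, 0)])
```

    The second experiment is an example of an undetected fault, which is the kind of weak point the abstract proposes to identify.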

  5. Effect Analysis of Faults in Digital I and C Systems of Nuclear Power Plants

    International Nuclear Information System (INIS)

    A reliability analysis of digital instrumentation and control (I and C) systems in nuclear power plants has been introduced as one of the important elements of a probabilistic safety assessment because of the unique characteristics of digital I and C systems. Digital I and C systems have various features distinguishable from those of analog I and C systems such as software and fault-tolerant techniques. In this work, the faults in a digital I and C system were analyzed and a model for representing the effects of the faults was developed. First, the effects of the faults in a system were analyzed using fault injection experiments. A software-implemented fault injection technique in which faults can be injected into the memory was used based on the assumption that all faults in a system are reflected in the faults in the memory. In the experiments, the effect of a fault on the system output was observed. In addition, the success or failure in detecting the fault by fault-tolerant functions included in the system was identified. Second, a fault tree model for representing that a fault is propagated to the system output was developed. With the model, it can be identified how a fault is propagated to the output or why a fault is not detected by fault-tolerant techniques. Based on the analysis results of the proposed method, it is possible to not only evaluate the system reliability but also identify weak points of fault-tolerant techniques by identifying undetected faults. The results can be reflected in the designs to improve the capability of fault-tolerant techniques

  6. A model–based approach to fault–tolerant control

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik

    2012-01-01

    A model-based controller architecture for Fault-Tolerant Control (FTC) is presented in this paper. The controller architecture is based on a general controller parameterization. The FTC architecture consists of two main parts, a Fault Detection and Isolation (FDI) part and a controller reconfiguration part. The theoretical basis for the architecture is given followed by an investigation of the single parts in the architecture. It is shown that the general controller parameterization is central in connection with both fault diagnosis as well as controller reconfiguration. Especially in relation to the controller reconfiguration part, the application of controller parameterization results in a systematic technique for switching between different controllers. This also allows controller switching using different sets of actuators and sensors.

  7. BFTDT: Byzantine Fault Tolerance tryout for Dependable Transactions in Cloud

    Directory of Open Access Journals (Sweden)

    Gayathri S

    2012-11-01

    Cloud Web Services (CWS) is the technology used for business collaboration and integration among web users. Web Services Atomic Transactions (WS-AT) have been used for trusted distributed transaction processing over the web. In a distributed setting, WS-AT is exposed to Byzantine faults; to overcome them, Byzantine Fault Tolerance (BFT) techniques are used. The reliable coordinator provides Coordination, Activation, Registration and Completion services, which make the transaction effective and reliable. In the trusted environment, to avoid congestion of the resources, a fair-share bandwidth allocation scheme is used to allocate separate bandwidth for each web user, and the transaction is processed by the Coordinator server and the Transaction Processing Monitor (TPM). The analysis of WS-AT for business applications shows a high degree of dependability, security, trust, fault tolerance and fairness of the resources in the trusted environment.

  8. Experimental Robot Position Sensor Fault Tolerance Using Accelerometers and Joint Torque Sensors

    Science.gov (United States)

    Aldridge, Hal A.; Juang, Jer-Nan

    1997-01-01

    Robot systems in critical applications, such as those in space and nuclear environments, must be able to operate during component failure to complete important tasks. One failure mode that has received little attention is the failure of joint position sensors. Current fault tolerant designs require the addition of directly redundant position sensors which can affect joint design. The proposed method uses joint torque sensors found in most existing advanced robot designs along with easily locatable, lightweight accelerometers to provide a joint position sensor fault recovery mode. This mode uses the torque sensors along with a virtual passive control law for stability and accelerometers for joint position information. Two methods for conversion from Cartesian acceleration to joint position based on robot kinematics, not integration, are presented. The fault tolerant control method was tested on several joints of a laboratory robot. The controllers performed well with noisy, biased data and a model with uncertain parameters.

  9. A Fault Tolerance Management Framework for Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Hesham El-Sayed

    2007-06-01

    Wireless Sensor Networks (WSNs) have the potential of significantly enhancing our ability to monitor and interact with our physical environment. Realizing a fault tolerant operation is critical to the success of WSNs. The main challenge is providing fault tolerance (FT) while conserving the limited resources of the network. Many schemes have been proposed in this area. Our main contribution in this paper is to propose a general framework for fault tolerance in WSNs. The proposed framework can be used to guide the design and development of FT solutions and to evaluate existing ones. We present a comparative study of the existing schemes and identify potential enhancements. A primary module of the framework is the learning and refinement module, which enables an FT solution to be adaptive and self-configurable based on changes in the network conditions. We view this as vital to the resource-constrained and highly dynamic WSNs. To our knowledge, we are the first to propose the implementation of such a module in FT solutions for WSNs.

  10. Faster quantum chemistry simulation on fault-tolerant quantum computers

    International Nuclear Information System (INIS)

    Quantum computers can in principle simulate quantum physics exponentially faster than their classical counterparts, but some technical hurdles remain. We propose methods which substantially improve the performance of a particular form of simulation, ab initio quantum chemistry, on fault-tolerant quantum computers; these methods generalize readily to other quantum simulation problems. Quantum teleportation plays a key role in these improvements and is used extensively as a computing resource. To improve execution time, we examine techniques for constructing arbitrary gates which perform substantially faster than circuits based on the conventional Solovay–Kitaev algorithm (Dawson and Nielsen 2006 Quantum Inform. Comput. 6 81). For a given approximation error ε, arbitrary single-qubit gates can be produced fault-tolerantly and using a restricted set of gates in time which is O(log ε) or O(log log ε); with sufficient parallel preparation of ancillas, constant average depth is possible using a method we call programmable ancilla rotations. Moreover, we construct and analyze efficient implementations of first- and second-quantized simulation algorithms using the fault-tolerant arbitrary gates and other techniques, such as implementing various subroutines in constant time. A specific example we analyze is the ground-state energy calculation for lithium hydride. (paper)

  11. Fault tolerant control in the case of actuator type of faults based on derivative estimation; Fehlertolerante Regelung bei aktuatoraehnlichen Fehlern mittels Ableitungsschaetzung

    Energy Technology Data Exchange (ETDEWEB)

    Mai, Philipp; Hillermeier, Claus [Univ. der Bundeswehr Muenchen, Neubiberg (Germany). Professur fuer Automatisierungs- und Regelungstechnik]

    2010-07-01

    In this article, we present an FTC architecture in which generalized actuator faults are diagnosed online and compensated for by the control system. Employing least-squares derivative estimators to identify the faults introduces delay times into the control loop. In the course of a stability analysis, tolerable delay times are determined, from which admissible values of the process parameters can be deduced. The FTC scheme is illustrated using the classical three-tank system. (orig.)
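
    A least-squares derivative estimator can be sketched as the slope of a straight line fitted to a sliding window of samples. This is a generic textbook construction under stated assumptions (uniform sampling, first-order fit, the hypothetical name `ls_derivative`), not necessarily the estimator used in the article.

```python
def ls_derivative(samples, dt):
    """Estimate the derivative as the slope of a least-squares line through the window.

    samples: consecutive measurements taken every dt seconds (oldest first).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    times = [k * dt for k in range(n)]
    t_mean = sum(times) / n
    y_mean = sum(samples) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, samples))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den


# Hypothetical window: y = 3 t + 1 sampled at dt = 0.5 s.
window = [3.0 * (k * 0.5) + 1.0 for k in range(5)]
slope = ls_derivative(window, dt=0.5)
```

    Because the fitted slope is most representative of the middle of the window, a longer window smooths noise better but adds roughly half a window length of delay, which is the kind of delay the stability analysis has to tolerate.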

  12. Realization of Fault Tolerant Routing Protocol for Zigbee

    Directory of Open Access Journals (Sweden)

    Sankaranarayanan

    2012-01-01

    Problem statement: Increased use of handheld devices and sensor devices poses problems for existing routing protocols. The performance of the existing routing protocols deteriorates considerably in these dense scenarios. Control overhead, which is introduced during the route discovery and maintenance process, is a very important parameter in deciding the performance of routing protocols. The denser the network, the higher the control overhead in establishing and maintaining the communication path between end systems. This study aims at implementing an improved fault tolerant routing algorithm that minimizes the routing overhead for ad hoc networks using Zigbee. Approach: We propose a routing protocol which minimizes the routing overhead by exploiting the network density. The number of nodes involved in handling the control packets is minimized in the proposed protocol by selecting a few of the neighbors of each node based on the received signal strength. Link breaks are handled locally, thereby reducing the control overhead in the network. Results: The performance of the proposed protocol is tested using the OMNeT++ simulator. The implementation using Zigbee nodes indicates that the control overhead is reduced by up to 80% in dense environments and 60% in heterogeneous and sparse environments, thereby saving energy in the sensor nodes. Conclusion: The proposed protocol increases energy conservation and hence the nodes' lifetime and the network's lifetime.
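
    The neighbor-selection step in the approach above can be sketched as follows. The abstract only says that a few neighbors are chosen based on received signal strength; picking the `k` strongest ones, the function name `select_forwarders`, and the RSSI values are all assumptions made for illustration.

```python
def select_forwarders(neighbors, k):
    """Pick the k neighbors with the strongest received signal strength.

    neighbors: mapping from node id to RSSI in dBm (closer to 0 is stronger).
    """
    ranked = sorted(neighbors.items(), key=lambda item: item[1], reverse=True)
    return [node for node, _rssi in ranked[:k]]


# Hypothetical RSSI table for one node in a dense deployment.
neighbors = {"A": -48, "B": -71, "C": -55, "D": -90}
forwarders = select_forwarders(neighbors, k=2)
```

    Only the selected forwarders would then relay control packets, which is how limiting `k` keeps the overhead from growing with network density.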

  13. Arc fault detection system

    Science.gov (United States)

    Jha, Kamal N. (Bethel Park, PA)

    1999-01-01

    An arc fault detection system for use on ungrounded or high-resistance-grounded power distribution systems is provided which can be retrofitted outside electrical switchboard circuits having limited space constraints. The system includes a differential current relay that senses a current differential between current flowing from secondary windings located in a current transformer coupled to a power supply side of a switchboard, and a total current induced in secondary windings coupled to a load side of the switchboard. When such a current differential is experienced, a current travels through an operating coil of the differential current relay, which in turn opens an upstream circuit breaker located between the switchboard and a power supply to remove the supply of power to the switchboard.
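
    The differential principle described above can be expressed as a simple comparison. This is only a numerical illustration with a hypothetical pickup threshold and function name; the patented relay is an analog device and its settings are not given in the abstract.

```python
def relay_should_trip(supply_current, load_currents, pickup):
    """Return True when the supply-side current differs from the total
    load-side current by more than the pickup threshold (all in amperes)."""
    differential = abs(supply_current - sum(load_currents))
    return differential > pickup


# Healthy switchboard: everything entering on the supply side leaves via the loads.
normal = relay_should_trip(100.0, [40.0, 35.0, 25.0], pickup=5.0)
# Arc fault inside the switchboard: 30 A is unaccounted for on the load side.
fault = relay_should_trip(130.0, [40.0, 35.0, 25.0], pickup=5.0)
```

    A tripped result corresponds to current flowing through the operating coil, which opens the upstream breaker.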

  14. Fault-Tolerant Vision for Vehicle Guidance in Agriculture

    DEFF Research Database (Denmark)

    Blas, Morten Rufus

    2010-01-01

    The emergence of widely available vision technologies is enabling a wide range of automation tasks in industry and other areas. Agricultural vehicle guidance systems have benefitted from advances in 3D vision based on stereo camera technology. By automatically guiding vehicles along crops and other field structures the operator’s stress levels can be reduced. High precision steering in sensitive crops can also be maintained for longer periods of time as the driver is less tired. Safety and availability must be inherent in such systems in order to get widespread market acceptance. To tolerate dropout of 3D vision, faults in classification, or other defects, redundant information should be utilized. Such information can be used to diagnose faulty behavior and to temporarily continue operation with a reduced set of sensors when faults or artifacts occur. Additional sensors include GPS receivers and inertial sensors. To fully utilize the possibilities in 3D vision, the system must also be able to learn and adapt to changing environments. By learning features of the environment new diagnostic relations can be generated by creating redundant feed-forward information about crop location. Also, by mapping the field that is seen by the stereo camera, it is possible to support the guidance system by storing salient information about the environment. By tracking the motion of the vehicle, vision output can be fused over time to create more reliable and robust estimates of crop location. This thesis approaches these challenges by considering systematic design methods using graph-based analysis. It is demonstrated how diagnostic relations can be derived and remedial actions can be taken to maintain safety and healthy functioning of vision systems. The combination of redundant information from 3D vision, mapping, and aiding sensors such as GPS provides means to detect and isolate single faults in the system.
In addition, learning is employed to adapt the system to variational changes in the natural environment. 3D vision is enhanced by learning texture and color information. Intensity gradients on small neighborhoods of pixels are shown to provide a superior approach to modeling texture information compared with other methods. Stochastic automata using optimally quantized data are demonstrated to be a strong approach for offline learning. It is considered how 3D vision provides labeling of training data that subsequently can be fed into a learning system. Statistical change detection theory is shown to be a suitable approach to detecting artifacts in the learning process so safe operation can be maintained. The system can be used to perform real-time classification using a fast online approach that is superior to the state of the art. Advances in tracking vehicle motion using 3D vision are demonstrated to allow unprecedented high accuracy maps to be created of the local environment. Features in the environment are extracted and tracked using novel feature detectors relying on approximating the Laplacian operator with a bi-level octagonal kernel. It is shown how these features display high levels of accuracy and stability while being considerably faster than similar feature detectors. Artifacts in 3D vision range measurements are demonstrated to be detectable by using the generated 3D maps and a probabilistic approach to fusing and comparing range measurements.

  15. Fault Diagnosis and Fault Tolerant Control with Application on a Wind Turbine Low Speed Shaft Encoder

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Sardi, Hector Eloy Sanchez

    2015-01-01

    In recent years, individual pitch control has been developed for wind turbines, with the purpose of reducing blade and tower loads. Such algorithms depend on reliable sensor information. The azimuth angle sensor, which positions the wind turbine rotor in its rotation, is quite important. This sensor has to be correct, as blade pitch actions should differ at different azimuth angles because the wind speed varies within the rotor field due to different phenomena. A scheme detecting faults in this sensor has previously been designed for the application of a high end fault diagnosis and fault tolerant control of wind turbines using a benchmark model. In this paper, the fault diagnosis scheme is improved and integrated with a fault accommodation scheme which enables and disables the individual pitch algorithm based on the fault detection. In this way, the blade and tower loads are not increased due to the individual pitch control algorithm operating with faulty azimuth angle inputs. The proposed approach is evaluated on a wind turbine benchmark model, which is based on the FAST aero-elastic code provided by NREL.
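
    The accommodation logic described above (disable individual pitch control while the azimuth sensor is flagged faulty) can be sketched as a simple switch. The function name `pitch_commands`, the three-blade example, and the angle values are hypothetical; the paper's detection scheme and controller are not reproduced.

```python
def pitch_commands(collective, individual_offsets, azimuth_fault):
    """Return one pitch angle per blade (degrees).

    When the azimuth sensor is flagged faulty, the individual offsets are
    dropped and every blade receives the collective pitch angle only.
    """
    if azimuth_fault:
        return [collective] * len(individual_offsets)
    return [collective + offset for offset in individual_offsets]


# Hypothetical three-blade example.
healthy = pitch_commands(5.0, [0.5, -0.25, -0.25], azimuth_fault=False)
degraded = pitch_commands(5.0, [0.5, -0.25, -0.25], azimuth_fault=True)
```

    Falling back to collective pitch gives up the load reduction but avoids applying offsets computed from a wrong rotor position.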

  16. Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing

    OpenAIRE

    Jayadivya S K; Jaya Nirmala S; Mary Saira Bhanu S

    2012-01-01

    The aim of workflow scheduling system is to schedule the workflows within the user given deadline to achieve a good success rate. Workflow is a set of tasks processed in a predefined order based on its data and control dependency. Scheduling these workflows in a computing environment, like cloud environment, is an NP-Complete problem and it becomes more challenging when failures of tasks are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In thi...

  17. Fault Tolerant Electrical Machines. State of the Art and Future Directions

    OpenAIRE

    RUBA Mircea; SZABÓ Loránd

    2008-01-01

    Electrical engineering has recently seen a successful expansion in the area of fault tolerant electrical machines. To achieve fault tolerance, researchers have tried various geometries and different electrical drives. When new designs are to be carried out, knowledge of the current state of the work is urgently needed. The paper summarizes the most important information on these topics. Both fault tolerant machine and drive structures were taken into account....

  18. Performance Studies of Fault-Tolerant Middleware

    OpenAIRE

    Szentiványi, Diana

    2005-01-01

    Today’s software engineering and application development trend is to take advantage of reusable software. Much effort is directed towards easing the task of developing complex, distributed, network based applications with reusable components. To ease the task of the distributed systems’ developers, one can use middleware, i.e. a software layer between the operating system and the application, which handles distribution transparently. A crucial feature of distributed server applications is hig...

  19. Prognostics Enhancemend Fault-Tolerant Control with an Application to a Hovercraft Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Fault-Tolerant Control (FTC) is an emerging area of engineering and scientific research that integrates prognostics, health management concepts and intelligent...

  20. Fault Tolerant Electrical Machines. State of the Art and Future Directions

    Directory of Open Access Journals (Sweden)

    Mircea RUBA

    2008-05-01

    Electrical engineering has recently seen a successful expansion in the area of fault tolerant electrical machines. To achieve fault tolerance, researchers have tried various geometries and different electrical drives. When new designs are to be carried out, knowledge of the current state of the work is urgently needed. The paper summarizes the most important information on these topics. Both fault tolerant machine and drive structures were taken into account. The paper also presents a new idea for a fault tolerant switched reluctance machine having a special winding. The future tasks to be performed are also mentioned in the paper.

  1. Byzantine Fault Tolerance of Regenerating Codes

    CERN Document Server

    Oggier, Frédérique

    2011-01-01

    Recent years have witnessed a slew of coding techniques custom designed for networked storage systems. Network coding inspired regenerating codes are the most prolifically studied among these new age storage centric codes. A lot of effort has been invested in understanding the fundamental achievable trade-offs of storage and bandwidth usage to maintain redundancy in presence of different models of failures, showcasing the efficacy of regenerating codes with respect to traditional erasure coding techniques. For practical usability in open and adversarial environments, as is typical in peer-to-peer systems, we need however not only resilience against erasures, but also from (adversarial) errors. In this paper, we study the resilience of generalized regenerating codes (supporting multi-repairs, using collaboration among newcomers) in the presence of two classes of Byzantine nodes, relatively benign selfish (non-cooperating) nodes, as well as under more active, malicious polluting nodes. We give upper bounds on t...

  2. Two New Protocols for Fault Tolerant Agreement

    Directory of Open Access Journals (Sweden)

    Poonam Saini

    2011-02-01

    The paper attempts to handle failures effectively, while reaching agreement, in a distributed transaction processing system. Standard protocols such as BFTDC [3], Zyzzyva [4] and PBFT [5] handle the problem to a great extent. However, the limitation of these protocols is that they incur increased message overhead as well as large latency. Moreover, the nodes are removed from the transaction system after being declared faulty. We propose a novel proactive agreement protocol which identifies the tentative failures in the system. To improve the failure resiliency with minimum execution overhead, we also propose an optimized reactive view change mechanism. Both mechanisms have been analyzed and compared. The dynamic analysis of the protocol shows that, in a faulty scenario, the proactive approach is computationally more efficient, with reduced latency, compared to the reactive one. Moreover, unlike PBFT and BFTDC, our agreement protocol runs in two phases, which leads to reduced message overhead and total execution time. The protocol treats the fail-silent (i.e. crashed) nodes in the system.
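
    As background for the PBFT baseline cited in the abstract above, the standard sizing rule for Byzantine agreement can be written down directly: tolerating f Byzantine replicas requires at least 3f + 1 replicas, and PBFT uses quorums of 2f + 1. The helper names below are hypothetical, and the sketch says nothing about the two-phase protocol proposed in the paper itself.

```python
def min_replicas(f):
    """Smallest number of replicas that tolerates f Byzantine faults (3f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def quorum_size(f):
    """Quorum size used by PBFT when tolerating f Byzantine faults (2f + 1)."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def max_byzantine_faults(n):
    """Largest f that a group of n replicas can tolerate under the 3f + 1 rule."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


replicas_for_one_fault = min_replicas(1)
```

    For example, a four-replica group tolerates one Byzantine replica, and growing to six replicas does not raise that bound; seven are needed to tolerate two.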

  3. A Benchmark Evaluation of Fault Tolerant Wind Turbine Control Concepts

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2015-01-01

    As the world’s power supply to a larger and larger degree depends on wind turbines, it is consequently and increasingly important that these are as reliable and available as possible. Modern fault tolerant control (FTC) could play a substantial part in increasing reliability of modern wind turbines. A benchmark model for wind turbine fault detection and isolation, and FTC has previously been proposed. Based on this benchmark, an international competition on wind turbine FTC was announced. In this brief, the top three solutions from that competition are presented and evaluated. The analysis shows that all three methods and, in particular, the winner of the competition show potential for wind turbine FTC. In addition to showing good performance, the winning approach is based on a method which is relevant for industrial usage. It is based on a virtual sensor and actuator strategy, in which the fault accommodation is handled in software sensor and actuator blocks. This means that the wind turbine controller can continue operation as in the fault free case. The other two evaluated solutions show some potential but probably need improvements before industrial applications.

  4. Thermal Fault Tolerance Analysis of Carbon Fiber Rope Barrier Systems for Use in the Reusable Solid Rocket Motor (RSRM) Nozzle Joints

    Science.gov (United States)

    Clayton, J. Louie; Phelps, Lisa (Technical Monitor)

    2001-01-01

    Carbon Fiber Rope (CFR) thermal barrier systems are being considered for use in several RSRM (Reusable Solid Rocket Motor) nozzle joints as a replacement for the current assembly gap close-out process/design. This study provides for development and test verification of analysis methods used for flow-thermal modeling of a CFR thermal barrier subject to fault conditions such as rope combustion gas blow-by and CFR splice failure. Global model development is based on a 1-D (one dimensional) transient volume filling approach where the flow conditions are calculated as a function of internal 'pipe' and porous media 'Darcy' flow correlations. Combustion gas flow rates are calculated for the CFR on a per-linear inch basis and solved simultaneously with a detailed thermal-gas dynamic model of a local region of gas blow-by (or splice fault). Effects of gas compressibility, friction and heat transfer are accounted for in the model. Computational Fluid Dynamic (CFD) solutions of the fault regions are used to characterize the local flow field, quantify the amount of free jet spreading and assist in the determination of impingement film coefficients on the nozzle housings. Gas to wall heat transfer is simulated by a large thermal finite element grid of the local structure. The employed numerical technique loosely couples the FE (Finite Element) solution with the gas dynamics solution of the faulted region. All free constants that appear in the governing equations are calibrated by hot fire sub-scale test. The calibrated model is used to make flight predictions using motor aft end environments and timelines. Model results indicate that CFR barrier systems provide a near 'vented joint' style of pressurization. Hypothetical fault conditions considered in this study (blow-by, splice defect) are relatively benign in terms of overall heating to nozzle metal housing structures.

  5. Fault-tolerant digital microfluidic biochips compilation and synthesis

    CERN Document Server

    Pop, Paul; Stuart, Elena; Madsen, Jan

    2016-01-01

    This book describes for researchers in the fields of compiler technology, design and test, and electronic design automation the new area of digital microfluidic biochips (DMBs), and thus offers a new application area for their methods. The authors present a routing-based model of operation execution, along with several associated compilation approaches, which progressively relax the assumption that operations execute inside fixed rectangular modules. Since operations can experience transient faults during the execution of a bioassay, the authors show how to use both offline (design time) and online (runtime) recovery strategies. The book also presents methods for the synthesis of fault-tolerant application-specific DMB architectures. · Presents the current models used for the research on compilation and synthesis techniques of DMBs in a tutorial fashion; · Includes a set of “benchmarks”, which are presented in great detail and includes the source code of most of the t...

  6. Close range fault tolerant noncontacting position sensor

    Science.gov (United States)

    Bingham, Dennis N. (Idaho Falls, ID); Anderson, Allen A. (Shelley, ID)

    1996-01-01

    A method and system for locating the three dimensional coordinates of a moving or stationary object in real time. The three dimensional coordinates of an object in half space or full space are determined based upon the time of arrival or phase of the wave front measured by a plurality of receiver elements and on established vector magnitudes proportional to the measured time of arrival or phase at each receiver element. The coordinates of the object are calculated by solving a matrix equation or a set of closed form algebraic equations.

  7. Reliability Improvement of a T-Type Three-Level Inverter With Fault-Tolerant Control Strategy

    DEFF Research Database (Denmark)

    Choi, Uimin; Blaabjerg, Frede

    2015-01-01

    This paper proposes a fault-tolerant control strategy for a T-type three-level inverter when an open-circuit fault occurs. The proposed method is explained by dividing fault into two cases: the faulty condition of half-bridge switches and neutral-point switches. In case of the open-circuit fault in a neutral-point switch, two methods will be proposed and compared based on thermal analysis and neutral-point voltage oscillation. The reliability of T-type inverter systems is improved considerably by the proposed algorithm when a switch fails. The proposed method does not require any additional components. Simulation and experimental results verify the validity and feasibility of the proposed fault-tolerant control strategy.

  8. Automatic distribution fault locating system

    Energy Technology Data Exchange (ETDEWEB)

    Hager, G.E. [Electrical Systems Consultants, Inc., Fort Collins, CO (United States); Bear, R.N.M. [City of Aztec, NM (United States); Baum, A.S. [Scipar, Inc., Buffalo, NY (United States)

    1995-12-31

    An automated fault locating system was designed and implemented for the Colorado River Agency (CRA) 12.5/7.2 kV distribution system. This automated fault locating system (FLS) was integrated into the Supervisory Control and Data Acquisition (SCADA) system which was installed at a hydro power plant. This FLS offers several benefits to the CRA distribution system. These benefits include: reduced outage time; help in locating momentary faults; enhanced safety to the line crews; provide notification of an outage without receiving calls from the consumer; and decreased overtime.

  9. Data center networks topologies, architectures and fault-tolerance characteristics

    CERN Document Server

    Liu, Yang; Veeraraghavan, Malathi; Lin, Dong; Hamdi, Mounir

    2013-01-01

    This SpringerBrief presents a survey of data center network designs and topologies and compares several properties in order to highlight their advantages and disadvantages. The brief also explores several routing protocols designed for these topologies and compares the basic algorithms to establish connections, the techniques used to gain better performance, and the mechanisms for fault-tolerance. Readers will be equipped to understand how current research on data center networks enables the design of future architectures that can improve performance and dependability of data centers. This con

  10. A comparative code study for quantum fault-tolerance

    CERN Document Server

    Cross, Andrew W; Terhal, Barbara M

    2007-01-01

    We study a comprehensive list of quantum codes as candidates of codes to be used at the bottom, physical, level in a fault-tolerant code architecture. Using the Aliferis-Gottesman-Preskill (AGP) ex-Rec method we calculate the pseudo-threshold for these codes against depolarizing noise at various levels of overhead. We estimate the logical noise rate as a function of overhead at a physical error rate of $p_0=1\times 10^{-4}$. The Bacon-Shor codes and the Golay code are the best performers in our study.

  11. Checkpoint and Replication Oriented Fault Tolerant Mechanism for MapReduce Framework

    Directory of Open Access Journals (Sweden)

    Yang Liu

    2013-09-01

    MapReduce is an emerging programming paradigm and an associated implementation for processing and generating big data, and it has been widely applied in data-intensive systems. In cloud environments, node and task failure is no longer accidental but a common feature of large-scale systems. In the MapReduce framework, although the rescheduling-based fault-tolerant method is simple to implement, it fails to fully consider the location of distributed data and the computation and storage overhead. Thus, a single node failure will increase the completion time dramatically. In this paper, a Checkpoint and Replication Oriented Fault Tolerant scheduling algorithm (CROFT) is proposed, which takes both task and node failure into consideration. Preliminary experiments show that, with less storage and network overhead, CROFT significantly reduces the completion time when failures occur, and that the overall performance of MapReduce can be improved by over 30% compared with the original mechanism in Hadoop.

  12. 2009 fault tolerance for extreme-scale computing workshop, Albuquerque, NM - March 19-20, 2009.

    Energy Technology Data Exchange (ETDEWEB)

    Katz, D. S.; Daly, J.; DeBardeleben, N.; Elnozahy, M.; Kramer, B.; Lathrop, S.; Nystrom, N.; Milfeld, K.; Sanielevici, S.; Scott, S.; Votta, L.; Louisiana State Univ.; Center for Exceptional Computing; LANL; IBM; Univ. of Illinois; Shodor Foundation; Pittsburgh Supercomputer Center; Texas Advanced Computing Center; ORNL; Sun Microsystems

    2009-02-01

    This is a report on the third in a series of petascale workshops co-sponsored by Blue Waters and TeraGrid to address challenges and opportunities for making effective use of emerging extreme-scale computing. This workshop was held to discuss fault tolerance on large systems for running large, possibly long-running applications. The main point of the workshop was to have systems people, middleware people (including fault-tolerance experts), and applications people talk about the issues and figure out what needs to be done, mostly at the middleware and application levels, to run such applications on the emerging petascale systems, without having faults cause large numbers of application failures. The workshop found that there is considerable interest in fault tolerance, resilience, and reliability of high-performance computing (HPC) systems in general, at all levels of HPC. The only way to recover from faults is through the use of some redundancy, either in space or in time. Redundancy in time, in the form of writing checkpoints to disk and restarting at the most recent checkpoint after a fault that causes an application to crash or halt, is the most common tool used in applications today, but there are questions about how long this can continue to be a good solution as systems and memories grow faster than I/O bandwidth to disk. There is interest in both modifications to this, such as checkpoints to memory, partial checkpoints, and message logging, and alternative ideas, such as in-memory recovery using residues. We believe that systematic exploration of these ideas holds the most promise for the scientific applications community. Fault tolerance has been an issue of discussion in the HPC community for at least the past 10 years; but much like other issues, the community has managed to put off addressing it during this period. There is a growing recognition that as systems continue to grow to petascale and beyond, the field is approaching the point where we don't have any choice but to address this through R&D efforts.
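
    The checkpoint-and-restart form of time redundancy mentioned in this report can be illustrated with a minimal sketch. Everything here is hypothetical: the toy computation (a running sum), the JSON checkpoint file, the checkpoint interval `every`, and the simulated fault at `crash_at`. Real HPC checkpointing libraries are considerably more involved.

```python
import json
import os
import tempfile

def run(steps, checkpoint_path, every, crash_at=None):
    """Sum 0..steps-1, writing a checkpoint every `every` iterations."""
    state = {"i": 0, "total": 0}
    # Restart path: resume from the most recent checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    while state["i"] < steps:
        if crash_at is not None and state["i"] == crash_at:
            raise RuntimeError("simulated fault")
        state["total"] += state["i"]
        state["i"] += 1
        if state["i"] % every == 0:
            # Write to a temporary file, then rename, so a crash during
            # the write cannot leave a half-written checkpoint behind.
            tmp = checkpoint_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(state, f)
            os.replace(tmp, checkpoint_path)
    return state["total"]

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ckpt.json")
    try:
        run(100, path, every=10, crash_at=57)  # first attempt fails
    except RuntimeError:
        pass
    result = run(100, path, every=10)          # restart from step 50
```

    In this toy run the iterations from 50 to 56 are executed twice. That repeated work, together with the cost of writing each checkpoint, is the overhead the report worries about as memory grows faster than I/O bandwidth.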

  13. Evaluation of Fault Detection Coverage of Digital I and C Systems

    International Nuclear Information System (INIS)

    In fault tolerance evaluation, fault detection coverage is a crucial factor. Fault detection coverage is the ability to detect errors that are caused by faults in a system. If faults are not detected by a given detection algorithm, the system could fail. Evaluating the fault detection coverage of a fault-tolerant technique is important for the safety analysis of digital systems. Digital I and C systems employ a wider variety of fault-tolerant techniques than conventional analog I and C systems. Even though these fault-tolerant techniques are designed to ensure and improve the safety of systems, their effects have not yet been properly considered in most system probabilistic safety assessment (PSA) models. There have been several studies of the reliability of digital systems. However, systematic frameworks or reasonable models for obtaining the reliability of digital systems by considering the effects of fault-tolerant techniques have not been proposed. Therefore, it is necessary to develop an evaluation method reflecting the features of digital I and C systems. An evaluation method for the fault detection coverage of digital I and C systems is proposed in this work. The proposed method quantifies the fault detection coverage based on fault injection experiments. Even though fault injection experiments have several limitations, such as injecting faults only into memory and registers, the method has the advantage that the actual system behavior in response to faults can be observed. More accurate reliability evaluation of digital I and C systems can be expected from the experimental results.
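
    A coverage estimate from a fault injection campaign is, in its simplest form, the fraction of injected faults that the detection mechanism flags. The sketch below assumes a toy setting that is not taken from the record: a memory word protected by a single even-parity bit, with bit flips as the injected faults. All names are illustrative.

```python
from itertools import combinations

def parity(bits):
    """Return 1 if the number of set bits is odd, else 0."""
    return sum(bits) % 2

def protect(data_bits):
    """Append an even-parity bit to the data word."""
    return list(data_bits) + [parity(data_bits)]

def detects_error(word):
    """The detector flags a word whose overall parity is odd."""
    return parity(word) != 0

def inject(word, positions):
    """Return a copy of the word with the given bit positions flipped."""
    faulty = list(word)
    for p in positions:
        faulty[p] ^= 1
    return faulty

def coverage(word, fault_list):
    """Fraction of injected faults that the detector flags."""
    detected = sum(detects_error(inject(word, f)) for f in fault_list)
    return detected / len(fault_list)

word = protect([1, 0, 1, 1, 0, 0, 1, 0])
single = [(i,) for i in range(len(word))]            # 9 single-bit flips
double = list(combinations(range(len(word)), 2))     # 36 double-bit flips

single_coverage = coverage(word, single)             # parity catches all
double_coverage = coverage(word, double)             # parity catches none
overall_coverage = coverage(word, single + double)   # 9 of 45 faults
```

    The overall figure depends entirely on which faults are injected, which is why the choice of fault model and injection targets matters when such a number is fed into a reliability model.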

  14. Realistic Models and Efficient Algorithms for Fault Tolerant Scheduling on Heterogeneous Platforms

    OpenAIRE

    Benoit, Anne; Hakem, Mourad; Robert, Yves

    2008-01-01

    Most list scheduling heuristics rely on a simple platform model where communication contention is not taken into account. In addition, it is generally assumed that processors in the systems are completely safe. To schedule precedence graphs in a more realistic framework, we introduce an efficient fault tolerant scheduling algorithm that is both contention-aware and capable of supporting $\varepsilon$ arbitrary fail-silent (fail-stop) processor failures. We focus on a bi-criteria approach, whe...

  15. Decoherence-Free Subspaces for Multiple-Qubit Errors (II) Universal, Fault-Tolerant Quantum Computation

    CERN Document Server

    Lidar, D A; Kempe, J; Whaley, K B; Lidar, Daniel A.; Bacon, David; Kempe, Julia

    2001-01-01

    Decoherence-free subspaces (DFSs) shield quantum information from errors induced by the interaction with an uncontrollable environment. Here we study a model of correlated errors forming an Abelian subgroup (stabilizer) of the Pauli group (the group of tensor products of Pauli matrices). Unlike previous studies of DFSs, this type of errors does not involve any spatial symmetry assumptions on the system-environment interaction. We solve the problem of universal, fault-tolerant quantum computation on the associated class of DFSs.

  16. Implementation of Fault-tolerant Quantum Logic Gates via Optimal Control

    OpenAIRE

    Nigmatullin, R.; Schirmer, S. G.

    2009-01-01

    The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. ...

  17. A Model for Variation- and Fault-Tolerant Digital Logic using Self-Assembled Nanowire Architectures

    OpenAIRE

    Goudarzi, Alireza; Lakin, Matthew R.; Stefanovic, Darko; Teuscher, Christof

    2014-01-01

    Reconfiguration has been used for both defect- and fault-tolerant nanoscale architectures with regular structure. Recent advances in self-assembled nanowires have opened doors to a new class of electronic devices with irregular structure. For such devices, reservoir computing has been shown to be a viable approach to implement computation. This approach exploits the dynamical properties of a system rather than specifics of its structure. Here, we extend a model of reservoir ...

  18. Performance and economy of a fault-tolerant multiprocessor

    Science.gov (United States)

    Lala, J. H.; Smith, C. J.

    1979-01-01

    The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.

  19. Scalable Fault-Tolerant Location Management Scheme for Mobile IP

    Directory of Open Access Journals (Sweden)

    JinHo Ahn

    2001-11-01

    As the number of mobile nodes registering with a network rapidly increases in Mobile IP, multiple mobility (home or foreign) agents can be allocated to a network in order to improve performance and availability. Previous fault-tolerant schemes (denoted PRT schemes) for masking failures of the mobility agents use passive replication techniques. However, they result in high failure-free latency during the registration process as the number of mobility agents in the same network increases, and they force each mobility agent to manage the bindings of all the mobile nodes registering with its network. In this paper, we present a new fault-tolerant scheme (denoted the CML scheme) using checkpointing and message logging techniques. The CML scheme achieves low failure-free latency even if the number of mobility agents in a network increases, and it scales better to a large number of mobile nodes registering with each network than the PRT schemes. Additionally, the CML scheme allows each failed mobility agent to recover the bindings of the mobile nodes registered with it when it is repaired, even if all the other mobility agents in the same network fail concurrently.

  20. Fault Tolerant Event Detection in Distributed WSN via Pivotal Messaging

    Directory of Open Access Journals (Sweden)

    Chhavi Gupta

    2013-05-01

    To mitigate the problem of losing data, which can be crucial for preventing environmental disasters, we propose a fault-tolerant event detection algorithm for a distributed environment. We consider data loss caused by node failure in a sensor network due to a technical fault or energy issues. The algorithm works by backing up the data of cluster heads at secondary nodes. If the secondary node, that is, the backup cluster head, notices the failure of the primary or vital cluster head, it informs all non-cluster-head nodes through pivotal messages. Subsequently, the backup cluster head starts working as the vital cluster head, with a new backup cluster head. We have chosen all parameters with a view to the optimal use of energy. We have compared our results with the existing Dynamic Static Clustering Protocol (DSC) and the fault-tolerant Dynamic Clustering Protocol (FT-DSC). The comparison is based on parameters such as energy consumption over time, number of data packet transmissions, and network lifetime based on the remaining network energy.

  1. Robust and Fault-Tolerant Linear Parameter-Varying Control of Wind Turbines

    DEFF Research Database (Denmark)

    Sloth, Christoffer; Esbensen, Thomas

    2011-01-01

    High performance and reliability are required for wind turbines to be competitive within the energy market. To capture their nonlinear behavior, wind turbines are often modeled using parameter-varying models. In this paper we design and compare multiple linear parameter-varying (LPV) controllers, designed using a proposed method that allows the inclusion of both faults and uncertainties in the LPV controller design. We specifically consider a 4.8 MW, variable-speed, variable-pitch wind turbine model with a fault in the pitch system. We propose the design of a nominal controller (NC), handling the parameter variations along the nominal operating trajectory caused by nonlinear aerodynamics. To accommodate the fault in the pitch system, an active fault-tolerant controller (AFTC) and a passive fault-tolerant controller (PFTC) are designed. In addition to the nominal LPV controller, we also propose a robust controller (RC). This controller is able to take into account model uncertainties in the aerodynamic model. The controllers are based on output feedback and are scheduled on an estimated wind speed to manage the parameter-varying nature of the model. Furthermore, the AFTC relies on information from a fault diagnosis system. The optimization problems involved in designing the PFTC and RC are based on solving bilinear matrix inequalities (BMIs) instead of linear matrix inequalities (LMIs) due to unmeasured parameter variations. Consequently, they are more difficult to solve. The paper presents a procedure, where the BMIs are rewritten into two necessary LMI conditions, which are solved using a two-step procedure. Simulation results show the performance of the LPV controllers to be superior to that of a reference controller designed based on classical principles.

  2. Fault-Tolerant Control of Wind Turbines using a Takagi-Sugeno Sliding Mode Observer

    Science.gov (United States)

    Georg, Sören; Schulte, Horst

    2014-06-01

    In this paper, observer-based fault-tolerant control schemes for actuator and sensor faults are implemented within dynamic wind turbine simulations. The faults are directly reconstructed by means of a Takagi-Sugeno sliding mode observer. As simulation models, both a reduced-order model with 4 degrees of freedom and the aero-elastic code FAST by NREL are used. A fault-tolerant control scheme is set up by subtracting the reconstructed fault from the faulty control signal or sensor value, respectively. With these fault compensation schemes, the corrected controller behaviour is close to the fault-free case. The global stability of the controller in the full-load region in the presence of faults and with active fault compensation is shown by analysing the derivative of an appropriate Lyapunov function.

  3. Fault-Tolerant Control of Wind Turbines using a Takagi-Sugeno Sliding Mode Observer

    International Nuclear Information System (INIS)

    In this paper, observer-based fault-tolerant control schemes for actuator and sensor faults are implemented within dynamic wind turbine simulations. The faults are directly reconstructed by means of a Takagi-Sugeno sliding mode observer. As simulation models, both a reduced-order model with 4 degrees of freedom and the aero-elastic code FAST by NREL are used. A fault-tolerant control scheme is set up by subtracting the reconstructed fault from the faulty control signal or sensor value, respectively. With these fault compensation schemes, the corrected controller behaviour is close to the fault-free case. The global stability of the controller in the full-load region in the presence of faults and with active fault compensation is shown by analysing the derivative of an appropriate Lyapunov function.

  4. Systematic fault tolerant control based on adaptive Thau observer estimation for quadrotor UAVs

    Directory of Open Access Journals (Sweden)

    Cen Zhaohui

    2015-03-01

    A systematic fault tolerant control (FTC) scheme based on fault estimation for a quadrotor actuator, which integrates normal control, active and passive FTC, and fault parking, is proposed in this paper. Firstly, an adaptive Thau observer (ATO) is presented to estimate the quadrotor rotor fault magnitudes, and then faults with different magnitudes and time-varying natures are rated into corresponding fault severity levels based on pre-defined fault-tolerant boundaries. Secondly, a systematic FTC strategy that can coordinate various FTC methods is designed to compensate for failures depending on the fault types and severity levels. Unlike former stand-alone passive or active FTC, the proposed FTC scheme can compensate for faults in the manner of condition-based maintenance (CBM), and in particular it considers the fatal failures that traditional FTC techniques cannot accommodate, so as to avoid crashing the UAV. Finally, various simulations are carried out to show the performance and effectiveness of the proposed method.

  5. Fault tolerant workflow scheduling based on replication and resubmission of tasks in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Jayadivya S K

    2012-06-01

    The aim of a workflow scheduling system is to schedule workflows within the user-given deadline to achieve a good success rate. A workflow is a set of tasks processed in a predefined order based on its data and control dependencies. Scheduling these workflows in a computing environment, such as a cloud environment, is an NP-complete problem, and it becomes more challenging when failures of tasks are considered. To overcome these failures, the workflow scheduling system should be fault tolerant. In this paper, the proposed Fault Tolerant Workflow Scheduling algorithm (FTWS) provides fault tolerance by using replication and resubmission of tasks based on the priority of the tasks. The replication of tasks depends on a heuristic metric, which is calculated by finding the tradeoff between the replication factor and the resubmission factor. The heuristic metric is considered because replication alone may lead to resource wastage and resubmission alone may increase makespan. Tasks are prioritized based on the criticality of the task, which is calculated using parameters such as out-degree, earliest deadline, and high resubmission impact. Priority helps in meeting the deadline of a task and thereby reduces wastage of resources. FTWS schedules workflows within a deadline even in the presence of failures, without using any historical information. The experiments were conducted in a simulated cloud environment by scheduling workflows in the presence of randomly generated failures. The experimental results demonstrate an effective success rate in spite of various failures.

  6. Fault-tolerant topology in the wireless sensor networks for energy depletion and random failure

    International Nuclear Information System (INIS)

    Nodes in the wireless sensor networks (WSNs) are prone to failure due to energy depletion and poor environment, which could have a negative impact on the normal operation of the network. In order to solve this problem, in this paper, we build a fault-tolerant topology which can effectively tolerate energy depletion and random failure. Firstly, a comprehensive failure model about energy depletion and random failure is established. Then an improved evolution model is presented to generate a fault-tolerant topology, and the degree distribution of the topology can be adjusted. Finally, the relation between the degree distribution and the topological fault tolerance is analyzed, and the optimal value of evolution model parameter is obtained. Then the target fault-tolerant topology which can effectively tolerate energy depletion and random failure is obtained. The performances of the new fault tolerant topology are verified by simulation experiments. The results show that the new fault tolerant topology effectively prolongs the network lifetime and has strong fault tolerance. (general)

  7. Second-order sliding mode fault-tolerant control of heat recovery steam generator boiler in combined cycle power plants

    International Nuclear Information System (INIS)

    Power generation plants are intrinsically complex systems due to their numerous internal components. Higher energy efficiency in power plants is now achieved through employing combined cycles. In this article, an adaptive robust Sliding Mode Controller (SMC) is designed to overcome the faults in Heat Recovery Steam Generator boilers (HRSG boilers) as one of the main parts of a combined cycle plant. If a fault occurs in the HRSG boiler, the control system must be able to reconfigure its parameters to maintain the admissible thresholds in dynamic variables such as drum pressure, steam temperature, and drum water level. To achieve good performance for the boiler, the proposed adaptive robust SMC must overcome the effects of faults and uncertainties by estimating their upper bounds adaptively, and force the outputs of the multivariable boiler to track the outputs of a desired multivariable reference model. By manipulating a suitable control input and using a second-order sliding mode control strategy, the output tracking error slides to zero on a PID sliding surface. Besides tracking, the controlled boiler tolerates faults in the system matrix, faults in the input matrix, and an external disturbance signal. Numerical simulations confirm the effectiveness of the proposed FTC (Fault-Tolerant Control) system for an uncertain non-minimum phase HRSG boiler. Highlights: · This paper proposes a PID-based adaptive second-order sliding mode controller (SMC). · The SMC is robust to actuator and sensor faults and tracks the outputs of a reference system. · The SMC is used in fault-tolerant control of a heat recovery steam generator boiler. · The boiler and reference system have different numbers of states and inputs. · The performance of the SMC is investigated with different fault scenarios in simulations.

  8. Design of Parity Preserving Logic Based Fault Tolerant Reversible Arithmetic Logic Unit

    Directory of Open Access Journals (Sweden)

    Rakshith Saligram

    2013-06-01

    Reversible logic is gaining significant consideration as a potential logic design style for implementation in modern nanotechnology and quantum computing, with minimal impact on physical entropy. Fault-tolerant reversible logic is a class of reversible logic that maintains the parity of the inputs and the outputs. Significant contributions have been made in the literature towards the design of fault-tolerant reversible logic gate structures and arithmetic units; however, few efforts have been directed towards the design of fault-tolerant reversible ALUs. The Arithmetic Logic Unit (ALU) is the prime performing unit in any computing device, and it has to be made fault tolerant. In this paper we aim to design one such fault-tolerant reversible ALU, constructed using parity-preserving reversible logic gates. The designed ALU can perform up to seven arithmetic operations and four logical operations.
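
    The parity-preserving property can be checked exhaustively for a small gate. The sketch below uses the Fredkin (controlled-swap) gate, a well-known reversible gate that preserves parity. Whether the paper's ALU uses this particular gate is not stated in the abstract, so its choice here is an assumption made for illustration.

```python
from itertools import product

def fredkin(c, a, b):
    """Controlled swap: exchange a and b when the control bit c is 1."""
    return (c, b, a) if c else (c, a, b)

def parity(bits):
    """Return 1 if the number of set bits is odd, else 0."""
    return sum(bits) % 2

inputs = list(product((0, 1), repeat=3))
outputs = [fredkin(*x) for x in inputs]

# Reversible: the 8 input patterns map to 8 distinct output patterns.
reversible = len(set(outputs)) == len(inputs)

# Parity preserving: input parity equals output parity for every pattern.
parity_preserved = all(parity(i) == parity(o)
                       for i, o in zip(inputs, outputs))

def single_fault_detected(inp, bit):
    """Flip one output bit and compare input parity with output parity."""
    out = list(fredkin(*inp))
    out[bit] ^= 1
    return parity(inp) != parity(out)

all_single_faults_detected = all(single_fault_detected(i, b)
                                 for i in inputs for b in range(3))
```

    Because input and output parity always match in the fault-free gate, any single-bit error at the output shows up as a parity mismatch, which is the sense in which such circuits are called fault tolerant in this literature.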

  9. Fault-tolerant Control of Inverter-fed Induction Motor Drives

    DEFF Research Database (Denmark)

    Thybo, C.

    1999-01-01

    The main purpose of this work was to investigate how fault-tolerant control (FTC) could be included in the control scheme of frequency converter fed induction motor applications. This was approached by identifying the potential failure modes for which fault tolerant control should be applied. A description of the different frequency converter components, including models of the inverter, sensors and controllers was given, followed by a fault mode and effect analysis, which points out the potenti...

  10. Active Fault Isolation in MIMO Systems

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2014-01-01

    Active fault isolation of parametric faults in closed-loop MIMO systems is considered in this paper. The fault isolation consists of two steps. The first step is group-wise fault isolation. Here, a group of faults is isolated from other possible faults in the system. The group-wise fault isolation is based directly on the input/output signals applied for the fault detection. It is guaranteed that the fault group includes the fault that has occurred in the system. The second step is individual fault isolation in the fault group. Both types of isolation are obtained by applying dedicated auxiliary inputs and the associated residual outputs.

  11. Wiring systems and fault finding

    CERN Document Server

    Scaddan, Brian

    1905-01-01

    This book deals with an area of practice which many students and non-electricians find particularly challenging. It explains how to interpret circuit diagrams, wiring systems and the principles and practice of testing and fault diagnosis. It will give the reader confidence to understand the principles of testing and to apply this knowledge to fault finding in electrical circuits. It is a handy reference for anybody who needs to be able to trace faults in circuits, whether in domestic, commercial or industrial settings. It will be a time-saver for all electricians, plumbers, heating engineers, t

  12. An LDPC decoding method for fault-tolerant digital logic

    OpenAIRE

    Tang, Yangyang; Winstead, Chris; Boutillon, Emmanuel; Jego, Christophe; JEZEQUEL, Michel

    2012-01-01

    A decoding algorithm and logic implementation is proposed for fast, low-complexity error correction in environments with a high rate of transient faults as well as hard errors. The circuit is able to correct a single error in one clock cycle, making it suitable for mitigating faults in pipelined digital logic systems. The proposed method is also resilient against internal transient gate errors that may occur within the decoder itself. In the presence of a high input error rate (0.001) and hig...

  13. Robust and Fault Tolerant Control of CD-players

    DEFF Research Database (Denmark)

    Vidal, Enrique Sanchez

    2003-01-01

    Several new standards have emerged recently in the area of portable optical data storage media and more are on their way. In addition to the well known Compact Disc (CD), portable optical media now also feature media for video storage (DVDs) and general data storage media for computer purposes (CD-ROMs). DVDs can be two-sided with multiple layers, allowing read, write and rewrite operations. Most significantly in this context, the new media typically have much higher physical data densities. This constitutes a significant challenge in terms of playability (the ability to reproduce the information from non-ideal discs in non-ideal circumstances), which is the main topic of this Ph.D. thesis. There are three important contributions to the technical field of study treated in the thesis. It is known that the specific characteristics of CD drives vary from unit to unit. Traditionally the parameter estimation is performed in closed loop, probably because open loop estimation has been regarded as very difficult or even impossible. A novel method, which requires an additional current measurement, is presented in this work, where parameter estimation is accomplished in open loop in a simple and reliable way. The second main contribution is related to robust control. Usually, the nominal and uncertainty models are assumed to be known and the designer is limited to specifying the performance requirements. In a more realistic situation, the designer may only have a set of complex points in the Nyquist plane from several worst-case plants as a result of measurement experiments. In the thesis a deterministic method is proposed, which generates a nominal and uncertainty model based on the set of complex points in a less conservative way than conventional methods. Finally, the third main contribution is to be found in the fault-diagnosis and fault-tolerant control fields. One of the main challenges in the positioning control of the focus point in CD-players is to handle two types of disturbances with conflicting requirements in an effective way. While a high bandwidth is desired to better suppress shocks, a low bandwidth is preferred in the presence of surface defects. Traditionally, a simple defect detector is employed to deal with this trade-off. In this work, two fault diagnosis schemes are suggested which are able not only to detect but also to separate, to a certain extent, the characteristics of the signals originating from the surface defects. Furthermore, two fault-tolerant control schemes are proposed such that the mentioned trade-off is handled in a more efficient way.

  14. A Multiple Fault Tolerant Approach with Improved Performance in Cluster Computing

    OpenAIRE

    Sanjay Bansal; Sanjeev Sharma; Rajiv Gandhi Prodhyogiki Vishwavidya

    2011-01-01

    In the case of multiple node failures, performance is very low compared to a single node failure. Failures of nodes in cluster computing can be tolerated by multiple fault tolerant computing. In this paper, we propose a multiple fault tolerant technique with improved failure detection and performance. Failure detection is done by an improved adaptive heartbeat-based algorithm to improve the degree of confidence and accuracy. Failure recovery is based on reassignment of load with a rank based algorit...

  15. Row fault detection system

    Science.gov (United States)

    Archer, Charles Jens (Rochester, MN); Pinnow, Kurt Walter (Rochester, MN); Ratterman, Joseph D. (Rochester, MN); Smith, Brian Edward (Rochester, MN)

    2012-02-07

    An apparatus, program product and method check for nodal faults in a row of nodes by causing each node in the row to concurrently communicate with its adjacent neighbor nodes in the row. The communications are analyzed to determine a presence of a faulty node or connection.

  16. Initial Fault Tolerance and Autonomy Results for Autonomous On-board Processing of Hyperspectral Imaging

    Science.gov (United States)

    French, M.; Walters, J.; Zick, K.

    2011-12-01

    By developing Radiation Hardening by Software (RHBSW) techniques leveraged from the High Performance Computing community, our work seeks to deliver radiation tolerant, high performance System on a Chip (SoC) processors to the remote sensing community. This SoC architecture is uniquely suited to both handle high performance signal processing tasks, as well as autonomous agent processing. This allows situational awareness to be developed in-situ, resulting in a 10-100x decrease in processing latency, which directly translates into more science experiments conducted per day and a more thorough, timely analysis of captured data. With the increase in the amount of computational throughput made possible by commodity high performance processors and low overhead fault tolerance, new applications can be considered for on-board processing. A high performance and low overhead fault tolerance strategy targeting scientific applications on the SpaceCube 1.0 platform has been enhanced with initial results showing an order of magnitude increase in Mean Time Between Data Error and a complete elimination of processor hangs. Initial study of representative Hyperspectral applications also proves promising due to high levels of data parallelism and fine grained parallelism achievable within FPGA System on a Chip architectures enabled by our RHBSW techniques. To demonstrate the kinds of capabilities these fault tolerance approaches yield, the team focused on applications representative of the Decadal Survey HyspIRI mission, which uses high throughput Thermal Infrared Scanner (132 Mbps) and Hyperspectral Visible ShortWave InfraRed (804 Mbps) instruments, while having only a 15 Mbps downlink channel. This mission provides a great many use scenarios for onboard processing, from high compression algorithms, to pre-processing and selective download of high priority images, to full on-board classification. 
This paper focuses on recent efforts which revolve around developing a fault emulator for the embedded PowerPC within Xilinx V4FX devices, validating the RHBSW techniques developed in the prior year, and initial performance results on a representative autonomous Hyperspectral application. In the future, fault analysis data will be refined and correlated between software fault emulation, laser testing, and space based results. This project will also deliver expected performance results on an optimized, representative Hyperspectral imaging application demonstrating autonomous operations.

  17. Fault Tolerant Distributed and Fixed Hierarchical Mobile IP

    Directory of Open Access Journals (Sweden)

    Paramesh C. Upadhyay

    2010-04-01

    For several mobility management protocols proposed for IP-based mobile networks, the fault tolerance aspect of mobility agents is a primary requirement to sustain continuous service availability to the mobile hosts. For a localized or micro-mobility management solution, the local mobility agent, i.e. the gateway, is a single point of failure because it is responsible for enforcing the signaling and data packets in its domain. Such failures may severely disrupt the communications among the failure-affected users. The problem becomes even more severe for mobility agents in a distributed mobility management scheme with overlapping registration areas. This paper proposes a fault tolerance scheme for Distributed and Fixed Hierarchical Mobile IP (DFHMIP) and evaluates its performance in terms of data transmission cost and blocking probability.

  18. Declarative Specification of Fault Tolerant Auction Protocols: The English Auction Case Study

    DEFF Research Database (Denmark)

    Dragoni, Nicola; Gaspari, Mauro

    2012-01-01

    Auction mechanisms are nowadays widely used in electronic commerce Web sites for buying and selling items among different users. The increasing importance of auction protocols in the negotiation phase is not limited to online marketplaces. In fact, the wide applicability of auctions as resource-allocation and negotiation mechanisms has also led to a great deal of interest in auctions within the agent community. A challenging issue for agents operating in open Multiagent Systems (such as the emerging semantic Web infrastructure) concerns the specification of declarative communication rules which could be published and shared, allowing agents to dynamically engage well-known and trusted negotiation protocols. To cope with real-world applications, these rules should also specify fault tolerant patterns of interaction, enabling negotiating agents to interact with each other while tolerating failures, for instance terminating an auction process even if some bidding agents dynamically crash. In this paper, we propose an approach to specify fault tolerant auction protocols in open and dynamic environments by means of communication rules dealing with crash failures of agents. We illustrate these concepts considering a case study about the specification of an English Auction protocol which tolerates crashes of bidding agents and we discuss its properties.

  19. Implementation Of High Reliable Fine Grain Fault Tolerance Redundant Technique For FPGA

    Directory of Open Access Journals (Sweden)

    M.J.C.prasad

    2013-11-01

    SRAM-based FPGAs are attractive for space applications because of their flexibility and reprogrammability. As technology size decreases into the nanometer range, SRAM-based FPGAs become more susceptible to radiation. Radiation effects can cause transient or permanent bit flips in SRAM cells and thereby change the function of logic elements within FPGAs. Fault-masking methodologies are essential because it is vital for the system to always work properly irrespective of the various faults that occur in complex digital circuitry. For this reason, redundancy techniques that target fault masking and fault tolerance are in our scope. In this project we propose Quadruple Force Decide Redundancy (QFDR), a new approach to fault tolerance for mitigating problems in digital circuits, since simply replicating complete systems with the Triple Modular Redundancy (TMR) technique may no longer be sufficient, especially in space applications, where the failure rate increases because a second upset can occur before the first one has been recovered. QFDR makes SRAM-based FPGAs effectively immune to SEU (Single Event Upset) mitigation challenges. The proposed QFDR operates at the abstraction level of the CLBs of the FPGA. QFDR is a redundant logical structure which quadruplicates logical functions and defines two different rules, Force and Decide, for different quadruplicated logic functions based on their level in the design, and then connects them together using special connection patterns. The complete logic of QFDR is implemented in VHDL. ModelSim Xilinx Edition (MXE) will be used for simulation and functional verification. Xilinx ISE will be used for synthesis. A Xilinx FPGA board will be used for testing and demonstration of the implemented system.
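
    The abstract names Triple Modular Redundancy (TMR) as the baseline that QFDR is meant to improve on, but it does not spell out the Force and Decide rules. The following is therefore only a minimal sketch of classical bitwise TMR majority voting, not of QFDR itself; the function names `tmr_vote` and `tmr_run` and the `upset_mask` parameter are hypothetical and chosen for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three replica outputs (classical TMR voter).

    Each output bit takes the value held by at least two of the replicas.
    """
    return (a & b) | (a & c) | (b & c)


def tmr_run(func, x: int, upset_mask: int = 0) -> int:
    """Evaluate three replicas of func and vote on the result.

    upset_mask is a hypothetical way to emulate a single-event upset:
    the bits set in it are flipped in the output of one replica only.
    """
    r0 = func(x) ^ upset_mask  # replica hit by the emulated upset
    r1 = func(x)
    r2 = func(x)
    return tmr_vote(r0, r1, r2)


if __name__ == "__main__":
    # One corrupted replica is masked by the other two.
    print(tmr_run(lambda v: v + 1, 4, upset_mask=0b100))
    # Two replicas corrupted in the same bit defeat the voter.
    print(tmr_vote(1, 1, 0))
```

    The second call illustrates the limitation the abstract points to: if a second upset hits the same bit before the first has been repaired, plain majority voting returns the wrong value.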

  20. Comparing fault susceptibility of multiple ISAs and operating systems

    Science.gov (United States)

    Chyłek, Sławomir

    2015-09-01

    This paper presents research that aims to compare the effects of faults on different configurations of computer systems. The study covers a comparison of the susceptibility to faults of the x86, AMD64, ARM, PowerPC and MIPS architectures and the Linux, FreeBSD and Minix operating systems. An emulation based software implemented fault injection technique was used to perform the experiments. The problem of choosing an adequate number of tests in experiments is followed by a report of the collected results, in which multiple aspects of test runs were analyzed: providing a correct computation result, availability of the system under test, and error messages. The research makes it possible to determine the fault-susceptibility characteristics of each platform and is a first step towards designing new fault tolerance solutions and assessing their effectiveness.

  1. Randomness fault detection system

    Science.gov (United States)

    Russell, B. Don (Inventor); Aucoin, B. Michael (Inventor); Benner, Carl L. (Inventor)

    1996-01-01

    A method and apparatus are provided for detecting a fault on a power line carrying a line parameter such as a load current. The apparatus monitors and analyzes the load current to obtain an energy value. The energy value is compared to a threshold value stored in a buffer. If the energy value is greater than the threshold value a counter is incremented. If the energy value is greater than a high value threshold or less than a low value threshold then a second counter is incremented. If the difference between two subsequent energy values is greater than a constant then a third counter is incremented. A fault signal is issued if the counter is greater than a counter limit value and either the second counter is greater than a second limit value or the third counter is greater than a third limit value.
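
    The counter logic described in this abstract can be sketched as follows. This is an illustrative reading only: the abstract does not say how the energy value is computed, whether the counters are windowed or reset, or whether the difference between subsequent values is signed, so the absolute difference, the single pass over a list of energies, and all names (`Limits`, `detect_fault` and the field names) are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Limits:
    """Hypothetical container for the thresholds named in the abstract."""
    threshold: float   # stored threshold for the first counter
    high: float        # high value threshold for the second counter
    low: float         # low value threshold for the second counter
    max_step: float    # constant bounding the change between energy values
    limit1: int        # counter limit for the first counter
    limit2: int        # second limit value
    limit3: int        # third limit value


def detect_fault(energies: Iterable[float], lim: Limits) -> bool:
    """Return True if the three-counter rule signals a fault."""
    c1 = c2 = c3 = 0
    prev = None
    for e in energies:
        if e > lim.threshold:
            c1 += 1
        if e > lim.high or e < lim.low:
            c2 += 1
        # Assumption: "difference" is taken as an absolute difference.
        if prev is not None and abs(e - prev) > lim.max_step:
            c3 += 1
        prev = e
    return c1 > lim.limit1 and (c2 > lim.limit2 or c3 > lim.limit3)


if __name__ == "__main__":
    lim = Limits(threshold=1.0, high=5.0, low=0.2, max_step=2.0,
                 limit1=2, limit2=1, limit3=1)
    print(detect_fault([1.5, 4.0, 1.2, 4.1, 1.1], lim))  # erratic energy
    print(detect_fault([1.5, 1.6, 1.5, 1.6], lim))       # elevated but steady
```

    The conjunction in the last line of `detect_fault` mirrors the final sentence of the abstract: elevated energy alone is not enough, it must be accompanied by either out-of-band values or erratic jumps.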

  2. Automated distribution fault locating system

    Energy Technology Data Exchange (ETDEWEB)

    Hager, G.E. [Electrical Systems Consultants Inc., Fort Collins, CO (United States); Medicine Bear, R.N. [City of Aztec, NM (United States); Baum, A.S. [Scipar Inc., Buffalo, NY (United States)

    1996-05-01

    An automated fault locating system (FLS) was designed and implemented for the Colorado River Agency (CRA) 12.5/7.2-kV distribution system. This FLS was integrated into the supervisory control and data acquisition (SCADA) system which was installed at a hydropower plant. This FLS offers several benefits to the CRA distribution system. These benefits include reduced outage time, help in locating momentary faults, enhanced safety for the line crews, notification of an outage without receiving calls from the consumer, and decreased overtime.

  3. Expert System Detects Power-Distribution Faults

    Science.gov (United States)

    Walters, Jerry L.; Quinn, Todd M.

    1994-01-01

    Autonomous Power Expert (APEX) computer program is prototype expert-system program detecting faults in electrical-power-distribution system. Assists human operators in diagnosing faults and deciding what adjustments or repairs needed for immediate recovery from faults or for maintenance to correct initially nonthreatening conditions that could develop into faults. Written in Lisp.

  4. Data Structures: Sequence Problems, Range Queries, and Fault Tolerance

    DEFF Research Database (Denmark)

    Jørgensen, Allan Grønlund

    2010-01-01

    The focus of this dissertation is on algorithms, in particular data structures that give provably efficient solutions for sequence analysis problems, range queries, and fault tolerant computing. The work presented in this dissertation is divided into three parts. In Part I we consider algorithms for a range of sequence analysis problems that have arisen from applications in pattern matching, bioinformatics, and data mining. On a high level, each problem is defined by a function and some constraints and the job at hand is to locate subsequences that score high with this function and are not invalidated by the constraints. Many variants and similar problems have been proposed leading to several different approaches and algorithms. We consider problems where the function is the sum of the elements in the sequence and the constraints only bound the length of the subsequences considered. We give optimal algorithms for several variants of the problem based on a simple idea and classic algorithms and data structures. In Part II we consider range query data structures. This is a category of problems where the task is to preprocess an input sequence using as little time and space as possible such that one can efficiently compute a certain function on the elements in a given query subsequence. There are many types of functions that have been considered in connection with input from different sources. The input could be ip-data sorted by ip-address, real estate prices sorted by zip code, advertising cost sorted by time etc. We consider data structures for two classic statistics functions, namely median and mode. Finally, Part III investigates fault tolerant algorithms and data structures. This deals with the trend of avoiding elaborate error checking and correction circuitry that would impose non-negligible costs in terms of hardware performance and money in the design of today's high speed memory technologies. 
Hardware, power failures, and environmental conditions such as cosmic rays and alpha particles can all alter the memory in unpredictable ways. In applications where large memory capacities are needed at low cost, it makes sense to assume that the algorithms themselves are in charge of dealing with memory faults. We investigate searching, sorting and counting algorithms and data structures that provably return sensible information in spite of memory corruptions.

  5. Fault Diagnosis for Electrical Distribution Systems using Structural Analysis

    DEFF Research Database (Denmark)

    Knüppel, Thyge; Blanke, Mogens

    2014-01-01

    Fault-tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the design of efficient diagnostic algorithms, which is a prerequisite for fault-tolerant control of power distribution. Diagnosis in a grid depends on available analytic redundancies, and hence on network topology. When topology changes, due to earlier fault(s) or caused by maintenance, analytic redundancy relations (ARR) are likely to change. The algorithms used for diagnosis may need to change accordingly, and finding efficient methods for ARR generation is essential to employ fault-tolerant methods in the grid. Structural analysis (SA) is based on graph-theoretical results that make it possible to find analytic redundancies in large sets of equations from the structure (topology) of the equations alone. A salient feature is automated generation of redundancy relations. The method is indeed feasible in electrical networks where circuit theory and network topology together formulate the constraints that define a structure graph. This paper shows how three-phase networks are modelled and analysed using structural methods, and it extends earlier results by showing how physical faults can be identified such that adequate remedial actions can be taken. The paper illustrates a feasible modelling technique for structural analysis of power systems, it demonstrates detection and isolation of failures in a network, and shows how typical faults are diagnosed. Nonlinear fault simulations illustrate the results.

  6. A Self-Stabilizing Byzantine-Fault-Tolerant Clock Synchronization Protocol

    Science.gov (United States)

    Malekpour, Mahyar R.

    2009-01-01

    This report presents a rapid Byzantine-fault-tolerant self-stabilizing clock synchronization protocol that is independent of application-specific requirements. It is focused on clock synchronization of a system in the presence of Byzantine faults after the cause of any transient faults has dissipated. A model of this protocol is mechanically verified using the Symbolic Model Verifier (SMV) [SMV] where the entire state space is examined and proven to self-stabilize in the presence of one arbitrary faulty node. Instances of the protocol are proven to tolerate bursts of transient failures and deterministically converge with a linear convergence time with respect to the synchronization period. This protocol does not rely on assumptions about the initial state of the system other than the presence of a sufficient number of good nodes. All timing measures of variables are based on the node's local clock, and no central clock or externally generated pulse is used. The Byzantine faulty behavior modeled here is a node with arbitrarily malicious behavior that is allowed to influence other nodes at every clock tick. The only constraint is that the interactions are restricted to defined interfaces.

  7. Observer-based Fault Detection and Isolation for Nonlinear Systems

    DEFF Research Database (Denmark)

    Lootsma, T.F.

    2001-01-01

    With the rise in automation, an increase in fault detection and isolation and reconfiguration is inevitable. Interest in fault detection and isolation (FDI) for nonlinear systems has grown significantly in recent years. The design of FDI is motivated by the need for knowledge about occurring faults in fault-tolerant control systems (FTC systems). The idea of FTC systems is to detect, isolate, and handle faults in such a way that the systems can still perform in a required manner. One prefers reduced performance after occurrence of a fault to the shut down of (sub-) systems. Hence, the idea of fault-tolerance can be applied to ordinary industrial processes that are not categorized as high risk applications, but where high availability is desirable. The quality of fault-tolerant control is totally dependent on the quality of the underlying algorithms. They detect possible faults, and later reconfigure control software to handle the effects of the particular fault event. In the past mainly linear FDI methods were developed, but as most industrial plants show nonlinear behavior, nonlinear methods for fault diagnosis could probably perform better. This thesis considers the design of FDI for nonlinear systems. It consists of four different contributions. First, it presents a review of the idea and the theory behind the geometric approach for FDI. Starting from the original solution for linear systems up to the latest results for input-affine systems the theory and solutions are described. Then the geometric approach is applied to a nonlinear ship propulsion system benchmark. The calculations and application results are presented in detail to give an illustrative example. The obtained subsystems are considered for the design of nonlinear observers in order to obtain FDI. Additionally, an adaptive nonlinear observer design is given for comparison. The simulation results are used to discuss different aspects of the geometric approach, e.g. 
the possibility to use it as a general approach. The third contribution considers stability analysis of observers used for FDI. It gives proofs of stability for the observers designed for the ship propulsion system. Furthermore, it stresses the importance of the time-variant character of the linearization along a trajectory. It leads to a different stability analysis than for linearization at one operation point. Finally, the preliminary concept of (actuator) fault-output decoupling is described. It is a new idea based on the solution of the input-output decoupling problem. The idea is to include FDI considerations already during the control design.

  8. A New Trend on the Development of Fault-Tolerant Applications: Software Meta-Level Architectures

    Scientific Electronic Library Online (English)

    Maria Lúcia Blanck, Lisbôa.

    1997-11-01

    The purpose of this paper is to investigate a clearly defined way of developing fault-tolerant applications using software meta-level architectures. Meta-level architectures are software architectures based on computational reflection. It addresses complex pieces of software: fault-tolerant software [...] . Fault-tolerant applications must cope with several non-functional requirements to maintain their functionality. So, it is particularly relevant to investigate how to relieve developers of repeatedly dealing with this complexity. Some solutions are presented, such as software patterns and basic guidelines to help the development of such applications

  9. Fault tolerant VLSI (Very Large-Scale Integration) design using error correcting codes

    Science.gov (United States)

    Hartmann, C. R.; Lala, P. K.; Ali, A. M.; Ganguly, S.; Visweswaran, G. S.

    1989-02-01

    Very Large-Scale Integration (VLSI) provides the opportunity to design fault tolerant, self-checking circuits with on-chip, concurrent error correction. This study determines the applicability of a variety of error-detecting, error-correcting codes (EDAC) in high speed digital data processors and buses. In considering both microcircuit faults and bus faults, some of the codes examined are: Berger, repetition, parity, residue, and Modified Reflected Binary codes. The report describes the improvement in fault tolerance obtained as a result of implementing these EDAC schemes and the associated penalties in circuit area.
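
    Of the codes listed in this abstract, the Berger code is compact enough to illustrate in a few lines. The sketch below uses the standard textbook definition (check symbol = number of zeros in the information word, written in binary); it is not taken from the report itself, and the function names are hypothetical.

```python
def berger_check(info_bits: str) -> str:
    """Return the Berger check symbol for a string of '0'/'1' characters.

    The check symbol is the count of zeros in the information word,
    written in binary with ceil(log2(n + 1)) bits, where n is the word
    length. For n >= 1 that width equals n.bit_length().
    """
    n = len(info_bits)
    width = n.bit_length()
    zeros = info_bits.count("0")
    return format(zeros, f"0{width}b")


def berger_encode(info_bits: str) -> str:
    """Append the check symbol to the information word."""
    return info_bits + berger_check(info_bits)


def berger_valid(codeword: str, n: int) -> bool:
    """Check a received codeword whose first n bits are information bits."""
    info, check = codeword[:n], codeword[n:]
    return berger_check(info) == check


if __name__ == "__main__":
    word = berger_encode("10110010")
    print(word, berger_valid(word, 8))
    # A 1 -> 0 error in the information part is detected.
    print(berger_valid("00110010" + word[8:], 8))
```

    The code is error-detecting only: errors that flip bits in a single direction (all 1 to 0, or all 0 to 1) always change the zero count in a way the check symbol cannot follow, which is why it suits self-checking circuits rather than correction.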

  10. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

    Science.gov (United States)

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
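
    The three task rates quoted in this abstract are related by factors of 2 and 8, which suggests a frame-based rate-group dispatcher. The sketch below assumes a 25 Hz base timer interrupt and assigns each rate group a frame divisor; that base tick, the divisor table and the function names are illustrative assumptions, and the load balancing across processor triads described in the abstract is not modelled.

```python
# Assumed base tick: one frame per 25 Hz timer interrupt (40 ms).
# Divisor d means "run every d-th frame": 25/1, 25/2 and 25/8 Hz.
RATE_DIVISORS = {"25 Hz": 1, "12.5 Hz": 2, "3.125 Hz": 8}


def due_rate_groups(frame: int) -> list:
    """Return the rate groups that must be dispatched in this frame."""
    return [name for name, div in RATE_DIVISORS.items() if frame % div == 0]


def simulate(frames: int) -> list:
    """Return the dispatch list for each of the first `frames` frames."""
    return [due_rate_groups(f) for f in range(frames)]


if __name__ == "__main__":
    for frame, groups in enumerate(simulate(8)):
        print(frame, groups)
```

    Over any 8 consecutive frames the fastest group runs 8 times, the middle group 4 times and the slowest once, which matches the 25 : 12.5 : 3.125 ratio.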

  11. Fault tolerant model predictive control of open channels

    OpenAIRE

    Horváth, Klaudia; Blesa Izquierdo, Joaquim; Duviella, Eric; Chuquet, Karine

    2014-01-01

    Automated control of water systems (irrigation canals, navigation canals, rivers etc.) relies on the measured data. The control action is calculated, in the case of a feedback controller, directly from the on-line measured data. If the measured data is corrupted, the calculated control action will have a different effect than desired. Therefore, it is crucial that the feedback controller receives good quality measurement data. On-line fault detection techniques can be applied in order to dete...

  12. A model-based approach for fault-tolerant control

    OpenAIRE

    Niemann, Hans Henrik

    2010-01-01

    A model-based controller architecture for fault-tolerant control (FTC) is presented in this paper. The controller architecture is based on the Youla-Jabr-Bongiorno-Kucera (YJBK) parameterization. The FTC architecture consists of two central parts, a fault detection and isolation (FDI) part and a controller reconfiguration part. The theoretical basis for the architecture will be given followed by an investigation of the single parts in the architecture. At last, system ...

  13. Fault tolerance in a supercomputer through dynamic repartitioning

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Dong (Croton On Hudson, NY); Coteus, Paul W. (Yorktown Heights, NY); Gara, Alan G. (Mount Kisco, NY); Takken, Todd E. (Mount Kisco, NY)

    2007-02-27

    A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
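
    The swapping idea in this abstract can be pictured as an indirection table from logical processor groups, which software sees, to physical groups, which may fail. The sketch below is a toy model under that assumption; the class name `Partition`, its methods and the group labels are hypothetical and do not describe the patented mechanism.

```python
class Partition:
    """Toy logical-to-physical map with standby processor groups."""

    def __init__(self, active, spares):
        # Logical slot index -> physical group currently backing it.
        self.mapping = {slot: grp for slot, grp in enumerate(active)}
        self.spares = list(spares)

    def report_failure(self, physical_group):
        """Swap a standby group in for the failed one; return the slot."""
        for slot, grp in self.mapping.items():
            if grp == physical_group:
                if not self.spares:
                    raise RuntimeError("no standby group left")
                self.mapping[slot] = self.spares.pop(0)
                return slot
        raise KeyError(physical_group)


if __name__ == "__main__":
    part = Partition(["g0", "g1", "g2"], ["s0"])
    part.report_failure("g1")
    # Software still sees logical slots 0..2; slot 1 is now backed by s0.
    print(part.mapping)
```

    Because the set of logical slots never changes, code addressing slots 0 to 2 is unaffected by the swap, which is the sense in which the machine can still appear as a fully functioning system.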

  14. Detection and treatment of faults in manufacturing systems based on Petri Nets

    OpenAIRE

    L. A. M. Riascos; L. A. Moscato; P.E. Miyagi

    2004-01-01

    This paper introduces a methodology for modeling and analyzing fault-tolerant manufacturing systems that not only optimizes normal productive processes, but also performs detection and treatment of faults. This approach is based on the hierarchical and modular integration of Petri Nets. The modularity provides the integration of three types of processes: those representing the productive process, fault detection, and fault treatment. The hierarchical aspect of the approach permits us to consi...

  15. P2P-MPI : A fault-tolerant Message Passing Interface Implementation for Grids

    OpenAIRE

    Rattanapoka, Choopan

    2008-01-01

    This thesis aims to demonstrate that message-passing parallel programs can be deployed onto large, heterogeneous distributed systems. This work consists in the design and development of a proof-of-concept middleware named P2P-MPI, released under a public license. P2P-MPI alleviates this task by proposing a peer-to-peer based platform in which available resources are dynamically discovered upon job requests, and by providing a fault-tolerant message-passing library for Java programs. The motiv...

  16. Fault Detection for Difference Flat Systems

    OpenAIRE

    Nan Zhang; Andrei Doncescu; Alexandre C. Brandao Ramos; Felix Mora-Camino

    2012-01-01

    Fault detection is essential for the survivability of many systems. Since many systems present highly nonlinear dynamics, the applicability of general fault detection techniques designed mainly for linear systems is very questionable. In this communication, after introducing the concept of difference flat nonlinear systems, a fault detection scheme based on difference flatness is proposed.

  17. LQCD workflow execution framework: Models, provenance and fault-tolerance

    International Nuclear Information System (INIS)

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automatons that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload, and increases cluster-productivity.

  18. LQCD workflow execution framework: Models, provenance and fault-tolerance

    Energy Technology Data Exchange (ETDEWEB)

    Piccoli, Luciano; Simone, James N; Kowalkowlski, James B [Fermi National Accelerator Laboratory, Batavia, IL, 60510 (United States); Dubey, Abhishek [Institute for Software Integrated Systems, Vanderbilt University, Nashville, TN, 37235 (United States)

    2010-04-01

    Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of an entire workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as participants. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data dependency based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automatons that change state and initiate reflexive mitigation action(s) upon occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first order predicate logic that enables a dynamic management design that reduces manual administrative workload, and increases cluster-productivity.

  19. A Byzantine resilient fault tolerant computer for nuclear power plant applications

    International Nuclear Information System (INIS)

    A quadruply redundant synchronous fault tolerant processor, capable of tolerating Byzantine faults, is now under fabrication at the C.S. Draper Laboratory to be used initially as a trip monitor for the Experimental Breeder Reactor EBR-II operated by the Argonne National Laboratory in Idaho Falls, Idaho. This paper describes the hardware architecture of this processor and discusses certain issues unique to quadruply redundant computers.

  20. PWM Inverter-Fed Induction Motor-Based Electrical Vehicles Fault-Tolerant Control

    OpenAIRE

    Tabbache, Bekheira; Benbouzid, Mohamed; Kheloui, Abdelaziz; Bourgeot, Jean-Matthieu; Mamoune, Abdeslam

    2013-01-01

    This paper proposes a fault-tolerant control scheme for PWM inverter-fed induction motor-based electric vehicles. The proposed strategy deals with power switch (IGBTs) failures mitigation within a reconfigurable induction motor control. In a vehicle context, 4-wire and 4-leg PWM inverter topologies are investigated and their performances discussed. Two topologies exploit the induction motor neutral accessibility for fault-tolerant purposes. The 4-wire topology uses then classical hysteresis c...

  1. Evaporator unit as a benchmark for plug and play and fault tolerant control

    DEFF Research Database (Denmark)

    Izadi-Zamanabadi, Roozbeh; Vinther, Kasper; Mojallali, Hamed; Rasmussen, Henrik; Stoustrup, Jakob

    2012-01-01

    This paper presents a challenging industrial benchmark for implementation of control strategies under realistic working conditions. The developed control strategies should perform in a plug & play manner, i.e. adapt to varying working conditions, optimize their performance, and provide fault tolerance. A fault tolerant strategy is needed to deal with a faulty sensor measurement of the evaporation pressure. The design and algorithmic challenges in the control of an evaporator include: unknown mod...

  2. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    Energy Technology Data Exchange (ETDEWEB)

    Parker, L.E.

    1995-02-01

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup.

  3. ALLIANCE: An architecture for fault tolerant multi-robot cooperation

    International Nuclear Information System (INIS)

    ALLIANCE is a software architecture that facilitates the fault tolerant cooperative control of teams of heterogeneous mobile robots performing missions composed of loosely coupled, largely independent subtasks. ALLIANCE allows teams of robots, each of which possesses a variety of high-level functions that it can perform during a mission, to individually select appropriate actions throughout the mission based on the requirements of the mission, the activities of other robots, the current environmental conditions, and the robot's own internal states. ALLIANCE is a fully distributed, behavior-based architecture that incorporates the use of mathematically modeled motivations (such as impatience and acquiescence) within each robot to achieve adaptive action selection. Since cooperative robotic teams usually work in dynamic and unpredictable environments, this software architecture allows the robot team members to respond robustly, reliably, flexibly, and coherently to unexpected environmental changes and modifications in the robot team that may occur due to mechanical failure, the learning of new skills, or the addition or removal of robots from the team by human intervention. The feasibility of this architecture is demonstrated in an implementation on a team of mobile robots performing a laboratory version of hazardous waste cleanup

  4. Fault tolerant channel-encrypting quantum dialogue against collective noise

    Science.gov (United States)

    Ye, TianYu

    2015-04-01

    In this paper, two fault tolerant channel-encrypting quantum dialogue (QD) protocols against collective noise are presented. One is against collective-dephasing noise, while the other is against collective-rotation noise. The decoherence-free states, each of which is composed of two physical qubits, act as traveling states combating collective noise. Einstein-Podolsky-Rosen pairs, which play the role of the private quantum key, are securely shared between two participants over a collective-noise channel in advance. Through encryption and decryption with the private quantum key, the initial state of each traveling two-photon logical qubit is privately shared between the two participants. Because the initial state of each traveling logical qubit is shared through quantum encryption, the issue of information leakage is overcome. The private quantum key can be used repeatedly after rotation as long as the rotation angle is properly chosen, which economizes quantum resources. As a result, the information-theoretical efficiency of the protocols reaches nearly 66.7%. The proposed QD protocols only need single-photon measurements rather than two-photon joint measurements. Security analysis shows that an eavesdropper cannot obtain anything useful about secret messages during the dialogue process without being discovered. Furthermore, the proposed QD protocols can be implemented experimentally with current techniques.

  5. RAID Unbound: Storage Fault Tolerance in a Distributed Environment

    Science.gov (United States)

    Ritchie, Brian

    1996-01-01

    Mirroring, data replication, backup, and more recently, redundant arrays of independent disks (RAID) are all technologies used to protect and ensure access to critical company data. A new set of problems has arisen as data becomes more and more geographically distributed. Each of the technologies listed above provides important benefits, but each has failed to adapt fully to the realities of distributed computing. The key to high data availability and protection is to take the technologies' strengths and 'virtualize' them across a distributed network. RAID and mirroring offer high data availability, while data replication and backup provide strong data protection. If we take these concepts at a very granular level (defining user, record, block, file, or directory types) and then liberate them from the physical subsystems with which they have traditionally been associated, we have the opportunity to create highly scalable, network-wide storage fault tolerance. The network becomes the virtual storage space in which the traditional concepts of high data availability and protection are implemented without their corresponding physical constraints.
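
    As a rough illustration of how RAID-style redundancy can be detached from a physical disk array, the sketch below applies XOR parity (the mechanism used by parity-based RAID levels) to blocks that are assumed to live on different network nodes. The block contents and the three-node layout are made up for the example; the abstract does not prescribe a particular RAID level or layout.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, imagined to be stored on three different nodes.
data = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block would be stored on a fourth node.
parity = xor_blocks(data)

# Simulate losing the node that held data[1], then rebuild its block
# from the surviving data blocks and the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

    A single parity block tolerates the loss of any one block in the stripe; tolerating more simultaneous losses needs additional redundancy.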

  6. Study of a Nine-Phase Fault Tolerant Permanent Magnet Starter-Alternator

    OpenAIRE

    RUBA Mircea; SURDU Felicia; SZABÓ Loránd

    2011-01-01

    The paper presents a study on a nine-phase permanent magnet synchronous starter-alternator for automotive applications, analyzing different converter topologies, detailing the simulation programs and discussing the results in different operating conditions, from entire healthy machine to several faulted phases. The comparison between the two converter topologies controlling the multiphase machine highlights the increased fault tolerance, hence the reliability of such starter-alternator structures. Nev...

  7. Load Balancing with Fault Tolerance and Optimal Resource Utilization in Grid Computing

    Directory of Open Access Journals (Sweden)

    Neeraj Nehra

    2007-01-01

    In grid computing, load balancing with optimal resource utilization and fault tolerance are important issues. The availability of the selected resources for job execution is a primary factor that determines the computing performance. Typically, the probability of a failure is higher in grid computing than in traditional parallel computing, and the failure of resources affects job execution fatally. Therefore, a fault tolerance service is essential in a grid. Grid services are also often expected to meet some minimum levels of Quality of Service (QoS) for a desirable operation. To address this issue, we propose a service providing load balancing with optimal resource utilization and fault tolerance that satisfies QoS requirements. A fault tolerance service deals with various types of resource failures, which include process failures, processor failures and network failures. We design and implement a fault detector and a fault manager. The approach is effective in the sense that the fault detector detects the occurrence of resource failures and the fault manager guarantees that the submitted jobs are completely executed with optimal resources. The performance of job execution is improved due to job migration using a Mobile Agent (MA), even if some failures occur. This MA executes one of the checkpointing algorithms, and its performance is compared with a checkpointing algorithm using the Message Passing Interface (MPI). The overhead generated during job migration is also compared between MA and MPI.

  8. Flatness-based fault tolerant control / Control tolerante a fallas basado en planitud

    Scientific Electronic Library Online (English)

    César, Martínez-Torres; Loïc, Lavigne; Franck, Cazaurang; Efraín, Alcorta-García; David A., Díaz-Romero.

    2014-12-01

    This paper presents a fault tolerant control approach for nonlinear flat systems. The flatness property affords analytical redundancy and makes it possible to compute all states and control inputs of the system. Residual signals are computed by comparing real sensor measurements with the signals obtained using the differentially flat equations. Multiplicative and additive faults can be handled in the same way. The redundant signals obtained with the differentially flat equations are used to reconfigure the faulty system. Feasibility of this approach is verified for additive faults in a three tank system.

  9. An Adaptive Job Scheduling with efficient Fault Tolerance Strategy in Computational Grid

    Directory of Open Access Journals (Sweden)

    S. Gokuldev

    2014-08-01

    Grid computing is an emerging technology which has the potential to solve large scale scientific problems in an integrated heterogeneous environment. However, in the grid computing environment there are certain aspects which reduce the efficiency of the system. Scheduling the jobs to the best suited resources and achieving load balancing and fault tolerance are the key aspects to improve the efficiency and to exploit the capabilities of emergent computational systems. Because of the dynamic and distributed nature of the grid, the traditional scheduling methodologies are inefficient for the effective utilization of the available resources. In this paper, an efficient adaptive job scheduling algorithm is proposed to improve the efficiency of the grid system for a large number of tasks. Moreover, the proposed adaptive job scheduling, combined with a fault tolerance strategy based on checkpointing, improves the overall computation time even in the worst-case scenario under the heterogeneous grid environment. The simulation results illustrate that the proposed strategy effectively schedules the grid jobs with more than 10% increase in overall performance, thus minimizing the overall execution time.

  10. Software fault detection and recovery in critical real-time systems: An approach based on loose coupling

    International Nuclear Information System (INIS)

    Highlights: • We analyze fault tolerance in mission-critical real-time systems. • A decoupled architectural model can be used to implement fault tolerance. • Prototype implementation for a remote handling control system and service manager. • Recovery from transient faults by restarting services. Abstract: Remote handling (RH) systems are used to inspect, make changes to, and maintain components in the ITER machine and as such are an example of a mission-critical system. Failure in a critical system may cause damage, significant financial losses and loss of experiment runtime, making dependability one of its most important properties. However, even if the software for RH control systems has been developed using best practices, the system might still fail due to undetected faults (bugs), hardware failures, etc. Critical systems therefore need the capability to tolerate faults and resume operation after their occurrence. However, the design of effective fault detection and recovery mechanisms poses a challenge due to timeliness requirements, growth in scale, and complex interactions. In this paper we evaluate the effectiveness of a service-oriented architectural approach to fault tolerance in mission-critical real-time systems. We use a prototype implementation for service management with an experimental RH control system and industrial manipulator. The fault tolerance is based on using the high level of decoupling between services to recover from transient faults by service restarts. In case the recovery process is not successful, the system can still be used if the fault was not in a critical software module
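
    Recovery from transient faults by restarting a decoupled service can be sketched as a small supervisor loop. This is an illustrative sketch only: TransientFault, run_with_restarts, the flaky test service and the restart limit are all assumptions, not part of the prototype described in the abstract.

```python
from typing import Callable, List, Optional, Tuple


class TransientFault(Exception):
    """Hypothetical error type for faults that a restart may clear."""


def run_with_restarts(
    start_service: Callable[[], str], max_restarts: int = 3
) -> Tuple[Optional[str], List[str]]:
    """Run a service, restarting it after each transient fault up to a limit."""
    log: List[str] = []
    for attempt in range(max_restarts + 1):
        try:
            result = start_service()
            log.append(f"attempt {attempt}: ok")
            return result, log
        except TransientFault as exc:
            log.append(f"attempt {attempt}: fault ({exc}); restarting")
    # Recovery failed: leave the service stopped so that the rest of the
    # system can keep running if this service was not a critical one.
    log.append("recovery failed; service left stopped")
    return None, log


def make_flaky_service(failures: int) -> Callable[[], str]:
    """Build a stand-in service that fails a fixed number of times first."""
    remaining = {"n": failures}

    def service() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientFault("sensor timeout")
        return "trajectory planned"

    return service


result, log = run_with_restarts(make_flaky_service(2))
failed_result, failed_log = run_with_restarts(make_flaky_service(5), max_restarts=1)
```

    The restart limit matters in a real-time setting: unbounded restarts could hide a permanent fault and violate timeliness requirements, so the supervisor gives up and reports instead.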

  11. An Evaluation of Fault Tolerant Wind Turbine Control Schemes applied to a Benchmark Model

    DEFF Research Database (Denmark)

    Odgaard, Peter Fogh; Stoustrup, Jakob

    2014-01-01

    The reliability and availability of modern wind turbines grow in importance as their share of the world's power supply increases. This matters both for increasing the energy generated per unit, and thereby lowering the cost of energy, and for ensuring the availability of generated power, which helps keep power grids stable. Advanced Fault Tolerant Control is one of the potential tools to increase the reliability of modern wind turbines. A benchmark model for wind turbine fault detection and isolation and fault tolerant control has previously been proposed, and based on this benchmark an international competition on wind turbine fault tolerant control has been proposed. In this article the top three solutions from this wind turbine fault tolerant control competition are introduced and evaluated. The evaluation presented in this paper shows that the winner of the competition performs very well on this benchmark and is especially good at accommodating sensor faults. The two other evaluated solutions also do well at accommodating sensor faults, but have some issues which should be worked on before they can be considered a full solution to the benchmark problem.

  12. Clustering and fault tolerance for target tracking using wireless sensor networks

    International Nuclear Information System (INIS)

    Over the last few years, the deployment of WSNs (Wireless Sensor Networks) has been fostered in diverse applications. WSNs have great potential for a variety of domains ranging from scientific experiments to commercial applications. Because WSNs are deployed in dynamic and unpredictable environments, they must cope with a variety of faults. This paper proposes an energy-aware fault-tolerant clustering protocol for target tracking applications, termed the FTTT (Fault Tolerant Target Tracking) protocol. The identification of RNs (Redundant Nodes) makes SN (Sensor Node) fault tolerance plausible, and the clustering supports recovery of sensors supervised by a faulty CH (Cluster Head). The FTTT protocol reduces energy consumption in two steps: first, by identifying RNs in the network; secondly, by restricting the number of SNs sending data to the CH. Simulations validate the scalability and low power consumption of the FTTT protocol in comparison with the LEACH protocol. (author)

  13. Design of Parity Preserving Logic Based Fault Tolerant Reversible Arithmetic Logic Unit

    Directory of Open Access Journals (Sweden)

    Rakshith Saligram

    2013-07-01

    Reversible Logic is gaining significant consideration as the potential logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on physical entropy. Fault Tolerant reversible logic is one class of reversible logic that maintains the parity of the inputs and the outputs. Significant contributions have been made in the literature towards the design of fault tolerant reversible logic gate structures and arithmetic units; however, there are not many efforts directed towards the design of fault tolerant reversible ALUs. The Arithmetic Logic Unit (ALU) is the prime performing unit in any computing device and it has to be made fault tolerant. In this paper we aim to design one such fault tolerant reversible ALU that is constructed using parity preserving reversible logic gates. The designed ALU can generate up to seven arithmetic operations and four logical operations.
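
    To make "parity preserving reversible gate" concrete, the sketch below checks two well-known 3-bit gates, the Fredkin gate and the Feynman double gate, by exhaustive enumeration. The abstract does not say which gates the ALU uses, so these are chosen here purely as standard examples; the Toffoli gate is included as a reversible gate that does not preserve parity.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, int, int]


def fredkin(a: int, b: int, c: int) -> Bits:
    """Controlled swap: when a == 1, the bits b and c are exchanged."""
    return (a, c, b) if a else (a, b, c)


def feynman_double(a: int, b: int, c: int) -> Bits:
    """Feynman double gate: P = A, Q = A xor B, R = A xor C."""
    return (a, a ^ b, a ^ c)


def toffoli(a: int, b: int, c: int) -> Bits:
    """Toffoli gate: c is flipped when both a and b are 1."""
    return (a, b, c ^ (a & b))


def parity(bits: Bits) -> int:
    return sum(bits) % 2


def is_reversible(gate: Callable[..., Bits]) -> bool:
    """A gate is reversible if all 8 inputs map to 8 distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=3)}
    return len(outputs) == 8


def is_parity_preserving(gate: Callable[..., Bits]) -> bool:
    """Input parity must equal output parity for every input pattern."""
    return all(parity(bits) == parity(gate(*bits)) for bits in product((0, 1), repeat=3))


summary = {
    gate.__name__: (is_reversible(gate), is_parity_preserving(gate))
    for gate in (fredkin, feynman_double, toffoli)
}
```

    Because a parity-preserving circuit keeps input and output parity equal, a mismatch between the two signals a fault, which is the property the abstract calls fault tolerant.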

  14. Geodetic Imaging of Fault System Activity

    OpenAIRE

    Evans, Eileen Louise

    2014-01-01

    Geodetic observations provide kinematic constraints on the behavior of tectonically active fault systems. Estimates of earthquake cycle activity derived from these constraints may depend on modeling assumptions and/or regularization of a geodetic inverse problem, which is often poorly conditioned. Common model assumptions may affect kinematic solutions and conclusions about physical properties of faults and fault zones. For example, within a geometrically complex fault system, parameterizatio...

  15. Fault-Tolerance and Load-Balance Tradeoff in a Distributed Storage System / Estudio de la interdependencia entre tolerancia a fallas y balance de carga en un sistema de almacenamiento distribuido

    Scientific Electronic Library Online (English)

    Moisés, Quezada Naquid; Ricardo, Marcelín Jiménez; Miguel, López Guerrero.

    2010-12-01

    In recent years distributed storage systems have been the object of increasing interest by the research community. They promise improvements in information availability, security and integrity. Nevertheless, at this point in time there is no predominant approach, but rather a wide spectrum of proposals in the literature. In this paper we report our findings with a combination of redundancy techniques intended to simultaneously provide fault tolerance and load balance in a small-scale distributed storage system. Based on our analysis, we provide general guidelines for designers and developers of similar systems.

  16. Minimum sliding mode error feedback control for fault tolerant reconfigurable satellite formations with J2 perturbations

    Science.gov (United States)

    Cao, Lu; Chen, Xiaoqian; Misra, Arun K.

    2014-03-01

    Minimum Sliding Mode Error Feedback Control (MSMEFC) is proposed to improve the control precision of spacecraft formations based on the conventional sliding mode control theory. This paper proposes a new approach to estimate and offset the system model errors, which include various kinds of uncertainties and disturbances, as well as smoothes out the effect of nonlinear switching control terms. To facilitate the analysis, the concept of equivalent control error is introduced, which is the key to the utilization of MSMEFC. A cost function is formulated on the basis of the principle of minimum sliding mode error; then the equivalent control error is estimated and fed back to the conventional sliding mode control. It is shown that the sliding mode after the MSMEFC will approximate to the ideal sliding mode, resulting in improved control performance and quality. The new methodology is applied to spacecraft formation flying. It guarantees global asymptotic convergence of the relative tracking error in the presence of J2 perturbations. In addition, some fault tolerant situations such as thruster failure for a period of time, thruster degradation and so on, are also considered to verify the effectiveness of MSMEFC. Numerical simulations are performed to demonstrate the efficacy of the proposed methodology to maintain and reconfigure the satellite formation with the existence of initial offsets and J2 perturbation effects, even in the fault-tolerant cases.

  17. Experimental Analysis of the Fault Tolerance of the PIM-SM IP Multicast Routing Protocol under GNS3

    OpenAIRE

    Gábor Lencse; István Derka

    2014-01-01

    PIM-SM is the most commonly used IP multicast routing protocol in IPTV systems. Its fault tolerance is examined by experimenting on a mesh topology multicast test network built up by Cisco routers under GNS3. Different fault scenarios are played and different parameters of the PIM-SM and of the OSPF protocols are examined if they influence and how they influence the outage time of an IPTV service. The failure of the Rendezvous Point (RP) of the given IP multicast group as well as the complete...

  18. Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 3: FTMP test and evaluation

    Science.gov (United States)

    Lala, J. H.; Smith, T. B., III

    1983-01-01

    The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.
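
    The fault-injection figures above imply a detection coverage that the abstract does not state explicitly. The short calculation below derives it from the two reported counts; the variable names are ours, and the bound on unused-pin faults simply applies the "at least 80 percent" estimate to the undetected faults.

```python
import math

injected = 21_055          # faults injected (from the abstract)
detected = 17_418          # faults detected (from the abstract)

undetected = injected - detected
coverage = detected / injected

# "At least 80 percent of undetected faults are estimated to be on unused pins."
min_on_unused_pins = math.ceil(undetected * 80 / 100)
max_remaining = undetected - min_on_unused_pins

print(f"undetected faults:        {undetected}")
print(f"raw detection coverage:   {coverage:.1%}")
print(f"on unused pins (at least): {min_on_unused_pins}")
print(f"otherwise undetected (at most): {max_remaining}")
```

    So roughly 82.7% of all injected faults were detected, and under the stated estimate no more than a few hundred undetected faults were on pins that were actually in use.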

  19. Robot-borne fault tolerant calculators for nuclear use

    International Nuclear Information System (INIS)

    The use of robots has become a necessity in the civil nuclear industry. Electronic systems of such robots must tolerate cumulative ionizing radiation dose effects. Today's objective is to reach a 3 kGy dose resistance. The difficulty and cost of on-site maintenance make it necessary to guarantee at least one functioning mode in case of system failure. To improve the behaviour of robot-borne systems, the CEA Department for Nuclear Engineering Studies (DEIN) has developed a method for the selection of industrial electronic components and has built computer architectures that make it possible to break free from some parameters sensitive to cumulative dose. This paper presents the MICADO and CADMOS architectures developed at the DEIN. (J.S.). 15 refs., 5 figs

  20. Sensor-driven, fault-tolerant control of a maintenance robot

    International Nuclear Information System (INIS)

    A robot system has been designed to do routine maintenance tasks on the Sandia Pulsed Reactor (SPR). The use of this Remote Maintenance Robot (RMR) is expected to significantly reduce the occupational radiation exposure of the reactor operators. Reactor safety was a key issue in the design of the robot maintenance system. Using sensors to detect error conditions and intelligent control to recover from the errors, the RMR is capable of responding to error conditions without creating a hazard. This paper describes the design and implementation of a sensor-driven, fault-tolerant control for the RMR. Recovery from errors is not automatic; it does rely on operator assistance. However, a key feature of the error recovery procedure is that the operator is allowed to reenter the programmed operation after the error has been corrected. The recovery procedure guarantees that the moving components of the system will not collide with the reactor during recovery

  1. Active fault detection in MIMO systems

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Poulsen, Niels Kjølstad

    2014-01-01

    The focus in this paper is on active fault detection (AFD) for MIMO systems with parametric faults. The problem of designing auxiliary inputs with respect to detection of parametric faults is investigated. An analysis of the design of auxiliary inputs is given based on analytic transfer functions from auxiliary input to residual outputs. The analysis is based on a singular value decomposition of these transfer functions. Based on this analysis, it is possible to design the auxiliary input as well as the associated residual vector with respect to every single parametric fault in the system such that it is possible to detect these faults.

  2. Fault Diagnosis for Electrical Distribution Systems using Structural Analysis

    DEFF Research Database (Denmark)

    Knüppel, Thyge; Blanke, Mogens; Østergaard, Jacob

    2014-01-01

    Fault-tolerance in electrical distribution relies on the ability to diagnose possible faults and determine which components or units cause a problem or are close to doing so. Faults include defects in instrumentation, power generation, transformation and transmission. The focus of this paper is the design of efficient diagnostic algorithms, which is a prerequisite for fault-tolerant control of power distribution. Diagnosis in a grid depends on available analytic redundancies, and hence on network...

  3. Open-Phase Fault Tolerance Techniques of Five-Phase Dual-Rotor Permanent Magnet Synchronous Motor

    Directory of Open Access Journals (Sweden)

    Jing Zhao

    2015-11-01

    Multi-phase motors are gaining more attention due to their good fault tolerance capability, high power density, etc. By applying dual-rotor technology to multi-phase machines, a five-phase dual-rotor permanent magnet synchronous motor (DRPMSM) is studied in this paper to further improve torque density and fault tolerance capability. It has two rotors and two sets of stator windings, and it can adopt a series drive mode or a parallel drive mode. The fault-tolerance capability of the five-phase DRPMSM is investigated. All open circuit fault types and corresponding fault tolerance techniques in different drive modes are analyzed. A fault-tolerance control strategy of injecting currents containing a certain third harmonic component is proposed for the five-phase DRPMSM to ensure performance after faults in the motor or drive circuit. For adjacent double-phase faults in the motor, based on where the additional degrees of freedom are used, two different fault-tolerance current calculation schemes are adopted and the torque results are compared. Decoupling of the inner motor and outer motor is investigated under fault-tolerant conditions in parallel drive mode. The finite element analysis (FEA) results and co-simulation results based on Simulink-Simplorer-Maxwell verify the effectiveness of the techniques.

  4. High available and fault tolerant mobile communications infrastructure

    DEFF Research Database (Denmark)

    Beiroumi, Mohammad Zib

    2006-01-01

    High availability is a key requirement in mobile communication systems, especially when they are used for mission-critical services such as public safety, e.g. police, ambulance and fire services. A failure in the fixed network infrastructure that provides services to mobile users can affect a large number of users and risk loss of lives. The fixed infrastructure of a mobile communication system has different characteristics, for example architectural complexity, real-time peer-to-peer communication and performance requirements, that make the already existing failure recovery techniques, such as those using rollback or replication, inapplicable. This dissertation presents a novel failure recovery approach based on a behavioral model of the communication protocols. The new recovery method is able to deal with software and hardware faults and is particularly suitable for mobile communications infrastructure. The method enables the faulty applications in the infrastructure to quickly and effectively resume their services to their mobile clients with no or minimal loss of work after failure. In our approach, we do not assume a specific fault behavior, for example fail-stop or transient behavior, as is the case for many recovery techniques. In addition, the method does not require any modification to mobile clients. The Communicating Extended Finite State Machine (CEFSM) is used to model the behavior of the infrastructure applications. The model based recovery scheme is integrated in the application and uses the client/server model to save the application state information on stable storage during failure-free execution and retrieve it when needed during recovery. When and what information is saved or retrieved is determined by the behavioral model of the application. To practically evaluate and demonstrate the effectiveness of our method, we developed as a case study an experimental testbed for the TETRA (TErrestrial Trunked Radio) packet data network. The testbed works as a distributed system and can run various communication scenarios between the fixed network infrastructure and its mobile users. We thoroughly followed the TETRA standard specifications in our implementation of the communication protocols in order to get a testbed system that operates as the real system with respect to message exchange and timing. The experimental results showed that by using our method the faulty infrastructure application can immediately resume its service after its restart and, in less than a minute, restores its service performance level prior to the failure. The failure-free overhead incurred by the method is relatively low, and is experimentally found to be less than 5% in the conducted experiments.
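
    The idea that the behavioral model decides when state is saved can be sketched with a plain finite state machine whose transitions carry a "checkpoint" flag. Everything in this sketch is hypothetical: the states, the event names (which are not taken from the TETRA standard), the Session class, and the dictionary standing in for stable storage. A full CEFSM would also carry context variables and communicate with peer machines.

```python
import json
from typing import Dict, Tuple

# (current state, event) -> (next state, save state to stable storage?)
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, bool]] = {
    ("IDLE", "attach_request"): ("ATTACHING", False),
    ("ATTACHING", "attach_accept"): ("ATTACHED", True),
    ("ATTACHED", "data"): ("ATTACHED", False),
    ("ATTACHED", "detach"): ("IDLE", True),
}


class Session:
    """Per-client protocol session kept by an infrastructure application."""

    def __init__(self, client_id: str, store: Dict[str, str], state: str = "IDLE"):
        self.client_id = client_id
        self.store = store      # stand-in for stable storage
        self.state = state

    def handle(self, event: str) -> None:
        next_state, save = TRANSITIONS[(self.state, event)]
        self.state = next_state
        if save:
            # The model marks this transition as one worth persisting.
            self.store[self.client_id] = json.dumps({"state": self.state})

    @classmethod
    def recover(cls, client_id: str, store: Dict[str, str]) -> "Session":
        """Rebuild a session after a restart from the last saved state."""
        saved = store.get(client_id)
        state = json.loads(saved)["state"] if saved else "IDLE"
        return cls(client_id, store, state)


stable_store: Dict[str, str] = {}
session = Session("ms-1", stable_store)
for event in ("attach_request", "attach_accept", "data"):
    session.handle(event)

del session  # simulate a crash of the infrastructure application
recovered = Session.recover("ms-1", stable_store)
```

    Saving only on selected transitions is what keeps failure-free overhead low in such a scheme, and because recovery happens entirely on the infrastructure side, the client needs no changes.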

  5. Detection and treatment of faults in manufacturing systems based on Petri Nets

    Scientific Electronic Library Online (English)

    L. A. M., Riascos; L. A., Moscato; P.E., Miyagi.

    2004-09-01

    This paper introduces a methodology for modeling and analyzing fault-tolerant manufacturing systems that not only optimizes normal productive processes, but also performs detection and treatment of faults. This approach is based on the hierarchical and modular integration of Petri Nets. The modularity provides the integration of three types of processes: those representing the productive process, fault detection, and fault treatment. The hierarchical aspect of the approach permits us to consider processes on different levels of detail (i.e. factory, manufacturing cell, or machine). Case studies considering detection and treatment of faults are presented, and a simulation tool is applied for verifying the models.
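
    A toy Petri net can show how productive, fault-detection and fault-treatment processes fit into one model. The net below is an assumption made for illustration (a single machine with four places and unit arc weights); it is not one of the paper's case studies and ignores the hierarchical aspect.

```python
from typing import Dict, List, Tuple


class PetriNet:
    """Minimal place/transition net with unit arc weights."""

    def __init__(self, marking: Dict[str, int]):
        self.marking = dict(marking)
        self.transitions: Dict[str, Tuple[List[str], List[str]]] = {}

    def add_transition(self, name: str, inputs: List[str], outputs: List[str]) -> None:
        self.transitions[name] = (inputs, outputs)

    def enabled(self, name: str) -> bool:
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(place, 0) >= 1 for place in inputs)

    def fire(self, name: str) -> None:
        if not self.enabled(name):
            raise ValueError(f"transition {name!r} is not enabled")
        inputs, outputs = self.transitions[name]
        for place in inputs:
            self.marking[place] -= 1
        for place in outputs:
            self.marking[place] = self.marking.get(place, 0) + 1


net = PetriNet({"idle": 1})
# Productive process
net.add_transition("start", ["idle"], ["machining"])
net.add_transition("finish", ["machining"], ["done"])
# Fault detection and fault treatment
net.add_transition("detect_fault", ["machining"], ["fault_detected"])
net.add_transition("treat_fault", ["fault_detected"], ["idle"])

net.fire("start")
net.fire("detect_fault")                    # a fault interrupts machining
finish_blocked = not net.enabled("finish")  # the part cannot be finished now
net.fire("treat_fault")                     # treatment returns the machine to idle
net.fire("start")
net.fire("finish")
```

    Keeping the three groups of transitions separate mirrors the modular integration described in the abstract: the productive path can be analyzed on its own, and detection and treatment can be added without altering it.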

  6. Fault-Tolerance through Message-logging and Check-pointing: Disaster Recovery for CORBA-based Distributed Bank Servers

    OpenAIRE

    Vassev, Emil; Nguyen, Que Thu Dung; Kuang, Heng

    2009-01-01

    This report presents results of our endeavor towards developing a failure-recovery variant of a CORBA-based bank server that provides fault tolerance features through message logging and checkpoint logging. In this group of projects, three components were developed to satisfy the requirements: 1) a message-logging protocol for the branch servers of the distributed banking system to log required information; 2) a recovery module that restarts the bank server using the message...

  7. ECFS: A decentralized, distributed and fault-tolerant FUSE filesystem for the LHCb online farm

    International Nuclear Information System (INIS)

    The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics. In order to filter out the 'bad events' a large farm of x86-servers (~2000 nodes) has been put in place. These servers boot from and run from NFS; however, they use their local disk to temporarily store data that cannot be processed in real time ('data-deferring'). These events are subsequently processed when no live data are coming in. The effective CPU power is thus greatly increased. This gain in CPU power depends critically on the availability of the local disks. For cost and power reasons, mirroring (RAID-1) is not used, leading to many operational headaches with failing disks, disk errors, and server failures induced by faulty disks. To mitigate these problems and increase the reliability of the LHCb farm, while at the same time keeping cost and power consumption low, an extensive study of existing highly available and distributed file systems has been done. While many distributed file systems provide reliability by 'file replication', none of the evaluated ones supports erasure algorithms. A decentralised, distributed and fault-tolerant 'write once read many' file system has been designed and implemented as a proof of concept, providing fault tolerance without using expensive – in terms of disk space – file replication techniques and providing a unique namespace as its main goals. This paper describes the design and the implementation of the Erasure Codes File System (ECFS) and presents the specialised FUSE interface for Linux. Depending on the encoding algorithm, ECFS will use a certain number of target directories as a backend to store the segments that compose the encoded data. When target directories are mounted via nfs/autofs, ECFS will act as a file system over network/block-level RAID over multiple servers.

  8. ECFS: A decentralized, distributed and fault-tolerant FUSE filesystem for the LHCb online farm

    Science.gov (United States)

    Rybczynski, Tomasz; Bonaccorsi, Enrico; Neufeld, Niko

    2014-06-01

    The LHCb experiment records millions of proton collisions every second, but only a fraction of them are useful for LHCb physics. In order to filter out the "bad events" a large farm of x86-servers (~2000 nodes) has been put in place. These servers boot from and run from NFS; however, they use their local disk to temporarily store data that cannot be processed in real time ("data-deferring"). These events are subsequently processed when no live data are coming in. The effective CPU power is thus greatly increased. This gain in CPU power depends critically on the availability of the local disks. For cost and power reasons, mirroring (RAID-1) is not used, leading to many operational headaches with failing disks, disk errors, and server failures induced by faulty disks. To mitigate these problems and increase the reliability of the LHCb farm, while at the same time keeping cost and power consumption low, an extensive study of existing highly available and distributed file systems has been done. While many distributed file systems provide reliability by "file replication", none of the evaluated ones supports erasure algorithms. A decentralised, distributed and fault-tolerant "write once read many" file system has been designed and implemented as a proof of concept, providing fault tolerance without using expensive - in terms of disk space - file replication techniques and providing a unique namespace as its main goals. This paper describes the design and the implementation of the Erasure Codes File System (ECFS) and presents the specialised FUSE interface for Linux. Depending on the encoding algorithm, ECFS will use a certain number of target directories as a backend to store the segments that compose the encoded data. When target directories are mounted via nfs/autofs, ECFS will act as a file system over network/block-level RAID over multiple servers.
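
    To illustrate why erasure coding costs less disk space than replication, here is a minimal sketch using the simplest possible scheme: `k` data segments plus one XOR parity segment, as in RAID-5. This is not the encoding ECFS uses, which the abstract does not specify; the `encode` and `decode` functions are hypothetical names chosen for the example.

```python
def encode(data, k):
    """Split data into k equal-size segments plus one XOR parity segment."""
    seg_len = -(-len(data) // k)              # ceiling division
    padded = data.ljust(seg_len * k, b"\0")   # zero-pad the last segment
    segments = [padded[i * seg_len:(i + 1) * seg_len] for i in range(k)]
    parity = bytearray(seg_len)
    for seg in segments:
        for i, byte in enumerate(seg):
            parity[i] ^= byte
    return segments + [bytes(parity)]


def decode(segments, length):
    """Rebuild the original bytes; at most one segment may be None (lost)."""
    missing = [i for i, seg in enumerate(segments) if seg is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost segment")
    segments = list(segments)
    if missing:
        seg_len = len(next(seg for seg in segments if seg is not None))
        rebuilt = bytearray(seg_len)
        # XOR of all surviving segments equals the lost one.
        for seg in segments:
            if seg is not None:
                for i, byte in enumerate(seg):
                    rebuilt[i] ^= byte
        segments[missing[0]] = bytes(rebuilt)
    # Drop the parity segment and the padding.
    return b"".join(segments[:-1])[:length]
```

    With `k = 4`, each of the five segments could live in a different target directory, and any single directory can be lost at a storage overhead of 25% of the padded data size, compared with 100% for one mirrored copy.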

  9. Implementation of fault-tolerant quantum logic gates via optimal control

    International Nuclear Information System (INIS)

    The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. In particular, we can find solutions that implement all of the elementary logic gates in a fixed amount of time with limited control resources for the five-qubit stabilizer code. Most importantly, logic gates that are extremely difficult to implement using conventional techniques even for ideal systems, such as the T-gate for the five-qubit stabilizer code, do not appear to pose a problem for optimal control.

  10. Design of Fault-Tolerant and Dynamically-Reconfigurable Microfluidic Biochips

    CERN Document Server

    Su, Fei

    2011-01-01

    Microfluidics-based biochips are soon expected to revolutionize clinical diagnosis, DNA sequencing, and other laboratory procedures involving molecular biology. Most microfluidic biochips are based on the principle of continuous fluid flow and they rely on permanently-etched microchannels, micropumps, and microvalves. We focus here on the automated design of "digital" droplet-based microfluidic biochips. In contrast to continuous-flow systems, digital microfluidics offers dynamic reconfigurability; groups of cells in a microfluidics array can be reconfigured to change their functionality during the concurrent execution of a set of bioassays. We present a simulated annealing-based technique for module placement in such biochips. The placement procedure not only addresses chip area, but it also considers fault tolerance, which allows a microfluidic module to be relocated elsewhere in the system when a single cell is detected to be faulty. Simulation results are presented for a case study involving the polymeras...

  11. Implementation of fault-tolerant quantum logic gates via optimal control

    Energy Technology Data Exchange (ETDEWEB)

    Nigmatullin, R [Cavendish Laboratory, University of Cambridge, J J Thomson Avenue, Cambridge, CB3 0HE (United Kingdom); Schirmer, S G [Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Wilberforce Road, CB3 0WA (United Kingdom)], E-mail: sgs29@cam.ac.uk

    2009-10-15

    The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. In particular, we can find solutions that implement all of the elementary logic gates in a fixed amount of time with limited control resources for the five-qubit stabilizer code. Most importantly, logic gates that are extremely difficult to implement using conventional techniques even for ideal systems, such as the T-gate for the five-qubit stabilizer code, do not appear to pose a problem for optimal control.

  12. Implementation of Fault-tolerant Quantum Logic Gates via Optimal Control

    CERN Document Server

    Nigmatullin, R

    2009-01-01

    The implementation of fault-tolerant quantum gates on encoded logic qubits is considered. It is shown that transversal implementation of logic gates based on simple geometric control ideas is problematic for realistic physical systems suffering from imperfections such as qubit inhomogeneity or uncontrollable interactions between qubits. However, this problem can be overcome by formulating the task as an optimal control problem and designing efficient algorithms to solve it. In particular, we can find solutions that implement all of the elementary logic gates in a fixed amount of time with limited control resources for the five-qubit stabilizer code. Most importantly, logic gates that are extremely difficult to implement using conventional techniques even for ideal systems, such as the T-gate for the five-qubit stabilizer code, do not appear to pose a problem for optimal control.

  13. A Secure and Fault-tolerant framework for Mobile IPv6 based networks

    Directory of Open Access Journals (Sweden)

    Rathi S

    2009-09-01

    Full Text Available Mobile IPv6 will be an integral part of the next-generation Internet protocol. The importance of mobility in the Internet keeps increasing. The current specification of Mobile IPv6 does not provide proper support for reliability in the mobile network, and there are other problems associated with it. In this paper, we propose the “Virtual Private Network (VPN) based Home Agent Reliability Protocol (VHAHA)” as a complete system architecture and extension to Mobile IPv6 that supports reliability and offers solutions to the security problems found in the Mobile IP registration part. The key features of this protocol over other protocols are: better survivability, transparent failure detection and recovery, reduced system complexity and workload, secure data transfer, and improved overall performance. Keywords: Mobility Agents; VPN; VHAHA; Fault-tolerance; Reliability; Self-certified keys; Confidentiality; Authentication; Attack prevention

  14. A Fault Tolerant Congestion Aware Routing Protocol for Mobile Adhoc Networks

    Directory of Open Access Journals (Sweden)

    K. Duraiswamy

    2012-01-01

    Full Text Available Problem statement: The performance of ad hoc routing protocols degrades significantly when there are faulty nodes in the network. Congestion causes packet losses and bandwidth degradation, so time and energy are wasted during recovery. The fault tolerant congestion aware routing protocol addresses these problems by exploiting the network redundancy through multipath routing. Approach: This study proposes a fault tolerant congestion aware multipath routing protocol to reduce route breakages and congestion losses. The AOMDV protocol is used as a base for the multipath routing. The proposed scheme enables more nodes to salvage a dropped packet. Results: Simulation results show that the proposed protocol achieves better throughput and packet delivery ratio with reduced delay, packet drop and energy. Conclusion: The congestion control technique proposed in this study proactively detects node-level and link-level congestion and performs congestion control using the fault-tolerant multiple paths.

  15. Fault-tolerant onboard digital information switching and routing for communications satellites

    Science.gov (United States)

    Shalkhauser, Mary Jo; Quintana, Jorge A.; Soni, Nitin J.; Kim, Heechul

    1993-01-01

    The NASA Lewis Research Center is developing an information-switching processor for future meshed very-small-aperture terminal (VSAT) communications satellites. The information-switching processor will switch and route baseband user data onboard the VSAT satellite to connect thousands of Earth terminals. Fault tolerance is a critical issue in developing information-switching processor circuitry that will provide and maintain reliable communications services. In parallel with the conceptual development of the meshed VSAT satellite network architecture, NASA designed and built a simple test bed for developing and demonstrating baseband switch architectures and fault-tolerance techniques. The meshed VSAT architecture and the switching demonstration test bed are described, and the initial switching architecture and the fault-tolerance techniques that were developed and tested are discussed.

  16. Fault Isolation in Distributed Embedded Systems

    OpenAIRE

    Biteus, Jonas

    2007-01-01

    To improve safety, reliability, and efficiency of automotive vehicles and other technical applications, embedded systems commonly use fault diagnosis consisting of fault detection and isolation. Since many systems are constructed as distributed embedded systems including multiple control units, it is necessary to perform global fault isolation using, for example, a central unit. However, the drawbacks of such a centralized method are the need for a powerful diagnostic unit and the sensitivity ...

  17. Soft-Fault Detection Technologies Developed for Electrical Power Systems

    Science.gov (United States)

    Button, Robert M.

    2004-01-01

    The NASA Glenn Research Center, partner universities, and defense contractors are working to develop intelligent power management and distribution (PMAD) technologies for future spacecraft and launch vehicles. The goals are to provide higher performance (efficiency, transient response, and stability), higher fault tolerance, and higher reliability through the application of digital control and communication technologies. It is also expected that these technologies will eventually reduce the design, development, manufacturing, and integration costs for large, electrical power systems for space vehicles. The main focus of this research has been to incorporate digital control, communications, and intelligent algorithms into power electronic devices such as direct-current to direct-current (dc-dc) converters and protective switchgear. These technologies, in turn, will enable revolutionary changes in the way electrical power systems are designed, developed, configured, and integrated in aerospace vehicles and satellites. Initial successes in integrating modern, digital controllers have proven that transient response performance can be improved using advanced nonlinear control algorithms. One technology being developed includes the detection of "soft faults," those not typically covered by current systems in use today. Soft faults include arcing faults, corona discharge faults, and undetected leakage currents. Using digital control and advanced signal analysis algorithms, we have shown that it is possible to reliably detect arcing faults in high-voltage dc power distribution systems (see the preceding photograph). Another research effort has shown that low-level leakage faults and cable degradation can be detected by analyzing power system parameters over time. This additional fault detection capability will result in higher reliability for long-lived power systems such as reusable launch vehicles and space exploration missions.

  18. Trajectory planning/re-planning for satellite systems in rendezvous mission in the presence of actuator faults based on attainable efforts analysis

    OpenAIRE

    Chamseddine, Abbas; Join, Cédric; Theilliol, Didier

    2015-01-01

    The objective of Fault-tolerant Control (FTC) is to minimize the effect of faults on system performance (stability, trajectory tracking, etc.). However, the majority of the existing FTC methods continue to force the system to follow the pre-fault trajectories without considering the reduction in available control resources caused by actuator faults. Forcing the system to follow the same trajectories as before fault occurrence may result in actuator saturation and system instability. Thus, ...

  19. Sensor and Actuator Fault-Hiding Reconfigurable Control Design for a Four-Tank System Benchmark

    DEFF Research Database (Denmark)

    Hameed, Ibrahim; El-Madbouly, Esam I

    2015-01-01

    Fault detection and compensation play a key role in fulfilling high demands for performance and security in today's technological systems. In this paper, a fault-hiding (i.e., tolerant) control scheme that detects and compensates for actuator and sensor faults in a four-tank system benchmark is introduced. Faults are modeled as a drastic gain loss in actuators (i.e., pumps) and in sensor measurements (i.e., level detection), which could lead to a large loss in the nominal performance. A configurable decentralized Proportional Integral (PI) controller is designed and applied to a Linear Time Invariant (LTI) system, where virtual sensors and virtual actuators are used to correct faulty performance by restoring the pre-fault performance. Simulation results showed that the developed approach can handle different types of faults and is able to completely and instantly recover the original system performance/functionality directly after the occurrence of faults.
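
    The fault-hiding principle can be sketched for the simplest case, a pure gain loss whose size is already known. This is a toy single-tank simulation, not the paper's four-tank benchmark or its design: the `simulate` function, the tank dynamics, the controller gains, and the assumption that fault detection has delivered exact gain estimates are all hypothetical. The virtual sensor and virtual actuator are reduced here to static gain inversions placed around an unchanged PI controller.

```python
def simulate(steps, actuator_gain=1.0, sensor_gain=1.0, hide_faults=False):
    """Simulate one tank level under PI control and return the final level.

    actuator_gain and sensor_gain model gain-loss faults (1.0 = healthy).
    With hide_faults=True, static virtual blocks invert the (assumed
    known) fault gains so the nominal controller sees a healthy plant.
    """
    level, integral = 0.0, 0.0
    setpoint, kp, ki, dt = 1.0, 2.0, 0.5, 0.1   # illustrative values
    for _ in range(steps):
        measured = sensor_gain * level           # faulty level sensor
        if hide_faults:
            measured /= sensor_gain              # virtual sensor
        error = setpoint - measured
        integral += error * dt
        command = kp * error + ki * integral     # unchanged nominal PI
        if hide_faults:
            command /= actuator_gain             # virtual actuator
        inflow = actuator_gain * command         # faulty pump
        level += dt * (inflow - 0.5 * level)     # linear tank dynamics
    return level


nominal = simulate(500)
faulty = simulate(500, actuator_gain=0.5, sensor_gain=0.6)
hidden = simulate(500, actuator_gain=0.5, sensor_gain=0.6, hide_faults=True)
```

    In this sketch the uncompensated loop regulates the faulty measurement rather than the true level, so the tank settles well above the setpoint, while the fault-hiding loop tracks the nominal response.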

  20. Optimal Configuration of Fault-Tolerance Parameters for Distributed Server Access

    DEFF Research Database (Denmark)

    Daidone, Alessandro; Renier, Thibault; Bondavalli, Andrea; Schwefel, Hans-Peter

    2013-01-01

    Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as enhanced name servers. Such architectures provide an increased number of redundancy configuration choices. The influence of a (wide area) network connection can be quite significant and induce trade-offs ...