WorldWideScience

Sample records for multiprocessors

  1. The art of multiprocessor programming

    CERN Document Server

    Herlihy, Maurice

    2012-01-01

    Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues. This revised edition incorporates much-demanded updates t

  2. Single-chip serial channel enhances multi-processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Millar, J.

    1982-01-01

    In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. And finally, a typical multiprocessor system is shown, implementing the TMS 7041.

  3. TUMULT, the Twente University multiprocessor

    NARCIS (Netherlands)

    Scholten, Johan; Jansen, P.G.

    1988-01-01

    TUMULT, (Twente University multiprocessor) is described. Its aim is the design and implementation of a modular extendable multiprocessor system. Up to 15 processing elements are connected through an interprocessor communication network, using message-passing for the exchange of data. The hardware is

  4. Multiprocessors for high energy physics

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I review the role, status and progress of multiprocessor projects relevant to high energy physics. A short overview of the large variety of multiprocessors architectures is given, with special emphasis on machines suitable for experimental data reconstruction. A lot of progress has been made in the attempt to make the use of multiprocessors less painful by creating a ''Parallel Programming Environment'' supporting the non-expert user. A high degree of usability has been reached for coarse grain (event level) parallelism. The program development tools available on various systems (subroutine packages, preprocessors and parallelizing compilers) are discussed in some detail. Tools for execution control and debugging are also developing, thus opening the path from dedicated systems for large scale, stable production towards support of a more general job mix. At medium term, multiprocessors will thus cover a growing fraction of the typical high energy physics computing task. (orig.)

  5. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Byung Soo; Kim, Chang Hoi; Hwang, Suk Yong; Sohn, Surg Won; Yoon, Tae Seob; Lee, Yong Bum; Kim, Woong Ki

    1988-02-01

    A mutiprocessor system that is essential to A.I. (Artificial Intelligence) robot control was developed. A.I. robot control needs very complex real time control. The multiprocessor system interconnecting many SBC's (Single Board Computer) is much faster and accurater than using only one SBC. Various multiprocessor systems and their applications were compared and discussed. The multiprocessor architecture system is specially designed to be used in nuclear environments. The main functions are job distribution, multitasking, and intelligent remote control by SDLC protocol using optical fiber. The system can be applied to position control for locomotion and manipulation, data fusion system, and image processing. (Author)

  6. Multiprocessor scheduling for real-time systems

    CERN Document Server

    Baruah, Sanjoy; Buttazzo, Giorgio

    2015-01-01

    This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems.  The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models.  New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis.  Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.

  7. Multiprocessor data acquisition system

    International Nuclear Information System (INIS)

    Haumann, J.R.; Crawford, R.K.

    1987-01-01

    A multiprocessor data acquisition system has been built to replace the single processor systems at the Intense Pulsed Neutron Source (IPNS) at Argonne National Laboratory. The multiprocessor system was needed to accommodate the higher data rates at IPNS brought about by improvements in the source and changes in instrument configurations. This paper describes the hardware configuration of the system and the method of task sharing and compares results to the single processor system

  8. File-System Workload on a Scientific Multiprocessor

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

    Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors a their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize 1/0 in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests-in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  9. Modeling and Analyzing Real-Time Multiprocessor Systems

    NARCIS (Netherlands)

    Wiggers, M.H.; Thiele, Lothar; Lee, Edward A.; Schlieker, Simon; Bekooij, Marco Jan Gerrit

    2010-01-01

    Researchers have proposed approaches to verify that real-time multiprocessor systems meet their timeliness constraints. These approaches make assumptions on the model of computation, the load placed on the multiprocessor system, and the faults that can arise. This heterogeneous set of assumptions

  10. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Seung Ho; Hwang, Suk Yeoung; Sohn, Surg Won; Kim, Byung Soo; Kim, Chang Hoi; Lee, Yong Bum; Kim, Woong Ki

    1988-12-01

    The object of this project is to develop a multiprocessor system which is essential to robot technology. A multiprocessor system interconnecting many single board computer is much faster and flexible than a single processor. The developed multiprocessor will be used to control nuclear mobile robot, so a loosely coupled system is adopted as a robot controller. A total configuration of controller is divided into three main parts in related with its function. It is consisted of supervisory control part, functional control part, remote control part. The designed control system is to be expanded easily for further use with a modular architecture, so the functional independency within sub-systems can be obtained throughout the system structure. Electromagnetic interference affecting to the control system is minimized by using optical fiber as communication media between robot and control system. System performances is enhanced not only by using distributed architecture in hardware, but by adopting real-time, multi-tasking operating system in software. The iRMX86 OS is used and reconfigured for real-time, multi-tasking operation. RS-485 serial communication protocol is used between functional control part and remote control part. Since the developed multiprocessor control system is an essential and fundamental technology for artificial intelligent robot, the result of this project can be applied directly to nuclear mobile robot. (Author)

  11. Embedded multiprocessors scheduling and synchronization

    CERN Document Server

    Sriram, Sundararajan

    2009-01-01

    Techniques for Optimizing Multiprocessor Implementations of Signal Processing ApplicationsAn indispensable component of the information age, signal processing is embedded in a variety of consumer devices, including cell phones and digital television, as well as in communication infrastructure, such as media servers and cellular base stations. Multiple programmable processors, along with custom hardware running in parallel, are needed to achieve the computation throughput required of such applications. Reviews important research in key areas related to the multiprocessor implementation of multi

  12. Multiprocessor systems and their concurrency

    Energy Technology Data Exchange (ETDEWEB)

    Starke, P H

    1984-01-01

    A multiprocessor system can be considered as a collection of finite automata which communicate over channels or shared memory units. The behaviour of such a system can be described by a semilanguage. This approach allows to define a numerical measure for the concurrency of multiprocessor systems and of distributed systems. This measure is characterized algebraically and the reconfiguration problem asking for an algorithm to construct an l-processor system which is equivalent to a given n-processor system is solved in the paper. 6 references.

  13. Shared performance monitor in a multiprocessor system

    Science.gov (United States)

    Chiu, George; Gara, Alan G.; Salapura, Valentina

    2012-07-24

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  14. Cyclic executive for safety-critical Java on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2010-01-01

    , that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors we have implemented it in the context of safety-critical Java on a Java processor....

  15. Multiprocessor Global Scheduling on Frame-Based DVFS Systems

    OpenAIRE

    Berten, Vandy; Goossens, Joël

    2008-01-01

    International audience; In this work, we are interested in multiprocessor energy efficient systems where task durations are not known in advance but are known stochastically. More precisely we consider global scheduling algorithms for frame-based multiprocessor stochastic DVFS (Dynamic Voltage and Frequency Scaling) systems. Moreover we consider processors with a discrete set of available frequencies. We provide a global scheduling algorithm, and formally show that no deadline will ever be mi...

  16. Multiprocessor programming environment

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.B.; Fornaro, R.

    1988-12-01

    Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.

  17. System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir

    2008-01-01

    is the first such attempt in the published literature. The second part of the thesis deals with the issues related to the development of system-level design methodologies for networked multiprocessor systems-on-chip at various levels of design abstraction with special focus on the modeling and design...... at the system-level. The multiprocessor modeling framework is then extended to include models of networked multiprocessor systems-on-chip which is then employed to model wireless sensor networks both at the sensor node level as well as the wireless network level. In the third and the final part, the thesis...... to the transaction-level model. The thesis, as a whole makes contributions by describing a design methodology for networked multiprocessor embedded systems at three layers of abstraction from system-level through transaction-level to the cycle accurate level as well as demonstrating it practically by implementing...

  18. Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling

    NARCIS (Netherlands)

    Nemati, F.; Behnam, M.; Bril, R.J.

    2009-01-01

    In the multi-core and multiprocessor domain, there has been considerable work done on scheduling techniques assuming that real-time tasks are independent. In practice a typical real-time system usually share logical resources among tasks. However, synchronization in the multiprocessor area has not

  19. Realtime multiprocessor for mobile ad hoc networks

    Directory of Open Access Journals (Sweden)

    T. Jungeblut

    2008-05-01

    Full Text Available This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC. The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.

  20. Hardware locks for a real-time Java chip multiprocessor

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2016-01-01

    A software locking mechanism commonly protects shared resources for multithreaded applications. This mechanism can, especially in chip-multiprocessor systems, result in a large synchronization overhead. For real-time systems in particular, this overhead increases the worst-case execution time....... This improvement can allow a larger number of real-time tasks to be reliably scheduled on a multiprocessor real-time platform....

  1. Best Speed Fit EDF Scheduling for Performance Asymmetric Multiprocessors

    Directory of Open Access Journals (Sweden)

    Peng Wu

    2017-01-01

    Full Text Available In order to improve the performance of a real-time system, asymmetric multiprocessors have been proposed. The benefits of improved system performance and reduced power consumption from such architectures cannot be fully exploited unless suitable task scheduling and task allocation approaches are implemented at the operating system level. Unfortunately, most of the previous research on scheduling algorithms for performance asymmetric multiprocessors is focused on task priority assignment. They simply assign the highest priority task to the fastest processor. In this paper, we propose BSF-EDF (best speed fit for earliest deadline first for performance asymmetric multiprocessor scheduling. This approach chooses a suitable processor rather than the fastest one, when allocating tasks. With this proposed BSF-EDF scheduling, we also derive an effective schedulability test.

  2. Multiprocessor Priority Ceiling Emulation for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Schoeberl, Martin

    2015-01-01

    Priority ceiling emulation has preferable properties on uniprocessor systems, such as avoiding priority inversion and being deadlock free. This has made it a popular locking protocol. According to the safety-critical Java specication, priority ceiling emulation is a requirement for implementations....... However, implementing the protocol for multiprocessor systemsis more complex so implementations might perform worse than non-preemptive implementations. In this paper we compare two multiprocessor lock implementations with hardware support for the Java optimized processor: non-preemptive locking...

  3. Multigrid solution of diffusion equations on distributed memory multiprocessor systems

    International Nuclear Information System (INIS)

    Finnemann, H.

    1988-01-01

    The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system is outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.) [de

  4. Safety-critical Java with cyclic executives on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2012-01-01

    Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution ...... for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor....

  5. Control and Reliability of Optical Networks in Multiprocessors

    Science.gov (United States)

    Olsen, James Jonathan

    1993-01-01

    Optical communication links have great potential to improve the performance of interconnection networks within large parallel multiprocessors, but the problems of semiconductor laser drive control and reliability inhibit their wide use. These problems have been solved in the telecommunications context, but the telecommunications solutions, based on a small number of links, are often too bulky, complex, power-hungry, and expensive to be feasible for use in a multiprocessor network with thousands of optical links. The main problems with the telecommunications approaches are that they are, by definition, designed for long-distance communication and therefore deal with communications links in isolation, instead of in an overall systems context. By taking a system-level approach to solving the laser reliability problem in a multiprocessor, and by exploiting the short -distance nature of the links, one can achieve small, simple, low-power, and inexpensive solutions, practical for implementation in the thousands of optical links that might be used in a multiprocessor. Through modeling and experimentation, I demonstrate that such system-level solutions exist, and are feasible for use in a multiprocessor network. I divide semiconductor laser reliability problems into two classes: transient errors and hard failures, and develop solutions to each type of problem in the context of a large multiprocessor. I find that for transient errors, the computer system would require a very low bit-error-rate (BER), such as 10^{-23}, if no provision were made for error control. Optical links cannot achieve such rates directly, but I find that a much more reasonable link-level BER (such as 10^{-7} ) would be acceptable with simple error detection coding. I then propose a feedback system that will enable lasers to achieve these error levels even when laser threshold current varies. Instead of telecommunications techniques, which require laser output power monitors, I describe a software

  6. Runtime adaptive multi-processor system-on-chip: RAMPSoC

    OpenAIRE

    Göhringer, D.; Hübner, M.; Schatz, V.; Becker, J.

    2008-01-01

    Current trends in high performance computing show, that the usage of multiprocessor systems on chip are one approach for the requirements of computing intensive applications. The multiprocessor system on chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessor on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elemen...

  7. A Multiprocessor Operating System Simulator

    Science.gov (United States)

    Johnston, Gary M.; Campbell, Roy H.

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.

  8. A system-level multiprocessor system-on-chip modeling framework

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2004-01-01

    We present a system-level modeling framework to model system-on-chips (SoC) consisting of heterogeneous multiprocessors and network-on-chip communication structures in order to enable the developers of today's SoC designs to take advantage of the flexibility and scalability of network-on-chip and...... SoC design. We show how a hand-held multimedia terminal, consisting of JPEG, MP3 and GSM applications, can be modeled as a multiprocessor SoC in our framework....

  9. Operating system for a real-time multiprocessor propulsion system simulator. User's manual

    Science.gov (United States)

    Cole, G. L.

    1985-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.

  10. Real-Time Multiprocessor Programming Language (RTMPL) user's manual

    Science.gov (United States)

    Arpasi, D. J.

    1985-01-01

    A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.

  11. The structural robustness of multiprocessor computing system

    Directory of Open Access Journals (Sweden)

    N. Andronaty

    1996-03-01

    Full Text Available The model of the multiprocessor computing system on the base of transputers which permits to resolve the question of valuation of a structural robustness (viability, survivability is described.

  12. One-Step Programmable Arbiters for Multiprocessors

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1978-01-01

    When processors in a multiprocessor system demand service from a shared bus in an asynchronous mode, a synchronous state arbiter resolves conflicts and allocates resources. Independent of the combination of requests, only one state transition is required from a free to allocated resource...

  13. Supporting Multiprocessors in the Icecap Safety-Critical Java Run-Time Environment

    DEFF Research Database (Denmark)

    Zhao, Shuai; Wellings, Andy; Korsholm, Stephan Erbs

    The current version of the Safety Critical Java (SCJ) specification defines three compliance levels. Level 0 targets single processor programs while Level 1 and 2 can support multiprocessor platforms. Level 1 programs must be fully partitioned but Level 2 programs can also be more globally...... scheduled. As of yet, there is no official Reference Implementation for SCJ. However, the icecap project has produced a Safety-Critical Java Run-time Environment based on the Hardware-near Virtual Machine (HVM). This supports SCJ at all compliance levels and provides an implementation of the safety......-critical Java (javax.safetycritical) package. This is still work-in-progress and lacks certain key features. Among these is the ability to support multiprocessor platforms. In this paper, we explore two possible options to adding multiprocessor support to this environment: the “green thread” and the “native...

  14. Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

    OpenAIRE

    Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

    2005-01-01

    An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such syst...

  15. Reproducibility in a multiprocessor system

    Science.gov (United States)

    Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-11-26

    Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed; a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.

  16. Advanced lectures on multiprocessor programming (1/3)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. Hist most recent research studies programming models and tools that support transactional synchronization. In prior work, which he also did at the IBM T.J. Watson Research Center in Yorktown Height...

  17. Academic training: Advanced lectures on multiprocessor programming

    CERN Multimedia

    PH Department

    2011-01-01

    Academic Training Lecture - Regular Programme 31 October 1, 2 November 2011 from 11:00 to 12:00 -  IT Auditorium, Bldg. 31   Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. Hist most recent research studies programming models an...

  18. Efficient process migration in the EMPS multiprocessor system

    NARCIS (Netherlands)

    van Dijk, G.J.W.; Gils, van M.J.

    1992-01-01

    The process migration facility in the Eindhoven multiprocessor system (EMPS) is presented. In the EMPS system, mailboxes are used for interprocess communication. These mailboxes provide transparency of location for communicating processes. The major advantages of mailbox communication in the EMPS

  19. Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

    Directory of Open Access Journals (Sweden)

    Yahya Jan

    2012-01-01

    Full Text Available This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrated that for the massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and the memory and communication network influence on both the throughput and circuit area dominates the processors influence. To resolve the problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures through exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed some data distribution and related data mapping schemes in the shared (global partitioned memories with the aim to eliminate the memory access conflicts, as well as, to ensure that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many various important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well scalable memory and communication architectures even for the high-end multiprocessors. The gains as high as 12-times in performance and 25-times in area can be obtained when using the hierarchical communication networks instead of the flat networks. However, for the high parallelism levels only the partitioned approach ensures the scalability in performance.

  20. Cache-aware network-on-chip for chip multiprocessors

    Science.gov (United States)

    Tatas, Konstantinos; Kyriacou, Costas; Dekoulis, George; Demetriou, Demetris; Avraam, Costas; Christou, Anastasia

    2009-05-01

    This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support to these cache related mechanisms can assist in improving systems performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM thread scheduling is done according to data availability, thus the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.

  1. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Daniel Pierre Bovet

    2008-01-01

    Full Text Available Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  2. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Betti Emiliano

    2008-01-01

    Full Text Available Abstract Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  3. Operating system for a real-time multiprocessor propulsion system simulator

    Science.gov (United States)

    Cole, G. L.

    1984-01-01

    The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real time interactive simulation of air breathing propulsion systems was evaluated. The Real Time Multiprocessor Operating System (RTMPOS) provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer supplied disk operating system that provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, the RTMPOS provides for engineering level, run time operations such as loading, modifying and specifying computation flow of programs, simulator mode control, data handling and run time monitoring. Run time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.

  4. Chip-Multiprocessor Hardware Locks for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2013-01-01

    and may void a task set's schedulability. In this paper we present a hardware locking mechanism to reduce the synchronization overhead. The solution is implemented for the chip-multiprocessor version of the Java Optimized Processor in the context of safety-critical Java. The implementation is compared...

  5. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a share memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes what new tools were developed and how the testing of software is performed in the SCDS for the MFTF project

  6. The fast Amsterdam multiprocessor (FAMP) system hardware

    International Nuclear Information System (INIS)

    Hertzberger, L.O.; Kieft, G.; Kisielewski, B.; Wiggers, L.W.; Engster, C.; Koningsveld, L. van

    1981-01-01

    The architecture of a multiprocessor system is described that will be used for on-line filter and second stage trigger applications. The system is based on the MC 68000 microprocessor from Motorola. Emphasis is paid to hardware aspects, in particular the modularity, processor communication and interfacing, whereas the system software and the applications will be described in separate articles. (orig.)

  7. The ACP [Advanced Computer Program] multiprocessor system at Fermilab

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build including 2 Mbytes of on board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32 bit microprocessor, the other runs with AT and T's 32100. both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53 processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere

  8. DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

    OpenAIRE

    Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

    2007-01-01

    We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...

  9. Software for event oriented processing on multiprocessor systems

    International Nuclear Information System (INIS)

    Fischler, M.; Areti, H.; Biel, J.; Bracker, S.; Case, G.; Gaines, I.; Husby, D.; Nash, T.

    1984-08-01

    Computing intensive problems that require the processing of numerous essentially independent events are natural customers for large scale multi-microprocessor systems. This paper describes the software required to support users with such problems in a multiprocessor environment. It is based on experience with and development work aimed at processing very large amounts of high energy physics data

  10. Stream-processing pipelines: processing of streams on multiprocessor architecture

    NARCIS (Netherlands)

    Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

    In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four

  11. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally these modules adapt the signal levels to that of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system

  12. A possible approach to estimating the operational efficiency of multiprocessor systems

    International Nuclear Information System (INIS)

    Kuznetsov, N.Y.; Gorlach, S.P.; Sumskaya, A.A.

    1984-01-01

    This article presents a mathematical model that constructs the upper and lower estimates evaluating the efficiency of solution of a large class of problems using a multiprocessor system with a specific architecture. Efficiency depends on a system's architecture (e.g., the number of processors, memory volume, the number of communication links, commutation speed) and the types of problems it is intended to solve. The behavior of the model is considered in a stationary mode. The model is used to evaluate the efficiency of a particular algorithm implemented in a multiprocessor system. It is concluded that the model is flexible and enables the investigation of a broad class of problems in computational mathematics, including linear algebra and boundary-value problems of mathematical physics

  13. Scientific applications and numerical algorithms on the midas multiprocessor system

    International Nuclear Information System (INIS)

    Logan, D.; Maples, C.

    1986-01-01

    The MIDAS multiprocessor system is a multi-level, hierarchial structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms into a multiprocessor environment. The results of such tests on a variety of applications such as scientific data analysis, Monte Carlo calculations, and image processing, are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods in solving partial differential equations are also discussed. The performance and projected extensibilities of these decompositions on the MIDAS system are reported

  14. Parallelising a molecular dynamics algorithm on a multi-processor workstation

    Science.gov (United States)

    Müller-Plathe, Florian

    1990-12-01

    The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.

  15. The ACP (Advanced Computer Program) multiprocessor system at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Case, G.; Cook, A.; Fischler, M.; Gaines, I.; Hance, R.; Husby, D.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build including 2 Mbytes of on board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32 bit microprocessor, the other runs with AT and T's 32100. both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53 processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  16. A general model for memory interference in a multiprocessor system with memory hierarchy

    Science.gov (United States)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.

  17. Energy-efficient fault tolerance in multiprocessor real-time systems

    Science.gov (United States)

    Guo, Yifeng

    The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  18. Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

    NARCIS (Netherlands)

    Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

  19. Matrix factorization on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Geist, G.A.; Heath, M.T.

    1985-08-01

    This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs

  20. Safe and Efficient Support for Embeded Multi-Processors in ADA

    Science.gov (United States)

    Ruiz, Jose F.

    2010-08-01

    New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.

  1. Plasma physics modeling and the Cray-2 multiprocessor

    International Nuclear Information System (INIS)

    Killeen, J.

    1985-01-01

    The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab

  2. Performances of multiprocessor multidisk architectures for continuous media storage

    Science.gov (United States)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.

  3. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    Science.gov (United States)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  4. Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

    Directory of Open Access Journals (Sweden)

    Silvio Picano

    1992-01-01

    Full Text Available We present detailed experimental work involving a commercially available large scale shared memory multiple instruction stream-multiple data stream (MIMD parallel computer having a software controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessors capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000 on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.

  5. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, John Min; Kim, Seung Ho; Kim, Chang Hoi; Kim, Byung Soo; Hwang, Suk Yeong; Lee, Young Bum; Sohn, Suk Won; Kim, Woon Gi

    1990-01-01

    The project of this study is to develop a real time controller applying autonomous robotic systems operated in hostile environment. Developed control system is designed with a multiprocessor to get independency and reliability as well as to extend the system easily. The control system is designed in three distinct subsystems (supervisory control part, functional control part, and remote control part). To review the functional performance of developed controller, a prototype mobile robot, which was installed 4 DOF mainpulator, was designed and manufactured. Initial tests showed that the robot could turn with a radius of 38 cm and a maximum speed of 1.26 km/hr and go over obstacle of 18 cm in height. (author)

  6. Distributed parallel messaging for multiprocessor systems

    Science.gov (United States)

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.

  7. Shared random access memory resource for multiprocessor real-time systems

    International Nuclear Information System (INIS)

    Dimmler, D.G.; Hardy, W.H. II

    1977-01-01

    A shared random-access memory resource is described which is used within real-time data acquisition and control systems with multiprocessor and multibus organizations. Hardware and software aspects are discussed in a specific example where interconnections are done via a UNIBUS. The general applicability of the approach is also discussed

  8. Abstractions for aperiodic multiprocessor scheduling of real-time stream processing applications

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.

    2015-01-01

    Embedded multiprocessor systems are often used in the domain of real-time stream processing applications to keep up with increasing power and performance requirements. Examples of such real-time stream processing applications are digital radio baseband processing and WLAN transceivers. These stream

  9. Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

    OpenAIRE

    Bonifaci , Vincenzo; Brandenburg , Björn; D'Angelo , Gianlorenzo; Marchetti-Spaccamela , Alberto

    2016-01-01

    International audience; Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of " strong arbitrary processor affinity scheduling " (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...

  10. Thermal-Aware Scheduling for Future Chip Multiprocessors

    Directory of Open Access Journals (Sweden)

    Pedro Trancoso

    2007-04-01

    Full Text Available The increased complexity and operating frequency in current single chip microprocessors is resulting in a decrease in the performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets including that of the embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently additional costs for cooling, higher power consumption, decrease of the reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for the future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve the performance. Experimental results have shown that a TAS algorithm that considers also the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while at the same time, decrease the chip's temperature across many different operation and configuration scenarios.

  11. Multiprocessor Real-Time Locking Protocols for Replicated Resources

    Science.gov (United States)

    2016-07-01

    assignment problem, the ac- tual identities of the allocated replicas must be known. When locking protocols are used, tasks may experience delays due to both...Multiprocessor Real-Time Locking Protocols for Replicated Resources ∗ Catherine E. Jarrett1, Kecheng Yang1, Ming Yang1, Pontus Ekberg2, and James H...replicas to execute. In prior work on replicated resources, k-exclusion locks have been used, but this restricts tasks to lock only one replica at a time. To

  12. 3D-TV Rendering on a Multiprocessor System on a Chip

    NARCIS (Netherlands)

    Van Eijndhoven, J.T.J.; Li, X.

    2006-01-01

    This thesis focuses on the issue of mapping 3D-TV rendering applications to a multiprocessor platform. The target platform aims to address tomorrow's multi-media consumer market. The prototype chip, called Wasabi, contains a set of TriMedia processors that communicate viaa shared memory, fast

  13. Multi-processor system-level synthesis for multiple applications on platform FPGA

    NARCIS (Netherlands)

    Kumar, A.; Fernando, S.D.; Ha, Y.; Mesman, B.; Corporaal, H.; Bertels, Koen

    2007-01-01

    Multiprocessor systems-on-chip (MPSoC) are being developed in increasing numbers to support the high number of applications running on modern embedded systems. Designing and programming such systems prove to be a major challenge. Most of the current design methodologies rely on creating the design

  14. On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

    Directory of Open Access Journals (Sweden)

    A. Averbuch

    1994-01-01

    Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS and on a distributed memory multiprocessor (a Transputer network. Our parallel implementation uses the Virtual Machine for Muli-Processors (VMMP, a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

  15. Multiprocessor performance modeling with ADAS

    Science.gov (United States)

    Hayes, Paul J.; Andrews, Asa M.

    1989-01-01

    A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.

  16. Multiprocessor shared-memory information exchange

    International Nuclear Information System (INIS)

    Santoline, L.L.; Bowers, M.D.; Crew, A.W.; Roslund, C.J.; Ghrist, W.D. III

    1989-01-01

    In distributed microprocessor-based instrumentation and control systems, the inter-and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange

  17. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip

  18. VME multiprocessor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Kimura, T.; Kurihara, K.; Takahashi, M.; Kawamata, Y.; Akasaka, H.; Matsukawa, M.

    1989-01-01

    In this paper design and preliminary tests are reported of a VME multiprocessor system for the JT-60 Upgrade plasma control utilizing three MC88100 based RISC computers and VME buses. The design of the VME system was stimulated by faster and more accurate computation requirements for the plasma position and shape control

  19. Distributed power management of real-time applications on a GALS multiprocessor SOC

    NARCIS (Netherlands)

    Nelson, Andrew; Goossens, Kees

    2015-01-01

    It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and

  20. GOTHIC memory management : a multiprocessor shared single level store

    OpenAIRE

    Michel , Béatrice

    1990-01-01

    Gothic purpose is to build an object-oriented fault-tolerant distributed operating system for a local area network of multiprocessor workstations. This paper describes Gothic memory manager. It realizes the sharing of the secondary memory space between any process running on the Gothic system. Processes on different processors can communicate by sharing permanent information. The manager implements a shared single level storage with an invalidation protocol working on disk-pages to maintain s...

  1. Energy-Aware Real-Time Task Scheduling for Heterogeneous Multiprocessors with Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Weizhe Zhang

    2014-01-01

    Full Text Available Energy consumption in computer systems has become a more and more important issue. High energy consumption has already damaged the environment to some extent, especially in heterogeneous multiprocessors. In this paper, we first formulate and describe the energy-aware real-time task scheduling problem in heterogeneous multiprocessors. Then we propose a particle swarm optimization (PSO based algorithm, which can successfully reduce the energy cost and the time for searching feasible solutions. Experimental results show that the PSO-based energy-aware metaheuristic uses 40%–50% less energy than the GA-based and SFLA-based algorithms and spends 10% less time than the SFLA-based algorithm in finding the solutions. Besides, it can also find 19% more feasible solutions than the SFLA-based algorithm.

  2. DAEDALUS: System-Level Design Methodology for Streaming Multiprocessor Embedded Systems on Chips

    NARCIS (Netherlands)

    Stefanov, T.; Pimentel, A.; Nikolov, H.; Ha, S.; Teich, J.

    2017-01-01

    The complexity of modern embedded systems, which are increasingly based on heterogeneous multiprocessor system-on-chip (MPSoC) architectures, has led to the emergence of system-level design. To cope with this design complexity, system-level design aims at raising the abstraction level of the design

  3. Cache aware mapping of streaming apllications on a multiprocessor system-on-chip

    NARCIS (Netherlands)

    Moonen, A.J.M.; Bekooij, M.J.G.; Berg, van den R.M.J.; Meerbergen, van J.; Sciuto, D.; Peng, Z.

    2008-01-01

    Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system- on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to a low processor

  4. Multiprocessor based data acquisition system for radiation monitoring in nuclear reactors

    International Nuclear Information System (INIS)

    Pansare, M.G.; Narsaiah, A.; Anantha Krishnan, T.S.

    1989-01-01

    Expensive minicomputers are required for building powerful Data Acquisition Systems (DAS) capable of scanning and processing large number of signals in a real-time environment. However by using the inexpensive microprocessors in multiprocessor configuration it is possible to build DASs that are as powerful as minicomputer based systems at much lesser cost. This paper describes such a multiprocessor based DAS designed for acquiring data from various radiation monitoring instruments of a nuclear reactor. The system is built by using MULTIBUS standard boards based on intel 8086, 16 bit microprocessor, with local and shared memory. The system monitors upto 128 analog input channels, 64 digital input channels and actuates upto 128 digital output contacts. The system continuously checks for the alarm condition of the input channels and displays the alarm status on an ALARM CRT. Facility has been provided for the transfer of data to a central computer. At any instant of time, the information regarding different channels being monitored is available from the local console as well as through five remote terminals located at various places in the reactor building. (author)

  5. Multiprocessor system with multiple concurrent modes of execution

    Science.gov (United States)

    Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

    2013-12-31

    A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.

  6. Design considerations for a multiprocessor based data acquisition system

    International Nuclear Information System (INIS)

    Tippie, J.W.; Kulaga, J.E.

    1979-01-01

    The rapid advance of digital technology has provided the systems designer with many new design options. Hardware is no longer the controlling expense. Complex operating systems provide the flexibility and development tools needed by software designers, but restrict throughput. Multiprocessor-based systems can be used to ''front-end'' high-throughput applications while maintaining the many advantages offered by multitasking operating systems. The design of a high-throughput data acquisition system for application in low energy nuclear physics is considered

  7. A simple multiprocessor management system for event-parallel computing

    International Nuclear Information System (INIS)

    Bracker, S.; Gounder, K.; Hendrix, K.; Summers, D.

    1996-01-01

    Offline software using Transmission Control Protocol/Internet Protocol (TCP/IP) sockets to distribute particle physics events to multiple UNIX/RISC workstations is described. A modular, building block approach was taken that allowed tailoring to solve specific tasks efficiently and simply as they arose. The modest, initial cost was having to learn about sockets for interprocess communication. This multiprocessor management software has been used to control the reconstruction of eight billion raw data events from Fermilab Experiment E791

  8. A scalable single-chip multi-processor architecture with on-chip RTOS kernel

    NARCIS (Netherlands)

    Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

    2003-01-01

    Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a

  9. Elastic pointer directory organization for scalable shared memory multiprocessors

    Institute of Scientific and Technical Information of China (English)

    Yuhang Liu; Mingfa Zhu; Limin Xiao

    2014-01-01

    In the field of supercomputing, one key issue for scal-able shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good direc-tory design intends to achieve three key attributes: reasonable memory overhead, sharer position precision and implementation complexity. However, researchers often face the problem that gain-ing one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, taking the fact that the number of sharers for each directory entry is typical y smal . Analysis re-sults show that for 4 096 nodes, the ratio of memory overhead to the ful-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on a 16 and 64-node cache coherence non uniform memory access (CC-NUMA) multiproces-sor show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the ful-map directory, except for the slight implementation complex-ity. Using the directory cache to explore directory access locality is also studied. The experimental result shows that this is a promis-ing approach to be used in the state-of-the-art high performance computing domain.

  10. An FPGA design flow for reconfigurable network-based multi-processor systems on chip

    NARCIS (Netherlands)

    Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

    2007-01-01

    Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity

  11. Method for wiring allocation and switch configuration in a multiprocessor environment

    Science.gov (United States)

    Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

    2008-07-15

    A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.

  12. A high speed multi-tasking, multi-processor telemetry system

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)

    1996-12-31

    This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates is also given. 11 refs., 7 figs., 4 tabs.

  13. Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

    NARCIS (Netherlands)

    Jan, Y.

    2012-01-01

    The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,

  14. Some algorithms for the solution of the symmetric eigenvalue problem on a multiprocessor electronic computer

    International Nuclear Information System (INIS)

    Molchanov, I.N.; Khimich, A.N.

    1984-01-01

    This article shows how a reflection method can be used to find the eigenvalues of a matrix by transforming the matrix to tridiagonal form. The method of conjugate gradients is used to find the smallest eigenvalue and the corresponding eigenvector of symmetric positive-definite band matrices. Topics considered include the computational scheme of the reflection method, the organization of parallel calculations by the reflection method, the computational scheme of the conjugate gradient method, the organization of parallel calculations by the conjugate gradient method, and the effectiveness of parallel algorithms. It is concluded that it is possible to increase the overall effectiveness of the multiprocessor electronic computers by either letting the newly available processors of a new problem operate in the multiprocessor mode, or by improving the coefficient of uniform partition of the original information

  15. 2: Local area networks as a multiprocessor treatment planning system

    International Nuclear Information System (INIS)

    Neblett, D.L.; Hogan, S.E.

    1987-01-01

    The creation of a local area network (LAN) of interconnected computers provides an environment of multi computer processors that adds a new dimension to treatment planning. A LAN system provides the opportunity to have two or more computers working on the plan in parallel. With high speed interprocessor transfer, events such as the time consuming task of correcting several individual beams for contours and inhomogeneities can be performed simultaneously; thus, effectively creating a parallel multiprocessor treatment planning system

  16. Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    NARCIS (Netherlands)

    Moreira, O.

    2012-01-01

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for

  17. Software for the ACP [Advanced Computer Program] multiprocessor system

    International Nuclear Information System (INIS)

    Biel, J.; Areti, H.; Atac, R.

    1987-01-01

    Software has been developed for use with the Fermilab Advanced Computer Program (ACP) multiprocessor system. The software was designed to make a system of a hundred independent node processors as easy to use as a single, powerful CPU. Subroutines have been developed by which a user's host program can send data to and get results from the program running in each of his ACP node processors. Utility programs make it easy to compile and link host and node programs, to debug a node program on an ACP development system, and to submit a debugged program to an ACP production system

  18. A measurement-based performability model for a multiprocessor system

    Science.gov (United States)

    Ilsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.

    1987-01-01

    A measurement-based performability model based on real error-data collected on a multiprocessor system is described. Model development from the raw errror-data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simple exponential and that semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.

  19. Geometric Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Ajwani, Deepak; Sitchinava, Nodari; Zeh, Norbert

    2010-01-01

    -D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort algorithm or PRAM algorithms. For the second group of problems—orthogonal line segment intersection reporting, batched range reporting, and related problems—more effort is required. What distinguishes......We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2...... these problems from the ones in the previous group is the variable output size, which requires I/O-efficient load balancing strategies based on the contribution of the individual input elements to the output size. To obtain nearly optimal algorithms for these problems, we introduce a parallel distribution...

  20. A survey of Tumult, a real-time multi-processor system

    International Nuclear Information System (INIS)

    Jansen, P.G.

    1986-01-01

    Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)

  1. Resource Allocation Model for Modelling Abstract RTOS on Multiprocessor System-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2003-01-01

    Resource Allocation is an important problem in RTOS's, and has been an active area of research. Numerous approaches have been developed and many different techniques have been combined for a wide range of applications. In this paper, we address the problem of resource allocation in the context...... of modelling an abstract RTOS on multiprocessor SoC platforms. We discuss the implementation details of a simplified basic priority inheritance protocol for our abstract system model in SystemC....

  2. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    Science.gov (United States)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant and SIMPLEX (non redundant) jobs.

  3. Standard interfaces for program-modular multiprocessor systems

    International Nuclear Information System (INIS)

    Chernykh, E.V.

    1982-01-01

    The peculiarities of the structures of existing and developed standard interfaces used in automation systems for nuclear physical experiments are considered. general structural characteristics of multiprocessor system interfaces are revealed. The comparison of the existing system CAMAC crate and designed standards of COMPEX, E3S and FASTBUS interfaces by capacity and relative cost is carried out. The analysis of the given data shows that operation of any interface is more advantageous at the rates close to capacity values, the relative cost being minimum. In this case the advantage is on the side of interfaces with greater capacity values for which at a moderated decrease of the exchange or requests processing rate the relative costs grow slower. A higher capacity of one-cycle exchange is provided with functional data way specialization in the interface. The conclusion is drawn that most perspective trend in the development of automation systems for high energy physics experiments is using FASTBUS standard

  4. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  5. Recommending the heterogeneous cluster type multi-processor system computing

    International Nuclear Information System (INIS)

    Iijima, Nobukazu

    2010-01-01

    Real-time reactor simulator had been developed by reusing the equipment of the Musashi reactor and its performance improvement became indispensable for research tools to increase sampling rate with introduction of arithmetic units using multi-Digital Signal Processor(DSP) system (cluster). In order to realize the heterogeneous cluster type multi-processor system computing, combination of two kinds of Control Processor (CP) s, Cluster Control Processor (CCP) and System Control Processor (SCP), were proposed with Large System Control Processor (LSCP) for hierarchical cluster if needed. Faster computing performance of this system was well evaluated by simulation results for simultaneous execution of plural jobs and also pipeline processing between clusters, which showed the system led to effective use of existing system and enhancement of the cost performance. (T. Tanaka)

  6. Using a commercial symmetric multiprocessor for lattice QCD

    International Nuclear Information System (INIS)

    Brower, R.C.; Chen, D.; Negele, J.W.

    1998-01-01

    In its evolution, the computer industry has reached the point when considerable computing power can be packaged on a single microprocessor chip. At the same time, costs of designing a computer system around such a CPU are growing. For these reasons we decided to explore a possibility of using commercially available symmetric multiprocessors (SMP) as building blocks for the LQCD computer. Careful analysis of the architecture allowed us to build a QCD primitive library running close to the peak performance on the UltraSPARC processor. As a result, multithreaded QCD code (both the heatbath and the Wilson fermion inverter) runs at about 50% efficiency on a single SMP. The communication between different CPUs is handled by a coherent memory system. Currently we are planning to connect several SMPs with a high bandwidth network into a single system. (orig.)

  7. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

  8. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    Science.gov (United States)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined each of which involves d stages of message passing, for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of domain. The algorithms under consideration include the QUAD algorithm for small number of processors and the Overlap Quad algorithm for large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.

  9. Development of a VME multi-processor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Takahashi, M.; Kurihara, K.; Kawamata, Y.; Akasaka, H.; Kimura, T.

    1992-01-01

    Design and initial operation results are reported of a VME multi-processor system [1] for plasma control at a large fusion device named 'the JT-60 Upgrade' utilizing three 32-bit MC88100 based RISC computers and VME components. Development of the system was stimulated by faster and more accurate computation requirements for the plasma position and current control. The RISC computers operate at 25 MHz along with two cashe memories named MC88200. We newly developed VME bus modules of up/down counter, analog-to-digital converter and clock pulse generator for measuring magnetic field and coil current and for synchronizing the processing in the three RISCs and direct digital controllers (DDCs) of magnet power supplies. We also evaluated that the speed of the data transfer between the VME bus system and the DDCs through CAMAC highways satisfies the above requirements. In the initial operation of the JT-60 upgrade, it has been proved that the VME multi-processor system well controls the plasma position and current with a sampling period of 250 μsec and a delay of 500 μsec. (author)

  10. Quality-Driven Model-Based Design of MultiProcessor Embedded Systems for Highlydemanding Applications

    DEFF Research Database (Denmark)

    Jozwiak, Lech; Madsen, Jan

    2013-01-01

    The recent spectacular progress in modern nano-dimension semiconductor technology enabled implementation of a complete complex multi-processor system on a single chip (MPSoC), global networking and mobile wire-less communication, and facilitated a fast progress in these areas. New important...... accessible or distant) objects, installations, machines or devices, or even implanted in human or animal body can serve as examples. However, many of the modern embedded application impose very stringent functional and parametric demands. Moreover, the spectacular advances in microelectronics introduced...

  11. Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Traian; Pop, Paul; Eles, Petru

    2008-01-01

    We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource......, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic to such hierarchically scheduled systems: assignment...... of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system...

  12. Prefetching in file systems for MIMD multiprocessors

    Science.gov (United States)

    Kotz, David F.; Ellis, Carla Schlatter

    1990-01-01

    The question of whether prefetching blocks on the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.

  13. ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

    OpenAIRE

    Piel , Éric; Marquet , Philippe; Soula , Julien; Osuna , Christophe; Dekeyser , Jean-Luc

    2005-01-01

    The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processors) systems. It allows to mix High Performance Computing and real-time. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing (but not restricted to) IA-64 architecture. The basic idea of ARTiS is to assign a selected se...

  14. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Energy Technology Data Exchange (ETDEWEB)

    Ohmacht, Martin

    2017-08-15

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  15. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Science.gov (United States)

    Ohmacht, Martin

    2014-09-09

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  16. Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

    Science.gov (United States)

    Russo, Vincent; Johnston, Gary; Campbell, Roy

    1988-01-01

    The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor require both efficient code in order to support the needs of high- performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.

  17. Commodity multi-processor systems in the ATLAS level-2 trigger

    International Nuclear Information System (INIS)

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-01-01

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems

  18. Multi-processor data acquisition and monitoring systems for particle physics

    International Nuclear Information System (INIS)

    White, V.; Burch, B.; Eng, K.; Heinicke, P.; Pyatetsky, M.; Ritchie, D.

    1983-01-01

    A high speed distributed processing system, using PDP-11 and VAX processors, is being developed at Fermilab. The acquisition of data is done using one or more PDP-11s. Additional processors are connected to provide either data logging or extra data analysis capabilities. Within this framework, functional interchangeability of PDP-11 and VAX processors and of the PDP-11 operating systems, RT-11 and RSX-11M, has been maintained. Inter-processor connections have been implemented in a general way using the 5 megabit DR11-W hardware currently selected for the purpose. Using this approach the authors have been able to make use of several existing data acquisition and analysis packages, such as RT/MULTI, in a multi-processor system

  19. Commodity multi-processor systems in the ATLAS level-2 trigger

    CERN Document Server

    Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

    2000-01-01

    Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).

  20. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts to the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with the query length of 511 amino acid). © 2010 IEEE.

  1. Hardware support for CSP on a Java chip multiprocessor

    DEFF Research Database (Denmark)

    Gruian, Flavius; Schoeberl, Martin

    2013-01-01

    Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem, that can lead to further performance increase for a number of multithreaded...... applications. Programmatically, the Communicating Sequential Processes (CSPs) paradigm provides a sound computational model for such an architecture with message based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP are on......-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory.The presented solution is scalable and also specific for our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were...

  2. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.

  3. Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

    DEFF Research Database (Denmark)

    Sparsø, Jens

    2012-01-01

    This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...

  4. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Science.gov (United States)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  5. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  6. Lower bounds for the head-body-tail problem on parallel machines: a computational study for the multiprocessor flow shop

    NARCIS (Netherlands)

    A. Vandevelde; J.A. Hoogeveen; C.A.J. Hurkens (Cor); J.K. Lenstra (Jan Karel)

    2005-01-01

    htmlabstractThe multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the

  7. USC orthogonal multiprocessor for image processing with neural networks

    Science.gov (United States)

    Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

    1990-07-01

    This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.

  8. Job-mix modeling and system analysis of an aerospace multiprocessor.

    Science.gov (United States)

    Mallach, E. G.

    1972-01-01

    An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.

  9. Lower bounds for the head-body-tail problem on parallel machines : a computational study of the multiprocessor flow shop

    NARCIS (Netherlands)

    Vandevelde, A.; Hoogeveen, J.A.; Hurkens, C.A.J.; Lenstra, J.K.

    2005-01-01

    The multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the capacities of all

  10. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1992-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on a NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented

  11. Use of a genetic agorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1993-01-01

    A method of solving the two-phases fluid flow equations using a genetic algorithm on a NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unkowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented. (orig.)

  12. Operating System for Runtime Reconfigurable Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2011-01-01

    Full Text Available Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits the dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g. a Network-on-Chip. Recent-series of Xilinx FPGAs, such as Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, the access to these ports as well as the hardware resource management needs to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special purpose operating system, called CAP-OS (Configuration Access Port-Operating System, which will be presented in this paper, supports the clients using the configuration port with the services of priority-based access scheduling, hardware task mapping and resource management.

  13. A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.; Cho, Z.H.

    1990-01-01

    In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme which is shown to be faster than conventional algorithm, is parallelized

  14. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

  15. Embedded software design and programming of multiprocessor system-on-chip simulink and system C case studies

    CERN Document Server

    Popovici, Katalin; Jerraya, Ahmed A; Wolf, Marilyn

    2010-01-01

    Current multimedia and telecom applications require complex, heterogeneous multiprocessor system on chip (MPSoC) architectures with specific communication infrastructure in order to achieve the required performance. Heterogeneous MPSoC includes different types of processing units (DSP, microcontroller, ASIP) and different communication schemes (fast links, non standard memory organization and access).Programming an MPSoC requires the generation of efficient software running on MPSoC from a high level environment, by using the characteristics of the architecture. This task is known to be tediou

  16. Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

    International Nuclear Information System (INIS)

    Langenbuch, S.; Schmidt, K.D.; Velkov, K.

    2003-01-01

    The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This Benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. This transient represents a typical application of coupled codes which are usually performed on powerful workstations using a single CPU. Nowadays, the availability of multi-CPUs is much easier. Indeed, powerful workstations already provide 4 to 8 CPU, computer centers give access to multi-processor systems with numbers of CPUs in the order of 16 up to several 100. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different cases of application lead to changing requirements of the code efficiency, because the amount of computer time spent in different parts of the code is varying. This paper presents main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark together with evaluations of the code performance on multi-processor computers. (authors)

  17. Use of the CAMAC-MULTIBUS combined protocol for organizing multi-processor operation in a crate

    International Nuclear Information System (INIS)

    Glejbman, Eh.M.

    1985-01-01

    Problems of developing electronic units for large on-line systems for nuclear-physical experiments automation and developed on the base of principles of distributed control and data processing are discussed. Crates with simultaneous disposition and operation of CAMAC moduli (EUR-4100) and those realizing the MULTIBUS hardcopy log in dataway are described. It is attained due to sharing the CAMAC and the MULTIBUS hardcopy logs in the crate dataway. Application of job scheduler and executor moduli in the MULTIBUS interface permits to organize multiprocessor operation and to obtain separation of data stream as well as to increase total computational capacity in the crate

  18. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  19. A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2009-01-01

    Full Text Available Runtime adaptivity of hardware in processor architectures is a novel trend, which is under investigation in a variety of research labs all over the world. The runtime exchange of modules, implemented on a reconfigurable hardware, affects the instruction flow (e.g., in reconfigurable instruction set processors or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn where single/multiple data and instruction streams were combined to four types of architectures. This classification is now used as a foundation for an extended classification scheme including runtime adaptivity as further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.

  20. On a Multiprocessor Computer Farm for Online Physics Data Processing

    CERN Document Server

    Sinanis, N J

    1999-01-01

    The topic of this thesis is the design-phase performance evaluation of a large multiprocessor (MP) computer farm intended for the on-line data processing of the Compact Muon Solenoid (CMS) experiment. CMS is a high energy Physics experiment, planned to operate at CERN (Geneva, Switzerland) during the year 2005. The CMS computer farm is consisting of 1,000 MP computer systems and a 1,000 X 1,000 communications switch. The followed approach to the farm performance evaluation is through simulation studies and evaluation of small prototype systems building blocks of the farm. For the purposes of the simulation studies, we have developed a discrete-event, event-driven simulator that is capable to describe the high-level architecture of the farm and give estimates of the farm's performance. The simulator is designed in a modular way to facilitate the development of various modules that model the behavior of the farm building blocks in the desired level of detail. With the aid of this simulator, we make a particular...

  1. Solution of the Euler and Navier-Stokes equations on MIMD distributed memory multiprocessors using cyclic reduction

    International Nuclear Information System (INIS)

    Curchitser, E.N.; Pelz, R.B.; Marconi, F.

    1992-01-01

    The Euler and Navier-Stokes equations are solved for the steady, two-dimensional flow over a NACA 0012 airfoil using a 1024 node nCUBE/2 multiprocessor. Second-order, upwind-discretized difference equations are solved implicitly using ADI factorization. Parallel cyclic reduction is employed to solve the block tridiagonal systems. For realistic problems, communication times are negligible compared to calculation times. The processors are tightly synchronized, and their loads are well balanced. When the flux Jacobians flux are frozen, the wall-clock time for one implicit timestep is about equal to that of a multistage explicit scheme. 10 refs

  2. Performance of the coupled thermalhydraulics/neutron kinetics code R/P/C on workstation clusters and multiprocessor systems

    International Nuclear Information System (INIS)

    Hammer, C.; Paffrath, M.; Boeer, R.; Finnemann, H.; Jackson, C.J.

    1996-01-01

    The light water reactor core simulation code PANBOX has been coupled with the transient analysis code RELAP5 for the purpose of performing plant safety analyses with a three-dimensional (3-D) neutron kinetics model. The system has been parallelized to improve the computational efficiency. The paper describes the features of this system with emphasis on performance aspects. Performance results are given for different types of parallelization, i. e. for using an automatic parallelizing compiler, using the portable PVM platform on a workstation cluster, using PVM on a shared memory multiprocessor, and for using machine dependent interfaces. (author)

  3. E-Token Energy-Aware Proportionate Sharing Scheduling Algorithm for Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Pasupuleti Ramesh

    2017-01-01

    Full Text Available WSN plays vital role from small range healthcare surveillance systems to largescale environmental monitoring. Its design for energy constrained applications is a challenging issue. Sensors in WSNs are projected to run separately for longer periods. It is of excessive cost to substitute exhausted batteries which is not even possible in antagonistic situations. Multiprocessors are used in WSNs for high performance scientific computing, where each processor is assigned the same or different workload. When the computational demands of the system increase then the energy efficient approaches play an important role to increase system lifetime. Energy efficiency is commonly carried out by using proportionate fair scheduler. This introduces abnormal overloading effect. In order to overcome the existing problems E-token Energy-Aware Proportionate Sharing (EEAPS scheduling is proposed here. The power consumption for each thread/task is calculated and the tasks are allotted to the multiple processors through the auctioning mechanism. The algorithm is simulated by using the real-time simulator (RTSIM and the results are tested.

  4. Multi-processor system for real-time deconvolution and flow estimation in medical ultrasound

    DEFF Research Database (Denmark)

    Jensen, Jesper Lomborg; Jensen, Jørgen Arendt; Stetson, Paul F.

    1996-01-01

    of the algorithms. Many of the algorithms can only be properly evaluated in a clinical setting with real-time processing, which generally cannot be done with conventional equipment. This paper therefore presents a multi-processor system capable of performing 1.2 billion floating point operations per second on RF...... filter is used with a second time-reversed recursive estimation step. Here it is necessary to perform about 70 arithmetic operations per RF sample or about 1 billion operations per second for real-time deconvolution. Furthermore, these have to be floating point operations due to the adaptive nature...... interfaced to our previously-developed real-time sampling system that can acquire RF data at a rate of 20 MHz and simultaneously transmit the data at 20 MHz to the processing system via several parallel channels. These two systems can, thus, perform real-time processing of ultrasound data. The advantage...

  5. Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors

    Science.gov (United States)

    Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.

    1994-10-01

    This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications, implemented on a multi-processor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz) runtime overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as good as any static scheduling method. It also reduces the total amount of dynamic pre-emptions compared with run time methods like deadline monotonic scheduling.

  6. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

    Directory of Open Access Journals (Sweden)

    Yassin El-Hillali

    2009-01-01

    Full Text Available This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC architecture optimized for Multiple Target Tracking (MTT in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets within its field of view. As the number of targets increases, the computational needs of the MTT system also increase making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.

  7. 2-D fluid dynamics models for laser driven fusion on IBM 3090 vector multiprocessors

    International Nuclear Information System (INIS)

    Atzeni, S.

    1988-01-01

    Fluid-dynamics codes for laser fusion are complex research codes, consisting of many distinct modules and embodying a variety of numerical methods. They are therefore good candidates for testing general purpose advanced computer architectures and the related software. In this paper, after a brief outline of the basic concepts of laser fusion, the implementation of the 2-D laser fusion fluid code DUED on the IBM 3090 VF vector multiprocessors is discussed. Emphasis is put on parallelization, performed by means of IBM Parallel FORTRAN (PF). It is shown how different modules have been optimized by using different features of PF: i) modules based on depth-2 nested loops exploit automatic parallelization; ii) laser light ray tracing is partitioned by scheduling parallel ICCG algorithm (executed in parallel by appropiately synchronized parallel subroutines). Performance results are given for separate modules of the code, as well as for typical complete runs

  8. Intelligent discrete particle swarm optimization for multiprocessor task scheduling problem

    Directory of Open Access Journals (Sweden)

    S Sarathambekai

    2017-03-01

    Full Text Available Discrete particle swarm optimization is one of the most recently developed population-based meta-heuristic optimization algorithm in swarm intelligence that can be used in any discrete optimization problems. This article presents a discrete particle swarm optimization algorithm to efficiently schedule the tasks in the heterogeneous multiprocessor systems. All the optimization algorithms share a common algorithmic step, namely population initialization. It plays a significant role because it can affect the convergence speed and also the quality of the final solution. The random initialization is the most commonly used method in majority of the evolutionary algorithms to generate solutions in the initial population. The initial good quality solutions can facilitate the algorithm to locate the optimal solution or else it may prevent the algorithm from finding the optimal solution. Intelligence should be incorporated to generate the initial population in order to avoid the premature convergence. This article presents a discrete particle swarm optimization algorithm, which incorporates opposition-based technique to generate initial population and greedy algorithm to balance the load of the processors. Make span, flow time, and reliability cost are three different measures used to evaluate the efficiency of the proposed discrete particle swarm optimization algorithm for scheduling independent tasks in distributed systems. Computational simulations are done based on a set of benchmark instances to assess the performance of the proposed algorithm.

  9. Numerical fluid flow and heat transfer calculations on multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Oehman, G.A.; Malen, T.E.; Kuusela, P.

    1989-01-01

    The first part of the report presents the basic principles of parallel processing, and factors influencing tbe efficiency of practical applications are discussed. In a multiprocessor computer, different parts of the program code are executed in parallel, i.e. simultaneous with respect to time, on different processors, and thus it becomes possible to decrease the overall computation time by a factor, which in the ideal case is equal to the number of processors. The application study starts from the numerical solution of the twodimesional Laplace equation, which describes the steady heat conduction in a solid plate and advances through the solution of the three dimensional Laplace equation to the case of study laminar fluid flow in a twodimensional box at Reynolds numbers up to 20. Hereby the stream function-vorticity method is first applied and the SIMPLER method. The conventional (sequential) numerical algoritms for these fluid flow and heat transfer problems are found not to be ideally suited for conversion to parallel computation, but sped-up ratios considerably above 50 % of the theoretical maximum are regularly achieved in the runs. The numerical procedures we coded in the OCCAM-2 language and the test runs were performed at who Akademi on the imperimental HATHI-computers containing 16 T4l4 and 100 INMOS T800 transputers respectively.

  10. Numerical fluid flow and heat transfer calculations on multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Oehman, G.A.; Malen, T.E.; Kuusela, P.

    1989-12-31

    The first part of the report presents the basic principles of parallel processing, and factors influencing tbe efficiency of practical applications are discussed. In a multiprocessor computer, different parts of the program code are executed in parallel, i.e. simultaneous with respect to time, on different processors, and thus it becomes possible to decrease the overall computation time by a factor, which in the ideal case is equal to the number of processors. The application study starts from the numerical solution of the twodimesional Laplace equation, which describes the steady heat conduction in a solid plate and advances through the solution of the three dimensional Laplace equation to the case of study laminar fluid flow in a twodimensional box at Reynolds numbers up to 20. Hereby the stream function-vorticity method is first applied and the SIMPLER method. The conventional (sequential) numerical algoritms for these fluid flow and heat transfer problems are found not to be ideally suited for conversion to parallel computation, but sped-up ratios considerably above 50 % of the theoretical maximum are regularly achieved in the runs. The numerical procedures we coded in the OCCAM-2 language and the test runs were performed at who Akademi on the imperimental HATHI-computers containing 16 T4l4 and 100 INMOS T800 transputers respectively.

  11. Scheduling for energy and reliability management on multiprocessor real-time systems

    Science.gov (United States)

    Qi, Xuan

    Scheduling algorithms for multiprocessor real-time systems have been studied for years with many well-recognized algorithms proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design/develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing cluster schedulings with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy/power consumption into consideration, we developed energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budget. As the commonly deployed energy-saving technique (e.g. dynamic voltage frequency scaling (DVFS)) will significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms on reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes could preserve the system reliability while still achieving substantial energy saving.

  12. An efficient communication scheme for solving Sn equations on message-passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1993-01-01

    Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations, such as global sums, etc. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks

  13. Operating experience with a VMEbus multiprocessor system for data acquisition and reduction in nuclear physics

    International Nuclear Information System (INIS)

    Kutt, P.H.; Balamuth, D.P.

    1989-01-01

    A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPU's and 14 MB of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over 3 times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing for gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC processors to obtain an order of magnitude increase in computing power per node

  14. Mapping of H.264 decoding on a multiprocessor architecture

    Science.gov (United States)

    van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

    2003-05-01

    Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result from this, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs, that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of e.g. a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system, executing the same software. Experimental results show that the data communication considerably reduces up to 65% directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the

  15. A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

    Directory of Open Access Journals (Sweden)

    Sergio Briguglio

    2003-01-01

    Full Text Available A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage and OpenMP (at the intra-node one. The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.

  16. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    Science.gov (United States)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel new approaches to controlling error in parallel cell-placement algorithms; Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance is summarized of an implementation on the Intel iPSC/2 Hypercube. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.

  17. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    International Nuclear Information System (INIS)

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-01-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) paring of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.

  18. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    International Nuclear Information System (INIS)

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low cost high density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX equipment processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs

  19. Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

    Science.gov (United States)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.

  20. A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

    Science.gov (United States)

    Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

    2010-07-01

    This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.

  1. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    Science.gov (United States)

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.

  2. ePRO-MP: A Tool for Profiling and Optimizing Energy and Performance of Mobile Multiprocessor Applications

    Directory of Open Access Journals (Sweden)

    Wonil Choi

    2009-01-01

    Full Text Available For mobile multiprocessor applications, achieving high performance with low energy consumption is a challenging task. In order to help programmers to meet these design requirements, system development tools play an important role. In this paper, we describe one such development tool, ePRO-MP, which profiles and optimizes both performance and energy consumption of multi-threaded applications running on top of Linux for ARM11 MPCore-based embedded systems. One of the key features of ePRO-MP is that it can accurately estimate the energy consumption of multi-threaded applications without requiring a power measurement equipment, using a regression-based energy model. We also describe another key benefit of ePRO-MP, an automatic optimization function, using two example problems. Using the automatic optimization function, ePRO-MP can achieve high performance and low power consumption without programmer intervention. Our experimental results show that ePRO-MP can improve the performance and energy consumption by 6.1% and 4.1%, respectively, over a baseline version for the co-running applications optimization example. For the producer-consumer application optimization example, ePRO-MP improves the performance and energy consumption by 60.5% and 43.3%, respectively over a baseline version.

  3. Multiprocessor systems for real-time data acquisition on the Asdex upgrade and future plasma experiments

    International Nuclear Information System (INIS)

    Zilker, M.; Hallatschek, K.; Heimann, P.; Hertweck, F.

    1999-01-01

    In this paper we present our transputer-based multitop multiprocessor systems for data acquisition, which are currently used on the Asdex upgrade experiment. The bandwidth of these systems goes from low-speed like the calorimetry diagnostic up to highspeed and large data volume systems like the soft-X-ray and Mirnov diagnostics, which collect several hundreds of megabytes of data during a plasma discharge of ∼8 s. Further, we present the multitop-MX, a newly developed system based on transputers and powerPCs, which provides real-time facilities for analysing the acquired data, to generate necessary information for the dynamic adaptation of sample rates, and to deliver triggers when certain events in the plasma are detected. The algorithm running on the powerPCs performs a wavelet like time-frequency transform. In the last part we give an outlook how to build the next generation of data acquisition systems to be used on the future plasma experiments W7-X and ITER, but also on Asdex upgrade. The hardware of these new distributed systems should be mainly based on established industry standards like the VME-bus, PCI-bus and FiberChannel, but also emerging technologies like SCI (scalable coherent interconnect) should be considered. The systems software should be well designed with object oriented methods to simplify the maintenance process and to enable further expansions and adaptations to new problems in an easy way. (orig.)

  4. Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.

    1995-01-01

    The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication

  5. Optimizing survivability of multi-state systems with multi-level protection by multi-processor genetic algorithm

    International Nuclear Information System (INIS)

    Levitin, Gregory; Dai Yuanshun; Xie Min; Leng Poh, Kim

    2003-01-01

    In this paper we consider vulnerable systems which can have different states corresponding to different combinations of available elements composing the system. Each state can be characterized by a performance rate, which is the quantitative measure of a system's ability to perform its task. Both the impact of external factors (stress) and internal causes (failures) affect system survivability, which is determined as probability of meeting a given demand. In order to increase the survivability of the system, a multi-level protection is applied to its subsystems. This means that a subsystem and its inner level of protection are in their turn protected by the protection of an outer level. This double-protected subsystem has its outer protection and so forth. In such systems, the protected subsystems can be destroyed only if all of the levels of their protection are destroyed. Each level of protection can be destroyed only if all of the outer levels of protection are destroyed. We formulate the problem of finding the structure of series-parallel multi-state system (including choice of system elements, choice of structure of multi-level protection and choice of protection methods) in order to achieve a desired level of system survivability by the minimal cost. An algorithm based on the universal generating function method is used for determination of the system survivability. A multi-processor version of genetic algorithm is used as optimization tool in order to solve the structure optimization problem. An application example is presented to illustrate the procedure presented in this paper

  6. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    Science.gov (United States)

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  7. Hybrid Simulation of the Interaction of Europa's Atmosphere with the Jovian Plasma: Multiprocessor Simulations

    Science.gov (United States)

    Dols, V. J.; Delamere, P. A.; Bagenal, F.; Cassidy, T. A.; Crary, F. J.

    2014-12-01

    We model the interaction of Europa's tenuous atmosphere with the plasma of Jupiter's torus with an improved version of our hybrid plasma code. In a hybrid plasma code, the ions are treated as kinetic Macro-particles moving under the Lorentz force and the electrons as a fluid leading to a generalized formulation of Ohm's law. In this version, the spatial simulation domain is decomposed in 2 directions and is non-uniform in the plasma convection direction. The code is run on a multi-processor supercomputer that offers 16416 cores and 2GB Ram per core. This new version allows us to tap into the large memory of the supercomputer and simulate the full interaction volume (Reuropa=1561km) with a high spatial resolution (50km). Compared to Io, Europa's atmosphere is about 100 times more tenuous, the ambient magnetic field is weaker and the density of incident plasma is lower. Consequently, the electrodynamic interaction is also weaker and substantial fluxes of thermal torus ions might reach and sputter the icy surface. Molecular O2 is the dominant atmospheric product of this surface sputtering. Observations of oxygen UV emissions (specifically the ratio of OI 1356A / 1304A emissions) are roughly consistent with an atmosphere that is composed predominantely of O2 with a small amount of atomic O. Galileo observations along flybys close to Europa have revealed the existence of induced currents in a conducting ocean under the icy crust. They also showed that, from flyby to flyby, the plasma interaction is very variable. Asymmetries of the plasma density and temperature in the wake of Europa were also observed and still elude a clear explanation. Galileo mag data also detected ion cyclotron waves, which is an indication of heavy ion pickup close to the moon. We prescribe an O2 atmosphere with a vertical density column consistent with UV observations and model the plasma properties along several Galileo flybys of the moon. We compare our results with the magnetometer

  8. Rt-Space: A Real-Time Stochastically-Provisioned Adaptive Container Environment

    Science.gov (United States)

    2017-08-04

    Real-Time Systems (ECRTS) Conference Location: Toulouse, France Paper Title: Multiprocessor Real-Time Locking Protocols for Replicated Resources...Conference Location: Lille, France Paper Title: A Contention-Sensitive Fine-Grained Locking Protocol for Multiprocessor Real-Time Systems Publication...On the Soft Real-Time Optimality of Global EDF on Multiprocessors: From Identical to Uniform Heterogeneous Publication Type: Conference Paper or

  9. Diseño e Implementación de un Multiprocessor Systems-on-Chip (MPSoC Interconectado por una Networks-on-Chip (NoC

    Directory of Open Access Journals (Sweden)

    Wilson Mauricio Chicaiza

    2013-11-01

    Full Text Available En el presente documento se presenta una breve caracterización de los medios de comunicación empleados en arquitecturas multiprocesadas. Esta caracterización tiene como objetivo principal el mostrar un nuevo modelo de comunicación basado en conmutación de paquetes a los cuales se les denomina como Networks-On-Chip (NoC. Esta publicación muestra una arquitectura de red llamada NoC Hermes, la cual fue interconectada a un Multiprocessor-Systems-on-Chip (MPSoC compuesto de cuatro procesadores MicroBlaze. Está conexión se la realizó gracias al diseño y desarrollo de una Interfaz de Red generada en código VHDL. Por medio de la Interfaz de Red se consiguió que los procesadores MicroBlaze interactúen con los Switches de Hermes a fin de crear una arquitectura multiprocesada interconectada por una NoC. Con el motivo de realizar comparaciones también se creó otra arquitectura de multiprocesadores interconectados por buses. Para ambas arquitecturas se desarrolló una aplicación de Esteganografía enla que existe multiprocesamiento de dos procesadores trabajando simultáneamente. Lamentablemente sobre dicha aplicación no fue posible medir directamente la latencia y el consumo de energía, razón por la cual se utilizó simuladores que permitieron estimar dichas mediciones.

  10. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be imnplemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  11. Enabling MPSoC design space exploration on FPGAs

    NARCIS (Netherlands)

    Shabbir, A.; Kumar, A.; Mesman, B.; Corporaal, H.; Hussain, D.M.A.; Rajput, A.Q.K.; Chowdhry, B.S.; Gee, Q.

    2009-01-01

    Future applications for embedded systems demand chip multiprocessor designs to meet real-time deadlines. These multiprocessors are increasingly becoming heterogeneous for reasons of cost and power. Design space exploration (DSE) of application mapping becomes a major design decision in such systems.

  12. Solution of equations of electric power networks in multiprocessor computers; Solucao de equacoes de redes de energia eletrica em computadores multiprocessadores

    Energy Technology Data Exchange (ETDEWEB)

    Feltrin, Antonio Padilha

    1991-05-01

    This thesis describes a methodology for decomposing the repeat solution process the equation Ax = b independent tasks to be done in parallel, based on the matrix inverse factors (W-matrix) with partitions. The partitioning scheme proposed in this thesis consists of breaking up the W-matrix according the depths of the factorization path tree. In this scheme, all the information needed to generate the partitions can be obtained straightforward from the network factorization path tree. The partitioning algorithm is simple and ease to implement. The elements of W, matrices, except the last partition, can be obtained directly from L-matrix elements, not requiring extra work. The proposed scheme guarantees that additional fills be only created in the last partition. The forward and backward solutions are performed by rows, and the strategy proposed is to schedule on beach processor the operations corresponding to a row of each partition. It should be kept in mind that the multiprocessor environments are equipped with powerful unit processor and then it seems a sound strategy to perform the multi-add elementary operations inside the hardware in order to exploit its computing efficiency. This strategy seeks to match the parallel algorithm to the parallel architecture. The precedent relations - that give rise to delays - are replaced by multi-add operations performed inside the processor mode without external communication. In the partition, the forward, diagonal and backward solutions may be gathered, and so all the operations can be expressed as the product of matrix W{sub lp} by the update components of vector b. The performance results show that the potential speedup of the solution time is essentially bounded by the floating point operation capability of each processor, denoting that the methodology is a suitable way to exploit the growing power of the computing technology. 46 figs, 40 refs, 49 tabs.

  13. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  14. The communication processor of TUMULT-64

    NARCIS (Netherlands)

    Smit, Gerardus Johannes Maria; Jansen, P.G.

    1988-01-01

    Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,

  15. Analyzing composability of applications on MPSoC platforms

    NARCIS (Netherlands)

    Kumar, A.; Mesman, B.; Theelen, B.D.; Corporaal, H.; Yajun, H.

    2008-01-01

    Modern day applications require use of multi-processor systems for reasons of erformance, scalability and power efficiency. As more and more applications are integrated in a single system, mapping and analyzing them on a multi-processor platform becomes a multidimensional problem. Each possible set

  16. Automated Integration of Dedicated Hardwired IP Cores in Heterogeneous MPSoCs Designed with ESPAM

    Directory of Open Access Journals (Sweden)

    Ed Deprettere

    2008-06-01

    Full Text Available This paper presents a methodology and techniques for automated integration of dedicated hardwired (HW IP cores into heterogeneous multiprocessor systems. We propose an IP core integration approach based on an HW module generation that consists of a wrapper around a predefined IP core. This approach has been implemented in a tool called ESPAM for automated multiprocessor system design, programming, and implementation. In order to keep high performance of the integrated IP cores, the structure of the IP core wrapper is devised in a way that adequately represents and efficiently implements the main characteristics of the formal model of computation, namely, Kahn process networks, we use as an underlying programming model in ESPAM. We present details about the structure of the HW module, the supported types of IP cores, and the minimum interfaces these IP cores have to provide in order to allow automated integration in heterogeneous multiprocessor systems generated by ESPAM. The ESPAM design flow, the multiprocessor platforms we consider, and the underlying programming (KPN model are introduced as well. Furthermore, we present the efficiency of our approach by applying our methodology and ESPAM tool to automatically generate, implement, and program heterogeneous multiprocessor systems that integrate dedicated IP cores and execute real-life applications.

  17. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force has resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  18. MULTITASKER, Multitasking Kernel for C and FORTRAN Under UNIX

    International Nuclear Information System (INIS)

    Brooks, E.D. III

    1988-01-01

    1 - Description of program or function: MULTITASKER implements a multitasking kernel for the C and FORTRAN programming languages that runs under UNIX. The kernel provides a multitasking environment which serves two purposes. The first is to provide an efficient portable environment for the development, debugging, and execution of production multiprocessor programs. The second is to provide a means of evaluating the performance of a multitasking program on model multiprocessor hardware. The performance evaluation features require no changes in the application program source and are implemented as a set of compile- and run-time options in the kernel. 2 - Method of solution: The FORTRAN interface to the kernel is identical in function to the CRI multitasking package provided for the Cray XMP. This provides a migration path to high speed (but small N) multiprocessors once the application has been coded and debugged. With use of the UNIX m4 macro preprocessor, source compatibility can be achieved between the UNIX code development system and the target Cray multiprocessor. The kernel also provides a means of evaluating a program's performance on model multiprocessors. Execution traces may be obtained which allow the user to determine kernel overhead, memory conflicts between various tasks, and the average concurrency being exploited. The kernel may also be made to switch tasks every cpu instruction with a random execution ordering. This allows the user to look for unprotected critical regions in the program. These features, implemented as a set of compile- and run-time options, cause extra execution overhead which is not present in the standard production version of the kernel

  19. Power Minimization for Parallel Real-Time Systems with Malleable Jobs and Homogeneous Frequencies

    OpenAIRE

    Paolillo, Antonio; Goossens, Joël; Hettiarachchi, Pradeep M.; Fisher, Nathan

    2014-01-01

    In this work, we investigate the potential benefit of parallelization for both meeting real-time constraints and minimizing power consumption. We consider malleable Gang scheduling of implicit-deadline sporadic tasks upon multiprocessors. By extending schedulability criteria for malleable jobs to DVFS-enabled multiprocessor platforms, we are able to derive an offline polynomial-time optimal processor/frequency-selection algorithm. Simulations of our algorithm on randomly generated task system...

  20. Static Mapping of Functional Programs: An Example in Signal Processing

    Directory of Open Access Journals (Sweden)

    Jack B. Dennis

    1996-01-01

    Full Text Available Complex signal-processing problems are naturally described by compositions of program modules that process streams of data. In this article we discuss how such compositions may be analyzed and mapped onto multiprocessor computers to effectively exploit the massive parallelism of these applications. The methods are illustrated with an example of signal processing for an optical surveillance problem. Program transformation and analysis are used to construct a program description tree that represents the given computation as an acyclic interconnection of stream-processing modules. Each module may be mapped to a set of threads run on a group of processing elements of a target multiprocessor. Performance is considered for two forms of multiprocessor architecture, one based on conventional DSP technology and the other on a multithreaded-processing element design.

  1. On the Scalability of Time-predictable Chip-Multiprocessing

    DEFF Research Database (Denmark)

    Puffitsch, Wolfgang; Schoeberl, Martin

    2012-01-01

    Real-time systems need a time-predictable execution platform to be able to determine the worst-case execution time statically. In order to be time-predictable, several advanced processor features, such as out-of-order execution and other forms of speculation, have to be avoided. However, just using...... simple processors is not an option for embedded systems with high demands on computing power. In order to provide high performance and predictability we argue to use multiprocessor systems with a time-predictable memory interface. In this paper we present the scalability of a Java chip......-multiprocessor system that is designed to be time-predictable. Adding time-predictable caches is mandatory to achieve scalability with a shared memory multi-processor system. As Java bytecode retains information about the nature of memory accesses, it is possible to implement a memory hierarchy that takes...

  2. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  3. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  4. A CAMAC-VME-Macintosh data acquisition system for nuclear experiments

    Science.gov (United States)

    Anzalone, A.; Giustolisi, F.

    1989-10-01

    A multiprocessor system for data acquisition and analysis in low-energy nuclear physics has been realized. The system is built around CAMAC, the VMEbus, and the Macintosh PC. Multiprocessor software has been developed, using RTF, MACsys, and CERN cross-software. The execution of several programs that run on several VME CPUs and on an external PC is coordinated by a mailbox protocol. No operating system is used on the VME CPUs. The hardware, software, and system performance are described.

  5. A Camac-VME-MacIntosh data acquisition system for nuclear experiments

    International Nuclear Information System (INIS)

    Anzalone, A.; Giustolisi, F.

    1989-01-01

    A multiprocessor system for data acquisition and analysis in low energy nuclear physics has been realized at the Laboratorio Nazionale del Sud. the system is built around Camac, VME-bus and MacIntosh PC. A multiprocessor software has been developed, using RTF, Macsys and Cern cross-software. The execution of several programs which run on several VME-CPU's and on an external PC, is coordinated by a mail box protocol. No operating system is used on the VME-CPU's

  6. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    Science.gov (United States)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  7. Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

    Directory of Open Access Journals (Sweden)

    Vishal Aslot

    2003-01-01

    Full Text Available The state of modern computer systems has evolved to allow easy access to multiprocessor systems by supporting multiple processors on a single physical package. As the multiprocessor hardware evolves, new ways of programming it are also developed. Some inventions may merely be adopting and standardizing the older paradigms. One such evolving standard for programming shared-memory parallel computers is the OpenMP API. The Standard Performance Evaluation Corporation (SPEC has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard. We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks. We organize, summarize, and display our measurements using a Quantitative Model. We present a detailed discussion and derivation of the model. Also, we discuss the important loops in the SPEC OMPM2001 benchmarks and the reasons for less than ideal speedup on our platform.

  8. Dynamic file-access characteristics of a production parallel scientific workload

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1994-01-01

    Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.

  9. New kind of user interface for controlling MFTF diagnostics

    International Nuclear Information System (INIS)

    Preckshot, G.G.; Saroyan, R.A.; Mead, J.E.

    1983-01-01

    The Mirror Fusion Test Facility (MFTF) at Lawrence Livermore National Laboratory is faced with the problem of controlling a multitude of plasma diagnostics instruments from a central, multiprocessor computer facility. A 16-bit microprocessor-based workstation allows each physicist entree into the central multiprocessor, which consists of nine Perkin-Elmer 32-bit minicomputers. The workstation provides the user interface to the larger system, with display graphics, windowing, and a physics notebook. Controlling a diagnostic is now equivalent to making entries into a traditional physics notebook

  10. A new kind of user interface for controlling MFTF diagnostics

    International Nuclear Information System (INIS)

    Preckshot, G.; Mead, J.; Saroyan, R.

    1983-01-01

    The Mirror Fusion Test Facility (MFTF) at Lawrence Livermore National Laboratory is faced with the problem of controlling a multitude of plasma diagnostics instruments from a central, multiprocessor computer facility. A 16-bit microprocessor-based workstation allows each physicist entree into the central multiprocessor, which consists of nine Perkin-Elmer 32-bit minicomputers. The workstation provides the user interface to the larger system, with display graphics, windowing, and a physics notebook. Controlling a diagnostic is now equivalent to making entries into a traditional physics notebook

  11. Programming parallel architectures: The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  12. Method for prefetching non-contiguous data structures

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2009-05-05

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.

  13. Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

    International Nuclear Information System (INIS)

    Oehmen, C S; Cannon, W R

    2008-01-01

    Data-intensive and high-performance computing are poised to significantly impact the future of biological research which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines-from small clusters to emerging petascale machines-can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation as well as intuitive interfaces and compatibility with existing common data formats

  14. Microcomputers with higher processing capacity

    Energy Technology Data Exchange (ETDEWEB)

    1982-01-01

    Loosely-coupled multiprocessor systems comprise a number of modules with specialised functions such as arithmetic processor, bit-slice processors, single-chip peripheral processors, usually provided with local resources (I/O units, memories) and forming an integrated network. The topology of three typical multiprocessor systems are discussed: coupling via I/O modules, common memory and DMA. I/O coupled systems have serial transmission, often in master/slave configuration. Several processors may have a common memory and for fast exchange of data between processors, the DMA coupling is used. Configurations for local networks (star, loop, ring) are surveyed and dictionary of terms (LAN, SDLC, HDLC etc.) is provided.

  15. Analyse af problemstillinger i moderne produktionssystemer ved hjælp af operationsanalytiske metoder

    DEFF Research Database (Denmark)

    Paulli, Jan

    This PhD thesis consists of 6 volumes, the first being the above-mentioned Resumé. Vol. 2: Some Aspects of Applying Simulated Annealing and Tabu Search to the Quadratic Assignment Problem. Vol. 3: A Hierarchical Approach for the FMS Scheduling Problem Vol. 4: Solving the General Multiprocessor Job-shop...... Scheduling Problem (co-author: S. Dauzèrérès) Vol. 5: A Global Tabu Search Procedure for the General Multiprocessor Job-shop Scheduling Problem (co-author: S. Dauzère-Pérès) Vol. 6: Makespan Estimations in Flexible Manufacturing Systems (co-author: J.C. Fransoo, T.G. de Kok)...

  16. Preliminary design of an advanced programmable digital filter network for large passive acoustic ASW systems. [Parallel processor

    Energy Technology Data Exchange (ETDEWEB)

    McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.

    1976-09-30

    The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective either in a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor was optimized with regard to the multiple processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)

  17. Scalable Distributed Architectures for Information Retrieval

    National Research Council Canada - National Science Library

    Lu, Zhihong

    1999-01-01

    .... Our distributed architectures exploit parallelism in information retrieval on a cluster of parallel IR servers using symmetric multiprocessors, and use partial collection replication and selection...

  18. Decay-usage scheduling in multiprocessors

    NARCIS (Netherlands)

    Epema, D.H.J.

    1998-01-01

    Decay-usage scheduling is a priority-aging time-sharing scheduling policy capable of dealing with a workload of both interactive and batch jobs by decreasing the priority of a job when it acquires CPU time, and by increasing its priority when it does not use the (a) CPU. In this article we deal with

  19. VLSI Based Multiprocessor Communications Networks.

    Science.gov (United States)

    1982-09-01

    Networks". The contract began on September 1,1980 and was approved on scientific /technical grounds for a duration of three years. Incremental funding was...values for the individual delays will vary from comunicating modules (ij) are shown in Figure 4 module to module due to processing and fabrication

  20. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  1. The performance of disk arrays in shared-memory database machines

    Science.gov (United States)

    Katz, Randy H.; Hong, Wei

    1993-01-01

    In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.

  2. A distributed real-time operating system

    International Nuclear Information System (INIS)

    Tuynman, F.; Hertzberger, L.O.

    1984-07-01

    A distributed real-time operating system, Fados, has been developed for an embedded multi-processor system. The operating system is based on a host target approach and provides for communication between arbitrary processes on host and target machine. The facilities offered are, apart from process communication, access to the file system on the host by programs on the target machine and monitoring and debugging of programs on the target machine from the host. The process communication has been designed in such a way that the possibilities are the same as those offered by the Ada programming language. The operating system is implemented on a MC 68000 based multiprocessor system in combination with a Unix host. (orig.)

  3. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  4. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  5. RT/OS - realtime programming and application environment for the COSY control system

    International Nuclear Information System (INIS)

    Weinert, A.; Hacker, U.; Haberbosch, Ch.; Henn, K.; Simon, M.

    1994-01-01

    This paper presents the highlights and constraints of RT/OS and how it can be used at other places. The support of approximately 1000 VME and VXI CPUs in network-based multiprocessor systems needs a special set of programming tools and application environment which are bundled as the RT/OS Real-Time-Operating-System. Supporting the Motorola 680x0 processors and micro-controller with a development system on UNIX and MS-DOS platforms, the tool set gives the opportunity of bringing small embedded controllers as well as VME multiprocessor systems into operation. The tool set includes a cross compiler and a special version of a remote debugger, as well as application support such as downloading and configuration management. The main feature is a modular kernel with message-based task switching, using the rendezvous principle to allow easy extension. Special features of this kernel for a distributed system are networking with the Berkeley socket library supporting TCP/IP, XNS and Novell Netware, a multiprocessor system on VMEbus with the interprocessor Communication Module (IPC), a device driver library for almost every G64 I/O card using a FieldBus Interface called PDV and a X11R4 port. ((orig.))

  6. RT/OS - realtime programming and application environment for the COSY control system

    Energy Technology Data Exchange (ETDEWEB)

    Weinert, A [Forschungszentrum, Juelich, Postfach 1913, 52425 Juelich (Germany); Hacker, U [Forschungszentrum, Juelich, Postfach 1913, 52425 Juelich (Germany); Haberbosch, Ch [Forschungszentrum, Juelich, Postfach 1913, 52425 Juelich (Germany); Henn, K [Forschungszentrum, Juelich, Postfach 1913, 52425 Juelich (Germany); Simon, M [Forschungszentrum, Juelich, Postfach 1913, 52425 Juelich (Germany)

    1994-12-15

    This paper presents the highlights and constraints of RT/OS and how it can be used at other places. The support of approximately 1000 VME and VXI CPUs in network-based multiprocessor systems needs a special set of programming tools and application environment which are bundled as the RT/OS Real-Time-Operating-System. Supporting the Motorola 680x0 processors and micro-controller with a development system on UNIX and MS-DOS platforms, the tool set gives the opportunity of bringing small embedded controllers as well as VME multiprocessor systems into operation. The tool set includes a cross compiler and a special version of a remote debugger, as well as application support such as downloading and configuration management. The main feature is a modular kernel with message-based task switching, using the rendezvous principle to allow easy extension. Special features of this kernel for a distributed system are networking with the Berkeley socket library supporting TCP/IP, XNS and Novell Netware, a multiprocessor system on VMEbus with the interprocessor Communication Module (IPC), a device driver library for almost every G64 I/O card using a FieldBus Interface called PDV and a X11R4 port. ((orig.))

  7. Debugging multi-core systems-on-chip

    NARCIS (Netherlands)

    Vermeulen, B.; Goossens, K.G.W.; Kornaros, G.

    2010-01-01

    In this chapter, we introduced three fundamental reasons why debugging a multi-processor SoC is intrinsically difficult; (1) limited internal observability, (2) asynchronicity, and (3) non-determinism.

  8. A Performance Evaluation of the Hemingway DSM System on a Network of SMPs

    National Research Council Canada - National Science Library

    Aggarwal, Anshu; Grumwald, Dirk

    1997-01-01

    .... In this paper we investigate the performance of a software distributed shared memory system, Hemingway, which is built out of such multiprocessor workstations, utilizing off-the-shelf communication networks...

  9. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    Science.gov (United States)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.

  10. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  11. A Case for Tamper-Resistant and Tamper-Evident Computer Systems

    National Research Council Canada - National Science Library

    Solihin, Yan

    2007-01-01

    .... These attacks attempt to snoop or modify data transfer between various chips in a computer system such as between the processor and memory, and between processors in a multiprocessor interconnect network...

  12. Simulation of the behaviour of electron-optical systems using a parallel computer

    International Nuclear Information System (INIS)

    Balladore, J.L.; Hawkes, P.W.

    1990-01-01

    The advantage of using a multiprocessor computer for the calculation of electron-optical properties is investigated. A considerable reduction of computing time is obtained by reorganising the finite-element field computation. (orig.)

  13. Concurrent control system for the JAERI tandem accelerator

    International Nuclear Information System (INIS)

    Hanashima, Susumu; Shozi, Tokio; Shiozaki, Yasuo; Saito, Motoi; Oogane, Yasuo; Sekiguchi, Satoshi.

    1994-01-01

    A new control system for the JAERI tandem accelerator is constructed. The system utilizes concurrent processing technology with multiprocessor. Transputers are used both for central processor and I/O front end processors. (author)

  14. Performance and evaluation of real-time multicomputer control systems

    Science.gov (United States)

    Shin, K. G.

    1985-01-01

    Three experiments on fault tolerant multiprocessors (FTMP) were begun. They are: (1) measurement of fault latency in FTMP; (2) validation and analysis of FTMP synchronization protocols; and investigation of error propagation in FTMP.

  15. Basic operation principles of associate co-processor module for ...

    African Journals Online (AJOL)

    With such an organization, it is necessary to read each memory module address and ... for specialized computer systems using a modern element base. ... multiprocessor systems that perform associative functions and data storage functions.

  16. Development of a parallel DBMS on the basis of PostgreSQL

    OpenAIRE

    Pan, C.

    2011-01-01

    The paper describes the architecture and the design of PargreSQL parallel database management system (DBMS) for distributed memory multiprocessors. PargreSQL is based upon PostgreSQL open-source DBMS and exploits partitioned parallelism.

  17. An Asynchronous Time-Division-Multiplexed Network-on-Chip for Real-Time Systems

    DEFF Research Database (Denmark)

    Kasapaki, Evangelia

    is an important part of the T-CREST paltform and used in a number of configurations. The flexible timing organization of Argo combines asynchronous routers with mesochronous NIs, which are connected to individually clocked cores, supporting a GALS system organization. The mesochronous NIs operate at the same......Multi-processor architectures using networks-on-chip (NOCs) for communication are becoming the standard approach in the development of embedded systems and general purpose platforms. Typically, multi-processor platforms follow a globally asynchronous locally synchronous (GALS) timing organization....... This thesis focuses on the design of Argo, a NOC targeted at hard real-time multi-processor platforms with a GALS timing organization. To support real-time communication, NOCs establish end-to-end connections and provide latency and throughput guarantees for these connections. Argo uses time division...

  18. Cost benefit analysis of instrumentation, supervision and control systems for nuclear power plants

    International Nuclear Information System (INIS)

    Hagen, P.

    1973-08-01

    A cost benefit analysis is carried out on a BWR type reactor power plant in which an on-line computer performs plant supervision, reporting, logging, calibration and control functions, using display devices and plotters, while an off-line computer is available for bigger jobs such as fuel management calculations. All on-line functions are briefly described and specified. Three types of computer system are considered, a simplex system, a dual computer system and a multi-processor system. These systems are analysed with respect to reliability, back-up instrumentation requirements and costs. While the multiprocessor system gave in all cases the lowest annual failure costs, the margin to the duplex system was so small that hardware, maintenance and software costs would play an important role in making a decision. (JIW)

  19. Research Article Special Issue

    African Journals Online (AJOL)

    pc

    2018-03-07

    Mar 7, 2018 ... time characteristics, search for the ways and possibilities to minimize ... possible to simulate the system in a mode close to real working conditions, which greatly .... When designing the multiprocessor operating systems, there ...

  20. Low Overhead Real-Time Computing With General Purpose Operating Systems

    National Research Council Canada - National Science Library

    Raymond, Michael

    2004-01-01

    .... In larger systems and more recently, general-purpose operating systems such as SGI IRIX and Linux are used for new projects because they already have multiprocessor and device driver support as well a large user base...

  1. Communications Patterns in a Symbolic Multiprocessor.

    Science.gov (United States)

    1987-06-01

    instruction references that Multilisp programs make. The cache hit ratio is greatest when instruction references have a high degree of -- locality. Another...future touches hit an undetermined future. N, The only exception is Consim, in which one third of future touches hit unde- termined futures. Task...Cambridge, MA, June 1985. [52] S. Sugimoto, K. Agusa, K. Tabata , and Y. Ohno. A multi-microprocessor system for concurrent Lisp. In Proceedings of

  2. External parallel sorting with multiprocessor computers

    International Nuclear Information System (INIS)

    Comanceau, S.I.

    1984-01-01

    This article describes methods of external sorting in which the entire main computer memory is used for the internal sorting of entries, forming out of them sorted segments of the greatest possible size, and outputting them to external memories. The obtained segments are merged into larger segments until all entries form one ordered segment. The described methods are suitable for sequential files stored on magnetic tape. The needs of the sorting algorithm can be met by using the relatively slow peripheral storage devices (e.g., tapes, disks, drums). The efficiency of the external sorting methods is determined by calculating the total sorting time as a function of the number of entries to be sorted and the number of parallel processors participating in the sorting process

  3. Architecture of an acquisition system-multiprocessors

    International Nuclear Information System (INIS)

    Postec, H.

    1987-07-01

    To follow the huge increasing of concerned parameters in nuclear detection systems, acquisition systems become bigger and have to present very good rapidity performance. At Ganil, four detection systems have been set in Nautilus reaction chamber, that lead to experiment configurations with 700 parameters to process. In front of present acquisition system limitation, a device more relevant to lecture of a large number of channels show off necessary. Functionalities already operating in other systems and hardware already used have been chosen; specific technical solutions were aldo developed to use the most recent techniques and to take in account the four detection system structure of the device [fr

  4. Efficient thermal management for multiprocessor systems

    OpenAIRE

    Coşkun, Ayşe Kıvılcım

    2009-01-01

    High temperatures and large thermal variations on the die create severe challenges in system reliability, performance, leakage power, and cooling costs. Designing for worst-case thermal conditions is highly costly and time-consuming. Therefore, dynamic thermal management methods are needed to maintain safe temperature levels during execution. Conventional management techniques sacrifice performance to control temperature and only consider the hot spots, neglecting the effects of thermal varia...

  5. Mejora de los Test de Planificabilidad para Asignación Incremental de Tareas en Sistemas Multiprocesadores de Tiempo Real

    Directory of Open Access Journals (Sweden)

    Sergio Sáez

    2013-04-01

    Full Text Available Resumen: Durante la etapa de diseño de un sistema multiprocesador de tiempo real, los test de planificabilidad son una parte clave de los algoritmos de asignación de tareas. El uso de test de planificabilidad exactos permite aumentar la eficiencia de los algoritmos de asignación a costa de un incremento en el tiempo de computo necesario para validar una partición. Aunque existen varios estudios que mejoran el rendimiento de dichos test de planificabilidad, ninguno se ha centrado en el contexto en que dichos test se utilizan. Este trabajo presenta diversas mejoras en los test de planificabilidad exactos basándose en la naturaleza incremental del proceso de asignación. Abstract: During the design of a Real-Time Multiprocessor System, schedulability tests are a key component of the task allocation algorithms. Using exact schedulability tests increases the efficiency of these allocation algorithms, but the execution cost to validate a task partition is also greatly increased. Although several improvement to these schedulability test have been recently published, their use in the multiprocessor context is still unaddressed. This work presents several improvements to execution costs of the schedulability test when they are used by task allocation algorithms taking advantage of the incremental nature of this allocation process. Palabras clave: Sistema Multiprocesadores, Análisis de Planificabilidad, Sistema de Tiempo Real, Keywords: Multiprocessor Systems, Schedulability Analysis, Real-Time Systems

  6. Specification and Compilation of Real-Time Stream Processing Applications

    NARCIS (Netherlands)

    Geuns, S.J.

    2015-01-01

    This thesis is concerned with the specification, compilation and corresponding temporal analysis of real-time stream processing applications that are executed on embedded multiprocessor systems. An example of such applications are software defined radio applications. These applications typically

  7. Fiscal 1998 research report on super compiler technology; 1998 nendo super konpaira technology no chosa kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1999-03-01

    For next-generation super computing systems, research was made on parallel and distributed compiler technology for enhancing an effective performance, and concerned software and architectures for enhancing a performance in coordination with compilers. As for parallel compiler technology, the researches of scalable automated parallel compiler technology, parallel tuning tools, and an operating system to use multi-processor resources effectively are pointed out to be important as concrete technical development issues. In addition, by developing these research results to the architecture technology of single-chip multi-processors, the possibility of development and expansion of the PC, WS and HPC (high-performance computer) markets, and creation of new industries is pointed out. Although wide-area distributed computing is being watched as next-generation computing industry, concrete industrial fields using such computing are now not clear, staying in the groping research stage. (NEDO)

  8. Portable Instrumented Communication Library

    International Nuclear Information System (INIS)

    Geist, G.A.; Heath, M.T.; Peyton, B.W.; Worley, P.H.

    2001-01-01

    1 - Description of program or function: PICL is a subroutine library that can be used to develop parallel programs that are portable across several distributed-memory multi-processors. PICL provides a portable syntax for key communication primitives and related system calls. It also provides portable routines to perform certain widely- used, high-level communication operations, such as global broadcast and global summation. PICL provides execution tracing that can be used to monitor performance or to aid in debugging. 2 - Restrictions on the complexity of the problem: PICL is a compatibility library built on top of the native multiprocessor operating system and message passing primitives. Thus, the portability of PICL programs is not guaranteed, being a function of idiosyncrasies of the different platforms. Predictable differences are captured with standard error trapping routines. PICL is a research tool, not a production software system

  9. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    International Nuclear Information System (INIS)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-01

    In the solution of large-scale numerical prob- lems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely com- plex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of manag- ing parallelism and data locality from the user. We present innovative algorithms, based on the macro -dataflow model, for detecting data parallelism and efficiently executing data- parallel statements on shared-memory multiprocessors. We also desclibe how these algorithms can be implemented on clusters of SMPS

  10. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    Energy Technology Data Exchange (ETDEWEB)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-04

    In the solution of large-scale numerical prob- lems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely com- plex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of manag- ing parallelism and data locality from the user. We present innovative algorithms, based on the macro -dataflow model, for detecting data parallelism and efficiently executing data- parallel statements on shared-memory multiprocessors. We also desclibe how these algorithms can be implemented on clusters of SMPS.

  11. Stochastic airspace simulation tool development

    Science.gov (United States)

    2009-10-01

    Modeling and simulation is often used to study : the physical world when observation may not be : practical. The overall goal of a recent and ongoing : simulation tool project has been to provide a : documented, lifecycle-managed, multi-processor : c...

  12. Automated Design of Application-Specific Smart Camera Architectures

    NARCIS (Netherlands)

    Caarls, W.

    2008-01-01

    Parallel heterogeneous multiprocessor systems are often shunned in embedded system design, not only because of their design complexity but because of the programming burden. Programs for such systems are architecture-dependent: the application developer needs architecture-specific knowledge to

  13. Hierarchical DSE for multi-ASIP platforms

    DEFF Research Database (Denmark)

    Micconi, Laura; Corvino, Rosilde; Gangadharan, Deepak

    2013-01-01

    This work proposes a hierarchical Design Space Exploration (DSE) for the design of multi-processor platforms targeted to specific applications with strict timing and area constraints. In particular, it considers platforms integrating multiple Application Specific Instruction Set Processors (ASIPs...

  14. Unified dataflow model for the analysis of data and pipeline parallelism, and buffer sizing

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.; Geuns, S.J.; Wiggers, M.H.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Real-time stream processing applications such as software defined radios are usually executed concurrently on multiprocessor systems. Exploiting coarse-grained data parallelism by duplicating tasks is often required, besides pipeline parallelism, to meet the temporal constraints of the applications.

  15. Dataflow-based multi-ASIP platform approach for digital control applications

    NARCIS (Netherlands)

    Frijns, R.M.W.; Kamp, A.L.J.; Stuijk, S.; Voeten, J.P.M.; Bontekoe, M.; Gemei, K.J.A.; Corporaal, H.

    2013-01-01

    To provide a good balance between the performance and flexibility of future digital control platforms, we propose an FPGA-based heterogeneous multiprocessor approach, in which the platform is composed of processing elements from a set of parameterizable heterogeneous Application-Specific

  16. Temporal analysis of static priority preemptive scheduled cyclic streaming applications using CSDF models

    NARCIS (Netherlands)

    Kurtin, Philip Sebastian; Bekooij, Marco Jan Gerrit

    2016-01-01

    Real-time streaming applications with cyclic data dependencies that are executed on multiprocessor systems with processor sharing usually require a temporal analysis to give guarantees on their temporal behavior at design time. Current accurate analysis techniques for cyclic applications that are

  17. Run-time middleware to support real-time system scenarios

    NARCIS (Netherlands)

    Goossens, K.; Koedam, M.; Sinha, S.; Nelson, A.; Geilen, M.

    2015-01-01

    Systems on Chip (SOC) are powerful multiprocessor systems capable of running multiple independent applications, often with both real-time and non-real-time requirements. Scenarios exist at two levels: first, combinations of independent applications, and second, different states of a single

  18. Microprocessor based techniques at CESR

    International Nuclear Information System (INIS)

    Giannini, G.; Cornell Univ., Ithaca, NY

    1981-01-01

    Microprocessor based systems succesfully used in connection with the High Energy Physics experimental program at the Cornell Electron Storage Ring are described. The multiprocessor calibration system for the CUSB calorimeter is analyzed in view of present and future applications. (orig.)

  19. Accuracy improvement of dataflow analysis for cyclic stream processing applications scheduled by static priority preemptive schedulers

    NARCIS (Netherlands)

    Kurtin, Philip Sebastian; Hausmans, J.P.H.M.; Geuns, S.J.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Stream processing applications executed on embedded multiprocessor systems regularly contain cyclic data dependencies due to the presence of feedback loops and bounded FIFO buffers. Dataflow modeling is suitable for the temporal analysis of such applications. However, the accuracy can be

  20. A general model of concurrency and its implementation as many-core dynamic RISC processors

    NARCIS (Netherlands)

    Bernard, T.; Bousias, K.; Guang, L.; Jesshope, C.R.; Lankamp, M.; van Tol, M.W.; Zhang, L.

    2008-01-01

    This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing

  1. Interactive Real-time Magnetic Resonance Imaging

    DEFF Research Database (Denmark)

    Brix, Lau

    seeks to implement and assess existing reconstruction algorithms using multi-processors of modern graphics cards and many-core computer processors and to cover some of the potential clinical applications which might benefit from using an interactive real-time MRI system. First an off...

  2. Low-cost guaranteed-throughput communication ring for real-time streaming MPSoCs

    NARCIS (Netherlands)

    Dekens, B.H.J.; Kurtin, Philip Sebastian; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Connection-oriented guaranteed-throughput mesh-based networks on chip have been proposed as a replacement for buses in real-time embedded multiprocessor systems such as software defined radios. Even with attractive features like throughput and latency guarantees they are not always used because

  3. Incorporating implementation overheads in the analysis for the flexible spin-lock model

    NARCIS (Netherlands)

    Balasubramanian, S.M.N.; Afshar, S.; Gai, P.; Behnam, M.; Bril, R.J.

    2017-01-01

    The flexible spin-lock model (FSLM) unifies suspension-based and spin-based resource sharing protocols for partitioned fixed-priority preemptive scheduling based real-time multiprocessor platforms. Recent work has been done in defining the protocol for FSLM and providing a schedulability analysis

  4. Simultaneous Budget and Buffer Size Computation for Throughput-Constrained Task Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Geilen, Marc C.W.; Basten, Twan

    Modern embedded multimedia systems process multiple concurrent streams of data processing jobs. Streams often have throughput requirements. These jobs are implemented on a multiprocessor system as a task graph. Tasks communicate data over buffers, where tasks wait on sufficient space in output

  5. A compositional modelling framework for exploring MPSoC systems

    DEFF Research Database (Denmark)

    Tranberg-Hansen, Anders Sejer; Madsen, Jan

    2009-01-01

    This paper presents a novel compositional framework for system level performance estimation and exploration of Multi-Processor System On Chip (MPSoC) based systems. The main contributions are the definition of a compositional model which allows quantitative performance estimation to be carried ou...

  6. Process-variation aware mapping of best-effort and real-time streaming applications to MPSoCs

    NARCIS (Netherlands)

    Mirzoyan, D.; Akesson, K.B.; Goossens, K.G.W.

    2014-01-01

    As technology scales, the impact of process variation on the maximum supported frequency (FMAX) of individual cores in a multiprocessor system-on-chip (MPSoC) becomes more pronounced. Task allocation without variation-aware performance analysis can greatly compromise performance and lead to a

  7. Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

    NARCIS (Netherlands)

    Tang, Q.; Basten, A.A.; Geilen, M.C.W.; Stuijk, S.; Wei, Ji-Bo

    This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on

  8. Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

    NARCIS (Netherlands)

    Tang, Q.; Basten, T.; Geilen, M.; Stuijk, S.; Wei, J.B.

    2017-01-01

    This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on

  9. Concurrent control system for the JAERI tandem accelerator

    International Nuclear Information System (INIS)

    Hanashima, S.; Shoji, T.; Horie, K.; Tsukihashi, Y.

    1992-01-01

    Concurrent processing with a multiprocessor system is introduced to the particle accelerator control system region. The control system is a good application in both logical and physical aspects. A renewal plan of the control system for the JAERI tandem accelerator is discussed. (author)

  10. Multidimensional model of estimated resource usage for multimedia NoC QoS

    NARCIS (Netherlands)

    Pastrnak, M.; With, de P.H.N.; Lagendijk, R.; Weber, H. Jos; Berg, van den, F. A.

    2006-01-01

    Multiprocessor systems are rapidly entering various high-performance computing segments, like multimedia processing. Instead of an increase in processor clock frequency, the new trend is enabling multiple cores in performing processing, e.g. dual or quadrapule CPUs in one subsystem. In this

  11. CSDFa: a model for exploiting the trade-off between data and pipeline parallelism

    NARCIS (Netherlands)

    Koek, Peter; Geuns, S.J.; Hausmans, J.P.H.M.; Corporaal, Henk; Bekooij, Marco Jan Gerrit

    2016-01-01

    Real-time stream processing applications, such as SDR applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism

  12. The discrete-dipole-approximation code ADDA: capabilities and known limitations

    NARCIS (Netherlands)

    Yurkin, M.A.; Hoekstra, A.G.

    2011-01-01

    The open-source code ADDA is described, which implements the discrete dipole approximation (DDA), a method to simulate light scattering by finite 3D objects of arbitrary shape and composition. Besides standard sequential execution, ADDA can run on a multiprocessor distributed-memory system,

  13. A high-speed interconnect network using ternary logic

    DEFF Research Database (Denmark)

    Madsen, Jens Kargaard; Long, S. I.

    1995-01-01

    This paper describes the design and implementation of a high-speed interconnect network (ICN) for a multiprocessor system using ternary logic. By using ternary logic and a fast point-to-point communication technique called STARI (Self-Timed At Receiver's Input), the communication between...

  14. Programming heterogeneous MPSoCs tool flows to close the software productivity gap

    CERN Document Server

    Castrillón Mazo, Jerónimo

    2014-01-01

    This book provides embedded software developers with techniques for programmingheterogeneous Multi-Processor Systems-on-Chip (MPSoCs), capable of executing multiple applications simultaneously. It describes a set of algorithms and methodologies to narrow the software productivity gap, as well as an in-depth description of the underlying problems and challenges of today’s programming practices. The authors present four different tool flows: A parallelism extraction flow for applications writtenusing the C programming language, a mapping and scheduling flow for parallel applications, a special mapping flow for baseband applications in the context of Software Defined Radio (SDR) and a final flow for analyzing multiple applications at design time. The tool flows are evaluated on Virtual Platforms (VPs), which mimic different characteristics of state-of-the-art heterogeneous MPSoCs.   • Provides a novel set of algorithms and methodologies for programming heterogeneous Multi-Processor Systems-on-Chip (MPSoCs)...

  15. pn: A Tool for Improved Derivation of Process Networks

    Directory of Open Access Journals (Sweden)

    Sven Verdoolaege

    2007-04-01

    Full Text Available Current emerging embedded System-on-Chip platforms are increasingly becoming multiprocessor architectures. System designers experience significant difficulties in programming these platforms. The applications are typically specified as sequential programs that do not reveal the available parallelism in an application, thereby hindering the efficient mapping of an application onto a parallel multiprocessor platform. In this paper, we present our compiler techniques for facilitating the migration from a sequential application specification to a parallel application specification using the process network model of computation. Our work is inspired by a previous research project called Compaan. With our techniques we address optimization issues such as the generation of process networks with simplified topology and communication without sacrificing the process networks' performance. Moreover, we describe a technique for compile-time memory requirement estimation which we consider as an important contribution of this paper. We demonstrate the usefulness of our techniques on several examples.

  16. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  17. Composable local memory organisation for streaming applications on embedded MPSoCs

    NARCIS (Netherlands)

    Ambrose, J.; Molnos, A.; Nelson, A.; Cotofana, S.; Goossens, K.G.W.; Juurlink, B.

    2011-01-01

    Multi-Processor Systems on a Chip (MPSoCs) are suitable platforms for the implementation of complex embedded applications. An MPSoC is composable if the functional and temporal behaviour of each application is independent of the absence or presence of other applications. Composability is required

  18. Composable Virtual Platforms for Mixed-Criticality Embedded Systems

    NARCIS (Netherlands)

    Beyranvand Nejad, A.

    2014-01-01

    Recent trends show a steady increase towards concurrently executing more and more applications on a single embedded system. Multi-Processor System-on-Chip (MPSoC) architectures are proposed to allow complex design of embedded systems. This is achieved by integrating as many processing resources as

  19. Composable virtual platforms for mixed-criticality embedded systems

    NARCIS (Netherlands)

    Nejad, A.B.

    2014-01-01

    Recent trends show a steady increase towards concurrently executing more and more applications on a single embedded system. Multi-Processor System-on-Chip (MPSoC) architectures are proposed to allow complex design of embedded systems. This is achieved by integrating as many processing resources as

  20. An implementation of the flexible spin-lock model in ERIKA Enterprise on a multi-core platform

    NARCIS (Netherlands)

    Afshar, S.; Verwielen, M.P.W.; Gai, P.; Behnam, M.; Bril, R.J.

    2016-01-01

    Recently, the flexible spin-lock model (FSLM) has been introduced, unifying spin-based and suspension-based resource sharing protocols for real-time multiprocessor platforms by explicitly identifying the spin-lock priority as a parameter. Earlier work focused on the definition of a protocol for FSLM

  1. Scalable shared-memory multiprocessing

    CERN Document Server

    Lenoski, Daniel E

    1995-01-01

    Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.

  2. The Experimental Mathematician: The Pleasure of Discovery and the Role of Proof

    Science.gov (United States)

    Borwein, Jonathan M.

    2005-01-01

    The emergence of powerful mathematical computing environments, the growing availability of correspondingly powerful (multi-processor) computers and the pervasive presence of the Internet allow for mathematicians, students and teachers, to proceed heuristically and "quasi-inductively." We may increasingly use symbolic and numeric computation,…

  3. Cloud-based design and virtual prototypin environment for embedded systems

    NARCIS (Netherlands)

    Werner, S.; Lauber, A.; Koedam, M.L.P.J.; Becker, J.; Sax, E.; Goossens, K.G.W.

    2016-01-01

    The design and test of Multi-Processor Systemon- Chips (MPSoCs) and development of distributed applications and/or operating systems executed on those hardware platforms is one of the biggest challenges in today's system design. This applies in particular when short time-tomarket constraints impose

  4. Run-time mappig of multiple communicating tasks on MPSoC platforms.

    NARCIS (Netherlands)

    Singh, A.K.; Jigang, W.; Kumar, A.; Srikanthan, Th.

    2010-01-01

    Multi-task supported processing elements (PEs) are required in a Multiprocessor System-on-Chip platform for better scalability, power consumption etc. Efficient utilization of PEs needs intelligent mapping of tasks onto them. The job becomes more challenging when the workload of tasks is dynamic.

  5. Mapping of synchronous dataflow graphs on MPSoCs based on parallelism enhancement

    NARCIS (Netherlands)

    Tang, Q.; Basten, T.; Geilen, M.; Stuijk, S.; Wei, J.B.

    2017-01-01

    Multi-processor systems-on-chips are widely adopted in implementing modern streaming applications to satisfy the ever increasing computation requirements. To take advantage of this kind of platform, it is necessary to map tasks of the application properly to different processors, so as to fully

  6. Memory controllers for high-performance and real-time MPSoCs : requirements, architectures, and future trends

    NARCIS (Netherlands)

    Akesson, K.B.; Huang, Po-Chun; Clermidy, F.; Dutoit, D.; Goossens, K.G.W.; Chang, Yuan-Hao; Kuo, Tei-Wei; Vivet, P.; Wingard, D.

    2011-01-01

    Designing memory controllers for complex real-time and high-performance multi-processor systems-on-chip is challenging, since sufficient capacity and (real-time) performance must be provided in a reliable manner at low cost and with low power consumption. This special session contains four

  7. Combining on-hardware prototyping and high-level simulation for DSE of multi-ASIP systems

    NARCIS (Netherlands)

    Meloni, P.; Pomata, S.; Raffo, L.; Piscitelli, R.; Pimentel, A.D.; McAllister, J.; Bhattacharyya, S.

    2012-01-01

    Modern heterogeneous multi-processor embedded systems very often expose to the designer a large number of degrees of freedom, related to the application partitioning/mapping and to the component- and system-level architecture composition. The number is even larger when the designer targets systems

  8. A signature-based power model for MPSoC on FPGA

    NARCIS (Netherlands)

    Piscitelli, R.; Pimentel, A.D.

    2012-01-01

    This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures, and it operates at a higher level of abstraction than, for example, commonly used

  9. A high-level power model for MPSoC on FPGA

    NARCIS (Netherlands)

    Piscitelli, R.; Pimentel, A.D.

    2012-01-01

    This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures. As a result, it is capable of achieving good evaluation performance, thereby making the

  10. Evolving MPSoC Solutions

    DEFF Research Database (Denmark)

    A key challenge of implementing an embedded systems application on a heterogeneous multiprocessor SoC platform is to find the right partitioning of the application onto the platform architecture. The right partitioning is dependent on the characteristics of the processors and the network connecting...

  11. Composable power management with energy and power budgets per application

    NARCIS (Netherlands)

    Nelson, A.; Molnos, A.M.; Goossens, K.G.W.

    2011-01-01

    Embedded Multiprocessor Systems-on-Chip (MPSoCs) commonly run multiple applications at once. These applications may have different time criticalities, i.e. non real-time, soft real-time, and firm or hard real-time. Application-level composability is used to provide each application with its own

  12. Analysis and optimization techniques for real-time streaming image processing software on general purpose systems

    NARCIS (Netherlands)

    Westmijze, Mark

    2018-01-01

    Commercial Off The Shelf (COTS) Chip Multi-Processor (CMP) systems are for cost reasons often used in industry for soft real-time stream processing. COTS CMP systems typically have a low timing predictability, which makes it difficult to develop software applications for these systems with tight

  13. A dual shared stack for FSLM in Erika Enterprise

    NARCIS (Netherlands)

    Balasubramanian, S.M.N.; Afshar, S.; Gai, P.; Behnam, M.; Bril, R.J.

    2017-01-01

    Recently, the flexible spin-lock model (FSLM) has been introduced, unifying spin-based and suspension-based resource sharing protocols for real-time multi-core platforms. Unlike the multiprocessor stack resource policy (MSRP), FSLM doesn’t allow tasks on a core to share a single stack, however. In

  14. Mapping real-life applications on run-time reconfigurable NoC-based MPSoC on FPGA.

    NARCIS (Netherlands)

    Singh, A.K.; Kumar, A.; Srikanthan, Th.; Ha, Y.

    2010-01-01

    Multiprocessor systems-on-chip (MPSoC) are required to fulfill the performance demand of modern real-life embedded applications. These MPSoCs are employing Network-on-Chip (NoC) for reasons of efficiency and scalability. Additionally, these systems need to support run-time reconfiguration of their

  15. NASA: A generic infrastructure for system-level MP-SoC design space exploration

    NARCIS (Netherlands)

    Jia, Z.J.; Pimentel, A.D.; Thompson, M.; Bautista, T.; Núñez, A.

    2010-01-01

    System-level simulation and design space exploration (DSE) are key ingredients for the design of multiprocessor system-on-chip (MP-SoC) based embedded systems. The efforts in this area, however, typically use ad-hoc software infrastructures to facilitate and support the system-level DSE experiments.

  16. Testing of a highly reconfigurable processor core for dependable data streaming applications

    NARCIS (Netherlands)

    Kerkhoff, Hans G.; Huijts, J.J.M.

    2008-01-01

    The advances of CMOS technology towards 45 nm,the high costs of ASIC design, power limitations and fast changing application requirements have stimulated the usage of highly reconfigurable multiprocessor-cores SoCs. These processing cores within the SoC can be subsequently connected with each other

  17. Composable and Predictable Power Management

    NARCIS (Netherlands)

    Nelson, A.T.

    2014-01-01

    The functionality of embedded systems is ever growing. The computational power of embedded systems is growing to match this demand, with embedded multiprocessor systems becoming more common. The limitations of embedded systems are not always related to chip size but are commonly due to energy and/or

  18. Ring interconnection for distributed memory automation and computing system

    Energy Technology Data Exchange (ETDEWEB)

    Vinogradov, V I [Inst. for Nuclear Research of the Russian Academy of Sciences, Moscow (Russian Federation)

    1996-12-31

    Problems of development of measurement, acquisition and central systems based on a distributed memory and a ring interface are discussed. It has been found that the RAM LINK-type protocol can be used for ringlet links in non-symmetrical distributed memory architecture multiprocessor system interaction. 5 refs.

  19. Exploiting Domain Knowledge in System-level MPSoC Design Space Exploration

    NARCIS (Netherlands)

    Thompson, M.; Pimentel, A.D.

    2013-01-01

    System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded multimedia systems. During system-level DSE, system parameters like, e.g., the number and type of processors, and the mapping of

  20. Efcient Computation of Buffer Capacities for Cyclo-Static Real-Time Systems with Back-Pressure

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco; Bekooij, Marco Jan Gerrit; Jansen, P.G.; Smit, Gerardus Johannes Maria

    2006-01-01

    A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multiprocessor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of

  1. A real-time multichannel memory controller and optimal mapping of memory clients to memory channels

    NARCIS (Netherlands)

    Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

    2015-01-01

    Ever-increasing demands for main memory bandwidth and memory speed/power tradeoff led to the introduction of memories with multiple memory channels, such as Wide IO DRAM. Efficient utilization of a multichannel memory as a shared resource in multiprocessor real-time systems depends on mapping of the

  2. A high-level power model for MPSoC on FPGA

    NARCIS (Netherlands)

    Piscitelli, R.; Pimentel, A.D.

    2011-01-01

    This paper presents a framework for high-level power estimation of multiprocessor systems-on-chip (MPSoC) architectures on FPGA. The technique is based on abstract execution profiles, called event signatures, and it operates at a higher level of abstraction than, e.g., commonly-used instruction-set

  3. Alpha: A real-time decentralized operating system for mission-oriented system integration and operation

    Science.gov (United States)

    Jensen, E. Douglas

    1988-01-01

    Alpha is a new kind of operating system that is unique in two highly significant ways. First, it is decentralized transparently providing reliable resource management across physically dispersed nodes, so that distributed applications programming can be done largely as though it were centralized. And second, it provides comprehensive, high technology support for real-time system integration and operation, an application area which consists predominately of aperiodic activities having critical time constraints such as deadlines. Alpha is extremely adaptable so that it can be easily optimized for a wide range of problem-specific functionality, performance, and cost. Alpha is the first systems effort of the Archons Project, and the prototype was created at Carnegie-Mellon University directly on modified Sun multiprocessor workstation hardware. It has been demonstrated with a real-time C(sup 2) application. Continuing research is leading to a series of enhanced follow-ons to Alpha; these are portable but initially hosted on Concurrent's MASSCOMP line of multiprocessor products.

  4. Processors and systems (picture processing)

    Energy Technology Data Exchange (ETDEWEB)

    Gemmar, P

    1983-01-01

    Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.

  5. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    Science.gov (United States)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  6. Early MIMD experience with a plasma physics simulation program on the CRAY X-MP

    International Nuclear Information System (INIS)

    Rhoades, C.E. Jr.

    1986-02-01

    This paper describes some early experience with converting a plasma physics simulation program to the CRAY X-MP, a current multiple instruction, multiple data (MIMD) computer consisting of two processors with architecture similar to that of the CRAY-1. The computer program used in this study is an all Fortran version of SELF, a two species, one space, two velocity, electromagnetic, Newtonian, particle in cell, plasma simulation code. The approach to converting SELF to use both processors of the CRAY X-MP is described in some detail. The resulting multiprocessor version of SELF is nearly a factor of two faster in real time than the single processor version. The multiprocessor version obtains 58.2+-.1 seconds of central processor time in 30+-.5 seconds of real time. For comparison, the CRAY-1 execution time if 74.5 seconds. For SELF, which is mostly scalar coding, the CRAY X-MP is about 2.5 times faster overall than the CRAY-1

  7. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the Computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-nodes dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each Computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision dedicated co-processors and powerful communication devices. (author) [fr

  8. RTOS acceleration in an MPSoC with reconfigurable hardware

    NARCIS (Netherlands)

    Zaykov, P.G.; Kuzmanov, G.; Molnos, A.M.; Goossens, K.G.W.

    2016-01-01

    In this paper, we address the problem of improving the performance of real-time embedded Multiprocessor System-on-Chip (MPSoC). Such MPSoCs often execute applications composed of multiple tasks. The tasks on each processor are scheduled by a Real-Time Operating System (RTOS) instance. To improve

  9. Design-time application mapping and platform exploration for MP-SoC customised run-time management

    NARCIS (Netherlands)

    Ykman-Couvreur, Ch.; Nollet, V.; Marescaux, T.M.; Brockmeyer, E.; Catthoor, F.; Corporaal, H.

    2007-01-01

    Abstract: In an Multi-Processor system-on-Chip (MP-SoC) environment, a customized run-time management layer should be incorporated on top of the basic Operating System services to alleviate the run-time decision-making and to globally optimise costs (e.g. energy consumption) across all active

  10. Hardware task-status manager for an RTOS with FIFO communication

    NARCIS (Netherlands)

    Zaykov, P.G.; Kuzmanov, G.; Molnos, A.M.; Goossens, K.G.W.

    2014-01-01

    In this paper, we address the problem of improving the performance of real-time embedded Multiprocessor System-on-Chip (MPSoC). Such MPSoCs often execute data-flow applications composed of multiple tasks, which communicate through First-In-First-Out (FIFO) queues. The tasks on each processor in the

  11. Design of a modular digital computer system, DRL 4. [for meeting future requirements of spaceborne computers

    Science.gov (United States)

    1972-01-01

    The design is reported of an advanced modular computer system designated the Automatically Reconfigurable Modular Multiprocessor System, which anticipates requirements for higher computing capacity and reliability for future spaceborne computers. Subjects discussed include: an overview of the architecture, mission analysis, synchronous and nonsynchronous scheduling control, reliability, and data transmission.

  12. Exploring Task Mappings on Heterogeneous MPSoCs using a Bias-Elitist Genetic Algorithm

    NARCIS (Netherlands)

    Quan, W.; Pimentel, A.D.

    2014-01-01

    Exploration of task mappings plays a crucial role in achieving high performance in heterogeneous multi-processor system-on-chip (MPSoC) platforms. The problem of optimally mapping a set of tasks onto a set of given heterogeneous processors for maximal throughput has been known, in general, to be

  13. Proceedings of the 1988 IEEE international conference on robotics and automation. Volume 1

    International Nuclear Information System (INIS)

    Anon.

    1988-01-01

    These proceedings compile the papers presented at the international conference (1988) sponsored by IEEE Council on ''Robotics and Automation''. The subjects discussed were: automation and robots of nuclear power stations; algorithms of multiprocessors; parallel processing and computer architecture; and U.S. DOE research programs on nuclear power plants

  14. Energy Model of Networks-on-Chip and a Bus

    NARCIS (Netherlands)

    Wolkotte, P.T.; Smit, Gerardus Johannes Maria; Kavaldjiev, N.K.; Becker, Jens E.; Becker, Jürgen; Nurmi, J.; Takala, J.; Hamalainen, T.D.

    2005-01-01

    A Network-on-Chip (NoC) is an energy-efficient onchip communication architecture for Multi-Processor Systemon-Chip (MPSoC) architectures. In earlier papers we proposed two Network-on-Chip architectures based on packet-switching and circuit-switching. In this paper we derive an energy model for both

  15. Efficient Computation of Buffer Capacities for Cyclo-Static Dataflow Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Bekooij, Marco J.G.; Smit, Gerardus Johannes Maria

    A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multi-processor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of

  16. Efficient Computation of Buffer Capacities for Cyclo-Static Dataflow Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2006-01-01

    A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multi-processor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of

  17. An optimal multi-channel memory controller for real-time systems

    NARCIS (Netherlands)

    Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

    2013-01-01

    Optimal utilization of a multi-channel memory, such as Wide IO DRAM, as shared memory in multi-processor platforms depends on the mapping of memory clients to the memory channels, the granularity at which the memory requests are interleaved in each channel, and the bandwidth and memory capacity

  18. Design space pruning through hybrid analysis in system-level design space exploration

    NARCIS (Netherlands)

    Piscitelli, R.; Pimentel, A.D.

    2012-01-01

    System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system archi- tectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size

  19. Interleaving methods for hybrid system-level MPSoC design space exploration

    NARCIS (Netherlands)

    Piscitelli, R.; Pimentel, A.D.; McAllister, J.; Bhattacharyya, S.

    2012-01-01

    System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size of

  20. Pruning techniques for multi-objective system-level design space exploration

    NARCIS (Netherlands)

    Piscitelli, R.

    2014-01-01

    System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size of

  1. Highway traffic simulation on multi-processor computers

    Energy Technology Data Exchange (ETDEWEB)

    Hanebutte, U.R.; Doss, E.; Tentner, A.M.

    1997-04-01

    A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straight forward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model kv allowing for real-time interactive simulation control and interaction with each individual vehicle and road side infrastructure element on each link. Average speed and traffic volume data is collected at user-specified loop detector locations. Further, as a measure of safety the so- called Time To Collision (TTC) parameter is being recorded.

  2. The fast Amsterdam multiprocessor (FAMP) operation system

    International Nuclear Information System (INIS)

    Gosman, D.; Hertzberger, L.O.; Holthuizen, D.J.; Por, G.J.A.; Schoorel, M.

    1981-01-01

    The Fast Amsterdam Multi Processor system (FAMP system) is developed for on-line filtering and second stage triggering. The system is based on the MC 68000 microprocessor from MOTOROLA. In this report we will describe: The FAMP operating system software, the features of the slaves and supervisor in the FAMP operating system, the communication between supervisor and slaves using the dual port memories, the communication between user programs and the operating system. The hardware as well as the application of the system will be described elsewhere. (orig.)

  3. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  4. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably in the future, make the solutions that are derived thereof to be more appealing. Efficient algorithms for sorting scheme that are encountered in a number of ...

  5. Run-time management for future MPSoC platforms

    NARCIS (Netherlands)

    Nollet, V.

    2008-01-01

    In recent years, we are witnessing the dawning of the Multi-Processor Systemon- Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications, while reducing overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the

  6. Large-scale molecular dynamics simulations of self-assembling systems.

    Science.gov (United States)

    Klein, Michael L; Shinoda, Wataru

    2008-08-08

    Relentless increases in the size and performance of multiprocessor computers, coupled with new algorithms and methods, have led to novel applications of simulations across chemistry. This Perspective focuses on the use of classical molecular dynamics and so-called coarse-grain models to explore phenomena involving self-assembly in complex fluids and biological systems.

  7. Ada Dual-Use Summary: Ada Dual-Use Workshop Held in Vienna, Virginia on October 19-20, 1993. Ada Dual-Use Committee Briefing, November 8, 1993

    Science.gov (United States)

    1993-11-29

    Systems Agency C- 19 Novemtiber 8., 1993 Promoting the Use of Ada in Computer Science Curricula Timothy J. Long Ohio State University ®I To L4 ease...framework, distributed file systems, a high-performance fiber-optic connection and multi-processor modules to work with the OSF’s Mach 3 microkernel (again

  8. Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing

    Science.gov (United States)

    Dobbs, Carl, Sr.

    2012-01-01

    A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, for implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory, and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software

  9. Parallel iterative solution of the Hermite Collocation equations on GPUs II

    International Nuclear Information System (INIS)

    Vilanakis, N; Mathioudakis, E

    2014-01-01

    Hermite Collocation is a high order finite element method for Boundary Value Problems modelling applications in several fields of science and engineering. Application of this integration free numerical solver for the solution of linear BVPs results in a large and sparse general system of algebraic equations, suggesting the usage of an efficient iterative solver especially for realistic simulations. In part I of this work an efficient parallel algorithm of the Schur complement method coupled with Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver has been designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended for high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The realization of the algorithm took place on a multiprocessor machine HP SL390 with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation

  10. Radon transform computer

    International Nuclear Information System (INIS)

    Current, W.; Hurst, P.; Ford, G.; Shieh, E.; Agi, I.; Nguyen, C.; Peterson, D.

    1990-01-01

    In this final report, we summarize some of our results from September 1989 to October 1990. The design, construction, and testing of a four-processor prototype multi-processor (RTP) board using TI TMS320C25 DSP chips has been completed. We are now finishing the extensive detailed final documentation of the RTP hardware and software. This extensive documentation will be provided to Steve Azevedo when we return the borrowed workstation and deliver the RTP to LLNL. A summary of the test results are in Section II. The design of our fully custom CMOS VLSI chip has been completed. The chip has been designed, the layout completed, and the chip is now going through its final pre-fabrication simulations. The present status of the custom chip design activity will be summarized in the separately submitted ''Final Report on the ''Real-Time Multi-Dimensional Processing Hardware Designs'' Project.'' Evaluations of the hardware requirements for fast filtering of data for filtered backprojection have been completed and are summarized. We briefly summarize the test results of the TMS320 multi-processor prototype RTP board evaluation

  11. Application of a Design Space Exploration Tool to Enhance Interleaver Generation

    Science.gov (United States)

    2009-06-24

    slice turbo codes,” in Proc. 3rd Int. Symp. Turbo Codes, Related Topics, Brest , 2003, pp. 343–346. [11] IEEE 802.15.3a, WPAN High Rate Alternative [12...proceedings of DATE 2004, Paris . [14] O.Muller, A.Baghdadi, M.Jezequel, “ASIP-based multiprocessor SoC design for simple and double binary turbo

  12. A parallel row-based algorithm for standard cell placement with integrated error control

    Science.gov (United States)

    Sargent, Jeff S.; Banerjee, Prith

    1989-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to control error in parallel cell-placement algorithms: (1) Heuristic Cell-Coloring; (2) Adaptive Sequence Length Control.

  13. Adaptation of a Fault-Tolerant Fpga-Based Launch Sequencer as a Cubesat Payload Processor

    Science.gov (United States)

    2014-06-01

    32–bit, reduced instruction set computing ( RISC ) processor that interfaces with a universal asynchronous receiver/transmitter (UART) for a field...test a fault–tolerant reduced instruction set computer processor running a subset of the multiprocessor without interlocked pipelined stages instruction...James H. Newman Thesis Co-Advisor Clark Robertson Chair, Department of Electrical and Computer Engineering iv THIS PAGE

  14. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present...... two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks....... In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs....

  15. A multitasking, multisinked, multiprocessor data acquisition front end

    International Nuclear Information System (INIS)

    Fox, R.; Au, R.; Molen, A.V.

    1989-01-01

    The authors have developed a generalized data acquisition front end system which is based on MC68020 processors running a commercial real time kernel (rhoSOS), and implemented primarily in a high level language (C). This system has been attached to the back end on-line computing system at NSCL via our high performance ETHERNET protocol. Data may be simultaneously sent to any number of back end systems. Fixed fraction sampling along links to back end computing is also supported. A nonprocedural program generator simplifies the development of experiment specific code

  16. Analysis, design and management of multimedia multiprocessor systems

    NARCIS (Netherlands)

    Kumar, A.

    2009-01-01

    The design of multimedia platforms is becoming increasingly more complex. Modern multimedia systems need to support a large number of applications or functions in a single device. To achieve high performance in such systems, more and more processors are being integrated into a single chip to build

  17. Lower Bounds and Semi On-line Multiprocessor Scheduling

    Directory of Open Access Journals (Sweden)

    T.C. Edwin Cheng

    2003-10-01

    Full Text Available We are given a set of identical machines and a sequence of jobs from which we know the sum of the job weights in advance. The jobs have to be assigned on-line to one of the machines and the objective is to minimize the makespan. An algorithm with performance ratio 1.6 and a lower bound of 1.5 is presented. This improves recent results by Azar and Regev who published an algorithm with performance ratio 1.625 for the less general problem that the optimal makespan is known in advance.

  18. Operating system considerations in the multiprocessor MIDAS environment

    International Nuclear Information System (INIS)

    Weaver, D.; Maples, C.; Meng, J.; Rathbun, W.

    1983-01-01

    The operating system for MIDAS provides interfaces for custom hardware, debugging facilities, and run time support. The MIDAS architecture uses various specialized hardware devices for controlling the multiple processors and to achieve high I/O throughput. The operating system interfaces with the custom hardware for diagnostics, problem setup, and loading the processors with user code. After the code is loaded, a debugging facility may be used to examine or modify the program in any of the processors, or all the processors simultaneously. During execution of the code, the operating system monitors the processors for exceptional conditions, detects hardware failures, and gathers statistics on performance. This performance information includes histograms depicting instruction execution frequency and analysis of data flow

  19. A multiprocessor system for experiments in nuclear physics

    International Nuclear Information System (INIS)

    Neudold, M.

    1986-12-01

    In the first part of this report, a provisional version of the data acquisition system for the magnetic spectrometer LITTLE JOHN at the Karlsruhe isochronous cyclotron is described. The conceptual ideas of this first version and the problems appearing during the test of its hardware and software components are discussed. Basing on the experience with this system possible extensions and new busstructures are suggested. In the second part, some prototype programs are described, which have been developed in order to test the ADC-systems and the dataways. (orig.) [de

  20. Estimating Performance of Single Bus, Shared Memory Multiprocessors

    Science.gov (United States)

    1987-05-01

    Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol10 , no 3...Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models", Computing Sur- veys, vol10 , no 3, September 1978, pp 225-261

  1. Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

    Science.gov (United States)

    2013-01-14

    Linux, GNU Radio, and file system for target machine • Run benchmarking scripts with VOLK enabled • Run benchmarking scripts with VOLK disabled ...numpy.arange(1,11) pipes3d, stages3d = numpy.meshgrid(pipes, stages) # print results.size 86 # print pipes.size # print stages.size surf = s1.plot_surface...pipes3d, stages3d, results, rstride=1, cstride=1,cmap=cm.jet, linewidth=0, antialiased=True) 90 # f1.colorbar( surf , shrink=0.5, aspect=5) # s1.set_zlim3d

  2. A multi-microcomputer system for Monte Carlo calculations

    International Nuclear Information System (INIS)

    Hertzberger, L.O.; Berg, B.; Krasemann, H.

    1981-01-01

    We propose a microcomputer system which allows parallel processing for Monte Carlo calculations in lattice gauge theories, simulations of high energy physics experiments and presumably many other fields of current interest. The master-n-slave multiprocessor system is based on the Motorola MC 68000 microprocessor. One attraction if this processor is that it allows up to 16 M Byte random access memory. (orig.)

  3. Software Epistemology

    Science.gov (United States)

    2016-03-01

    in-vitro decision to incubate a startup, Lexumo [7], which is developing a commercial Software as a Service ( SaaS ) vulnerability assessment...LTS Label Transition System MUSE Mining and Understanding Software Enclaves RTEMS Real-Time Executive for Multi-processor Systems SaaS Software ...as a Service SSA Static Single Assignment SWE Software Epistemology UD/DU Def-Use/Use-Def Chains (Dataflow Graph)

  4. The LASS hardware processor

    International Nuclear Information System (INIS)

    Kunz, P.F.

    1976-01-01

    The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)

  5. On the existence of cycles of every even length on generalized Fibonacci cubes

    Directory of Open Access Journals (Sweden)

    Norma Zagaglia Salvi

    1996-12-01

    Full Text Available A new topology for the interconnection of computing nodes in multiprocessors systems is the generalized Fibonacci cube.It can be embedded as a subgraph in the Boolean cube and it is also a supergraph of other structures. We prove that every edge of such a graph, but few initial cases, belongs to cycles of every even length.

  6. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  7. A transputer based coprocessor for VEDAS

    International Nuclear Information System (INIS)

    Ziem, P.; Kluge, C.; Kiehne, T.

    1989-01-01

    For real-time recalibration calculations two transputer boards with five T800 transputers have been fitted into a VME based multiprocessor system for experimental data acquisition at VEDAS. The properties of the transputer, i.e. RISC structure and embedded interprocessor links, make this processor a powerful tool for fast data treatment. The transputers have been configured as a dedicated processor network for optimized data throughput

  8. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    With respect to Discrete Hadamard Transformation (DHT) wide application in real-time signal processing while limitation in operation speed of DSP. The article makes DHT parallel research and its parallel performance analysis. Based on multiprocessor platform-TMS320C80 programming structure, the research is carried out to achieve two kinds of parallel DHT algorithms. Several experiments demonstrated the effectiveness of the proposed algorithms.

  9. Rabbit System. Low cost, high reliability front end electronics featuring 16 bit dynamic range

    International Nuclear Information System (INIS)

    Drake, G.; Droege, T.F.; Nelson, C.A. Jr.; Turner, K.J.; Ohska, T.K.

    1985-10-01

    A new crate-based front end system has been built which features low cost, compact packaging, command capability, 16 bit dynamic range digitization, and a high degree of redundancy. The crate can contain a variety of instrumentation modules, and is designed to be situated close to the detector. The system is suitable for readout of a large number of channels via parallel multiprocessor data acquisition

  10. DUAL-PROCESS, a highly reliable process control system

    International Nuclear Information System (INIS)

    Buerger, L.; Gossanyi, A.; Parkanyi, T.; Szabo, G.; Vegh, E.

    1983-02-01

    A multiprocessor process control system is described. During its development the reliability was the most important aspect because it is used in the computerized control of a 5 MW research reactor. DUAL-PROCESS is fully compatible with the earlier single processor control system PROCESS-24K. The paper deals in detail with the communication, synchronization, error detection and error recovery problems of the operating system. (author)

  11. OMNI: An optoelectronic multichannel network interface based on hybrid CMOS-SEED technology

    Science.gov (United States)

    Pinkston, Timothy M.

    1996-11-01

    This paper presents a hybrid CMOS-SEED multiprocessor network interface smart pixel design that implements a reservation-based channel control protocol for collisionless concurrent access to multiple optical interprocessor communication channels. An asynchronous optical token is used as the arbitration mechanism for reservation control instead of slotted access. This work demonstrates that complex network protocol functions can be implemented using optoelectronic smart pixel technology.

  12. The trigger and data acquisition system of the SAPHIR detector

    International Nuclear Information System (INIS)

    Honscheid, K.

    1988-10-01

    At present SAPHIR, a new experimental facility for medium energy physics is under construction at the Bonn electron accelerator ELSA (energy ≤ 3.5 GeV, duty cycle ≅ 100%). SAPHIR combines a large solid angle coverage with a tagging system and is therefore suited to investigate reactions with multi-particle final states. Structure and function of the multi-stage trigger system, which is used to select such processes, are described in this paper. With this system the trigger decision can be based on the number of charged particles as well as on the number of neutral particle detected. Several VMEbus modules have been developed, using memory look-up tables to make fast trigger decisions possible. In order to determine the number of neutral particles from the cluster distribution in the electromagnetic calorimeter some ideas of cellular had to be added. The system has a modular structure, so it can easily be extended. In the second part of this thesis the SAPHIR data acquisition system is discussed. It consists of a multiprocessor system with the VIP microcomputer as central element. The VIP is a VMEbus modul optimized for a multiprocessor environment. Its description as well as that of the other VMEbus boards developed for the SAPHIR online system can be found in this paper. As a basis for software development the operating system SOS is supplied. With SOS it is possible to write programs independent of the actual hardware configuration and so the complicated multiprocessor environment is hidden. To the user the system looks like a simple multi-tasking system. SOS is not restricted to the VIPs but can also be installed on computers of the VAX family, so that efficient mixed configurations are possible. The SAPHIR online system, based on the VIP microcomputer and the SOS operating system, is presented in the last part of this paper. This includes the read-out system, the monitoring of the different components etc. (orig./HSI) [de

  13. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallelprocessing available, including the structure of parallelprocessors and parallel algorithms. The emphasis is on mappingalgorithms to highly parallel computers, with extensive coverage ofarray and multiprocessor architectures. Early chapters provideinsightful coverage on the analysis of parallel algorithms andprogram transformations, effectively integrating a variety ofmaterial previously scattered throughout the literature. Theory andpractice are well balanced across diverse topics in this concisepresentation. For exceptional cla

  14. Realtime Audio with Garbage Collection

    OpenAIRE

    Matheussen, Kjetil Svalastog

    2010-01-01

    Two non-moving concurrent garbage collectors tailored for realtime audio processing are described. Both collectors work on copies of the heap to avoid cache misses and audio-disruptive synchronizations. Both collectors are targeted at multiprocessor personal computers. The first garbage collector works in uncooperative environments, and can replace Hans Boehm's conservative garbage collector for C and C++. The collector does not access the virtual memory system. Neither doe...

  15. Real-time systems architectures

    International Nuclear Information System (INIS)

    Sendall, D.M.

    1986-01-01

    The aim of this paper is to explore some of the design issues in online data acquisition and monitoring systems for high-energy physics experiments. In particular it concentrates on the multi-processor aspects of the design of existing and planned experiments. The central problem to be solved by these systems is the readout and checking of the apparatus, and the recording and perhaps some processing of the data. (Auth.)

  16. CELLFS: TAKING THE "DMA" OUT OF CELL PROGRAMMING

    Energy Technology Data Exchange (ETDEWEB)

    IONKOV, LATCHESAR A. [Los Alamos National Laboratory; MIRTCHOVSKI, ANDREY A. [Los Alamos National Laboratory; NYRHINEN, AKI M. [Los Alamos National Laboratory

    2007-01-09

    In this paper we present a new programming model for the Cell BE architecture of scalar multiprocessors. They call this programming model CellFS. CellFS aims at simplifying the task of managing I/O between the local store of the processing units and main memory. The CellFS support library provides the means for transferring data via simple file I/O operations between the PPU and the SPU.

  17. FASTBUS software status

    International Nuclear Information System (INIS)

    Gustavson, D.B.

    1980-10-01

    Computer software will be needed in addition to the mechanical, electrical, protocol and timing specifications of the FASTBUS, in order to facilitate the use of this flexible new multiprocessor and multisegment data acquisition and processing system. Software considerations have been important in the FASTBUS design, but standard subroutines and recommended algorithms will be needed as the FASTBUS comes into use. This paper summarizes current FASTBUS software projects, goals and status

  18. Algebra of Synchronization with Application to Deadlock and Semaphores

    OpenAIRE

    Keith Schubert; Ernesto E. Gomez

    2011-01-01

    Modern multiprocessor architectures have exacerbated problems of coordinating access to shared data, in particular as regards to the possibility of deadlock. For example semaphores, one of the most basic synchronization primitives, present difficulties. Djikstra defined semaphores to solve the problem of mutual exclusion. Practical implementation of the concept has, however, produced semaphores that are prone to deadlock, even while the original definition is theoretically free of it. This is...

  19. The UA1 VME data acquisition system

    International Nuclear Information System (INIS)

    Cittolin, S.

    1988-01-01

    The data acquisition system of a large-scale experiment such as UA1, running at the CERN proton-antiproton collider, has to cope with very high data rates and to perform sophisticated triggering and filtering in order to analyze interesting events. These functions are performed by a variety of programmable units organized in a parallel multiprocessor system whose central architecture is based on the industry-standard VME/VMXbus. (orig.)

  20. A fast, dataflow-oriented preprocessor for cluster counting

    International Nuclear Information System (INIS)

    Imhof, M.

    1995-01-01

    One of the main problems in the use of cluster counting for particle identification in high energy physics is the huge primary data rate. By means of an online data reduction system, this data rate can be reduced to the amount known from conventional integral charge/time readout. For tests and optimization of online data reduction algorithms, a RISC-based, data-flow-oriented multiprocessor board has been designed. (orig.)

  1. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  2. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal; Salama, Khaled N.; Zidan, Mohammed A.

    2012-01-01

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data which may take a long time. Here, we

  3. Apparatus for multiprocessor-based control of a multiagent robot

    Science.gov (United States)

    Peters, II, Richard Alan (Inventor)

    2009-01-01

    An architecture for robot intelligence enables a robot to learn new behaviors and create new behavior sequences autonomously and interact with a dynamically changing environment. Sensory information is mapped onto a Sensory Ego-Sphere (SES) that rapidly identifies important changes in the environment and functions much like short term memory. Behaviors are stored in a DBAM that creates an active map from the robot's current state to a goal state and functions much like long term memory. A dream state converts recent activities stored in the SES and creates or modifies behaviors in the DBAM.

  4. A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

    Science.gov (United States)

    1989-02-01

    frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a

  5. Models and formal verification of multiprocessor system-on-chips

    DEFF Research Database (Denmark)

    Brekling, Aske Wiid; Hansen, Michael Reichhardt; Madsen, Jan

    2008-01-01

    present experimental results on rather small systems with high complexity, primarily due to differences between best-case and worst-case execution times. Considering worst-case execution times only, the system becomes deterministic and using a special version of {Uppaal}, where the no history is saved, we...

  6. A multiprocessor system for the analysis of cardiac arrhythmias

    International Nuclear Information System (INIS)

    Jugo, G. Diego J.

    1983-01-01

    The study presented here forms part of a real-time electrocardiographic signal analysis system project. The medical data analysis system consists of eight independent analyser units, all of which are connected to a controller unit by the standard IEEE-488 bus. Each bed is monitored by an arrhythmias analyser which has the facilities for analysing a patient's electrocardiogram automatically in real time. The measurements may be made in the presence of a pacemaker. The system described presents the technique of a separate and a simultaneous data analysis on two different ECG channels. This permit: a better diagnosis and an optimum use of the bus. (author) [fr

  7. Eclipse : Flexible Media Processing in a Heterogeneous Multiprocessor Template

    NARCIS (Netherlands)

    Rutten, Martijn

    2007-01-01

    Multimedia-apparatuur, zoals DVD spelers, digitale televisies en moderne autoradio's bevatten specialistische chips die een groot deel van de functionaliteit verzorgen. De toepassingen van deze chips, zoals digitale audio en video, volgen elkaar in snel tempo op. De complexiteit van multimedia chips

  8. Advancing Software Development for a Multiprocessor System-on-Chip

    Directory of Open Access Journals (Sweden)

    Stephen Bique

    2007-06-01

    Full Text Available A low-level language is the right tool to develop applications for some embedded systems. Notwithstanding, a high-level language provides a proper environment to develop the programming tools. The target device is a system-on-chip consisting of an array of processors with only local communication. Applications include typical streaming applications for digital signal processing. We describe the hardware model and stress the advantages of a flexible device. We introduce IDEA, a graphical integrated development environment for an array. A proper foundation for software development is a UML and standard programming abstractions in object-oriented languages.

  9. Numerical simulation of nuclear power plants on multiprocessor systems

    International Nuclear Information System (INIS)

    Boeer, R.; Finnemann, H.; Mueller, R.; Krapf, J.

    1990-04-01

    The present final report gives an account of work performed and results obtained at Siemens/KWU within the frame of SUPRENUM project. The coupled neutron kinetics - thermal hydraulics program system 3D-SIM for the calculation of steady-state and transient conditions in the core and in the coolant loops of light water reactors has been developed. Making efficient use of the parallel computing architecture required the implementation of novel, purpose-built software concepts which can be characterized by multigrid algorithms and parallelization by domain decomposition. The modules treating the reactor core are 3D-NEUT for neutron kinetics and 3D-THERM for thermal hydraulics, the remaining system and control components being represented by the module NLOOP. This modular structure is a main feature of 3D-SIM which allows a flexible choice of applications, employing modules individually or in combinations to fulfil a specific calculation task. In particular, 3D-SIM reactor core representation is fully 3-dimensional for both neutron diffusion and thermal-hydraulic equations, allowing coupled calculations and evaluation of crossflow between fuel elements or subchannels. Including also the loops and plant models, 3D-SIM is a powerful analytical tool for simulating pressurized plant response to a wide spectrum of events. (orig.) With 25 refs., 11 figs [de

  10. IODA - a fast, automated and flexible system for ion track analysis on film detectors

    International Nuclear Information System (INIS)

    Guth, H.; Hellmann, A.

    1995-02-01

    The IODA System (Ion Density Analysis) is used to analyse detector films, resulting from experiments at the pulse power generator KALIF (Karlsruhe Light Ion Facility). The system consists of evaluation software and a microcomputer, which controls a microscope, a video interface, and a multiprocessor subsystem. The segmentation of ion tracks is done automatically by means of digital image processing and pattern recognition. After defining an evaluation range and selecting a suitable analysis method, the film is scanned by the microscope for counting the impacts of the underlying image. According to the appearance of the ion tracks on the film, different methods can be selected. The evaluation results representing the ion density are stored in a matrix. The time needed for an evaluation at a high resolution can be shortened by shipping time consuming pattern recognition calculations to the multiprocessor subsystem. The bottlenecks of the system are the data transfer and the speed of the microscope stage. Simple handling of the system even on alphanumeric terminals had been an important design issue. This was implemented by a logically structured menue system including online help features. This report can be used a s a manual to support the user with system operation. (orig.) [de

  11. The Scalable Coherent Interface and related standards projects

    International Nuclear Information System (INIS)

    Gustavson, D.B.

    1991-09-01

    The Scalable Coherent Interface (SCI) project (IEEE P1596) found a way to avoid the limits that are inherent in bus technology. SCI provides bus-like services by transmitting packets on a collection of point-to-point unidirectional links. The SCI protocols support cache coherence in a distributed-shared-memory multiprocessor model, message passing, I/O, and local-area-network-like communication over fiber optic or wire links. VLSI circuits that operate parallel links at 1000 MByte/s and serial links at 1000 Mbit/s will be available early in 1992. Several ongoing SCI-related projects are applying the SCI technology to new areas or extending it to more difficult problems. P1596.1 defines the architecture of a bridge between SCI and VME; P1596.2 compatibly extends the cache coherence mechanism for efficient operation with kiloprocessor systems; P1596.3 defines new low-voltage (about 0.25 V) differential signals suitable for low power interfaces for CMOS or GaAs VLSI implementations of SCI; P1596.4 defines a high performance memory chip interface using these signals; P1596.5 defines data transfer formats for efficient interprocessor communication in heterogeneous multiprocessor systems. This paper reports the current status of SCI, related standards, and new projects. 16 refs

  12. Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    2000-01-01

    Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (S n ) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard, and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms thus making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such comparison is possible on Symmetric Multi-Processors (SMP) architectures in which several CPUs physically share a common memory, and in addition are capable of emulating message passing functionality. Implementation of the two-dimensional,(S n ), Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and comparison of the observed speedup for the two programming models is reported. For the case presented in this paper it appears that the MPI implementation scales better than the Pthreads implementation on both platforms

  13. A multi-transputer system for parallel Monte Carlo simulations of extensive air showers

    International Nuclear Information System (INIS)

    Gils, H.J.; Heck, D.; Oehlschlaeger, J.; Schatz, G.; Thouw, T.

    1989-01-01

    A multiprocessor computer system has been brought into operation at the Kernforschungszentrum Karlsruhe. It is dedicated to Monte Carlo simulations of extensive air showers induced by ultra-high energy cosmic rays. The architecture consists of two independently working VMEbus systems each with a 68020 microprocessor as host computer and twelve T800 transputers for parallel processing. The two systems are linked via Ethernet for data exchange. The T800 transputers are equipped with 4 Mbyte RAM each, sufficient to run rather large codes. The host computers are operated under UNIX 5.3. On the transputers compilers for PARALLEL FORTRAN, C, and PASCAL are available. The simple modular architecture of this parallel computer reflects the single purpose for which it is intended. The hardware of the multiprocessor computer is described as well as the way how the user software is handled and distributed to the 24 working processors. The performance of the parallel computer is demonstrated by well-known benchmarks and by realistic Monte Carlo simulations of air showers. Comparisons with other types of microprocessors and with large universal computers are made. It is demonstrated that a cost reduction by more than a factor of 20 is achieved by this system as compared to universal computer. (orig.)

  14. Mobile Thread Task Manager

    Science.gov (United States)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

    2013-01-01

    The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance-load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTB architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.

  15. Real-time optical laboratory solution of parabolic differential equations

    Science.gov (United States)

    Casasent, David; Jackson, James

    1988-01-01

    An optical laboratory matrix-vector processor is used to solve parabolic differential equations (the transient diffusion equation with two space variables and time) by an explicit algorithm. This includes optical matrix-vector nonbase-2 encoded laboratory data, the combination of nonbase-2 and frequency-multiplexed data on such processors, a high-accuracy optical laboratory solution of a partial differential equation, new data partitioning techniques, and a discussion of a multiprocessor optical matrix-vector architecture.

  16. Adaptive Mesh Refinement in CTH

    International Nuclear Information System (INIS)

    Crawford, David

    1999-01-01

    This paper reports progress on implementing a new capability of adaptive mesh refinement into the Eulerian multimaterial shock- physics code CTH. The adaptivity is block-based with refinement and unrefinement occurring in an isotropic 2:1 manner. The code is designed to run on serial, multiprocessor and massive parallel platforms. An approximate factor of three in memory and performance improvements over comparable resolution non-adaptive calculations has-been demonstrated for a number of problems

  17. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of ¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  18. Processor tradeoffs in distributed real-time systems

    Science.gov (United States)

    Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

    1987-01-01

    The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.

  19. Army Communicator. Volume 27, Number 1, Spring 2002

    Science.gov (United States)

    2002-01-01

    master’s degrees (MPA and MBA) and a PhD in economics ,” he said. ACRONYM QUICKSCAN FM – frequency modulation PDF – Panamanian Defense Force PML...support ve- hicles. Technological inserts include the multiprocessor unit and the keyboard-video-mouse switch unit. The MPU is ideally suited for...vintage and the comput- ers have different operating sys- tems). The MPU provides a versatile, configurable platform that consoli- dates up to six

  20. Sequential optimization of matrix chain multiplication relative to different cost functions

    KAUST Repository

    Chikalov, Igor; Hussain, Shahid; Moshkov, Mikhail

    2011-01-01

    In this paper, we present a methodology to optimize matrix chain multiplication sequentially relative to different cost functions such as total number of scalar multiplications, communication overhead in a multiprocessor environment, etc. For n matrices our optimization procedure requires O(n 3) arithmetic operations per one cost function. This work is done in the framework of a dynamic programming extension that allows sequential optimization relative to different criteria. © 2011 Springer-Verlag Berlin Heidelberg.

  1. Interface Message Processors for the ARPA Computer Network

    Science.gov (United States)

    1976-07-01

    and then clear the location) as its primitive locking facility (i.e., as the necessary multiprocessor lock equivalent to Dijkstra semaphores )[37]. To...of the extra storage required for the redundant copies. There is the problem of maintaining synchronization of multiple copy data bases in the presence...through any of the data base sites. I Update synchronization . Races between conflicting, "concurrent" update requests are resolved in a manner that j

  2. Memory Hierarchy Design for Next Generation Scalable Many-core Platforms

    OpenAIRE

    Azarkhish, Erfan

    2016-01-01

    Performance and energy consumption in modern computing platforms is largely dominated by the memory hierarchy. The increasing computational power in the multiprocessors and accelerators, and the emergence of the data-intensive workloads (e.g. large-scale graph traversal and scientific algorithms) requiring fast transfer of large volumes of data, are two main trends which intensify this problem by putting even higher pressure on the memory hierarchy. This increasing gap between computation spe...

  3. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  4. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    International Nuclear Information System (INIS)

    Bastiens, K.; Lemahieu, I.

    1994-01-01

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors)

  5. ATTILA 2 S. A technical and interactive test language for architecture allowing simultaneity

    International Nuclear Information System (INIS)

    Batllo, M.

    1980-01-01

    The name ATTILA 2 S is inspired from ATLAS, test language adopted by the Department of Defence of America (D.O.D.) but cannot be implemented on our installation. ATTILA 2 S is principally characterized by: its technical vocabulary (P.O.L.), its interactivity, its simultaneity with main job (Multiprogramming and Multiprocessing allowed by multiprocessors architecture. This language has been developed for the Paris C.R.T. system (Photographies analysis system) on Control Data Cyber 72 computer [fr

  6. Implementation and Performance of Munin

    OpenAIRE

    Bennett, J.K.; Carter, J.B.; Zwaenepoel, W

    1991-01-01

    Munin is a distributed shared memory (DSM) system that allows shared memory paral­lel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access patt...

  7. Parallel-Processing Test Bed For Simulation Software

    Science.gov (United States)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  8. Development of a virtual training simulator for a challenging refurbishment task

    International Nuclear Information System (INIS)

    Mort, P.E.

    1996-01-01

    An overview is presented of the technology, developed by British Nuclear Fuels Limited (BNFL), to create a virtual training simulator for refurbishment tasks. It focuses on the Raffinate Project, a challenging plant modification take, performed remotely, during which component removal, welding and installation of new components are all undertaken. The Training Simulator developed required fast multiprocessor computing with system intercommunication. Operators responded well to the Training Simulator and further improvements to the system are underway. (UK)

  9. Dynamic grid refinement for partial differential equations on parallel computers

    International Nuclear Information System (INIS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems. 6 refs

  10. It’s an app. It’s a hypervisor. It’s a hypapp. : Design and Implementation of an eXtensible and Modular Hypervisor Framework

    Science.gov (United States)

    2012-06-26

    existing commercial-grade virtualization solutions (e.g., Xen, Linux KVM, VMware, or L4 ), but generally do not require such rich functional- ity [9...ports multiprocessor configuration on both AMD and Intel x86 platforms. Xen [8], KVM [27], VMware [39], NOVA [33], Virtual- box‡‡ and L4 are some...popular general purpose (open-source) hypervisors and microkernels which have been used for var- ious hypervisor based research [9,15,26,30,32,34,45

  11. Domain Decomposition: A Bridge between Nature and Parallel Computers

    Science.gov (United States)

    1992-09-01

    B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," S"IAM Journal of S; cientific and Statistical (’omputing, Vol. 13, 1992, pp...AD-A256 575 NASA Contractor Report 189709 ICASE Report No. 92-44 ICASE DOMAIN DECOMPOSITION: A BRIDGE BETWEEN NATURE AND PARALLEL COMPUTERS DTIC dE...effectively implemented on dis- tributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algo- rithm), a 103,201-unknown 2D elliptic

  12. Cache Performance Optimization for SoC Vedio Applications

    OpenAIRE

    Lei Li; Wei Zhang; HuiYao An; Xing Zhang; HuaiQi Zhu

    2014-01-01

    Chip Multiprocessors (CMPs) are adopted by industry to deal with the speed limit of the single-processor. But memory access has become the bottleneck of the performance, especially in multimedia applications. In this paper, a set of management policies is proposed to improve the cache performance for a SoC platform of video application. By analyzing the behavior of Vedio Engine, the memory-friendly writeback and efficient prefetch policies are adopted. The experiment platform is simulated by ...

  13. Introduction to Q

    International Nuclear Information System (INIS)

    Kellogg, M.; Minor, M.M.; Shlaer, S.; Spencer, N.; Thomas, R.F. Jr.; van der Beken, H.

    1978-06-01

    This manual presents an overview of the Q data-acquisition system. Its purposes are to assist an experimenter in determining whether the system is suited to his needs and to provide an introduction to the more detailed manuals. This system employs a powerful technique for dealing with multiprocessor systems by generating code and communication protocol for two processors from a single source program in a high-level language. Some details of experimental event buffering and distribution are discussed

  14. Development of a System Analysis Toolkit for Sensitivity Analysis, Uncertainty Propagation, and Estimation of Parameter Distribution

    International Nuclear Information System (INIS)

    Heo, Jaeseok; Kim, Kyung Doo

    2015-01-01

    Statistical approaches to uncertainty quantification and sensitivity analysis are very important in estimating the safety margins for an engineering design application. This paper presents a system analysis and optimization toolkit developed by Korea Atomic Energy Research Institute (KAERI), which includes multiple packages of the sensitivity analysis and uncertainty quantification algorithms. In order to reduce the computing demand, multiple compute resources including multiprocessor computers and a network of workstations are simultaneously used. A Graphical User Interface (GUI) was also developed within the parallel computing framework for users to readily employ the toolkit for an engineering design and optimization problem. The goal of this work is to develop a GUI framework for engineering design and scientific analysis problems by implementing multiple packages of system analysis methods in the parallel computing toolkit. This was done by building an interface between an engineering simulation code and the system analysis software packages. The methods and strategies in the framework were designed to exploit parallel computing resources such as those found in a desktop multiprocessor workstation or a network of workstations. Available approaches in the framework include statistical and mathematical algorithms for use in science and engineering design problems. Currently the toolkit has 6 modules of the system analysis methodologies: deterministic and probabilistic approaches of data assimilation, uncertainty propagation, Chi-square linearity test, sensitivity analysis, and FFTBM

  15. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10 8 instructions all the way down to a few times 10 4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordiate themselves asynchroneously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus allows to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)

  16. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10 8 instructions all the way down to a few times 10 4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The Task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchroneously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus allows to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).

  17. Development of a System Analysis Toolkit for Sensitivity Analysis, Uncertainty Propagation, and Estimation of Parameter Distribution

    Energy Technology Data Exchange (ETDEWEB)

    Heo, Jaeseok; Kim, Kyung Doo [KAERI, Daejeon (Korea, Republic of)

    2015-05-15

    Statistical approaches to uncertainty quantification and sensitivity analysis are very important in estimating the safety margins for an engineering design application. This paper presents a system analysis and optimization toolkit developed by Korea Atomic Energy Research Institute (KAERI), which includes multiple packages of the sensitivity analysis and uncertainty quantification algorithms. In order to reduce the computing demand, multiple compute resources including multiprocessor computers and a network of workstations are simultaneously used. A Graphical User Interface (GUI) was also developed within the parallel computing framework for users to readily employ the toolkit for an engineering design and optimization problem. The goal of this work is to develop a GUI framework for engineering design and scientific analysis problems by implementing multiple packages of system analysis methods in the parallel computing toolkit. This was done by building an interface between an engineering simulation code and the system analysis software packages. The methods and strategies in the framework were designed to exploit parallel computing resources such as those found in a desktop multiprocessor workstation or a network of workstations. Available approaches in the framework include statistical and mathematical algorithms for use in science and engineering design problems. Currently the toolkit has 6 modules of the system analysis methodologies: deterministic and probabilistic approaches of data assimilation, uncertainty propagation, Chi-square linearity test, sensitivity analysis, and FFTBM.

  18. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers

  19. An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

    Science.gov (United States)

    Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

    2018-03-01

    Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.

  20. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the forms of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  1. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    Energy Technology Data Exchange (ETDEWEB)

    Bastiens, K; Lemahieu, I [University of Ghent - ELIS Department, St. Pietersnieuwstraat 41, B-9000 Ghent (Belgium)

    1994-12-31

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors). 8 refs, 3 figs, 1 tab.

  2. A RECURSIVE ALGORITHM SUITABLE FOR REAL-TIME MEASUREMENT

    Directory of Open Access Journals (Sweden)

    Giovanni Bucci

    1995-12-01

    Full Text Available This paper deals with a recursive algorithm suitable for realtime measurement applications, based on an indirect technique, useful in those applications where the required quantities cannot be measured in a straightforward way. To cope with time constraints a parallel formulation of it, suitable to be implemented on multiprocessor systems, is presented. The adopted concurrent implementation is based on factorization techniques. Some experimental results related to the application of the system for carrying out measurements on synchronous motors are included.

  3. Self-powered information measuring wireless networks using the distribution of tasks within multicore processors

    Science.gov (United States)

    Zhuravska, Iryna M.; Koretska, Oleksandra O.; Musiyenko, Maksym P.; Surtel, Wojciech; Assembay, Azat; Kovalev, Vladimir; Tleshova, Akmaral

    2017-08-01

    The article contains basic approaches to develop the self-powered information measuring wireless networks (SPIM-WN) using the distribution of tasks within multicore processors critical applying based on the interaction of movable components - as in the direction of data transmission as wireless transfer of energy coming from polymetric sensors. Base mathematic model of scheduling tasks within multiprocessor systems was modernized to schedule and allocate tasks between cores of one-crystal computer (SoC) to increase energy efficiency SPIM-WN objects.

  4. Design and realization of real-time processing system for seismic exploration

    International Nuclear Information System (INIS)

    Zhang Sifeng; Cao Ping; Song Kezhu; Yao Lin

    2010-01-01

    For solving real-time seismic data processing problems, a high-speed, large-capacity and real-time data processing system is designed based on FPGA and ARM. With the advantages of multi-processor, DRPS has the characteristics of high-speed data receiving, large-capacity data storage, protocol analysis, data splicing, data converting from time sequence into channel sequence, no dead time data ping-pong storage, etc. And with the embedded Linux operating system, DRPS has the characteristics of flexibility and reliability. (authors)

  5. Progress on the data acquisition system at NSCL

    International Nuclear Information System (INIS)

    Fox, R.; Au, R.; Glynn, T.; Pollack, B.; Vander Mulen, A.

    1985-01-01

    We report on the progress made in data acquisition software development. Specifically, a unique generalized data routing scheme has been developed which allows user online analysis programs to be safely integrated into the acquisition system without endangering data sent to event recording devices. We describe the structure of the system. Performance is discussed and throughput calculated from it. User experience with the system is also discussed. We show how this routing system may easily be adapted to loosely coupled multiprocessor systems

  6. Parallel algorithms for simulating continuous time Markov chains

    Science.gov (United States)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.

  7. Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

    Energy Technology Data Exchange (ETDEWEB)

    De Supinski, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Caliga, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-09-28

    The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.

  8. Windows Vista Kernel-Mode: Functions, Security Enhancements and Flaws

    Directory of Open Access Journals (Sweden)

    Mohammed D. ABDULMALIK

    2008-06-01

    Full Text Available Microsoft has made substantial enhancements to the kernel of the Microsoft Windows Vista operating system. Kernel improvements are significant because the kernel provides low-level operating system functions, including thread scheduling, interrupt and exception dispatching, multiprocessor synchronization, and a set of routines and basic objects.This paper describes some of the kernel security enhancements for 64-bit edition of Windows Vista. We also point out some weakness areas (flaws that can be attacked by malicious leading to compromising the kernel.

  9. Evaluation of SuperLU on multicore architectures

    International Nuclear Information System (INIS)

    Li, X S

    2008-01-01

    The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We included both Pthreads and MPI implementations in this study and found that the Pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior

  10. Study of an analog/logic processor for the design of an auto patch hybrid computer

    International Nuclear Information System (INIS)

    Koched, Hassen

    1976-01-01

    This paper presents the experimental study of an analog multiprocessor designed at SES/CEN-Saclay. An application of such a device as a basic component of an auto-patch hybrid computer is presented. First, the description of the processor, and a presentation of the theoretical concepts which governed the design of the processor are given. Experiments on an hybrid computer are then presented. Finally, different systems of automatic patching are presented, and conveniently modified, for the use of such a processor. (author) [fr

  11. VMEbus interface for spectroscopy ADCs

    International Nuclear Information System (INIS)

    Jaeaeskelaeinen, M.

    1987-01-01

    A high performance VMEbus interface for spectroscopy ADCs and other similar devices used in nuclear spectroscopy coincidence experiments has been developed. This new module can be used to interface existing spectroscopy ADCs with fast parallel data transfer into the industry standard multiprocessor VMEbus. The unit provides a fast direct readout of the ADC data into the VMEbus memory. The interface also has built-in capabilities that enable it to be used in coincidence experiments for slow data timing and ADC pattern recognition. (orig.)

  12. I/O-Optimal Distribution Sweeping on Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Ajwani, Deepak; Sitchinava, Nodar; Zeh, Norbert

    2011-01-01

    /PB) for a number of problems on axis aligned objects; P denotes the number of cores/processors, B denotes the number of elements that fit in a cache line, N and K denote the sizes of the input and output, respectively, and sortp(N) denotes the I/O complexity of sorting N items using P processors in the PEM model...... framework was introduced recently, and a number of algorithms for problems on axis-aligned objects were obtained using this framework. The obtained algorithms were efficient but not optimal. In this paper, we improve the framework to obtain algorithms with the optimal I/O complexity of O(sortp(N) + K...

  13. Preconditioning Strategies for Solving Elliptic Difference Equations on a Multiprocessor.

    Science.gov (United States)

    1982-01-01

    CALLING MA31C. C TD - TIME REQUIRED BY MA31C TO PERFORM THE C FACTORIZATION. 160 C TDT - TOTAL TIME REQUIRED BY SUBROUTINE FACTOR. C TDT-T IM3-T IMI...TPD-TIM2-TIM1 TD -TIM3-TIM2 165 C WRITE(LP,70) TDT,TPD, TD 70 FORMAT(7H TDT - ,F6.3,7H TPD " F6.3,6H TD ,F6.3) C WRITE(LP, 85) NTYPE,NVERN 170 85 FORMAT...INTEGER INI(IAI) ,INJ(IAJ) ,IK(NN,4) INTEGER NU(3000) C COMMON/EA 4BD /PRVT(4),IPRVT(6) 15 COMMON/MA31I/DD,LP,MP COMMON/MA31J/LROW,LCOL,NCP,ND, IPD

  14. Simulation of Particulate Flows Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We presented a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase used the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized an relies upon apparel tri-diagonal solver the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. for the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position over laps the boundary of a sub-domain. the parallel efficiency for some preliminary computations is presented. (Author) 9 refs.

  15. An accurate analysis for guaranteed performance of multiprocessor streaming applications

    NARCIS (Netherlands)

    Poplavko, P.

    2008-01-01

    Already for more than a decade, consumer electronic devices have been available for entertainment, educational, or telecommunication tasks based on multimedia streaming applications, i.e., applications that process streams of audio and video samples in digital form. Multimedia capabilities are

  16. Dynamic scheduling and analysis of real time systems with multiprocessors

    Directory of Open Access Journals (Sweden)

    M.D. Nashid Anjum

    2016-08-01

    Full Text Available This research work considers a scenario of cloud computing job-shop scheduling problems. We consider m realtime jobs with various lengths and n machines with different computational speeds and costs. Each job has a deadline to be met, and the profit of processing a packet of a job differs from other jobs. Moreover, considered deadlines are either hard or soft and a penalty is applied if a deadline is missed where the penalty is considered as an exponential function of time. The scheduling problem has been formulated as a mixed integer non-linear programming problem whose objective is to maximize net-profit. The formulated problem is computationally hard and not solvable in deterministic polynomial time. This research work proposes an algorithm named the Tube-tap algorithm as a solution to this scheduling optimization problem. Extensive simulation shows that the proposed algorithm outperforms existing solutions in terms of maximizing net-profit and preserving deadlines.

  17. Static and dynamic task mapping onto network on chip multiprocessors

    Directory of Open Access Journals (Sweden)

    Freddy Bolaños-Martínez

    2014-01-01

    Full Text Available Las redes en circuito integrado (NoC representan un importante paradigma de uso creciente para los sistemas multiprocesador en circuito integrado (MPSoC, debido a su flexibilidad y escalabilidad. Las estrategias de tolerancia a fallos han venido adquiriendo importancia, a medida que los procesos de manufactura incursionan en dimensiones por debajo del micrómetro y la complejidad de los diseños aumenta. Este artículo describe un algoritmo de aprendizaje incremental basado en población (PBIL, orientado a optimizar el proceso de mapeo en tiempo de diseño, así como a encontrar soluciones de mapeo óptimas en tiempo de ejecución, para hacer frente a fallos de único nodo en la red. En ambos casos, los objetivos de optimización corresponden al tiempo de ejecución de las aplicaciones y al ancho de banda pico que aparece en la red. Las simulaciones se basaron en un algoritmo de ruteo XY determinístico, operando sobre una topología de malla 2D para la NoC. Los resultados obtenidos son prometedores. El algoritmo propuesto exhibe un desempeño superior a otras técnicas reportadas cuando el tamaño del problema aumenta.

  18. DAPHNE: a parallel multiprocessor data acquisition system for nuclear physics

    International Nuclear Information System (INIS)

    Welch, L.C.

    1984-01-01

    This paper describes a project to meet these data acquisition needs for a new accelerator, ATLAS, being built at Argonne National Laboratory. ATLAS is a heavy-ion linear superconducting accelerator providing beam energies up to 25 MeV/A with a relative spread in beam energy as good as .0001 and a time spread of less than 100 psec. Details about the hardware front end, command language, data structure, and the flow of event treatment are covered

  19. Parallel diffusion calculation for the PHAETON on-line multiprocessor computer

    International Nuclear Information System (INIS)

    Collart, J.M.; Fedon-Magnaud, C.; Lautard, J.J.

    1987-04-01

    The aim of the PHAETON project is the design of an on-line computer in order to increase the immediate knowledge of the main operating and safety parameters in power plants. A significant stage is the computation of the three dimensional flux distribution. For cost and safety reason a computer based on a parallel microprocessor architecture has been studied. This paper presents a first approach to parallelized three dimensional diffusion calculation. A computing software has been written and built in a four processors demonstrator. We present the realization in progress, concerning the final equipment. 8 refs

  20. Computers as Components Principles of Embedded Computing System Design

    CERN Document Server

    Wolf, Wayne

    2008-01-01

    This book was the first to bring essential knowledge on embedded systems technology and techniques under a single cover. This second edition has been updated to the state-of-the-art by reworking and expanding performance analysis with more examples and exercises, and coverage of electronic systems now focuses on the latest applications. Researchers, students, and savvy professionals schooled in hardware or software design, will value Wayne Wolf's integrated engineering design approach.The second edition gives a more comprehensive view of multiprocessors including VLIW and superscalar archite

  1. Concurrent algorithms for nuclear shell model calculations

    International Nuclear Information System (INIS)

    Mackenzie, L.M.; Macleod, A.M.; Berry, D.J.; Whitehead, R.R.

    1988-01-01

    The calculation of nuclear properties has proved very successful for light nuclei, but is limited by the power of the present generation of computers. Starting with an analysis of current techniques, this paper discusses how these can be modified to map parallelism inherent in the mathematics onto appropriate parallel machines. A prototype dedicated multiprocessor for nuclear structure calculations, designed and constructed by the authors, is described and evaluated. The approach adopted is discussed in the context of a number of generically similar algorithms. (orig.)

  2. An Expert System Interfaced with a Database System to Perform Troubleshooting of Aircraft Carrier Piping Systems

    Science.gov (United States)

    1988-12-01

    interval of four feet, and are numbered sequentially bow to stem. * "wing tank" is a tank or void, outboard of the holding bulkhead, away from the center...system and DBMS simultaneously with a multi-processor, allowing queries to the DBMS without terminating the expert system. This method was judged...RECIRC). eductor -strip("Y"):- ask _ques _read_ans(OVBD,"ovbd dis open"),ovbd dis-open(OVBD). eductor-strip("N"):- ask_ques read_ans( LINEUP , "strip lineup

  3. Die-stacking architecture

    CERN Document Server

    Xie, Yuan

    2015-01-01

    The emerging three-dimensional (3D) chip architectures, with their intrinsic capability of reducing the wire length, promise attractive solutions to reduce the delay of interconnects in future microprocessors. 3D memory stacking enables much higher memory bandwidth for future chip-multiprocessor design, mitigating the ""memory wall"" problem. In addition, heterogenous integration enabled by 3D technology can also result in innovative designs for future microprocessors. This book first provides a brief introduction to this emerging technology, and then presents a variety of approaches to design

  4. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used...

  5. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used...

  6. Data acquisition system for the Large Scintillating Neutrino Detector at Los Alamos

    International Nuclear Information System (INIS)

    Anderson, G.; Cohen, I.; Homann, B.; Smith, D.; Strossman, W.; VanDalen, G.J.; Weaver, L.S.; Evans, D.; Vernon, W.; Band, A.; Burman, R.; Chang, T.; Federspiel, F.; Foreman, W.; Gomulka, S.; Hart, G.; Kozlowski, T.; Louis, W.C.; Margulies, J.; Nuanes, A.; Sandberg, V.; Thompson, T.N.; White, D.H.; Whitehouse, D.

    1992-01-01

    The data acquisition system for the Large Scintillating Neutrino Detector (LSND) is described. The system collects time and charge information in real time from 1600 photomultiplier tubes and passes the data in intelligent-trigger selected time windows to analysis computers, where events are reconstructed and analyzed as candidates for a variety of neutrino-related physics processes. The system is composed of fourteen VME crates linked to a Silicon Graphics, Inc. ''4D/480'' multiprocessor computer through multiple, parallel Ethernets, and a collection of contemporary high-performance workstations

  7. Optical Array Processor: Laboratory Results

    Science.gov (United States)

    Casasent, David; Jackson, James; Vaerewyck, Gerard

    1987-01-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.

  8. Evaluation of the State of the Ecologically Problematic Mining and Industrial Objects in Kalush Region by Electromagnetic Methods and their Monitoring

    Directory of Open Access Journals (Sweden)

    Deshchytsya, S.A., Pidvirny, O.I., Romanyuk, O.I., Sadovy, Yu.V., Kolyadenko, V.V., Savkiv, L.G., Myshchyshyn, Yu.S.

    2016-09-01

    Full Text Available Environmentally hazardous geological processes involved in the waste deposits of potash and rock salt pose significant real threats to the environment. To detect, to study and to prevent in time such the processes (carsting, suffosations, landslides hardware and software complex for induction sounding of the geological environment in the near zone of field source was created and tested. The multiprocessor system for management, collection and transfer to the end users of the data for analysis and operational geological interpretation was developed.

  9. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.

  10. Multilevel Parallelization of AutoDock 4.2

    Directory of Open Access Journals (Sweden)

    Norgan Andrew P

    2011-04-01

    Full Text Available Abstract Background Virtual (computational screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4. Results Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Conclusions Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI and node-level (OpenMP parallelization to best fit both workloads and computational resources.

  11. Multilevel Parallelization of AutoDock 4.2.

    Science.gov (United States)

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  12. Linear array implementation of the EM algorithm for PET image reconstruction

    International Nuclear Information System (INIS)

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1995-01-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution back projection algorithms. However, the PET image reconstruction based on the EM algorithm is computationally burdensome for today's single processor systems. In addition, a large memory is required for the storage of the image, projection data, and the probability matrix. Since the computations are easily divided into tasks executable in parallel, multiprocessor configurations are the ideal choice for fast execution of the EM algorithms. In tis study, the authors attempt to overcome these two problems by parallelizing the EM algorithm on a multiprocessor systems. The parallel EM algorithm on a linear array topology using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PE's) has been implemented. The performance of the EM algorithm on a 386/387 machine, IBM 6000 RISC workstation, and on the linear array system is discussed and compared. The results show that the computational speed performance of a linear array using 8 DSP chips as PE's executing the EM image reconstruction algorithm is about 15.5 times better than that of the IBM 6000 RISC workstation. The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PE's. The architecture is not dependant on the DSP chip chosen, and the substitution of the latest DSP chip is straightforward and could yield better speed performance

  13. A heterogeneous hierarchical architecture for real-time computing

    Energy Technology Data Exchange (ETDEWEB)

    Skroch, D.A.; Fornaro, R.J.

    1988-12-01

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.

  14. Computer Architecture A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  15. Distributed optimization system and method

    Science.gov (United States)

    Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

    2003-06-10

    A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.

  16. Distributed Optimization System

    Science.gov (United States)

    Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

    2004-11-30

    A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.

  17. Ring-array processor distribution topology for optical interconnects

    Science.gov (United States)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  18. Characterization of Static/Dynamic Topological Routing For Grid Networks

    DEFF Research Database (Denmark)

    Gutierrez Lopez, Jose Manuel; Cuevas, Ruben; Riaz, M. Tahir

    2009-01-01

    Grid or 2D Mesh structures are becoming one of the most attractive network topologies to study. They can be used in many different fields raging from future broadband networks to multiprocessors structures. In addition, the high requirements of future services and applications demand more flexible...... and adaptive networks. Topological routing in grid networks is a simple and efficient alternative to traditional routing techniques, e.g. routing tables, and the paper extends this kind of routing providing a "Dynamic" attribute. This new property attempts to improve the overall network performance for future...

  19. Photonic network-on-chip design

    CERN Document Server

    Bergman, Keren; Biberman, Aleksandr; Chan, Johnnie; Hendry, Gilbert

    2013-01-01

    This book provides a comprehensive synthesis of the theory and practice of photonic devices for networks-on-chip. It outlines the issues in designing photonic network-on-chip architectures for future many-core high performance chip multiprocessors. The discussion is built from the bottom up: starting with the design and implementation of key photonic devices and building blocks, reviewing networking and network-on-chip theory and existing research, and finishing with describing various architectures, their characteristics, and the impact they will have on a computing system. After acquainting

  20. A Reactive and Cycle-True IP Emulator for MPSoC Exploration

    DEFF Research Database (Denmark)

    Mahadevan, Shankar; Angiolini, Federico; Sparsø, Jens

    2008-01-01

    The design of MultiProcessor Systems-on-Chip (MPSoC) emphasizes intellectual-property (IP)-based communication-centric approaches. Therefore, for the optimization of the MPSoC interconnect, the designer must develop traffic models that realistically capture the application behavior as executing...... on the IP core. In this paper, we introduce a Reactive IP Emulator (RIPE) that enables an effective emulation of the IP-core behavior in multiple environments, including bit and cycle-true simulation. The RIPE is built as a multithreaded abstract instruction-set processor, and it can generate reactive...

  1. Macintosh in the laboratory - an approach

    International Nuclear Information System (INIS)

    Taylor, B.

    1988-01-01

    The high degree of parallelism possible in a distributed configuration of VME multiprocessors minimizes dead-time in the read-out of a large detector and allows sophisticated triggering and filtering systems to be implemented. Experience has also shown that systems based on standard instrumentation can be readily reconfigured as needs change, and enhanced as technology evolves. Apple Macintosh computers can be used as cost-effective software development workstations for such systems and their graphics-oriented use-interface has proved well-suited to control and monitoring tasks during data-taking. (orig./HSI).

  2. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal OhOf(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....

  3. A scalable parallel algorithm for multiple objective linear programs

    Science.gov (United States)

    Wiecek, Malgorzata M.; Zhang, Hong

    1994-01-01

    This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.

  4. On the impact of communication complexity in the design of parallel numerical algorithms

    Science.gov (United States)

    Gannon, D.; Vanrosendale, J.

    1984-01-01

    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.

  5. A high performance parallel approach to medical imaging

    International Nuclear Information System (INIS)

    Frieder, G.; Frieder, O.; Stytz, M.R.

    1988-01-01

    Research into medical imaging using general purpose parallel processing architectures is described and a review of the performance of previous medical imaging machines is provided. Results demonstrating that general purpose parallel architectures can achieve performance comparable to other, specialized, medical imaging machine architectures is presented. A new back-to-front hidden-surface removal algorithm is described. Results demonstrating the computational savings obtained by using the modified back-to-front hidden-surface removal algorithm are presented. Performance figures for forming a full-scale medical image on a mesh interconnected multiprocessor are presented

  6. From LPF to eLISA: new approach in payload software

    Science.gov (United States)

    Gesa, Ll.; Martin, V.; Conchillo, A.; Ortega, J. A.; Mateos, I.; Torrents, A.; Lopez-Zaragoza, J. P.; Rivas, F.; Lloro, I.; Nofrarias, M.; Sopuerta, CF.

    2017-05-01

    eLISA will be the first observatory in space to explore the Gravitational Universe. It will gather revolutionary information about the dark universe. This implies a robust and reliable embedded control software and hardware working together. With the lessons learnt with the LISA Pathfinder payload software as baseline, we will introduce in this short article the key concepts and new approaches that our group is working on in terms of software: multiprocessor, self-modifying-code strategies, 100% hardware and software monitoring, embedded scripting, Time and Space Partition among others.

  7. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  8. A Time-predictable Memory Network-on-Chip

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Chong, David VH; Puffitsch, Wolfgang

    2014-01-01

    To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory...... arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without...

  9. SEU and latch-up results on transputers

    International Nuclear Information System (INIS)

    Bezerra, F.; Velazco, R.; Assoum, A.; Benezech, D.

    1996-01-01

    Transputers are processors with four very high-speed serial links in addition to their conventional input/output interfaces. These links allow them in particular to be used in networks as part of a multiprocessor architecture. Transputers are also equipped with an internal cache memory intended to increase the program execution speed by limiting the number of external accesses. IMST805 and IMST225 transputers are being considered for space applications. In a view to predict their in orbit behavior, their sensitivity to Single Event Phenomena has been evaluated under heavy ions

  10. "Real-Time Optical Laboratory Linear Algebra Solution Of Partial Differential Equations"

    Science.gov (United States)

    Casasent, David; Jackson, James

    1986-03-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) employing space and frequency-multiplexing, new partitioning and data flow, and achieving high accuracy performance with a non base-2 number system is described. Laboratory data on the performance of this system and the solution of parabolic Partial Differential Equations (PDEs) is provided. A multi-processor OLAP system is also described for the first time. It use in the solution of multiple banded matrices that frequently arise is then discussed. The utility and flexibility of this processor compared to digital systolic architectures should be apparent.

  11. Simulation-based Modeling Frameworks for Networked Multi-processor System-on-Chip

    DEFF Research Database (Denmark)

    Mahadevan, Shankar

    2006-01-01

    the requirements to model the application and the architecture properties independent of the NoC, and then use these applications to successfully validate the approach against a reference cycle-true system. The presence of a standard socket at the intellectual property (IP) and the NoC interface in both the ARTS...

  12. Hardware Locks with Priority Ceiling Emulation for a Java Chip-Multiprocessor

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Schoeberl, Martin

    2015-01-01

    According to the safety-critical Java specification, priority ceiling emulation is a requirement for implementations, as it has preferable properties, such as avoiding priority inversion and being deadlock free on uni-core systems. In this paper we explore our hardware supported implementation...... of priority ceiling emulation on the multicore Java optimized processor, and compare it to the existing hardware locks on the Java optimized processor. We find that the additional overhead for priority ceiling emulation on a multicore processor is several times higher than simpler, non-premptive locks, mainly...

  13. Feedback control and beam diagnostic algorithms for a multiprocessor DSP system

    International Nuclear Information System (INIS)

    Teytelman, D.; Claus, R.; Fox, J.; Hindi, H.; Linscott, I.; Prabhakar, S.

    1996-09-01

    The multibunch longitudinal feedback system developed for use by PEP-II, ALS and DAΦNE uses a parallel array of digital signal processors to calculate the feedback signals from measurements of beam motion. The system is designed with general-purpose programmable elements which allow many feedback operating modes as well as system diagnostics, calibrations and accelerator measurements. The overall signal processing architecture of the system is illustrated. The real-time DSP algorithms and off-line postprocessing tools are presented. The problems in managing 320 K samples of data collected in one beam transient measurement are discussed and the solutions are presented. Example software structures are presented showing the beam feedback process, techniques for modal analysis of beam motion(used to quantify growth and damping rates of instabilities) and diagnostic functions (such as timing adjustment of beam pick-up and kicker components). These operating techniques are illustrated with example results obtained from the system installed at the Advanced Light Source at LBL

  14. A multiprocessor system for the analysis of pictures of nuclear events

    CERN Document Server

    Bacilieri, P; Matteuzzi, P; Sini, G P; Zanotti, U

    1979-01-01

    The pictures of nuclear events obtained from the bubble chambers such as Gargamelle and BEBC at CERN and others from Serpukhov are geometrically processed at CNAF (Centro Nazionale Analysis Photogrammi) in Bologna. The analysis system includes an Erasme table and a CRT flying spot digitizer. The difficulties connected with the pictures of the four stereoscopic views of the bubble chambers are overcome by the choice of a strong interactive system. (0 refs).

  15. HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

    Science.gov (United States)

    Gilliland, M. C.; Smith, B. J.; Calvert, W.

    1976-01-01

    The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.

  16. A heterogeneous multiprocessor architecture for low-power audio signal processing applications

    DEFF Research Database (Denmark)

    Paker, Ozgun; Sparsø, Jens; Haandbæk, Niels

    2001-01-01

    . The processors are tailored for different classes of filtering algorithms (FIR, IIR, N-LMS etc.), and in a typical system the communication among processors occurs at the sampling rate only. The processors are parameterized in word-size, memory-size, etc. and can be instantiated according to the needs...... of the application at hand using a normal synthesis based ASIC design flow. To give an impression of the size of a processor we mention that one of the FIR processors in a prototype design has 16 instructions, a 32 word×16 bit program memory, a 64 word×16 bit data memory and a 25 word×16 bit coefficient memory....... Early results obtained from the design of a prototype chip containing filter processors for a hearing aid application, indicate a power consumption that is an order of magnitude better than current state of the art low-power audio DSPs implemented using full-custom techniques. This is due to: (1...

  17. 3D Reconfigurable NoC Multiprocessor Imaging Interferometer for Space Climate

    Science.gov (United States)

    Dekoulis, George

    2016-07-01

    This paper describes the development of an imaging interferometer for long-term observations of solar activity related events. Heliospheric physics phenomena are responsible for causing irregularities to the ionospheric-magnetospheric plasmasphere. Distinct signatures of these events are captured and studied over long periods of time deducting crucial conclusions about the short-term Space Weather and in the long run about Space Climate. The new prototype features an eight-channel implementation. The available hardware resources permit a 256- channel configuration for accurate beam scanning of the Earth's plasmasphere. A dual-polarization scheme has been implemented for obtaining accurate measurements. The system is based on state-of-the-art three-dimensional reconfigurable logic and exhibits a performance increase in the range of 70% compared to similar instruments in operation. Special circuits allow measurements of the most intense heliospheric physics events to be fully captured and analyzed.

  18. 3D Reconfigurable NoC Multiprocessor Portable Sounder for Plasmaspheric Studies

    Science.gov (United States)

    Dekoulis, George

    2016-07-01

    The paper describes the development of a prototype imaging sounder for studying the irregularities of the ionospheric plasma. Cutting edge three-dimensional reconfigurable logic has been implemented allowing highly-intensive scientific calculations to be performed in hardware. The new parallel processing algorithms implemented offer a significant amount of performance improvement in the range of 80% compared to existing digital sounder implementations. The current system configuration is taking into consideration the modern scientific needs for portability during scientific campaigns. The prototype acts as a digital signal processing experimentation platform for future larger-scale digital sounder instrumentations for measuring complex planetary plasmaspheric environments.

  19. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    International Nuclear Information System (INIS)

    Uhlmann, M.

    2004-01-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Hehnholtz problem is approximately factorized an relies upon apparel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique simulator MUDPACK. For the solid phase we employ a master-slaves technique where one process or handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub- do mam.The parallel efficiency for some preliminary computations is presented. (Author) 9 refs

  20. Centralized multiprocessor control system for the frascati storage rings DAΦNE

    International Nuclear Information System (INIS)

    Di Pirro, G.; Milardi, C.; Serio, M.

    1992-01-01

    We describe the status of the DANTE (DAΦne New Tools Environment) control system for the new DAΦNE Φ-factory under construction at the Frascati National Laboratories. The system is based on a centralized communication architecture for simplicity and reliability. A central processor unit coordinates all communications between the consoles and the lower level distributed processing power, and continuously updates a central memory that contains the whole machine status. We have developed a system of VME Fiber Optic interfaces allowing very fast point to point communication between distant processors. Macintosh II personal computers are used as consoles. The lower levels are all built using the VME standard. (author)

  1. The Triangle: a Multiprocessor Architecture for Fast Curve and Surface Generation.

    Science.gov (United States)

    1987-08-01

    In this type of design, the designer may begin with a mental image of the desired shape, the goal being communication of that shape to the system. The...system in turn stores some internal representation of the shape. For "free-form" shapes such as the outline of a character in a typography system...by stereo-graphic viewing is that points of evaluation must be passed through two perspective transformations - one for the left eye image and one for

  2. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve efficiently large 3-D radiation transport problems. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand R interconnects capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  3. An introduction to the FASTBUS system

    International Nuclear Information System (INIS)

    Rimmer, E.M.

    1980-11-01

    FASTBUS is a standard bus system being developed, primarily, for high-speed data acquisition and processing in the next generation of large-scale physics experiments. It is built up from backplane segments housed in crates, linked together by cable segments for inter-crate communication. Each segment supports multiprocessors, and independent segment operation permits a high degree of parallelism. Handshaked bus protocols, uniform system-wide, ensure reliability and both high- and low-speed devices can be accommodated. A synchronous mode provides for data block transfers at maximum speed. The fundamental structure of FASTBUS and details of its basic operations are presented. (author)

  4. Introduction to the Fastbus

    International Nuclear Information System (INIS)

    Gustavson, D.B.

    1979-08-01

    The Fastbus is a modular data bus system for data acquisition and data processing. It is a multiprocessor system with multiple bus segments which operate independently but link together for passing data. It operates asynchronously to accommodate very high- and very low-speed devices over long or short paths, and uses handshake protocols for reliability. It can also operate synchronously without handshakes for transfer of data blocks at maximum speed. The goals, history, and motivation for the Fastbus are summarized briefly. The structure of the Fastbus system is described in general, and some details of its operation are introduced

  5. Development of Ada language control software for the NASA power management and distribution test bed

    Science.gov (United States)

    Wright, Ted; Mackin, Michael; Gantose, Dave

    1989-01-01

    The Ada language software developed to control the NASA Lewis Research Center's Power Management and Distribution testbed is described. The testbed is a reduced-scale prototype of the electric power system to be used on space station Freedom. It is designed to develop and test hardware and software for a 20-kHz power distribution system. The distributed, multiprocessor, testbed control system has an easy-to-use operator interface with an understandable English-text format. A simple interface for algorithm writers that uses the same commands as the operator interface is provided, encouraging interactive exploration of the system.

  6. A heterogeneous multi-core platform for low power signal processing in systems-on-chip

    DEFF Research Database (Denmark)

    Paker, Ozgun; Sparsø, Jens; Haandbæk, Niels

    2002-01-01

    is based on message passing. The mini-cores are designed as parameterized soft macros intended for a synthesis based design flow. A 520.000 transistor 0.25µm CMOS prototype chip containing 6 mini-cores has been fabricated and tested. Its power consumption is only 50% higher than a hardwired ASIC and more......This paper presents a low-power and programmable DSP architecture - a heterogeneous multiprocessor platform consisting of standard CPU/DSP cores, and a set of simple instruction set processors called mini-cores each optimized for a particular class of algorithm (FIR, IIR, LMS, etc.). Communication...

  7. Additive Difference Schemes for Filtration Problems in Multilayer Systems

    CERN Document Server

    Ayrjan, E A; Pavlush, M; Fedorov, A V

    2000-01-01

    In the present paper difference schemes for solution of the plane filtration problem in multilayer systems are analyzed within the framework of difference schemes general theory. Attention is paid to splitting the schemes on physical processes of filtration along water-carring layers and vertical motion between layers. Some absolutely stable additive difference schemes are obtained the realization of which needs no software modification. Parallel algorithm connected with the solving of the filtration problem in every water-carring layer on a single processor is constructed. Program realization on the multi-processor system SPP2000 at JINR is discussed.

  8. Droplet flow along the wall of rectangular channel with gradient of wettability

    Science.gov (United States)

    Kupershtokh, A. L.

    2018-03-01

    The lattice Boltzmann equations (LBE) method (LBM) is applicable for simulating the multiphysics problems of fluid flows with free boundaries, taking into account the viscosity, surface tension, evaporation and wetting degree of a solid surface. Modeling of the nonstationary motion of a drop of liquid along a solid surface with a variable level of wettability is carried out. For the computer simulation of such a problem, the three-dimensional lattice Boltzmann equations method D3Q19 is used. The LBE method allows us to parallelize the calculations on multiprocessor graphics accelerators using the CUDA programming technology.

  9. Parallelizing an electron transport Monte Carlo simulator (MOCASIN 2.0)

    International Nuclear Information System (INIS)

    Schwetman, H.; Burdick, S.

    1988-01-01

    Electron transport simulators are tools for studying electrical properties of semiconducting materials and devices. As demands for modeling more complex devices and new materials have emerged, so have demands for more processing power. This paper documents a project to convert an electron transport simulator (MOCASIN 2.0) to a parallel processing environment. In addition to describing the conversion, the paper presents PPL, a parallel programming version of C running on a Sequent multiprocessor system. In timing tests, models that simulated the movement of 2,000 particles for 100 time steps were executed on ten processors, with a parallel efficiency of over 97%

  10. Introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1985-04-01

    FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult

  11. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory

  12. An introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1986-01-01

    Fortran applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The later allows a single job to use more than one processor simultaneously, with a consequent reduction in elapsed time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concept of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreducible results, which makes debugging more difficult

  13. Parallel implementations of 2D explicit Euler solvers

    International Nuclear Information System (INIS)

    Giraud, L.; Manzini, G.

    1996-01-01

    In this work we present a subdomain partitioning strategy applied to an explicit high-resolution Euler solver. We describe the design of a portable parallel multi-domain code suitable for parallel environments. We present several implementations on a representative range of MlMD computers that include shared memory multiprocessors, distributed virtual shared memory computers, as well as networks of workstations. Computational results are given to illustrate the efficiency, the scalability, and the limitations of the different approaches. We discuss also the effect of the communication protocol on the optimal domain partitioning strategy for the distributed memory computers

  14. Language Constructs for Data Partitioning and Distribution

    Directory of Open Access Journals (Sweden)

    P. Crooks

    1995-01-01

    Full Text Available This article presents a survey of language features for distributed memory multiprocessor systems (DMMs, in particular, systems that provide features for data partitioning and distribution. In these systems the programmer is freed from consideration of the low-level details of the target architecture in that there is no need to program explicit processes or specify interprocess communication. Programs are written according to the shared memory programming paradigm but the programmer is required to specify, by means of directives, additional syntax or interactive methods, how the data of the program are decomposed and distributed.

  15. TOPOLOGY FOR A DSP BASED BEAM CONTROL SYSTEM IN THE AGS BOOSTER

    International Nuclear Information System (INIS)

    DELONG, J.; BRENNAN, J.M.; HAYES, T.; LE, T.N.; SMITH, K.

    2003-01-01

    The AGS Booster supports beams of ions and protons with a wide range of energies on a pulse-by-pulse modulation basis. This requires an agile beam control system highly integrated with its controls. To implement this system digital techniques in the form of Digital Signal Processors, Direct Digital Synthesizers, digital receivers and high speed Analog to Digital Converters are used. Signals from the beam and cavity pick-ups, as well as measurements of magnetic field strength in the ring dipoles are processed in real time. To facilitate this a multi-processor topology with high bandwidth data links is being designed

  16. Blocking Optimality in Distributed Real-Time Locking Protocols

    Directory of Open Access Journals (Sweden)

    Björn Bernhard Brandenburg

    2014-09-01

    Full Text Available Lower and upper bounds on the maximum priority inversion blocking (pi-blocking that is generally unavoidable in distributed multiprocessor real-time locking protocols (where resources may be accessed only from specific synchronization processors are established. Prior work on suspension-based shared-memory multiprocessor locking protocols (which require resources to be accessible from all processors has established asymptotically tight bounds of Ω(m and Ω(n maximum pi-blocking under suspension-oblivious and suspension-aware analysis, respectively, where m denotes the total number of processors and n denotes the number of tasks. In this paper, it is shown that, in the case of distributed semaphore protocols, there exist two different task allocation scenarios that give rise to distinct lower bounds. In the case of co-hosted task allocation, where application tasks may also be assigned to synchronization processors (i.e., processors hosting critical sections, Ω(Φ · n maximum pi-blocking is unavoidable for some tasks under any locking protocol under both suspension-aware and suspension-oblivious schedulability analysis, where Φ denotes the ratio of the maximum response time to the shortest period. In contrast, in the case of disjoint task allocation (i.e., if application tasks may not be assigned to synchronization processors, only Ω(m and Ω(n maximum pi-blocking is fundamentally unavoidable under suspension-oblivious and suspension-aware analysis, respectively, as in the shared-memory case. These bounds are shown to be asymptotically tight with the construction of two new distributed real-time locking protocols that ensure O(m and O(n maximum pi-blocking under suspension-oblivious and suspension-aware analysis, respectively.

  17. Ultra-fast fluence optimization for beam angle selection algorithms

    Science.gov (United States)

    Bangert, M.; Ziegenhein, P.; Oelfke, U.

    2014-03-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ~ 4 s. BAS runs including FOs for 1000 different beam ensembles take ~ 15-70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.

  18. Ultra-fast fluence optimization for beam angle selection algorithms

    International Nuclear Information System (INIS)

    Bangert, M; Ziegenhein, P; Oelfke, U

    2014-01-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ∼ 4 s. BAS runs including FOs for 1000 different beam ensembles take ∼ 15–70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.

  19. Development of Parallel Code for the Alaska Tsunami Forecast Model

    Science.gov (United States)

    Bahng, B.; Knight, W. R.; Whitmore, P.

    2014-12-01

    The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.

  20. Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

    Science.gov (United States)

    Amitai, Dganit; Averbuch, Amir; Itzikowitz, Samuel; Turkel, Eli

    1991-01-01

    A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent process. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared memory multi-user Sequent Balance machine. Numerical results for one and two dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS may reach 90 percent and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient, but provides more accurate results for intermediate (non steady-state) values.

  1. Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.; Schaefer, H.F. III

    1987-01-01

    For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation

  2. Behavior characterization of the shared last-level cache in a chip multiprocessor

    OpenAIRE

    Benedicte Illescas, Pedro

    2014-01-01

    [CATALÀ] Aquest projecte consisteix a analitzar diferents aspectes de la jerarquia de memòria i entendre la seva influència al rendiment del sistema. Els aspectes que s'analitzaran són els algorismes de reemplaçament, els esquemes de mapeig de memòria i les polítiques de pàgina de memòria. [ANGLÈS] This project consists in analyzing different aspects of the memory hierarchy and understanding its influence in the overall system performance. The aspects that will be analyzed are cache replac...

  3. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    Science.gov (United States)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  4. Preemptive scheduling in a two-stage multiprocessor flow shop is NP-hard

    NARCIS (Netherlands)

    Hoogeveen, J.A.; Lenstra, J.K.; Veltman, B.

    1996-01-01

    In 1954, Johnson gave an efficient algorithm for minimizing makespan in a two-machine flow shop; there is no advantage to preemption in this case. McNaughton's wrap-around rule of 1959 finds a shortest preemptive schedule on identical parallel machines in linear time. A similarly efficient algorithm

  5. Green computing: power optimisation of vfi-based real-time multiprocessor dataflow applications

    NARCIS (Netherlands)

    Ahmad, W.; Holzenspies, P.K.F.; Stoelinga, Mariëlle Ida Antoinette; van de Pol, Jan Cornelis

    2015-01-01

    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor

  6. Multi-processor system for real-time flow estimation in medical ultrasound imaging

    DEFF Research Database (Denmark)

    Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

    1997-01-01

    the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors.In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...

  7. A Self Consistent Multiprocessor Space Charge Algorithm that is Almost Embarrassingly Parallel

    International Nuclear Information System (INIS)

    Nissen, Edward; Erdelyi, B.; Manikonda, S.L.

    2012-01-01

    We present a space charge code that is self consistent, massively parallelizeable, and requires very little communication between computer nodes; making the calculation almost embarrassingly parallel. This method is implemented in the code COSY Infinity where the differential algebras used in this code are important to the algorithm's proper functioning. The method works by calculating the self consistent space charge distribution using the statistical moments of the test particles, and converting them into polynomial series coefficients. These coefficients are combined with differential algebraic integrals to form the potential, and electric fields. The result is a map which contains the effects of space charge. This method allows for massive parallelization since its statistics based solver doesn't require any binning of particles, and only requires a vector containing the partial sums of the statistical moments for the different nodes to be passed. All other calculations are done independently. The resulting maps can be used to analyze the system using normal form analysis, as well as advance particles in numbers and at speeds that were previously impossible.

  8. Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?

    Science.gov (United States)

    1994-01-01

    400 MB/second. 4 Dubnicki’s work used trace-driven simulation, with traces collected on an 8-processor machine. We would expect such small-scale...312 1 6 32 64 of odk Sb* Bad64.M Figure 17: Miss rate of Ind Blocked LU. Figure 18: MCPR of Ind Blocked LU. overall miss rate of TGauss is a factor of...easily. 17 (’his approach assunics that the model paramelers we collect from simulations with infinite band- width (such as the miss rate and the

  9. Multicore Processing and ARTEMIS - An incentive to develop the European Multiprocessor research

    DEFF Research Database (Denmark)

    Seceleanu, Tiberius; Tenhunen, Hannu; Jerraya, Ahmed

    2006-01-01

    to a traditional SOC view, concurrency at all levels plays a deterministic role, while problems such as power consumption, addressable separately in the nodes of a DS, must be unitary considered. Thus, distinct research and development issues must be defined for MPSOC, building on the indispensable experience...... in DS and SOC, and other related domains. Primarily motivated by market concerns, and also by the promises of the available billion transistor technology, MPSOC is increasingly becoming the preferred target for embedded systems (ES) implementations. Furthermore, the possibility to fit a huge number...

  10. Green computing: efficient energy management of multiprocessor streaming applications via model checking

    NARCIS (Netherlands)

    Ahmad, W.

    2017-01-01

    Streaming applications such as virtual reality, video conferencing, and face detection, impose high demands on a system’s performance and battery life. With the advancement in mobile computing, these applications are increasingly implemented on battery-constrained platforms, such as gaming consoles,

  11. High performance image processing of SPRINT

    Energy Technology Data Exchange (ETDEWEB)

    DeGroot, T. [Lawrence Livermore National Lab., CA (United States)

    1994-11-15

    This talk will describe computed tomography (CT) reconstruction using filtered back-projection on SPRINT parallel computers. CT is a computationally intensive task, typically requiring several minutes to reconstruct a 512x512 image. SPRINT and other parallel computers can be applied to CT reconstruction to reduce computation time from minutes to seconds. SPRINT is a family of massively parallel computers developed at LLNL. SPRINT-2.5 is a 128-node multiprocessor whose performance can exceed twice that of a Cray-Y/MP. SPRINT-3 will be 10 times faster. Described will be the parallel algorithms for filtered back-projection and their execution on SPRINT parallel computers.

  12. Automatic code generation for distributed robotic systems

    International Nuclear Information System (INIS)

    Jones, J.P.

    1993-01-01

    Hetero Helix is a software environment which supports relatively large robotic system development projects. The environment supports a heterogeneous set of message-passing LAN-connected common-bus multiprocessors, but the programming model seen by software developers is a simple shared memory. The conceptual simplicity of shared memory makes it an extremely attractive programming model, especially in large projects where coordinating a large number of people can itself become a significant source of complexity. We present results from three system development efforts conducted at Oak Ridge National Laboratory over the past several years. Each of these efforts used automatic software generation to create 10 to 20 percent of the system

  13. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.; Michigan Univ., Ann Arbor, MI

    1991-01-01

    In order to use efficiently the new features of supercomputers, production codes, usually written 10 -20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedup in the execution times was obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize the parallel/vector architecture of a multiprocessor IBM 3090. (author)

  14. Portable software for distributed readout controllers and event builders in FASTBUS and VME

    International Nuclear Information System (INIS)

    Pordes, R.; Berg, D.; Berman, E.; Bernett, M.; Brown, D.; Constanta-Fanourakis, P.; Dorries, T.; Haire, M.; Joshi, U.; Kaczar, K.; Mackinnon, B.; Moore, C.; Nicinski, T.; Oleynik, G.; Petravick, D.; Sergey, G.; Slimmer, D.; Streets, J.; Votava, M.; White, V.

    1989-12-01

    We report on software developed as part of the PAN-DA system to support the functions of front end readout controllers and event builders in multiprocessor, multilevel, distributed data acquisition systems. For the next generation data acquisition system we have undertaken to design and implement software tools that are easily transportable to new modules. The first implementation of this software is for Motorola 68K series processor boards in FASTBUS and VME and will be used in the Fermilab accelerator run at the beginning of 1990. We use a Real Time Kernel Operating System. The software provides general connectivity tools for control, diagnosis and monitoring. 17 refs., 7 figs

  15. Overview of the Scalable Coherent Interface, IEEE STD 1596 (SCI)

    International Nuclear Information System (INIS)

    Gustavson, D.B.; James, D.V.; Wiggers, H.A.

    1992-10-01

    The Scalable Coherent Interface standard defines a new generation of interconnection that spans the full range from supercomputer memory 'bus' to campus-wide network. SCI provides bus-like services and a shared-memory software model while using an underlying, packet protocol on many independent communication links. Initially these links are 1 GByte/s (wires) and 1 GBit/s (fiber), but the protocol scales well to future faster or lower-cost technologies. The interconnect may use switches, meshes, and rings. The SCI distributed-shared-memory model is simple and versatile, enabling for the first time a smooth integration of highly parallel multiprocessors, workstations, personal computers, I/O, networking and data acquisition

  16. Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

    Science.gov (United States)

    Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

    2003-01-01

    Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.

  17. Data acquisition and control system for the IPNS time-of-flight neutron scattering instruments

    International Nuclear Information System (INIS)

    Daly, R.T.; Haumann, J.R.; Kraimer, M.R.; Lenkszus, F.R.; Lidinsky, W.P.; Morgan, C.B.; Rutledge, L.L.; Rynes, P.E.; Tippie, J.W.

    1979-01-01

    The Argonne Intense Pulsed Neutron System (IPNS-I) presently under construction at Argonne National Laboratory will include a number of neutron scattering instruments. This study investigates the data acquisition requirements of these instruments and proposes three alternative multiprocessor systems which will satisfy these requirements. All proposals are star configurations with a super-mini as the central node or HOST. The first proposal is based on front-ends composed of two or more 16-bit microcomputers, the second proposal is based on front ends consisting of a combination of a mini and microcomputers, and the third is based on a minicomputer with an intelligent CAMAC controller

  18. Bus-based architectures in the H1 data acquisition system

    International Nuclear Information System (INIS)

    Haynes, W.J.

    1992-08-01

    The recently commissioned Hadron-Electron Ring Anage (HERA) electron-proton collider at DESY, Deutsches Elektronen Synchrotron has presented a challenging environment in coping with huge volumes of data in real-time. With bunch-crossing intervals of 96 nanoseconds, the H1 experiment can generate nearly 3 Megabytes of raw digital information from several hundred thousand electronic channels. The open VMEbus standard is exploited to provide the necessary modularity and system flexibility for embedding fast readout electronics in a multiprocessor design array. The architectural concepts of the data acquisition structure are discussed together with many of the experiences gained in implementing a large multi-crate system. (author)

  19. Runtime Performance Monitoring Tool for RTEMS System Software

    Science.gov (United States)

    Cho, B.; Kim, S.; Park, H.; Kim, H.; Choi, J.; Chae, D.; Lee, J.

    2007-08-01

    RTEMS is a commercial-grade real-time operating system that supports multi-processor computers. However, there are not many development tools for RTEMS. In this paper, we report new RTEMS-based runtime performance monitoring tool. We have implemented a light weight runtime monitoring task with an extension to the RTEMS APIs. Using our tool, software developers can verify various performance- related parameters during runtime. Our tool can be used during software development phase and in-orbit operation as well. Our implemented target agent is light weight and has small overhead using SpaceWire interface. Efforts to reduce overhead and to add other monitoring parameters are currently under research.

  20. The processor farm for online triggering and full event reconstruction of the HERA-B experiment at HERA

    International Nuclear Information System (INIS)

    Gellrich, A.; Dippel, R.; Gensch, U.; Kowallik, R.; Legrand, I.C.; Leich, H.; Sun, F.; Wegner, P.

    1996-01-01

    The main goal of the HERA-B experiment which start taking data in 1988 is to study CP violation in B decays. This article describes the concept and the planned implementation of a multi-processor system, called processor farm,as the last part of the data acquisition and trigger system of the HERA B experiment. The third level trigger task and a full online event reconstruction will be performed on this processor farm, consisting of more then 100 powerful RISC processors which are based on commercial hardware boards. The controlling will be done by a real-time operating system which provides a software development environment, including FORTRAN and C compilers. (author)

  1. Control and data acquisition system for Beam Line C and HRS

    International Nuclear Information System (INIS)

    Naivar, F.J.; Rogers, W.L.; Simmonds, D.D.; Spencer, J.E.; Spencer, N.C.; Stark, C.F.

    1977-03-01

    The High Resolution Spectrometer (HRS) at LAMPF is a multimillion dollar facility intended for the study of proton induced reactions and scattering with 100 to 800 MeV protons. It will be used by research scientists from universities and other national laboratories throughout the United States. The design resolution of +-1.5:10 -5 is better than any similar system ever attempted in this energy range. To obtain this quality as well as fully realize the research potential of the facility, a rather sophisticated computer control and data acquisition system is required. The facility, based on a general multiuser, multitask, multiprocessor system, is described

  2. On-line data analysis and monitoring for H1 drift chambers

    International Nuclear Information System (INIS)

    Duellmann, D.

    1992-01-01

    The on-line monitoring, slow control and calibration of the H1 central jet chamber uses a VME multiprocessor system to perform the analysis and a connected Macintosh computer as graphical interface to the operator on shift. Tasks of this system are: Analysis of event data including on-line track search; on-line calibration from normal events and testpulse events; control of the high voltage and monitoring of settings and currents; monitoring of temperature, pressure and mixture of the chambergas. A program package is described which controls the dataflow between data aquisition, different VME CPUs and Macintosh. It allows to run off-line style programs for the different tasks. (orig.)

  3. On-line data analysis and monitoring for H1 drift chambers

    Science.gov (United States)

    Düllmann, Dirk

    1992-05-01

    The on-line monitoring, slow control and calibration of the H1 central jet chamber uses a VME multiprocessor system to perform the analysis and a connected Macintosh computer as graphical interface to the operator on shift. Task of this system are: - analysis of event data including on-line track search, - on-line calibration from normal events and testpulse events, - control of the high voltage and monitoring of settings and currents, - monitoring of temperature, pressure and mixture of the chambergas. A program package is described which controls the dataflow between data aquisition, differnt VME CPUs and Macintosh. It allows to run off-line style programs for the different tasks.

  4. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.

    1991-01-01

    In order to efficiently use new features of supercomputers, production codes, usually written 10 - 20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedups in the execution times were obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize parallel/vector architecture of a multiprocessor IBM 3090. (author)

  5. A real time tracking vision system and its application to robotics

    International Nuclear Information System (INIS)

    Inoue, Hirochika

    1994-01-01

    Among various sensing channels the vision is most important for making robot intelligent. If provided with a high speed visual tracking capability, the robot-environment interaction becomes dynamic instead of static, and thus the potential repertoire of robot behavior becomes very rich. For this purpose we developed a real-time tracking vision system. The fundamental operation on which our system based is the calculation of correlation between local images. Use of special chip for correlation and the multi-processor configuration enable the robot to track more than hundreds cues in full video rate. In addition to the fundamental visual performance, applications for robot behavior control are also introduced. (author)

  6. Development of a coupling scheme between MCNP5 and subchanflow for the PIN- and fuel Assembly-Wise simulation of LWR and innovative reactors

    International Nuclear Information System (INIS)

    Ivanov, A.; Sanchez, V.; Imke, U.

    2011-01-01

    In order to increase the accuracy and the degree of spatial resolution of core design studies, coupled 3D neutronic (deterministic and Monte Carlo) and 3D thermal hydraulics (CFD and subchannel) codes are being developed worldwide. At KIT both deterministic and Monte Carlo codes were coupled with subchannel codes and applied to predict the safety-related design parameters such as pin power, maximal cladding and fuel temperature, DNB. These coupling approaches were revised and improved based on the experience gained. One particular example is replacing COBRA-TF with SUBCHANFLOW, in-house development subchannel code, in the COBRA-TF/MCNP coupling, accompanied with new way of radial mapping between the neutronic and thermal hydraulic domains. The new coupled system MCNP5/SUBCHANFLOW makes it possible to investigate variety of fuel assembly types (BWR, PWR or SCFR). Key issues in such a coupled system are the way in which thermal-hydraulic/neutronic feedbacks, accuracy of the Monte Carlo solutions and observation of convergence during the iterative solution are handled. Another key issue that might be considered is the optimal application of parallel computing. Using multi-processor computer architectures, it is possible to reduce the Monte- Carlo running time and obtain converged results within reasonable time limit. In particular it is shown that by exploiting the capabilities of multi-processor calculation, it is possible to investigate large fuel assemblies in a pin-by-pin manner with a resolution at pin and subchannel level. One of the most important issues addressed in the current work is the temperature effects on nuclear data. For the particular studies pseudo material approach was used, which produces interpolated results for Doppler broadened cross sections from NJOY pre-generated nuclear data. (author)

  7. DIRAC universal pilots

    Science.gov (United States)

    Stagni, F.; McNab, A.; Luzzi, C.; Krzemien, W.; Consortium, DIRAC

    2017-10-01

    In the last few years, new types of computing models, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are in the form of opportunistic ones. Most but not all of these new infrastructures are based on virtualization techniques. In addition, some of them, present opportunities for multi-processor computing slots to the users. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to provide the transparent, uniform interface has become essential. The transparent access to the underlying resources is realized by implementing the pilot model. DIRAC’s newest generation of generic pilots (the so-called Pilots 2.0) are the “pilots for all the skies”, and have been successfully released in production more than a year ago. They use a plugin mechanism that makes them easily adaptable. Pilots 2.0 have been used for fetching and running jobs on every type of resource, being it a Worker Node (WN) behind a CREAM/ARC/HTCondor/DIRAC Computing element, a Virtual Machine running on IaaC infrastructures like Vac or BOINC, on IaaS cloud resources managed by Vcycle, the LHCb High Level Trigger farm nodes, and any type of opportunistic computing resource. Make a machine a “Pilot Machine”, and all diversities between them will disappear. This contribution describes how pilots are made suitable for different resources, and the recent steps taken towards a fully unified framework, including monitoring. Also, the cases of multi-processor computing slots either on real or virtual machines, with the whole node or a partition of it, is discussed.

  8. Architecture of high reliable control systems using complex software

    International Nuclear Information System (INIS)

    Tallec, M.

    1990-01-01

    The problems involved by the use of complex softwares in control systems that must insure a very high level of safety are examined. The first part makes a brief description of the prototype of PROSPER system. PROSPER means protection system for nuclear reactor with high performances. It has been installed on a French nuclear power plant at the beginnning of 1987 and has been continually working since that time. This prototype is realized on a multi-processors system. The processors communicate between themselves using interruptions and protected shared memories. On each processor, one or more protection algorithms are implemented. Those algorithms use data coming directly from the plant and, eventually, data computed by the other protection algorithms. Each processor makes its own acquisitions from the process and sends warning messages if some operating anomaly is detected. All algorithms are activated concurrently on an asynchronous way. The results are presented and the safety related problems are detailed. - The second part is about measurements' validation. First, we describe how the sensors' measurements will be used in a protection system. Then, a proposal for a method based on the techniques of artificial intelligence (expert systems and neural networks) is presented. - The last part is about the problems of architectures of systems including hardware and software: the different types of redundancies used till now and a proposition of a multi-processors architecture which uses an operating system that is able to manage several tasks implemented on different processors, which verifies the good operating of each of those tasks and of the related processors and which allows to carry on the operation of the system, even in a degraded manner when a failure has been detected are detailed [fr

  9. Contributing to the design of run-time systems dedicated to high performance computing

    International Nuclear Information System (INIS)

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  10. Fermilab advanced computer program multi-microprocessor project

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Biel, J.

    1985-06-01

    Fermilab's Advanced Computer Program is constructing a powerful 128 node multi-microprocessor system for data analysis in high-energy physics. The system will use commercial 32-bit microprocessors programmed in Fortran-77. Extensive software supports easy migration of user applications from a uniprocessor environment to the multiprocessor and provides sophisticated program development, debugging, and error handling and recovery tools. This system is designed to be readily copied, providing computing cost effectiveness of below $2200 per VAX 11/780 equivalent. The low cost, commercial availability, compatibility with off-line analysis programs, and high data bandwidths (up to 160 MByte/sec) make the system an ideal choice for applications to on-line triggers as well as an offline data processor

  11. Optical backplane interconnect switch for data processors and computers

    Science.gov (United States)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  12. COMPUTING: International symposium

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Recent Developments in Computing, Processor, and Software Research for High Energy Physics, a four-day international symposium, was held in Guanajuato, Mexico, from 8-11 May, with 112 attendees from nine countries. The symposium was the third in a series of meetings exploring activities in leading-edge computing technology in both processor and software research and their effects on high energy physics. Topics covered included fixed-target on- and off-line reconstruction processors; lattice gauge and general theoretical processors and computing; multiprocessor projects; electron-positron collider on- and offline reconstruction processors; state-of-the-art in university computer science and industry; software research; accelerator processors; and proton-antiproton collider on and off-line reconstruction processors

  13. A supercomputer for parallel data analysis

    International Nuclear Information System (INIS)

    Kolpakov, I.F.; Senner, A.E.; Smirnov, V.A.

    1987-01-01

    The project of a powerful multiprocessor system is proposed. The main purpose of the project is to develop a low cost computer system with a processing rate of a few tens of millions of operations per second. The system solves many problems of data analysis from high-energy physics spectrometers. It includes about 70 MOTOROLA-68020 based powerful slave microprocessor boards liaisoned through the VME crates to a host VAX micro computer. Each single microprocessor board performs the same algorithm requiring large computing time. The host computer distributes data over the microprocessor board, collects and combines obtained results. The architecture of the system easily allows one to use it in the real time mode

  14. Median and Morphological Specialized Processors for a Real-Time Image Data Processing

    Directory of Open Access Journals (Sweden)

    Kazimierz Wiatr

    2002-01-01

    Full Text Available This paper presents the considerations on selecting a multiprocessor MISD architecture for fast implementation of the vision image processing. Using the author′s earlier experience with real-time systems, implementing of specialized hardware processors based on the programmable FPGA systems has been proposed in the pipeline architecture. In particular, the following processors are presented: median filter and morphological processor. The structure of a universal reconfigurable processor developed has been proposed as well. Experimental results are presented as delays on LCA level implementation for median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times compare with delays in general purpose processor and DSP processor.

  15. Data management system advanced development

    Science.gov (United States)

    Douglas, Katherine; Humphries, Terry

    1990-01-01

    The Data Management System (DMS) Advanced Development task provides for the development of concepts, new tools, DMS services, and for the testing of the Space Station DMS hardware and software. It also provides for the development of techniques capable of determining the effects of system changes/enhancements, additions of new technology, and/or hardware and software growth on system performance. This paper will address the built-in characteristics which will support network monitoring requirements in the design of the evolving DMS network implementation, functional and performance requirements for a real-time, multiprogramming, multiprocessor operating system, and the possible use of advanced development techniques such as expert systems and artificial intelligence tools in the DMS design.

  16. EPICS IOC development on open source RTEMS RTOS

    International Nuclear Information System (INIS)

    Bharade, S.K.; Joshi, Gopal; Das, D.

    2015-01-01

    Modern control systems applications are often built on top of a real time operating system. At LEHIPA beamlines, open source control systems offer a modem solution for cost effectiveness and technical competence. The 'Experimental Physics and Industrial Control System' (EPICS) and the 'Real-Time Operating System for Multiprocessor Systems' (RTEMS) were chosen to develop the core control system. Presently, the EPICS/RTEMS/MVME5500 control system is implemented for RF Protection Interlock system and BPM. This paper shares an experience of building RTEMS for MVME5500, configuring and run EPICS for RTEMS-MVME5500 architecture. It further shows EPICS bench marking results using EPICS catime and hamesstest utilities for this architecture. (author)

  17. Parallel processing of dose calculation for external photon beam therapy

    International Nuclear Information System (INIS)

    Kunieda, Etsuo; Ando, Yutaka; Tsukamoto, Nobuhiro; Ito, Hisao; Kubo, Atsushi

    1994-01-01

    We implemented external photon beam dose calculation programs into a parallel processor system consisting of Transputers, 32-bit processors especially suitable for multi-processor configuration. Two network conformations, binary-tree and pipeline, were evaluated for rectangular and irregular field dose calculation algorithms. Although computation speed increased in proportion to the number of CPU, substantial overhead caused by inter-processor communication occurred when a smaller computation load was delivered to each processor. On the other hand, for irregular field calculation, which requires more computation capability for each calculation point, the communication overhead was still less even when more than 50 processors were involved. Real-time responses could be expected for more complex algorithms by increasing the number of processors. (author)

  18. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  19. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  20. An ultrasonic sensor controller for mapping and servo control in robotic systems

    International Nuclear Information System (INIS)

    Drotning, W.D.; Garcia, P. Jr.

    1993-03-01

    An ultrasonic sensor controller has been developed and applied in a variety of robotic systems for operation in hazardous environments. The controller consists of hardware and software that control multiple ultrasonic range sensors and provide workspace information to robot controllers for rapid, safe, and reliable operation in hazardous and remote environments. The hardware consists of a programmable multichannel controller that resides on a VMEbus for high speed communication to a multiprocessor architecture. The sensor controller has been used in a number of applications, which include providing high precision range information for proximity servo control of robots, and performing surface and obstacle mapping functions for safe path planning of robots in unstructured environments