WorldWideScience

Sample records for distributed-memory multiprocessor computers

  1. Scientific Programming Languages for Distributed Memory Multiprocessors: Paradigms and Research Issues

    National Research Council Canada - National Science Library

    Rosing, Matthew; Schnabel, Robert B; Weaver, Robert P

    1991-01-01

    This paper attempts to identify some of the central concepts, issues, and challenges that are emerging in the development of imperative, data parallel programming languages for distributed memory multiprocessors...

  2. Distributed-memory matrix computations

    DEFF Research Database (Denmark)

    Balle, Susanne Mølleskov

    1995-01-01

    ...in these algorithms is that many scientific applications rely heavily on the performance of the dense linear algebra building blocks involved. Even though we consider the distributed-memory as well as the shared-memory programming paradigm, the major part of the thesis is dedicated to distributed-memory architectures. ... We emphasize distributed-memory massively parallel computers - such as the Connection Machine models CM-200 and CM-5/CM-5E - available to us at UNI-C and at Thinking Machines Corporation. The CM-200 was, at the time this project started, one of the few existing massively parallel computers. ... An algorithm built on top of several scan operations is also investigated: what difficulties occur when implementing this algorithm on massively parallel computers? ...

  3. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event-parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point-to-point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  4. Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

    Science.gov (United States)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Several techniques for performing static and dynamic load balancing in vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them to a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.

  5. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs.
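
    As a rough illustration of the Cartesian processor grid and particle-mastering idea described above, a minimal MPI sketch follows. It is not taken from the paper: the domain bounds, the even split, and the ownership rule are assumptions made for the example.

```cpp
// Hypothetical sketch: build a 2-D Cartesian processor grid and decide which
// rank "masters" a particle from its position (assumed unit square domain).
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Let MPI choose a 2-D processor grid (dims[0] * dims[1] == nprocs).
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    // Assumed global domain [0,1] x [0,1], split evenly among the ranks.
    const double dx = 1.0 / dims[0], dy = 1.0 / dims[1];

    // A particle is mastered by the rank whose sub-domain contains its centre.
    double px = 0.37, py = 0.81;                      // example particle position
    int owner_coords[2] = {(int)(px / dx), (int)(py / dy)};
    int owner;
    MPI_Cart_rank(cart, owner_coords, &owner);

    if (rank == 0)
        std::printf("particle (%.2f, %.2f) is mastered by rank %d\n", px, py, owner);

    MPI_Finalize();
    return 0;
}
```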

  6. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    International Nuclear Information System (INIS)

    Uhlmann, M.

    2004-01-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs.

  7. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
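
    The subdomain extension used to share data between adjacent processors is commonly implemented as a ghost-cell (halo) exchange. The sketch below is a hedged illustration, not UTCHEM code: it shows only a one-dimensional slab decomposition with a single ghost layer on each side, exchanged with MPI_Sendrecv.

```cpp
// Hypothetical ghost-cell exchange for a 1-D slab decomposition: each rank
// holds interior cells u[1..n-2] plus ghost cells u[0] and u[n-1].
#include <mpi.h>
#include <vector>

void exchange_ghosts(std::vector<double>& u, MPI_Comm comm) {
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);
    int left  = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;
    int n = (int)u.size();

    // Send rightmost interior cell to the right neighbour, receive left ghost.
    MPI_Sendrecv(&u[n - 2], 1, MPI_DOUBLE, right, 0,
                 &u[0],     1, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    // Send leftmost interior cell to the left neighbour, receive right ghost.
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                 &u[n - 1], 1, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    std::vector<double> u(10, 0.0);    // 8 interior cells + 2 ghost cells
    exchange_ghosts(u, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```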

  8. The ACP (Advanced Computer Program) multiprocessor system at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Case, G.; Cook, A.; Fischler, M.; Gaines, I.; Hance, R.; Husby, D.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating-point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing "nodes" sit are connected via a high-speed "Branch Bus" to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  9. The ACP [Advanced Computer Program] multiprocessor system at Fermilab

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating-point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing "nodes" sit are connected via a high-speed "Branch Bus" to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  10. Scalable Multiprocessor for High-Speed Computing in Space

    Science.gov (United States)

    Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard

    2004-01-01

    A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard real-time applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at hundreds of pulses per second, each pulse requiring millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with the analog instrumentation and controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
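
    A minimal sketch of the described clock-offset scheme, assuming MPI-style message passing and ignoring link latency (neither of which is specified in the report), might look like the following.

```cpp
// Hypothetical sketch: the master broadcasts its time and each node records
// the difference from its own local clock (propagation delay ignored).
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double master_time = 0.0;
    if (rank == 0) master_time = MPI_Wtime();        // master samples its clock
    MPI_Bcast(&master_time, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    double offset = MPI_Wtime() - master_time;       // local clock minus master
    std::printf("rank %d: offset to master clock = %.6f s\n", rank, offset);

    MPI_Finalize();
    return 0;
}
```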

  11. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple-instruction, multiple-data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  12. Parallel Reduction of Large Radar Interferometry Scenes on a Mid-scale, Symmetric Multiprocessor Mainframe Computer

    Science.gov (United States)

    Harcke, L. J.; Zebker, H. A.

    2006-12-01

    We report on experiences in processing repeat-orbit interferometry data sets on a mid-scale multiprocessor mainframe computer. Newer applications of interferometric and polarimetric data processing, such as permanent scatterer deformation monitoring, require the generation of many tens of repeat-pass interferometry data pairs, perhaps 30 to 50, to provide sufficient input to the deformation model. Moving existing radar processing techniques toward massively parallel computation provides a path to coping with such large data sets, which can consist of 30 to 50 gigabytes (GB) of raw data. In June 2006, the Stanford School of Earth Sciences dedicated a new computation center for general research use. Two large machines compose the center: a single-node, symmetric multiprocessor (SMP) machine with 48 processor cores and a single 192 GB memory, and a 64-node distributed cluster containing 128 processor cores with at least 2 GB of memory per node. Distributed processing of the matched filter for synthetic aperture radar image formation requires a high communication-to-computation ratio. Experiments performed over a decade ago on distributed memory supercomputers, and repeated a half-decade ago on commodity workstation clusters, both demonstrated saturation of inter-node communication links. For this reason, we chose to parallelize the interferometric processor on the shared memory computer using the OpenMP programming standard. We find, not unexpectedly, that the input/output stage of processing standard 100-by-100-kilometer ERS-1 scenes quickly dominates the total computation time, and that only modest increases in processing speed are achieved after 8 to 16 processor cores are brought to bear on a single data set. The input and output data sit in single, serially accessed disk files, creating a bottleneck for overall throughput. This points to a scheme for efficient partitioning of mid-size (24- to 48-core) machines for reducing large Earth science data sets, where 3 to
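
    The shared-memory parallelization pattern described above (serial file input, parallel per-line processing, serial output) can be sketched with a generic OpenMP loop. This is an illustrative stand-in, not the authors' interferometric processor; the per-sample kernel is a placeholder.

```cpp
// Hypothetical OpenMP sketch: the loop over range lines is parallelized,
// while reads and writes remain serial (the I/O bottleneck noted above).
#include <omp.h>
#include <vector>
#include <complex>

void process_scene(std::vector<std::vector<std::complex<float>>>& lines) {
    // (serial) read of the raw data would happen here

    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < (long)lines.size(); ++i) {
        for (auto& s : lines[i])
            s *= std::complex<float>(0.0f, 1.0f);   // placeholder per-sample work
    }

    // (serial) write of the focused image would happen here
}

int main() {
    std::vector<std::vector<std::complex<float>>> lines(
        64, std::vector<std::complex<float>>(1024));
    process_scene(lines);
    return 0;
}
```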

  13. Program partitioning for NUMA multiprocessor computer systems. [Nonuniform memory access

    Energy Technology Data Exchange (ETDEWEB)

    Wolski, R.M.; Feo, J.T. (Lawrence Livermore National Lab., CA (United States))

    1993-11-01

    Program partitioning and scheduling are essential steps in programming non-shared-memory computer systems. Partitioning is the separation of program operations into sequential tasks, and scheduling is the assignment of tasks to processors. To be effective, automatic methods require an accurate representation of the model of computation and the target architecture. Current partitioning methods assume today's most prevalent models -- macro dataflow and a homogeneous/two-level multicomputer system. Based on communication channels, neither model represents well the emerging class of NUMA multiprocessor computer systems consisting of hierarchical read/write memories. Consequently, the partitions generated by extant methods do not execute well on these systems. In this paper, the authors extend the conventional graph representation of the macro-dataflow model to enable mapping heuristics to consider the complex communication options supported by NUMA architectures. They describe two such heuristics. Simulated execution times of program graphs show that the model and heuristics generate higher quality program mappings than current methods for NUMA architectures.

  14. A scalable parallel black oil simulator on distributed memory parallel computers

    Science.gov (United States)

    Wang, Kun; Liu, Hui; Chen, Zhangxin

    2015-11-01

    This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.

  15. High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn

    2014-11-14

    Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, a comparison of our method against two popular serial libraries, and application to numerous science datasets.

  16. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej

    2015-02-01

    This paper derives theoretical estimates of the computational cost for an isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one-dimensional (1D) case, O(N p^2) for the two-dimensional (2D) case, and O(N^(4/3) p^2) for the three-dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PetIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somewhat limited.

  17. Matrix factorization on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Geist, G.A.; Heath, M.T.

    1985-08-01

    This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs
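
    For illustration, one common scheme for distributed-memory, message-passing Cholesky factorization is a fan-out algorithm over a column-cyclic distribution, sketched below. This is a generic sketch, not the authors' hypercube implementation: it uses a plain broadcast instead of a hypercube-specific communication pattern, and the local storage layout is an assumption.

```cpp
// Hypothetical fan-out Cholesky, column-cyclic distribution: rank r owns the
// global columns j with j % nprocs == r; 'local' stores those columns
// column-major, n entries each (only the lower triangle is meaningful).
#include <mpi.h>
#include <algorithm>
#include <cmath>
#include <vector>

void cholesky_fanout(std::vector<double>& local, int n, MPI_Comm comm) {
    int rank, nprocs;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);
    std::vector<double> colk(n);

    for (int k = 0; k < n; ++k) {
        int owner = k % nprocs;
        if (rank == owner) {
            // Factor column k locally, then share it with everyone.
            double* a = &local[(k / nprocs) * (size_t)n];
            a[k] = std::sqrt(a[k]);
            for (int i = k + 1; i < n; ++i) a[i] /= a[k];
            std::copy(a, a + n, colk.begin());
        }
        MPI_Bcast(colk.data(), n, MPI_DOUBLE, owner, comm);

        // Rank-1 update of every local column to the right of column k.
        int nlocal = (int)(local.size() / n);
        for (int lj = 0; lj < nlocal; ++lj) {
            int j = lj * nprocs + rank;        // global column index
            if (j <= k) continue;
            double* a = &local[lj * (size_t)n];
            for (int i = j; i < n; ++i) a[i] -= colk[i] * colk[j];
        }
    }
}
```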

  18. Construction and Application of an AMR Algorithm for Distributed Memory Computers

    OpenAIRE

    Deiterding, Ralf

    2003-01-01

    While the parallelization of blockstructured adaptive mesh refinement techniques is relatively straight-forward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed memory machines are a topic of on-going research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the communication costs and simplifies the im...

  19. Features of development and analysis of the simulation model of a multiprocessor computer system

    Directory of Open Access Journals (Sweden)

    O. M. Brekhov

    2017-01-01

    Over the past decade, multiprocessor systems have become widespread in computer technology. At present, not only supercomputers but also the vast majority of mobile devices are equipped with multi-core processors. This creates the need for students to learn the basic principles of their construction and functioning. One possible method for analyzing the operation of multiprocessor systems is simulation modeling; its use contributes to a better understanding of the effect of workload and structure parameters on performance. The article considers the features of developing a simulation model for estimating the timing characteristics of a multiprocessor computer system, as well as the use of the regenerative method of model analysis. A software implementation of the inverse kinematics solution for a robot is adopted as the workload: the task is to determine the joint rotations of the manipulator from the known angular and linear position of its gripper. An analytical algorithm, the method of simple kinematic relations, was chosen to solve the problem. The program is characterized by parallel calculations, during which resource conflicts arise between the processor cores as they simultaneously access memory via a common bus. Because of the high data coupling between the parallel program parts, all processing cores are assumed to use shared memory. The simulation model takes into account probabilistic memory accesses and tracks the queues that form for shared resources. The collected statistics reveal the productive and overhead time costs of the program for each processor core involved. The simulation results show the unevenness of core utilization, idle time in queues for shared resources, and time lost waiting for other cores due to information dependencies. The results of the simulation are estimated by the

  20. Highway traffic simulation on multi-processor computers

    Energy Technology Data Exchange (ETDEWEB)

    Hanebutte, U.R.; Doss, E.; Tentner, A.M.

    1997-04-01

    A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straightforward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model by allowing for real-time interactive simulation control and interaction with each individual vehicle and road side infrastructure element on each link. Average speed and traffic volume data are collected at user-specified loop detector locations. Further, as a measure of safety the so-called Time To Collision (TTC) parameter is being recorded.

  1. Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

    Science.gov (United States)

    Smith, T. B., III; Lala, J. H.

    1984-01-01

    The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another, and hardware fault-detection and fault-masking are provided which are transparent to the software. Operating system design and user software design are thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of the fault-handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine, and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.

  2. Multiprocessors and runtime compilation

    Science.gov (United States)

    Saltz, Joel; Berryman, Harry; Wu, Janet

    1990-01-01

    Runtime preprocessing plays a major role in many efficient algorithms in computer science, as well as playing an important role in exploiting multiprocessor architectures. Examples are given that elucidate the importance of runtime preprocessing and show how these optimizations can be integrated into compilers. To support the arguments, transformations implemented in prototype multiprocessor compilers are described and benchmarks from the iPSC2/860, the CM-2, and the Encore Multimax/320 are presented.

  3. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1992-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on an NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented.

  4. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
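
    The core of the redistribution step is an all-to-all exchange that turns a row decomposition into a column decomposition between the two rounds of one-dimensional FFTs. The sketch below is a generic illustration using a plain MPI_Alltoall; it does not reproduce the patent's randomized ordering, and the 1-D FFTs themselves (e.g., via an external library) are omitted. The square N x N array and the divisibility of N by the rank count are assumptions.

```cpp
// Hypothetical row-to-column redistribution for a distributed 2-D FFT.
// Each of P ranks holds N/P rows of an N x N complex array; afterwards each
// rank holds N/P columns (stored as rows), ready for the second FFT round.
#include <mpi.h>
#include <complex>
#include <vector>

void transpose_rows_to_cols(const std::vector<std::complex<double>>& rows, // (N/P) x N, row-major
                            std::vector<std::complex<double>>& cols,       // (N/P) x N on return
                            int N, MPI_Comm comm) {
    int rank, P;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &P);
    const int B = N / P;                   // rows (and columns) owned per rank

    // Pack: the block for rank q holds my B rows restricted to q's B columns.
    std::vector<std::complex<double>> sendbuf((size_t)B * N), recvbuf((size_t)B * N);
    for (int q = 0; q < P; ++q)
        for (int i = 0; i < B; ++i)
            for (int c = 0; c < B; ++c)
                sendbuf[((size_t)q * B + i) * B + c] = rows[(size_t)i * N + q * B + c];

    // std::complex<double> is layout-compatible with two doubles.
    MPI_Alltoall(sendbuf.data(), 2 * B * B, MPI_DOUBLE,
                 recvbuf.data(), 2 * B * B, MPI_DOUBLE, comm);

    // Unpack: the block from rank q holds q's B rows restricted to my B columns.
    cols.assign((size_t)B * N, std::complex<double>(0.0, 0.0));
    for (int q = 0; q < P; ++q)
        for (int i = 0; i < B; ++i)
            for (int c = 0; c < B; ++c)
                cols[(size_t)c * N + q * B + i] = recvbuf[((size_t)q * B + i) * B + c];
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int P;
    MPI_Comm_size(MPI_COMM_WORLD, &P);
    const int N = 4 * P;                   // assumed divisible by P
    std::vector<std::complex<double>> rows((size_t)(N / P) * N), cols;
    // ... first round of 1-D FFTs on the local rows would go here ...
    transpose_rows_to_cols(rows, cols, N, MPI_COMM_WORLD);
    // ... second round of 1-D FFTs on 'cols' would go here ...
    MPI_Finalize();
    return 0;
}
```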

  5. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    Science.gov (United States)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth, and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying the execution of TMR (Triple Modular Redundant) and SIMPLEX (non-redundant) jobs.

  6. Multiprocessor programming environment

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.B.; Fornaro, R.

    1988-12-01

    Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.

  7. Embedded multiprocessors scheduling and synchronization

    CERN Document Server

    Sriram, Sundararajan

    2009-01-01

    Techniques for Optimizing Multiprocessor Implementations of Signal Processing Applications. An indispensable component of the information age, signal processing is embedded in a variety of consumer devices, including cell phones and digital television, as well as in communication infrastructure, such as media servers and cellular base stations. Multiple programmable processors, along with custom hardware running in parallel, are needed to achieve the computation throughput required of such applications. Reviews important research in key areas related to the multiprocessor implementation of multi

  8. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.

  9. Distributed-Memory Fast Maximal Independent Set

    Energy Technology Data Exchange (ETDEWEB)

    Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

    2017-09-13

    The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
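
    For readers unfamiliar with Luby-style MIS algorithms, the round structure can be sketched as follows. This is a generic, single-node sketch, not the authors' distributed implementation; in the distributed-memory versions described above, the random values of boundary vertices would be exchanged between ranks each round.

```cpp
// Hypothetical sketch of a Luby-style randomized MIS: in each round every
// remaining vertex draws a random value and joins the MIS if its value is a
// strict local minimum; winners and their neighbours are then removed.
#include <vector>
#include <random>

std::vector<bool> luby_mis(const std::vector<std::vector<int>>& adj, unsigned seed = 42) {
    const int n = (int)adj.size();
    std::vector<bool> in_mis(n, false), removed(n, false);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    bool active = true;
    while (active) {
        active = false;
        std::vector<double> val(n, 0.0);
        for (int v = 0; v < n; ++v)
            if (!removed[v]) { val[v] = dist(gen); active = true; }

        // A vertex wins the round if its value beats all remaining neighbours.
        std::vector<int> winners;
        for (int v = 0; v < n; ++v) {
            if (removed[v]) continue;
            bool wins = true;
            for (int u : adj[v])
                if (u != v && !removed[u] && val[u] <= val[v]) { wins = false; break; }
            if (wins) winners.push_back(v);
        }
        // Winners join the MIS; they and their neighbours leave the graph.
        for (int v : winners) {
            in_mis[v] = true;
            removed[v] = true;
            for (int u : adj[v]) removed[u] = true;
        }
    }
    return in_mis;
}
```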

  10. A view of Kanerva's sparse distributed memory

    Science.gov (United States)

    Denning, P. J.

    1986-01-01

    Pentti Kanerva is working on a new class of computers, which are called pattern computers. Pattern computers may close the gap between capabilities of biological organisms to recognize and act on patterns (visual, auditory, tactile, or olfactory) and capabilities of modern computers. Combinations of numeric, symbolic, and pattern computers may one day be capable of sustaining robots. The overview of the requirements for a pattern computer, a summary of Kanerva's Sparse Distributed Memory (SDM), and examples of tasks this computer can be expected to perform well are given.

  11. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Seung Ho; Hwang, Suk Yeoung; Sohn, Surg Won; Kim, Byung Soo; Kim, Chang Hoi; Lee, Yong Bum; Kim, Woong Ki

    1988-12-01

    The object of this project is to develop a multiprocessor system which is essential to robot technology. A multiprocessor system interconnecting many single-board computers is much faster and more flexible than a single processor. The developed multiprocessor will be used to control a nuclear mobile robot, so a loosely coupled system is adopted as the robot controller. The overall controller configuration is divided into three main parts according to function: a supervisory control part, a functional control part, and a remote control part. The control system is designed to be easily expanded for further use through a modular architecture, so that functional independence of the sub-systems is maintained throughout the system structure. Electromagnetic interference affecting the control system is minimized by using optical fiber as the communication medium between the robot and the control system. System performance is enhanced not only by using a distributed architecture in hardware, but also by adopting a real-time, multi-tasking operating system in software. The iRMX86 OS is used and reconfigured for real-time, multi-tasking operation. The RS-485 serial communication protocol is used between the functional control part and the remote control part. Since the developed multiprocessor control system is an essential and fundamental technology for intelligent robots, the result of this project can be applied directly to nuclear mobile robots. (Author)

  12. Green computing: power optimisation of VFI-based real-time multiprocessor dataflow applications (extended version)

    NARCIS (Netherlands)

    Ahmad, W.; Holzenspies, P.K.F.; Stoelinga, Mariëlle Ida Antoinette; van de Pol, Jan Cornelis

    2015-01-01

    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor

  13. Green computing: efficient energy management of multiprocessor streaming applications via model checking

    NARCIS (Netherlands)

    Ahmad, W.

    2017-01-01

    Streaming applications such as virtual reality, video conferencing, and face detection, impose high demands on a system’s performance and battery life. With the advancement in mobile computing, these applications are increasingly implemented on battery-constrained platforms, such as gaming consoles,

  14. Green computing: power optimisation of vfi-based real-time multiprocessor dataflow applications

    NARCIS (Netherlands)

    Ahmad, W.; Holzenspies, P.K.F.; Stoelinga, Mariëlle Ida Antoinette; van de Pol, Jan Cornelis

    2015-01-01

    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor

  15. Icarus: A 2-D Direct Simulation Monte Carlo (DSMC) Code for Multi-Processor Computers

    International Nuclear Information System (INIS)

    BARTEL, TIMOTHY J.; PLIMPTON, STEVEN J.; GALLIS, MICHAIL A.

    2001-01-01

    Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird [11.1] and models from free-molecular to continuum flowfields in either cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modeled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modeled using steric factors derived from Arrhenius reaction rates or in a manner similar to continuum modeling. Surface chemistry is modeled with surface reaction probabilities; an optional site density, energy dependent, coverage model is included. Electrons are modeled by either a local charge neutrality assumption or as discrete simulational particles. Ion chemistry is modeled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input, computed from a Langmuir-Tonks model, or obtained from a Green's function (boundary element) based Poisson solver. Icarus has been used for subsonic to hypersonic, chemically reacting, and plasma flows. The Icarus software package includes the grid generation, parallel processor decomposition, post-processing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. All of the software packages are written in standard Fortran.

  16. Multiprocessor communication system

    NARCIS (Netherlands)

    Bekooij, Marco Jan Gerrit

    2010-01-01

    The invention relates to a multiprocessor communication system including at least two processors which communicate with each other, the processors have network interfaces. The network interfaces contain at least one register which indicates the existing data or free space available.

  17. Multiprocessor data acquisition system

    International Nuclear Information System (INIS)

    Haumann, J.R.; Crawford, R.K.

    1987-01-01

    A multiprocessor data acquisition system has been built to replace the single processor systems at the Intense Pulsed Neutron Source (IPNS) at Argonne National Laboratory. The multiprocessor system was needed to accommodate the higher data rates at IPNS brought about by improvements in the source and changes in instrument configurations. This paper describes the hardware configuration of the system and the method of task sharing and compares results to the single processor system

  18. Complexity of scheduling multiprocessor tasks with prespecified processor allocations

    NARCIS (Netherlands)

    Hoogeveen, J.A.; van de Velde, S.L.; van de Velde, S.L.; Veltman, Bart

    1995-01-01

    We investigate the computational complexity of scheduling multiprocessor tasks with prespecified processor allocations. We consider two criteria: minimizing schedule length and minimizing the sum of the task completion times. In addition, we investigate the complexity of problems when precedence

  19. The art of multiprocessor programming

    CERN Document Server

    Herlihy, Maurice

    2012-01-01

    Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues. This revised edition incorporates much-demanded updates t

  20. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    Science.gov (United States)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ~1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  1. Academic training: Advanced lectures on multiprocessor programming

    CERN Multimedia

    PH Department

    2011-01-01

    Academic Training Lecture - Regular Programme 31 October, 1 and 2 November 2011, from 11:00 to 12:00 - IT Auditorium, Bldg. 31   Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models an...

  2. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes what new tools were developed and how the testing of software is performed in the SCDS for the MFTF project.

  3. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  4. Programmable partitioning for high-performance coherence domains in a multiprocessor system

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY

    2011-01-25

    A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.

  5. Languages, compilers and run-time environments for distributed memory machines

    CERN Document Server

    Saltz, J

    1992-01-01

    Papers presented within this volume cover a wide range of topics related to programming distributed memory machines. Distributed memory architectures, although having the potential to supply the very high levels of performance required to support future computing needs, present awkward programming problems. The major issue is to design methods which enable compilers to generate efficient distributed memory programs from relatively machine independent program specifications. This book is the compilation of papers describing a wide range of research efforts aimed at easing the task of programmin

  6. Distributed parallel messaging for multiprocessor systems

    Science.gov (United States)

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  7. Large Data Visualization on Distributed Memory Multi-GPU Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Childs, Henry R.

    2010-03-01

    Data sets of immense size are regularly generated on large scale computing resources. Even among more traditional methods for acquisition of volume data, such as MRI and CT scanners, data which is too large to be effectively visualized on standard workstations is now commonplace. One solution to this problem is to employ a 'visualization cluster,' a small to medium scale cluster dedicated to performing visualization and analysis of massive data sets generated on larger scale supercomputers. These clusters are designed to fit a different need than traditional supercomputers, and therefore their design mandates different hardware choices, such as increased memory, and more recently, graphics processing units (GPUs). While there has been much previous work on distributed memory visualization as well as GPU visualization, there is a relative dearth of algorithms which effectively use GPUs at a large scale in a distributed memory environment. In this work, we study a common visualization technique in a GPU-accelerated, distributed memory setting, and present performance characteristics when scaling to extremely large data sets.

  8. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    International Nuclear Information System (INIS)

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-01-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles move freely within a given space, so on a distributed-memory system load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems achieve better load balancing for particle models, but they suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by the label of the cell in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, a scalar supercomputer, a vector supercomputer, and a graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
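
    The pre-conditioning step described above (sorting particle labels by cell label and then pairing within the sorted list) can be sketched as follows. This is an illustrative, simplified version: it only pairs particles sharing a cell, omits neighbouring-cell visits and the force-summation table, and the cell-indexing scheme is an assumption.

```cpp
// Hypothetical sketch of cell-sorted contact-candidate pairing for a DEM code.
#include <vector>
#include <array>
#include <algorithm>
#include <utility>
#include <cmath>

std::vector<std::pair<int,int>> candidate_pairs(const std::vector<std::array<double,3>>& pos,
                                                double cell_size, int cells_per_axis) {
    const int n = (int)pos.size();
    // Map a particle to the linear label of the cell containing it.
    auto cell_of = [&](int p) {
        int ix = (int)std::floor(pos[p][0] / cell_size);
        int iy = (int)std::floor(pos[p][1] / cell_size);
        int iz = (int)std::floor(pos[p][2] / cell_size);
        return (ix * cells_per_axis + iy) * cells_per_axis + iz;
    };

    // Sort particle labels by cell label (the key step named in the abstract).
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return cell_of(a) < cell_of(b); });

    // Pair up particles that share a cell by scanning the sorted list.
    std::vector<std::pair<int,int>> pairs;
    for (int s = 0; s < n; ) {
        int e = s;
        while (e < n && cell_of(order[e]) == cell_of(order[s])) ++e;
        for (int i = s; i < e; ++i)
            for (int j = i + 1; j < e; ++j)
                pairs.emplace_back(order[i], order[j]);
        s = e;
    }
    return pairs;
}
```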

  9. Real-Time Multiprocessor Programming Language (RTMPL) user's manual

    Science.gov (United States)

    Arpasi, D. J.

    1985-01-01

    A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.

  10. Page placement policies for NUMA multiprocessors

    Energy Technology Data Exchange (ETDEWEB)

    LaRowe, R.P. Jr.; Ellis, C.S. (Dept. of Computer Science, Duke Univ., Durham, NC (US))

    1991-02-01

    In many parallel applications, the size of the program's data exceeds even the very large amount of main memory available on large-scale multiprocessors. Virtual memory, in the sense of a transparent management of the main/secondary memory hierarchy, is a natural solution. The replacement, fetch, and placement policies used in uniprocessor paging systems need to be reexamined in light of the differences in the behavior of parallel computations and in the memory architectures of multiprocessors. In this paper the authors investigate the impact of page placement in nonuniform memory access time (NUMA) shared memory MIMD machines. The authors experimentally evaluate several paging algorithms that incorporate different approaches to the placement issue. Under certain workload assumptions, the results show that placement algorithms that are strongly biased toward local frame allocations but are able to borrow remote frames can reduce the number of page faults over strictly local allocation. The increased cost of memory operations due to the extra remote accesses is more than compensated for by the savings resulting from the reduction in demand fetches, effectively reducing the computation completion time for these programs without having adverse effects on the performance of typical NUMA programs. The authors also discuss some early results obtained from an actual kernel implementation of one of the page placement algorithms.

  11. 2: Local area networks as a multiprocessor treatment planning system

    International Nuclear Information System (INIS)

    Neblett, D.L.; Hogan, S.E.

    1987-01-01

    The creation of a local area network (LAN) of interconnected computers provides an environment of multiple computer processors that adds a new dimension to treatment planning. A LAN system provides the opportunity to have two or more computers working on a plan in parallel. With high-speed interprocessor transfer, time-consuming tasks such as correcting several individual beams for contours and inhomogeneities can be performed simultaneously, effectively creating a parallel multiprocessor treatment planning system.

  12. Plasma physics modeling and the Cray-2 multiprocessor

    International Nuclear Information System (INIS)

    Killeen, J.

    1985-01-01

    The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab

  13. Safety-critical Java with cyclic executives on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2012-01-01

    Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution ...... for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor....

  14. A Multiprocessor Operating System Simulator

    Science.gov (United States)

    Johnston, Gary M.; Campbell, Roy H.

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.

  15. Parallel Simulation of Chip-Multiprocessor Architectures

    National Research Council Canada - National Science Library

    Chidester, Matthew C; George, Alan D

    2002-01-01

    Chip-multiprocessor (CMP) architectures present a challenge for efficient simulation, combining the requirements of a detailed microprocessor simulator with that of a tightly-coupled parallel system...

  16. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
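
    The level-synchronous strategy can be illustrated with a single-node sketch. The distributed variants in the paper partition the graph (1-D by vertex or 2-D by edge) and exchange the next frontier between ranks at the end of every level; that communication is omitted here.

```cpp
#include <cstdint>
#include <vector>

// Level-synchronous BFS on a CSR graph (single node, no partitioning).
// In a distributed 1-D variant, each rank owns a vertex range and exchanges
// newly discovered frontier vertices with its peers after every level.
std::vector<std::int64_t> bfsLevels(const std::vector<std::size_t>& rowPtr,
                                    const std::vector<std::int64_t>& colIdx,
                                    std::int64_t source)
{
    const std::int64_t n = static_cast<std::int64_t>(rowPtr.size()) - 1;
    std::vector<std::int64_t> level(n, -1);
    std::vector<std::int64_t> frontier{source};
    level[source] = 0;

    for (std::int64_t depth = 1; !frontier.empty(); ++depth) {
        std::vector<std::int64_t> next;
        for (std::int64_t u : frontier)
            for (std::size_t e = rowPtr[u]; e < rowPtr[u + 1]; ++e) {
                std::int64_t v = colIdx[e];
                if (level[v] == -1) {        // first visit: assign its depth
                    level[v] = depth;
                    next.push_back(v);
                }
            }
        frontier.swap(next);                 // implicit barrier between levels
    }
    return level;
}
```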

  17. Multiprocessor shared-memory information exchange

    International Nuclear Information System (INIS)

    Santoline, L.L.; Bowers, M.D.; Crew, A.W.; Roslund, C.J.; Ghrist, W.D. III

    1989-01-01

    In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
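
    The abstract does not spell out the buffer states MSMIE uses, so the following is only a generic single-writer/single-reader hand-off over shared memory with an atomic index. It illustrates non-blocking, unidirectional buffer passing between a master and a slave; it is not the MSMIE state machine itself.

```cpp
#include <array>
#include <atomic>

// Lock-free "freshest value" exchange through three shared buffers.
// For an actual multiprocessor chassis, an instance of this object (with a
// trivially copyable Block) would live in the shared-memory segment.
template <typename Block>
class SharedExchange {
    std::array<Block, 3> buf_{};
    std::atomic<unsigned> middle_{2};   // low 2 bits: buffer index, bit 2: "fresh"
    unsigned writeIdx_ = 0;             // owned exclusively by the master (writer)
    unsigned readIdx_  = 1;             // owned exclusively by the slave (reader)
    static constexpr unsigned FRESH = 4u, IDX = 3u;

public:
    // Master side: fill the privately owned buffer, then publish it atomically.
    void publish(const Block& b) {
        buf_[writeIdx_] = b;
        unsigned old = middle_.exchange(writeIdx_ | FRESH, std::memory_order_acq_rel);
        writeIdx_ = old & IDX;          // take ownership of the previous middle buffer
    }

    // Slave side: returns true and copies the newest block if one is pending.
    bool poll(Block& out) {
        if ((middle_.load(std::memory_order_acquire) & FRESH) == 0)
            return false;               // nothing new since the last poll
        unsigned old = middle_.exchange(readIdx_, std::memory_order_acq_rel);
        readIdx_ = old & IDX;           // take ownership of the freshly published buffer
        out = buf_[readIdx_];
        return true;
    }
};
```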

  18. Operating system for a real-time multiprocessor propulsion system simulator. User's manual

    Science.gov (United States)

    Cole, G. L.

    1985-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.

  19. Advanced lectures on multiprocessor programming (1/3)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models and tools that support transactional synchronization. In prior work, which he also did at the IBM T.J. Watson Research Center in Yorktown Height...
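
    The non-blocking structures covered in these lectures can be illustrated with the classic Treiber stack: push and pop retry with compare-and-swap instead of taking a lock. This sketch deliberately never reclaims popped nodes and therefore sidesteps the ABA problem; real implementations must add safe memory reclamation (e.g., hazard pointers).

```cpp
#include <atomic>

// Treiber's lock-free stack: a sketch of CAS-based non-blocking synchronization.
// Simplification: popped nodes are leaked, avoiding ABA and use-after-free.
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head_{nullptr};

public:
    void push(const T& v) {
        Node* n = new Node{v, head_.load(std::memory_order_relaxed)};
        // Retry until the head we read is still current when we swing it to n.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {}
    }

    bool pop(T& out) {
        Node* h = head_.load(std::memory_order_acquire);
        while (h && !head_.compare_exchange_weak(h, h->next,
                                                 std::memory_order_acquire,
                                                 std::memory_order_acquire)) {}
        if (!h) return false;
        out = h->value;     // node intentionally leaked in this sketch
        return true;
    }
};
```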

  20. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT& T Bell Labs., Murray Hill, NJ (United States); Ruehl, R. [CSCS-ETH, Manno (Switzerland)

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this 'natural' parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation, structured or unstructured sparse matrix-vector multiplication, or to increase locality and parallelism by reformulating the algorithm, reducing global synchronization in inner products or local data exchange in preconditioners. Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or a few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
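
    The dominating kernel mentioned above, sparse matrix-vector multiplication, looks as follows for one processor's block of rows stored in CSR format. On a DMPP, each processor would own such a block and first receive the remote entries of x it needs (the local data exchange the authors refer to); the row-block partitioning shown is an assumption for illustration.

```cpp
#include <cstddef>
#include <vector>

// y = A*x for the block of rows owned by one processor, A in CSR format.
// colIdx may reference vector entries owned by other processors; those
// "ghost" entries must already have been received into x before the call.
void csrMatVec(const std::vector<std::size_t>& rowPtr,   // size localRows + 1
               const std::vector<int>&          colIdx,
               const std::vector<double>&       val,
               const std::vector<double>&       x,        // local + ghost entries
               std::vector<double>&             y)        // size localRows
{
    const std::size_t localRows = rowPtr.size() - 1;
    for (std::size_t i = 0; i < localRows; ++i) {
        double sum = 0.0;
        for (std::size_t k = rowPtr[i]; k < rowPtr[i + 1]; ++k)
            sum += val[k] * x[colIdx[k]];
        y[i] = sum;
    }
}
```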

  1. Scientific applications and numerical algorithms on the midas multiprocessor system

    International Nuclear Information System (INIS)

    Logan, D.; Maples, C.

    1986-01-01

    The MIDAS multiprocessor system is a multi-level, hierarchical structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms into a multiprocessor environment. The results of such tests on a variety of applications such as scientific data analysis, Monte Carlo calculations, and image processing are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods in solving partial differential equations are also discussed. The performance and projected extensibilities of these decompositions on the MIDAS system are reported

  2. Models and formal verification of multiprocessor system-on-chips

    DEFF Research Database (Denmark)

    Brekling, Aske Wiid; Hansen, Michael Reichhardt; Madsen, Jan

    2008-01-01

    Analysis of multiprocessor system-on-chips is a major challenge due to the freedom of interrelated choices concerning the application level, the configuration of the execution platform, and the mapping of the application onto this platform. We present a discrete model of computation for such systems and characterize the size of the computation tree it suffices to consider when checking for schedulability. The computational model provides a basis for formal analysis of systems. The model is translated to timed automata, and a tool for system verification and simulation has been developed using Uppaal as backend.

  3. Multiprocessor performance modeling with ADAS

    Science.gov (United States)

    Hayes, Paul J.; Andrews, Asa M.

    1989-01-01

    A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.

  4. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, John Min; Kim, Seung Ho; Kim, Chang Hoi; Kim, Byung Soo; Hwang, Suk Yeong; Lee, Young Bum; Sohn, Suk Won; Kim, Woon Gi

    1990-01-01

    The goal of this study is to develop a real-time controller for autonomous robotic systems operating in hostile environments. The developed control system is designed with a multiprocessor to obtain independence and reliability as well as easy extensibility. The control system is designed in three distinct subsystems (supervisory control part, functional control part, and remote control part). To review the functional performance of the developed controller, a prototype mobile robot fitted with a 4-DOF manipulator was designed and manufactured. Initial tests showed that the robot could turn with a radius of 38 cm, reach a maximum speed of 1.26 km/hr, and climb over obstacles 18 cm in height. (author)

  5. Multiprocessor scheduling for real-time systems

    CERN Document Server

    Baruah, Sanjoy; Buttazzo, Giorgio

    2015-01-01

    This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems.  The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models.  New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis.  Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.

  6. Control and Reliability of Optical Networks in Multiprocessors

    Science.gov (United States)

    Olsen, James Jonathan

    1993-01-01

    Optical communication links have great potential to improve the performance of interconnection networks within large parallel multiprocessors, but the problems of semiconductor laser drive control and reliability inhibit their wide use. These problems have been solved in the telecommunications context, but the telecommunications solutions, based on a small number of links, are often too bulky, complex, power-hungry, and expensive to be feasible for use in a multiprocessor network with thousands of optical links. The main problems with the telecommunications approaches are that they are, by definition, designed for long-distance communication and therefore deal with communications links in isolation, instead of in an overall systems context. By taking a system-level approach to solving the laser reliability problem in a multiprocessor, and by exploiting the short-distance nature of the links, one can achieve small, simple, low-power, and inexpensive solutions, practical for implementation in the thousands of optical links that might be used in a multiprocessor. Through modeling and experimentation, I demonstrate that such system-level solutions exist, and are feasible for use in a multiprocessor network. I divide semiconductor laser reliability problems into two classes: transient errors and hard failures, and develop solutions to each type of problem in the context of a large multiprocessor. I find that for transient errors, the computer system would require a very low bit-error-rate (BER), such as 10^{-23}, if no provision were made for error control. Optical links cannot achieve such rates directly, but I find that a much more reasonable link-level BER (such as 10^{-7}) would be acceptable with simple error detection coding. I then propose a feedback system that will enable lasers to achieve these error levels even when laser threshold current varies. Instead of telecommunications techniques, which require laser output power monitors, I describe a software

  7. Performance analysis of dynamic load balancing algorithm for multiprocessor interconnection network

    Directory of Open Access Journals (Sweden)

    M.U. Bokhari

    2016-09-01

    Multiprocessor interconnection networks have become powerful parallel computing systems for real-time applications. Many researchers have studied dynamic load balancing in multiprocessor systems. Load balancing is the method of dividing the total load among the processors of a distributed system to improve task response time and resource utilization, while avoiding situations where some processors are overloaded, underloaded, or only moderately loaded. A dynamic load balancing algorithm assumes no a priori information about the behaviour of tasks or the global state of the system. There are numerous issues in designing an efficient dynamic load balancing algorithm, including system utilization, the amount of information transferred among processors, the selection of tasks for migration, load evaluation, comparison of load levels, and many more. This paper presents a performance analysis of the dynamic load balancing strategy (DLBS) algorithm used for hypercube networks in multiprocessor systems.
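
    As a concrete illustration of the load-evaluation and task-selection issues listed above, here is a minimal receiver-side decision rule: assign each arriving task to the currently least-loaded processor. This is a generic greedy baseline, not the DLBS algorithm analysed in the paper.

```cpp
#include <cstddef>
#include <vector>

// Assign each task (with a known cost) to the currently least-loaded processor.
// Returns the processor index chosen for every task.
std::vector<int> greedyBalance(const std::vector<double>& taskCost, int numProcs)
{
    std::vector<double> load(numProcs, 0.0);
    std::vector<int> placement(taskCost.size());
    for (std::size_t t = 0; t < taskCost.size(); ++t) {
        int best = 0;
        for (int p = 1; p < numProcs; ++p)
            if (load[p] < load[best]) best = p;   // least-loaded processor so far
        load[best] += taskCost[t];                // update its load estimate
        placement[t] = best;
    }
    return placement;
}
```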

  8. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Betti Emiliano

    2008-01-01

    Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever-increasing computational requirements of embedded systems. ASMP-LINUX is a modified, highly responsive, open-source hard real-time operating system for multiprocessor systems, capable of providing high real-time performance while keeping the code simple and without impacting the performance of the rest of the system. Moreover, ASMP-LINUX does not require code changes or application recompiling/relinking. In order to assess the performance of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  9. Safety-critical Java with cyclic executives on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2012-01-01

    Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution...... that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table driven multiprocessor scheduler. Mutual exclusion to access shared resources is guaranteed by including access constraints in the schedule generation. To evaluate the proposed cyclic executive...

  10. High Performance Polar Decomposition on Distributed Memory Systems

    KAUST Repository

    Sukkari, Dalal E.

    2016-08-08

    The polar decomposition of a dense matrix is an important operation in linear algebra. It can be directly calculated through the singular value decomposition (SVD) or iteratively using the QR dynamically-weighted Halley algorithm (QDWH). The former is difficult to parallelize due to the preponderant number of memory-bound operations during the bidiagonal reduction. We investigate the latter scenario, which performs more floating-point operations but exposes at the same time more parallelism, and therefore, runs closer to the theoretical peak performance of the system, thanks to more compute-bound matrix operations. Profiling results show the performance scalability of QDWH for calculating the polar decomposition using around 9200 MPI processes on well and ill-conditioned matrices of 100K×100K problem size. We study then the performance impact of the QDWH-based polar decomposition as a pre-processing step toward calculating the SVD itself. The new distributed-memory implementation of the QDWH-SVD solver achieves up to five-fold speedup against current state-of-the-art vendor SVD implementations. © Springer International Publishing Switzerland 2016.
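
    The QDWH solver referenced above builds on Halley's iteration for the polar factor. A minimal, unweighted version is sketched below using the Eigen library; QDWH additionally rescales the initial matrix and chooses the weights dynamically so that roughly six iterations suffice, which is omitted here.

```cpp
#include <Eigen/Dense>

// Unweighted Halley iteration for the orthogonal polar factor U of a square,
// nonsingular matrix A (A = U * H). QDWH accelerates this with dynamic weights;
// this plain version is only an illustration of the underlying fixed point.
Eigen::MatrixXd polarFactorHalley(Eigen::MatrixXd X, int maxIter = 50,
                                  double tol = 1e-12)
{
    const int n = static_cast<int>(X.rows());
    X /= X.norm();                                   // crude scaling so all singular values <= 1
    Eigen::MatrixXd I = Eigen::MatrixXd::Identity(n, n);
    for (int k = 0; k < maxIter; ++k) {
        Eigen::MatrixXd G = X.transpose() * X;
        // sigma -> sigma * (3 + sigma^2) / (1 + 3 sigma^2): singular values flow to 1
        Eigen::MatrixXd Xnew = X * (3.0 * I + G) * (I + 3.0 * G).inverse();
        if ((Xnew - X).norm() <= tol * Xnew.norm()) return Xnew;
        X = Xnew;
    }
    return X;
}
```

    From the polar factor, the SVD route described in the abstract follows: with A = Up*H, an eigendecomposition of the symmetric factor H = V*S*V^T gives A = (Up*V)*S*V^T.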

  11. Compiler-directed cache management in multiprocessors

    Science.gov (United States)

    Cheong, Hoichi; Veidenbaum, Alexander V.

    1990-01-01

    The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.

  12. Realtime multiprocessor for mobile ad hoc networks

    Directory of Open Access Journals (Sweden)

    T. Jungeblut

    2008-05-01

    This paper introduces a real-time Multiprocessor System-on-Chip (MPSoC) for low-power wireless applications. The multiprocessor is based on eight 32-bit RISC processors that are connected via a Network-on-Chip (NoC). The NoC follows a novel approach with bandwidth guaranteed to the application, meeting hard real-time requirements. At a clock frequency of 100 MHz, the total power consumption of the MPSoC, which has been fabricated in 180 nm UMC standard cell technology, is 772 mW.

  13. The fast Amsterdam multiprocessor (FAMP) system hardware

    International Nuclear Information System (INIS)

    Hertzberger, L.O.; Kieft, G.; Kisielewski, B.; Wiggers, L.W.; Engster, C.; Koningsveld, L. van

    1981-01-01

    The architecture of a multiprocessor system is described that will be used for on-line filter and second stage trigger applications. The system is based on the MC 68000 microprocessor from Motorola. Emphasis is placed on hardware aspects, in particular modularity, processor communication and interfacing, whereas the system software and the applications will be described in separate articles. (orig.)

  14. One-Step Programmable Arbiters for Multiprocessors

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1978-01-01

    When processors in a multiprocessor system demand service from a shared bus in an asynchronous mode, a synchronous state arbiter resolves conflicts and allocates resources. Independent of the combination of requests, only one state transition is required from a free to an allocated resource...

  15. The Fast Amsterdam Multiprocessor (FAMP) operating system

    CERN Document Server

    Gosman, D; Holthuizen, D J; Por, G J A; Schoorel, M

    1981-01-01

    The Fast Amsterdam Multiprocessor system (FAMP system) is developed for online filtering and second stage triggering. The system is based on the MC68000 microprocessor from Motorola. The authors describe the FAMP operating system software, the features of the slaves and supervisor in the FAMP operating system, the communication between supervisor and slaves using the dual port memories, and the communication between user programs and the operating system. (7 refs).

  16. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation, using Open Multi-Processing (OpenMP) on a shared-memory platform and the Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance for running parallel dynamic simulation is compared and demonstrated.
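
    The two programming models can be contrasted on a toy kernel. The snippet below shows the shared-memory (OpenMP) form of a simple reduction and notes, in a comment, what the distributed-memory (MPI) counterpart would look like; the kernel is illustrative and not taken from the paper's simulator.

```cpp
#include <vector>
#include <omp.h>

// Shared-memory version: threads split the loop iterations; the reduction
// clause merges the per-thread partial sums without explicit communication.
double normSquaredOpenMP(const std::vector<double>& x)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < static_cast<long>(x.size()); ++i)
        sum += x[i] * x[i];
    return sum;
}

// Distributed-memory (MPI) counterpart: each rank holds a slice of x, computes
// its local sum with a plain loop, then combines the results with
//   MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
```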

  17. Using GPI-2 for distributed memory paralleliziation of the caffe toolbox to speed up deep neural network training

    OpenAIRE

    Kühn, Martin; Keuper, Janis; Pfreundt, Franz-Josef

    2017-01-01

    Deep Neural Networks (DNNs) are currently of great interest in research and application. The training of these networks is a compute-intensive and time-consuming task. To reduce training times to a bearable amount at reasonable cost, we extend the popular Caffe toolbox for DNNs with an efficient distributed-memory communication pattern. To achieve good scalability we emphasize the overlap of computation and communication and prefer fine-granular synchronization patterns over global barriers...

  18. Supporting shared data structures on distributed memory architectures

    Science.gov (United States)

    Koelbel, Charles; Mehrotra, Piyush; Vanrosendale, John

    1990-01-01

    Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and IPSC/2 hypercubes is described.

  19. Geometric Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Ajwani, Deepak; Sitchinava, Nodari; Zeh, Norbert

    2010-01-01

    We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort algorithm or PRAM algorithms. For the second group of problems (orthogonal line segment intersection reporting, batched range reporting, and related problems) more effort is required. What distinguishes these problems from the ones in the previous group is the variable output size, which requires I/O-efficient load balancing strategies based on the contribution of the individual input elements to the output size. To obtain nearly optimal algorithms for these problems, we introduce a parallel distribution...

  20. Hardware support for CSP on a Java chip multiprocessor

    DEFF Research Database (Denmark)

    Gruian, Flavius; Schoeberl, Martin

    2013-01-01

    Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were...

  1. A portable implementation of ARPACK for distributed memory parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Maschhoff, K.J.; Sorensen, D.C.

    1996-12-31

    ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and the Message Passing Interface (MPI).

  2. Distributed-Memory Breadth-First Search on Massive Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Beamer, Scott [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Madduri, Kamesh [Pennsylvania State Univ., University Park, PA (United States). Computer Science & Engineering Dept.; Asanovic, Krste [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Patterson, David [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    2017-09-26

    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.

  3. Method for wiring allocation and switch configuration in a multiprocessor environment

    Science.gov (United States)

    Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

    2008-07-15

    A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.

  4. Temporal Partitioning and Multi-Processor Scheduling for Reconfigurable Architectures

    DEFF Research Database (Denmark)

    Popp, Andreas; Le Moullec, Yannick; Koch, Peter

    This poster presentation outlines a proposed framework for handling mapping of signal processing applications to heterogeneous reconfigurable architectures. The methodology consists of an extension to traditional multi-processor scheduling by creating a separate HW track for generation of groups of tasks that are handled similarly to SW processes in a traditional multi-processor scheduling context....

  5. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  6. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    Energy Technology Data Exchange (ETDEWEB)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  7. Energy-Aware Real-Time Task Scheduling for Heterogeneous Multiprocessors with Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Weizhe Zhang

    2014-01-01

    Energy consumption in computer systems has become an increasingly important issue. High energy consumption has already damaged the environment to some extent, especially in heterogeneous multiprocessors. In this paper, we first formulate and describe the energy-aware real-time task scheduling problem in heterogeneous multiprocessors. We then propose a particle swarm optimization (PSO) based algorithm, which can successfully reduce the energy cost and the time needed to search for feasible solutions. Experimental results show that the PSO-based energy-aware metaheuristic uses 40%–50% less energy than the GA-based and SFLA-based algorithms and spends 10% less time than the SFLA-based algorithm in finding solutions. Besides, it can also find 19% more feasible solutions than the SFLA-based algorithm.

  8. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems.

    Science.gov (United States)

    Shehzad, Danish; Bozkuş, Zeki

    2016-01-01

    The increase in complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets amongst multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large share of the overall simulation time. In NEURON, the Message Passing Interface (MPI) is used for communication between processors; the MPI_Allgather collective is used for spike exchange after each interval across distributed memory systems. Increasing the number of processors improves concurrency and performance, but it also increases the cost of MPI_Allgather and thus the communication time between processors. This necessitates improving the communication methodology to decrease the spike exchange time over distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA), moving from two-sided to one-sided communication; a recursive doubling mechanism achieves efficient communication between the processors in a precise number of steps. This approach enhanced communication concurrency and improved the overall runtime, making NEURON more efficient for simulation of large neuronal network models.
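
    The recursive-doubling exchange pattern mentioned above can be sketched with plain two-sided MPI for a power-of-two number of ranks; the paper's contribution replaces the two-sided transfers with one-sided RMA (puts/gets on an MPI window), which is not shown here.

```cpp
#include <algorithm>
#include <cstddef>
#include <mpi.h>
#include <vector>

// Recursive-doubling allgather: after log2(P) exchange steps every rank holds
// all P blocks. Assumes the number of ranks is a power of two and all blocks
// have the same length.
void allgatherRecursiveDoubling(const std::vector<double>& myBlock,
                                std::vector<double>& all, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    const int b = static_cast<int>(myBlock.size());

    all.assign(static_cast<std::size_t>(b) * size, 0.0);
    std::copy(myBlock.begin(), myBlock.end(),
              all.begin() + static_cast<std::size_t>(rank) * b);

    for (int step = 1; step < size; step <<= 1) {
        int partner   = rank ^ step;               // exchange with a rank 'step' away
        int myGroup   = (rank / step) * step;      // first rank of my current group
        int peerGroup = (partner / step) * step;   // first rank of partner's group
        // Send everything gathered so far; receive the partner group's blocks.
        MPI_Sendrecv(all.data() + static_cast<std::size_t>(myGroup) * b,
                     step * b, MPI_DOUBLE, partner, 0,
                     all.data() + static_cast<std::size_t>(peerGroup) * b,
                     step * b, MPI_DOUBLE, partner, 0,
                     comm, MPI_STATUS_IGNORE);
    }
}
```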

  9. Operating system for a real-time multiprocessor propulsion system simulator

    Science.gov (United States)

    Cole, G. L.

    1984-01-01

    The success of the Real-Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real-time interactive simulation of air-breathing propulsion systems was evaluated. RTMPOS provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor-based simulator. A front-end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by RTMPOS, which resides on the FEP. RTMPOS acts in conjunction with the FEP's manufacturer-supplied disk operating system, which provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, RTMPOS provides for engineering-level, run-time operations such as loading, modifying and specifying the computation flow of programs, simulator mode control, data handling and run-time monitoring. Run-time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.

  10. Fault diagnosis in sparse multiprocessor systems

    Science.gov (United States)

    Blough, Douglas M.; Sullivan, Gregory F.; Masson, Gerald M.

    1988-01-01

    The problem of fault diagnosis in multiprocessor systems is considered under a uniformly probabilistic model in which processors are faulty with probability p. This work focuses on minimizing the number of tests that must be conducted in order to correctly diagnose the state of every processor in the system with high probability. A diagnosis algorithm that can correctly diagnose the state of every processor with probability approaching one in a class of systems performing slightly greater than a linear number of tests is presented. A nearly matching lower bound on the number of tests required to achieve correct diagnosis in arbitrary systems is also proven. The number of tests required under this probabilistic model is shown to be significantly less than under a bounded-size fault set model. Because the number of tests that must be conducted is a measure of the diagnosis overhead, these results represent a dramatic improvement in the performance of system-level diagnosis technique.

  11. Particle simulation on a distributed memory highly parallel processor

    International Nuclear Information System (INIS)

    Sato, Hiroyuki; Ikesaka, Morio

    1990-01-01

    This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)

  12. Multiprocessor Priority Ceiling Emulation for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Schoeberl, Martin

    2015-01-01

    Priority ceiling emulation has preferable properties on uniprocessor systems, such as avoiding priority inversion and being deadlock free. This has made it a popular locking protocol. According to the safety-critical Java specification, priority ceiling emulation is a requirement for implementations... However, implementing the protocol for multiprocessor systems is more complex, so implementations might perform worse than non-preemptive implementations. In this paper we compare two multiprocessor lock implementations with hardware support for the Java optimized processor: non-preemptive locking...

  13. Hardware locks for a real-time Java chip multiprocessor

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2016-01-01

    A software locking mechanism commonly protects shared resources for multithreaded applications. This mechanism can, especially in chip-multiprocessor systems, result in a large synchronization overhead. For real-time systems in particular, this overhead increases the worst-case execution time....... This improvement can allow a larger number of real-time tasks to be reliably scheduled on a multiprocessor real-time platform....

  14. A distributed-memory hierarchical solver for general sparse linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Chao [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering; Pouransari, Hadi [Stanford Univ., CA (United States). Dept. of Mechanical Engineering; Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Boman, Erik G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Darve, Eric [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering and Dept. of Mechanical Engineering

    2017-12-20

    We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.

  15. Intelligent discrete particle swarm optimization for multiprocessor task scheduling problem

    Directory of Open Access Journals (Sweden)

    S Sarathambekai

    2017-03-01

    Discrete particle swarm optimization is one of the most recently developed population-based meta-heuristic optimization algorithms in swarm intelligence and can be applied to discrete optimization problems. This article presents a discrete particle swarm optimization algorithm to efficiently schedule tasks in heterogeneous multiprocessor systems. All optimization algorithms share a common algorithmic step, namely population initialization. It plays a significant role because it can affect the convergence speed and the quality of the final solution. Random initialization is the most commonly used method in the majority of evolutionary algorithms to generate solutions in the initial population. Good-quality initial solutions can help the algorithm locate the optimal solution, whereas poor ones may prevent it from doing so. Intelligence should therefore be incorporated into the generation of the initial population in order to avoid premature convergence. This article presents a discrete particle swarm optimization algorithm which incorporates an opposition-based technique to generate the initial population and a greedy algorithm to balance the load of the processors. Makespan, flow time, and reliability cost are three measures used to evaluate the efficiency of the proposed discrete particle swarm optimization algorithm for scheduling independent tasks in distributed systems. Computational simulations are performed on a set of benchmark instances to assess the performance of the proposed algorithm.
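
    The opposition-based initialization step can be sketched for a task-to-processor encoding: for every random assignment we also generate its "opposite" (mirrored processor index) and keep whichever has the better makespan. The encoding and fitness function below are assumptions for illustration, not the article's exact formulation.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Makespan of an assignment: the load of the most loaded processor, given
// per-(task, processor) execution times for a heterogeneous system.
double makespan(const std::vector<int>& assign,
                const std::vector<std::vector<double>>& execTime, int procs)
{
    std::vector<double> load(procs, 0.0);
    for (std::size_t t = 0; t < assign.size(); ++t)
        load[assign[t]] += execTime[t][assign[t]];
    return *std::max_element(load.begin(), load.end());
}

// Opposition-based initialization: generate a random particle and its
// "opposite" (processor index mirrored to procs-1-p), keep the better one.
std::vector<std::vector<int>>
initPopulation(int popSize, int tasks, int procs,
               const std::vector<std::vector<double>>& execTime, unsigned seed = 42)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick(0, procs - 1);
    std::vector<std::vector<int>> pop;
    for (int i = 0; i < popSize; ++i) {
        std::vector<int> x(tasks), xOpp(tasks);
        for (int t = 0; t < tasks; ++t) {
            x[t] = pick(rng);
            xOpp[t] = procs - 1 - x[t];          // opposite point in [0, procs-1]
        }
        pop.push_back(makespan(x, execTime, procs) <= makespan(xOpp, execTime, procs)
                      ? x : xOpp);
    }
    return pop;
}
```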

  16. Trajectory optimization on multiprocessors - A comparison of three implementation strategies

    Science.gov (United States)

    Summerset, Twain K.; Chowkwanyun, Raymond M.

    The optimization of atmospheric flight vehicle trajectories can require the simulation of several thousand individual trajectories. Such a task can be extremely time-consuming if simulating each trajectory requires numerically integrating a set of nonlinear differential equations. This traditional approach, which may require many hours' worth of analysis on a time-shared computer facility, is a bottleneck in space mission planning and limits the number of trajectory design options a mission planner can evaluate. To achieve marked reductions in trajectory design solution times, parallel optimization techniques are proposed. In this paper, three strategies for implementing trajectory optimization methods on multiprocessors will be compared. The comparisons will be illustrated through four trajectory design examples. In the first two examples, maximum reentry downrange and crossrange optimal control problems are posed for a generic maneuvering aerodynamic space vehicle. The third example is Troesch's problem, while the fourth example is the classic Brachistochrone problem. Each example is posed as a two-point boundary value problem whose solution can be expressed as the solution to a set of nonlinear equations.

  17. Massively Parallel Polar Decomposition on Distributed-Memory Systems

    KAUST Repository

    Ltaief, Hatem

    2018-01-01

    We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions, introduced by Zolotarev (ZOLO) in 1877, this new PD algorithm ZOLO-PD converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to seventeen, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3X speedup, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.

  18. Operating System for Runtime Reconfigurable Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2011-01-01

    Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g., a Network-on-Chip). Recent series of Xilinx FPGAs, such as the Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, access to these ports as well as the hardware resource management needs to be controlled, e.g., by a special-purpose operating system running on an embedded processor. For that purpose, and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special-purpose operating system, called CAP-OS (Configuration Access Port Operating System), which will be presented in this paper, supports the clients using the configuration port with the services of priority-based access scheduling, hardware task mapping and resource management.

  19. Best Speed Fit EDF Scheduling for Performance Asymmetric Multiprocessors

    Directory of Open Access Journals (Sweden)

    Peng Wu

    2017-01-01

    In order to improve the performance of real-time systems, asymmetric multiprocessors have been proposed. The benefits of improved system performance and reduced power consumption from such architectures cannot be fully exploited unless suitable task scheduling and task allocation approaches are implemented at the operating system level. Unfortunately, most of the previous research on scheduling algorithms for performance asymmetric multiprocessors is focused on task priority assignment, simply assigning the highest-priority task to the fastest processor. In this paper, we propose BSF-EDF (best speed fit for earliest deadline first) for performance asymmetric multiprocessor scheduling. This approach chooses a suitable processor, rather than the fastest one, when allocating tasks. With the proposed BSF-EDF scheduling, we also derive an effective schedulability test.
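
    The abstract does not give the exact allocation rule, so the sketch below encodes one plausible reading of "best speed fit": among processors that can still accommodate a task's utilization under EDF, choose the slowest sufficient one, keeping the fast processors free for demanding tasks. Treat both the rule and the names as assumptions.

```cpp
#include <cstddef>
#include <vector>

// One reading of "best speed fit": place a task on the slowest processor that
// can still accommodate it, where a processor of speed s_j with cumulative
// utilization U_j remains EDF-schedulable (implicit deadlines) if U_j + u <= s_j,
// utilizations being expressed relative to a unit-speed processor.
// Returns the chosen processor index, or -1 if the task fits nowhere.
int bestSpeedFit(double u, const std::vector<double>& speed, std::vector<double>& util)
{
    int best = -1;
    for (std::size_t j = 0; j < speed.size(); ++j) {
        if (util[j] + u <= speed[j]) {                       // feasible on processor j
            if (best < 0 || speed[j] < speed[best]) best = static_cast<int>(j);
        }
    }
    if (best >= 0) util[best] += u;                          // commit the allocation
    return best;
}
```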

  20. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems

    KAUST Repository

    Sukkari, Dalal

    2017-01-01

    This paper presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (Nakatsukasa et al. 2010; Nakatsukasa and Higham 2013), the SVD solver relies on the polar decomposition using the QR Dynamically-Weighted Halley algorithm (QDWH). Although the QDWH-based SVD algorithm performs a significant amount of extra floating-point operations compared to the traditional SVD with the one-stage bidiagonal reduction, the inherent high level of concurrency associated with Level 3 BLAS compute-bound kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology, the resulting QDWH-SVD further reduces excessive communications during the panel factorization, while increasing the degree of parallelism during the update of the trailing submatrix, as opposed to relying on the default square processor grid. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on the performance by looking at the communication and computation profiling trade-offs. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to threefold and eightfold speedups on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD and turns out to be a competitive alternative for well and ill-conditioned matrices. We finally come up herein with a performance model based on these empirical results. Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https

  1. Chip-Multiprocessor Hardware Locks for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2013-01-01

    and may void a task set's schedulability. In this paper we present a hardware locking mechanism to reduce the synchronization overhead. The solution is implemented for the chip-multiprocessor version of the Java Optimized Processor in the context of safety-critical Java. The implementation is compared...

  2. Stream-processing pipelines: processing of streams on multiprocessor architecture

    NARCIS (Netherlands)

    Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

    In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four

  3. Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

    Science.gov (United States)

    Georgiev, K.; Zlatev, Z.

    2010-11-01

    The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM model is to be applied using fine grids, its discretization leads to a huge computational problem. This implies that a model such as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers, as well as the performance, speed-ups and efficiency achieved, are discussed. The parallel code of DEM, created by using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are briefly presented.

  4. From medical images to flow computations without user-generated meshes.

    Science.gov (United States)

    Dillard, Seth I; Mousel, John A; Shrestha, Liza; Raghavan, Madhavan L; Vigmostad, Sarah C

    2014-10-01

    Biomedical flow computations in patient-specific geometries require integrating image acquisition and processing with fluid flow solvers. Typically, image-based modeling processes involve several steps, such as image segmentation, surface mesh generation, volumetric flow mesh generation, and finally, computational simulation. These steps are performed separately, often using separate pieces of software, and each step requires considerable expertise and investment of time on the part of the user. In this paper, an alternative framework is presented in which the entire image-based modeling process is performed on a Cartesian domain where the image is embedded within the domain as an implicit surface. Thus, the framework circumvents the need for generating surface meshes to fit complex geometries and subsequent creation of body-fitted flow meshes. Cartesian mesh pruning, local mesh refinement, and massive parallelization provide computational efficiency; the image-to-computation techniques adopted are chosen to be suitable for distributed memory architectures. The complete framework is demonstrated with flow calculations computed in two 3D image reconstructions of geometrically dissimilar intracranial aneurysms. The flow calculations are performed on multiprocessor computer architectures and are compared against calculations performed with a standard multistep route. Copyright © 2014 John Wiley & Sons, Ltd.

  5. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Expósito, Roberto R

    2018-01-01

    Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters on binary datasets, which are very popular in different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proved accurate by several studies, especially in scenarios that result in many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/.
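    The pattern-grouping idea mentioned above (BiBit derives a candidate pattern from the bitwise AND of a pair of rows and then collects every row containing that pattern) can be illustrated with the toy sketch below; the function name, thresholds and data are illustrative, and the sketch is not derived from the ParBiBit or BiBit sources:

        from itertools import combinations

        def bibit_like(rows, min_cols=2, min_rows=2):
            """Group binary rows (Python ints used as bit sets) into biclusters
            whose column pattern is the bitwise AND of a seed pair of rows."""
            seen, biclusters = set(), []
            for a, b in combinations(range(len(rows)), 2):
                pattern = rows[a] & rows[b]
                if pattern == 0 or pattern in seen:
                    continue
                seen.add(pattern)
                # every row that contains the pattern joins the bicluster
                members = [i for i, r in enumerate(rows) if r & pattern == pattern]
                if bin(pattern).count("1") >= min_cols and len(members) >= min_rows:
                    biclusters.append((pattern, members))
            return biclusters

        # toy dataset: 4 rows x 5 columns, each row encoded as an integer
        data = [0b10110, 0b10100, 0b01101, 0b10111]
        print(bibit_like(data))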

  7. Recognition of simple visual images using a sparse distributed memory: Some implementations and experiments

    Science.gov (United States)

    Jaeckel, Louis A.

    1990-01-01

    Previously, a method was described of representing a class of simple visual images so that they could be used with a Sparse Distributed Memory (SDM). Herein, two possible implementations of an SDM are described for which these images, suitably encoded, serve both as addresses to the memory and as data to be stored in the memory. A key feature of both implementations is that a pattern represented as an unordered set with a variable number of members can be used as an address to the memory. In the first model, an image is encoded as a 9072-bit string to be used as a read or write address; the bit string may also be used as data to be stored in the memory. Another representation, in which an image is encoded as a 256-bit string, may be used with either model as data to be stored in the memory, but not as an address. In the second model, an image is not represented as a vector of fixed length to be used as an address. Instead, a rule is given for determining which memory locations are to be activated in response to an encoded image. This activation rule treats the pieces of an image as an unordered set. With this model, the memory can be simulated, based on a method of computing the approximate result of a read operation.
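    For readers unfamiliar with the underlying memory model, the following sketch shows a conventional Kanerva-style SDM with fixed-length addresses: locations are activated by Hamming distance, writes update counters, and reads take a majority vote over the activated counters. The dimensions and radius are illustrative (much smaller than the 9072-bit encoding above), and the variable-length activation rule of the second model is not shown:

        import numpy as np

        rng = np.random.default_rng(0)
        N_LOC, DIM, RADIUS = 2000, 256, 112       # illustrative sizes only

        hard_addresses = rng.integers(0, 2, size=(N_LOC, DIM))
        counters = np.zeros((N_LOC, DIM), dtype=int)

        def active(address):
            # a location fires when its Hamming distance to the address is small enough
            return np.sum(hard_addresses != address, axis=1) <= RADIUS

        def write(address, data):
            sel = active(address)
            counters[sel] += np.where(data == 1, 1, -1)   # +1 for 1-bits, -1 for 0-bits

        def read(address):
            sel = active(address)
            return (counters[sel].sum(axis=0) > 0).astype(int)   # majority vote

        pattern = rng.integers(0, 2, size=DIM)
        write(pattern, pattern)                   # autoassociative store
        print(np.array_equal(read(pattern), pattern))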

  8. A HETEROGENEOUS MULTIPROCESSOR SYSTEM-ON-CHIP ARCHITECTURE INCORPORATING MEMORY ALLOCATION

    Directory of Open Access Journals (Sweden)

    T.Thillaikkarasi

    2010-06-01

    Full Text Available This paper describes the development of a Multiprocessor System-on-Chip (MPSoC) with a novel interconnect architecture incorporating memory allocation. It addresses the problem of mapping a process network with data-dependent behavior and soft real-time constraints onto heterogeneous multiprocessor System-on-Chip (SoC) architectures and focuses on a memory allocation step based on an integer linear programming model. An application is modeled as a Kahn Process Network (KPN), which makes the parallelism present in the application explicit. The main contribution of our work is an MILP-based approach which can be used to map the KPN of streaming applications with data-dependent behavior and interleaved computation and communication. Our solution minimizes hardware cost while taking into account the performance constraints. One of the salient features of our work is that it takes into account the additional overheads due to data communication conflicts. It permits obtaining an optimal distributed shared memory architecture that minimizes both the global cost of accessing the shared data in the application and the memory cost. Our approach allows automatic generation of an architecture-level specification of the application.

  9. Cache-aware network-on-chip for chip multiprocessors

    Science.gov (United States)

    Tatas, Konstantinos; Kyriacou, Costas; Dekoulis, George; Demetriou, Demetris; Avraam, Costas; Christou, Anastasia

    2009-05-01

    This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support to these cache related mechanisms can assist in improving systems performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM thread scheduling is done according to data availability, thus the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.

  10. High speed vision processor with reconfigurable processing element array based on full-custom distributed memory

    Science.gov (United States)

    Chen, Zhe; Yang, Jie; Shi, Cong; Qin, Qi; Liu, Liyuan; Wu, Nanjian

    2016-04-01

    In this paper, a hybrid vision processor based on a compact full-custom distributed memory for near-sensor high-speed image processing is proposed. The proposed processor consists of a reconfigurable processing element (PE) array, a row processor (RP) array, and a dual-core microprocessor. The PE array includes two-dimensional processing elements with a compact full-custom distributed memory. It supports real-time reconfiguration between the PE array and the self-organized map (SOM) neural network. The vision processor is fabricated using a 0.18 µm CMOS technology. The circuit area of the distributed memory is reduced markedly, to one third of that of a conventional memory, so that the circuit area of the vision processor is reduced by 44.2%. Experimental results demonstrate that the proposed design functions correctly.

  11. Design considerations for a multiprocessor based data acquisition system

    International Nuclear Information System (INIS)

    Tippie, J.W.; Kulaga, J.E.

    1979-01-01

    The rapid advance of digital technology has provided the systems designer with many new design options. Hardware is no longer the controlling expense. Complex operating systems provide the flexibility and development tools needed by software designers, but restrict throughput. Multiprocessor-based systems can be used to "front-end" high-throughput applications while maintaining the many advantages offered by multitasking operating systems. The design of a high-throughput data acquisition system for application in low energy nuclear physics is considered.

  12. A system for simulating shared memory in heterogeneous distributed-memory networks with specialization for robotics applications

    Energy Technology Data Exchange (ETDEWEB)

    Jones, J.P.; Bangs, A.L.; Butler, P.L.

    1991-01-01

    Hetero Helix is a programming environment which simulates shared memory on a heterogeneous network of distributed-memory computers. The machines in the network may vary with respect to their native operating systems and internal representation of numbers. Hetero Helix presents a simple programming model to developers, and also considers the needs of designers, system integrators, and maintainers. The key software technology underlying Hetero Helix is the use of a "compiler" which analyzes the data structures in shared memory and automatically generates code which translates data representations from the format native to each machine into a common format, and vice versa. The design of Hetero Helix was motivated in particular by the requirements of robotics applications. Hetero Helix has been used successfully in an integration effort involving 27 CPUs in a heterogeneous network and a body of software totaling roughly 100,000 lines of code. 25 refs., 6 figs.

  13. Simulation-based Modeling Frameworks for Networked Multi-processor System-on-Chip

    DEFF Research Database (Denmark)

    Mahadevan, Shankar

    2006-01-01

    This thesis deals with modeling aspects of multi-processor system-on-chip (MpSoC) design affected by the on-chip interconnect, also called the Network-on-Chip (NoC), at various levels of abstraction. To begin with, we undertook a comprehensive survey of research and design practices of networked MpSoCs. We then present two simulation-based modeling frameworks, namely ARTS and RIPE, that allow hardware (computation time, power consumption, network latency, caching effect, etc.) and software (application partition and mapping, operating system scheduling, interrupt handling, etc.) aspects to be modeled from system-level to cycle-true abstraction. Thereby, we can realistically model the application executing on the architecture. This includes, e.g., accurate modeling of synchronization, cache refills, context switching effects, and so on, which are critically dependent on the architecture and the performance of the NoC. The foundation of the ARTS model is abstract tasks ...

  14. Performance of the coupled thermalhydraulics/neutron kinetics code R/P/C on workstation clusters and multiprocessor systems

    International Nuclear Information System (INIS)

    Hammer, C.; Paffrath, M.; Boeer, R.; Finnemann, H.; Jackson, C.J.

    1996-01-01

    The light water reactor core simulation code PANBOX has been coupled with the transient analysis code RELAP5 for the purpose of performing plant safety analyses with a three-dimensional (3-D) neutron kinetics model. The system has been parallelized to improve the computational efficiency. The paper describes the features of this system with emphasis on performance aspects. Performance results are given for different types of parallelization, i. e. for using an automatic parallelizing compiler, using the portable PVM platform on a workstation cluster, using PVM on a shared memory multiprocessor, and for using machine dependent interfaces. (author)

  15. Parallelization of the molecular dynamics code GROMOS87 for distributed memory parallel architectures

    NARCIS (Netherlands)

    Green, DG; Meacham, KE; vanHoesel, F; Hertzberger, B; Serazzi, G

    1995-01-01

    This paper describes the techniques and methodologies employed during parallelization of the Molecular Dynamics (MD) code GROMOS87, with the specific requirement that the program run efficiently on a range of distributed-memory parallel platforms. We discuss the preliminary results of our parallel

  16. Cyclic executive for safety-critical Java on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2010-01-01

    ... that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table-driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor ...

  17. Dependence driven execution for multiprogrammed multiprocessor

    Energy Technology Data Exchange (ETDEWEB)

    Vajracharya, S. [Los Alamos National Lab., NM (United States); Grunwald, D. [Univ. of Colorado, Boulder, CO (United States). Dept. of Computer Science

    1998-12-31

    Barrier synchronizations can be very expensive in a multiprogramming environment because no process can go past a barrier until all the processes have arrived. If a process participating at a barrier is swapped out by the operating system, the rest of the participating processes end up waiting for the swapped-out process. This paper presents a compile-time/run-time system that uses dependence-driven execution to overlap the execution of computations separated by barriers, so that the processes do not spend most of the time idling at the synchronization point.

  18. An implementation of ray tracing algorithm for the multiprocessor machines

    Directory of Open Access Journals (Sweden)

    Samardžić Aleksandar B.

    2006-01-01

    Full Text Available Ray tracing is an algorithm for generating photo-realistic pictures of 3D scenes, given a scene description, lighting conditions and viewing parameters as inputs. The algorithm is inherently convenient for parallelization, and the simplest parallelization scheme targets shared-memory parallel machines (multiprocessors). This paper presents two implementations of the algorithm developed by the authors for such machines, one using the POSIX threads API and another using the OpenMP API. The paper also presents results of rendering some test scenes using these implementations and discusses the efficiency of our parallel algorithm versions.
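    The row-parallel decomposition used by such shared-memory implementations can be sketched as follows; Python's thread pool stands in for the POSIX threads and OpenMP versions described above (and, because of CPython's global interpreter lock, illustrates the decomposition rather than real speedup), and shade() is a placeholder for the per-pixel ray casting:

        from concurrent.futures import ThreadPoolExecutor
        import math

        WIDTH, HEIGHT, WORKERS = 640, 480, 8

        def shade(x, y):
            # stand-in for the per-pixel ray casting / shading computation
            return int(255 * abs(math.sin(0.01 * x) * math.cos(0.01 * y)))

        def render_rows(rows):
            # each worker owns a disjoint set of scanlines, so no locking is needed
            return {y: [shade(x, y) for x in range(WIDTH)] for y in rows}

        with ThreadPoolExecutor(max_workers=WORKERS) as pool:
            chunks = [range(i, HEIGHT, WORKERS) for i in range(WORKERS)]  # interleaved rows
            image = {}
            for part in pool.map(render_rows, chunks):
                image.update(part)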

  19. An Efficient Asymmetric Distributed Lock for Embedded Multiprocessor Systems

    OpenAIRE

    Rutgers, J.H.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2012-01-01

    Efficient synchronization is a key concern in an embedded many-core system-on-chip (SoC). The use of atomic read-modify-write instructions combined with cache coherency as synchronization primitive is not always an option for shared-memory SoCs due to the lack of suitable IP. Furthermore, there are doubts about the scalability of hardware cache coherency protocols. Existing distributed locks for NUMA multiprocessor systems do not rely on cache coherency and are more scalable, but exchange man...

  20. Thermal-Aware Scheduling for Future Chip Multiprocessors

    Directory of Open Access Journals (Sweden)

    Pedro Trancoso

    2007-04-01

    Full Text Available The increased complexity and operating frequency of current single-chip microprocessors are resulting in diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently additional costs for cooling, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve the performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while, at the same time, decreasing the chip's temperature across many different operation and configuration scenarios.
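    A toy greedy policy in the spirit of the neighbour-aware TAS variant is sketched below: a new process is placed on the core whose own temperature plus a weighted average of its neighbours' temperatures is lowest. The mesh layout, weight and temperature values are assumptions made for illustration and do not reproduce the algorithms of the paper:

        # cores laid out on a 2x2 mesh; neighbours share heat, so the scheduler
        # prefers a core whose neighbourhood is also cool (toy model)
        NEIGHBOURS = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

        def pick_core(temps, weight=0.5):
            def cost(c):
                return temps[c] + weight * sum(temps[n] for n in NEIGHBOURS[c]) / len(NEIGHBOURS[c])
            return min(NEIGHBOURS, key=cost)

        temps = {0: 71.0, 1: 64.0, 2: 58.0, 3: 66.0}
        print(pick_core(temps))   # -> 2: the coolest core with the coolest neighbourhood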

  1. Power profiling of Cholesky and QR factorizations on distributed memory systems

    KAUST Repository

    Bosilca, George

    2012-08-30

    This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2X speedup) but also in terms of energy efficiency (up to 62 %). © 2012 Springer-Verlag (outside the USA).

  2. Distributed memory compiler methods for irregular problems: Data copy reuse and runtime partitioning

    Science.gov (United States)

    Das, Raja; Ponnusamy, Ravi; Saltz, Joel; Mavriplis, Dimitri

    1991-01-01

    Outlined here are two methods which we believe will play an important role in any distributed memory compiler able to handle sparse and unstructured problems. We describe how to link runtime partitioners to distributed memory compilers. In our scheme, programmers can implicitly specify how data and loop iterations are to be distributed between processors. This insulates users from having to deal explicitly with potentially complex algorithms that carry out work and data partitioning. We also describe a viable mechanism for tracking and reusing copies of off-processor data. In many programs, several loops access the same off-processor memory locations. As long as it can be verified that the values assigned to off-processor memory locations remain unmodified, we show that we can effectively reuse stored off-processor data. We present experimental data from a 3-D unstructured Euler solver run on iPSC/860 to demonstrate the usefulness of our methods.

  3. Monte Carlo photon transport on shared memory and distributed memory parallel processors

    International Nuclear Information System (INIS)

    Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

    1987-01-01

    Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
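    The replication algorithm lends itself to a very small sketch: every processor runs an independent batch of particle histories with its own random stream, and the tallies are combined at the end. The mpi4py sketch below uses a stand-in absorption test instead of real photon physics and is meant only to illustrate why the scheme parallelizes so well:

        from mpi4py import MPI
        import random

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        TOTAL_HISTORIES = 1_000_000
        local_n = TOTAL_HISTORIES // size
        rng = random.Random(12345 + rank)     # independent stream per processor

        # stand-in physics: count "absorbed" photons in a slab (illustrative only)
        absorbed = sum(1 for _ in range(local_n) if rng.random() < 0.3)

        total = comm.reduce(absorbed, op=MPI.SUM, root=0)
        if rank == 0:
            print("absorption fraction:", total / TOTAL_HISTORIES)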

  4. Learning to read aloud: A neural network approach using sparse distributed memory

    Science.gov (United States)

    Joglekar, Umesh Dwarkanath

    1989-01-01

    An attempt to solve a problem of text-to-phoneme mapping is described which does not appear amenable to solution by use of standard algorithmic procedures. Experiments based on a model of distributed processing are also described. This model (sparse distributed memory (SDM)) can be used in an iterative supervised learning mode to solve the problem. Additional improvements aimed at obtaining better performance are suggested.

  5. Method and apparatus for single-stepping coherence events in a multiprocessor system under software control

    Science.gov (United States)

    Blumrich, Matthias A.; Salapura, Valentina

    2010-11-02

    An apparatus and method are disclosed for single-stepping coherence events in a multiprocessor system under software control in order to monitor the behavior of a memory coherence mechanism. Single-stepping coherence events in a multiprocessor system is made possible by adding one or more step registers. By accessing these step registers, one or more coherence requests are processed by the multiprocessor system. The step registers determine if the snoop unit will operate by proceeding in a normal execution mode, or operate in a single-step mode.

  6. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

    Directory of Open Access Journals (Sweden)

    Yassin El-Hillali

    2009-01-01

    Full Text Available This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC architecture optimized for Multiple Target Tracking (MTT in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets within its field of view. As the number of targets increases, the computational needs of the MTT system also increase making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.

  7. Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Traian; Pop, Paul; Eles, Petru

    2008-01-01

    We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic of such hierarchically scheduled systems: assignment of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system ...

  8. Testing and operating a multiprocessor chip with processor redundancy

    Science.gov (United States)

    Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J

    2014-10-21

    A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.

  9. Interprocessor invocation on a NUMA multiprocessor. Technical report

    Energy Technology Data Exchange (ETDEWEB)

    Cox, A.L.; Fowler, R.J.; Veenstra, J.E.

    1990-10-01

    On a distributed shared memory machine, the problem of minimizing accesses to remote memory modules is crucial for obtaining high performance. We describe an object-based, parallel programming system called OSMIUM to support experiments with mechanisms for performing invocations on remote objects. The mechanisms we have studied include: non-cached access to remote memory, data migration, and function-shipping using an interprocessor invocation protocol (IIP). Our analyses and experiments indicate that IIP competes well with the alternatives, especially when the structure of user programs requires synchronized access to data structures. While these results are obtained on a NUMA multiprocessor, they are also applicable to systems that use hardware cache coherency techniques.

  10. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  11. Standard interfaces for program-modular multiprocessor systems

    International Nuclear Information System (INIS)

    Chernykh, E.V.

    1982-01-01

    The structural peculiarities of existing and newly developed standard interfaces used in automation systems for nuclear physics experiments are considered, and general structural characteristics of multiprocessor system interfaces are identified. The existing CAMAC crate system and the designed COMPEX, E3S and FASTBUS interface standards are compared by capacity and relative cost. The analysis of the data shows that any interface operates most advantageously at rates close to its capacity, where the relative cost is at a minimum. In that case the advantage lies with higher-capacity interfaces, for which a moderate decrease in the exchange or request-processing rate makes the relative cost grow more slowly. A higher single-cycle exchange capacity is provided by functional specialization of the data paths in the interface. The conclusion is drawn that the most promising direction for the development of automation systems for high energy physics experiments is the use of the FASTBUS standard.

  12. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that are scalable with the number of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks ...
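    As an illustration of the kind of building block analysed above, a block-structured prefix sum lets each core scan only its own block, after which a scan of the per-block totals stitches the blocks together. The sketch below shows that structure sequentially; it is not the paper's algorithm or its cache cost analysis:

        # two-pass block prefix sum: each "core" scans its block, then block
        # offsets are scanned and added back
        def parallel_prefix_sum(data, p=4):
            n = len(data)
            blocks = [data[i * n // p:(i + 1) * n // p] for i in range(p)]
            # pass 1: local scans (independent, one block per core)
            local = []
            for b in blocks:
                acc, out = 0, []
                for x in b:
                    acc += x
                    out.append(acc)
                local.append(out)
            # a scan of the per-block totals gives each block its starting offset
            offsets, acc = [], 0
            for out in local:
                offsets.append(acc)
                acc += out[-1] if out else 0
            # pass 2: add the offsets back (again independent per block)
            return [x + off for out, off in zip(local, offsets) for x in out]

        print(parallel_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # [1, 3, 6, 10, 15, 21, 28, 36]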

  13. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts to the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with the query length of 511 amino acid). © 2010 IEEE.

  14. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.

  15. Joint Application Mapping/Interconnect Synthesis Techniques for Embedded Chip-Scale Multiprocessors

    National Research Council Canada - National Science Library

    Bambha, Neal K; Bhattacharyya, Shuvra S

    2005-01-01

    .... In this paper, we present high-level scheduling and interconnect topology synthesis techniques for embedded multiprocessor systems-on-chip that are streamlined for one or more digital signal processing applications...

  16. System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir

    2008-01-01

    is the first such attempt in the published literature. The second part of the thesis deals with the issues related to the development of system-level design methodologies for networked multiprocessor systems-on-chip at various levels of design abstraction with special focus on the modeling and design...... at the system-level. The multiprocessor modeling framework is then extended to include models of networked multiprocessor systems-on-chip which is then employed to model wireless sensor networks both at the sensor node level as well as the wireless network level. In the third and the final part, the thesis...... to the transaction-level model. The thesis, as a whole makes contributions by describing a design methodology for networked multiprocessor embedded systems at three layers of abstraction from system-level through transaction-level to the cycle accurate level as well as demonstrating it practically by implementing...

  17. Multiprocessor based 4K data acquisition system with enhanced system capabilities

    International Nuclear Information System (INIS)

    Mohindra, N.V.; Ram, L.S.; Gopalakrishnan, K.R.; Bayala, A.K.

    1989-01-01

    A multiprocessor-based 4K data acquisition system has been designed using a number of processors working simultaneously to give enhanced system capabilities. The master processor is assigned to carry out high-speed data acquisition, spectrum display and on-line computations, while a second processor independently provides an alphanumeric page display for communications. A third processor, which may even be a personal computer, is assigned to carry out the complex calculations required for processing the data acquired by the master processor. All these processors communicate with each other through serial links, leaving no chance of bus contention of any kind. A novel method has been incorporated into the system to maximize utilization of the master processor. This enables a full 4K live spectrum display on an XYZ unit with multiple cursor markers and selected regions of interest, expanded display of the ROI, intensified regions of special interest, and also alphanumeric display on the same XYZ unit. Even so, for an ADC with a 4 µs conversion time, the system offers virtually zero dead time for data acquisition and storage. When used as a stand-alone system without a PC, processing functions such as area integration, peak analysis, spectrum smoothing and energy calibration are provided. (author)

  18. A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2009-01-01

    Full Text Available Runtime adaptivity of hardware in processor architectures is a novel trend which is under investigation in a variety of research labs all over the world. The runtime exchange of modules implemented on reconfigurable hardware affects the instruction flow (e.g., in reconfigurable instruction set processors) or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn, where single/multiple data and instruction streams were combined into four types of architectures. This classification is now used as a foundation for an extended classification scheme that includes runtime adaptivity as a further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.

  19. E-Token Energy-Aware Proportionate Sharing Scheduling Algorithm for Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Pasupuleti Ramesh

    2017-01-01

    Full Text Available WSNs play a vital role in applications ranging from small-scale healthcare surveillance systems to large-scale environmental monitoring. Their design for energy-constrained applications is a challenging issue. Sensors in WSNs are expected to run unattended for long periods; replacing exhausted batteries is costly and is not even possible in hostile situations. Multiprocessors are used in WSNs for high-performance scientific computing, where each processor is assigned the same or a different workload. When the computational demands of the system increase, energy-efficient approaches play an important role in extending the system lifetime. Energy efficiency is commonly pursued with a proportionate fair scheduler, which introduces an abnormal overloading effect. In order to overcome these problems, E-token Energy-Aware Proportionate Sharing (EEAPS) scheduling is proposed here. The power consumption of each thread/task is calculated and the tasks are allotted to the multiple processors through an auctioning mechanism. The algorithm is simulated using the real-time simulator RTSIM and the results are evaluated.

  20. Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

    OpenAIRE

    Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

    2005-01-01

    An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such syst...

  1. Hierarchical N-body methods on shared address space multiprocessors.

    Science.gov (United States)

    Holt, C.; Singh, J. P.

    The authors examine the parallelization issues in and architectural implications of the two dominant adaptive hierarchical N-body methods: the Barnes-Hut method and the Fast Multipole Method. They show that excellent parallel performance can be obtained on cache-coherent shared address space multiprocessors, by demonstrating performance on three cache-coherent machines: the Stanford DASH, the Kendall Square Research KSR-1, and the Silicon Graphics Challenge. Even on machines that have their main memory physically distributed among processing nodes and highly nonuniform memory access costs, the speedups are obtained without any attention to where memory is allocated on the machine. The authors show that the reason for good performance is the high degree of temporal locality afforded by the applications, and the fact that working sets are small (and scale slowly) so that caching shared data automatically in hardware exploits this locality very effectively. Even if data distribution in main memory is assumed to be free, it does not help very much. Finally, they address a potential bottleneck in scaling the parallelism to large machines, namely the fraction of time spent in building the tree used by hierarchical N-body methods.

  2. Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis

    Science.gov (United States)

    Bolosky, William Joseph

    1993-01-01

    Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.

  3. Principles for problem aggregation and assignment in medium scale multiprocessors

    Science.gov (United States)

    Nicol, David M.; Saltz, Joel H.

    1987-01-01

    One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.

  4. Pipelined multiprocessor system-on-chip for multimedia

    CERN Document Server

    Javaid, Haris

    2014-01-01

    This book describes analytical models and estimation methods to enhance performance estimation of pipelined multiprocessor systems-on-chip (MPSoCs).  A framework is introduced for both design-time and run-time optimizations. For design space exploration, several algorithms are presented to minimize the area footprint of a pipelined MPSoC under a latency or a throughput constraint.  A novel adaptive pipelined MPSoC architecture is described, where idle processors are transitioned into low-power states at run-time to reduce energy consumption. Multi-mode pipelined MPSoCs are introduced, where multiple pipelined MPSoCs optimized separately are merged into a single pipelined MPSoC, enabling further reduction of the area footprint by sharing the processors and communication buffers. Readers will benefit from the authors’ combined use of analytical models, estimation methods and exploration algorithms and will be enabled to explore billions of design points in a few minutes.   ·         Describes the ...

  5. Shuffle-exchange type interconnection networks for multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Huang, S.T.

    1985-01-01

    There are two major parts in this dissertation. In the first part, a new model, the Finite Permutation Machine (FPM), and a set of theorems are developed to capture the theory of operations for the permutation networks used in SIMD multiprocessor systems. Using this new framework, a long-standing open problem is partially solved: are 2n-1 passes of shuffle-exchange necessary and sufficient to realize all permutations, where n = log_2 N and N is the number of inputs and outputs interconnected by the network? In the second part, a solution is presented to the resource scheduling problem in large-scale loosely coupled MIMD multiprocessor systems. First, an interconnection network called the circular shuffle network (CSN) is proposed. The CSN is a circular form of shuffle-exchange type multistage interconnection network, with the switching nodes acting as processors. The author defines a CSN with homogeneous switching nodes throughout the network as a homogeneous CSN (HCSN). The HCSN provides two important properties, namely clustering of nodes in the network with respect to any node and efficient partial broadcast mechanisms. Second, a distributed scheduling model is described that takes advantage of these two properties of the HCSN.
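    The two primitive permutations behind such networks are easy to state: the perfect shuffle cyclically rotates the n-bit index of an input, and the exchange flips its least significant bit. The small sketch below illustrates them for N = 8; it is general background, not material from the dissertation:

        def shuffle(i, n):
            # perfect shuffle = cyclic left rotation of the n-bit index
            return ((i << 1) | (i >> (n - 1))) & ((1 << n) - 1)

        def exchange(i):
            # exchange = flip the least-significant bit
            return i ^ 1

        N, n = 8, 3
        print([shuffle(i, n) for i in range(N)])    # [0, 2, 4, 6, 1, 3, 5, 7]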

  6. Using data tagging to improve the performance of Kanerva's sparse distributed memory

    Science.gov (United States)

    Rogers, David

    1988-01-01

    The standard formulation of Kanerva's sparse distributed memory (SDM) involves the selection of a large number of data storage locations, followed by averaging the data contained in those locations to reconstruct the stored data. A variant of this model is discussed, in which the predominant pattern is the focus of reconstruction. First, one architecture is proposed which returns the predominant pattern rather than the average pattern. However, this model will require too much storage for most uses. Next, a hybrid model is proposed, called tagged SDM, which approximates the results of the predominant pattern machine, but is nearly as efficient as Kanerva's original formulation. Finally, some experimental results are shown which confirm that significant improvements in the recall capability of SDM can be achieved using the tagged architecture.

  7. A data base for on-line event analysis on a distributed memory machine

    International Nuclear Information System (INIS)

    Argante, E.; Meesters, M.R.J.; Willers, I.; Stok, P. van der

    1996-01-01

    Parallel in-memory databases can enhance the structuring and parallelization of programs used in High Energy Physics (HEP). Efficient database access routines are used as communication primitives which hide the communication topology, in contrast to more explicit communication schemes such as PVM or MPI. A parallel in-memory database, called SPIDER, has been implemented on a 32 node Meiko CS-2 distributed memory machine. The SPIDER primitives generate a lower overhead than that generated by PVM or MPI. The event reconstruction program, CPREAD, of the CPLEAR experiment has been used as a test case. Performance measurements showed that CPREAD interfaced to SPIDER can easily cope with the event rate generated by CPLEAR. (author)

  8. A Case for Tamper-Resistant and Tamper-Evident Computer Systems

    National Research Council Canada - National Science Library

    Solihin, Yan

    2007-01-01

    .... These attacks attempt to snoop or modify data transfer between various chips in a computer system such as between the processor and memory, and between processors in a multiprocessor interconnect network...

  9. Supporting Multiprocessors in the Icecap Safety-Critical Java Run-Time Environment

    DEFF Research Database (Denmark)

    Zhao, Shuai; Wellings, Andy; Korsholm, Stephan Erbs

    2015-01-01

    The current version of the Safety Critical Java (SCJ) specification defines three compliance levels. Level 0 targets single-processor programs, while Level 1 and 2 can support multiprocessor platforms. Level 1 programs must be fully partitioned, but Level 2 programs can also be more globally scheduled. As of yet, there is no official Reference Implementation for SCJ. However, the icecap project has produced a Safety-Critical Java Run-time Environment based on the Hardware-near Virtual Machine (HVM). This supports SCJ at all compliance levels and provides an implementation of the safety-critical Java (javax.safetycritical) package. This is still work-in-progress and lacks certain key features. Among these is the ability to support multiprocessor platforms. In this paper, we explore two possible options for adding multiprocessor support to this environment: the “green thread” and the “native ...

  10. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    Science.gov (United States)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, is due to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to devote more than 50% of the chip real estate to caching. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  11. Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1997-01-01

    The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128 node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce), which utilize the Paragon's native corresponding operations based on spanning-tree routing; (2) the Bucket algorithm, wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and of the convergence test; (3) a distributed memory version of the Bucket algorithm that pushes the spatial domain decomposition one step farther by actually distributing the fixed source and flux iterates over the memories of the participating processes. The conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memory to hold the entire problem arrays. Otherwise, the third scheme becomes necessary, at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model

  12. DAEDALUS: System-Level Design Methodology for Streaming Multiprocessor Embedded Systems on Chips

    NARCIS (Netherlands)

    Stefanov, T.; Pimentel, A.; Nikolov, H.; Ha, S.; Teich, J.

    2017-01-01

    The complexity of modern embedded systems, which are increasingly based on heterogeneous multiprocessor system-on-chip (MPSoC) architectures, has led to the emergence of system-level design. To cope with this design complexity, system-level design aims at raising the abstraction level of the design

  13. Evaluation of the impact chip multiprocessors have on SNL application performance.

    Energy Technology Data Exchange (ETDEWEB)

    Doerfler, Douglas W.

    2009-10-01

    This report describes trans-organizational efforts to investigate the impact of chip multiprocessors (CMPs) on the performance of important Sandia application codes. The impact of CMPs on the performance and applicability of Sandia's system software was also investigated. The goal of the investigation was to make algorithmic and architectural recommendations for next generation platform acquisitions.

  14. 3D-TV Rendering on a Multiprocessor System on a Chip

    NARCIS (Netherlands)

    Van Eijndhoven, J.T.J.; Li, X.

    2006-01-01

    This thesis focuses on the issue of mapping 3D-TV rendering applications to a multiprocessor platform. The target platform aims to address tomorrow's multi-media consumer market. The prototype chip, called Wasabi, contains a set of TriMedia processors that communicate via a shared memory, fast

  15. Fast multiprocessor scheduling with fixed task binding of large scale industrial cyber physical systems

    NARCIS (Netherlands)

    Adyanthaya, S.; Geilen, M.; Basten, T.; Schiffelers, R.; Theelen, B.; Voeten, J.

    2013-01-01

    Latest trends in embedded platform architectures show a steady shift from high frequency single core platforms to lower-frequency but highly-parallel execution platforms. Scheduling applications with stringent latency requirements on such multiprocessor platforms is challenging. Our work is

  16. On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management

    NARCIS (Netherlands)

    ter Braak, T.D.; Burgess, S.T.; Hurskainen, H.; Kerkhoff, Hans G.; Vermeulen, B.; Zhang, X.

    2010-01-01

    This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line

  17. Abstractions for aperiodic multiprocessor scheduling of real-time stream processing applications

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.

    2015-01-01

    Embedded multiprocessor systems are often used in the domain of real-time stream processing applications to keep up with increasing power and performance requirements. Examples of such real-time stream processing applications are digital radio baseband processing and WLAN transceivers. These stream

  18. HAPI: An event-driven simulator for real-time multiprocessor systems

    NARCIS (Netherlands)

    Kurtin, Philip Sebastian; Hausmans, J.P.H.M.; Bekooij, Marco Jan Gerrit

    2016-01-01

    Many embedded multiprocessor systems have hard real-time requirements which should be guaranteed at design time by means of analytical techniques that cover all cases. It is desirable to evaluate the correctness and tightness of the analysis results by means of simulation. However, verification of

  19. Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

    KAUST Repository

    Chávez, Gustavo

    2017-12-15

    We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has O(kN log N (log N + k^2)) arithmetic complexity and O(kN log N) memory footprint, where N is the number of degrees of freedom and k is the rank of a block in the hierarchical approximation, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method with and without hierarchical semi-separable matrices, with algebraic multigrid and with the classic cyclic reduction method. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and size of the factors, with substantially lower numerical ranks. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides. Numerical experiments show that the ranks k are O(1) for the Poisson equation and O(n) for the indefinite Helmholtz equation. The solver is ideal in situations where low-accuracy solutions are sufficient, or otherwise as a preconditioner within an iterative method.
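
    For orientation, the classic (scalar) cyclic reduction that ACR generalizes with hierarchical-matrix block arithmetic can be sketched as follows. The serial code below assumes n = 2**m - 1 unknowns and no pivoting purely for brevity; it is not the distributed ACR solver.

      import numpy as np

      def cyclic_reduction(a, b, c, d):
          """Solve a tridiagonal system with sub-diagonal a (a[0] ignored),
          diagonal b, super-diagonal c (c[-1] ignored) and right-hand side d."""
          a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
          a[0] = 0.0
          c[-1] = 0.0
          n = len(b)
          m = int(round(np.log2(n + 1)))
          # Forward reduction: each pass eliminates every other remaining unknown,
          # doubling the coupling distance of the surviving equations.
          stride = 1
          for _ in range(m - 1):
              for i in range(2 * stride - 1, n, 2 * stride):
                  alpha = a[i] / b[i - stride]
                  gamma = c[i] / b[i + stride]
                  b[i] -= alpha * c[i - stride] + gamma * a[i + stride]
                  d[i] -= alpha * d[i - stride] + gamma * d[i + stride]
                  a[i] = -alpha * a[i - stride]
                  c[i] = -gamma * c[i + stride]
              stride *= 2
          # Back substitution from the single remaining equation downwards.
          x = np.zeros(n + 2)                     # ghost zeros at both ends
          for level in range(m - 1, -1, -1):
              stride = 2 ** level
              for i in range(stride - 1, n, 2 * stride):
                  x[i + 1] = (d[i] - a[i] * x[i - stride + 1]
                              - c[i] * x[i + stride + 1]) / b[i]
          return x[1:-1]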

  20. Advanced compilation techniques in the PARADIGM compiler for distributed-memory multicomputers

    Science.gov (United States)

    Su, Ernesto; Lain, Antonio; Ramaswamy, Shankar; Palermo, Daniel J.; Hodges, Eugene W., IV; Banerjee, Prithviraj

    1995-01-01

    The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. A previous implementation of the compiler based on the PTD representation allowed symbolic array sizes, affine loop bounds and array subscripts, and a variable number of processors, provided that arrays were single or multi-dimensionally block distributed. The techniques presented here extend the compiler to also accept multidimensional cyclic and block-cyclic distributions within a uniform symbolic framework. These extensions demand more sophisticated symbolic manipulation capabilities. A novel aspect of our approach is to meet this demand by interfacing PARADIGM with a powerful off-the-shelf symbolic package, Mathematica. This paper describes some of the Mathematica routines that perform various transformations, shows how they are invoked and used by the compiler to overcome the new challenges, and presents experimental results for code involving cyclic and block-cyclic arrays as evidence of the feasibility of the approach.
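
    The block-cyclic distribution handled by these extensions can be made concrete with a small ownership map. The sketch below is illustrative only (it is not PARADIGM's implementation); the function name and parameters are assumptions made for the example.

      def block_cyclic_owner(i, block_size, num_procs):
          """Owner and local index of global element i under a 1-D block-cyclic
          distribution. Blocks of `block_size` elements are dealt to processors
          round-robin; block_size = ceil(N / P) recovers a pure block
          distribution and block_size = 1 a pure cyclic one."""
          block = i // block_size                  # global block containing i
          proc = block % num_procs                 # blocks are dealt round-robin
          local_block = block // num_procs         # my blocks preceding this one
          local_index = local_block * block_size + i % block_size
          return proc, local_index

      # Example: 10 elements, blocks of 2, 3 processors
      # [block_cyclic_owner(i, 2, 3) for i in range(10)]
      # -> [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), (0, 2), (0, 3), (1, 2), (1, 3)]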

  1. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    Science.gov (United States)

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
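
    The core operation that Kmerind distributes is plain k-mer counting. The serial sketch below illustrates that operation only; it is not the library's API, and in the distributed setting each k-mer is hashed to an MPI rank so that every rank counts a disjoint subset.

      from collections import Counter

      def count_kmers(sequence, k):
          """Count every length-k substring (k-mer) of a sequence."""
          sequence = sequence.upper()
          return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

      # count_kmers("ACGTACGT", 3) -> {'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1}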

  2. Validation of fault-free behavior of a reliable multiprocessor system - FTMP: A case study. [Fault-Tolerant Multi-Processor avionics

    Science.gov (United States)

    Clune, E.; Segall, Z.; Siewiorek, D.

    1984-01-01

    A program of experiments has been conducted at NASA-Langley to test the fault-free performance of a Fault-Tolerant Multiprocessor (FTMP) avionics system for next-generation aircraft. Baseline measurements of an operating FTMP system were obtained with respect to the following parameters: instruction execution time, frame size, and the variation of clock ticks. The mechanisms of frame stretching were also investigated. The experimental results are summarized in a table. Areas of interest for future tests are identified, with emphasis given to the implementation of a synthetic workload generation mechanism on FTMP.

  3. Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

    Directory of Open Access Journals (Sweden)

    Yahya Jan

    2012-01-01

    Full Text Available This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrated that for massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and the influence of the memory and communication network on both throughput and circuit area dominates that of the processors. To resolve these problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures through exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed some data distribution and related data mapping schemes in the shared (global) partitioned memories with the aim to eliminate memory access conflicts, as well as to ensure that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well-scalable memory and communication architectures even for high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained when using hierarchical communication networks instead of flat networks. However, for high parallelism levels only the partitioned approach ensures scalability in performance.

  4. Resource Allocation Model for Modelling Abstract RTOS on Multiprocessor System-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2003-01-01

    Resource Allocation is an important problem in RTOS's, and has been an active area of research. Numerous approaches have been developed and many different techniques have been combined for a wide range of applications. In this paper, we address the problem of resource allocation in the context...... of modelling an abstract RTOS on multiprocessor SoC platforms. We discuss the implementation details of a simplified basic priority inheritance protocol for our abstract system model in SystemC....

  5. Architecture, On-Chip Network and Programming Interface Concept for Multiprocessor System-on-Chip

    OpenAIRE

    Samman, Faizal Arya

    2017-01-01

    in Proc. of the International Conference on Smart Green Technology in Electrical and Information Systems (ICSGTEIS), 2016, published in IEEE Xplore (indexed by SCOPUS) This paper presents a system architecture, data communication scheme and application programming interface model or concept for a multiprocessor system based on a network-on-chip (NoC) platform. Each processing node connected to a mesh node has its own local (instruction and data) memory portion, and a global (shared) memor...

  6. Dynamic scheduling and analysis of real time systems with multiprocessors

    Directory of Open Access Journals (Sweden)

    M.D. Nashid Anjum

    2016-08-01

    Full Text Available This research work considers a scenario of cloud computing job-shop scheduling problems. We consider m real-time jobs with various lengths and n machines with different computational speeds and costs. Each job has a deadline to be met, and the profit of processing a packet differs from job to job. Moreover, deadlines are either hard or soft, and a penalty, modeled as an exponential function of time, is applied when a deadline is missed. The scheduling problem has been formulated as a mixed integer non-linear programming problem whose objective is to maximize net-profit. The formulated problem is computationally hard and not solvable in deterministic polynomial time. This research work proposes an algorithm named the Tube-tap algorithm as a solution to this scheduling optimization problem. Extensive simulation shows that the proposed algorithm outperforms existing solutions in terms of maximizing net-profit and preserving deadlines.
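
    The abstract does not state the formulation explicitly; one plausible reading of a net-profit objective with exponential lateness penalties, in notation introduced here purely for illustration, is

      \max \; \sum_{j=1}^{m} p_j \;-\; \sum_{j=1}^{m} w_j \left( e^{\alpha T_j} - 1 \right),
      \qquad T_j = \max(0,\, C_j - d_j),

    where p_j is the profit collected for job j, C_j its completion time, d_j its deadline, T_j its tardiness, and w_j and \alpha are penalty parameters; hard deadlines would enter as constraints T_j = 0 rather than as penalty terms.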

  7. Aperiodic Multiprocessor Scheduling for Real-Time Stream Processing Applications

    NARCIS (Netherlands)

    Wiggers, M.H.

    2009-01-01

    This thesis is concerned with the computation of buffer capacities that guarantee satisfaction of timing and resource constraints for task graphs with aperiodic task execution rates that are executed on run-time scheduled resources. Stream processing applications such as digital radio baseband

  8. Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

    NARCIS (Netherlands)

    Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

  9. Multiprocessor Scheduling for Hard Real-Time Software

    Science.gov (United States)

    1990-06-01

    Only fragments of this record's abstract survive extraction from the report documentation page. The recoverable pieces indicate work at the Naval Postgraduate School on a user interface that helps the designer propagate the effects of a change, consisting of a syntax-directed editor with graphics capabilities and an expert system, and on a task model in which a graph G of n nodes represents jobs and directed arcs represent ordering restrictions among them.

  10. Systems analysis of the space shuttle. [communication systems, computer systems, and power distribution

    Science.gov (United States)

    Schilling, D. L.; Oh, S. J.; Thau, F.

    1975-01-01

    Developments in communications systems, computer systems, and power distribution systems for the space shuttle are described. The use of high speed delta modulation for bit rate compression in the transmission of television signals is discussed. Simultaneous Multiprocessor Organization, an approach to computer organization, is presented. Methods of computer simulation and automatic malfunction detection for the shuttle power distribution system are also described.

  11. A survey of Tumult, a real-time multi-processor system

    International Nuclear Information System (INIS)

    Jansen, P.G.

    1986-01-01

    Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)

  12. A system-level multiprocessor system-on-chip modeling framework

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2004-01-01

    We present a system-level modeling framework to model system-on-chips (SoC) consisting of heterogeneous multiprocessors and network-on-chip communication structures in order to enable the developers of today's SoC designs to take advantage of the flexibility and scalability of network-on-chip...... and rapidly explore high-level design alternatives to meet their system requirements. We present a modeling approach for developing high-level performance models for these SoC designs and outline how this system-level performance analysis capability can be integrated into an overall environment for efficient...

  13. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Science.gov (United States)

    Ohmacht, Martin

    2014-09-09

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  14. A high speed multi-tasking, multi-processor telemetry system

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)

    1996-12-31

    This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates is also given. 11 refs., 7 figs., 4 tabs.

  15. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    Science.gov (United States)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
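
    The d-stage structure described above follows the standard hypercube dimension-exchange pattern, sketched below with mpi4py. The function and the merge_components argument are hypothetical placeholders; the GCCL-specific merging logic of the QUAD and Overlap Quad algorithms is not reproduced here.

      from mpi4py import MPI

      def hypercube_exchange(local_sets, merge_components):
          """One exchange-and-merge step per hypercube dimension (d stages)."""
          comm = MPI.COMM_WORLD
          rank = comm.Get_rank()
          d = comm.Get_size().bit_length() - 1        # assumes 2**d processors
          for dim in range(d):
              partner = rank ^ (1 << dim)             # neighbour across dimension dim
              received = comm.sendrecv(local_sets, dest=partner, source=partner)
              local_sets = merge_components(local_sets, received)
          return local_sets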

  16. A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.; Cho, Z.H.

    1990-01-01

    In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from a parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two-stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme, which is shown to be faster than the conventional algorithm, is parallelized.

  17. Stereo Vision and 3D Reconstruction on a Distributed Memory System

    NARCIS (Netherlands)

    Kuijpers, N.H.L.; Paar, G.; Lukkien, J.J.

    1996-01-01

    An important research topic in image processing is stereo vision. The objective is to compute a 3-dimensional representation of some scenery from two 2-dimensional digital images. Constructing a 3-dimensional representation involves finding pairs of pixels from the two images which correspond to the

  18. Capacity for patterns and sequences in Kanerva's SDM as compared to other associative memory models. [Sparse, Distributed Memory

    Science.gov (United States)

    Keeler, James D.

    1988-01-01

    The information capacity of Kanerva's Sparse Distributed Memory (SDM) and Hopfield-type neural networks is investigated. Under the approximations used here, it is shown that the total information stored in these systems is proportional to the number of connections in the network. The proportionality constant is the same for the SDM and Hopfield-type models, independent of the particular model or its order. The approximations are checked numerically. This same analysis can be used to show that the SDM can store sequences of spatiotemporal patterns, and the addition of time-delayed connections allows the retrieval of context-dependent temporal patterns. A minor modification of the SDM can be used to store correlated patterns.
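
    A minimal write/read sketch of the SDM, with illustrative parameters, makes the notion of "connections" concrete: the stored information is bounded by the m x n array of counters. This is a simplified illustration, not the exact model analyzed in the paper.

      import numpy as np

      rng = np.random.default_rng(0)

      class SparseDistributedMemory:
          """Kanerva-style SDM with m hard locations, n-bit words, radius r."""
          def __init__(self, m, n, r):
              self.addresses = rng.integers(0, 2, size=(m, n))  # fixed hard locations
              self.counters = np.zeros((m, n), dtype=int)       # the m*n "connections"
              self.r = r

          def _active(self, x):
              # hard locations within Hamming distance r of the address x
              return np.count_nonzero(self.addresses != x, axis=1) <= self.r

          def write(self, x, w):
              self.counters[self._active(x)] += 2 * np.asarray(w) - 1  # +1 / -1 per bit

          def read(self, x):
              sums = self.counters[self._active(x)].sum(axis=0)
              return (sums > 0).astype(int)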

  19. SNAIL: A multiprocessor based on the simple serial synchronized multistage interconnection network architecture

    Energy Technology Data Exchange (ETDEWEB)

    Sasahara, M.; Terada, J.; Zhou, L.; Gaye, K.; Yamato, J.; Ogura, S.; Amano, H. [Keio Univ., Yokohama (Japan)

    1994-12-31

    Simple Serial Synchronized (SSS) Multistage Interconnection Network (MIN) is a novel MIN architecture for connecting processors and memory modules in multiprocessors. Synchronized bit-serial communication simplifies the structure/control, and also solves the pin-limitation problem. Here, design, implementation, and evaluation of a multiprocessor prototype called SNAIL with the SSS-MIN are presented. The heart of SNAIL is the prototype 1 μm CMOS SSS-MIN gate array chip, which exchanges packets from 16 inputs with a 50 MHz clock. Message combining is implemented with only a 20% increase in hardware. From the empirical evaluation with some application programs, it appears that the latency and synchronization overhead of the SSS-MIN are tolerable, and the bandwidth of the SSS-MIN is sufficient. Although the performance improvement with bit-serial message combining is not so large (1%) when instructions are stored in the local memory, it becomes up to 400% when instructions are stored in the shared memory.

  20. Extreme Scale Computing Studies

    Science.gov (United States)

    2010-12-01

    partitioned address space. The second category of departmental systems is composed of clusters of PC boxes (often called Beowulf systems), and at first...message passing models as distributed memory machines including low-cost Beowulf clusters became the architecture of choice. In each case, there were...an expert in the field of parallel computer system architecture and parallel programming methods. Dr. Sterling led the Beowulf Project that performed

  1. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the Force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the Force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  2. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm, which dates back to 1969, was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However, this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude, making the contribution of communication to the overall execution time more significant. Therefore, state-of-the-art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
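
    The panel (outer-product) structure that SUMMA broadcasts, and that HSUMMA organizes into a two-level hierarchy, can be sketched serially as follows; the function and panel width are illustrative, and the row/column broadcasts of the 2-D processor grid are only indicated in the comment.

      import numpy as np

      def summa_like_multiply(A, B, panel=64):
          """Accumulate C one rank-`panel` update at a time (serial emulation)."""
          n = A.shape[1]
          C = np.zeros((A.shape[0], B.shape[1]))
          for k in range(0, n, panel):
              kb = slice(k, min(k + panel, n))
              # In SUMMA, A[:, kb] is broadcast along processor rows and
              # B[kb, :] along processor columns before this local update.
              C += A[:, kb] @ B[kb, :]
          return C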

  3. Multicore Processing and ARTEMIS - An incentive to develop the European Multiprocessor research

    DEFF Research Database (Denmark)

    Seceleanu, Tiberius; Tenhunen, Hannu; Jerraya, Ahmed

    2006-01-01

    in DS and SOC, and other related domains. Primarily motivated by market concerns, and also by the promises of the available billion transistor technology, MPSOC is increasingly becoming the preferred target for embedded systems (ES) implementations. Furthermore, the possibility to fit a huge number...... in the recent period that technological advances allow for a change of this paradigm towards on-chip distributed platforms, or multi-core, or multiprocessor system-on-chip (MPSOC). A multiprocessor architecture may be defined as: on-chip clusters of heterogeneous functionality modules, cooperating...... in the implementation of multiple concurrent applications. Architecturally, MPSOC combines characteristics from both distributed (DS) and on-chip systems (SOC). However, addressing issues from either one of these latter paradigms will not necessarily bring optimal benefits to MPSOC. For instance, in MPSOC, differently...

  4. Role of WDM optical interconnections in a distributed shared-memory multiprocessor

    Science.gov (United States)

    Ghose, Kanad; Singhvi, Nitin K.; Horsell, R. K.

    1993-07-01

    This paper explores the use of optical spanning bus interconnections using optical fiber links and wavelength division multiplexing (WDM) with statically assigned channels for implementing distributed shared memory multiprocessors (DSMM). The WDM optical links allow processor synchronization and coherent caches -- two very important requirements for a DSMM -- to be implemented efficiently. Simultaneous broadcasts possible on several channels along a WDM photonic link allow barrier synchronizations to be fast and scalable. The large bandwidth capability of WDM optical links and their ability to accommodate simultaneously active channels permit a write-update-based cache coherence protocol to be implemented. Our proposed cache coherence protocol is sensitive to the relatively longer propagation delays along optic fiber links and possible mismatch in speeds between the electronic and photonic components. The protocol is hierarchical, reflecting the hierarchy in the interconnection, making it easily scalable with system size. In addition, the proposed protocol performs opportunistic request combining, a pleasant side-effect of using optical links.

  5. Commodity multi-processor systems in the ATLAS level-2 trigger

    International Nuclear Information System (INIS)

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-01-01

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems

  6. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Kamil, Shoaib A; Hendry, Gilbert; Biberman, Aleksandr; Chan, Johnnie; Lee, Benjamin G.; Mohiyuddin, Marghoob; Jain, Ankit; Bergman, Keren; Carloni, Luca; Kubiatowicz, John; Oliker, Leonid; Shalf, John

    2009-01-31

    As multiprocessors scale to unprecedented numbers of cores in order to sustain performance growth, it is vital that these gains are not nullified by high energy consumption from inter-core communication. With recent advances in 3D Integration CMOS technology, the possibility for realizing hybrid photonic-electronic networks-on-chip warrants investigating real application traces on functionally comparable photonic and electronic network designs. We present a comparative analysis using both synthetic benchmarks as well as real applications, run through detailed cycle accurate models implemented under the OMNeT++ discrete event simulation environment. Results show that when utilizing standard process-to-processor mapping methods, this hybrid network can achieve 75X improvement in energy efficiency for synthetic benchmarks and up to 37X improvement for real scientific applications, defined as network performance per energy spent, over an electronic mesh for large messages across a variety of communication patterns.

  7. Commodity multi-processor systems in the ATLAS level-2 trigger

    CERN Document Server

    Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

    2000-01-01

    Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).

  8. Performance Analysis of a Reconfigurable Shared Memory Multiprocessor System for Embedded Applications

    Directory of Open Access Journals (Sweden)

    Darcy Cook

    2014-11-01

    Full Text Available This paper presents a method to predict performance of multiple processor cores in a reconfigurable system for embedded applications. A multiprocessor framework is developed with the capability of reconfigurable processors in a shared memory system optimized for stream-oriented data and signal processing applications. The framework features a discrete-time Markov-based stochastic tool, which is used to analyze memory contention in the shared memory architecture, and to predict the performance increase (speed of execution) when the number of processors is varied. Performance predictions for variations of other system parameters, such as different task allocations and the number of pipeline stages, are possible as well. The results of the prediction tool were verified by experimental results of a green screen application developed and run on a Xilinx Virtex-II Pro FPGA with MicroBlaze soft processors.

  9. SYMNET: An Optical Interconnection Network for Scalable High-Performance Symmetric Multiprocessors

    Science.gov (United States)

    Louri, Ahmed; Karanth Kodi, Avinash

    2003-06-01

    We address the primary limitation of the bandwidth available to satisfy the demands for address transactions in future cache-coherent symmetric multiprocessors (SMPs). It is widely known that the bus speed and the coherence overhead limit the snoop/address bandwidth needed to broadcast address transactions to all processors. As a solution, we propose a scalable address subnetwork called the symmetric multiprocessor network (SYMNET) in which address requests and snoop responses of SMPs are implemented optically. SYMNET not only has the ability to pipeline address requests, but also multiple address requests from different processors can propagate through the address subnetwork simultaneously. This is in contrast with all electrical bus-based SMPs, where only a single request is broadcast on the physical address bus at any given point in time. The simultaneous propagation of multiple address requests in SYMNET increases the available address bandwidth and lowers the latency of the network, but the preservation of cache coherence can no longer be maintained with the usual fast snooping protocols. A modified snooping cache-coherence protocol, coherence in SYMNET (COSYM), is introduced to solve the coherence problem. We evaluated SYMNET with a subset of Splash-2 benchmarks and compared it with the electrical bus-based MOESI (modified, owned, exclusive, shared, invalid) protocol. Our simulation studies have shown a 5-66% improvement in execution time for COSYM as compared with MOESI for various applications. Simulations have also shown that the average latency for a transaction to complete by use of the COSYM protocol was 5-78% better than the MOESI protocol. SYMNET can scale up to hundreds of processors while still using fast snooping-based cache-coherence protocols, and additional performance gains may be attained with further improvement in optical device technology.

  10. Some methods of encoding simple visual images for use with a sparse distributed memory, with applications to character recognition

    Science.gov (United States)

    Jaeckel, Louis A.

    1989-01-01

    To study the problems of encoding visual images for use with a Sparse Distributed Memory (SDM), I consider a specific class of images: those that consist of several pieces, each of which is a line segment or an arc of a circle. This class includes line drawings of characters such as letters of the alphabet. I give a method of representing a segment of an arc by five numbers in a continuous way; that is, similar arcs have similar representations. I also give methods for encoding these numbers as bit strings in an approximately continuous way. The set of possible segments and arcs may be viewed as a five-dimensional manifold M, whose structure is like a Möbius strip. An image, considered to be an unordered set of segments and arcs, is therefore represented by a set of points in M - one for each piece. I then discuss the problem of constructing a preprocessor to find the segments and arcs in these images, although a preprocessor has not been developed. I also describe a possible extension of the representation.

  11. Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.

    1995-01-01

    The EM algorithm promises a maximum-likelihood estimate of the image for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication.
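
    The iteration being parallelized is the ML-EM update for emission tomography, written here in generic notation (which may differ from the paper's):

      \lambda_j^{(t+1)} \;=\; \frac{\lambda_j^{(t)}}{\sum_i a_{ij}} \sum_i a_{ij}\, \frac{y_i}{\sum_k a_{ik}\, \lambda_k^{(t)}}

    where \lambda_j is the current emission estimate in voxel j, y_i the measured counts in detector pair i, and a_{ij} the probability that an emission in voxel j is detected in pair i. The sums over i and k are the projection and backprojection steps whose shared data the replication scheme distributes across processors.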

  12. Memory management and compiler support for rapid recovery from failures in computer systems

    Science.gov (United States)

    Fuchs, W. K.

    1991-01-01

    This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.

  13. Parallel optical interconnects may reduce the communication bottleneck in symmetric multiprocessors

    Science.gov (United States)

    Henri Collet, Jacques; Hlayhel, Wissam; Litaize, Daniel

    2001-07-01

    We start with a detailed analysis of the communication issues in today's symmetric multiprocessor (SMP) architectures to study the benefits of implementing optical interconnects (OI) in these machines. We show that the transmission of block addresses is the most critical communication bottleneck of future large SMPs owing to the need to preserve the coherence of data duplicated in caches. An address transmission bandwidth as high as 200-300 Gb/s may be necessary ten years from now; this requirement will represent a difficult challenge for shared electrical buses. In this context we suggest the introduction of simple point-to-point OIs for an SMP cache-coherent switch, i.e., for a VLSI switch that would emulate the shared-bus function. The operation might require as many as 10,000 input-outputs (IOs) to connect 100 processors, particularly if one maintains the present parallelism of transmissions to preserve a large bandwidth and a short memory access latency. The interest in OIs comes from the potential increase of the transmission frequency and from the possible integration of such a high density of IOs on top of electronic chips to overcome packaging issues. Then we consider the implementation of an optical bus, that is, a multipoint optical line involving more optical technology. This solution allows multiple simultaneous accesses to the bus, but the preservation of the coherence of caches can no longer be maintained with the usual fast snooping protocols.

  14. The design of linear algebra libraries for high performance computers

    Energy Technology Data Exchange (ETDEWEB)

    Dongarra, J.J. [Tennessee Univ., Knoxville, TN (United States). Dept. of Computer Science]|[Oak Ridge National Lab., TN (United States); Walker, D.W. [Oak Ridge National Lab., TN (United States)

    1993-08-01

    This paper discusses the design of linear algebra libraries for high performance computers. Particular emphasis is placed on the development of scalable algorithms for MIMD distributed memory concurrent computers. A brief description of the EISPACK, LINPACK, and LAPACK libraries is given, followed by an outline of ScaLAPACK, which is a distributed memory version of LAPACK currently under development. The importance of block-partitioned algorithms in reducing the frequency of data movement between different levels of hierarchical memory is stressed. The use of such algorithms helps reduce the message startup costs on distributed memory concurrent computers. Other key ideas in our approach are the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms (BLAS) as computational building blocks, and the use of Basic Linear Algebra Communication Subprograms (BLACS) as communication building blocks. Together the distributed BLAS and the BLACS can be used to construct higher-level algorithms, and hide many details of the parallelism from the application developer. The block-cyclic data distribution is described, and adopted as a good way of distributing block-partitioned matrices. Block-partitioned versions of the Cholesky and LU factorizations are presented, and optimization issues associated with the implementation of the LU factorization algorithm on distributed memory concurrent computers are discussed, together with its performance on the Intel Delta system. Finally, approaches to the design of library interfaces are reviewed.
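
    The block-partitioned idea can be illustrated with a right-looking blocked Cholesky factorization built from Level-3-style operations. The sketch below is serial and illustrative only; it is not ScaLAPACK code, and the block size is an assumption for the example.

      import numpy as np
      from scipy.linalg import solve_triangular

      def blocked_cholesky(A, b=64):
          """Return the lower Cholesky factor of A using b-by-b blocks."""
          A = A.copy()
          n = A.shape[0]
          for k in range(0, n, b):
              e = min(k + b, n)
              A[k:e, k:e] = np.linalg.cholesky(A[k:e, k:e])     # factor diagonal block
              if e < n:
                  # panel update (TRSM-like): A21 := A21 * L11^{-T}
                  A[e:, k:e] = solve_triangular(A[k:e, k:e], A[e:, k:e].T, lower=True).T
                  # trailing-matrix update (SYRK/GEMM-like)
                  A[e:, e:] -= A[e:, k:e] @ A[e:, k:e].T
          return np.tril(A)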

  15. Efficient computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, David C.; Murthy, Durbha V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effect of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.

  16. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  17. Embedded software design and programming of multiprocessor system-on-chip simulink and system C case studies

    CERN Document Server

    Popovici, Katalin; Jerraya, Ahmed A; Wolf, Marilyn

    2010-01-01

    Current multimedia and telecom applications require complex, heterogeneous multiprocessor system on chip (MPSoC) architectures with specific communication infrastructure in order to achieve the required performance. Heterogeneous MPSoC includes different types of processing units (DSP, microcontroller, ASIP) and different communication schemes (fast links, non standard memory organization and access).Programming an MPSoC requires the generation of efficient software running on MPSoC from a high level environment, by using the characteristics of the architecture. This task is known to be tediou

  18. Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

    DEFF Research Database (Denmark)

    Sparsø, Jens

    2012-01-01

    This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...

  19. Computer Architecture A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  20. COMPUTING

    CERN Document Server

    M. Kasemann

    Overview In autumn the main focus was to process and handle CRAFT data and to perform the Summer08 MC production. The operational aspects were well covered by regular Computing Shifts, experts on duty and Computing Run Coordination. At the Computing Resource Board (CRB) in October a model to account for service work at Tier 2s was approved. The computing resources for 2009 were reviewed for presentation at the C-RRB. The quarterly resource monitoring is continuing. Facilities/Infrastructure operations Operations during CRAFT data taking ran fine. This proved to be a very valuable experience for T0 workflows and operations. The transfers of custodial data to most T1s went smoothly. A first round of reprocessing started at the Tier-1 centers end of November; it will take about two weeks. The Computing Shifts procedure was tested full scale during this period and proved to be very efficient: 30 Computing Shifts Persons (CSP) and 10 Computing Resources Coordinators (CRC). The shift program for the shut down w...

  1. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction CMS distributed computing system performed well during the 2011 start-up. The events in 2011 have more pile-up and are more complex than last year; this results in longer reconstruction times and harder events to simulate. Significant increases in computing capacity were delivered in April for all computing tiers, and the utilisation and load is close to the planning predictions. All computing centre tiers performed their expected functionalities. Heavy-Ion Programme The CMS Heavy-Ion Programme had a very strong showing at the Quark Matter conference. A large number of analyses were shown. The dedicated heavy-ion reconstruction facility at the Vanderbilt Tier-2 is still involved in some commissioning activities, but is available for processing and analysis. Facilities and Infrastructure Operations Facility and Infrastructure operations have been active with operations and several important deployment tasks. Facilities participated in the testing and deployment of WMAgent and WorkQueue+Request...

  2. COMPUTING

    CERN Multimedia

    M. Kasemann

    Overview During the past three months activities were focused on data operations, testing and re-enforcing shift and operational procedures for data production and transfer, MC production and on user support. Planning of the computing resources in view of the new LHC calendar is ongoing. Two new task forces were created for supporting the integration work: Site Commissioning, which develops tools helping distributed sites to monitor job and data workflows, and Analysis Support, collecting the user experience and feedback during analysis activities and developing tools to increase efficiency. The development plan for DMWM for 2009/2011 was developed at the beginning of the year, based on the requirements from the Physics, Computing and Offline groups (see Offline section). The Computing management meeting at FermiLab on February 19th and 20th was an excellent opportunity for discussing the impact of, and for addressing issues and solutions to, the main challenges facing CMS computing. The lack of manpower is particul...

  3. COMPUTING

    CERN Multimedia

    P. McBride

    The Computing Project is preparing for a busy year where the primary emphasis of the project moves towards steady operations. Following the very successful completion of Computing Software and Analysis challenge, CSA06, last fall, we have reorganized and established four groups in computing area: Commissioning, User Support, Facility/Infrastructure Operations and Data Operations. These groups work closely together with groups from the Offline Project in planning for data processing and operations. Monte Carlo production has continued since CSA06, with about 30M events produced each month to be used for HLT studies and physics validation. Monte Carlo production will continue throughout the year in the preparation of large samples for physics and detector studies ramping to 50 M events/month for CSA07. Commissioning of the full CMS computing system is a major goal for 2007. Site monitoring is an important commissioning component and work is ongoing to devise CMS specific tests to be included in Service Availa...

  4. COMPUTING

    CERN Multimedia

    I. Fisk

    2013-01-01

    Computing activity had ramped down after the completion of the reprocessing of the 2012 data and parked data, but is increasing with new simulation samples for analysis and upgrade studies. Much of the Computing effort is currently involved in activities to improve the computing system in preparation for 2015. Operations Office Since the beginning of 2013, the Computing Operations team successfully re-processed the 2012 data in record time, not only by using opportunistic resources like the San Diego Supercomputer Center which was accessible, to re-process the primary datasets HTMHT and MultiJet in Run2012D much earlier than planned. The Heavy-Ion data-taking period was successfully concluded in February collecting almost 500 T. Figure 3: Number of events per month (data) In LS1, our emphasis is to increase efficiency and flexibility of the infrastructure and operation. Computing Operations is working on separating disk and tape at the Tier-1 sites and the full implementation of the xrootd federation ...

  5. Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

    Science.gov (United States)

    Russo, Vincent; Johnston, Gary; Campbell, Roy

    1988-01-01

    The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor requires both efficient code in order to support the needs of high-performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture using the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.

  6. Dynamic grid refinement for partial differential equations on parallel computers

    International Nuclear Information System (INIS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems. 6 refs

  7. COMPUTING

    CERN Multimedia

    M. Kasemann P. McBride Edited by M-C. Sawley with contributions from: P. Kreuzer D. Bonacorsi S. Belforte F. Wuerthwein L. Bauerdick K. Lassila-Perini M-C. Sawley

    Introduction More than seventy CMS collaborators attended the Computing and Offline Workshop in San Diego, California, April 20-24th to discuss the state of readiness of software and computing for collisions. Focus and priority were given to preparations for data taking and providing room for ample dialog between groups involved in Commissioning, Data Operations, Analysis and MC Production. Throughout the workshop, aspects of software, operating procedures and issues addressing all parts of the computing model were discussed. Plans for the CMS participation in STEP’09, the combined scale testing for all four experiments due in June 2009, were refined. The article in CMS Times by Frank Wuerthwein gave a good recap of the highly collaborative atmosphere of the workshop. Many thanks to UCSD and to the organizers for taking care of this workshop, which resulted in a long list of action items and was definitely a success. A considerable amount of effort and care is invested in the estimate of the comput...

  8. COMPUTING

    CERN Multimedia

    I. Fisk

    2010-01-01

    Introduction It has been a very active quarter in Computing with interesting progress in all areas. The activity level at the computing facilities, driven by both organised processing from data operations and user analysis, has been steadily increasing. The large-scale production of simulated events that has been progressing throughout the fall is wrapping-up and reprocessing with pile-up will continue. A large reprocessing of all the proton-proton data has just been released and another will follow shortly. The number of analysis jobs by users each day, that was already hitting the computing model expectations at the time of ICHEP, is now 33% higher. We are expecting a busy holiday break to ensure samples are ready in time for the winter conferences. Heavy Ion An activity that is still in progress is computing for the heavy-ion program. The heavy-ion events are collected without zero suppression, so the event size is much larger at roughly 11 MB per event of RAW. The central collisions are more complex and...

  9. Hybrid Simulation of the Interaction of Europa's Atmosphere with the Jovian Plasma: Multiprocessor Simulations

    Science.gov (United States)

    Dols, V. J.; Delamere, P. A.; Bagenal, F.; Cassidy, T. A.; Crary, F. J.

    2014-12-01

    We model the interaction of Europa's tenuous atmosphere with the plasma of Jupiter's torus with an improved version of our hybrid plasma code. In a hybrid plasma code, the ions are treated as kinetic macro-particles moving under the Lorentz force and the electrons as a fluid, leading to a generalized formulation of Ohm's law. In this version, the spatial simulation domain is decomposed in two directions and is non-uniform in the plasma convection direction. The code is run on a multi-processor supercomputer that offers 16416 cores and 2 GB RAM per core. This new version allows us to tap into the large memory of the supercomputer and simulate the full interaction volume (R_Europa = 1561 km) with a high spatial resolution (50 km). Compared to Io, Europa's atmosphere is about 100 times more tenuous, the ambient magnetic field is weaker and the density of incident plasma is lower. Consequently, the electrodynamic interaction is also weaker and substantial fluxes of thermal torus ions might reach and sputter the icy surface. Molecular O2 is the dominant atmospheric product of this surface sputtering. Observations of oxygen UV emissions (specifically the ratio of OI 1356 Å / 1304 Å emissions) are roughly consistent with an atmosphere that is composed predominantly of O2 with a small amount of atomic O. Galileo observations along flybys close to Europa have revealed the existence of induced currents in a conducting ocean under the icy crust. They also showed that, from flyby to flyby, the plasma interaction is very variable. Asymmetries of the plasma density and temperature in the wake of Europa were also observed and still elude a clear explanation. Galileo magnetometer data also detected ion cyclotron waves, which is an indication of heavy ion pickup close to the moon. We prescribe an O2 atmosphere with a vertical density column consistent with UV observations and model the plasma properties along several Galileo flybys of the moon. We compare our results with the magnetometer

  10. COMPUTING

    CERN Multimedia

    P. McBride

    It has been a very active year for the computing project with strong contributions from members of the global community. The project has focused on site preparation and Monte Carlo production. The operations group has begun processing data from P5 as part of the global data commissioning. Improvements in transfer rates and site availability have been seen as computing sites across the globe prepare for large scale production and analysis as part of CSA07. Preparations for the upcoming Computing Software and Analysis Challenge CSA07 are progressing. Ian Fisk and Neil Geddes have been appointed as coordinators for the challenge. CSA07 will include production tests of the Tier-0 production system, reprocessing at the Tier-1 sites and Monte Carlo production at the Tier-2 sites. At the same time there will be a large analysis exercise at the Tier-2 centres. Pre-production simulation of the Monte Carlo events for the challenge is beginning. Scale tests of the Tier-0 will begin in mid-July and the challenge it...

  11. COMPUTING

    CERN Multimedia

    M. Kasemann

    CCRC’08 challenges and CSA08 During the February campaign of the Common Computing readiness challenges (CCRC’08), the CMS computing team had achieved very good results. The link between the detector site and the Tier0 was tested by gradually increasing the number of parallel transfer streams well beyond the target. Tests covered the global robustness at the Tier0, processing a massive number of very large files and with a high writing speed to tapes.  Other tests covered the links between the different Tiers of the distributed infrastructure and the pre-staging and reprocessing capacity of the Tier1’s: response time, data transfer rate and success rate for Tape to Buffer staging of files kept exclusively on Tape were measured. In all cases, coordination with the sites was efficient and no serious problem was found. These successful preparations prepared the ground for the second phase of the CCRC’08 campaign, in May. The Computing Software and Analysis challen...

  12. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction It has been a very active quarter in Computing with interesting progress in all areas. The activity level at the computing facilities, driven by both organised processing from data operations and user analysis, has been steadily increasing. The large-scale production of simulated events that has been progressing throughout the fall is wrapping-up and reprocessing with pile-up will continue. A large reprocessing of all the proton-proton data has just been released and another will follow shortly. The number of analysis jobs by users each day, that was already hitting the computing model expectations at the time of ICHEP, is now 33% higher. We are expecting a busy holiday break to ensure samples are ready in time for the winter conferences. Heavy Ion The Tier 0 infrastructure was able to repack and promptly reconstruct heavy-ion collision data. Two copies were made of the data at CERN using a large CASTOR disk pool, and the core physics sample was replicated ...

  13. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction More than seventy CMS collaborators attended the Computing and Offline Workshop in San Diego, California, April 20-24th to discuss the state of readiness of software and computing for collisions. Focus and priority were given to preparations for data taking and providing room for ample dialog between groups involved in Commissioning, Data Operations, Analysis and MC Production. Throughout the workshop, aspects of software, operating procedures and issues addressing all parts of the computing model were discussed. Plans for the CMS participation in STEP’09, the combined scale testing for all four experiments due in June 2009, were refined. The article in CMS Times by Frank Wuerthwein gave a good recap of the highly collaborative atmosphere of the workshop. Many thanks to UCSD and to the organizers for taking care of this workshop, which resulted in a long list of action items and was definitely a success. A considerable amount of effort and care is invested in the estimate of the co...

  14. COMPUTING

    CERN Multimedia

    I. Fisk

    2012-01-01

    Introduction Computing continued with a high level of activity over the winter in preparation for conferences and the start of the 2012 run. 2012 brings new challenges with a new energy, more complex events, and the need to make the best use of the available time before the Long Shutdown. We expect to be resource constrained on all tiers of the computing system in 2012 and are working to ensure the high-priority goals of CMS are not impacted. Heavy ions After a successful 2011 heavy-ion run, the programme is moving to analysis. During the run, the CAF resources were well used for prompt analysis. Since then in 2012 on average 200 job slots have been used continuously at Vanderbilt for analysis workflows. Operations Office As of 2012, the Computing Project emphasis has moved from commissioning to operation of the various systems. This is reflected in the new organisation structure where the Facilities and Data Operations tasks have been merged into a common Operations Office, which now covers everything ...

  15. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction During the past six months, Computing participated in the STEP09 exercise, had a major involvement in the October exercise and has been working with CMS sites on improving open issues relevant for data taking. At the same time operations for MC production, real data reconstruction and re-reconstructions and data transfers at large scales were performed. STEP09 was successfully conducted in June as a joint exercise with ATLAS and the other experiments. It gave a good indication of the readiness of the WLCG infrastructure, with the two major LHC experiments stressing the reading, writing and processing of physics data. The October Exercise, in contrast, was conducted as an all-CMS exercise, where Physics, Computing and Offline worked on a common plan to exercise all steps to efficiently access and analyze data. As one of the major results, the CMS Tier-2s were shown to be fully capable of performing data analysis. In recent weeks, efforts were devoted to CMS Computing readiness. All th...

  16. COMPUTING

    CERN Multimedia

    I. Fisk

    2010-01-01

    Introduction The first data taking period of November produced a first scientific paper, and this is a very satisfactory step for Computing. It also gave the invaluable opportunity to learn and debrief from this first, intense period, and make the necessary adaptations. The alarm procedures between different groups (DAQ, Physics, T0 processing, Alignment/calibration, T1 and T2 communications) have been reinforced. A major effort has also been invested into remodeling and optimizing operator tasks in all activities in Computing, in parallel with the recruitment of new Cat A operators. The teams are being completed and by mid year the new tasks will have been assigned. CRB (Computing Resource Board) The Board met twice since last CMS week. In December it reviewed the experience of the November data-taking period and could measure the positive improvements made for the site readiness. It also reviewed the policy under which Tier-2 are associated with Physics Groups. Such associations are decided twice per ye...

  17. Program partitioning and scheduling for NUMA computer architectures

    Energy Technology Data Exchange (ETDEWEB)

    Wolski, R.M.

    1994-03-01

    To effect the parallel execution of a program on a multiprocessor, each of the program's constituent computations must be assigned to a processing resource within the multiprocessor. The problem of making this assignment so that execution time is minimized (known as the mapping problem) has been shown to be NP-complete. However, heuristics based on the performance characteristics of the target multiprocessor can yield execution times that approach the minimum possible. The mapping problem can be divided into the problem of partitioning the computations into sequential threads, and the problem of scheduling those threads on the processors of the target system. This dissertation presents a logical framework and a set of heuristics that operate within the framework for the automatic partitioning and scheduling of programs at compile-time. The framework is based on the memory-node execution model, which correctly captures the interaction between computations, processors, and the communication resources within a multiprocessor. The CP and HEF heuristics manipulate the features of the memory-node model to produce efficient program mappings. The effectiveness of the partitioning and scheduling techniques is investigated for Non-uniform Memory Access (NUMA) architecture types. To test the versatility of the approach, results are presented both for processors implementing strict execution semantics, and non-strict load/store semantics popular with RISC systems. The partitioner and scheduler are also used to investigate the possible advantages of multithreading (using either hardware or software), and the effectiveness of massively parallel systems, within a scientific programming context.
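
    For the scheduling half of the mapping problem described above, a greedy list-scheduling rule (assign each thread, in precedence order, to the processor giving the earliest finish time) is a common baseline. The sketch below is only that generic baseline, not the CP or HEF heuristics of the dissertation; all thread names and costs are invented for illustration.

      def list_schedule(threads, deps, cost, num_procs):
          """threads: ids in a valid topological order; deps: thread -> list of
          predecessors; cost: thread -> execution-time estimate.
          Returns thread -> (processor, start, finish)."""
          proc_free = [0.0] * num_procs        # when each processor becomes idle
          placed = {}
          for t in threads:
              ready = max((placed[p][2] for p in deps.get(t, [])), default=0.0)
              # choose the processor that yields the earliest finish time
              best = min(range(num_procs),
                         key=lambda p: max(proc_free[p], ready) + cost[t])
              start = max(proc_free[best], ready)
              proc_free[best] = start + cost[t]
              placed[t] = (best, start, proc_free[best])
          return placed

      print(list_schedule(["a", "b", "c", "d"],
                          {"c": ["a"], "d": ["b", "c"]},
                          {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0},
                          num_procs=2))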

  18. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

    Science.gov (United States)

    Afshar, Yaser; Sbalzarini, Ivo F.

    2016-01-01

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rates. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers collectively solve the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
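
    The core data-layout idea, splitting the image into sub-images with small overlapping borders so each worker can process its tile and later reconcile boundaries, can be illustrated with a short serial sketch. This is only a stand-in for the distributed implementation described in the abstract; split_with_halo and the tile counts are hypothetical.

      import numpy as np

      def split_with_halo(image, tiles_per_side, halo=1):
          """Cut a 2-D image into tiles_per_side x tiles_per_side sub-images,
          each extended by a halo of neighbouring pixels for boundary exchange."""
          h, w = image.shape
          th, tw = h // tiles_per_side, w // tiles_per_side
          tiles = []
          for i in range(tiles_per_side):
              for j in range(tiles_per_side):
                  r0, r1 = max(i * th - halo, 0), min((i + 1) * th + halo, h)
                  c0, c1 = max(j * tw - halo, 0), min((j + 1) * tw + halo, w)
                  tiles.append(((i, j), image[r0:r1, c0:c1].copy()))
          return tiles

      image = np.random.rand(1024, 1024)
      tiles = split_with_halo(image, tiles_per_side=4)
      # In the distributed setting, each (index, sub_image) pair would be sent to a
      # different computer, and the halo regions exchanged over the network.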

  19. COMPUTING

    CERN Multimedia

    Matthias Kasemann

    Overview The main focus during the summer was to handle data coming from the detector and to perform Monte Carlo production. The lessons learned during the CCRC and CSA08 challenges in May were addressed by dedicated PADA campaigns led by the Integration team. Big improvements were achieved in the stability and reliability of the CMS Tier1 and Tier2 centres by regular and systematic follow-up of faults and errors with the help of the Savannah bug tracking system. In preparation for data taking the roles of a Computing Run Coordinator and regular computing shifts monitoring the services and infrastructure as well as interfacing to the data operations tasks are being defined. The shift plan until the end of 2008 is being put together. User support worked on documentation and organized several training sessions. The ECoM task force delivered the report on “Use Cases for Start-up of pp Data-Taking” with recommendations and a set of tests to be performed for trigger rates much higher than the ...

  20. COMPUTING

    CERN Multimedia

    P. MacBride

    The Computing Software and Analysis Challenge CSA07 has been the main focus of the Computing Project for the past few months. Activities began over the summer with the preparation of the Monte Carlo data sets for the challenge and tests of the new production system at the Tier-0 at CERN. The pre-challenge Monte Carlo production was done in several steps: physics generation, detector simulation, digitization, conversion to RAW format, and the samples were run through the High Level Trigger (HLT). The data was then merged into three "Soups": Chowder (ALPGEN), Stew (Filtered Pythia) and Gumbo (Pythia). The challenge officially started when the first Chowder events were reconstructed on the Tier-0 on October 3rd. The data operations teams were very busy during the challenge period. The MC production teams continued with signal production and processing while the Tier-0 and Tier-1 teams worked on splitting the Soups into Primary Data Sets (PDS), reconstruction and skimming. The storage sys...

  1. COMPUTING

    CERN Document Server

    2010-01-01

    Introduction Just two months after the “LHC First Physics” event of 30th March, the analysis of the O(200) million 7 TeV collision events in CMS accumulated during the first 60 days is well under way. The consistency of the CMS computing model has been confirmed during these first weeks of data taking. This model is based on a hierarchy of use-cases deployed between the different tiers and, in particular, the distribution of RECO data to T1s, who then serve data on request to T2s, along a topology known as “fat tree”. Indeed, during this period this model was further extended by almost full “mesh” commissioning, meaning that RECO data were shipped to T2s whenever possible, enabling additional physics analyses compared with the “fat tree” model. Computing activities at the CMS Analysis Facility (CAF) have been marked by a good time response for a load almost evenly shared between ALCA (Alignment and Calibration tasks - highest p...

  2. COMPUTING

    CERN Multimedia

    I. Fisk

    2013-01-01

    Computing operation has been lower as the Run 1 samples are completing and smaller samples for upgrades and preparations are ramping up. Much of the computing activity is focusing on preparations for Run 2 and improvements in data access and flexibility of using resources. Operations Office Data processing was slow in the second half of 2013 with only the legacy re-reconstruction pass of 2011 data being processed at the sites.   Figure 1: MC production and processing was more in demand with a peak of over 750 Million GEN-SIM events in a single month.   Figure 2: The transfer system worked reliably and efficiently and transferred on average close to 520 TB per week with peaks at close to 1.2 PB.   Figure 3: The volume of data moved between CMS sites in the last six months   The tape utilisation was a focus for the operation teams with frequent deletion campaigns from deprecated 7 TeV MC GEN-SIM samples to INVALID datasets, which could be cleaned up...

  3. COMPUTING

    CERN Multimedia

    I. Fisk

    2012-01-01

      Introduction Computing activity has been running at a sustained, high rate as we collect data at high luminosity, process simulation, and begin to process the parked data. The system is functional, though a number of improvements are planned during LS1. Many of the changes will impact users, we hope only in positive ways. We are trying to improve the distributed analysis tools as well as the ability to access more data samples more transparently.  Operations Office Figure 2: Number of events per month, for 2012 Since the June CMS Week, Computing Operations teams successfully completed data re-reconstruction passes and finished the CMSSW_53X MC campaign with over three billion events available in AOD format. Recorded data was successfully processed in parallel, exceeding 1.2 billion raw physics events per month for the first time in October 2012 due to the increase in data-parking rate. In parallel, large efforts were dedicated to WMAgent development and integrati...

  4. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction A large fraction of the effort was focused during the last period on the preparation and monitoring of the February tests of the Common VO Computing Readiness Challenge 08. CCRC08 is being run by the WLCG collaboration in two phases, between the centres and all experiments. The February test is dedicated to functionality tests, while the May challenge will consist of running at all centres and with full workflows. For this first period, a number of functionality checks of the computing power, data repositories and archives as well as network links are planned. This will help assess the reliability of the systems under a variety of loads, and identify possible bottlenecks. Many tests are scheduled together with other VOs, allowing a full-scale stress test. The data rates (writing, accessing and transferring) are being checked under a variety of loads and operating conditions, as well as the reliability and transfer rates of the links between Tier-0 and Tier-1s. In addition, the capa...

  5. COMPUTING

    CERN Multimedia

    Contributions from I. Fisk

    2012-01-01

    Introduction The start of the 2012 run has been busy for Computing. We have reconstructed, archived, and served a larger sample of new data than in 2011, and we are in the process of producing an even larger new sample of simulations at 8 TeV. The running conditions and system performance are largely what was anticipated in the plan, thanks to the hard work and preparation of many people. Heavy ions Heavy Ions has been actively analysing data and preparing for conferences.  Operations Office Figure 6: Transfers from all sites in the last 90 days For ICHEP and the Upgrade efforts, we needed to produce and process record amounts of MC samples while supporting the very successful data-taking. This was a large burden, especially on the team members. Nevertheless the last three months were very successful and the total output was phenomenal, thanks to our dedicated site admins who keep the sites operational and the computing project members who spend countless hours nursing the...

  6. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction The Computing Team successfully completed the storage, initial processing, and distribution for analysis of proton-proton data in 2011. There are still a variety of activities ongoing to support winter conference activities and preparations for 2012. Heavy ions The heavy-ion run for 2011 started in early November and has already demonstrated good machine performance and success of some of the more advanced workflows planned for 2011. Data collection will continue until early December. Facilities and Infrastructure Operations Operational and deployment support for the WMAgent and WorkQueue+Request Manager components, routinely used in production by Data Operations, is provided. A GlideInWMS installation and its components are now also deployed at CERN, adding to the GlideInWMS factory in the US. There has been new operational collaboration between the CERN team and the UCSD GlideIn factory operators, covering each other's time zones by monitoring/debugging pilot jobs sent from the facto...

  7. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  8. The Distributed Memory Computing Conference (5th) Held in Charleston, South Carolina on April 8-12, 1990. Volume 1. Applications

    Science.gov (United States)

    1991-03-31

    The available excerpts from these proceedings are fragmentary. They include part of an application abstract ("... sent to the spacecraft and simulate what the spacecraft will do with these commands when it receives them. We also describe promising results from ...") and table-of-contents entries such as "...Factorization", p. 322, J.L. Barlow and U.B. Vemulapati; "LU Factorization of Sparse, Unsymmetric Jacobian Matrices on Multicomputers: Experience, Strategies...", p. 478, S.J. Plimpton; and "Transputer Modelling of Be Star Circumstellar Discs", p. 484, M.J. Gorrod, M.J. Coe.

  9. COMPUTING

    CERN Multimedia

    M. Kasemann

    CMS relies on a well-functioning, distributed computing infrastructure. The Site Availability Monitoring (SAM) and the Job Robot submission have been very instrumental for site commissioning in order to increase the availability of more sites such that they are available to participate in CSA07 and are ready to be used for analysis. The commissioning process has been further developed, including "lessons learned" documentation via the CMS twiki. Recently the visualization, presentation and summarizing of SAM tests for sites has been redesigned; it is now developed by the central ARDA project of WLCG. Work to test the new gLite Workload Management System was performed; a four-fold increase in throughput with respect to the LCG Resource Broker is observed. CMS has designed and launched a new-generation traffic load generator called "LoadTest" to commission and to keep exercised all data transfer routes in the CMS PhEDEx topology. Since mid-February, a transfer volume of about 12 P...

  10. The ACP [Advanced Computer Program] Branch bus and real-time applications of the ACP multiprocessor system

    International Nuclear Information System (INIS)

    Hance, R.; Areti, H.; Atac, R.

    1987-01-01

    The ACP Branchbus, a high speed differential bus for data movement in multiprocessing and data acquisition environments, is described. This bus was designed as the central bus in the ACP multiprocessing system. In its full implementation with 16 branches and a bus switch, it will handle data rates of 160 MByte/sec and allow reliable data transmission over inter rack distances. We also summarize applications of the ACP system in experimental data acquisition, triggering and monitoring, with special attention paid to FASTBUS environments

  11. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  12. Memory interface simulator: A computer design aid

    Science.gov (United States)

    Taylor, D. S.; Williams, T.; Weatherbee, J. E.

    1972-01-01

    Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.

  13. Cluster computing for digital microscopy.

    Science.gov (United States)

    Carrington, Walter A; Lisin, Dimitri

    2004-06-01

    Microscopy is becoming increasingly digital and dependent on computation. Some of the computational tasks in microscopy are computationally intense, such as image restoration (deconvolution), some optical calculations, image segmentation, and image analysis. Several modern microscope technologies enable the acquisition of very large data sets. 3D imaging of live cells over time, multispectral imaging, very large tiled 3D images of thick samples, or images from high throughput biology all can produce extremely large images. These large data sets place a very large burden on laboratory computer resources. This combination of computationally intensive tasks and larger data sizes can easily exceed the capability of single personal computers. The large multiprocessor computers that are the traditional technology for larger tasks are too expensive for most laboratories. An alternative approach is to use a number of inexpensive personal computers as a cluster; that is, use multiple networked computers programmed to run the problem in parallel on all the computers in the cluster. By the use of relatively inexpensive over-the-counter hardware and open source software, this approach can be much more cost effective for many tasks. We discuss the different computer architectures available, and their advantages and disadvantages. Copyright 2004 Wiley-Liss, Inc.

  14. Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

    International Nuclear Information System (INIS)

    Oehmen, C S; Cannon, W R

    2008-01-01

    Data-intensive and high-performance computing are poised to significantly impact the future of biological research, which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines, from small clusters to emerging petascale machines, can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation, as well as intuitive interfaces and compatibility with existing common data formats.

  15. Low-Cost Heterogeneous Embedded Multiprocessor Architecture for Real-Time Stream Processing Applications

    NARCIS (Netherlands)

    Dekens, B.H.J.

    2015-01-01

    SDR applications are often stream processing applications that are computationally intensive, which results in low throughput on homogeneous multi-core architectures; they could thus benefit significantly from the use of stream processing accelerators. The integration of stream processing

  16. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    Science.gov (United States)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. This algorithm runs 5 to 16 times faster than a previous program developed for the Hypercube, while producing placements of equivalent quality. An integrated place-and-route program for the Intel iPSC/2 Hypercube is currently being developed.
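
    The Heuristic Cell-Coloring idea, identifying sets of cells that do not interact so their moves can proceed in parallel without corrupting the placement cost, resembles a graph-colouring problem. The sketch below uses a plain greedy colouring as a generic stand-in for the paper's heuristic; the interaction graph is invented for illustration.

      def greedy_color(cells, interacts):
          """cells: iterable of cell ids; interacts: cell -> set of neighbours.
          Cells given the same colour share no interaction edge and could be
          moved simultaneously in the parallel annealing step."""
          color = {}
          for c in cells:
              used = {color[n] for n in interacts.get(c, set()) if n in color}
              k = 0
              while k in used:
                  k += 1
              color[c] = k
          return color

      interactions = {"c1": {"c2"}, "c2": {"c1", "c3"}, "c3": {"c2"}, "c4": set()}
      print(greedy_color(["c1", "c2", "c3", "c4"], interactions))
      # {'c1': 0, 'c2': 1, 'c3': 0, 'c4': 0} -- c1, c3 and c4 can move in parallel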

  17. Computing possibilities in the mid 1990s

    International Nuclear Information System (INIS)

    Nash, T.

    1988-09-01

    This paper describes the kind of computing resources it may be possible to make available for experiments in high energy physics in the mid and late 1990s. We outline some of the work going on today, particularly at Fermilab's Advanced Computer Program, that projects to the future. We attempt to define areas in which coordinated R and D efforts should prove fruitful to provide for on and off-line computing in the SSC era. Because of extraordinary components anticipated from industry, we can be optimistic even to the level of predicting million VAX equivalent on-line multiprocessor/data acquisition systems for SSC detectors. Managing this scale of computing will require a new approach to large hardware and software systems. 15 refs., 6 figs

  18. Performance Comparison of Mainframe, Workstations, Clusters, and Desktop Computers

    Science.gov (United States)

    Farley, Douglas L.

    2005-01-01

    A performance evaluation of a variety of computers frequently found in a scientific or engineering research environment was conducted using synthetic and application program benchmarks. From a performance perspective, emerging commodity processors have superior performance relative to legacy mainframe computers. In many cases, the PC clusters exhibited comparable performance with traditional mainframe hardware when 8-12 processors were used. The main advantage of the PC clusters was related to their cost. Regardless of whether the clusters were built from new computers or whether they were created from retired computers, their performance-to-cost ratio was superior to the legacy mainframe computers. Finally, the typical annual maintenance cost of legacy mainframe computers is several times the cost of new equipment such as multiprocessor PC workstations. The savings from eliminating the annual maintenance fee on legacy hardware can result in a yearly increase in total computational capability for an organization.

  19. Centralized multiprocessor control system for the Frascati storage rings DAΦNE

    International Nuclear Information System (INIS)

    Di Pirro, G.; Milardi, C.; Serio, M.

    1992-01-01

    We describe the status of the DANTE (DAΦne New Tools Environment) control system for the new DAΦNE Φ-factory under construction at the Frascati National Laboratories. The system is based on a centralized communication architecture for simplicity and reliability. A central processor unit coordinates all communications between the consoles and the lower level distributed processing power, and continuously updates a central memory that contains the whole machine status. We have developed a system of VME Fiber Optic interfaces allowing very fast point to point communication between distant processors. Macintosh II personal computers are used as consoles. The lower levels are all built using the VME standard. (author)

  20. The Computational Physics Program of the National MFE Computer Center

    Energy Technology Data Exchange (ETDEWEB)

    Mirin, A.A.

    1989-01-01

    Since June 1974, the MFE Computer Center has been engaged in a significant computational physics effort. The principal objective of the Computational Physics Group is to develop advanced numerical models for the investigation of plasma phenomena and the simulation of present and future magnetic confinement devices. Another major objective of the group is to develop efficient algorithms and programming techniques for current and future generations of supercomputers. The Computational Physics Group has been involved in several areas of fusion research. One main area is the application of Fokker-Planck/quasilinear codes to tokamaks. Another major area is the investigation of resistive magnetohydrodynamics in three dimensions, with applications to tokamaks and compact toroids. A third area is the investigation of kinetic instabilities using a 3-D particle code; this work is often coupled with the task of numerically generating equilibria which model experimental devices. Ways to apply statistical closure approximations to study tokamak-edge plasma turbulence have been under examination, with the hope of being able to explain anomalous transport. Also, we are collaborating in an international effort to evaluate fully three-dimensional linear stability of toroidal devices. In addition to these computational physics studies, the group has developed a number of linear systems solvers for general classes of physics problems and has been making a major effort at ascertaining how to efficiently utilize multiprocessor computers. A summary of these programs is included in this paper. 6 tabs.

  1. The use of a multi-processor minicomputer for communication system simulation

    Science.gov (United States)

    Binder, R.; Kuo, F. F.

    1974-01-01

    An experimental facility is described which allows new computer communications techniques to be tested under conditions closely approximating those of real systems. A three-processor minicomputer configuration is used to achieve real-time operation at channel transmission rates of up to 50 Kbits per second. One processor runs a channel controller-concentrator program, a second is dedicated to simulation of the communication channel characteristics, and the third to the simulation of up to 1000 user terminals. The latter are divided into classes consisting of interactive time-sharing users of differing characteristics and file nodes, mixed in different proportions. Real user nodes are connected to the channel simulator processor, providing experience with actual operating characteristics under different channel loadings.

  2. A heterogeneous hierarchical architecture for real-time computing

    Energy Technology Data Exchange (ETDEWEB)

    Skroch, D.A.; Fornaro, R.J.

    1988-12-01

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H²ART) and system software for program loading and interprocessor communication.

  3. Design for scalability in 3D computer graphics architectures

    DEFF Research Database (Denmark)

    Holten-Lund, Hans Erik

    2002-01-01

    This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelined system are analyzed. During the Ph.D. study a prototype 3D graphics architecture named Hybris has...... been developed. Hybris is a prototype rendering architecture which can be tailored to many specific 3D graphics applications and implemented in various ways. Parallel software implementations for both single and multi-processor Windows 2000 systems have been demonstrated. Working hardware...... as a case study and an application of the Hybris graphics architecture.

  4. Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

    Science.gov (United States)

    Amitai, Dganit; Averbuch, Amir; Itzikowitz, Samuel; Turkel, Eli

    1991-01-01

    A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared memory multi-user Sequent Balance machine. Numerical results for one- and two-dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time: in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS may reach 90 percent and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient, but provides more accurate results for intermediate (non-steady-state) values.
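
    The synchronous baseline against which the asynchronous schemes are compared is the standard explicit Euler update for the heat equation. A one-dimensional serial sketch is shown below; in the asynchronous variant each processor would apply the same stencil to its subdomain using whatever neighbour boundary values happen to be available rather than waiting for the latest ones. Grid and time-step values are illustrative.

      import numpy as np

      def euler_heat_step(u, alpha, dx, dt):
          """One synchronous explicit Euler step for u_t = alpha * u_xx with
          fixed end points; stability requires alpha*dt/dx**2 <= 0.5."""
          r = alpha * dt / dx**2
          new = u.copy()
          new[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])
          return new

      u = np.zeros(101)
      u[50] = 1.0                                               # initial heat spike
      for _ in range(500):
          u = euler_heat_step(u, alpha=1.0, dx=0.01, dt=4.0e-5)  # r = 0.4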

  5. Parallel computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, D. C.; Murthy, D. V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic analysis capability on a distributed-memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a three-dimensional unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent are demonstrated using 32 processors. The effects of subtask ordering, problem size and network topology are presented. A comparison to results on a shared-memory computer indicates that higher speedup is achieved on the distributed-memory system.

  6. The Same-Source Parallel MM5

    Directory of Open Access Journals (Sweden)

    John Michalakes

    2000-01-01

    Full Text Available Beginning with the March 1998 release of the Penn State University/NCAR Mesoscale Model (MM5), and continuing through eight subsequent releases up to the present, the official version has run on distributed-memory (DM) parallel computers. Source translation and runtime library support minimize the impact of parallelization on the original model source code, with the result that the majority of code is line-for-line identical with the original version. Parallel performance and scaling are equivalent to earlier, hand-parallelized versions; the modifications have no effect when the code is compiled and run without the DM option. Supported computers include the IBM SP, Cray T3E, Fujitsu VPP, Compaq Alpha clusters, and clusters of PCs (so-called Beowulf clusters). The approach also is compatible with shared-memory parallel directives, allowing distributed-memory/shared-memory hybrid parallelization on distributed-memory clusters of symmetric multiprocessors.

  7. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors, each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN- or C-programmable, pipelined 20 MFlops (peak), 10 MByte single-board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction. 10 refs., 7 figs

  8. Optimal dynamic remapping of data parallel computations

    Science.gov (United States)

    Nicol, David M.; Reynolds, Paul F., Jr.

    1990-01-01

    A large class of data parallel computations is characterized by a sequence of phases, with phase changes occurring unpredictably. Dynamic remapping of the workload to processors may be required to maintain good performance. The problem considered, for which the utility of remapping and the future behavior of the workload are uncertain, arises when phases exhibit stable execution requirements during a given phase, but requirements change radically between phases. For these situations, a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The authors address the fundamental problem of balancing the expected remapping performance gain against the delay cost, and they derive the optimal remapping decision policy. The promise of the approach is shown by application to multiprocessor implementations of an adaptive gridding fluid dynamics program and to a battlefield simulation program.
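
    The trade-off being formalised, remap only when the expected gain over the remaining execution outweighs the one-time remapping delay, can be stated as a simple threshold rule. The rule below is a deliberately simplified stand-in, not the optimal probabilistic policy derived in the paper; all rates and costs are invented.

      def should_remap(current_rate, balanced_rate, remaining_work, remap_cost):
          """Rates are in work units per second, remap_cost in seconds.
          Remap if finishing after paying the remapping delay is still faster."""
          time_as_is = remaining_work / current_rate
          time_after = remap_cost + remaining_work / balanced_rate
          return time_after < time_as_is

      print(should_remap(current_rate=80.0, balanced_rate=100.0,
                         remaining_work=4000.0, remap_cost=5.0))   # True: 45 s < 50 s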

  9. The computational physics program of the National MFE Computer Center

    International Nuclear Information System (INIS)

    Mirin, A.A.

    1988-01-01

    The principal objective of the Computational Physics Group is to develop advanced numerical models for the investigation of plasma phenomena and the simulation of present and future magnetic confinement devices. Another major objective of the group is to develop efficient algorithms and programming techniques for current and future generation of supercomputers. The computational physics group is involved in several areas of fusion research. One main area is the application of Fokker-Planck/quasilinear codes to tokamaks. Another major area is the investigation of resistive magnetohydrodynamics in three dimensions, with applications to compact toroids. Another major area is the investigation of kinetic instabilities using a 3-D particle code. This work is often coupled with the task of numerically generating equilibria which model experimental devices. Ways to apply statistical closure approximations to study tokamak-edge plasma turbulence are being examined. In addition to these computational physics studies, the group has developed a number of linear systems solvers for general classes of physics problems and has been making a major effort at ascertaining how to efficiently utilize multiprocessor computers

  10. Sierra toolkit computational mesh conceptual model

    International Nuclear Information System (INIS)

    Baur, David G.; Edwards, Harold Carter; Cochran, William K.; Williams, Alan B.; Sjaardema, Gregory D.

    2010-01-01

    The Sierra Toolkit computational mesh is a software library intended to support massively parallel multi-physics computations on dynamically changing unstructured meshes. This domain of intended use is inherently complex due to distributed memory parallelism, parallel scalability, heterogeneity of physics, heterogeneous discretization of an unstructured mesh, and runtime adaptation of the mesh. Management of this inherent complexity begins with a conceptual analysis and modeling of this domain of intended use; i.e., development of a domain model. The Sierra Toolkit computational mesh software library is designed and implemented based upon this domain model. Software developers using, maintaining, or extending the Sierra Toolkit computational mesh library must be familiar with the concepts/domain model presented in this report.

  11. Visualization and Data Analysis for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]

    2016-09-27

    This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.

  12. Iterative robust multiprocessor scheduling

    NARCIS (Netherlands)

    Adyanthaya, S.; Geilen, M.; Basten, T.; Voeten, J.; Schiffelers, R.

    2015-01-01

    General purpose platforms are characterized by unpredictable timing behavior. Real-time schedules of tasks on general purpose platforms need to be robust against variations in task execution times. We define robustness in terms of the expected number of tasks that miss deadlines. We present an

  13. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    Science.gov (United States)

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation-blind first level cache.
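
    A toy model of the policy described in the abstract is sketched below: a speculative write is pushed through the first-level cache into the version-tracking second-level cache and the line is then evicted from the first level, so subsequent reads must fetch the thread's version from the second level. The data structures are purely illustrative and are not the patented hardware design.

      class TwoLevelCache:
          def __init__(self):
              self.l1 = {}     # address -> value (holds no speculative versions)
              self.l2 = {}     # address -> {thread_id: value} (multiple versions)

          def speculative_write(self, thread_id, addr, value):
              self.l2.setdefault(addr, {})[thread_id] = value   # write through to L2
              self.l1.pop(addr, None)                           # evict on write from L1

          def read(self, thread_id, addr):
              if addr in self.l1:
                  return self.l1[addr]
              return self.l2.get(addr, {}).get(thread_id)       # thread's version from L2

      cache = TwoLevelCache()
      cache.speculative_write(thread_id=7, addr=0x1000, value=42)
      print(cache.read(7, 0x1000))    # 42, served from the second-level cache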

  14. Parallel implementations of 2D explicit Euler solvers

    International Nuclear Information System (INIS)

    Giraud, L.; Manzini, G.

    1996-01-01

    In this work we present a subdomain partitioning strategy applied to an explicit high-resolution Euler solver. We describe the design of a portable parallel multi-domain code suitable for parallel environments. We present several implementations on a representative range of MIMD computers that include shared memory multiprocessors, distributed virtual shared memory computers, as well as networks of workstations. Computational results are given to illustrate the efficiency, the scalability, and the limitations of the different approaches. We also discuss the effect of the communication protocol on the optimal domain partitioning strategy for the distributed memory computers.

  15. TCW: transcriptome computational workbench.

    Science.gov (United States)

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.

  16. Research on computer systems benchmarking

    Science.gov (United States)

    Smith, Alan Jay (Principal Investigator)

    1996-01-01

    This grant addresses the topic of research on computer systems benchmarking and is more generally concerned with performance issues in computer systems. This report reviews work in those areas during the period of NASA support under this grant. The bulk of the work performed concerned benchmarking and analysis of CPUs, compilers, caches, and benchmark programs. The first part of this work concerned the issue of benchmark performance prediction. A new approach to benchmarking and machine characterization was reported, using a machine characterizer that measures the performance of a given system in terms of a Fortran abstract machine. Another report focused on analyzing compiler performance. The performance impact of optimization in the context of our methodology for CPU performance characterization was based on the abstract machine model. Benchmark programs are analyzed in another paper. A machine-independent model of program execution was developed to characterize both machine performance and program execution. By merging these machine and program characterizations, execution time can be estimated for arbitrary machine/program combinations. The work was continued into the domain of parallel and vector machines, including the issue of caches in vector processors and multiprocessors. All of the afore-mentioned accomplishments are more specifically summarized in this report, as well as those smaller in magnitude supported by this grant.

  17. Particle orbit tracking on a parallel computer: Hypertrack

    International Nuclear Information System (INIS)

    Cole, B.; Bourianoff, G.; Pilat, F.; Talman, R.

    1991-05-01

    A program has been written which performs particle orbit tracking on the Intel iPSC/860 distributed memory parallel computer. The tracking is performed using a thin element approach. A brief description of the structure and performance of the code is presented, along with applications of the code to the analysis of accelerator lattices for the SSC. The concept of ''ensemble tracking'', i.e. the tracking of ensemble averages of noninteracting particles, such as the emittance, is presented. Preliminary results of such studies will be presented. 2 refs., 6 figs
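
    Thin-element tracking of the kind mentioned above alternates drifts with instantaneous kicks at each lattice element. The sketch below tracks one particle through a repeated focusing/defocusing (FODO-like) cell; the lengths and quadrupole strengths are arbitrary example values, not SSC lattice parameters.

      def drift(x, xp, y, yp, length):
          """Field-free drift: positions advance, angles unchanged."""
          return x + length * xp, xp, y + length * yp, yp

      def thin_quad(x, xp, y, yp, kl):
          """Thin quadrupole kick: focusing in x, defocusing in y for kl > 0."""
          return x, xp - kl * x, y, yp + kl * y

      x, xp, y, yp = 1.0e-3, 0.0, 1.0e-3, 0.0          # metres, radians
      for _ in range(1000):                            # many cells / turns
          x, xp, y, yp = drift(x, xp, y, yp, length=10.0)
          x, xp, y, yp = thin_quad(x, xp, y, yp, kl=0.05)
          x, xp, y, yp = drift(x, xp, y, yp, length=10.0)
          x, xp, y, yp = thin_quad(x, xp, y, yp, kl=-0.05)
      # In ensemble tracking, quantities such as the emittance would be accumulated
      # over many such non-interacting particles tracked in parallel.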

  18. Remarks on the development of a multiblock three-dimensional Euler code for out of core and multiprocessor calculations

    International Nuclear Information System (INIS)

    Jameson, A.; Leicher, S.; Dawson, J. (Tel Aviv Univ., Israel)

    1985-01-01

    A multiblock modification of the FLO57 code for three-dimensional wing calculations is described and demonstrated. The theoretical basis of the multistage time-stepping algorithm is reviewed; the multiblock grid structure is explained; and results from a computation of vortical flow past a delta wing, using 2.5 x 10^6 grid points and performed on a Cray X/MP computer with a 128-Mword solid-state storage device, are presented graphically. 6 references

  19. A RISC multiprocessor event trigger for the data acquisition system of the H1 experiment at HERA

    International Nuclear Information System (INIS)

    Campbell, A.J.

    1991-09-01

    In late 1991 HERA will for the first time collide stored beams of electrons and protons. This paper describes the multiple modern reduced instruction set (RISC) processor system for online event filtering and reconstruction installed within the data acquisition system of the H1 experiment. Data is processed at a continuous average rate of ∼ 6 Mbytes/s in parallel by ∼ 20 R3000 VMEbus-based monoboard computers providing some 400 MIPS of computing power. (author)

  20. A new taxonomy for distributed computer systems based upon operating system structure

    Science.gov (United States)

    Foudriat, E. C.

    1985-01-01

    Characteristics of the resource structure found in the operating system are considered as a mechanism for classifying distributed computer systems. Since the operating system resources, themselves, are too diversified to provide a consistent classification, the structure upon which resources are built and shared are examined. The location and control character of this indivisibility provides the taxonomy for separating uniprocessors, computer networks, network computers (fully distributed processing systems or decentralized computers) and algorithm and/or data control multiprocessors. The taxonomy is important because it divides machines into a classification that is relevant or important to the client and not the hardware architect. It also defines the character of the kernel O/S structure needed for future computer systems. What constitutes an operating system for a fully distributed processor is discussed in detail.

  1. Fermilab advanced computer program multi-microprocessor project

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Biel, J.

    1985-06-01

    Fermilab's Advanced Computer Program is constructing a powerful 128 node multi-microprocessor system for data analysis in high-energy physics. The system will use commercial 32-bit microprocessors programmed in Fortran-77. Extensive software supports easy migration of user applications from a uniprocessor environment to the multiprocessor and provides sophisticated program development, debugging, and error handling and recovery tools. This system is designed to be readily copied, providing computing cost effectiveness of below $2200 per VAX 11/780 equivalent. The low cost, commercial availability, compatibility with off-line analysis programs, and high data bandwidths (up to 160 MByte/sec) make the system an ideal choice for applications to on-line triggers as well as an offline data processor

  2. Multicore Challenges and Benefits for High Performance Scientific Computing

    Directory of Open Access Journals (Sweden)

    Ida M.B. Nielsen

    2008-01-01

    Full Text Available Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
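
    The hybrid distributed-memory matrix multiply mentioned above can be pictured as a block-row decomposition: each rank owns a row block of A and computes the corresponding row block of C = AB, using its own threads (or a multithreaded BLAS) for the local product. The serial sketch below only emulates the per-rank partitioning; it is not the article's implementation.

      import numpy as np

      def block_row_matmul(A, B, num_ranks):
          """Emulate the distributed-memory layer: rank r computes rows
          [bounds[r], bounds[r+1]) of C = A @ B from its local row block of A."""
          n = A.shape[0]
          bounds = [n * r // num_ranks for r in range(num_ranks + 1)]
          C = np.empty((n, B.shape[1]))
          for rank in range(num_ranks):      # one iteration = one rank's work
              lo, hi = bounds[rank], bounds[rank + 1]
              C[lo:hi] = A[lo:hi] @ B        # local multiply (threaded in the hybrid model)
          return C

      A, B = np.random.rand(8, 6), np.random.rand(6, 4)
      assert np.allclose(block_row_matmul(A, B, num_ranks=4), A @ B)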

  3. Building the Teraflops/Petabytes Production Computing Center

    International Nuclear Information System (INIS)

    Kramer, William T.C.; Lucas, Don; Simon, Horst D.

    1999-01-01

    In just one decade, the 1990s, supercomputer centers have undergone two fundamental transitions which require rethinking their operation and their role in high performance computing. The first transition in the early to mid-1990s resulted from a technology change in high performance computing architecture. Highly parallel distributed memory machines built from commodity parts increased the operational complexity of the supercomputer center, and required the introduction of intellectual services as equally important components of the center. The second transition is happening in the late 1990s as centers are introducing loosely coupled clusters of SMPs as their premier high performance computing platforms, while dealing with an ever-increasing volume of data. In addition, increasing network bandwidth enables new modes of use of a supercomputer center, in particular, computational grid applications. In this paper we describe what steps NERSC is taking to address these issues and stay at the leading edge of supercomputing centers.

  4. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  5. Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

    Energy Technology Data Exchange (ETDEWEB)

    De Supinski, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Caliga, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-09-28

    The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array ("FPGA")-based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards reconfigurable symmetric multi-processor (SMP) architectures that will be a cost-effective solution for large-scale scientific computing.

  6. Development of a parallel DBMS on the basis of PostgreSQL

    OpenAIRE

    Pan, C.

    2011-01-01

    The paper describes the architecture and the design of PargreSQL parallel database management system (DBMS) for distributed memory multiprocessors. PargreSQL is based upon PostgreSQL open-source DBMS and exploits partitioned parallelism.

  7. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    Science.gov (United States)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  8. Centaure: a heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision are presented and compared. In the first place, these results show that the basic node of Centaure allows faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low-level structure of Centaure in a multi-SIMD fashion. (author) [fr

  9. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    is tailored to networks of workstations. Recent work has shown that this implementation shows surprisingly competitive performance on many-core machines, compared to dedicated shared-memory implementations of parallel Haskell. In the paper we describe a case study with different Eden divide-and-conquer skeletons. We analyse their performance comparing example applications implemented using these Eden skeletons against parallel Haskell implementations using shared memory on many-core machines...

  10. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    Eden is a parallel extension of the lazy functional language Haskell providing dynamic process creation and automatic data exchange. As a Haskell extension, Eden takes a high-level approach to parallel programming and thereby simplifies parallel program development. The current implementation...

  11. TCW: transcriptome computational workbench.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    Full Text Available BACKGROUND: The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. METHODOLOGY: The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. CONCLUSION: It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the

  12. High-reliability logic system evaluation of a programmed multiprocessor solution. Application in the nuclear reactor safety field

    International Nuclear Information System (INIS)

    Lallement, Dominique.

    1979-01-01

    Nuclear reactors are monitored by several systems combined. The hydraulic and mechanical limitations on the equipment and the heat transfer requirements in the core set a reliable working range for the boiler, defined with certain safety margins. The control system tends to keep the power plant within this working range. The protection system covers all the electrical and mechanical equipment needed to safeguard the boiler in the event of abnormal transients or accidents accounted for in the design of the plant. On units in service, protection is handled by cabled (hard-wired) automatic systems. For better reliability and safer operation, greater flexibility of use (modularity, adaptability), and improved start-up criteria through data processing, the tendency is to use programmed digital systems. Computers are already present in control systems, but their introduction into protection systems meets with some reticence on the part of the nuclear safety authorities. A study on the replacement of conventional by digital protection systems is presented. From choices partly made on the principles which should govern the hardware and software of a protection system, the reliability of different structures and elements was examined and an experimental model built, with its simulator and test facilities. A prototype based on these options and studies is being built and is to be set up on one of the CEN-G reactors for tests [fr

  13. Data-Parallel Programming in a Multithreaded Environment

    Directory of Open Access Journals (Sweden)

    Matthew Haines

    1997-01-01

    Full Text Available Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We introduce a run-time-based solution for data-parallel programming in a distributed memory environment that handles the problems of relative indexing and collective communications among thread groups. As a result, the data-parallel programming model can now be executed in a multithreaded environment, such as a system using threads to support both task and data parallelism.
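
    To make the relative-indexing problem concrete, here is a small sketch (plain Python threads and barriers, not the run-time system described above) in which two unrelated data-parallel computations share one process; each thread carries an index relative to its own group, and synchronization stays within the group.

      import threading

      def worker(barrier, group_rank, group_size, data, results):
          # group_rank is relative to the thread group, not to the whole process.
          results[group_rank] = sum(data[group_rank::group_size])  # cyclic slice of the work
          barrier.wait()                                           # group-local collective sync
          if group_rank == 0:
              print("group total:", sum(results))

      def launch_group(data, group_size):
          barrier = threading.Barrier(group_size)
          results = [0] * group_size
          threads = [threading.Thread(target=worker,
                                      args=(barrier, r, group_size, data, results))
                     for r in range(group_size)]
          for t in threads:
              t.start()
          return threads

      # Two independent data-parallel computations coexisting in one process.
      g1 = launch_group(list(range(100)), 2)
      g2 = launch_group(list(range(1000)), 3)
      for t in g1 + g2:
          t.join()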

  14. A manual for PARTI runtime primitives

    Science.gov (United States)

    Berryman, Harry; Saltz, Joel

    1990-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.

  15. A manual for PARTI runtime primitives, revision 1

    Science.gov (United States)

    Das, Raja; Saltz, Joel; Berryman, Harry

    1991-01-01

    Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
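
    The PARTI primitives themselves are library routines for Fortran and C; the Python/mpi4py sketch below only illustrates the inspector/executor idea behind them: an inspector pass turns an irregular indirection array into per-owner request lists, and an executor pass performs the resulting exchange. All names and the block data layout are invented for the example.

      import numpy as np
      from mpi4py import MPI   # assumption: mpi4py is installed

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_local = 4                                       # owned elements per rank (block distribution)
      x_local = np.arange(n_local, dtype='d') + rank * n_local

      # Irregular access pattern: global indices this rank needs (e.g. mesh neighbours).
      needed = np.array([(rank * n_local + 5) % (n_local * size),
                         (rank * n_local - 1) % (n_local * size)])

      # Inspector: work out which owner holds each needed element.
      owners = needed // n_local
      requests = [needed[owners == p] % n_local for p in range(size)]

      # Executor: exchange the requests, then exchange the corresponding values.
      incoming = comm.alltoall(requests)                # local indices others want from us
      replies = [x_local[idx] for idx in incoming]
      ghost = np.concatenate(comm.alltoall(replies))    # off-processor values this rank needed
      print(rank, ghost)

    In a real irregular sweep the inspector result (the communication schedule) would be saved and reused on every iteration, which is the point of capturing the pattern at runtime.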

  16. Contributing to the design of run-time systems dedicated to high performance computing

    International Nuclear Information System (INIS)

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  17. Parallel Computer System for 3D Visualization Stereo on GPU

    Science.gov (United States)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of ray intersections with scene objects. The system allows a significant increase in productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Architecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.

  18. Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

    Science.gov (United States)

    Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

    1994-01-01

    In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.

  19. Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends

    Energy Technology Data Exchange (ETDEWEB)

    Ibrahim, Khaled Z. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Epifanovsky, Evgeny [Q-Chem, Inc., Pleasanton, CA (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Krylov, Anna I. [Univ. of Southern California, Los Angeles, CA (United States). Dept. of Chemistry

    2016-07-26

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240× speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 & XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from being compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load-imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.
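
    The remark that these contractions are "based on matrix-matrix multiplication" can be illustrated with a small NumPy sketch (purely illustrative, not the Libtensor interface): a four-index contraction is reshaped so that the summed indices form the inner dimension of a single matrix multiply.

      import numpy as np

      ni = nj = nk = 6
      na = nb = nc = 8
      T = np.random.rand(ni, nk, na, nc)
      V = np.random.rand(nk, nj, nc, nb)

      # Reference: C[i,j,a,b] = sum_{k,c} T[i,k,a,c] * V[k,j,c,b]
      C_ref = np.einsum('ikac,kjcb->ijab', T, V)

      # Same contraction as one GEMM on reshaped operands: rows (i,a), inner (k,c), cols (j,b).
      Tm = T.transpose(0, 2, 1, 3).reshape(ni * na, nk * nc)
      Vm = V.transpose(0, 2, 1, 3).reshape(nk * nc, nj * nb)
      C = (Tm @ Vm).reshape(ni, na, nj, nb).transpose(0, 2, 1, 3)

      print(np.allclose(C, C_ref))   # True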

  20. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    Science.gov (United States)

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation becomes feasible owing to the development of computer technology. However, the recent development is due to emergence of the multi-core high performance computers. Therefore, parallel computing becomes a key to achieve good performance of software programs. A Monte Carlo simulation code PHITS contains two parallel computing functions, the distributed-memory parallelization using protocols of message passing interface (MPI) and the shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose the two functions according to their needs. This paper gives the explanation of the two functions with their advantages and disadvantages. Some test applications are also provided to show their performance using a typical multi-core high performance workstation.

  1. VLSI Based Multiprocessor Communications Networks.

    Science.gov (United States)

    1982-09-01

    Only OCR fragments of this scanned report were recovered; they mention crossbar (CB) and modified banyan (BA) interconnection networks for multiprocessor communications, and two figure captions survive: "Figure 9: Layout of ..." and "Figure 10(a): Petri net for the one dimensional, asynchronous systolic array module".

  2. A Distributed Computing Infrastructure for Computational Thermodynamic Calculations of Solid-Liquid Phase Equilibria

    Science.gov (United States)

    Ghiorso, M. S.; Kress, V. C.

    2004-12-01

    routines is being accessed. Fourth, the flexibility of calling library functions means that the client has more control over the configuration and output of the MELTS calculation. Fifth, if the client computer is a multi-processor compute cluster capable of issuing parallel requests to the MELTS "remote" library, then these requests may be in turn parallelized to the server compute cluster to enhance throughput and performance. Application of this computational model to fluid dynamical simulations of melting and transport in the Earth's mantle is envisioned. Further information and example clients for utilizing the current prototype library for distributed computing applications can be found at http://melts.uchicago.edu.

  3. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    Science.gov (United States)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment assuming workstation processes have preemptive priority over parallel tasks is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It was proposed that task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  4. MEDUSA - An overset grid flow solver for network-based parallel computer systems

    Science.gov (United States)

    Smith, Merritt H.; Pallis, Jani M.

    1993-01-01

    Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.

  5. Computer Sciences and Data Systems, volume 1

    Science.gov (United States)

    1987-01-01

    Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.

  6. Performance evaluation of scientific programs on advanced architecture computers

    International Nuclear Information System (INIS)

    Walker, D.W.; Messina, P.; Baille, C.F.

    1988-01-01

    Recently a number of advanced architecture machines have become commercially available. These new machines promise better cost-performance than traditional computers, and some of them have the potential of competing with current supercomputers, such as the Cray X-MP, in terms of maximum performance. This paper describes an on-going project to evaluate a broad range of advanced architecture computers using a number of complete scientific application programs. The computers to be evaluated include distributed-memory machines such as the NCUBE, INTEL and Caltech/JPL hypercubes and the MEIKO computing surface; shared-memory, bus-architecture machines such as the Sequent Balance and the Alliant; very long instruction word machines such as the Multiflow Trace 7/200 computer; traditional supercomputers such as the Cray X-MP and Cray-2; and SIMD machines such as the Connection Machine. Currently 11 application codes from a number of scientific disciplines have been selected, although it is not intended to run all codes on all machines. Results are presented for two of the codes (QCD and missile tracking), and future work is proposed.

  7. Combined Scheduling and Mapping for Scalable Computing with Parallel Tasks

    Directory of Open Access Journals (Sweden)

    Jörg Dümmler

    2012-01-01

    Full Text Available Recent and future parallel clusters and supercomputers use symmetric multiprocessors (SMPs and multi-core processors as basic nodes, providing a huge amount of parallel resources. These systems often have hierarchically structured interconnection networks combining computing resources at different levels, starting with the interconnect within multi-core processors up to the interconnection network combining nodes of the cluster or supercomputer. The challenge for the programmer is that these computing resources should be utilized efficiently by exploiting the available degree of parallelism of the application program and by structuring the application in a way which is sensitive to the heterogeneous interconnect. In this article, we pursue a parallel programming method using parallel tasks to structure parallel implementations. A parallel task can be executed by multiple processors or cores and, for each activation of a parallel task, the actual number of executing cores can be adapted to the specific execution situation. In particular, we propose a new combined scheduling and mapping technique for parallel tasks with dependencies that takes the hierarchical structure of modern multi-core clusters into account. An experimental evaluation shows that the presented programming approach can lead to a significantly higher performance compared to standard data parallel implementations.

  8. Grid Computing

    Indian Academy of Sciences (India)

    IAS Admin

    A computing grid interconnects resources such as high performance computers, scientific databases, and computer-controlled scientific instruments of cooperating organizations each of which is autonomous. It precedes and is quite different from cloud computing, which provides computing resources by vendors to ...

  9. Computer group

    International Nuclear Information System (INIS)

    Bauer, H.; Black, I.; Heusler, A.; Hoeptner, G.; Krafft, F.; Lang, R.; Moellenkamp, R.; Mueller, W.; Mueller, W.F.; Schati, C.; Schmidt, A.; Schwind, D.; Weber, G.

    1983-01-01

    The computer group has been reorganized to take charge of the general purpose computers DEC10 and VAX and the computer network (Dataswitch, DECnet, IBM - connections to GSI and IPP, preparation for Datex-P). (orig.)

  10. Radon transform computer

    International Nuclear Information System (INIS)

    Current, W.; Hurst, P.; Ford, G.; Shieh, E.; Agi, I.; Nguyen, C.; Peterson, D.

    1990-01-01

    In this final report, we summarize some of our results from September 1989 to October 1990. The design, construction, and testing of a four-processor prototype multi-processor (RTP) board using TI TMS320C25 DSP chips has been completed. We are now finishing the extensive detailed final documentation of the RTP hardware and software. This extensive documentation will be provided to Steve Azevedo when we return the borrowed workstation and deliver the RTP to LLNL. A summary of the test results is given in Section II. The design of our fully custom CMOS VLSI chip has been completed. The chip has been designed, the layout completed, and the chip is now going through its final pre-fabrication simulations. The present status of the custom chip design activity will be summarized in the separately submitted "Final Report on the Real-Time Multi-Dimensional Processing Hardware Designs Project." Evaluations of the hardware requirements for fast filtering of data for filtered backprojection have been completed and are summarized. We briefly summarize the test results of the TMS320 multi-processor prototype RTP board evaluation.

  11. Computer Music

    Science.gov (United States)

    Cook, Perry R.

    This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and the study of perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.).

  12. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    International Nuclear Information System (INIS)

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-01-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
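
    For readers unfamiliar with the core solver named above, the following is a textbook preconditioned conjugate gradient sketch in Python with a simple Jacobi (diagonal) preconditioner; it only illustrates the kind of iteration involved and is not DANTE's implementation.

      import numpy as np

      def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=500):
          """Solve A x = b for symmetric positive definite A, preconditioned by diag(A)^-1."""
          x = np.zeros_like(b)
          r = b - A @ x
          z = M_inv_diag * r
          p = z.copy()
          rz = r @ z
          for _ in range(max_iter):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  break
              z = M_inv_diag * r
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x

      # Small symmetric positive definite test system.
      n = 50
      A = np.diag(np.arange(2.0, 2.0 + n)) + 0.1 * np.ones((n, n))
      b = np.random.rand(n)
      x = pcg(A, b, 1.0 / np.diag(A))
      print(np.linalg.norm(A @ x - b))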

  13. Many-core technologies: The move to energy-efficient, high-throughput x86 computing (TFLOPS on a chip)

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    With Moore's Law alive and well, more and more parallelism is introduced into all computing platforms at all levels of integration and programming to achieve higher performance and energy efficiency. Especially in the area of High-Performance Computing (HPC) users can entertain a combination of different hardware and software parallel architectures and programming environments. Those technologies range from vectorization and SIMD computation over shared memory multi-threading (e.g. OpenMP) to distributed memory message passing (e.g. MPI) on cluster systems. We will discuss HPC industry trends and Intel's approach to it from processor/system architectures and research activities to hardware and software tools technologies. This includes the recently announced new Intel(r) Many Integrated Core (MIC) architecture for highly-parallel workloads and general purpose, energy efficient TFLOPS performance, some of its architectural features and its programming environment. At the end we will have a br...

  14. Computational composites

    DEFF Research Database (Denmark)

    Vallgårda, Anna K. A.; Redström, Johan

    2007-01-01

    Computational composite is introduced as a new type of composite material. Arguing that this is not just a metaphorical maneuver, we provide an analysis of computational technology as material in design, which shows how computers share important characteristics with other materials used in design...... and architecture. We argue that the notion of computational composites provides a precise understanding of the computer as material, and of how computations need to be combined with other materials to come to expression as material. Besides working as an analysis of computers from a designer’s point of view......, the notion of computational composites may also provide a link for computer science and human-computer interaction to an increasingly rapid development and use of new materials in design and architecture....

  15. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably in the future, make the solutions that are derived thereof to be more appealing. Efficient algorithms for sorting scheme that are encountered in a number of ...

  16. Exploiting Data Sparsity for Large-Scale Matrix Computations

    KAUST Repository

    Akbudak, Kadir

    2018-02-24

    Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user-productivity. Performance comparisons and memory footprint on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.
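
    As a toy illustration of the tile low-rank idea (not the HiCMA API), the sketch below compresses a single data-sparse off-diagonal tile with a truncated SVD, keeping only as many singular values as a prescribed accuracy requires.

      import numpy as np

      def compress_tile(tile, tol=1e-6):
          """Return factors (U, V) with tile ≈ U @ V, rank chosen from the tolerance."""
          u, s, vt = np.linalg.svd(tile, full_matrices=False)
          k = max(1, int(np.sum(s > tol * s[0])))   # numerical rank at this tolerance
          return u[:, :k] * s[:k], vt[:k, :]        # fold singular values into U

      # A smooth kernel between two well-separated point clusters gives a
      # numerically low-rank (data-sparse) tile.
      x = np.linspace(0.0, 1.0, 256)
      y = np.linspace(10.0, 11.0, 256)
      tile = 1.0 / np.abs(x[:, None] - y[None, :])

      U, V = compress_tile(tile)
      print(U.shape, V.shape, np.linalg.norm(tile - U @ V) / np.linalg.norm(tile))

    The compression ratio for such tiles is typically large, which is where the memory and time savings quoted above come from.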

  17. Design and Implementation of a Multiprocessor System-on-Chip (MPSoC) Interconnected by a Network-on-Chip (NoC)

    Directory of Open Access Journals (Sweden)

    Wilson Mauricio Chicaiza

    2013-11-01

    Full Text Available This paper presents a brief characterization of the communication media used in multiprocessor architectures. The main objective of this characterization is to introduce a new communication model based on packet switching, known as Networks-on-Chip (NoC). The publication presents a network architecture called the Hermes NoC, which was connected to a Multiprocessor System-on-Chip (MPSoC) composed of four MicroBlaze processors. This connection was achieved through the design and development of a Network Interface written in VHDL. Through the Network Interface, the MicroBlaze processors were made to interact with the Hermes switches in order to create a multiprocessor architecture interconnected by a NoC. For comparison purposes, another multiprocessor architecture interconnected by buses was also created. For both architectures, a steganography application was developed in which two processors work simultaneously. Unfortunately, it was not possible to measure latency and energy consumption directly for this application, so simulators were used to estimate these measurements.

  18. Computational Medicine

    DEFF Research Database (Denmark)

    Nygaard, Jens Vinge

    2017-01-01

    The Health Technology Program at Aarhus University applies computational biology to investigate the heterogeneity of tumours...

  19. Computational Composites

    DEFF Research Database (Denmark)

    Vallgårda, Anna K. A.

    The problematic addressed in the dissertation is generally shaped by a sensation that something is amiss within the area of Ubiquitous Computing. Ubiquitous Computing as a vision—as a program—sets out to challenge the idea of the computer as a desktop computer and to explore the potential...... of the new microprocessors and network technologies. However, the understanding of the computer represented within this program poses a challenge for the intentions of the program. The computer is understood as a multitude of invisible intelligent information devices which confines the computer as a tool...... to solve well-defined problems within specified contexts—something that rarely exists in practice. Nonetheless, the computer will continue to grow more ubiquitous as Moore's law still applies and as its components become ever cheaper. The question is how, and for what we will use it? How will it...

  20. Green Computing

    Directory of Open Access Journals (Sweden)

    K. Shalini

    2013-01-01

    Full Text Available Green computing is all about using computers in a smarter and eco-friendly way. It is the environmentally responsible use of computers and related resources, which includes the implementation of energy-efficient central processing units, servers and peripherals as well as reduced resource consumption and proper disposal of electronic waste. Computers certainly make up a large part of many people's lives and traditionally are extremely damaging to the environment. Manufacturers of computers and their parts have been espousing the green cause to help protect the environment from computers and electronic waste in any way. Research continues into key areas such as making the use of computers as energy-efficient as possible, and designing algorithms and systems for efficiency-related computer technologies.

  1. Grid Computing

    Indian Academy of Sciences (India)

    A computing grid interconnects resources such as high performance computers, scientific databases, and computer-controlled scientific instruments of cooperating organizations each of which is autonomous. It precedes and is quite different from cloud computing, which provides computing resources by vendors to customers ...

  2. Phenomenological Computation?

    DEFF Research Database (Denmark)

    Brier, Søren

    2014-01-01

    Open peer commentary on the article “Info-computational Constructivism and Cognition” by Gordana Dodig-Crnkovic. Upshot: The main problems with info-computationalism are: (1) Its basic concept of natural computing has been neither defined theoretically nor implemented practically. (2) It cannot en... cybernetics and Maturana and Varela’s theory of autopoiesis, which are both erroneously taken to support info-computationalism....

  3. Quantum computers and quantum computations

    International Nuclear Information System (INIS)

    Valiev, Kamil' A

    2005-01-01

    This review outlines the principles of operation of quantum computers and their elements. The theory of ideal computers that do not interact with the environment and are immune to quantum decohering processes is presented. Decohering processes in quantum computers are investigated. The review considers methods for correcting quantum computing errors arising from the decoherence of the state of the quantum computer, as well as possible methods for the suppression of the decohering processes. A brief enumeration of proposed quantum computer realizations concludes the review. (reviews of topical problems)

  4. Quantum Computing for Computer Architects

    CERN Document Server

    Metodi, Tzvetan

    2011-01-01

    Quantum computers can (in theory) solve certain problems far faster than a classical computer running any known classical algorithm. While existing technologies for building quantum computers are in their infancy, it is not too early to consider their scalability and reliability in the context of the design of large-scale quantum computers. To architect such systems, one must understand what it takes to design and model a balanced, fault-tolerant quantum computer architecture. The goal of this lecture is to provide architectural abstractions for the design of a quantum computer and to explore

  5. Pervasive Computing

    NARCIS (Netherlands)

    Silvis-Cividjian, N.

    This book provides a concise introduction to Pervasive Computing, otherwise known as Internet of Things (IoT) and Ubiquitous Computing (Ubicomp) which addresses the seamless integration of computing systems within everyday objects. By introducing the core topics and exploring assistive pervasive

  6. Computational vision

    CERN Document Server

    Wechsler, Harry

    1990-01-01

    The book is suitable for advanced courses in computer vision and image processing. In addition to providing an overall view of computational vision, it contains extensive material on topics that are not usually covered in computer vision texts (including parallel distributed processing and neural networks) and considers many real applications.

  7. Grid Computing

    CERN Document Server

    Yen, Eric

    2008-01-01

    Based on the Grid Computing: International Symposium on Grid Computing (ISGC) 2007, held in Taipei, Taiwan in March of 2007, this title presents the grid solutions and research results in grid operations, grid middleware, biomedical operations, and e-science applications. It is suitable for graduate-level students in computer science.

  8. Optical Computing

    Indian Academy of Sciences (India)

    Optics has been used in computing for a number of years but the main emphasis has been and continues to be to link portions of computers, for communications, or more intrinsically in devices that have some optical application or component (optical pattern recognition, etc). Optical digital computers are still some years ...

  9. Development of a software for a multi-processor system aimed at the on-line control of nuclear physics experiments

    International Nuclear Information System (INIS)

    Poggioli, Jean Renaud

    1984-01-01

    This research thesis reports the development of software for an acquisition computer aimed at the on-line control of nuclear physics experiments. An original architecture, based on the assignment of a processor to each fundamental task, enables the implementation of a high performance system. In order to free the user from programming constraints, the author developed software for the dynamic generation of acquisition and processing codes. These codes are created from a data base which is programmed by the user by using a language close to the physical reality. Procedures for interactive control of the experiment are thus simplified by displaying function menus on the operator terminal. The author evokes possible hardware improvements and possible extensions of the system [fr

  10. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem - the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines.

  11. Human Computation

    CERN Multimedia

    CERN. Geneva

    2008-01-01

    What if people could play computer games and accomplish work without even realizing it? What if billions of people collaborated to solve important problems for humanity or generate training data for computers? My work aims at a general paradigm for doing exactly that: utilizing human processing power to solve computational problems in a distributed manner. In particular, I focus on harnessing human time and energy for addressing problems that computers cannot yet solve. Although computers have advanced dramatically in many respects over the last 50 years, they still do not possess the basic conceptual intelligence or perceptual capabilities...

  12. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  13. Quantum computation

    International Nuclear Information System (INIS)

    Deutsch, D.

    1992-01-01

    As computers become ever more complex, they inevitably become smaller. This leads to a need for components which are fabricated and operate on increasingly smaller size scales. Quantum theory is already taken into account in microelectronics design. This article explores how quantum theory will need to be incorporated into computers in future in order to give their components functionality. Computation tasks which depend on quantum effects will become possible. Physicists may have to reconsider their perspective on computation in the light of understanding developed in connection with universal quantum computers. (UK)

  14. Computer sciences

    Science.gov (United States)

    Smith, Paul H.

    1988-01-01

    The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.

  15. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  16. Computer Literacy: Teaching Computer Ethics.

    Science.gov (United States)

    Troutner, Joanne

    1986-01-01

    Suggests learning activities for teaching computer ethics in three areas: (1) equal access; (2) computer crime; and (3) privacy. Topics include computer time, advertising, class enrollments, copyright law, sabotage ("worms"), the Privacy Act of 1974 and the Freedom of Information Act of 1966. (JM)

  17. Population-based learning of load balancing policies for a distributed computer system

    Science.gov (United States)

    Mehra, Pankaj; Wah, Benjamin W.

    1993-01-01

    Effective load-balancing policies use dynamic resource information to schedule tasks in a distributed computer system. We present a novel method for automatically learning such policies. At each site in our system, we use a comparator neural network to predict the relative speedup of an incoming task using only the resource-utilization patterns obtained prior to the task's arrival. Outputs of these comparator networks are broadcast periodically over the distributed system, and the resource schedulers at each site use these values to determine the best site for executing an incoming task. The delays incurred in propagating workload information and tasks from one site to another, as well as the dynamic and unpredictable nature of workloads in multiprogrammed multiprocessors, may cause the workload pattern at the time of execution to differ from patterns prevailing at the times of load-index computation and decision making. Our load-balancing policy accommodates this uncertainty by using certain tunable parameters. We present a population-based machine-learning algorithm that adjusts these parameters in order to achieve high average speedups with respect to local execution. Our results show that our load-balancing policy, when combined with the comparator neural network for workload characterization, is effective in exploiting idle resources in a distributed computer system.
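
    The decision step described above can be sketched in a few lines of Python (names and the migration threshold are illustrative, not the paper's policy): each site keeps the most recent broadcast load indices and routes an arriving task to the best site only if it looks sufficiently better than local execution, since the indices may already be stale.

      def choose_site(local_site, load_indices, migration_threshold=1.2):
          """load_indices: {site: predicted relative speedup from the latest broadcasts}."""
          best_site = max(load_indices, key=load_indices.get)
          # The tunable threshold guards against migrating on stale or marginal information.
          if load_indices[best_site] > migration_threshold * load_indices[local_site]:
              return best_site
          return local_site

      indices = {"siteA": 0.9, "siteB": 1.6, "siteC": 1.1}
      print(choose_site("siteA", indices))   # -> siteB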

  18. High-speed computation of the EM algorithm for PET image reconstruction

    International Nuclear Information System (INIS)

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1994-01-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution backprojection algorithms. However, two major drawbacks have impeded the routine use of the EM algorithm, namely, the long computational time due to slow convergence and the large memory required for the storage of the image, projection data and the probability matrix. In this study, the authors attempt to solve these two problems by parallelizing the EM algorithm on a multiprocessor system. The authors have implemented an extended hypercube (EH) architecture for the high-speed computation of the EM algorithm using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs). The authors discuss and compare the performance of the EM algorithm on a 386/387 machine, CD 4360 mainframe, and on the EH system. The results show that the computational speed performance of an EH using DSP chips as PEs executing the EM image reconstruction algorithm is about 130 times better than that of the CD 4360 mainframe. The EH topology is expandable to a larger number of PEs.
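
    The EM (MLEM) image update the authors parallelize is compact enough to state as a sketch; the version below uses a dense system matrix for clarity (the paper's contribution is the multiprocessor mapping, not this formula).

      import numpy as np

      def mlem_iteration(x, A, y, eps=1e-12):
          """One MLEM update: x image estimate, A probability (system) matrix, y measured projections."""
          forward = A @ x                          # expected counts per detector bin
          ratio = y / np.maximum(forward, eps)     # measured vs expected
          sensitivity = np.maximum(A.sum(axis=0), eps)
          return x * (A.T @ ratio) / sensitivity

      # Tiny synthetic example: 3-pixel image, 4 projection bins.
      A = np.array([[1.0, 0.5, 0.0],
                    [0.0, 1.0, 0.5],
                    [0.5, 0.0, 1.0],
                    [0.3, 0.3, 0.3]])
      x_true = np.array([2.0, 1.0, 3.0])
      y = A @ x_true
      x = np.ones(3)
      for _ in range(200):
          x = mlem_iteration(x, A, y)
      print(x)   # approaches x_true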

  19. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1994-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to parallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect addressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect addressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network
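
    The Fortran 90 WHERE approach mentioned above has a direct analogue in array languages; the toy Python/NumPy sketch below (regimes and correlations are invented, not TRAC's) evaluates each flow regime's formula only where its mask holds, so no gather, sort, or indirect addressing of nodes is needed.

      import numpy as np

      void_fraction = np.random.rand(10_000)        # per-node state (illustrative)
      h = np.empty_like(void_fraction)              # regime-dependent coefficient

      bubbly = void_fraction < 0.3
      annular = void_fraction > 0.8
      slug = ~(bubbly | annular)

      # Masked evaluation, the data-parallel analogue of Fortran 90's WHERE construct.
      h[bubbly] = 1000.0 * (1.0 + void_fraction[bubbly])
      h[slug] = 800.0 * np.sqrt(1.0 - void_fraction[slug])
      h[annular] = 200.0 / (1.0 - void_fraction[annular] + 1e-6)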

  20. Computer programming and computer systems

    CERN Document Server

    Hassitt, Anthony

    1966-01-01

    Computer Programming and Computer Systems imparts a "reading knowledge" of computer systems. This book describes the aspects of machine-language programming, monitor systems, computer hardware, and advanced programming that every thorough programmer should be acquainted with. This text discusses the automatic electronic digital computers, symbolic language, Reverse Polish Notation, and the translation of Fortran into assembly language. The routines for reading blocked tapes, dimension statements in subroutines, general-purpose input routines, and efficient use of memory are also elaborated. This publication is inten

  1. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  2. Functional requirements document for the Earth Observing System Data and Information System (EOSDIS) Scientific Computing Facilities (SCF) of the NASA/MSFC Earth Science and Applications Division, 1992

    Science.gov (United States)

    Botts, Michael E.; Phillips, Ron J.; Parker, John V.; Wright, Patrick D.

    1992-01-01

    Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. A SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented.

  3. Organic Computing

    CERN Document Server

    Würtz, Rolf P

    2008-01-01

    Organic Computing is a research field emerging around the conviction that problems of organization in complex systems in computer science, telecommunications, neurobiology, molecular biology, ethology, and possibly even sociology can be tackled scientifically in a unified way. From the computer science point of view, the apparent ease in which living systems solve computationally difficult problems makes it inevitable to adopt strategies observed in nature for creating information processing machinery. In this book, the major ideas behind Organic Computing are delineated, together with a sparse sample of computational projects undertaken in this new field. Biological metaphors include evolution, neural networks, gene-regulatory networks, networks of brain modules, hormone system, insect swarms, and ant colonies. Applications are as diverse as system design, optimization, artificial growth, task allocation, clustering, routing, face recognition, and sign language understanding.

  4. Computed Tomography

    Science.gov (United States)

    Castellano, Isabel; Geleijns, Jacob

    After its clinical introduction in 1973, computed tomography developed from an x-ray modality for axial imaging in neuroradiology into a versatile three dimensional imaging modality for a wide range of applications in, for example, oncology, vascular radiology, cardiology, traumatology and even interventional radiology. Computed tomography is applied for diagnosis, follow-up studies and screening of healthy subpopulations with specific risk factors. This chapter provides a general introduction to computed tomography, covering a short history of computed tomography, technology, image quality, dosimetry, room shielding, quality control and quality criteria.

  5. Biological computation

    CERN Document Server

    Lamm, Ehud

    2011-01-01

    Introduction and Biological Background; Biological Computation; The Influence of Biology on Mathematics-Historical Examples; Biological Introduction; Models and Simulations; Cellular Automata; Biological Background; The Game of Life; General Definition of Cellular Automata; One-Dimensional Automata; Examples of Cellular Automata; Comparison with a Continuous Mathematical Model; Computational Universality; Self-Replication; Pseudo Code; Evolutionary Computation; Evolutionary Biology and Evolutionary Computation; Genetic Algorithms; Example Applications; Analysis of the Behavior of Genetic Algorithms; Lamarckian Evolution; Genet

  6. Computational Deception

    NARCIS (Netherlands)

    Nijholt, Antinus; Acosta, P.S.; Cravo, P.

    2010-01-01

    In the future our daily life interactions with other people, with computers, robots and smart environments will be recorded and interpreted by computers or embedded intelligence in environments, furniture, robots, displays, and wearables. These sensors record our activities, our behaviour, and our

  7. Grid Computing

    Science.gov (United States)

    Foster, Ian

    2001-08-01

    The term "Grid Computing" refers to the use, for computational purposes, of emerging distributed Grid infrastructures: that is, network and middleware services designed to provide on-demand and high-performance access to all important computational resources within an organization or community. Grid computing promises to enable both evolutionary and revolutionary changes in the practice of computational science and engineering based on new application modalities such as high-speed distributed analysis of large datasets, collaborative engineering and visualization, desktop access to computation via "science portals," rapid parameter studies and Monte Carlo simulations that use all available resources within an organization, and online analysis of data from scientific instruments. In this article, I examine the status of Grid computing circa 2000, briefly reviewing some relevant history, outlining major current Grid research and development activities, and pointing out likely directions for future work. I also present a number of case studies, selected to illustrate the potential of Grid computing in various areas of science.

  8. Platform computing

    CERN Multimedia

    2002-01-01

    "Platform Computing releases first grid-enabled workload management solution for IBM eServer Intel and UNIX high performance computing clusters. This Out-of-the-box solution maximizes the performance and capability of applications on IBM HPC clusters" (1/2 page) .

  9. Quantum Computing

    Indian Academy of Sciences (India)

    Quantum Computing - Building Blocks of a Quantum Computer. C S Vijay Vishal Gupta. General Article, Resonance – Journal of Science Education, Volume 5, Issue 9, September 2000, pp. 69-81.

  10. Computational Pathology

    Science.gov (United States)

    Louis, David N.; Feldman, Michael; Carter, Alexis B.; Dighe, Anand S.; Pfeifer, John D.; Bry, Lynn; Almeida, Jonas S.; Saltz, Joel; Braun, Jonathan; Tomaszewski, John E.; Gilbertson, John R.; Sinard, John H.; Gerber, Georg K.; Galli, Stephen J.; Golden, Jeffrey A.; Becich, Michael J.

    2016-01-01

    Context We define the scope and needs within the new discipline of computational pathology, a discipline critical to the future of both the practice of pathology and, more broadly, medical practice in general. Objective To define the scope and needs of computational pathology. Data Sources A meeting was convened in Boston, Massachusetts, in July 2014 prior to the annual Association of Pathology Chairs meeting, and it was attended by a variety of pathologists, including individuals highly invested in pathology informatics as well as chairs of pathology departments. Conclusions The meeting made recommendations to promote computational pathology, including clearly defining the field and articulating its value propositions; asserting that the value propositions for health care systems must include means to incorporate robust computational approaches to implement data-driven methods that aid in guiding individual and population health care; leveraging computational pathology as a center for data interpretation in modern health care systems; stating that realizing the value proposition will require working with institutional administrations, other departments, and pathology colleagues; declaring that a robust pipeline should be fostered that trains and develops future computational pathologists, for those with both pathology and non-pathology backgrounds; and deciding that computational pathology should serve as a hub for data-related research in health care systems. The dissemination of these recommendations to pathology and bioinformatics departments should help facilitate the development of computational pathology. PMID:26098131

  11. Cloud Computing

    Indian Academy of Sciences (India)

    IAS Admin

    2014-03-01

    Mar 1, 2014 ... decade in computing. In this article we define cloud computing, various services available on the cloud infrastructure, and the different types of cloud. We then discuss the technological trends which have led to its emergence, its advantages and disadvantages, and the applications which are appropriate ...

  12. GPGPU COMPUTING

    Directory of Open Access Journals (Sweden)

    BOGDAN OANCEA

    2012-05-01

    Full Text Available Since the first idea of using GPUs for general purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified Device Architecture) by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used together with the hardware that they provide. A new emerging standard, OpenCL (Open Computing Language), tries to unify different GPU general computing API implementations and provides a framework for writing programs executed across heterogeneous platforms consisting of both CPUs and GPUs. OpenCL provides parallel computing using task-based and data-based parallelism. In this paper we will focus on the CUDA parallel computing architecture and programming model introduced by NVIDIA. We will present the benefits of the CUDA programming model. We will also compare the two main approaches, CUDA and AMD APP (Stream), with the new framework, OpenCL, that tries to unify the GPGPU computing models.

  13. Computer Insecurity.

    Science.gov (United States)

    Wilson, David L.

    1994-01-01

    College administrators recently appealed to students and faculty to change their computer passwords after security experts announced that tens of thousands had been stolen by computer hackers. Federal officials are investigating. Such attacks are not uncommon, but the most effective solutions are either inconvenient or cumbersome. (MSE)

  14. Quantum Computing

    Indian Academy of Sciences (India)

    In the first part of this article, we had looked at how quantum physics can be harnessed to make the building blocks of a quantum computer. In this concluding part, we look at algorithms which can exploit the power of this computational device, and some practical difficulties in building such a device. Quantum Algorithms.

  15. Cloud Computing

    Indian Academy of Sciences (India)

    IAS Admin

    2014-03-01

    Mar 1, 2014 ... Thus the availability of computing as a utility which allows organizations to pay service providers for what they use and eliminates the need to budget huge amounts to buy and maintain large computing infrastructure is a welcome development. Amazon, an e-commerce company, started operations in 1995.

  16. Computational Composites

    DEFF Research Database (Denmark)

    Vallgårda, Anna K. A.

    this understanding could entail in terms of developing new expressional appearances of computational technology, new ways of working with it, and new technological possibilities. The investigations are carried out in relation to, or as part of three experiments with computers and materials (PLANKS, Copper...

  17. Cloud Computing

    DEFF Research Database (Denmark)

    Krogh, Simon

    2013-01-01

    with technological changes, the paradigmatic pendulum has swung between increased centralization on one side and a focus on distributed computing that pushes IT power out to end users on the other. With the introduction of outsourcing and cloud computing, centralization in large data centers is again dominating the IT scene. In line with the views presented by Nicolas Carr in 2003 (Carr, 2003), it is a popular assumption that cloud computing will be the next utility (like water, electricity and gas) (Buyya, Yeo, Venugopal, Broberg, & Brandic, 2009). However, this assumption disregards the fact that most IT production ..., for instance, in establishing and maintaining trust between the involved parties (Sabherwal, 1999). So far, research in cloud computing has neglected this perspective and focused entirely on aspects relating to technology, economy, security and legal questions. While the core technologies of cloud computing (e...

  18. Development of superconductor electronics technology for high-end computing

    Science.gov (United States)

    Silver, A.; Kleinsasser, A.; Kerber, G.; Herr, Q.; Dorojevets, M.; Bunyk, P.; Abelson, L.

    2003-12-01

    This paper describes our programme to develop and demonstrate ultra-high performance single flux quantum (SFQ) VLSI technology that will enable superconducting digital processors for petaFLOPS-scale computing. In the hybrid technology, multi-threaded architecture, the computational engine to power a petaFLOPS machine at affordable power will consist of 4096 SFQ multi-chip processors, with 50 to 100 GHz clock frequency and associated cryogenic RAM. We present the superconducting technology requirements, progress to date and our plan to meet these requirements. We improved SFQ Nb VLSI by two generations, to a 8 kA cm-2, 1.25 µm junction process, incorporated new CAD tools into our methodology, demonstrated methods for recycling the bias current and data communication at speeds up to 60 Gb s-1, both on and between chips through passive transmission lines. FLUX-1 is the most ambitious project implemented in SFQ technology to date, a prototype general-purpose 8 bit microprocessor chip. We are testing the FLUX-1 chip (5K gates, 20 GHz clock) and designing a 32 bit floating-point SFQ multiplier with vector-register memory. We report correct operation of the complete stripline-connected gate library with large bias margins, as well as several larger functional units used in FLUX-1. The next stage will be an SFQ multi-processor machine. Important challenges include further reducing chip supply current and on-chip power dissipation, developing at least 64 kbit, sub-nanosecond cryogenic RAM chips, developing thermally and electrically efficient high data rate cryogenic-to-ambient input/output technology and improving Nb VLSI to increase gate density.

  19. Computational Streetscapes

    Directory of Open Access Journals (Sweden)

    Paul M. Torrens

    2016-09-01

    Full Text Available Streetscapes have presented a long-standing interest in many fields. Recently, there has been a resurgence of attention on streetscape issues, catalyzed in large part by computing. Because of computing, there is more understanding, vistas, data, and analysis of and on streetscape phenomena than ever before. This diversity of lenses trained on streetscapes permits us to address long-standing questions, such as how people use information while mobile, how interactions with people and things occur on streets, how we might safeguard crowds, how we can design services to assist pedestrians, and how we could better support special populations as they traverse cities. Amid each of these avenues of inquiry, computing is facilitating new ways of posing these questions, particularly by expanding the scope of what-if exploration that is possible. With assistance from computing, consideration of streetscapes now reaches across scales, from the neurological interactions that form among place cells in the brain up to informatics that afford real-time views of activity over whole urban spaces. For some streetscape phenomena, computing allows us to build realistic but synthetic facsimiles in computation, which can function as artificial laboratories for testing ideas. In this paper, I review the domain science for studying streetscapes from vantages in physics, urban studies, animation and the visual arts, psychology, biology, and behavioral geography. I also review the computational developments shaping streetscape science, with particular emphasis on modeling and simulation as informed by data acquisition and generation, data models, path-planning heuristics, artificial intelligence for navigation and way-finding, timing, synthetic vision, steering routines, kinematics, and geometrical treatment of collision detection and avoidance. I also discuss the implications that the advances in computing streetscapes might have on emerging developments in cyber

  20. Software Synthesis for High Productivity Exascale Computing

    Energy Technology Data Exchange (ETDEWEB)

    Bodik, Rastislav [Univ. of Washington, Seattle, WA (United States)

    2010-09-01

    Over the three years of our project, we accomplished three key milestones: We demonstrated how ideas from generative programming and software synthesis can help support the development of bulk-synchronous distributed memory kernels. These ideas are realized in a new language called MSL, a C-like language that combines synthesis features with high level notations for array manipulation and bulk-synchronous parallelism to simplify the semantic analysis required for synthesis. We also demonstrated that these high level notations map easily to low level C code and show that the performance of this generated code matches that of handwritten Fortran. Second, we introduced the idea of solver-aided domain-specific languages (SDSLs), which are an emerging class of computer-aided programming systems. SDSLs ease the construction of programs by automating tasks such as verification, debugging, synthesis, and non-deterministic execution. SDSLs are implemented by translating the DSL program into logical constraints. Next, we developed a symbolic virtual machine called Rosette, which simplifies the construction of such SDSLs and their compilers. We have used Rosette to build SynthCL, a subset of OpenCL that supports synthesis. Third, we developed novel numeric algorithms that move as little data as possible, either between levels of a memory hierarchy or between parallel processors over a network. We achieved progress in three aspects of this problem. First we determined lower bounds on communication. Second, we compared these lower bounds to widely used versions of these algorithms, and noted that these widely used algorithms usually communicate asymptotically more than is necessary. Third, we identified or invented new algorithms for most linear algebra problems that do attain these lower bounds, and demonstrated large speed-ups in theory and practice.

  1. Low Overhead Real-Time Computing With General Purpose Operating Systems

    National Research Council Canada - National Science Library

    Raymond, Michael

    2004-01-01

    ... In larger systems and more recently, general-purpose operating systems such as SGI IRIX and Linux are used for new projects because they already have multiprocessor and device driver support as well as a large user base...

  2. COMPUTATIONAL THINKING

    Directory of Open Access Journals (Sweden)

    Evgeniy K. Khenner

    2016-01-01

    Full Text Available Abstract. The aim of the research is to draw the attention of the educational community to the phenomenon of computational thinking, which has been actively discussed over the last decade in the foreign scientific and educational literature, to substantiate its importance and practical utility, and to argue for its place in Russian education. Methods. The research is based on the analysis of foreign studies of the phenomenon of computational thinking and the ways it is formed in the process of education, and on comparing the notion of «computational thinking» with related concepts used in the Russian scientific and pedagogical literature. Results. The concept of «computational thinking» is analyzed from the point of view of intuitive understanding as well as scientific and applied aspects. It is shown how computational thinking has evolved with the development of computer hardware and software. The practice-oriented interpretation of computational thinking which is dominant among educators is described, along with some ways of forming it. It is shown that computational thinking is a metasubject result of general education as well as its tool. From the point of view of the author, purposeful development of computational thinking should be one of the tasks of Russian education. Scientific novelty. The author gives a theoretical justification of the role of computational thinking schemes as metasubject results of learning. The dynamics of the development of this concept is described. This process is connected with the evolution of computer and information technologies as well as with the increase in the number of tasks whose effective solution requires computational thinking. The author substantiates the claim that including «computational thinking» in the set of pedagogical concepts used in the national education system fills an existing gap. Practical significance. New metasubject result of education associated with

  3. Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

    International Nuclear Information System (INIS)

    Azad, Ariful; Buluc, Aydn; Pothen, Alex

    2016-01-01

    It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
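
    For readers unfamiliar with augmenting paths, the serial sketch below shows the basic idea the parallel algorithm builds on: repeatedly search from a free vertex for a path that alternates unmatched and matched edges, and flip it. This is a plain single-source, depth-first variant (Kuhn's algorithm) in Python, not the paper's multi-source BFS with tree grafting.

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum cardinality matching by augmenting paths (single-source, DFS-based).

    adj : adjacency list; adj[u] lists the right-side neighbours of left vertex u
    Returns (matching size, match_right), where match_right[v] is the matched
    left vertex for right vertex v, or -1 if v is unmatched.
    """
    match_right = [-1] * n_right

    def try_augment(u, visited):
        # Search for an augmenting path starting at the free left vertex u.
        for v in adj[u]:
            if not visited[v]:
                visited[v] = True
                # v is free, or its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
        return False

    size = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):
            size += 1
    return size, match_right

# Example: 3 left and 3 right vertices
print(max_bipartite_matching([[0, 1], [0], [1, 2]], 3, 3))  # (3, [1, 0, 2])
```

    The parallel algorithm in the record above replaces these one-at-a-time searches with breadth-first searches from many free vertices at once, which is where the tree-grafting step becomes necessary.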

  4. Computational physics

    International Nuclear Information System (INIS)

    Anon.

    1987-01-01

    Computers have for many years played a vital role in the acquisition and treatment of experimental data, but they have more recently taken up a much more extended role in physics research. The numerical and algebraic calculations now performed on modern computers make it possible to explore consequences of basic theories in a way which goes beyond the limits of both analytic insight and experimental investigation. This was brought out clearly at the Conference on Perspectives in Computational Physics, held at the International Centre for Theoretical Physics, Trieste, Italy, from 29-31 October

  5. Computational physics

    CERN Document Server

    Newman, Mark

    2013-01-01

    A complete introduction to the field of computational physics, with examples and exercises in the Python programming language. Computers play a central role in virtually every major physics discovery today, from astrophysics and particle physics to biophysics and condensed matter. This book explains the fundamentals of computational physics and describes in simple terms the techniques that every physicist should know, such as finite difference methods, numerical quadrature, and the fast Fourier transform. The book offers a complete introduction to the topic at the undergraduate level, and is also suitable for the advanced student or researcher who wants to learn the foundational elements of this important field.

  6. Computer interfacing

    CERN Document Server

    Dixey, Graham

    1994-01-01

    This book explains how computers interact with the world around them and therefore how to make them a useful tool. Topics covered include descriptions of all the components that make up a computer, principles of data exchange, interaction with peripherals, serial communication, input devices, recording methods, computer-controlled motors, and printers.In an informative and straightforward manner, Graham Dixey describes how to turn what might seem an incomprehensible 'black box' PC into a powerful and enjoyable tool that can help you in all areas of your work and leisure. With plenty of handy

  7. Computational Viscoelasticity

    CERN Document Server

    Marques, Severino P C

    2012-01-01

    This text is a guide to solving problems in which viscoelasticity is present using existing commercial computational codes. The book gives information on the codes' structure and use, data preparation, and output interpretation and verification. The first part of the book introduces the reader to the subject and provides the models, equations and notation to be used in the computational applications. The second part covers the most important computational techniques, finite element and boundary element formulations, and presents the solutions of viscoelastic problems with Abaqus.

  8. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations
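
    The qualitative shape of such an efficiency model can be illustrated with a back-of-the-envelope sketch: per-iteration work proportional to the number of mesh nodes divided among processors, plus a fixed per-processor synchronization cost. The Python function below is an illustrative assumption of this general form, not the fitted model reported in the paper; all parameter values are arbitrary.

```python
def parallel_efficiency(n_nodes, p, t_node=1.0, t_sync=50.0):
    """Toy efficiency model: E(n, p) = T(n, 1) / (p * T(n, p)).

    n_nodes : problem size (mesh nodes), p : number of processors,
    t_node  : cost per node per iteration, t_sync : fixed per-iteration sync cost.
    """
    t_serial = n_nodes * t_node
    t_parallel = (n_nodes / p) * t_node + t_sync
    return t_serial / (p * t_parallel)

# Efficiency improves with problem size and degrades with processor count:
print(parallel_efficiency(1_000, 8))    # ~0.71
print(parallel_efficiency(100_000, 8))  # ~0.996
```

    Models of this shape reproduce the qualitative behaviour reported above: efficiency approaches one as the work per processor grows relative to the fixed overhead.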

  9. A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

    Science.gov (United States)

    Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

    2009-01-01

    A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building-blocks of the framework consist of: (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results by this approach are presented and discussed in detail as well as future work and improvements.
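
    A minimal way to see the divide-and-conquer decomposition and the master/worker scheduling described above is the Python sketch below. The domain, partitioning rule, and "mesher" are placeholder assumptions (an interval split in half and a fake cell count); it is not the Advancing-Partition/Advancing-Front implementation itself.

```python
from multiprocessing import Pool

def partition(domain, max_size):
    """Recursively bisect a (lo, hi) interval standing in for a sub-domain."""
    lo, hi = domain
    if hi - lo <= max_size:
        return [domain]
    mid = 0.5 * (lo + hi)
    return partition((lo, mid), max_size) + partition((mid, hi), max_size)

def mesh_subdomain(domain):
    """Placeholder for an advancing-front mesher applied to one sub-domain."""
    lo, hi = domain
    return domain, int((hi - lo) * 1000)   # pretend cell count

if __name__ == "__main__":
    subdomains = partition((0.0, 16.0), max_size=1.0)
    # Master/worker: idle workers pull the next sub-domain as they finish,
    # which balances the uneven per-sub-domain meshing costs dynamically.
    with Pool(processes=4) as pool:
        for dom, n_cells in pool.imap_unordered(mesh_subdomain, subdomains):
            print(dom, n_cells)
```

    The dynamic hand-out of tasks (rather than a fixed static assignment) is what the Master/Worker pattern contributes when sub-domain meshing costs vary widely.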

  10. Computational Literacy

    DEFF Research Database (Denmark)

    Chongtay, Rocio; Robering, Klaus

    2016-01-01

    In recent years, there has been a growing interest in and recognition of the importance of Computational Literacy, a skill generally considered to be necessary for success in the 21st century. While much research has concentrated on requirements, tools, and teaching methodologies for the acquisition of Computational Literacy at basic educational levels, focus on higher levels of education has been much less prominent. The present paper considers the case of courses for higher education programs within the Humanities. A model is proposed which conceives of Computational Literacy as a layered...

  11. Computing Religion

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Braxton, Donald M.; Upal, Afzal

    2012-01-01

    The computational approach has become an invaluable tool in many fields that are directly relevant to research in religious phenomena. Yet the use of computational tools is almost absent in the study of religion. Given that religion is a cluster of interrelated phenomena and that research concerning these phenomena should strive for multilevel analysis, this article argues that the computational approach offers new methodological and theoretical opportunities to the study of religion. We argue that the computational approach offers (1) an intermediary step between any theoretical construct and its targeted empirical space and (2) a new kind of data which allows the researcher to observe abstract constructs, estimate likely outcomes, and optimize empirical designs. Because sophisticated multilevel research is a collaborative project, we also seek to introduce to scholars of religion some...

  12. Computational Controversy

    NARCIS (Netherlands)

    Timmermans, Benjamin; Kuhn, Tobias; Beelen, Kaspar; Aroyo, Lora

    2017-01-01

    Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have

  13. COMPUTERS HAZARDS

    Directory of Open Access Journals (Sweden)

    Andrzej Augustynek

    2007-01-01

    Full Text Available In June 2006, over 12.6 million Polish users of the Web were registered. On average, each of them spent 21 hours and 37 minutes monthly browsing the Web. That is why the psychological aspects of computer utilization have become an urgent research subject. The results of research into the development of the Polish information society, carried out at AGH University of Science and Technology under the leadership of Leslaw H. Haber from 2000 until the present, indicate the emergence of dynamic changes in the ways computers are used and in the circumstances of their use. One of the interesting regularities has been the inversely proportional relation between the level of computer skills and the frequency of Web utilization. It has been found that in 2005, compared to 2000, the following changes occurred: a significant drop in the number of students who never used computers and the Web; a remarkable increase in computer knowledge and skills (particularly pronounced in the case of first-year students); a decreasing gap in computer skills between students of the first and the third year, and between male and female students; and a declining popularity of computer games. It has also been demonstrated that the hazard of computer screen addiction was highest in the case of unemployed youth outside the school system. As much as 12% of this group of young people were addicted to computers. The large amount of leisure time that these youths enjoyed induced them to excessive use of the Web. Polish housewives are another population group at risk of addiction to the Web. The duration of long Web chats carried out by younger and younger youths has been another matter of concern. Since the phenomenon of computer addiction is relatively new, no specific therapy methods have been developed. In general, the therapy applied to computer addiction syndrome is similar to the techniques applied in cases of alcohol or gambling addiction. Individual and group

  14. Computational sustainability

    CERN Document Server

    Kersting, Kristian; Morik, Katharina

    2016-01-01

    The book at hand gives an overview of the state of the art research in Computational Sustainability as well as case studies of different application scenarios. This covers topics such as renewable energy supply, energy storage and e-mobility, efficiency in data centers and networks, sustainable food and water supply, sustainable health, industrial production and quality, etc. The book describes computational methods and possible application scenarios.

  15. Computing farms

    International Nuclear Information System (INIS)

    Yeh, G.P.

    2000-01-01

    High-energy physics, nuclear physics, space sciences, and many other fields have large challenges in computing. In recent years, PCs have achieved performance comparable to the high-end UNIX workstations, at a small fraction of the price. We review the development and broad applications of commodity PCs as the solution to CPU needs, and look forward to the important and exciting future of large-scale PC computing

  16. Computational oncology.

    Science.gov (United States)

    Lefor, Alan T

    2011-08-01

    Oncology research has traditionally been conducted using techniques from the biological sciences. The new field of computational oncology has forged a new relationship between the physical sciences and oncology to further advance research. By applying physics and mathematics to oncologic problems, new insights will emerge into the pathogenesis and treatment of malignancies. One major area of investigation in computational oncology centers around the acquisition and analysis of data, using improved computing hardware and software. Large databases of cellular pathways are being analyzed to understand the interrelationship among complex biological processes. Computer-aided detection is being applied to the analysis of routine imaging data including mammography and chest imaging to improve the accuracy and detection rate for population screening. The second major area of investigation uses computers to construct sophisticated mathematical models of individual cancer cells as well as larger systems using partial differential equations. These models are further refined with clinically available information to more accurately reflect living systems. One of the major obstacles in the partnership between physical scientists and the oncology community is communications. Standard ways to convey information must be developed. Future progress in computational oncology will depend on close collaboration between clinicians and investigators to further the understanding of cancer using these new approaches.

  17. Proceedings: Sisal '93

    Energy Technology Data Exchange (ETDEWEB)

    Feo, J.T. [ed.

    1993-10-01

    This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for the functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  18. Proceedings of the second SISAL users' conference

    Energy Technology Data Exchange (ETDEWEB)

    Feo, J T; Frerking, C; Miller, P J [eds.

    1992-12-01

    This report contains papers on the following topics: A Sisal code for computing the Fourier transform on S_N; five ways to fill your knapsack; simulating material dislocation motion in Sisal; candis as an interface for Sisal; parallelisation and performance of the Burg algorithm on a shared-memory multiprocessor; use of a genetic algorithm in Sisal to solve the file design problem; implementing FFTs in Sisal; programming and evaluating the performance of signal processing applications in the Sisal programming environment; Sisal and von Neumann-based languages: translation and intercommunication; an IF2 code generator for the ADAM architecture; program partitioning for NUMA multiprocessor computer systems; mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure; mathematical syntax for Sisal; an approach for optimizing recursive functions; implementing arrays in Sisal 2.0; Fol: an object oriented extension to the Sisal language; twine: a portable, extensible Sisal execution kernel; and investigating the memory performance of the optimizing Sisal compiler.

  19. Computational creativity

    Directory of Open Access Journals (Sweden)

    López de Mántaras Badia, Ramon

    2013-12-01

    Full Text Available New technologies, and in particular artificial intelligence, are drastically changing the nature of creative processes. Computers are playing very significant roles in creative activities such as music, architecture, fine arts, and science. Indeed, the computer is already a canvas, a brush, a musical instrument, and so on. However, we believe that we must aim at more ambitious relations between computers and creativity. Rather than just seeing the computer as a tool to help human creators, we could see it as a creative entity in its own right. This view has triggered a new subfield of Artificial Intelligence called Computational Creativity. This article addresses the question of the possibility of achieving computational creativity through some examples of computer programs capable of replicating some aspects of creative behavior in the fields of music and science.

  20. Computational mechanics

    Energy Technology Data Exchange (ETDEWEB)

    Goudreau, G.L.

    1993-03-01

    The Computational Mechanics thrust area sponsors research into the underlying solid, structural and fluid mechanics and heat transfer necessary for the development of state-of-the-art general purpose computational software. The scale of computational capability spans office workstations, departmental computer servers, and Cray-class supercomputers. The DYNA, NIKE, and TOPAZ codes have achieved world fame through our broad collaborators program, in addition to their strong support of on-going Lawrence Livermore National Laboratory (LLNL) programs. Several technology transfer initiatives have been based on these established codes, teaming LLNL analysts and researchers with counterparts in industry, extending code capability to specific industrial interests of casting, metalforming, and automobile crash dynamics. The next-generation solid/structural mechanics code, ParaDyn, is targeted toward massively parallel computers, which will extend performance from gigaflop to teraflop power. Our work for FY-92 is described in the following eight articles: (1) Solution Strategies: New Approaches for Strongly Nonlinear Quasistatic Problems Using DYNA3D; (2) Enhanced Enforcement of Mechanical Contact: The Method of Augmented Lagrangians; (3) ParaDyn: New Generation Solid/Structural Mechanics Codes for Massively Parallel Processors; (4) Composite Damage Modeling; (5) HYDRA: A Parallel/Vector Flow Solver for Three-Dimensional, Transient, Incompressible Viscous Flow; (6) Development and Testing of the TRIM3D Radiation Heat Transfer Code; (7) A Methodology for Calculating the Seismic Response of Critical Structures; and (8) Reinforced Concrete Damage Modeling.

  1. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel

  2. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    Science.gov (United States)

    Lu, Liuyan; Lantz, Steven R.; Ren, Zhuyin; Pope, Stephen B.

    2009-08-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
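
    The three fixed distribution strategies can be pictured with the toy assignment sketch below (Python). It is only an illustration of the bookkeeping: the function and variable names are hypothetical, and the PREF rule here is a simple load-capping heuristic, whereas the paper's preferential distribution also takes into account which processor's ISAT table is likely to contain the needed entries.

```python
import random

def assign(tasks, owner, n_ranks, strategy):
    """Toy assignment of chemistry evaluations to ranks.

    tasks    : list of task ids
    owner    : dict task id -> rank that generated the particle
    strategy : 'PLP' (purely local), 'URAN' (uniformly random), or 'PREF'
    """
    buckets = {r: [] for r in range(n_ranks)}
    if strategy == "PLP":        # every rank keeps its own work: no messages, no balancing
        for t in tasks:
            buckets[owner[t]].append(t)
    elif strategy == "URAN":     # statistically balanced, but most work moves off-rank
        for t in tasks:
            buckets[random.randrange(n_ranks)].append(t)
    elif strategy == "PREF":     # stay local until a rank exceeds its share, then spill
        cap = len(tasks) // n_ranks + 1
        for t in tasks:
            r = owner[t]
            if len(buckets[r]) >= cap:
                r = min(buckets, key=lambda k: len(buckets[k]))
            buckets[r].append(t)
    return buckets
```

    The trade-off the abstracts describe is visible even in this sketch: PLP avoids communication but inherits whatever imbalance the flow solver produced, URAN balances counts at the cost of moving nearly all work, and a preferential scheme sits in between.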

  3. Quantum computing

    International Nuclear Information System (INIS)

    Steane, Andrew

    1998-01-01

    The subject of quantum computing brings together ideas from classical information theory, computer science, and quantum physics. This review aims to summarize not just quantum computing, but the whole subject of quantum information theory. Information can be identified as the most general thing which must propagate from a cause to an effect. It therefore has a fundamentally important role in the science of physics. However, the mathematical treatment of information, especially information processing, is quite recent, dating from the mid-20th century. This has meant that the full significance of information as a basic concept in physics is only now being discovered. This is especially true in quantum mechanics. The theory of quantum information and computing puts this significance on a firm footing, and has led to some profound and exciting new insights into the natural world. Among these are the use of quantum states to permit the secure transmission of classical information (quantum cryptography), the use of quantum entanglement to permit reliable transmission of quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible noise processes (quantum error correction), and the use of controlled quantum evolution for efficient computation (quantum computation). The common theme of all these insights is the use of quantum entanglement as a computational resource. It turns out that information theory and quantum mechanics fit together very well. In order to explain their relationship, this review begins with an introduction to classical information theory and computer science, including Shannon's theorem, error correcting codes, Turing machines and computational complexity. The principles of quantum mechanics are then outlined, and the Einstein, Podolsky and Rosen (EPR) experiment described. The EPR-Bell correlations, and quantum entanglement in general, form the essential new ingredient which distinguishes quantum from

  4. Portable parallel stochastic optimization for the design of aeropropulsion components

    Science.gov (United States)

    Sues, Robert H.; Rhodes, G. S.

    1994-01-01

    This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as a review of portable, parallel programming environments. The second effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization
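
    As a quick sanity check on the reported figures, parallel efficiency is simply speedup divided by processor count, so the quoted speedups and efficiencies are mutually consistent; the small Python snippet below just works out that arithmetic.

```python
def efficiency(speedup, p):
    """Parallel efficiency E = S / p."""
    return speedup / p

print(efficiency(19, 20))   # ~0.95 for the 20-workstation network
print(0.75 * 31)            # implied speedup of ~23x at 75% efficiency on 31 processors
print(0.60 * 50)            # implied speedup of 30x at 60% efficiency on 50 processors
```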

  5. Computational hydraulics

    Science.gov (United States)

    Brebbia, C. A.; Ferrante, A. J.

    Computational hydraulics is discussed in detail, combining classical hydraulics with new methods such as finite elements and boundary elements, both presented in a matrix formulation. The basic properties and concepts of fluids are first reviewed, and pipe flow is treated, giving empirical formulae. Aspects of pipe networks are covered, including energy losses, total systems of equations and their solution, linear and nonlinear analyses and computer programs. Open-channel flow is treated, including Chezy and Manning formulae, optimum hydraulic section, nonuniform flow, and analysis and computation. Potential flow is addressed, including the application of Euler's equations, flow nets, finite element and boundary element solutions and programs. The applications of Navier-Stokes equations to Newtonian fluids and turbulence is considered. Finally, turbomachinery is discussed.
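
    As a concrete instance of the empirical open-channel formulae mentioned above, here is the Manning formula in SI units sketched in Python; the example channel dimensions are arbitrary assumptions, and the function name is a placeholder.

```python
def manning_discharge(n, area, wetted_perimeter, slope):
    """Discharge Q from the Manning formula (SI units): V = (1/n) R^(2/3) S^(1/2), Q = V A."""
    r = area / wetted_perimeter                              # hydraulic radius R
    velocity = (1.0 / n) * r ** (2.0 / 3.0) * slope ** 0.5   # mean velocity V
    return velocity * area                                   # Q = V * A

# Rectangular channel 3 m wide flowing 1 m deep, n = 0.013, bed slope 0.001:
print(manning_discharge(0.013, 3.0 * 1.0, 3.0 + 2 * 1.0, 0.001))  # ~5.2 m^3/s
```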

  6. Quantum computers.

    Science.gov (United States)

    Ladd, T D; Jelezko, F; Laflamme, R; Nakamura, Y; Monroe, C; O'Brien, J L

    2010-03-04

    Over the past several decades, quantum information science has emerged to seek answers to the question: can we gain some advantage by storing, transmitting and processing information encoded in systems that exhibit unique quantum properties? Today it is understood that the answer is yes, and many research groups around the world are working towards the highly ambitious technological goal of building a quantum computer, which would dramatically improve computational power for particular tasks. A number of physical systems, spanning much of modern physics, are being developed for quantum computation. However, it remains unclear which technology, if any, will ultimately prove successful. Here we describe the latest developments for each of the leading approaches and explain the major challenges for the future.

  7. Computational Psychiatry

    Science.gov (United States)

    Wang, Xiao-Jing; Krystal, John H.

    2014-01-01

    Psychiatric disorders such as autism and schizophrenia arise from abnormalities in brain systems that underlie cognitive, emotional and social functions. The brain is enormously complex and its abundant feedback loops on multiple scales preclude intuitive explication of circuit functions. In close interplay with experiments, theory and computational modeling are essential for understanding how, precisely, neural circuits generate flexible behaviors and their impairments give rise to psychiatric symptoms. This Perspective highlights recent progress in applying computational neuroscience to the study of mental disorders. We outline basic approaches, including identification of core deficits that cut across disease categories, biologically-realistic modeling bridging cellular and synaptic mechanisms with behavior, model-aided diagnosis. The need for new research strategies in psychiatry is urgent. Computational psychiatry potentially provides powerful tools for elucidating pathophysiology that may inform both diagnosis and treatment. To achieve this promise will require investment in cross-disciplinary training and research in this nascent field. PMID:25442941

  8. Architecture of an acquisition system-multiprocessors

    International Nuclear Information System (INIS)

    Postec, H.

    1987-07-01

    To keep pace with the rapidly increasing number of parameters handled by nuclear detection systems, acquisition systems are growing larger and must offer very high throughput. At Ganil, four detection systems have been installed in the Nautilus reaction chamber, leading to experimental configurations with 700 parameters to process. Given the limitations of the present acquisition system, a device better suited to reading out a large number of channels proved necessary. Functionalities already operating in other systems, and hardware already in use, were retained; specific technical solutions were also developed to exploit the most recent techniques and to take into account the four-detection-system structure of the device [fr

  9. A CRAY-Class Multiprocessor Simulator.

    Science.gov (United States)

    1983-09-01

    are used to denote optional parameters. Ellipsis notation (...) is used to denote the repetition of a parameter list. Vertical bars are used to...

  10. Multigrid Algorithms on the Hypercube Multiprocessor,

    Science.gov (United States)

    1985-02-01

    multigrid methods is distinguished from others by their use of a hierarchy of coarser grids (in addition to the one on which the solution is sought) in... have been considered by Grosch [5, 6], Brandt [2], and Chan and Schreiber [3]. The major drawback of this class of parallel multigrid methods is that... multigrid methods. In the previous sections we have not addressed some of the important aspects of multi-level meshes that appear in multigrid methods. In

  11. Modeling the Performance of the Concert Multiprocessor.

    Science.gov (United States)

    1987-05-01

    accesses in the clockwise direction around the rings. As depicted in Figure 1.6, a memory access to a neighbouring ring... requests subject to 1 and 3; has the maximum number of the longest requests subject to 1 and 2 (i.e. a request subset with requests of length 1

  12. Modeling A Circuit Switched Multiprocessor Interconnect

    Science.gov (United States)

    1989-10-01

    at stages of the network nearer the memories is artificially elevated. This would tend to make the predicted values for U more pessimistic. 2. In... and by grants from the Sloan Foundation and IBM. The design currently uses a packet-switched direct network with wormhole routing [1], although

  13. The fast Amsterdam multiprocessor (FAMP) operation system

    International Nuclear Information System (INIS)

    Gosman, D.; Hertzberger, L.O.; Holthuizen, D.J.; Por, G.J.A.; Schoorel, M.

    1981-01-01

    The Fast Amsterdam Multi Processor system (FAMP system) was developed for on-line filtering and second-stage triggering. The system is based on the MC 68000 microprocessor from MOTOROLA. In this report we describe: the FAMP operating system software; the features of the slaves and supervisor in the FAMP operating system; the communication between supervisor and slaves using the dual-port memories; and the communication between user programs and the operating system. The hardware as well as the application of the system will be described elsewhere. (orig.)

  14. XNUSIM - Graphical Interface for a Multiprocessor Simulator

    Science.gov (United States)

    1989-09-01

    window, it triggers the MainMenu (see below). Message Window: displays any short Help message available and/or messages from xnusim to the user... Listing Window: displays the file that is currently being executed, and shows the last line that had been executed when stepping through. Command Window

  15. Cloud Computing

    CERN Document Server

    Antonopoulos, Nick

    2010-01-01

    Cloud computing has recently emerged as a subject of substantial industrial and academic interest, though its meaning and scope is hotly debated. For some researchers, clouds are a natural evolution towards the full commercialisation of grid systems, while others dismiss the term as a mere re-branding of existing pay-per-use technologies. From either perspective, 'cloud' is now the label of choice for accountable pay-per-use access to third party applications and computational resources on a massive scale. Clouds support patterns of less predictable resource use for applications and services a

  16. Computer security

    CERN Document Server

    Gollmann, Dieter

    2011-01-01

    A completely up-to-date resource on computer security Assuming no previous experience in the field of computer security, this must-have book walks you through the many essential aspects of this vast topic, from the newest advances in software and technology to the most recent information on Web applications security. This new edition includes sections on Windows NT, CORBA, and Java and discusses cross-site scripting and JavaScript hacking as well as SQL injection. Serving as a helpful introduction, this self-study guide is a wonderful starting point for examining the variety of competing sec

  17. Computational engineering

    CERN Document Server

    2014-01-01

    The book presents state-of-the-art works in computational engineering. Focus is on mathematical modeling, numerical simulation, experimental validation and visualization in engineering sciences. In particular, the following topics are presented: constitutive models and their implementation into finite element codes, numerical models in nonlinear elasto-dynamics including seismic excitations, multiphase models in structural engineering and multiscale models of materials systems, sensitivity and reliability analysis of engineering structures, the application of scientific computing in urban water management and hydraulic engineering, and the application of genetic algorithms for the registration of laser scanner point clouds.

  18. Computational vision

    Science.gov (United States)

    Barrow, H. G.; Tenenbaum, J. M.

    1981-01-01

    The range of fundamental computational principles underlying human vision that equally apply to artificial and natural systems is surveyed. There emerges from research a view of the structuring of vision systems as a sequence of levels of representation, with the initial levels being primarily iconic (edges, regions, gradients) and the highest symbolic (surfaces, objects, scenes). Intermediate levels are constrained by information made available by preceding levels and information required by subsequent levels. In particular, it appears that physical and three-dimensional surface characteristics provide a critical transition from iconic to symbolic representations. A plausible vision system design incorporating these principles is outlined, and its key computational processes are elaborated.

  19. Reconfigurable Computing

    CERN Document Server

    Cardoso, Joao MP

    2011-01-01

    As the complexity of modern embedded systems increases, it becomes less practical to design monolithic processing platforms. As a result, reconfigurable computing is being adopted widely for more flexible design. Reconfigurable Computers offer the spatial parallelism and fine-grained customizability of application-specific circuits with the postfabrication programmability of software. To make the most of this unique combination of performance and flexibility, designers need to be aware of both hardware and software issues. FPGA users must think not only about the gates needed to perform a comp

  20. Computer systems

    Science.gov (United States)

    Olsen, Lola

    1992-01-01

    In addition to the discussions, Ocean Climate Data Workshop hosts gave participants an opportunity to hear about, see, and test for themselves some of the latest computer tools now available for those studying climate change and the oceans. Six speakers described computer systems and their functions. The introductory talks were followed by demonstrations to small groups of participants and some opportunities for participants to get hands-on experience. After this familiarization period, attendees were invited to return during the course of the Workshop and have one-on-one discussions and further hands-on experience with these systems. Brief summaries or abstracts of introductory presentations are addressed.

  1. Computer viruses

    Science.gov (United States)

    Denning, Peter J.

    1988-01-01

    The worm, Trojan horse, bacterium, and virus are destructive programs that attack information stored in a computer's memory. Virus programs, which propagate by incorporating copies of themselves into other programs, are a growing menace in the late-1980s world of unprotected, networked workstations and personal computers. Limited immunity is offered by memory protection hardware, digitally authenticated object programs, and antibody programs that kill specific viruses. Additional immunity can be gained from the practice of digital hygiene, primarily the refusal to use software from untrusted sources. Full immunity requires attention in a social dimension, the accountability of programmers.

  2. Computational artifacts

    DEFF Research Database (Denmark)

    Schmidt, Kjeld; Bansler, Jørgen P.

    2016-01-01

    The key concern of CSCW research is that of understanding computing technologies in the social context of their use, that is, as integral features of our practices and our lives, and to think of their design and implementation under that perspective. However, the question of the nature of that which is actually integrated in our practices is often discussed in confusing ways, if at all. The article aims to try to clarify the issue and in doing so revisits and reconsiders the notion of ‘computational artifact’.

  3. Computational Logistics

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 4th International Conference on Computational Logistics, ICCL 2013, held in Copenhagen, Denmark, in September 2013. The 19 papers presented in this volume were carefully reviewed and selected for inclusion in the book. They are organized in topical sections named: maritime shipping, road transport, vehicle routing problems, aviation applications, and logistics and supply chain management.

  4. Computational Logistics

    DEFF Research Database (Denmark)

    Pacino, Dario; Voss, Stefan; Jensen, Rune Møller

    2013-01-01

    This book constitutes the refereed proceedings of the 4th International Conference on Computational Logistics, ICCL 2013, held in Copenhagen, Denmark, in September 2013. The 19 papers presented in this volume were carefully reviewed and selected for inclusion in the book. They are organized in topical sections named: maritime shipping, road transport, vehicle routing problems, aviation applications, and logistics and supply chain management.

  5. Computer busses

    CERN Document Server

    Buchanan, William

    2000-01-01

    As more and more equipment is interface or 'bus' driven, either by the use of controllers or directly from PCs, the question of which bus to use is becoming increasingly important, both in industry and in the office. 'Computer Busses' has been designed to help choose the best type of bus for the particular application. There are several books which cover individual busses, but none which provide a complete guide to computer busses. The author provides a basic theory of busses and draws examples and applications from real bus case studies. Busses are analysed using a top-down approach, helpin

  6. The discrete-dipole-approximation code ADDA: capabilities and known limitations

    NARCIS (Netherlands)

    Yurkin, M.A.; Hoekstra, A.G.

    2011-01-01

    The open-source code ADDA is described, which implements the discrete dipole approximation (DDA), a method to simulate light scattering by finite 3D objects of arbitrary shape and composition. Besides standard sequential execution, ADDA can run on a multiprocessor distributed-memory system,

  7. Riemannian computing in computer vision

    CERN Document Server

    Srivastava, Anuj

    2016-01-01

    This book presents a comprehensive treatise on Riemannian geometric computations and related statistical inferences in several computer vision problems. This edited volume includes chapter contributions from leading figures in the field of computer vision who are applying Riemannian geometric approaches in problems such as face recognition, activity recognition, object detection, biomedical image analysis, and structure-from-motion. Some of the mathematical entities that necessitate a geometric analysis include rotation matrices (e.g. in modeling camera motion), stick figures (e.g. for activity recognition), subspace comparisons (e.g. in face recognition), symmetric positive-definite matrices (e.g. in diffusion tensor imaging), and function-spaces (e.g. in studying shapes of closed contours). Illustrates Riemannian computing theory on applications in computer vision, machine learning, and robotics. Emphasis on algorithmic advances that will allow re-application in other...

  8. Computational Finance

    DEFF Research Database (Denmark)

    Rasmussen, Lykke

    One of the major challenges in today's post-crisis finance environment is calculating the sensitivities of complex products for hedging and risk management. Historically, these derivatives have been determined using bump-and-revalue, but due to the increasing magnitude of these computations does...

  9. Optical Computing

    Indian Academy of Sciences (India)

    (For example, the Japanese Earth Simulator, a computer system developed by NEC, uses a ... quite similar to the one shown in Figure 1, except that the phthalocyanine film was replaced by a hollow fiber ... and hence funds were provided accordingly. The areas of space exploration, earth resource utilization, communi-

  10. Computing News

    CERN Multimedia

    McCubbin, N

    2001-01-01

    We are still five years from the first LHC data, so we have plenty of time to get the computing into shape, don't we? Well, yes and no: there is time, but there's an awful lot to do! The recently-completed CERN Review of LHC Computing gives the flavour of the LHC computing challenge. The hardware scale for each of the LHC experiments is millions of 'SpecInt95' (SI95) units of cpu power and tens of PetaBytes of data storage. PCs today are about 20-30 SI95, and expected to be about 100 SI95 by 2005, so it's a lot of PCs. This hardware will be distributed across several 'Regional Centres' of various sizes, connected by high-speed networks. How to realise this in an orderly and timely fashion is now being discussed in earnest by CERN, Funding Agencies, and the LHC experiments. Mixed in with this is, of course, the GRID concept...but that's a topic for another day! Of course hardware, networks and the GRID constitute just one part of the computing. Most of the ATLAS effort is spent on software development. What we ...

  11. Statistical Computing

    Indian Academy of Sciences (India)

    Statistical Computing - Understanding Randomness and Random Numbers. Sudhakar Kunte. Series Article, Resonance – Journal of Science Education, Volume 4, Issue 10, October 1999, pp. 16-21.

  12. Quantum Computing

    Indian Academy of Sciences (India)

    It was suggested that the dynamics of quantum systems could be used to perform computation in a much more efficient way. After this initial excitement, things slowed down for some time till 1994, when Peter Shor announced his polynomial-time factorization algorithm, which uses quantum dynamics. The study of quantum ...

  13. [Grid computing

    CERN Multimedia

    Wolinsky, H

    2003-01-01

    "Turn on a water spigot, and it's like tapping a bottomless barrel of water. Ditto for electricity: Flip the switch, and the supply is endless. But computing is another matter. Even with the Internet revolution enabling us to connect in new ways, we are still limited to self-contained systems running locally stored software, limited by corporate, institutional and geographic boundaries" (1 page).

  14. Optical Computing

    Indian Academy of Sciences (India)

    Debabrata Goswami is at the Tata Institute of Fundamental Research, Mumbai, where he explores the applications of ultrafast shaped pulses to coherent control, high-speed communication and computing. He is also associated as a Visiting Faculty at IIT Kanpur, where he will be teaching a new course on Quantum.

  15. Cloud computing.

    Science.gov (United States)

    Wink, Diane M

    2012-01-01

    In this bimonthly series, the author examines how nurse educators can use Internet and Web-based technologies such as search, communication, and collaborative writing tools; social networking and social bookmarking sites; virtual worlds; and Web-based teaching and learning programs. This article describes how cloud computing can be used in nursing education.

  16. Computational biology

    DEFF Research Database (Denmark)

    Hartmann, Lars Røeboe; Jones, Neil; Simonsen, Jakob Grue

    2011-01-01

    Computation via biological devices has been the subject of close scrutiny since von Neumann’s early work some 60 years ago. In spite of the many relevant works in this field, the notion of programming biological devices seems to be, at best, ill-defined. While many devices are claimed or proved t...

  17. Optical Computing

    Indian Academy of Sciences (India)

    Optical Computing - Optical Components and Storage Systems. Debabrata Goswami. General Article, Resonance – Journal of Science Education, Volume 8, Issue 6, June 2003, pp. 56-71.

  18. Optical Computing

    Indian Academy of Sciences (India)

    Optical Computing - Research Trends. Debabrata Goswami. General Article, Resonance – Journal of Science Education, Volume 8, Issue 7, July 2003, pp. 8-21. Permanent link: http://www.ias.ac.in/article/fulltext/reso/008/07/0008-0021

  19. Quantum Computation

    Indian Academy of Sciences (India)

    Quantum Computation - Particle and Wave Aspects of Algorithms. Apoorva Patel. General Article, Resonance – Journal of Science Education, Volume 16, Issue 9, September 2011, pp. 821-835.

  20. Quantum Computing

    Indian Academy of Sciences (India)

    quantum dynamics. The study of quantum systems for computation has come into its own since then. In this article we will look at a few concepts which make this framework so powerful. 2. Quantum Physics Basics. Consider an electron (say, in a H atom) with two energy levels (ground state and one excited state). In general ...

  1. Computable Frames in Computable Banach Spaces

    Directory of Open Access Journals (Sweden)

    S.K. Kaushik

    2016-06-01

    We develop some parts of the frame theory in Banach spaces from the point of view of Computable Analysis. We define a computable M-basis and use it to construct a computable Banach space of scalar-valued sequences. Computable Xd frames and computable Banach frames are also defined, and computable versions of sufficient conditions for their existence are obtained.

  2. Monitoring of computing resource use of active software releases at ATLAS

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00219183; The ATLAS collaboration

    2017-01-01

    The LHC is the world’s most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and dis...

  3. Monitoring of Computing Resource Use of Active Software Releases in ATLAS

    CERN Document Server

    Limosani, Antonio; The ATLAS collaboration

    2016-01-01

    The LHC is the world's most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as "MemoryMonitor", to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed...

  4. Computational Electromagnetics

    CERN Document Server

    Rylander, Thomas; Bondeson, Anders

    2013-01-01

    Computational Electromagnetics is a young and growing discipline, expanding as a result of the steadily increasing demand for software for the design and analysis of electrical devices. This book introduces three of the most popular numerical methods for simulating electromagnetic fields: the finite difference method, the finite element method and the method of moments. In particular it focuses on how these methods are used to obtain valid approximations to the solutions of Maxwell's equations, using, for example, "staggered grids" and "edge elements." The main goal of the book is to make the reader aware of different sources of errors in numerical computations, and also to provide the tools for assessing the accuracy of numerical methods and their solutions. To reach this goal, convergence analysis, extrapolation, von Neumann stability analysis, and dispersion analysis are introduced and used frequently throughout the book. Another major goal of the book is to provide students with enough practical understan...

  5. Extending the length and time scales of Gram–Schmidt Lyapunov vector computations

    International Nuclear Information System (INIS)

    Costa, Anthony B.; Green, Jason R.

    2013-01-01

    Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N^2 with the particle count. This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 between Intel Nehalem/InfiniBand DDR and NVIDIA C2050 architectures. To the best of our knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra
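
    As a minimal illustration of the core linear algebra involved (not the Scalapack or MAGMA implementations described above), the following Python/NumPy sketch re-orthonormalizes a set of tangent-space vectors with a QR decomposition, the operation at the heart of Gram–Schmidt Lyapunov vector computations.

        import numpy as np

        def reorthonormalize(tangent_vectors):
            """One Gram-Schmidt step via QR: returns orthonormal vectors and the
            logarithms of the growth factors, whose time averages approximate the
            Lyapunov exponents."""
            q, r = np.linalg.qr(tangent_vectors)
            signs = np.sign(np.diag(r))          # fix signs so R has a positive diagonal
            return q * signs, np.log(np.abs(np.diag(r)))

        # Example with random 6x6 tangent vectors (illustrative only).
        q, log_growth = reorthonormalize(np.random.rand(6, 6))
        print(log_growth)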

  6. A family of domain decomposition methods for the massively parallel solution of computational mechanics problems

    Science.gov (United States)

    Pierson, Kendall Hugh

    The Finite Element Tearing and Interconnecting (FETI) algorithms are numerically scalable iterative domain decomposition methods for solving systems of equations generated from the finite element discretization of second- or fourth-order elasticity problems. These methods have been substantially improved over the last ten years and recently shown parallel scalability up to one thousand processors. The purpose of this thesis is to present and investigate a dual-primal FETI method, which addresses some of the critical issues related to the original FETI methods. These critical issues involve the accurate computation of the local rigid body modes, the cost and size of the FETI coarse problems with respect to fourth-order elasticity problems, and the overall robustness and versatility of the equation solver. These improvements due to the dual-primal FETI formulation are especially beneficial when implemented on massively parallel distributed memory computers such as the Accelerated Strategic Computing Initiative (ASCI) Red Option supercomputer. Numerical results will be shown detailing scalability with respect to the mesh size, subdomain size, and the number of elements per subdomain for both second- and fourth-order elasticity problems. Parallel scalability will be reported for various large scale realistic problems on a SGI Origin 2000 and the ASCI Red option massively parallel supercomputer. Lastly, results from linear dynamics, eigenvalue analysis and geometrically non-linear static problems will be shown highlighting the benefits of FETI methods for solving large-scale problems with multiple right hand sides.

  7. Spatial Computation

    Science.gov (United States)

    2003-12-01

    2001), Las Vegas, June 2001. [BRM+99] Jonathan Babb, Martin Rinard, Csaba Andras Moritz, Walter Lee, Matthew Frank, Rajeev Barua, and Saman... Springer Verlag. [CA88] David E. Culler and Arvind. Resource requirements of dataflow programs. In International Symposium on Computer Architecture... Rajeev Barua, Matthew Frank, Devabhaktuni Srikrishna, Jonathan Babb, Vivek Sarkar, and Saman Amarasinghe. Space-time scheduling of instruction-level

  8. Computed radiography

    International Nuclear Information System (INIS)

    Pupchek, G.

    2004-01-01

    Computed radiography (CR) is an image acquisition process that is used to create digital, 2-dimensional radiographs. CR employs a photostimulable phosphor-based imaging plate, replacing the standard x-ray film and intensifying screen combination. Conventional radiographic exposure equipment is used with no modification required to the existing system. CR can transform an analog x-ray department into a digital one and eliminates the need for chemicals, water, darkrooms and film processor headaches. (author)

  9. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a short response time.
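
    The parallelization strategy described above (independent particle histories distributed over processes and tallies combined at the end) can be sketched in a few lines of Python with mpi4py. This is a schematic toy estimator, not the MC4 code itself, and the random-number handling is far simpler than the SPRNG/DCMT streams used in the study.

        from mpi4py import MPI
        import random

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # Each rank simulates its share of histories with an independently
        # seeded stream (toy example: Monte Carlo estimate of pi).
        n_total = 1_000_000
        n_local = n_total // size
        random.seed(12345 + rank)
        hits = sum(1 for _ in range(n_local)
                   if random.random() ** 2 + random.random() ** 2 <= 1.0)

        # Combine the partial tallies on rank 0.
        total_hits = comm.reduce(hits, op=MPI.SUM, root=0)
        if rank == 0:
            print("estimate:", 4.0 * total_hits / (n_local * size))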

  10. Customizable computing

    CERN Document Server

    Chen, Yu-Ting; Gill, Michael; Reinman, Glenn; Xiao, Bingjun

    2015-01-01

    Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of computation has been the main concern of the research community and industry. The large energy efficiency gap between general-purpose processors and application-specific integrated circuits (ASICs) motivates the exploration of customizable architectures, where one can adapt the architecture to the workload. In this Synthesis lecture, we present an overview and introduction of the recent developments on energy-efficient customizable architectures, including customizable cores and accelerators, on-chip memory

  11. Multiparty Computations

    DEFF Research Database (Denmark)

    Dziembowski, Stefan

    papers [1,2]. In [1] we assume that the adversary can corrupt any set from a given adversary structure. In this setting we study a problem of doing efficient VSS and MPC given access to a secret sharing scheme (SS). For all adversary structures where VSS is possible at all, we show that, up... here and discuss other problems caused by the adaptiveness. All protocols in the thesis are formally specified and the proofs of their security are given. [1] Ronald Cramer, Ivan Damgård, Stefan Dziembowski, Martin Hirt, and Tal Rabin. Efficient multiparty computations with dishonest minority...

  12. Computer vision

    Science.gov (United States)

    Gennery, D.; Cunningham, R.; Saund, E.; High, J.; Ruoff, C.

    1981-01-01

    The field of computer vision is surveyed and assessed, key research issues are identified, and possibilities for a future vision system are discussed. The problems of descriptions of two and three dimensional worlds are discussed. The representation of such features as texture, edges, curves, and corners are detailed. Recognition methods are described in which cross correlation coefficients are maximized or numerical values for a set of features are measured. Object tracking is discussed in terms of the robust matching algorithms that must be devised. Stereo vision, camera control and calibration, and the hardware and systems architecture are discussed.

  13. FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.

    Science.gov (United States)

    Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora

    2013-09-01

    In this paper, we present a distributed computing system, called DCMARK, aimed at solving the partial differential equations at the basis of many fields of investigation, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the solution of the differential equation system into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to studying the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200-cell Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it reduces the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC host. An intuitive graphical user interface allows us to change the calculation parameters and plot the results.
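
    To make the cellular, locally coupled style of computation concrete, the sketch below advances a 1-D KdV solution with the classic Zabusky–Kruskal finite-difference update, in which each cell needs only its two nearest neighbours on either side. It is a plain Python illustration of the same equation, not the FPGA design, and the grid size, time step and initial condition are assumed.

        import numpy as np

        def zabusky_kruskal_step(u_prev, u, dt, dx, delta=0.022):
            """One leapfrog step of the Zabusky-Kruskal scheme for
            u_t + u*u_x + delta**2 * u_xxx = 0 on a periodic grid."""
            up1, um1 = np.roll(u, -1), np.roll(u, 1)
            up2, um2 = np.roll(u, -2), np.roll(u, 2)
            nonlinear = (dt / (3.0 * dx)) * (up1 + u + um1) * (up1 - um1)
            dispersive = (delta ** 2) * dt / dx ** 3 * (up2 - 2.0 * up1 + 2.0 * um1 - um2)
            return u_prev - nonlinear - dispersive

        # 200-cell periodic domain with a cosine initial condition (assumed values).
        n, dx, dt = 200, 2.0 / 200, 1.0e-4
        u_prev = np.cos(np.pi * np.arange(n) * dx)
        # Start-up: passing dt/2 makes the leapfrog formula act as one Euler step of size dt.
        u = zabusky_kruskal_step(u_prev, u_prev, dt / 2.0, dx)
        for _ in range(1000):
            u_prev, u = u, zabusky_kruskal_step(u_prev, u, dt, dx)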

  14. Manyscale Computing for Sensor Processing in Support of Space Situational Awareness

    Science.gov (United States)

    Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.

    2014-09-01

    Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include
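
    The idea of mapping work atoms onto threads or thread blocks can be illustrated with a simple partitioning routine. The greedy longest-processing-time heuristic below is only a generic example of such a mapping primitive, not the authors' Manyscale software; the work-atom costs are assumed.

        import heapq

        def map_work_atoms(costs, n_workers):
            """Greedy longest-processing-time mapping of work atoms (given by
            estimated costs) onto n_workers threads or thread blocks."""
            heap = [(0.0, w) for w in range(n_workers)]      # (current load, worker id)
            heapq.heapify(heap)
            assignment = {w: [] for w in range(n_workers)}
            for atom, cost in sorted(enumerate(costs), key=lambda kv: -kv[1]):
                load, w = heapq.heappop(heap)
                assignment[w].append(atom)
                heapq.heappush(heap, (load + cost, w))
            return assignment

        print(map_work_atoms([5, 3, 8, 2, 7, 4], n_workers=2))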

  15. Research in Computer Forensics

    Science.gov (United States)

    2002-06-01

    D. WHAT IS COMPUTER FORENSICS; E. SURVEY OF AGENCIES AND VENDORS PROVIDING COMPUTER... lead to the formulation of computer forensic material for a potential Computer Forensic Course at NPS. D. WHAT IS COMPUTER FORENSICS... Individualization; 8. Reconstruction. What is Computer Forensics? Computer Forensics involves the identification, extraction, preservation and

  16. Computer Tree

    Directory of Open Access Journals (Sweden)

    Onur AĞAOĞLU

    2014-12-01

    It is crucial that gifted and talented students be supported by different educational methods suited to their interests and skills. The science and arts centres (gifted centres) provide the Supportive Education Program for these students with an interdisciplinary perspective. In line with the program, an ICT lesson entitled “Computer Tree” serves to identify learner readiness levels and to define the basic conceptual framework. A language teacher also contributes to the process, since it caters for the creative function of the basic linguistic skills. The teaching technique is applied to students aged 9-11. The lesson introduces an evaluation process covering the basic information, skills, and interests of the target group. Furthermore, it includes an observation process by way of peer assessment. The lesson is considered to be a good example of planning for any subject, for the unpredicted convergence of visual and technical abilities with linguistic abilities.

  17. Brain computer

    Directory of Open Access Journals (Sweden)

    Sarah N. Abdulkader

    2015-07-01

    Brain computer interface technology represents a rapidly growing field of research with application systems. Its contributions in medical fields range from prevention to neuronal rehabilitation for serious injuries. Mind reading and remote communication have their unique fingerprint in numerous fields such as education, self-regulation, production, marketing and security, as well as games and entertainment. It creates a mutual understanding between users and the surrounding systems. This paper shows the application areas that could benefit from brain waves in facilitating or achieving their goals. We also discuss major usability and technical challenges that face the utilization of brain signals in the various components of a BCI system. Different solutions that aim to limit and decrease their effects have also been reviewed.

  18. Social Computing

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    The past decade has witnessed a momentous transformation in the way people interact with each other. Content is now co-produced, shared, classified, and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This talk will describe how social attention determines the production and consumption of content within both the scientific community and social media, how its dynamics can be used to predict the future and the role that social media plays in setting the public agenda. About the speaker Bernardo Huberman is a Senior HP Fellow and Director of the Social Computing Lab at Hewlett Packard Laboratories. He received his Ph.D. in Physics from the University of Pennsylvania, and is currently a Consulting Professor in the Department of Applied Physics at Stanford University. He originally worked in condensed matter physics, ranging from superionic conductors to two-dimensional superfluids, and made contributions to the theory of critical p...

  19. computer networks

    Directory of Open Access Journals (Sweden)

    N. U. Ahmed

    2002-01-01

    In this paper, we construct a new dynamic model for the Token Bucket (TB) algorithm used in computer networks and use a systems approach for its analysis. This model is then augmented by adding a dynamic model for a multiplexor at an access node where the TB exercises a policing function. In the model, traffic policing, multiplexing and network utilization are formally defined. Based on the model, we study such issues as quality of service (QoS), traffic sizing and network dimensioning. Also we propose an algorithm using feedback control to improve QoS and network utilization. Applying MPEG video traces as the input traffic to the model, we verify the usefulness and effectiveness of our model.
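
    For readers unfamiliar with the policing function modelled in the paper, the short Python class below is the textbook token bucket rate limiter; it illustrates the mechanism only and is not the authors' dynamic model. The rate and capacity values are assumed.

        import time

        class TokenBucket:
            """Classic token bucket: tokens accrue at `rate` per second up to
            `capacity`; a packet needing `size` tokens conforms if enough
            tokens are available, otherwise it is dropped (or marked/queued)."""
            def __init__(self, rate, capacity):
                self.rate, self.capacity = rate, capacity
                self.tokens = capacity
                self.last = time.monotonic()

            def allow(self, size=1.0):
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= size:
                    self.tokens -= size
                    return True
                return False

        bucket = TokenBucket(rate=100.0, capacity=200.0)   # 100 tokens/s, burst of 200
        print(bucket.allow(50), bucket.allow(300))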

  20. Computational neuroscience

    CERN Document Server

    Blackwell, Kim L

    2014-01-01

    Progress in Molecular Biology and Translational Science provides a forum for discussion of new discoveries, approaches, and ideas in molecular biology. It contains contributions from leaders in their fields and abundant references. This volume brings together different aspects of, and approaches to, molecular and multi-scale modeling, with applications to a diverse range of neurological diseases. Mathematical and computational modeling offers a powerful approach for examining the interaction between molecular pathways and ionic channels in producing neuron electrical activity. It is well accepted that non-linear interactions among diverse ionic channels can produce unexpected neuron behavior and hinder a deep understanding of how ion channel mutations bring about abnormal behavior and disease. Interactions with the diverse signaling pathways activated by G protein coupled receptors or calcium influx adds an additional level of complexity. Modeling is an approach to integrate myriad data sources into a cohesiv...

  1. Computed tomography

    International Nuclear Information System (INIS)

    Boyd, D.P.

    1989-01-01

    This paper reports on computed tomographic (CT) scanning which has improved computer-assisted imaging modalities for radiologic diagnosis. The advantage of this modality is its ability to image thin cross-sectional planes of the body, thus uncovering density information in three dimensions without tissue superposition problems. Because this enables vastly superior imaging of soft tissues in the brain and body, CT scanning was immediately successful and continues to grow in importance as improvements are made in speed, resolution, and cost efficiency. CT scanners are used for general purposes, and the more advanced machines are generally preferred in large hospitals, where volume and variety of usage justifies the cost. For imaging in the abdomen, a scanner with a rapid speed is preferred because peristalsis, involuntary motion of the diaphragm, and even cardiac motion are present and can significantly degrade image quality. When contrast media is used in imaging to demonstrate scanner, immediate review of images, and multiformat hardcopy production. A second console is reserved for the radiologist to read images and perform the several types of image analysis that are available. Since CT images contain quantitative information in terms of density values and contours of organs, quantitation of volumes, areas, and masses is possible. This is accomplished with region-of-interest methods, which involve the electronic outlining of the selected region of the television display monitor with a trackball-controlled cursor. In addition, various image-processing options, such as edge enhancement (for viewing fine details of edges) or smoothing filters (for enhancing the detectability of low-contrast lesions) are useful tools

  2. A Comparison of PETSC Library and HPF Implementations of an Archetypal PDE Computation

    Science.gov (United States)

    Hayder, M. Ehtesham; Keyes, David E.; Mehrotra, Piyush

    1997-01-01

    Two paradigms for distributed-memory parallel computation that free the application programmer from the details of message passing are compared for an archetypal structured scientific computation: a nonlinear, structured-grid partial differential equation boundary value problem, using the same algorithm on the same hardware. Both paradigms, parallel libraries represented by Argonne's PETSC, and parallel languages represented by the Portland Group's HPF, are found to be easy to use for this problem class, and both are reasonably effective in exploiting concurrency after a short learning curve. The level of involvement required by the application programmer under either paradigm includes specification of the data partitioning (corresponding to a geometrically simple decomposition of the domain of the PDE). Programming in SPMD style for the PETSC library requires writing the routines that discretize the PDE and its Jacobian, managing subdomain-to-processor mappings (affine global-to-local index mappings), and interfacing to library solver routines. Programming for HPF requires a complete sequential implementation of the same algorithm, introducing concurrency through subdomain blocking (an effort similar to the index mapping), and modest experimentation with rewriting loops to elucidate to the compiler the latent concurrency. Correctness and scalability are cross-validated on up to 32 nodes of an IBM SP2.
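
    A minimal sketch of the kind of affine global-to-local index mapping mentioned above, for a contiguous 1-D block decomposition of a structured grid with a ghost-point region; the sizes are assumed, and this is a generic illustration rather than PETSC or HPF code.

        def block_range(n_global, n_procs, rank):
            """Contiguous 1-D block decomposition: global index range owned by `rank`."""
            base, extra = divmod(n_global, n_procs)
            start = rank * base + min(rank, extra)
            return start, start + base + (1 if rank < extra else 0)

        def global_to_local(i_global, start, ghost=1):
            """Affine map from a global grid index to the local index on this
            rank, offset by the width of the ghost-point region."""
            return i_global - start + ghost

        start, end = block_range(n_global=1000, n_procs=8, rank=3)
        print(start, end, global_to_local(start, start))   # local index of first owned point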

  3. Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Dong, Fengguang [Univ. of Tennessee, Knoxville, TN (United States); Tomov, Stanimire [Univ. of Tennessee, Knoxville, TN (United States); Dongarra, Jack [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2011-06-01

    We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations efficiently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine, and to use a heterogeneous 1-D block cyclic distribution to allocate data to the host system and GPUs to minimize communication. We have designed heterogeneous algorithms with two different tile sizes (one for CPU cores and the other for GPUs) to cope with processor heterogeneity. We propose an auto-tuning method to determine the best tile sizes to attain both high performance and load balancing. We have also implemented a new runtime system and applied it to the Cholesky and QR factorizations. Our experiments on a compute node with two Intel Westmere hexa-core CPUs and three Nvidia Fermi GPUs demonstrate good weak scalability, strong scalability, load balance, and efficiency of our approach.
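
    The heterogeneous 1-D block-cyclic distribution mentioned above assigns tiles to hosts in a round-robin fashion; the sketch below shows the plain (homogeneous) 1-D block-cyclic owner computation as an illustration, leaving out the two-tile-size refinement used to balance CPUs against GPUs.

        def block_cyclic_owner(j, block_size, n_workers):
            """Owner of global tile (or column block) j under a 1-D block-cyclic
            distribution with the given block size."""
            return (j // block_size) % n_workers

        # Example: 12 tiles distributed over 3 workers in blocks of 2.
        print([block_cyclic_owner(j, block_size=2, n_workers=3) for j in range(12)])
        # -> [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]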

  4. Tiling and Asynchronous Communication Optimizations for Stencil Computations

    KAUST Repository

    Malas, Tareq

    2015-12-07

    The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. Most of the established work concentrates on updating separate cache blocks per thread, which works on all types of shared memory systems, regardless of whether there is a shared cache among the cores. This approach is memory-bandwidth limited in several situations, where the cache space for each thread can be too small to provide sufficient in-cache data reuse. We introduce a generalized multi-dimensional intra-tile parallelization scheme for shared-cache multicore processors that results in a significant reduction of cache size requirements and shows a large saving in memory bandwidth usage compared to existing approaches. It also provides data access patterns that allow efficient hardware prefetching. Our parameterized thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the Central Processing Unit (CPU). We also introduce an efficient diamond tiling structure for both shared-memory cache blocking and distributed-memory relaxed-synchronization communication, demonstrated using one-dimensional domain decomposition. We describe the approach and our open-source testbed implementation (called Girih), present performance results on contemporary Intel processors, and apply advanced performance modeling techniques to reconcile the observed performance with hardware capabilities. Furthermore, we conduct a comparison with the state-of-the-art stencil frameworks PLUTO and Pochoir in shared memory, using corner-case stencil operators. We study the
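
    As background for the blocking discussion, the sketch below contrasts a naive 1-D 3-point stencil sweep with the same sweep traversed tile by tile. It only illustrates how the iteration space is partitioned into tiles, the building block that temporal and diamond tiling schemes extend across time steps; it is not the Girih implementation.

        def stencil_sweep(u, out):
            """Naive 3-point stencil: weighted average of each point's neighbours."""
            for i in range(1, len(u) - 1):
                out[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]

        def stencil_sweep_tiled(u, out, tile=1024):
            """Same update, traversed in fixed-size tiles; combining such tiles
            across several time steps is what temporal blocking builds on."""
            n = len(u)
            for start in range(1, n - 1, tile):
                for i in range(start, min(start + tile, n - 1)):
                    out[i] = 0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]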

  5. Task-Driven Computing

    National Research Council Canada - National Science Library

    Wang, Zhenyu

    2000-01-01

    .... They will want to use the resources to perform computing tasks. Today's computing infrastructure does not support this model of computing very well because computers interact with users in terms of low level abstractions...

  6. Experimental DNA computing

    NARCIS (Netherlands)

    Henkel, Christiaan

    2005-01-01

    Because of their information storing and processing capabilities, nucleic acids are interesting building blocks for molecular scale computers. Potential applications of such DNA computers range from massively parallel computation to computational gene therapy. In this thesis, several implementations

  7. Computer Refurbishment

    International Nuclear Information System (INIS)

    Ichiyen, Norman; Chan, Dominic; Thompson, Paul

    2004-01-01

    The major activity for the 18-month refurbishment outage at the Point Lepreau Generating Station is the replacement of all 380 fuel channel and calandria tube assemblies and the lower portion of connecting feeder pipes. New Brunswick Power would also take advantage of this outage to conduct a number of repairs, replacements, inspections and upgrades (such as rewinding or replacing the generator, replacement of shutdown system trip computers, replacement of certain valves and expansion joints, inspection of systems not normally accessible, etc.). This would allow for an additional 25 to 30 years. Among the systems to be replaced are the PDC's for both shutdown systems. Assessments have been completed for both the SDS1 and SDS2 PDC's, and it has been decided to replace the SDS2 PDCs with the same hardware and software approach that has been used successfully for the Wolsong 2, 3, and 4 and the Qinshan 1 and 2 SDS2 PDCs. For SDS1, it has been decided to use the same software development methodology that was used successfully for the Wolsong and Qinshan called the I A and to use a new hardware platform in order to ensure successful operation for the 25-30 year station operating life. The selected supplier is Triconex, which uses a triple modular redundant architecture that will enhance the robustness/fault tolerance of the design with respect to equipment failures

  8. Computed radiography

    International Nuclear Information System (INIS)

    Itoh, Hiroshi

    1987-01-01

    In an effort to evaluate the feasibility of introducing computed radiography (FCR) into mass screening for lung cancer, the ability of FCR to detect nodules one cm in diameter was examined using a humanoid chest phantom. Based on receiver operating characteristic (ROC) analysis, the detectability of FCR was compared with that of conventional radiography and photofluorography. The values of the area under the ROC curves were higher for FCR (0.963 for the image similar to that of a conventional film-intensifying screen system, image A; and 0.952 for the processed image, image B) than for the other two methods (0.774 for radiography and 0.789 for photofluorography). Degradation of image quality in FCR could be avoided by its wide latitude even if proper exposure techniques were not employed. Images A and B in FCR yielded excellent delineation of nodules in the lung field and in the retrocardiac and subdiaphragmatic regions, respectively. This may have implications for the value of simultaneous interpretation of both images in increasing diagnostic accuracy. Structured noise from the ribs and blood vessels had scarcely any effect on nodule detectability in FCR. Radiation dose could be reduced to one third of the standard dose. It can thus be concluded that FCR is feasible in mass screening for lung cancer in terms of increased diagnostic ability and low radiation doses. (Namekawa, K.)

  9. Analog and hybrid computing

    CERN Document Server

    Hyndman, D E

    2013-01-01

    Analog and Hybrid Computing focuses on the operations of analog and hybrid computers. The book first outlines the history of computing devices that influenced the creation of analog and digital computers. The types of problems to be solved on computers, computing systems, and digital computers are discussed. The text looks at the theory and operation of electronic analog computers, including linear and non-linear computing units and use of analog computers as operational amplifiers. The monograph examines the preparation of problems to be deciphered on computers. Flow diagrams, methods of ampl

  10. Prestack Parallel Modeling of Dispersive and Attenuative Medium

    Directory of Open Access Journals (Sweden)

    How-Wei Chen

    2006-01-01

    This study presents an efficient parallelized staggered-grid pseudospectral method for 2-D viscoacoustic seismic waveform modeling that runs on a high-performance multi-processor computer and on an in-house developed PC cluster. Parallel simulation permits several processors to be used for solving a single large problem with a high computation-to-communication ratio. Thus, parallelizing the serial scheme effectively reduces the computation time. Computational results indicate a reasonably consistent parallel performance when using different FFTs in the pseudospectral computations. Meanwhile, a virtually perfect linear speedup can be achieved in a distributed-memory multi-processor environment. Effectiveness of the proposed algorithm is demonstrated using synthetic examples by simulating multiple shot gathers consistent with field coordinates. For dispersive and attenuating media, the propagating wavefield exhibits observable differences in waveform, amplitude and travel-times. The resulting effects on seismic signals, such as the decreased amplitude because of intrinsic Q and the temporal shift because of physical dispersion phenomena, can be analyzed quantitatively. Anelastic effects become more visible owing to cumulative propagation effects. A field-data application is presented, simulating OBS wide-angle seismic marine data for deep crustal structure study. The fine details of deep crustal velocity and attenuation structures in the survey area can be resolved by comparing simulated waveforms with observed seismograms recorded at various distances. Parallel performance is analyzed through speedup and efficiency for a variety of computing platforms. Effective parallel implementation requires numerous independent CPU-intensive sub-jobs with low-latency and high-bandwidth inter-processor communication.
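
    The core operation of a pseudospectral scheme is computing spatial derivatives in the wavenumber domain. The NumPy sketch below shows the basic FFT-based derivative on a periodic grid; it is a generic illustration and omits the staggering, the viscoacoustic Q terms, and the parallel decomposition used in the study.

        import numpy as np

        def spectral_derivative(u, dx):
            """First spatial derivative of a periodic 1-D field via the FFT:
            multiply by i*k in the wavenumber domain and transform back."""
            k = 2.0 * np.pi * np.fft.fftfreq(len(u), d=dx)
            return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

        x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
        # Check against the analytic derivative of sin(x).
        print(np.max(np.abs(spectral_derivative(np.sin(x), x[1] - x[0]) - np.cos(x))))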

  11. Computational technologies advanced topics

    CERN Document Server

    Vabishchevich, Petr N

    2015-01-01

    This book discusses questions of numerical solutions of applied problems on parallel computing systems. Nowadays, engineering and scientific computations are carried out on parallel computing systems, which provide parallel data processing on a few computing nodes. In constructing computational algorithms, mathematical problems are separated into relatively independent subproblems in order to solve them on a single computing node.

  12. Computing handbook computer science and software engineering

    CERN Document Server

    Gonzalez, Teofilo; Tucker, Allen

    2014-01-01

    Overview of Computer Science: Structure and Organization of Computing (Peter J. Denning); Computational Thinking (Valerie Barr). Algorithms and Complexity: Data Structures (Mark Weiss); Basic Techniques for Design and Analysis of Algorithms (Edward Reingold); Graph and Network Algorithms (Samir Khuller and Balaji Raghavachari); Computational Geometry (Marc van Kreveld); Complexity Theory (Eric Allender, Michael Loui, and Kenneth Regan); Formal Models and Computability (Tao Jiang, Ming Li, and Bala

  13. Specialized computer architectures for computational aerodynamics

    Science.gov (United States)

    Stevenson, D. K.

    1978-01-01

    In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relative high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.

  14. New computing paradigms suggested by DNA computing: computing by carving.

    Science.gov (United States)

    Manca, V; Martín-Vide, C; Păun, G

    1999-10-01

    Inspired by the experiments in the emerging area of DNA computing, a somewhat unusual type of computation strategy was recently proposed by one of us: to generate a (large) set of candidate solutions of a problem, then remove the non-solutions such that what remains is the set of solutions. This has been called a computation by carving. This idea leads both to a speculation with possible important consequences--computing non-recursively enumerable languages--and to interesting theoretical computer science (formal language) questions.

  15. Advances in time-domain electromagnetic simulation capabilities through the use of overset grids and massively parallel computing

    Science.gov (United States)

    Blake, Douglas Clifton

    A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and

  16. Computer Literacy Education

    Science.gov (United States)

    1989-01-01

    curricula, systems must reorder their priorities. One question for computer-literacy advocates is this: What is computer literacy more important...Context." AEDS Journal, 17, 3 (Spring 1984) 1-13. "Reader’s Survey Results: What Is Computer Literacy?" Classroom Computer Learning (March 1986) p. 53...Acquisition of Computer Literacy." Journal of Computer-Based Information, 12, 1 (Winter 1985) 12-16. "What is Computer Literacy?" Article 10c in Cannings

  17. Applied Parallel Computing Industrial Computation and Optimization

    DEFF Research Database (Denmark)

    Madsen, Kaj; Olesen, Dorte

    Proceedings of the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96).

  18. Further computer appreciation

    CERN Document Server

    Fry, T F

    2014-01-01

    Further Computer Appreciation is a comprehensive cover of the principles and aspects in computer appreciation. The book starts by describing the development of computers from the first to the third computer generations, to the development of processors and storage systems, up to the present position of computers and future trends. The text tackles the basic elements, concepts and functions of digital computers, computer arithmetic, input media and devices, and computer output. The basic central processor functions, data storage and the organization of data by classification of computer files,

  19. Computing at Stanford.

    Science.gov (United States)

    Feigenbaum, Edward A.; Nielsen, Norman R.

    1969-01-01

    This article provides a current status report on the computing and computer science activities at Stanford University, focusing on the Computer Science Department, the Stanford Computation Center, the recently established regional computing network, and the Institute for Mathematical Studies in the Social Sciences. Also considered are such topics…

  20. Information and Computation

    OpenAIRE

    Gershenson, Carlos

    2013-01-01

    In this chapter, concepts related to information and computation are reviewed in the context of human computation. A brief introduction to information theory and different types of computation is given. Two examples of human computation systems, online social networks and Wikipedia, are used to illustrate how these can be described and compared in terms of information and computation.

  1. Democratizing Computer Science

    Science.gov (United States)

    Margolis, Jane; Goode, Joanna; Ryoo, Jean J.

    2015-01-01

    Computer science programs are too often identified with a narrow stratum of the student population, often white or Asian boys who have access to computers at home. But because computers play such a huge role in our world today, all students can benefit from the study of computer science and the opportunity to build skills related to computing. The…

  2. Computed Tomography (CT) -- Head

    Medline Plus

    Full Text Available Computed tomography (CT) of the head uses special ... What is CT Scanning of the Head? Computed tomography, more commonly known as a CT ...

  3. Quantum computation with superconductors

    OpenAIRE

    Irastorza Gabilondo, Amaia

    2017-01-01

    Quantum computation using superconducting qubits. Qubits are quantum bits used in quantum computers. Superconducting qubits are a strong option for building a quantum computer. Moreover, because they are macroscopic objects, they probe the limits of quantum physics.

  4. DNA computing models

    CERN Document Server

    Ignatova, Zoya; Zimmermann, Karl-Heinz

    2008-01-01

    In this excellent text, the reader is given a comprehensive introduction to the field of DNA computing. The book emphasizes computational methods to tackle central problems of DNA computing, such as controlling living cells, building patterns, and generating nanomachines.

  5. Computer Viruses: An Overview.

    Science.gov (United States)

    Marmion, Dan

    1990-01-01

    Discusses the early history and current proliferation of computer viruses that occur on Macintosh and DOS personal computers, mentions virus detection programs, and offers suggestions for how libraries can protect themselves and their users from damage by computer viruses. (LRW)

  6. Computers and the landscape

    Science.gov (United States)

    Gary H. Elsner

    1979-01-01

    Computers can analyze and help to plan the visual aspects of large wildland landscapes. This paper categorizes and explains current computer methods available. It also contains a futuristic dialogue between a landscape architect and a computer.

  7. Soft computing in computer and information science

    CERN Document Server

    Fray, Imed; Pejaś, Jerzy

    2015-01-01

    This book presents a carefully selected and reviewed collection of papers presented during the 19th Advanced Computer Systems conference ACS-2014. The Advanced Computer Systems conference concentrated from its beginning on methods and algorithms of artificial intelligence. Later editions brought new areas of interest concerning technical informatics related to soft computing and more technological aspects of computer science such as multimedia and computer graphics, software engineering, web systems, information security and safety, and project management. These topics are represented in the present book under the categories Artificial Intelligence, Design of Information and Multimedia Systems, Information Technology Security and Software Technologies.

  8. Computational Intelligence, Cyber Security and Computational Models

    CERN Document Server

    Anitha, R; Lekshmi, R; Kumar, M; Bonato, Anthony; Graña, Manuel

    2014-01-01

    This book contains cutting-edge research material presented by researchers, engineers, developers, and practitioners from academia and industry at the International Conference on Computational Intelligence, Cyber Security and Computational Models (ICC3) organized by PSG College of Technology, Coimbatore, India during December 19–21, 2013. The materials in the book include theory and applications for design, analysis, and modeling of computational intelligence and security. The book will be useful material for students, researchers, professionals, and academicians. It will help in understanding current research trends and findings and future scope of research in computational intelligence, cyber security, and computational models.

  9. Distributed Computing: An Overview

    OpenAIRE

    Md. Firoj Ali; Rafiqul Zaman Khan

    2015-01-01

    A decrease in hardware costs and advances in computer networking technologies have led to increased interest in the use of large-scale parallel and distributed computing systems. Distributed computing systems offer the potential for improved performance and resource sharing. In this paper we give an overview of distributed computing, studying the difference between parallel and distributed computing, the terminologies used in distributed computing, task allocation in distribute...

  10. Computer Virus and Trends

    OpenAIRE

    Tutut Handayani; Soenarto Usna,Drs.MMSI

    2004-01-01

    Since its first appearance in the mid-1980s, the computer virus has invited various controversies that last to this day. Along with the development of computer systems technology, computer viruses find new ways to spread themselves through a variety of existing communications media. This paper discusses several topics related to computer viruses, namely: the definition and history of computer viruses; the basics of computer viruses; the state of computer viruses at this time; and ...

  11. Computing technology in the 1980's. [computers

    Science.gov (United States)

    Stone, H. S.

    1978-01-01

    Advances in computing technology have been led by consistently improving semiconductor technology. The semiconductor industry has turned out ever faster, smaller, and less expensive devices since transistorized computers were first introduced 20 years ago. For the next decade, there appear to be new advances possible, with the rate of introduction of improved devices at least equal to the historic trends. The implication of these projections is that computers will enter new markets and will truly be pervasive in business, home, and factory as their cost diminishes and their computational power expands to new levels. The computer industry as we know it today will be greatly altered in the next decade, primarily because the raw computer system will give way to computer-based turn-key information and control systems.

  12. Cloud Computing Quality

    Directory of Open Access Journals (Sweden)

    Anamaria Şiclovan

    2013-02-01

    Full Text Available Cloud computing was, and will remain, a new way of providing Internet services and computing. This computing approach builds on many existing services, such as the Internet, grid computing and Web services. Cloud computing as a system aims to provide on-demand services that are more acceptable in terms of price and infrastructure. It is the transition from the computer to a service offered to consumers as a product delivered online. This paper describes the quality of cloud computing services, analyzing the advantages and characteristics it offers. It is a theoretical paper. Keywords: Cloud computing, QoS, quality of cloud computing

  13. Quantum computational supremacy.

    Science.gov (United States)

    Harrow, Aram W; Montanaro, Ashley

    2017-09-13

    The field of quantum algorithms aims to find ways to speed up the solution of computational problems by using a quantum computer. A key milestone in this field will be when a universal quantum computer performs a computational task that is beyond the capability of any classical computer, an event known as quantum supremacy. This would be easier to achieve experimentally than full-scale quantum computing, but involves new theoretical challenges. Here we present the leading proposals to achieve quantum supremacy, and discuss how we can reliably compare the power of a classical computer to the power of a quantum computer.

  14. Computers in nuclear medicine

    International Nuclear Information System (INIS)

    Giannone, Carlos A.

    1999-01-01

    This chapter covers: the capture and display of images on computers; the hardware and software used; and personal computers, networks and workstations. The use of special filters determines image quality.

  15. Plasticity: modeling & computation

    National Research Council Canada - National Science Library

    Borja, Ronaldo Israel

    2013-01-01

    .... "Plasticity Modeling & Computation" is a textbook written specifically for students who want to learn the theoretical, mathematical, and computational aspects of inelastic deformation in solids...

  16. Computer jargon explained

    CERN Document Server

    Enticknap, Nicholas

    2014-01-01

    Computer Jargon Explained is a feature in Computer Weekly publications that discusses 68 of the most commonly used technical computing terms. The book explains what the terms mean and why the terms are important to computer professionals. The text also discusses how the terms relate to the trends and developments that are driving the information technology industry. Computer jargon irritates non-computer people and in turn causes problems for computer people. The technology and the industry are changing so rapidly; it is very hard even for professionals to keep updated. Computer people do not

  17. Computers and data processing

    CERN Document Server

    Deitel, Harvey M

    1985-01-01

    Computers and Data Processing provides information pertinent to the advances in the computer field. This book covers a variety of topics, including the computer hardware, computer programs or software, and computer applications systems.Organized into five parts encompassing 19 chapters, this book begins with an overview of some of the fundamental computing concepts. This text then explores the evolution of modern computing systems from the earliest mechanical calculating devices to microchips. Other chapters consider how computers present their results and explain the storage and retrieval of

  18. Computer hardware fault administration

    Science.gov (United States)

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-09-14

    Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.

  19. Infrastructure for distributed enterprise simulation

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, M.M.; Yoshimura, A.S.; Goldsby, M.E. [and others]

    1998-01-01

    Traditional discrete-event simulations employ an inherently sequential algorithm and are run on a single computer. However, the demands of many real-world problems exceed the capabilities of sequential simulation systems. Often the capacity of a computer's primary memory limits the size of the models that can be handled, and in some cases parallel execution on multiple processors could significantly reduce the simulation time. This paper describes the development of an Infrastructure for Distributed Enterprise Simulation (IDES) - a large-scale portable parallel simulation framework developed to support Sandia National Laboratories' mission in stockpile stewardship. IDES is based on the Breathing-Time-Buckets synchronization protocol, and maps a message-based model of distributed computing onto an object-oriented programming model. IDES is portable across heterogeneous computing architectures, including single-processor systems, networks of workstations and multi-processor computers with shared or distributed memory. The system provides a simple and sufficient application programming interface that can be used by scientists to quickly model large-scale, complex enterprise systems. In the background and without involving the user, IDES is capable of making dynamic use of idle processing power available throughout the enterprise network. 16 refs., 14 figs.
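
    For readers unfamiliar with the sequential baseline that frameworks such as IDES parallelize, a minimal sketch of a sequential discrete-event loop is given below. It is purely illustrative: the API and the example handler are invented for this sketch, and none of it is IDES code.

```python
# Illustrative sketch of the sequential discrete-event loop that frameworks such
# as IDES parallelize; the API and the example handler are invented for this sketch.
import heapq
import itertools

def run(initial_events, horizon):
    """initial_events: iterable of (time, label, handler); each handler(now)
    returns a list of further (time, label, handler) tuples to schedule."""
    seq = itertools.count()   # tie-breaker so the heap never compares handler objects
    queue = [(t, next(seq), lbl, h) for t, lbl, h in initial_events]
    heapq.heapify(queue)
    now = 0.0
    while queue:
        now, _, label, handler = heapq.heappop(queue)   # earliest pending event
        if now > horizon:
            break
        for t, lbl, h in handler(now):                  # schedule follow-on events
            heapq.heappush(queue, (t, next(seq), lbl, h))
    return now

if __name__ == "__main__":
    def arrival(now):
        # A self-perpetuating arrival process: each arrival schedules the next one.
        return [(now + 1.0, "arrival", arrival)]
    print("stopped at t =", run([(0.0, "arrival", arrival)], horizon=5.0))
```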

  20. Missile signal processing common computer architecture for rapid technology upgrade

    Science.gov (United States)

    Rabinkin, Daniel V.; Rutledge, Edward; Monticciolo, Paul

    2004-10-01

    Interceptor missiles process IR images to locate an intended target and guide the interceptor towards it. Signal processing requirements have increased as the sensor bandwidth increases and interceptors operate against more sophisticated targets. A typical interceptor signal processing chain comprises two parts. Front-end video processing operates on all pixels of the image and performs such operations as non-uniformity correction (NUC), image stabilization, frame integration and detection. Back-end target processing, which tracks and classifies targets detected in the image, performs such algorithms as Kalman tracking, spectral feature extraction and target discrimination. In the past, video processing was implemented using ASIC components or FPGAs because computation requirements exceeded the throughput of general-purpose processors. Target processing was performed using hybrid architectures that included ASICs, DSPs and general-purpose processors. The resulting systems tended to be function-specific, and required custom software development. They were developed using non-integrated toolsets and test equipment was developed along with the processor platform. The lifespan of a system utilizing the signal processing platform often spans decades, while the specialized nature of processor hardware and software makes it difficult and costly to upgrade. As a result, the signal processing systems often run on outdated technology, algorithms are difficult to update, and system effectiveness is impaired by the inability to rapidly respond to new threats. A new design approach is made possible by three developments: Moore's Law-driven improvement in computational throughput; a newly introduced vector computing capability in general-purpose processors; and a modern set of open interface software standards. Today's multiprocessor commercial-off-the-shelf (COTS) platforms have sufficient throughput to support interceptor signal processing requirements. This application

  1. Advances in unconventional computing

    CERN Document Server

    2017-01-01

    Unconventional computing is a niche for interdisciplinary science, a cross-breed of computer science, physics, mathematics, chemistry, electronic engineering, biology, materials science and nanotechnology. The aims of this book are to uncover and exploit principles and mechanisms of information processing in, and functional properties of, physical, chemical and living systems in order to develop efficient algorithms, design optimal architectures and manufacture working prototypes of future and emergent computing devices. This first volume presents theoretical foundations of the future and emergent computing paradigms and architectures. The topics covered are computability, (non-)universality and complexity of computation; physics of computation, analog and quantum computing; reversible and asynchronous devices; cellular automata and other mathematical machines; P-systems and cellular computing; infinity and spatial computation; chemical and reservoir computing. The book is an encyclopedia, the first ever complete autho...

  2. Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division, Lawrence Berkeley National Laboratory; NERSC, Lawrence Berkeley National Laboratory; Computer Science Department, University of California, Berkeley; Williams, Samuel; Carter, Jonathan; Oliker, Leonid; Shalf, John; Yelick, Katherine

    2009-05-04

    We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications.

  3. Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4

    International Nuclear Information System (INIS)

    Williams, Samuel; Carter, Jonathan; Oliker, Leonid; Shalf, John; Yelick, Katherine

    2009-01-01

    We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4x when running on dual- and quad-core Opteron dual-socket SMPs. We extend these studies to the distributed memory arena via a hybrid MPI/pthreads implementation. In addition to conventional auto-tuning at the local SMP node, we tune at the message-passing level to determine the optimal aspect ratio as well as the correct balance between MPI tasks and threads per MPI task. Our study presents a detailed performance analysis when moving along an isocurve of constant hardware usage: fixed total memory, total cores, and total nodes. Overall, our work points to approaches for improving intra- and inter-node efficiency on large-scale multicore systems for demanding scientific applications
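
    One of the tuning dimensions described in these two records is the balance between MPI tasks and threads per task for a fixed number of cores per node. The sketch below illustrates, under stated assumptions, how such a search space might be enumerated; the benchmark callback and core counts are placeholders, not the authors' auto-tuning harness.

```python
# Illustrative sketch (not the authors' auto-tuner): enumerate ways to split a
# fixed number of cores per node between MPI tasks and pthreads per task, then
# keep the fastest configuration reported by a user-supplied benchmark callback.

def candidate_splits(cores_per_node):
    """Yield (mpi_tasks_per_node, threads_per_task) pairs that use every core."""
    for tasks in range(1, cores_per_node + 1):
        if cores_per_node % tasks == 0:
            yield tasks, cores_per_node // tasks

def tune(cores_per_node, benchmark):
    """benchmark(tasks, threads) -> wall-clock seconds for one benchmark step."""
    best = None
    for tasks, threads in candidate_splits(cores_per_node):
        t = benchmark(tasks, threads)
        if best is None or t < best[0]:
            best = (t, tasks, threads)
    return best

if __name__ == "__main__":
    # Placeholder timing model favouring a balanced split; a real tuner would
    # launch the application for each candidate configuration instead.
    fake = lambda tasks, threads: 1.0 + 0.1 * abs(tasks - threads)
    print(tune(4, fake))   # e.g. (1.0, 2, 2)
```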

  4. Monitoring of computing resource use of active software releases at ATLAS

    Science.gov (United States)

    Limosani, Antonio; ATLAS Collaboration

    2017-10-01

    The LHC is the world’s most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions have grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier-0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed in plots generated using Python visualization libraries and collected into pre-formatted auto-generated Web pages, which allow the ATLAS developer community to track the performance of their algorithms. This information is however preferentially filtered to domain leaders and developers through the use of JIRA and via reports given at ATLAS software meetings. Finally, we take a glimpse of the future by reporting on the expected CPU and RAM usage in benchmark workflows associated with the High Luminosity LHC and anticipate the ways performance monitoring will evolve to understand and benchmark future workflows.

  5. Hardware Architectures for Data-Intensive Computing Problems: A Case Study for String Matching

    Energy Technology Data Exchange (ETDEWEB)

    Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel

    2012-12-28

    DNA analysis is an emerging application of high-performance bioinformatics. Modern sequencing machinery is able to provide, in a few hours, large input streams of data, which need to be matched against exponentially growing databases of known fragments. The ability to recognize these patterns effectively and quickly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also include heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variability, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges to current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limits of the underlying hardware, is required to reach the desired high throughputs. In this paper, we discuss the implementation of the Aho-Corasick algorithm for GPU-accelerated high performance systems. We present an optimized implementation of Aho-Corasick for GPUs and discuss its tradeoffs on the Tesla T10 and the new Tesla T20 (codename Fermi) GPUs. We then integrate the optimized GPU code, respectively, in an MPI-based and in a pthreads-based load balancer to enable execution of the algorithm on clusters and large shared-memory multiprocessors (SMPs) accelerated with multiple GPUs.
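
    The Aho-Corasick automaton at the core of this work is a standard multiple-pattern matching algorithm. A compact CPU reference sketch (not the paper's GPU, MPI or pthreads implementation) is shown below to illustrate the goto/fail structure the abstract refers to.

```python
# Reference CPU sketch of Aho-Corasick multi-pattern matching (illustrative only;
# the paper's GPU/MPI/pthreads implementations are far more involved).
from collections import deque

def build_automaton(patterns):
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                          # build the trie (goto function)
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(p)
    queue = deque(goto[0].values())             # depth-1 states fail to the root
    while queue:                                # breadth-first fail-link construction
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]              # inherit matches from the fail state
    return goto, fail, out

def search(text, automaton):
    goto, fail, out = automaton
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        matches.extend((i, p) for p in out[state])
    return matches

if __name__ == "__main__":
    auto = build_automaton(["he", "she", "his", "hers"])
    print(search("ushers", auto))   # finds "she" and "he" at index 3, "hers" at 5
```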

  6. A performance model for the communication in fast multipole methods on high-performance computing platforms

    KAUST Repository

    Ibeid, Huda

    2016-03-04

    Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity combined with its linear complexity and asynchronous communication patterns make it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.
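
    Performance models of this kind are built from elementary communication-cost terms. A minimal latency-bandwidth (alpha-beta) sketch is shown below; the interconnect parameters and message counts are hypothetical and are not taken from the paper.

```python
# Minimal alpha-beta sketch of point-to-point communication cost, the kind of
# elementary term a fuller FMM communication model builds on (hypothetical values).

def comm_time(n_messages, total_bytes, latency_s, bandwidth_bytes_per_s):
    """T = n * alpha + V / beta: per-message latency plus volume over bandwidth."""
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s

if __name__ == "__main__":
    # Hypothetical interconnect: 1 microsecond latency, 10 GB/s bandwidth.
    alpha, beta = 1e-6, 10e9
    # e.g. 512 neighbour messages of 64 KiB each during a multipole exchange.
    msgs, volume = 512, 512 * 64 * 1024
    print(f"estimated communication time: {comm_time(msgs, volume, alpha, beta) * 1e3:.3f} ms")
```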

  7. 8 bit computer

    OpenAIRE

    Jankovskij, Robert

    2018-01-01

    In this paper the author looks into the structure of an eight-bit computer and its components, their structure, pros and cons. An eight-bit computer which can execute basic instructions and arithmetic operations, such as addition and subtraction of eight-bit numbers, is built out of integrated circuits. Data transfers between computer components are monitored and reviewed.

  8. The Glass Computer

    Science.gov (United States)

    Paesler, M. A.

    2009-01-01

    Digital computers use different kinds of memory, each of which is either volatile or nonvolatile. On most computers only the hard drive memory is nonvolatile, i.e., it retains all information stored on it when the power is off. When a computer is turned on, an operating system stored on the hard drive is loaded into the computer's memory cache and…

  9. Computability and unsolvability

    CERN Document Server

    Davis, Martin

    1985-01-01

    ""A clearly written, well-presented survey of an intriguing subject."" - Scientific American. Classic text considers general theory of computability, computable functions, operations on computable functions, Turing machines self-applied, unsolvable decision problems, applications of general theory, mathematical logic, Kleene hierarchy, computable functionals, classification of unsolvable decision problems and more.

  10. My Computer Romance

    Science.gov (United States)

    Campbell, Gardner

    2007-01-01

    In this article, the author relates the large role computers have played in his life as a writer. The author narrates that he has been using a computer for nearly twenty years now. He relates that computers have set his writing free. When he started writing, he was just using an electric typewriter. He also relates that his romance with computers is a…

  11. Mathematics for computer graphics

    CERN Document Server

    Vince, John

    2006-01-01

    Helps you understand the mathematical ideas used in computer animation, virtual reality, CAD, and other areas of computer graphics. This work also helps you to rediscover the mathematical techniques required to solve problems and design computer programs for computer graphic applications

  12. Adolescents' Computer Art.

    Science.gov (United States)

    Clements, Robert D.

    1985-01-01

    Adolescents react very positively to computer graphics programs. The biggest obstacle to initiation of computer art in schools is teacher attitudes. Things to consider when starting a computer graphics program are discussed, and some illustrations of student computer art are provided. (RM)

  13. How Computer Graphics Work.

    Science.gov (United States)

    Prosise, Jeff

    This document presents the principles behind modern computer graphics without straying into the arcane languages of mathematics and computer science. Illustrations accompany the clear, step-by-step explanations that describe how computers draw pictures. The 22 chapters of the book are organized into 5 sections. "Part 1: Computer Graphics in…

  14. Computational Social Sciences

    OpenAIRE

    Amaral, Inês

    2017-01-01

    Computational social sciences is a research discipline at the interface between computer science and the traditional social sciences. This interdisciplinary and emerging scientific field uses computational methods to analyze and model social phenomena, social structures, and collective behavior. The main computational approaches to the social sciences are social network analysis, automated information extraction systems, social geographic information systems, comp...

  15. Marketers increase computer usage

    Energy Technology Data Exchange (ETDEWEB)

    1984-10-01

    A special study is presented on the use of computers in the fuel oil business. In 1984, 86% of the marketers used a computer and all of them used it for billing. A large portion, 95%, used them to schedule delivery, and 91% used the computer to control credit. All of these percentages were similar to those for 1981.

  16. Computer Viruses. Technology Update.

    Science.gov (United States)

    Ponder, Tim, Comp.; Ropog, Marty, Comp.; Keating, Joseph, Comp.

    This document provides general information on computer viruses, how to help protect a computer network from them, measures to take if a computer becomes infected. Highlights include the origins of computer viruses; virus contraction; a description of some common virus types (File Virus, Boot Sector/Partition Table Viruses, Trojan Horses, and…

  17. Great Principles of Computing

    OpenAIRE

    Denning, Peter J.

    2008-01-01

    The Great Principles of Computing is a framework for understanding computing as a field of science. The website ... April 2008 (Rev. 8/31/08)

  18. Review of An Introduction to Parallel and Vector Scientific Computing

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.; Lefton, Lew

    2006-06-30

    to see the publication of the book An Introduction to Parallel and Vector Scientific Computing, written by Ronald W. Shonkwiler and Lew Lefton, both of the Georgia Institute of Technology. They have taken the bull by the horns and produced a book that appears to be entirely satisfactory as an introductory textbook for use in such a course. It is also of interest to the much broader community of researchers who are already in the field, laboring day by day to improve the power and performance of their numerical simulations. The book is organized into 11 chapters, plus an appendix. The first three chapters describe the basics of system architecture including vector, parallel and distributed memory systems, the details of task dependence and synchronization, and the various programming models currently in use - threads, MPI and OpenMP. Chapters four through nine provide a competent introduction to floating-point arithmetic, numerical error and numerical linear algebra. Some of the topics presented include Gaussian elimination, LU decomposition, tridiagonal systems, Givens rotations, QR decompositions, Gauss-Seidel iterations and Householder transformations. Chapters 10 and 11 introduce Monte Carlo methods and schemes for discrete optimization such as genetic algorithms.
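
    Several of the numerical linear algebra topics listed in the review, such as Gauss-Seidel iteration, are simple to illustrate. The following sketch is not taken from the book; it is a minimal example of the method on a small diagonally dominant system.

```python
# Small Gauss-Seidel sketch (not from the book under review): iteratively solve
# A x = b by sweeping through the unknowns using the newest available values.

def gauss_seidel(A, b, iterations=100):
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

if __name__ == "__main__":
    # Diagonally dominant example, for which Gauss-Seidel converges.
    A = [[4.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    print(gauss_seidel(A, b))   # approaches [1.0, 1.0, 1.0]
```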

  19. Brief: Managing computing technology

    International Nuclear Information System (INIS)

    Startzman, R.A.

    1994-01-01

    While computing is applied widely in the production segment of the petroleum industry, its effective application is the primary goal of computing management. Computing technology has changed significantly since the 1950's, when computers first began to influence petroleum technology. The ability to accomplish traditional tasks faster and more economically probably is the most important effect that computing has had on the industry. While speed and lower cost are important, are they enough? Can computing change the basic functions of the industry? When new computing technology is introduced improperly, it can clash with traditional petroleum technology. This paper examines the role of management in merging these technologies

  20. Roadmap to greener computing

    CERN Document Server

    Nguemaleu, Raoul-Abelin Choumin

    2014-01-01

    A concise and accessible introduction to green computing and green IT, this book addresses how computer science and the computer infrastructure affect the environment and presents the main challenges in making computing more environmentally friendly. The authors review the methodologies, designs, frameworks, and software development tools that can be used in computer science to reduce energy consumption and still compute efficiently. They also focus on Computer Aided Design (CAD) and describe what design engineers and CAD software applications can do to support new streamlined business directi

  1. Computer mathematics for programmers

    CERN Document Server

    Abney, Darrell H; Sibrel, Donald W

    1985-01-01

    Computer Mathematics for Programmers presents the Mathematics that is essential to the computer programmer.The book is comprised of 10 chapters. The first chapter introduces several computer number systems. Chapter 2 shows how to perform arithmetic operations using the number systems introduced in Chapter 1. The third chapter covers the way numbers are stored in computers, how the computer performs arithmetic on real numbers and integers, and how round-off errors are generated in computer programs. Chapter 4 details the use of algorithms and flowcharting as problem-solving tools for computer p
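
    Two of the topics listed above, number systems and round-off error, can be illustrated in a few lines. The example below is not from the book; it is a minimal sketch.

```python
# Tiny illustration (not from the book) of two of its topics: converting between
# number systems and observing round-off error in binary floating point.

def to_base(n, base):
    """Convert a non-negative integer to its digit string in the given base."""
    digits = "0123456789ABCDEF"
    out = ""
    while True:
        n, r = divmod(n, base)
        out = digits[r] + out
        if n == 0:
            return out

if __name__ == "__main__":
    print(to_base(202, 2), to_base(202, 8), to_base(202, 16))  # 11001010 312 CA
    # Round-off: 0.1 has no exact binary representation, so the sum is not exactly 0.3.
    print(0.1 + 0.2 == 0.3, 0.1 + 0.2)                         # False 0.30000000000000004
```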

  2. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C^3P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C^3P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C^3P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  3. Research on Comparison of Cloud Computing and Grid Computing

    OpenAIRE

    Liu Yuxi; Wang Jianhua

    2012-01-01

    The development of the computer industry has been driven by progress in distributed computing, parallel computing and grid computing, out of which the cloud computing movement has arisen. This study describes the types of cloud computing services and the similarities and differences between cloud computing and grid computing, discusses the respects in which cloud computing improves on grid computing, and reviews the common problems faced by both, as well as some security issues.

  4. Toward Cloud Computing Evolution

    OpenAIRE

    Susanto, Heru; Almunawar, Mohammad Nabil; Kang, Chen Chin

    2012-01-01

    Information Technology (IT) has shaped the success of organizations, giving them a solid foundation that increases both their efficiency and their productivity. The computing industry is witnessing a paradigm shift in the way computing is performed worldwide. There is a growing awareness among consumers and enterprises of accessing their IT resources extensively through a "utility" model known as "cloud computing." Cloud computing was initially rooted in distributed grid-based computing. ...

  5. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  6. The digital computer

    CERN Document Server

    Parton, K C

    2014-01-01

    The Digital Computer focuses on the principles, methodologies, and applications of the digital computer. The publication takes a look at the basic concepts involved in using a digital computer, simple autocode examples, and examples of working advanced design programs. Discussions focus on transformer design synthesis program, machine design analysis program, solution of standard quadratic equations, harmonic analysis, elementary wage calculation, and scientific calculations. The manuscript then examines commercial and automatic programming, how computers work, and the components of a computer

  7. Advanced computers and Monte Carlo

    International Nuclear Information System (INIS)

    Jordan, T.L.

    1979-01-01

    High-performance parallelism that is currently available is synchronous in nature. It is manifested in such architectures as Burroughs ILLIAC-IV, CDC STAR-100, TI ASC, CRI CRAY-1, ICL DAP, and many special-purpose array processors designed for signal processing. This form of parallelism has apparently not been of significant value to many important Monte Carlo calculations. Nevertheless, there is much asynchronous parallelism in many of these calculations. A model of a production code that requires up to 20 hours per problem on a CDC 7600 is studied for suitability on some asynchronous architectures that are on the drawing board. The code is described and some of its properties and resource requirements are identified to compare with corresponding properties and resources of some asynchronous multiprocessor architectures. Arguments are made for programmer aids and special syntax to identify and support important asynchronous parallelism. 2 figures, 5 tables

  8. Synthetic Computation: Chaos Computing, Logical Stochastic Resonance, and Adaptive Computing

    Science.gov (United States)

    Kia, Behnam; Murali, K.; Jahed Motlagh, Mohammad-Reza; Sinha, Sudeshna; Ditto, William L.

    Nonlinearity and chaos can illustrate numerous behaviors and patterns, and one can select different patterns from this rich library of patterns. In this paper we focus on synthetic computing, a field that engineers and synthesizes nonlinear systems to obtain computation. We explain the importance of nonlinearity, and describe how nonlinear systems can be engineered to perform computation. More specifically, we provide an overview of chaos computing, a field that manually programs chaotic systems to build different types of digital functions. Also we briefly describe logical stochastic resonance (LSR), and then extend the approach of LSR to realize combinational digital logic systems via suitable concatenation of existing logical stochastic resonance blocks. Finally we demonstrate how a chaotic system can be engineered and mated with different machine learning techniques, such as artificial neural networks, random searching, and genetic algorithm, to design different autonomous systems that can adapt and respond to environmental conditions.

  9. Know Your Personal Computer Introduction to Computers

    Indian Academy of Sciences (India)

    Know Your Personal Computer: Introduction to Computers. Siddhartha Kumar Ghoshal. Series Article, Resonance – Journal of Science Education, Volume 1, Issue 1, January 1996, pp. 48-55.

  10. Computers and Computation. Readings from Scientific American.

    Science.gov (United States)

    Fenichel, Robert R.; Weizenbaum, Joseph

    A collection of articles from "Scientific American" magazine has been put together at this time because the current period in computer science is one of consolidation rather than innovation. A few years ago, computer science was moving so swiftly that even the professional journals were more archival than informative; but today it is…

  11. Computational Biology, Advanced Scientific Computing, and Emerging Computational Architectures

    Energy Technology Data Exchange (ETDEWEB)

    None

    2007-06-27

    This CRADA was established at the start of FY02 with $200 K from IBM and matching funds from DOE to support post-doctoral fellows in collaborative research between International Business Machines and Oak Ridge National Laboratory to explore effective use of emerging petascale computational architectures for the solution of computational biology problems. 'No cost' extensions of the CRADA were negotiated with IBM for FY03 and FY04.

  12. Simultaneous Budget and Buffer Size Computation for Throughput-Constrained Task Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Geilen, Marc C.W.; Basten, Twan

    Modern embedded multimedia systems process multiple concurrent streams of data processing jobs. Streams often have throughput requirements. These jobs are implemented on a multiprocessor system as a task graph. Tasks communicate data over buffers, where tasks wait on sufficient space in output

  13. Efficient Computation of Buffer Capacities for Cyclo-Static Dataflow Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Bekooij, Marco J.G.; Smit, Gerardus Johannes Maria

    A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multi-processor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of

  14. Efficient Computation of Buffer Capacities for Cyclo-Static Dataflow Graphs

    NARCIS (Netherlands)

    Wiggers, M.H.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2006-01-01

    A key step in the design of cyclo-static real-time systems is the determination of buffer capacities. In our multi-processor system, we apply back-pressure, which means that tasks wait for space in output buffers. Consequently buffer capacities affect the throughput. This requires the derivation of
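
    The records above concern how buffer capacities limit throughput when back-pressure is applied. The following sketch is a simplified illustration of that effect for a two-task producer/consumer chain; the execution times are hypothetical and the simulation is not the dataflow analysis used in the papers.

```python
# Illustrative sketch (not the papers' analysis): simulate a two-task producer/
# consumer chain with back-pressure and a bounded buffer, and observe how the
# buffer capacity limits throughput. Execution times are hypothetical.

def throughput(capacity, t_prod, t_cons, firings=10000):
    p = [0.0] * (firings + 1)          # completion time of producer firing k
    c = [0.0] * (firings + 1)          # completion time of consumer firing k
    for k in range(1, firings + 1):
        # Back-pressure: producing token k needs space, i.e. token k-capacity consumed.
        space_free = c[k - capacity] if k - capacity >= 1 else 0.0
        p[k] = max(p[k - 1], space_free) + t_prod
        # The consumer needs token k to be present before it can fire.
        c[k] = max(c[k - 1], p[k]) + t_cons
    return firings / c[firings]

if __name__ == "__main__":
    for cap in (1, 2, 4, 8):
        print(f"buffer capacity {cap}: throughput {throughput(cap, 3.0, 2.0):.3f} tokens/time unit")
```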

  15. Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing

    Science.gov (United States)

    Dobbs, Carl, Sr.

    2012-01-01

    A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, for implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory, and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software
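
    The hardware unit described above accelerates N-modular redundancy checking. As a purely conceptual, software-level sketch of the underlying voting idea (not the innovation's hardware sumcheck mechanism):

```python
# Conceptual sketch of N-modular redundancy voting (software view only; the
# innovation described above moves this kind of checking into hardware).
from collections import Counter

def nmr_vote(results):
    """results: outputs of N redundant computations; return the majority value.
    Raises an error if no value reaches a strict majority."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority: too many disagreeing modules")
    return value

if __name__ == "__main__":
    # Triple-modular example: one faulty module is out-voted by the other two.
    print(nmr_vote([0x5A, 0x5A, 0x3F]))   # -> 0x5A (printed as 90)
```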

  16. Heterotic computing: exploiting hybrid computational devices.

    Science.gov (United States)

    Kendon, Viv; Sebald, Angelika; Stepney, Susan

    2015-07-28

    Current computational theory deals almost exclusively with single models: classical, neural, analogue, quantum, etc. In practice, researchers use ad hoc combinations, realizing only recently that they can be fundamentally more powerful than the individual parts. A Theo Murphy meeting brought together theorists and practitioners of various types of computing, to engage in combining the individual strengths to produce powerful new heterotic devices. 'Heterotic computing' is defined as a combination of two or more computational systems such that they provide an advantage over either substrate used separately. This post-meeting collection of articles provides a wide-ranging survey of the state of the art in diverse computational paradigms, together with reflections on their future combination into powerful and practical applications. © 2015 The Author(s) Published by the Royal Society. All rights reserved.

  17. Parallel computing for homogeneous diffusion and transport equations in neutronics; Calcul parallele pour les equations de diffusion et de transport homogenes en neutronique

    Energy Technology Data Exchange (ETDEWEB)

    Pinchedez, K

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a Cray T3D and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion eigenproblem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the eigenproblem is expanded on a family composed, on the one hand, of eigenfunctions associated with the sub-domains and, on the other hand, of functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
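
    The dominant-eigenvalue computation mentioned above is typically performed with power-type (outer) iterations. The sketch below is a generic power iteration on a tiny matrix, included only as an illustration; it is not the CRONOS solver, which uses a mixed finite element discretization and parallel algorithms.

```python
# Generic power-iteration sketch for a dominant eigenvalue (illustrative only;
# CRONOS uses a mixed finite element discretization and parallel solvers).

def power_iteration(matvec, x0, iterations=200):
    """matvec: function applying the operator; x0: non-zero starting vector."""
    x = list(x0)
    lam = 0.0
    for _ in range(iterations):
        y = matvec(x)
        lam = max(abs(v) for v in y)        # infinity-norm estimate of the eigenvalue
        x = [v / lam for v in y]
    return lam, x

if __name__ == "__main__":
    A = [[2.0, 1.0],
         [1.0, 3.0]]
    matvec = lambda v: [sum(a * b for a, b in zip(row, v)) for row in A]
    lam, vec = power_iteration(matvec, [1.0, 1.0])
    print(lam)   # close to (5 + sqrt(5)) / 2 ≈ 3.618
```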

  18. Cloud Computing for radiologists.

    Science.gov (United States)

    Kharat, Amit T; Safvi, Amjad; Thind, Ss; Singh, Amarjit

    2012-07-01

    Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future.

  19. Cloud Computing for radiologists

    International Nuclear Information System (INIS)

    Kharat, Amit T; Safvi, Amjad; Thind, SS; Singh, Amarjit

    2012-01-01

    Cloud computing is a concept wherein a computer grid is created using the Internet with the sole purpose of utilizing shared resources such as computer software, hardware, on a pay-per-use model. Using Cloud computing, radiology users can efficiently manage multimodality imaging units by using the latest software and hardware without paying huge upfront costs. Cloud computing systems usually work on public, private, hybrid, or community models. Using the various components of a Cloud, such as applications, client, infrastructure, storage, services, and processing power, Cloud computing can help imaging units rapidly scale and descale operations and avoid huge spending on maintenance of costly applications and storage. Cloud computing allows flexibility in imaging. It sets free radiology from the confines of a hospital and creates a virtual mobile office. The downsides to Cloud computing involve security and privacy issues which need to be addressed to ensure the success of Cloud computing in the future

  20. Review of quantum computation

    International Nuclear Information System (INIS)

    Lloyd, S.

    1992-01-01

    Digital computers are machines that can be programmed to perform logical and arithmetical operations. Contemporary digital computers are ''universal,'' in the sense that a program that runs on one computer can, if properly compiled, run on any other computer that has access to enough memory space and time. Any one universal computer can simulate the operation of any other; and the set of tasks that any such machine can perform is common to all universal machines. Since Bennett's discovery that computation can be carried out in a non-dissipative fashion, a number of Hamiltonian quantum-mechanical systems have been proposed whose time-evolutions over discrete intervals are equivalent to those of specific universal computers. The first quantum-mechanical treatment of computers was given by Benioff, who exhibited a Hamiltonian system with a basis whose members corresponded to the logical states of a Turing machine. In order to make the Hamiltonian local, in the sense that its structure depended only on the part of the computation being performed at that time, Benioff found it necessary to make the Hamiltonian time-dependent. Feynman discovered a way to make the computational Hamiltonian both local and time-independent by incorporating the direction of computation in the initial condition. In Feynman's quantum computer, the program is a carefully prepared wave packet that propagates through different computational states. Deutsch presented a quantum computer that exploits the possibility of existing in a superposition of computational states to perform tasks that a classical computer cannot, such as generating purely random numbers, and carrying out superpositions of computations as a method of parallel processing. In this paper, we show that such computers, by virtue of their common function, possess a common form for their quantum dynamics