WorldWideScience

Sample records for distributed-memory multiprocessor computers

  1. Multigrid solution of diffusion equations on distributed memory multiprocessor systems

    International Nuclear Information System (INIS)

    Finnemann, H.

    1988-01-01

    The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system is outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real-time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.)

  2. Distributed-memory matrix computations

    DEFF Research Database (Denmark)

    Balle, Susanne Mølleskov

    1995-01-01

    The main goal of this project is to investigate, develop, and implement algorithms for numerical linear algebra on parallel computers in order to acquire expertise in methods for parallel computations. An important motivation for analyzing and investigating the potential for parallelism in these algorithms is that many scientific applications rely heavily on the performance of the involved dense linear algebra building blocks. Even though we consider the distributed-memory as well as the shared-memory programming paradigm, the major part of the thesis is dedicated to distributed-memory architectures. We emphasize distributed-memory massively parallel computers - such as the Connection Machine models CM-200 and CM-5/CM-5E - available to us at UNI-C and at Thinking Machines Corporation. The CM-200 was, at the time this project started, one of the few existing massively parallel computers...

  3. Ring interconnection for distributed memory automation and computing system

    Energy Technology Data Exchange (ETDEWEB)

    Vinogradov, V I [Inst. for Nuclear Research of the Russian Academy of Sciences, Moscow (Russian Federation)

    1996-12-31

    Problems in the development of measurement, acquisition and control systems based on a distributed memory and a ring interface are discussed. It has been found that a RAM LINK-type protocol can be used for the ringlet links of a non-symmetrical distributed-memory multiprocessor system architecture. 5 refs.

  4. The structural robustness of multiprocessor computing system

    Directory of Open Access Journals (Sweden)

    N. Andronaty

    1996-03-01

    Full Text Available A model of a transputer-based multiprocessor computing system is described which permits assessment of structural robustness (viability, survivability).

  5. Distributed Memory Parallel Computing with SEAWAT

    Science.gov (United States)

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner such that: (a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, (b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, and (c) the preconditioner is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model (approximately 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources.
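
    As a rough illustration of the distributed Krylov structure described above, the sketch below runs a conjugate gradient iteration in which every MPI rank owns one subdomain and the global inner products are formed with an allreduce. The block-diagonal test matrix, problem size, and tolerance are illustrative assumptions only; this is not the PKS implementation.

```python
# Sketch: distributed conjugate gradient in the spirit of the PKS-CG solver
# described above. Each MPI rank owns one subdomain (here an independent SPD
# diagonal block, so the local matvec needs no halo exchange); the global
# inner products are formed with allreduce. All names and sizes are
# illustrative assumptions, not part of SEAWAT/PKS.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

n_local = 100                                   # unknowns owned by this rank
rng = np.random.default_rng(seed=rank)
B = rng.standard_normal((n_local, n_local))
A_local = B @ B.T + n_local * np.eye(n_local)   # local SPD block
b_local = rng.standard_normal(n_local)

def pdot(u, v):
    """Global dot product over all subdomains."""
    return comm.allreduce(np.dot(u, v), op=MPI.SUM)

x = np.zeros(n_local)
r = b_local - A_local @ x
p = r.copy()
rs_old = pdot(r, r)

for it in range(200):
    Ap = A_local @ p                  # subdomain-local matrix-vector product
    alpha = rs_old / pdot(p, Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = pdot(r, r)
    if np.sqrt(rs_new) < 1e-10:
        break
    p = r + (rs_new / rs_old) * p
    rs_old = rs_new

if rank == 0:
    print(f"converged in {it + 1} iterations, residual {np.sqrt(rs_new):.2e}")
```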

  6. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  7. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  8. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  9. Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

    Science.gov (United States)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Several techniques for static and dynamic load balancing of vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
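
    A minimal sketch of the data-driven load balancing idea: the cost of the next processing step is estimated from the data produced by the previous one (here, feature counts per image strip), and strips are packed onto processors so that the estimated work is even. The greedy packing rule and the cost model are assumptions for illustration, not the paper's exact mapping.

```python
# Sketch of data-driven (static) load balancing: the cost of the next stage
# is estimated from the data produced by the previous one (here: number of
# candidate features per image strip), and strips are packed onto processors
# so that estimated work is even. Cost model and packing rule are assumptions.

def balance(strip_costs, n_procs):
    """Greedy longest-processing-time assignment of strips to processors."""
    loads = [0.0] * n_procs
    assignment = [[] for _ in range(n_procs)]
    # place the most expensive strips first, each on the least loaded processor
    for strip, cost in sorted(enumerate(strip_costs), key=lambda t: -t[1]):
        p = loads.index(min(loads))
        assignment[p].append(strip)
        loads[p] += cost
    return assignment, loads

# Example: per-strip feature counts produced by the feature-extraction step
feature_counts = [120, 15, 300, 80, 95, 240, 10, 60]
assignment, loads = balance(feature_counts, n_procs=4)
print("strip assignment per processor:", assignment)
print("estimated load per processor:  ", loads)
```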

  10. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    International Nuclear Information System (INIS)

    Uhlmann, M.

    2004-01-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs

  11. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs.

  12. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
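
    The ghost-cell extension of each subdomain mentioned above can be pictured with the following sketch, in which each MPI rank owns a strip of a 1D grid plus one ghost cell per side and swaps boundary values with its neighbours before every stencil sweep. Grid size, stencil, and iteration count are illustrative assumptions, not UTCHEM's data layout.

```python
# Sketch of the ghost-cell (overlap) exchange that domain decomposition
# relies on: each rank owns a strip of a 1D grid plus one ghost cell on each
# side, and swaps boundary values with its neighbours before a stencil sweep.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

n_local = 50
u = np.zeros(n_local + 2)          # interior cells plus two ghost cells
u[1:-1] = rank                     # dummy initial data

for step in range(10):
    # exchange ghost cells with neighbouring subdomains
    comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # simple explicit smoothing stencil on the interior cells
    u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])

print(f"rank {rank}: boundary values {u[1]:.3f} {u[-2]:.3f}")
```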

  13. Solution of the Euler and Navier-Stokes equations on MIMD distributed memory multiprocessors using cyclic reduction

    International Nuclear Information System (INIS)

    Curchitser, E.N.; Pelz, R.B.; Marconi, F.

    1992-01-01

    The Euler and Navier-Stokes equations are solved for the steady, two-dimensional flow over a NACA 0012 airfoil using a 1024-node nCUBE/2 multiprocessor. Second-order, upwind-discretized difference equations are solved implicitly using ADI factorization. Parallel cyclic reduction is employed to solve the block tridiagonal systems. For realistic problems, communication times are negligible compared to calculation times. The processors are tightly synchronized, and their loads are well balanced. When the flux Jacobians are frozen, the wall-clock time for one implicit timestep is about equal to that of a multistage explicit scheme. 10 refs
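
    For readers unfamiliar with cyclic reduction, the sketch below shows the serial version of the recurrence for a scalar tridiagonal system; the parallel block-tridiagonal variant used above applies the same elimination pattern, with the independent updates at each level spread over processors. Problem size and coefficients are illustrative assumptions.

```python
# Serial sketch of cyclic reduction for a scalar tridiagonal system.
# At each level, the independent eliminations can be done in parallel,
# which is what the block-tridiagonal solver above exploits.
import numpy as np
from math import log2

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system (a: sub-diagonal, a[0] unused; b: diagonal;
    c: super-diagonal, c[-1] unused; d: rhs). Requires n = 2**k - 1."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    n = len(b)
    k = int(log2(n + 1))
    assert 2 ** k - 1 == n
    # forward reduction: eliminate the neighbours of every surviving equation
    for level in range(k - 1):
        s = 2 ** (level + 1)
        h = s // 2
        for i in range(s - 1, n, s):
            im, ip = i - h, i + h
            alpha = a[i] / b[im]
            beta = c[i] / b[ip] if ip < n else 0.0
            b[i] -= alpha * c[im] + (beta * a[ip] if ip < n else 0.0)
            d[i] -= alpha * d[im] + (beta * d[ip] if ip < n else 0.0)
            a[i] = -alpha * a[im]
            c[i] = -beta * c[ip] if ip < n else 0.0
    # back substitution from the single remaining equation outward
    x = np.zeros(n)
    mid = n // 2
    x[mid] = d[mid] / b[mid]
    for level in range(k - 2, -1, -1):
        s = 2 ** (level + 1)
        h = s // 2
        for i in range(h - 1, n, s):
            xl = x[i - h] if i - h >= 0 else 0.0
            xr = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
    return x

# quick check against a dense solve on a small diagonally dominant system
n = 7
rng = np.random.default_rng(0)
b_ = 4.0 + rng.random(n); a_ = rng.random(n); c_ = rng.random(n); d_ = rng.random(n)
a_[0] = c_[-1] = 0.0
x = cyclic_reduction(a_, b_, c_, d_)
A = np.diag(b_) + np.diag(a_[1:], -1) + np.diag(c_[:-1], 1)
print("max residual:", np.max(np.abs(A @ x - d_)))
```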

  14. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    Full Text Available One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.
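
    A toy rendition of the ACWN placement rule described above: a newly created process stays on its node of origin while the local load is low, and is otherwise contracted to the least loaded neighbour. The ring topology, threshold, and random process creation are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of an "adaptive contracting within neighborhood" style rule: a newly
# created process is kept locally if the local load is low, otherwise it is
# handed to the least loaded neighbouring node. Topology and threshold are
# illustrative assumptions.
import random

N_NODES = 8
neighbours = {i: [(i - 1) % N_NODES, (i + 1) % N_NODES] for i in range(N_NODES)}
load = [0] * N_NODES          # number of queued medium-grained processes
THRESHOLD = 2                 # keep work local while load stays below this

def spawn(origin):
    """Place a newly created process according to the ACWN-style rule."""
    target = origin
    best = min(neighbours[target], key=lambda n: load[n])
    # contract towards the least loaded neighbour only when overloaded
    if load[target] >= THRESHOLD and load[best] < load[target]:
        target = best
    load[target] += 1
    return target

random.seed(1)
for _ in range(40):                      # processes created at random nodes
    spawn(random.randrange(N_NODES))
print("final load per node:", load)
```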

  15. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej; Paszyński, Maciej R.; Pardo, D.; Dalcin, Lisandro; Calo, Victor M.

    2015-01-01

    This paper derives theoretical estimates of the computational cost for the isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution

  16. The ACP (Advanced Computer Program) multiprocessor system at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Case, G.; Cook, A.; Fischler, M.; Gaines, I.; Hance, R.; Husby, D.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build including 2 Mbytes of on board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53 processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  17. The ACP [Advanced Computer Program] multiprocessor system at Fermilab

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build including 2 Mbytes of on board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53 processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  18. Software for the ACP [Advanced Computer Program] multiprocessor system

    International Nuclear Information System (INIS)

    Biel, J.; Areti, H.; Atac, R.

    1987-01-01

    Software has been developed for use with the Fermilab Advanced Computer Program (ACP) multiprocessor system. The software was designed to make a system of a hundred independent node processors as easy to use as a single, powerful CPU. Subroutines have been developed by which a user's host program can send data to and get results from the program running in each of his ACP node processors. Utility programs make it easy to compile and link host and node programs, to debug a node program on an ACP development system, and to submit a debugged program to an ACP production system

  19. External parallel sorting with multiprocessor computers

    International Nuclear Information System (INIS)

    Comanceau, S.I.

    1984-01-01

    This article describes methods of external sorting in which the entire main computer memory is used for the internal sorting of entries, forming out of them sorted segments of the greatest possible size, and outputting them to external memories. The obtained segments are merged into larger segments until all entries form one ordered segment. The described methods are suitable for sequential files stored on magnetic tape. The needs of the sorting algorithm can be met by using the relatively slow peripheral storage devices (e.g., tapes, disks, drums). The efficiency of the external sorting methods is determined by calculating the total sorting time as a function of the number of entries to be sorted and the number of parallel processors participating in the sorting process
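
    A compact sketch of the external sorting scheme: main memory sorts fixed-size runs, the runs are written to (slow) external storage, and the sorted runs are then merged into one ordered sequence. Temporary files stand in for tapes or disks; the run size and file handling are illustrative assumptions.

```python
# Sketch of external sorting: internal sort of memory-sized runs, runs written
# to external storage, then a k-way merge of the sorted runs. Temporary files
# stand in for tapes/disks; run size is an illustrative assumption.
import heapq
import os
import random
import tempfile

def external_sort(values, run_size=4):
    run_files = []
    # phase 1: internal sort of memory-sized runs, written to external storage
    for start in range(0, len(values), run_size):
        run = sorted(values[start:start + run_size])
        f = tempfile.NamedTemporaryFile("w+", delete=False, suffix=".run")
        f.write("\n".join(map(str, run)) + "\n")
        f.close()
        run_files.append(f.name)
    # phase 2: k-way merge of the sorted runs into one ordered sequence
    streams = [open(name) for name in run_files]
    merged = list(heapq.merge(*(map(int, s) for s in streams)))
    for s in streams:
        s.close()
    for name in run_files:
        os.remove(name)
    return merged

data = [random.randrange(100) for _ in range(17)]
print(external_sort(data) == sorted(data))
```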

  20. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple instruction multiple data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  1. Recommending the heterogeneous cluster type multi-processor system computing

    International Nuclear Information System (INIS)

    Iijima, Nobukazu

    2010-01-01

    A real-time reactor simulator had been developed by reusing equipment from the Musashi reactor, and improving its performance as a research tool became indispensable, namely increasing the sampling rate by introducing arithmetic units based on a multi-Digital Signal Processor (DSP) system (cluster). In order to realize heterogeneous cluster-type multiprocessor computing, a combination of two kinds of Control Processors (CPs), a Cluster Control Processor (CCP) and a System Control Processor (SCP), was proposed, with a Large System Control Processor (LSCP) for hierarchical clustering if needed. The faster computing performance of this system was well demonstrated by simulation results for simultaneous execution of plural jobs and for pipeline processing between clusters, which showed that the system leads to effective use of the existing system and enhancement of the cost performance. (T. Tanaka)

  2. On a Multiprocessor Computer Farm for Online Physics Data Processing

    CERN Document Server

    Sinanis, N J

    1999-01-01

    The topic of this thesis is the design-phase performance evaluation of a large multiprocessor (MP) computer farm intended for the on-line data processing of the Compact Muon Solenoid (CMS) experiment. CMS is a high energy physics experiment, planned to operate at CERN (Geneva, Switzerland) during the year 2005. The CMS computer farm consists of 1,000 MP computer systems and a 1,000 x 1,000 communications switch. The approach followed for the farm performance evaluation is through simulation studies and evaluation of small prototype systems that are building blocks of the farm. For the purposes of the simulation studies, we have developed a discrete-event, event-driven simulator that is capable of describing the high-level architecture of the farm and giving estimates of the farm's performance. The simulator is designed in a modular way to facilitate the development of various modules that model the behavior of the farm building blocks at the desired level of detail. With the aid of this simulator, we make a particular...

  3. Studies of electron collisions with polyatomic molecules using distributed-memory parallel computers

    International Nuclear Information System (INIS)

    Winstead, C.; Hipes, P.G.; Lima, M.A.P.; McKoy, V.

    1991-01-01

    Elastic electron scattering cross sections from 5-30 eV are reported for the molecules C2H4, C2H6, C3H8, Si2H6, and GeH4, obtained using an implementation of the Schwinger multichannel method for distributed-memory parallel computer architectures. These results, obtained within the static-exchange approximation, are in generally good agreement with the available experimental data. These calculations demonstrate the potential of highly parallel computation in the study of collisions between low-energy electrons and polyatomic gases. The computational methodology discussed is also directly applicable to the calculation of elastic cross sections at higher levels of approximation (target polarization) and of electronic excitation cross sections.

  4. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej

    2015-02-01

    This paper derives theoretical estimates of the computational cost for the isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one dimensional (1D) case, O(N p^2) for the two dimensional (2D) case, and O(N^(4/3) p^2) for the three dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PetIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somewhat limited.

  5. A simple multiprocessor management system for event-parallel computing

    International Nuclear Information System (INIS)

    Bracker, S.; Gounder, K.; Hendrix, K.; Summers, D.

    1996-01-01

    Offline software using Transmission Control Protocol/Internet Protocol (TCP/IP) sockets to distribute particle physics events to multiple UNIX/RISC workstations is described. A modular, building block approach was taken that allowed tailoring to solve specific tasks efficiently and simply as they arose. The modest, initial cost was having to learn about sockets for interprocess communication. This multiprocessor management software has been used to control the reconstruction of eight billion raw data events from Fermilab Experiment E791

  6. DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

    OpenAIRE

    Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

    2007-01-01

    We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...

  7. Some algorithms for the solution of the symmetric eigenvalue problem on a multiprocessor electronic computer

    International Nuclear Information System (INIS)

    Molchanov, I.N.; Khimich, A.N.

    1984-01-01

    This article shows how a reflection method can be used to find the eigenvalues of a matrix by transforming the matrix to tridiagonal form. The method of conjugate gradients is used to find the smallest eigenvalue and the corresponding eigenvector of symmetric positive-definite band matrices. Topics considered include the computational scheme of the reflection method, the organization of parallel calculations by the reflection method, the computational scheme of the conjugate gradient method, the organization of parallel calculations by the conjugate gradient method, and the effectiveness of parallel algorithms. It is concluded that the overall effectiveness of multiprocessor electronic computers can be increased either by letting processors that become available start on a new problem in multiprocessor mode, or by improving the coefficient of uniform partitioning of the original information.

  8. Matrix factorization on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Geist, G.A.; Heath, M.T.

    1985-08-01

    This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs

  9. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map the control process to a certain processor was applied. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
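
    The dynamic history mapping mentioned above can be sketched as a control rank that hands out batches of histories to whichever worker reports in next, so that expensive and cheap histories average out across processors. The toy scoring kernel, batch size, and message tags are illustrative assumptions, not MCNP4 internals.

```python
# Sketch of dynamic history assignment: a control rank hands out batches of
# histories to whichever worker asks next. The fake "transport" kernel and
# batch size are illustrative assumptions.
import random
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
N_HISTORIES, BATCH = 10_000, 100
TAG_WORK, TAG_DONE = 1, 2

if rank == 0:                                     # control process
    handed_out, tallies, active = 0, 0.0, size - 1
    while active > 0:
        st = MPI.Status()
        partial = comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=st)
        tallies += partial
        if handed_out < N_HISTORIES:              # assign the next batch dynamically
            comm.send(BATCH, dest=st.Get_source(), tag=TAG_WORK)
            handed_out += BATCH
        else:                                     # no work left: tell the worker to stop
            comm.send(0, dest=st.Get_source(), tag=TAG_DONE)
            active -= 1
    print("mean tally per history:", tallies / N_HISTORIES)
else:                                             # worker ranks
    rng = random.Random(rank)
    comm.send(0.0, dest=0)                        # empty first report asks for work
    while True:
        st = MPI.Status()
        n = comm.recv(source=0, status=st)
        if st.Get_tag() == TAG_DONE:
            break
        score = sum(rng.expovariate(1.0) for _ in range(n))   # fake history scores
        comm.send(score, dest=0)
```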

  10. Sparse distributed memory overview

    Science.gov (United States)

    Raugh, Mike

    1990-01-01

    The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered on studies of the memory itself and on the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
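
    A minimal sketch of the sparse distributed memory mechanism summarized above: a fixed set of random hard locations, a write that updates counters at every location within a Hamming radius of the address, and a read that sums those counters and thresholds the result. Dimensions, radius, and location count are illustrative assumptions.

```python
# Minimal Kanerva-style sparse distributed memory: random hard locations,
# counter updates at every location within a Hamming radius on write, and a
# thresholded counter sum on read. All sizes are illustrative assumptions.
import numpy as np

class SDM:
    def __init__(self, n_bits=256, n_locations=2000, radius=112, seed=0):
        rng = np.random.default_rng(seed)
        self.addresses = rng.integers(0, 2, size=(n_locations, n_bits))
        self.counters = np.zeros((n_locations, n_bits), dtype=np.int32)
        self.radius = radius

    def _active(self, address):
        # hard locations whose address is within the Hamming radius of the cue
        return np.count_nonzero(self.addresses != address, axis=1) <= self.radius

    def write(self, address, data):
        sel = self._active(address)
        self.counters[sel] += np.where(data == 1, 1, -1)

    def read(self, address):
        total = self.counters[self._active(address)].sum(axis=0)
        return (total > 0).astype(int)

rng = np.random.default_rng(1)
mem, pattern = SDM(), rng.integers(0, 2, 256)
mem.write(pattern, pattern)                      # autoassociative store
noisy = pattern.copy()
noisy[:20] ^= 1                                  # corrupt 20 bits of the cue
recalled = mem.read(noisy)
print("bits recovered:", int((recalled == pattern).sum()), "/ 256")
```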

  11. Construction and Application of an AMR Algorithm for Distributed Memory Computers

    OpenAIRE

    Deiterding, Ralf

    2003-01-01

    While the parallelization of blockstructured adaptive mesh refinement techniques is relatively straight-forward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed memory machines are a topic of on-going research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the communication costs and simplifies the im...

  12. Highway traffic simulation on multi-processor computers

    Energy Technology Data Exchange (ETDEWEB)

    Hanebutte, U.R.; Doss, E.; Tentner, A.M.

    1997-04-01

    A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straightforward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model by allowing for real-time interactive simulation control and interaction with each individual vehicle and roadside infrastructure element on each link. Average speed and traffic volume data is collected at user-specified loop detector locations. Further, as a measure of safety the so-called Time To Collision (TTC) parameter is recorded.

  13. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and calculating speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and its efficiency measured on the highly parallel distributed memory computer AP1000. The code has been analyzed statically and dynamically, and a suitable algorithm for parallelization has been determined for the shielding analysis functions of MCNP 4. This includes a strategy where a new history is assigned to the idling processor element dynamically during the execution. Furthermore, to avoid congestion in the communication processing, the batch concept, processing multiple histories as a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories on the AP1000 with 512 processor elements, a parallelization efficiency of 82% is achieved, and the calculational speed has been estimated to be around 50 times as fast as that of the FACOM M-780. (author)

  14. Time complexity analysis for distributed memory computers: implementation of parallel conjugate gradient method

    NARCIS (Netherlands)

    Hoekstra, A.G.; Sloot, P.M.A.; Haan, M.J.; Hertzberger, L.O.; van Leeuwen, J.

    1991-01-01

    New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computationally intensive problems. However, full exploitation of, e.g., new massively parallel computers, parallel languages or runtime environments

  15. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
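
    The transpose-based scheme in the claim can be illustrated with a small distributed 2-D FFT: rows are local for the first set of 1-D transforms, an all-to-all re-distributes the data so that columns become local, and a second set of 1-D transforms completes the result. The array contents and sizes are illustrative assumptions, and the patent's randomized ordering of the all-to-all is not modelled.

```python
# Transpose-based distributed 2-D FFT: local row FFTs, an MPI all-to-all that
# makes the other dimension local, then the second set of 1-D FFTs. Requires
# N divisible by the number of ranks; data values are illustrative only.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
N = 8 * size                                   # global N x N array
rows = N // size                               # rows owned by this rank
r0 = rank * rows

# rows of a deterministic global test array owned by this rank
local = (np.arange(r0, r0 + rows)[:, None]
         + 1j * np.arange(N)[None, :]).astype(np.complex128)

# 1) 1-D FFT along the dimension that is complete on this rank
local = np.fft.fft(local, axis=1)

# 2) all-to-all re-distribution: column block j of the local rows goes to rank j
send = np.ascontiguousarray(local.reshape(rows, size, rows).transpose(1, 0, 2))
recv = np.empty_like(send)
comm.Alltoall(send, recv)
transposed = recv.transpose(2, 0, 1).reshape(rows, N)   # columns are now local

# 3) second 1-D FFT along the re-distributed dimension
out = np.fft.fft(transposed, axis=1)

# check against a serial 2-D FFT of the same global array
glob = (np.arange(N)[:, None] + 1j * np.arange(N)[None, :]).astype(np.complex128)
expect = np.fft.fft2(glob)[:, r0:r0 + rows].T
print(f"rank {rank}: matches serial fft2 -> {np.allclose(out, expect)}")
```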

  16. Efficient implementation of a multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.

  17. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1992-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on an NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented.
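
    A bare-bones sketch of the genetic-algorithm ingredients listed above (genetic representation, fitness function, and operators), using a toy fitness function in place of the two-phase flow residual; population size, operators, and parameters are illustrative assumptions.

```python
# Generic genetic-algorithm loop: population of candidate vectors, fitness
# evaluation, selection, crossover and mutation. The fitness function is a
# toy stand-in, not the two-phase flow residual.
import random

random.seed(0)
TARGET_SUM, N_GENES, POP, GENERATIONS = 10.0, 6, 40, 60

def fitness(ind):
    # toy objective: genes should sum to TARGET_SUM (smaller is better)
    return abs(sum(ind) - TARGET_SUM)

def crossover(a, b):
    cut = random.randrange(1, N_GENES)
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.1):
    return [g + random.gauss(0, 0.5) if random.random() < rate else g for g in ind]

population = [[random.uniform(0, 5) for _ in range(N_GENES)] for _ in range(POP)]
for gen in range(GENERATIONS):
    population.sort(key=fitness)
    parents = population[:POP // 2]              # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - len(parents))]
    population = parents + children

best = min(population, key=fitness)
print("best fitness:", fitness(best))
```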

  18. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1993-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on an NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented. (orig.)

  19. Three-dimensional magnetic field computation on a distributed memory parallel processor

    International Nuclear Information System (INIS)

    Barion, M.L.

    1990-01-01

    The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2

  20. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    Science.gov (United States)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant) and SIMPLEX (non-redundant) jobs.

  1. Sparse distributed memory

    Science.gov (United States)

    Denning, Peter J.

    1989-01-01

    Sparse distributed memory was proposed by Pentti Kanerva as a realizable architecture that could store large patterns and retrieve them based on partial matches with patterns representing current sensory inputs. This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines - e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, continuation of a sequence of events when given a cue from the middle, knowing that one doesn't know, or getting stuck with an answer on the tip of one's tongue. These behaviors are now within reach of machines that can be incorporated into the computing systems of robots capable of seeing, talking, and manipulating. Kanerva's theory is a break with the Western rationalistic tradition, allowing a new interpretation of learning and cognition that respects biology and the mysteries of individual human beings.

  2. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    Science.gov (United States)

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  3. Multiprocessor programming environment

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.B.; Fornaro, R.

    1988-12-01

    Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.

  4. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Byung Soo; Kim, Chang Hoi; Hwang, Suk Yong; Sohn, Surg Won; Yoon, Tae Seob; Lee, Yong Bum; Kim, Woong Ki

    1988-02-01

    A multiprocessor system that is essential to A.I. (Artificial Intelligence) robot control was developed. A.I. robot control needs very complex real-time control. A multiprocessor system interconnecting many SBCs (Single Board Computers) is much faster and more accurate than using only one SBC. Various multiprocessor systems and their applications were compared and discussed. The multiprocessor architecture system is specially designed to be used in nuclear environments. The main functions are job distribution, multitasking, and intelligent remote control by SDLC protocol using optical fiber. The system can be applied to position control for locomotion and manipulation, data fusion systems, and image processing. (Author)

  5. Embedded multiprocessors scheduling and synchronization

    CERN Document Server

    Sriram, Sundararajan

    2009-01-01

    Techniques for Optimizing Multiprocessor Implementations of Signal Processing Applications: An indispensable component of the information age, signal processing is embedded in a variety of consumer devices, including cell phones and digital television, as well as in communication infrastructure, such as media servers and cellular base stations. Multiple programmable processors, along with custom hardware running in parallel, are needed to achieve the computation throughput required of such applications. Reviews important research in key areas related to the multiprocessor implementation of multi

  6. Multiprocessors for high energy physics

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I review the role, status and progress of multiprocessor projects relevant to high energy physics. A short overview of the large variety of multiprocessors architectures is given, with special emphasis on machines suitable for experimental data reconstruction. A lot of progress has been made in the attempt to make the use of multiprocessors less painful by creating a ''Parallel Programming Environment'' supporting the non-expert user. A high degree of usability has been reached for coarse grain (event level) parallelism. The program development tools available on various systems (subroutine packages, preprocessors and parallelizing compilers) are discussed in some detail. Tools for execution control and debugging are also developing, thus opening the path from dedicated systems for large scale, stable production towards support of a more general job mix. At medium term, multiprocessors will thus cover a growing fraction of the typical high energy physics computing task. (orig.)

  7. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
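
    The move-and-accept kernel that the parallel algorithms distribute across processors is sketched serially below: cells are swapped on a small grid and the change in net wirelength is accepted with the Metropolis rule. The netlist, grid, and cooling schedule are illustrative assumptions; the hypercube mapping and error-control heuristics are not modelled.

```python
# Serial simulated-annealing placement kernel: propose a cell exchange,
# evaluate the change in total net wirelength, accept with the Metropolis
# rule, and cool the temperature. Netlist and schedule are illustrative.
import math
import random

random.seed(0)
GRID, N_CELLS = 6, 12
nets = [(random.randrange(N_CELLS), random.randrange(N_CELLS)) for _ in range(20)]
pos = {c: (c % GRID, c // GRID) for c in range(N_CELLS)}   # initial placement

def wirelength():
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)

T = 5.0
cost = wirelength()
while T > 0.01:
    for _ in range(100):
        a, b = random.sample(range(N_CELLS), 2)
        pos[a], pos[b] = pos[b], pos[a]          # propose a cell exchange move
        new_cost = wirelength()
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / T):
            cost = new_cost                      # accept
        else:
            pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
    T *= 0.9                                     # cooling schedule
print("final wirelength:", cost)
```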

  8. Distributed-Memory Fast Maximal Independent Set

    Energy Technology Data Exchange (ETDEWEB)

    Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

    2017-09-13

    The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
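
    A single-process sketch of the round structure in Luby's randomized MIS algorithms referenced above: every remaining vertex draws a random priority, vertices that beat all remaining neighbours join the independent set, and winners plus their neighbours are removed. The example graph is an illustrative assumption; the distributed-memory data layout is not shown.

```python
# Single-process sketch of Luby-style randomized MIS rounds: draw random
# priorities, admit local minima to the set, remove them and their neighbours,
# and repeat until no vertices remain.
import random

def luby_mis(adj, seed=0):
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        prio = {v: rng.random() for v in remaining}
        winners = {v for v in remaining
                   if all(prio[v] < prio[u] for u in adj[v] if u in remaining)}
        mis |= winners
        removed = winners | {u for v in winners for u in adj[v]}
        remaining -= removed
    return mis

# small example graph (adjacency lists)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3], 5: []}
mis = luby_mis(adj)
print("independent set:", sorted(mis))
assert all(u not in mis for v in mis for u in adj[v])   # independence check
```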

  9. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code EGS4 on the distributed-memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time for one incident particle differs greatly from particle to particle because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two kinds of method, which allocate particles either dynamically or statically, are used for the purpose of realizing high-speed parallel processing of this code. Among four problems chosen for performance evaluation, the speedup factors for three problems reach nearly 100 with 128 processors. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load over the processors. It has also been found that when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to get accurate results it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed and tools for program parallelization are discussed in the light of the experience gained from the EGS4 parallelization. (author).

  10. A view of Kanerva's sparse distributed memory

    Science.gov (United States)

    Denning, P. J.

    1986-01-01

    Pentti Kanerva is working on a new class of computers, which are called pattern computers. Pattern computers may close the gap between capabilities of biological organisms to recognize and act on patterns (visual, auditory, tactile, or olfactory) and capabilities of modern computers. Combinations of numeric, symbolic, and pattern computers may one day be capable of sustaining robots. The overview of the requirements for a pattern computer, a summary of Kanerva's Sparse Distributed Memory (SDM), and examples of tasks this computer can be expected to perform well are given.

  11. Parallel diffusion calculation for the PHAETON on-line multiprocessor computer

    International Nuclear Information System (INIS)

    Collart, J.M.; Fedon-Magnaud, C.; Lautard, J.J.

    1987-04-01

    The aim of the PHAETON project is the design of an on-line computer in order to increase the immediate knowledge of the main operating and safety parameters in power plants. A significant stage is the computation of the three-dimensional flux distribution. For cost and safety reasons a computer based on a parallel microprocessor architecture has been studied. This paper presents a first approach to parallelized three-dimensional diffusion calculation. Computing software has been written and implemented on a four-processor demonstrator. We present the realization in progress concerning the final equipment. 8 refs

  12. Lower bounds for the head-body-tail problem on parallel machines: a computational study for the multiprocessor flow shop

    NARCIS (Netherlands)

    A. Vandevelde; J.A. Hoogeveen; C.A.J. Hurkens (Cor); J.K. Lenstra (Jan Karel)

    2005-01-01

    The multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the

  13. Lower bounds for the head-body-tail problem on parallel machines : a computational study of the multiprocessor flow shop

    NARCIS (Netherlands)

    Vandevelde, A.; Hoogeveen, J.A.; Hurkens, C.A.J.; Lenstra, J.K.

    2005-01-01

    The multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the capacities of all

  14. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Seung Ho; Hwang, Suk Yeoung; Sohn, Surg Won; Kim, Byung Soo; Kim, Chang Hoi; Lee, Yong Bum; Kim, Woong Ki

    1988-12-01

    The object of this project is to develop a multiprocessor system which is essential to robot technology. A multiprocessor system interconnecting many single-board computers is much faster and more flexible than a single processor. The developed multiprocessor will be used to control a nuclear mobile robot, so a loosely coupled system is adopted as the robot controller. The overall controller configuration is divided into three main parts according to function: a supervisory control part, a functional control part, and a remote control part. The control system is designed to be easily expandable for further use thanks to its modular architecture, so functional independence of the sub-systems is obtained throughout the system structure. Electromagnetic interference affecting the control system is minimized by using optical fiber as the communication medium between the robot and the control system. System performance is enhanced not only by using a distributed architecture in hardware, but also by adopting a real-time, multi-tasking operating system in software. The iRMX86 OS is used and reconfigured for real-time, multi-tasking operation. The RS-485 serial communication protocol is used between the functional control part and the remote control part. Since the developed multiprocessor control system is an essential and fundamental technology for artificially intelligent robots, the result of this project can be applied directly to nuclear mobile robots. (Author)

  15. Green computing: power optimisation of VFI-based real-time multiprocessor dataflow applications (extended version)

    NARCIS (Netherlands)

    Ahmad, W.; Holzenspies, P.K.F.; Stoelinga, Mariëlle Ida Antoinette; van de Pol, Jan Cornelis

    2015-01-01

    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor

  16. Green computing: power optimisation of vfi-based real-time multiprocessor dataflow applications

    NARCIS (Netherlands)

    Ahmad, W.; Holzenspies, P.K.F.; Stoelinga, Mariëlle Ida Antoinette; van de Pol, Jan Cornelis

    2015-01-01

    Execution time is no longer the only performance metric for computer systems. In fact, a trend is emerging to trade raw performance for energy savings. Techniques like Dynamic Power Management (DPM, switching to low power state) and Dynamic Voltage and Frequency Scaling (DVFS, throttling processor

  17. Green computing: efficient energy management of multiprocessor streaming applications via model checking

    NARCIS (Netherlands)

    Ahmad, W.

    2017-01-01

    Streaming applications such as virtual reality, video conferencing, and face detection, impose high demands on a system’s performance and battery life. With the advancement in mobile computing, these applications are increasingly implemented on battery-constrained platforms, such as gaming consoles,

  18. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
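
    A central detail mentioned above is the exchange of data in fictitious (ghost) cells across subdomain boundaries. The fragment below is a hedged sketch of such an exchange for a one-dimensional decomposition using MPI; the field layout and neighbour bookkeeping are illustrative and not taken from the paper's implementation.

    /* Sketch of a fictitious-cell (halo) exchange for a 1-D domain decomposition.
     * u[0] and u[n_local+1] are the fictitious cells; u[1..n_local] are owned. */
    #include <mpi.h>

    void exchange_halos(double *u, int n_local, MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send my first owned cell left, receive my right fictitious cell */
        MPI_Sendrecv(&u[1],         1, MPI_DOUBLE, left,  0,
                     &u[n_local+1], 1, MPI_DOUBLE, right, 0, comm, MPI_STATUS_IGNORE);
        /* send my last owned cell right, receive my left fictitious cell */
        MPI_Sendrecv(&u[n_local],   1, MPI_DOUBLE, right, 1,
                     &u[0],         1, MPI_DOUBLE, left,  1, comm, MPI_STATUS_IGNORE);
    }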

  19. Icarus: A 2-D Direct Simulation Monte Carlo (DSMC) Code for Multi-Processor Computers

    International Nuclear Information System (INIS)

    BARTEL, TIMOTHY J.; PLIMPTON, STEVEN J.; GALLIS, MICHAIL A.

    2001-01-01

    Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird[11.1] and models flowfields from free-molecular to continuum in either Cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modeled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modeled using steric factors derived from Arrhenius reaction rates or in a manner similar to continuum modeling. Surface chemistry is modeled with surface reaction probabilities; an optional site density, energy dependent, coverage model is included. Electrons are modeled by either a local charge neutrality assumption or as discrete simulational particles. Ion chemistry is modeled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input, computed from a Langmuir-Tonks model, or obtained from a Green's function (boundary element) based Poisson solver. Icarus has been used for subsonic to hypersonic, chemically reacting, and plasma flows. The Icarus software package includes the grid generation, parallel processor decomposition, post-processing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. All of the software packages are written in standard Fortran

  20. Modeling and Analyzing Real-Time Multiprocessor Systems

    NARCIS (Netherlands)

    Wiggers, M.H.; Thiele, Lothar; Lee, Edward A.; Schlieker, Simon; Bekooij, Marco Jan Gerrit

    2010-01-01

    Researchers have proposed approaches to verify that real-time multiprocessor systems meet their timeliness constraints. These approaches make assumptions on the model of computation, the load placed on the multiprocessor system, and the faults that can arise. This heterogeneous set of assumptions

  1. Multiprocessor data acquisition system

    International Nuclear Information System (INIS)

    Haumann, J.R.; Crawford, R.K.

    1987-01-01

    A multiprocessor data acquisition system has been built to replace the single processor systems at the Intense Pulsed Neutron Source (IPNS) at Argonne National Laboratory. The multiprocessor system was needed to accommodate the higher data rates at IPNS brought about by improvements in the source and changes in instrument configurations. This paper describes the hardware configuration of the system and the method of task sharing and compares results to the single processor system

  2. Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

    International Nuclear Information System (INIS)

    Langenbuch, S.; Schmidt, K.D.; Velkov, K.

    2003-01-01

    The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes, including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. Such calculations are typically performed on powerful workstations using a single CPU. Nowadays, however, multi-CPU systems are much more readily available: powerful workstations already provide 4 to 8 CPUs, and computer centers give access to multi-processor systems with CPU counts on the order of 16 up to several hundred. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different cases of application lead to changing requirements of the code efficiency, because the amount of computer time spent in different parts of the code varies. This paper presents the main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark, together with evaluations of the code performance on multi-processor computers. (authors)

  3. File-System Workload on a Scientific Multiprocessor

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

    Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  4. The art of multiprocessor programming

    CERN Document Server

    Herlihy, Maurice

    2012-01-01

    Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues. This revised edition incorporates much-demanded updates t

  5. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    Science.gov (United States)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  6. Solution of equations of electric power networks in multiprocessor computers; Solucao de equacoes de redes de energia eletrica em computadores multiprocessadores

    Energy Technology Data Exchange (ETDEWEB)

    Feltrin, Antonio Padilha

    1991-05-01

    This thesis describes a methodology for decomposing the repeat solution process of the equation Ax = b into independent tasks to be executed in parallel, based on the partitioned matrix inverse factors (W-matrix). The partitioning scheme proposed in this thesis consists of breaking up the W-matrix according to the depths of the factorization path tree. In this scheme, all the information needed to generate the partitions can be obtained directly from the network factorization path tree. The partitioning algorithm is simple and easy to implement. The elements of the W matrices, except for the last partition, can be obtained directly from the L-matrix elements, requiring no extra work. The proposed scheme guarantees that additional fill-ins are created only in the last partition. The forward and backward solutions are performed by rows, and the proposed strategy is to schedule on each processor the operations corresponding to one row of each partition. It should be kept in mind that multiprocessor environments are equipped with powerful processing units, so it is a sound strategy to perform the elementary multiply-add operations inside the hardware in order to exploit its computing efficiency. This strategy seeks to match the parallel algorithm to the parallel architecture. The precedence relations, which give rise to delays, are replaced by multiply-add operations performed inside each processor without external communication. Within a partition, the forward, diagonal, and backward solutions may be gathered, so that all the operations can be expressed as the product of the matrix W_lp by the updated components of vector b. The performance results show that the potential speedup of the solution time is essentially bounded by the floating-point operation capability of each processor, indicating that the methodology is a suitable way to exploit the growing power of computing technology. 46 figs, 40 refs, 49 tabs.
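
    The following fragment is an illustrative sketch (not the thesis code) of the central idea: the repeat solution x = W_p ... W_1 b is a sequence of sparse matrix-vector products with the partitioned inverse factors, and within each partition the rows are independent and can be assigned to different processors. Each partition is assumed here to be stored, diagonal included, in CSR form, and the row-parallel loop is expressed with OpenMP only for brevity.

    /* Illustrative sketch: apply the partitioned inverse factors W_1..W_p to b. */
    #include <omp.h>
    #include <string.h>

    typedef struct {          /* one partition W_k in CSR form, diagonal included */
        int     n;            /* order of the system   */
        int    *rowptr;       /* n+1 entries           */
        int    *col;
        double *val;
    } csr_t;

    void apply_partitions(const csr_t *W, int nparts, const double *b,
                          double *x, double *tmp)
    {
        int n = W[0].n;
        memcpy(x, b, n * sizeof(double));

        for (int k = 0; k < nparts; k++) {       /* partitions applied in sequence      */
            #pragma omp parallel for             /* rows within a partition in parallel */
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int j = W[k].rowptr[i]; j < W[k].rowptr[i + 1]; j++)
                    s += W[k].val[j] * x[W[k].col[j]];
                tmp[i] = s;
            }
            memcpy(x, tmp, n * sizeof(double));
        }
    }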

  7. Academic training: Advanced lectures on multiprocessor programming

    CERN Multimedia

    PH Department

    2011-01-01

    Academic Training Lecture - Regular Programme 31 October and 1-2 November 2011 from 11:00 to 12:00 - IT Auditorium, Bldg. 31   Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models an...

  8. The Distributed Memory Computing Conference (5th) Held in Charleston, South Carolina on April 8-12, 1990. Volume 1. Applications

    Science.gov (United States)

    1991-03-31

    [Extraction residue from the proceedings table of contents and a truncated abstract. Recoverable entries: a contribution by R. Battiti (p. 194) and, in Session 8 (Ray Tracing), the paper "Hypercube Algorithm for Radiosity in a Ray Tracing Environment" (p. 206) by S.A. Hermitage, T.L. Huntsberger, and B.A. Huntsberger, Department of Computer Science; its abstract contrasts ray tracing, which in most cases ignores diffuse reflection, with radiosity, which attempts to account for that phenomenon.]

  9. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes what new tools were developed and how the testing of software is performed in the SCDS for the MFTF project.

  10. TUMULT, the Twente University multiprocessor

    NARCIS (Netherlands)

    Scholten, Johan; Jansen, P.G.

    1988-01-01

    TUMULT (Twente University Multiprocessor) is described. Its aim is the design and implementation of a modular, extendable multiprocessor system. Up to 15 processing elements are connected through an interprocessor communication network, using message-passing for the exchange of data. The hardware is

  11. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  12. Languages, compilers and run-time environments for distributed memory machines

    CERN Document Server

    Saltz, J

    1992-01-01

    Papers presented within this volume cover a wide range of topics related to programming distributed memory machines. Distributed memory architectures, although having the potential to supply the very high levels of performance required to support future computing needs, present awkward programming problems. The major issue is to design methods which enable compilers to generate efficient distributed memory programs from relatively machine independent program specifications. This book is a compilation of papers describing a wide range of research efforts aimed at easing the task of programming.

  13. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    International Nuclear Information System (INIS)

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-01-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
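
    A minimal sketch of the first of the two algorithms described above, sorting particle labels by cell label and then pairing contact candidates from the sorted list, is given below in C. Cell geometry is reduced to same-cell pairs only and all names are illustrative; the published algorithm also visits neighbouring cells and is tuned for the architectures listed.

    /* Illustrative sketch: sort particle labels by cell, then pair candidates. */
    #include <stdlib.h>

    /* counting sort of particle indices by cell label */
    void sort_by_cell(int n_particles, int n_cells, const int *cell_of,
                      int *sorted, int *cell_start /* n_cells + 1 entries */)
    {
        for (int c = 0; c <= n_cells; c++) cell_start[c] = 0;
        for (int p = 0; p < n_particles; p++) cell_start[cell_of[p] + 1]++;
        for (int c = 0; c < n_cells; c++) cell_start[c + 1] += cell_start[c];

        int *cursor = malloc(n_cells * sizeof(int));
        for (int c = 0; c < n_cells; c++) cursor[c] = cell_start[c];
        for (int p = 0; p < n_particles; p++) sorted[cursor[cell_of[p]]++] = p;
        free(cursor);
    }

    /* pair candidates that share a cell; neighbouring cells handled analogously */
    int build_pairs(int n_cells, const int *sorted, const int *cell_start,
                    int (*pairs)[2], int max_pairs)
    {
        int np = 0;
        for (int c = 0; c < n_cells; c++)
            for (int i = cell_start[c]; i < cell_start[c + 1]; i++)
                for (int j = i + 1; j < cell_start[c + 1] && np < max_pairs; j++) {
                    pairs[np][0] = sorted[i];
                    pairs[np][1] = sorted[j];
                    np++;
                }
        return np;
    }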

  14. A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    Science.gov (United States)

    Dubey, A.; Zubair, M.; Grosch, C. E.

    1992-01-01

    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
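
    For illustration of what "different data distributions" means here, the short C fragment below gives owner/local-index mappings for two common layouts of N points over P processors, block and cyclic. These particular formulas are a textbook sketch and are not claimed to be the mappings used in the paper's subroutine.

    /* Illustrative owner/local-index mappings for block and cyclic distributions. */
    #include <stdio.h>

    /* block distribution: contiguous chunks of ceil(N/P) points per processor */
    int block_owner(int i, int N, int P) { int b = (N + P - 1) / P; return i / b; }
    int block_local(int i, int N, int P) { int b = (N + P - 1) / P; return i % b; }

    /* cyclic distribution: point i goes to processor i mod P */
    int cyclic_owner(int i, int P) { return i % P; }
    int cyclic_local(int i, int P) { return i / P; }

    int main(void)
    {
        int N = 16, P = 4;
        for (int i = 0; i < N; i++)
            printf("i=%2d  block -> (p=%d, l=%d)   cyclic -> (p=%d, l=%d)\n",
                   i, block_owner(i, N, P), block_local(i, N, P),
                   cyclic_owner(i, P), cyclic_local(i, P));
        return 0;
    }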

  15. Distributed parallel messaging for multiprocessor systems

    Science.gov (United States)

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  16. Real-Time Multiprocessor Programming Language (RTMPL) user's manual

    Science.gov (United States)

    Arpasi, D. J.

    1985-01-01

    A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.

  17. Software for event oriented processing on multiprocessor systems

    International Nuclear Information System (INIS)

    Fischler, M.; Areti, H.; Biel, J.; Bracker, S.; Case, G.; Gaines, I.; Husby, D.; Nash, T.

    1984-08-01

    Computing intensive problems that require the processing of numerous essentially independent events are natural customers for large scale multi-microprocessor systems. This paper describes the software required to support users with such problems in a multiprocessor environment. It is based on experience with and development work aimed at processing very large amounts of high energy physics data

  18. Multiprocessor systems and their concurrency

    Energy Technology Data Exchange (ETDEWEB)

    Starke, P H

    1984-01-01

    A multiprocessor system can be considered as a collection of finite automata which communicate over channels or shared memory units. The behaviour of such a system can be described by a semilanguage. This approach allows one to define a numerical measure for the concurrency of multiprocessor systems and of distributed systems. This measure is characterized algebraically, and the reconfiguration problem, which asks for an algorithm to construct an l-processor system equivalent to a given n-processor system, is solved in the paper. 6 references.

  19. Runtime adaptive multi-processor system-on-chip: RAMPSoC

    OpenAIRE

    Göhringer, D.; Hübner, M.; Schatz, V.; Becker, J.

    2008-01-01

    Current trends in high performance computing show that the use of multiprocessor systems on chip is one approach to meeting the requirements of computing-intensive applications. Multiprocessor system on chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessors on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elemen...

  20. Plasma physics modeling and the Cray-2 multiprocessor

    International Nuclear Information System (INIS)

    Killeen, J.

    1985-01-01

    The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab

  1. 2: Local area networks as a multiprocessor treatment planning system

    International Nuclear Information System (INIS)

    Neblett, D.L.; Hogan, S.E.

    1987-01-01

    The creation of a local area network (LAN) of interconnected computers provides an environment of multiple computer processors that adds a new dimension to treatment planning. A LAN system provides the opportunity to have two or more computers working on the plan in parallel. With high-speed interprocessor transfer, events such as the time-consuming task of correcting several individual beams for contours and inhomogeneities can be performed simultaneously, effectively creating a parallel multiprocessor treatment planning system.

  2. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
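
    The level-synchronous strategy mentioned above has a simple structure; the sketch below shows it serially on a CSR graph in C. In the 1-D distributed version each processor would own a contiguous vertex range and the newly discovered frontier would be exchanged among processors at the end of every level (for example with MPI_Allgatherv); that exchange point is marked in a comment. This is an illustration of the general approach, not the authors' tuned implementation.

    /* Illustrative level-synchronous BFS on a CSR graph. */
    #include <stdlib.h>

    void bfs_level_sync(int n, const int *rowptr, const int *col, int source, int *level)
    {
        int *frontier = malloc(n * sizeof(int));
        int *next     = malloc(n * sizeof(int));
        int  nf = 0, nn, depth = 0;

        for (int v = 0; v < n; v++) level[v] = -1;
        level[source] = 0;
        frontier[nf++] = source;

        while (nf > 0) {                       /* one iteration per BFS level   */
            nn = 0;
            for (int i = 0; i < nf; i++) {     /* expand every frontier vertex  */
                int u = frontier[i];
                for (int e = rowptr[u]; e < rowptr[u + 1]; e++) {
                    int v = col[e];
                    if (level[v] < 0) {
                        level[v] = depth + 1;
                        next[nn++] = v;
                    }
                }
            }
            /* distributed version: exchange 'next' among processors here */
            int *swap = frontier; frontier = next; next = swap;
            nf = nn;
            depth++;
        }
        free(frontier);
        free(next);
    }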

  3. Safety-critical Java with cyclic executives on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2012-01-01

    Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution ...... for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor....

  4. A Multiprocessor Operating System Simulator

    Science.gov (United States)

    Johnston, Gary M.; Campbell, Roy H.

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.

  5. Reproducibility in a multiprocessor system

    Science.gov (United States)

    Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-11-26

    Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system, and a scan of the system state.

  6. Operating system for a real-time multiprocessor propulsion system simulator. User's manual

    Science.gov (United States)

    Cole, G. L.

    1985-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.

  7. Multiprocessor shared-memory information exchange

    International Nuclear Information System (INIS)

    Santoline, L.L.; Bowers, M.D.; Crew, A.W.; Roslund, C.J.; Ghrist, W.D. III

    1989-01-01

    In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
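
    Without claiming to reproduce the MSMIE protocol itself, the sketch below shows one common way to realize the behaviour described: unidirectional buffers handed from a slave (producer) to a master (consumer) through shared memory with a bounded hand-off cost. Three buffers rotate between filling, newest, and reading roles, and only short critical sections touch the role indices; the buffer size and the pthread mutex are illustrative choices.

    /* Illustrative triple-buffer hand-off from one producer to one consumer. */
    #include <pthread.h>
    #include <string.h>

    #define BUFSZ 256

    static double          buf[3][BUFSZ];
    static int             newest = -1;          /* index of most recent complete buffer */
    static int             filling = 0, reading = 1;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    /* slave side: publish a freshly filled buffer */
    void slave_publish(const double *data)
    {
        memcpy(buf[filling], data, sizeof buf[0]);
        pthread_mutex_lock(&lock);
        int old_newest = newest;
        newest  = filling;                       /* hand the full buffer to the master */
        filling = (old_newest >= 0) ? old_newest : 3 - filling - reading;
        pthread_mutex_unlock(&lock);
    }

    /* master side: grab the newest buffer, if any; returns 1 on success */
    int master_fetch(double *out)
    {
        pthread_mutex_lock(&lock);
        if (newest < 0) { pthread_mutex_unlock(&lock); return 0; }
        reading = newest;                        /* take ownership of the newest buffer */
        newest  = -1;
        pthread_mutex_unlock(&lock);
        memcpy(out, buf[reading], sizeof buf[0]);
        return 1;
    }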

  8. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT& T Bell Labs., Murray Hill, NJ (United States); Ruehl, R. [CSCS-ETH, Manno (Switzerland)

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved: smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorizing or parallelizing the dominating operation, structured or unstructured sparse matrix-vector multiplication, or to increasing locality and parallelism by reformulating the algorithm, reducing global synchronization in inner products or local data exchange in preconditioners. Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or a few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
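
    The dominating kernel referred to above, sparse matrix-vector multiplication, is shown below in CSR storage as a plain shared-memory sketch; in a distributed-memory (DMPP) solver the same loop would be preceded by an exchange of the off-processor entries of x. This is a generic illustration, not code from the libraries discussed.

    /* Illustrative CSR sparse matrix-vector multiplication, y = A*x. */
    #include <omp.h>

    void spmv_csr(int n, const int *rowptr, const int *col,
                  const double *val, const double *x, double *y)
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int j = rowptr[i]; j < rowptr[i + 1]; j++)
                s += val[j] * x[col[j]];
            y[i] = s;
        }
    }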

  9. Advanced lectures on multiprocessor programming (1/3)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models and tools that support transactional synchronization. In prior work, which he also did at the IBM T.J. Watson Research Center in Yorktown Height...

  10. Single-chip serial channel enhances multi-processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Millar, J.

    1982-01-01

    In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. And finally, a typical multiprocessor system is shown, implementing the TMS 7041.

  11. Scientific applications and numerical algorithms on the midas multiprocessor system

    International Nuclear Information System (INIS)

    Logan, D.; Maples, C.

    1986-01-01

    The MIDAS multiprocessor system is a multi-level, hierarchical structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms into a multiprocessor environment. The results of such tests on a variety of applications such as scientific data analysis, Monte Carlo calculations, and image processing are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods in solving partial differential equations are also discussed. The performance and projected extensibilities of these decompositions on the MIDAS system are reported.

  12. On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

    Directory of Open Access Journals (Sweden)

    A. Averbuch

    1994-01-01

    Full Text Available Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared memory multiprocessors (Sequent Symmetry and MOS) and on a distributed memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit well the above architectures when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

  13. Multiprocessor scheduling for real-time systems

    CERN Document Server

    Baruah, Sanjoy; Buttazzo, Giorgio

    2015-01-01

    This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems.  The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models.  New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis.  Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.

  14. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, John Min; Kim, Seung Ho; Kim, Chang Hoi; Kim, Byung Soo; Hwang, Suk Yeong; Lee, Young Bum; Sohn, Suk Won; Kim, Woon Gi

    1990-01-01

    The aim of this study is to develop a real-time controller for autonomous robotic systems operated in hostile environments. The developed control system is designed around a multiprocessor to obtain independence and reliability, as well as easy extensibility. The control system is organized into three distinct subsystems (a supervisory control part, a functional control part, and a remote control part). To evaluate the functional performance of the developed controller, a prototype mobile robot fitted with a 4-DOF manipulator was designed and manufactured. Initial tests showed that the robot could turn with a radius of 38 cm, reach a maximum speed of 1.26 km/hr, and climb over obstacles 18 cm in height. (author)

  15. Multiprocessor performance modeling with ADAS

    Science.gov (United States)

    Hayes, Paul J.; Andrews, Asa M.

    1989-01-01

    A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.

  16. Prefetching in file systems for MIMD multiprocessors

    Science.gov (United States)

    Kotz, David F.; Ellis, Carla Schlatter

    1990-01-01

    The question of whether prefetching blocks of a file into the block cache can effectively reduce the overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.

  17. VME multiprocessor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Kimura, T.; Kurihara, K.; Takahashi, M.; Kawamata, Y.; Akasaka, H.; Matsukawa, M.

    1989-01-01

    In this paper, the design and preliminary tests of a VME multiprocessor system for the JT-60 Upgrade plasma control, utilizing three MC88100-based RISC computers and VME buses, are reported. The design of the VME system was motivated by the requirements for faster and more accurate computation of the plasma position and shape control.

  18. Massively Parallel Polar Decomposition on Distributed-Memory Systems

    KAUST Repository

    Ltaief, Hatem; Sukkari, Dalal E.; Esposito, Aniello; Nakatsukasa, Yuji; Keyes, David E.

    2018-01-01

    We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon on the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation

  19. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Science.gov (United States)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  20. High Performance Polar Decomposition on Distributed Memory Systems

    KAUST Repository

    Sukkari, Dalal E.

    2016-08-08

    The polar decomposition of a dense matrix is an important operation in linear algebra. It can be directly calculated through the singular value decomposition (SVD) or iteratively using the QR dynamically-weighted Halley algorithm (QDWH). The former is difficult to parallelize due to the preponderant number of memory-bound operations during the bidiagonal reduction. We investigate the latter scenario, which performs more floating-point operations but exposes at the same time more parallelism, and therefore, runs closer to the theoretical peak performance of the system, thanks to more compute-bound matrix operations. Profiling results show the performance scalability of QDWH for calculating the polar decomposition using around 9200 MPI processes on well and ill-conditioned matrices of 100K×100K problem size. We study then the performance impact of the QDWH-based polar decomposition as a pre-processing step toward calculating the SVD itself. The new distributed-memory implementation of the QDWH-SVD solver achieves up to five-fold speedup against current state-of-the-art vendor SVD implementations. © Springer International Publishing Switzerland 2016.
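
    For reference, a hedged sketch of the mathematics behind this abstract: the polar decomposition and the Halley-type iteration on which QDWH is built, written here in its unweighted form. The dynamically weighted variant replaces the fixed coefficients 3 and 1 with per-iteration parameters (a_k, b_k, c_k) and evaluates each step through QR factorizations rather than explicit inverses.

    % Polar decomposition of A and the (unweighted) Halley iteration for its
    % unitary factor; \alpha is an estimate of \|A\|_2 used only for scaling.
    \[
      A = U H, \qquad U^{*}U = I, \qquad H = H^{*} \succeq 0,
    \]
    \[
      X_{0} = A/\alpha, \qquad
      X_{k+1} = X_{k}\left(3I + X_{k}^{*}X_{k}\right)\left(I + 3X_{k}^{*}X_{k}\right)^{-1}
      \;\longrightarrow\; U ,
    \]
    \[
      H = U^{*}A \quad
      \left(\text{symmetrized in practice as } \tfrac{1}{2}\left(U^{*}A + (U^{*}A)^{*}\right)\right).
    \]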

  1. Control and Reliability of Optical Networks in Multiprocessors

    Science.gov (United States)

    Olsen, James Jonathan

    1993-01-01

    Optical communication links have great potential to improve the performance of interconnection networks within large parallel multiprocessors, but the problems of semiconductor laser drive control and reliability inhibit their wide use. These problems have been solved in the telecommunications context, but the telecommunications solutions, based on a small number of links, are often too bulky, complex, power-hungry, and expensive to be feasible for use in a multiprocessor network with thousands of optical links. The main problems with the telecommunications approaches are that they are, by definition, designed for long-distance communication and therefore deal with communications links in isolation, instead of in an overall systems context. By taking a system-level approach to solving the laser reliability problem in a multiprocessor, and by exploiting the short-distance nature of the links, one can achieve small, simple, low-power, and inexpensive solutions, practical for implementation in the thousands of optical links that might be used in a multiprocessor. Through modeling and experimentation, I demonstrate that such system-level solutions exist, and are feasible for use in a multiprocessor network. I divide semiconductor laser reliability problems into two classes: transient errors and hard failures, and develop solutions to each type of problem in the context of a large multiprocessor. I find that for transient errors, the computer system would require a very low bit-error-rate (BER), such as 10^{-23}, if no provision were made for error control. Optical links cannot achieve such rates directly, but I find that a much more reasonable link-level BER (such as 10^{-7}) would be acceptable with simple error detection coding. I then propose a feedback system that will enable lasers to achieve these error levels even when laser threshold current varies. Instead of telecommunications techniques, which require laser output power monitors, I describe a software

  2. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Daniel Pierre Bovet

    2008-01-01

    Full Text Available Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  3. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Betti Emiliano

    2008-01-01

    Full Text Available Abstract Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  4. Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

    Directory of Open Access Journals (Sweden)

    Silvio Picano

    1992-01-01

    Full Text Available We present detailed experimental work involving a commercially available large scale shared memory multiple instruction stream-multiple data stream (MIMD) parallel computer having a software controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessor's capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000) on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.

  5. PGHPF – An Optimizing High Performance Fortran Compiler for Distributed Memory Machines

    Directory of Open Access Journals (Sweden)

    Zeki Bozkus

    1997-01-01

    Full Text Available High Performance Fortran (HPF) is the first widely supported, efficient, and portable parallel programming language for shared and distributed memory systems. HPF is realized through a set of directive-based extensions to Fortran 90. It enables application developers and Fortran end-users to write compact, portable, and efficient software that will compile and execute on workstations, shared memory servers, clusters, traditional supercomputers, or massively parallel processors. This article describes a production-quality HPF compiler for a set of parallel machines. Compilation techniques such as data and computation distribution, communication generation, run-time support, and optimization issues are elaborated as the basis for an HPF compiler implementation on distributed memory machines. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing HPF code on parallel architectures.

  6. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip

  7. Distributed power management of real-time applications on a GALS multiprocessor SOC

    NARCIS (Netherlands)

    Nelson, Andrew; Goossens, Kees

    2015-01-01

    It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and

  8. Realtime multiprocessor for mobile ad hoc networks

    Directory of Open Access Journals (Sweden)

    T. Jungeblut

    2008-05-01

    Full Text Available This paper introduces a real-time Multiprocessor System-on-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32-bit RISC processors that are connected via a Network-on-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard real-time requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC, which has been fabricated in 180 nm UMC standard cell technology, is 772 mW.

  9. Using a commercial symmetric multiprocessor for lattice QCD

    International Nuclear Information System (INIS)

    Brower, R.C.; Chen, D.; Negele, J.W.

    1998-01-01

    In its evolution, the computer industry has reached the point when considerable computing power can be packaged on a single microprocessor chip. At the same time, costs of designing a computer system around such a CPU are growing. For these reasons we decided to explore a possibility of using commercially available symmetric multiprocessors (SMP) as building blocks for the LQCD computer. Careful analysis of the architecture allowed us to build a QCD primitive library running close to the peak performance on the UltraSPARC processor. As a result, multithreaded QCD code (both the heatbath and the Wilson fermion inverter) runs at about 50% efficiency on a single SMP. The communication between different CPUs is handled by a coherent memory system. Currently we are planning to connect several SMPs with a high bandwidth network into a single system. (orig.)

  10. Performances of multiprocessor multidisk architectures for continuous media storage

    Science.gov (United States)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.

  11. One-Step Programmable Arbiters for Multiprocessors

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1978-01-01

    When processors in a multiprocessor system demand service from a shared bus in an asynchronous mode, a synchronous state arbiter resolves conflicts and allocates resources. Independent of the combination of requests, only one state transition is required from a free to allocated resource...

  12. Shared performance monitor in a multiprocessor system

    Science.gov (United States)

    Chiu, George; Gara, Alan G.; Salapura, Valentina

    2012-07-24

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  13. The fast Amsterdam multiprocessor (FAMP) system hardware

    International Nuclear Information System (INIS)

    Hertzberger, L.O.; Kieft, G.; Kisielewski, B.; Wiggers, L.W.; Engster, C.; Koningsveld, L. van

    1981-01-01

    The architecture of a multiprocessor system is described that will be used for on-line filter and second stage trigger applications. The system is based on the MC 68000 microprocessor from Motorola. Emphasis is placed on hardware aspects, in particular the modularity, processor communication and interfacing, whereas the system software and the applications will be described in separate articles. (orig.)

  14. Dynamic overset grid communication on distributed memory parallel processors

    Science.gov (United States)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  15. Computer Generation of Fourier Transform Libraries for Distributed Memory Architectures

    Science.gov (United States)

    2010-12-01

    ... contractions used in quantum chemistry. It too performs algebraic transformations to minimize the operations count, and then optimizes code based on ... existing parallel DFT algorithms, including their strengths and weaknesses. Four-step FFT: the four-step algorithm [Hegland, 1994; Norton and Silberger, 1987] ...

  16. Elastic pointer directory organization for scalable shared memory multiprocessors

    Institute of Scientific and Technical Information of China (English)

    Yuhang Liu; Mingfa Zhu; Limin Xiao

    2014-01-01

    In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good directory design intends to achieve three key attributes: reasonable memory overhead, sharer position precision and implementation complexity. However, researchers often face the problem that gaining one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, taking advantage of the fact that the number of sharers for each directory entry is typically small. Analysis results show that for 4,096 nodes, the ratio of memory overhead to the full-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on 16 and 64-node cache-coherent non-uniform memory access (CC-NUMA) multiprocessors show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the full-map directory, except for the slight implementation complexity. Using the directory cache to exploit directory access locality is also studied. The experimental result shows that this is a promising approach to be used in the state-of-the-art high performance computing domain.
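
    To make the pointer/overflow trade-off above concrete, the following minimal Python sketch shows a limited-pointer directory entry that falls back to a full bit map once its pointer capacity is exceeded; the elastic scheme in the paper instead resizes its pointers, and the pointer capacity and node count used here are invented for illustration.

      class DirectoryEntry:
          MAX_POINTERS = 3                  # invented capacity for illustration

          def __init__(self, n_nodes):
              self.n_nodes = n_nodes
              self.pointers = set()         # exact sharer ids while they still fit
              self.full_map = None          # bit vector once the pointers overflow

          def add_sharer(self, node):
              if self.full_map is not None:
                  self.full_map[node] = True
              elif len(self.pointers | {node}) <= self.MAX_POINTERS:
                  self.pointers.add(node)
              else:                         # pointer overflow: switch representation
                  self.full_map = [False] * self.n_nodes
                  for n in self.pointers | {node}:
                      self.full_map[n] = True
                  self.pointers.clear()

          def sharers(self):
              if self.full_map is not None:
                  return [n for n, s in enumerate(self.full_map) if s]
              return sorted(self.pointers)

      entry = DirectoryEntry(n_nodes=8)
      for node in (0, 3, 5, 6):             # the fourth sharer triggers the overflow path
          entry.add_sharer(node)
      print(entry.sharers())                # -> [0, 3, 5, 6]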

  17. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and the Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
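
    As a toy illustration of the two styles compared above, the sketch below advances a made-up set of independent state variables by one explicit integration step, once with a shared-memory thread pool and once with an MPI rank partition via mpi4py; the state model, sizes and library choices are assumptions for illustration and bear no relation to the power grid equations in the paper.

      import numpy as np
      from concurrent.futures import ThreadPoolExecutor

      def rhs(x):
          return -x                                   # toy right-hand side, not the grid model

      def shared_memory_step(x, dt, workers=4):
          # shared-memory flavour: worker threads each advance one chunk of the same array
          chunks = np.array_split(x, workers)         # views into x, so updates land in x
          def advance(chunk):
              chunk += dt * rhs(chunk)
          with ThreadPoolExecutor(workers) as pool:
              list(pool.map(advance, chunks))
          return x

      def distributed_step(dt, n_local=1000):
          # distributed-memory flavour: each MPI rank owns and advances only its own slice
          from mpi4py import MPI
          comm = MPI.COMM_WORLD
          x_local = np.ones(n_local) * (comm.Get_rank() + 1)
          x_local += dt * rhs(x_local)
          return np.concatenate(comm.allgather(x_local))   # naive rebuild of the global state

      if __name__ == "__main__":
          print(shared_memory_step(np.ones(10), dt=0.01)[:3])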

  18. Distributed-Memory Breadth-First Search on Massive Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Beamer, Scott [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Madduri, Kamesh [Pennsylvania State Univ., University Park, PA (United States). Computer Science & Engineering Dept.; Asanovic, Krste [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Patterson, David [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    2017-09-26

    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
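
    The level-synchronous top-down traversal described above can be sketched with a 1D vertex partition in which each level's newly discovered (child, parent) pairs are exchanged among ranks with an allgather; the toy graph, the ownership rule and the use of mpi4py are illustrative assumptions, not the chapter's implementation.

      from mpi4py import MPI

      def bfs_1d(local_adj, owner, root, comm):
          # local_adj holds adjacency lists only for the vertices this rank owns
          rank = comm.Get_rank()
          parent = {v: None for v in local_adj}
          frontier = []
          if owner(root) == rank:
              parent[root] = root
              frontier = [root]
          while True:
              # expand the local part of the frontier into candidate (child, parent) pairs
              candidates = [(v, u) for u in frontier for v in local_adj[u]]
              frontier = []
              for pairs in comm.allgather(candidates):       # level-synchronous exchange
                  for v, u in pairs:
                      if owner(v) == rank and parent.get(v) is None:
                          parent[v] = u
                          frontier.append(v)
              if not any(comm.allgather(bool(frontier))):     # no rank found new vertices
                  break
          return parent

      if __name__ == "__main__":
          comm = MPI.COMM_WORLD
          size, rank = comm.Get_size(), comm.Get_rank()
          owner = lambda v: v % size                          # round-robin 1D vertex partition
          full_adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
          local_adj = {v: nbrs for v, nbrs in full_adj.items() if owner(v) == rank}
          print(rank, bfs_1d(local_adj, owner, root=0, comm=comm))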

  19. A portable implementation of ARPACK for distributed memory parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Maschhoff, K.J.; Sorensen, D.C.

    1996-12-31

    ARPACK is a package of Fortran 77 subroutines which implement the Implicitly Restarted Arnoldi Method used for solving large sparse eigenvalue problems. A parallel implementation of ARPACK is presented which is portable across a wide range of distributed memory platforms and requires minimal changes to the serial code. The communication layers used for message passing are the Basic Linear Algebra Communication Subprograms (BLACS) developed for the ScaLAPACK project and the Message Passing Interface (MPI).

  20. Geometric Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Ajwani, Deepak; Sitchinava, Nodari; Zeh, Norbert

    2010-01-01

    We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort algorithm or PRAM algorithms. For the second group of problems (orthogonal line segment intersection reporting, batched range reporting, and related problems) more effort is required. What distinguishes these problems from the ones in the previous group is the variable output size, which requires I/O-efficient load balancing strategies based on the contribution of the individual input elements to the output size. To obtain nearly optimal algorithms for these problems, we introduce a parallel distribution ...

  1. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

  2. Hardware support for CSP on a Java chip multiprocessor

    DEFF Research Database (Denmark)

    Gruian, Flavius; Schoeberl, Martin

    2013-01-01

    Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were ...

  3. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  4. Parallelising a molecular dynamics algorithm on a multi-processor workstation

    Science.gov (United States)

    Müller-Plathe, Florian

    1990-12-01

    The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.

  5. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    Energy Technology Data Exchange (ETDEWEB)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  6. Method for wiring allocation and switch configuration in a multiprocessor environment

    Science.gov (United States)

    Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

    2008-07-15

    A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.

  7. Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling

    NARCIS (Netherlands)

    Nemati, F.; Behnam, M.; Bril, R.J.

    2009-01-01

    In the multi-core and multiprocessor domain, there has been considerable work done on scheduling techniques assuming that real-time tasks are independent. In practice a typical real-time system usually share logical resources among tasks. However, synchronization in the multiprocessor area has not

  8. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems.

    Science.gov (United States)

    Shehzad, Danish; Bozkuş, Zeki

    2016-01-01

    The increase in complexity of neuronal network models has escalated the efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets amongst multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large share of the overall simulation time. In NEURON, the Message Passing Interface (MPI) is used for communication between processors, and the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors improves concurrency and performance, but it adversely affects MPI_Allgather, whose communication time between processors grows. This necessitates improving the communication methodology to decrease the spike exchange time over distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication; the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for the simulation of large neuronal network models.

  9. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

    Directory of Open Access Journals (Sweden)

    Danish Shehzad

    2016-01-01

    Full Text Available The increase in complexity of neuronal network models has escalated the efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets amongst multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large share of the overall simulation time. In NEURON, the Message Passing Interface (MPI) is used for communication between processors, and the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors improves concurrency and performance, but it adversely affects MPI_Allgather, whose communication time between processors grows. This necessitates improving the communication methodology to decrease the spike exchange time over distributed memory systems. This work improves the MPI_Allgather method using Remote Memory Access (RMA) by moving two-sided communication to one-sided communication; the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for the simulation of large neuronal network models.
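
    The recursive-doubling exchange pattern described above can be sketched as follows, written with ordinary two-sided MPI Sendrecv for clarity; the paper moves the same pattern to one-sided Remote Memory Access. The sketch assumes the number of ranks is a power of two and uses mpi4py, which is an illustrative choice rather than the NEURON implementation.

      from mpi4py import MPI

      def recursive_doubling_allgather(my_block, comm):
          rank, size = comm.Get_rank(), comm.Get_size()
          blocks = {rank: my_block}          # blocks this rank has collected so far
          distance = 1
          while distance < size:
              partner = rank ^ distance      # exchange with the rank 'distance' away
              received = comm.sendrecv(blocks, dest=partner, source=partner)
              blocks.update(received)        # after log2(size) steps every rank has all blocks
              distance *= 2
          return [blocks[r] for r in range(size)]

      if __name__ == "__main__":
          comm = MPI.COMM_WORLD
          spikes = [f"spike-from-rank-{comm.Get_rank()}"]
          print(comm.Get_rank(), recursive_doubling_allgather(spikes, comm))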

  10. Interoperable mesh components for large-scale, distributed-memory simulations

    International Nuclear Information System (INIS)

    Devine, K; Leung, V; Diachin, L; Miller, M

    2009-01-01

    SciDAC applications have a demonstrated need for advanced software tools to manage the complexities associated with sophisticated geometry, mesh, and field manipulation tasks, particularly as computer architectures move toward the petascale. In this paper, we describe a software component - an abstract data model and programming interface - designed to provide support for parallel unstructured mesh operations. We describe key issues that must be addressed to successfully provide high-performance, distributed-memory unstructured mesh services and highlight some recent research accomplishments in developing new load balancing and MPI-based communication libraries appropriate for leadership class computing. Finally, we give examples of the use of parallel adaptive mesh modification in two SciDAC applications.

  11. A possible approach to estimating the operational efficiency of multiprocessor systems

    International Nuclear Information System (INIS)

    Kuznetsov, N.Y.; Gorlach, S.P.; Sumskaya, A.A.

    1984-01-01

    This article presents a mathematical model that constructs the upper and lower estimates evaluating the efficiency of solution of a large class of problems using a multiprocessor system with a specific architecture. Efficiency depends on a system's architecture (e.g., the number of processors, memory volume, the number of communication links, commutation speed) and the types of problems it is intended to solve. The behavior of the model is considered in a stationary mode. The model is used to evaluate the efficiency of a particular algorithm implemented in a multiprocessor system. It is concluded that the model is flexible and enables the investigation of a broad class of problems in computational mathematics, including linear algebra and boundary-value problems of mathematical physics

  12. Energy-Aware Real-Time Task Scheduling for Heterogeneous Multiprocessors with Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Weizhe Zhang

    2014-01-01

    Full Text Available Energy consumption in computer systems has become a more and more important issue. High energy consumption has already damaged the environment to some extent, especially in heterogeneous multiprocessors. In this paper, we first formulate and describe the energy-aware real-time task scheduling problem in heterogeneous multiprocessors. Then we propose a particle swarm optimization (PSO) based algorithm, which can successfully reduce the energy cost and the time for searching feasible solutions. Experimental results show that the PSO-based energy-aware metaheuristic uses 40%–50% less energy than the GA-based and SFLA-based algorithms and spends 10% less time than the SFLA-based algorithm in finding the solutions. Besides, it can also find 19% more feasible solutions than the SFLA-based algorithm.
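
    The following minimal sketch shows a PSO-style search for an energy-aware assignment of tasks to heterogeneous processors. The energy model, the parameter values and the continuous "random-key" encoding (one position value per task, truncated to a processor id) are illustrative assumptions, not the formulation or the discrete operators used in the paper.

      import numpy as np

      rng = np.random.default_rng(0)
      n_tasks, n_procs, n_particles, iters = 20, 4, 30, 200
      cycles = rng.integers(1, 100, n_tasks)           # work per task (invented)
      speed  = rng.uniform(1.0, 3.0, n_procs)          # heterogeneous processor speeds
      power  = rng.uniform(1.0, 2.5, n_procs)          # power draw per processor

      def energy(assign):
          # energy = sum over processors of (busy time on that processor) * its power draw
          busy = np.zeros(n_procs)
          for t, p in enumerate(assign):
              busy[p] += cycles[t] / speed[p]
          return float(np.dot(busy, power))

      def decode(pos):
          return np.clip(pos, 0, n_procs - 1e-9).astype(int)

      pos = rng.uniform(0, n_procs, (n_particles, n_tasks))
      vel = np.zeros_like(pos)
      pbest, pbest_val = pos.copy(), np.array([energy(decode(p)) for p in pos])
      gbest = pbest[pbest_val.argmin()].copy()

      for _ in range(iters):
          r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
          vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
          pos = pos + vel
          vals = np.array([energy(decode(p)) for p in pos])
          improved = vals < pbest_val
          pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
          gbest = pbest[pbest_val.argmin()].copy()

      print("best assignment:", decode(gbest), "energy:", round(energy(decode(gbest)), 2))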

  13. Numerical fluid flow and heat transfer calculations on multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Oehman, G.A.; Malen, T.E.; Kuusela, P.

    1989-01-01

    The first part of the report presents the basic principles of parallel processing, and factors influencing the efficiency of practical applications are discussed. In a multiprocessor computer, different parts of the program code are executed in parallel, i.e. simultaneously with respect to time, on different processors, and thus it becomes possible to decrease the overall computation time by a factor which in the ideal case is equal to the number of processors. The application study starts from the numerical solution of the two-dimensional Laplace equation, which describes steady heat conduction in a solid plate, and advances through the solution of the three-dimensional Laplace equation to a case study of laminar fluid flow in a two-dimensional box at Reynolds numbers up to 20. Hereby the stream function-vorticity method is applied first, followed by the SIMPLER method. The conventional (sequential) numerical algorithms for these fluid flow and heat transfer problems are found not to be ideally suited for conversion to parallel computation, but speed-up ratios considerably above 50% of the theoretical maximum are regularly achieved in the runs. The numerical procedures were coded in the OCCAM-2 language and the test runs were performed at Åbo Akademi on the experimental HATHI computers containing 16 INMOS T414 and 100 INMOS T800 transputers, respectively.

  15. Parallel SN algorithms in shared- and distributed-memory environments

    International Nuclear Information System (INIS)

    Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

    1995-01-01

    Different 2-D spatial domain partitioning Sn transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II, and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed source r-z geometry homogeneous problems, parallel efficiencies in a range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load-balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for SP1. (author). 17 refs, 4 figs
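
    A minimal sketch of the spatial-domain-partitioning idea above, using a 1-D Laplace model problem in place of the transport sweeps: each rank relaxes its own slab and refreshes ghost cells from its neighbours' previous iterate, which is the Block-Jacobi coupling between subdomains. The model problem, the iteration count and the use of mpi4py are illustrative assumptions, not the TWOTRAN-II algorithms.

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_local = 50                                   # interior cells owned by this rank
      u = np.zeros(n_local + 2)                      # plus one ghost cell on each side
      if rank == size - 1:
          u[-1] = 1.0                                # global right-hand boundary condition

      left  = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
      ghost = np.empty(1)

      for _ in range(500):
          # refresh ghost cells from the neighbours' previous iterate (Block-Jacobi coupling)
          comm.Sendrecv(u[-2:-1], dest=right, recvbuf=ghost, source=left)
          if left != MPI.PROC_NULL:
              u[0] = ghost[0]
          comm.Sendrecv(u[1:2], dest=left, recvbuf=ghost, source=right)
          if right != MPI.PROC_NULL:
              u[-1] = ghost[0]
          u[1:-1] = 0.5 * (u[:-2] + u[2:])           # local Jacobi relaxation on this slab

      print(f"rank {rank}: slab edge values {u[1]:.3f} .. {u[-2]:.3f}")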

  16. Particle simulation on a distributed memory highly parallel processor

    International Nuclear Information System (INIS)

    Sato, Hiroyuki; Ikesaka, Morio

    1990-01-01

    This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)

  17. A distributed-memory hierarchical solver for general sparse linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Chao [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering; Pouransari, Hadi [Stanford Univ., CA (United States). Dept. of Mechanical Engineering; Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Boman, Erik G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Darve, Eric [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering and Dept. of Mechanical Engineering

    2017-12-20

    We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.

  18. Operating system for a real-time multiprocessor propulsion system simulator

    Science.gov (United States)

    Cole, G. L.

    1984-01-01

    The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real time interactive simulation of air breathing propulsion systems was evaluated. The Real Time Multiprocessor Operating System (RTMPOS) provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer supplied disk operating system that provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, the RTMPOS provides for engineering level, run time operations such as loading, modifying and specifying computation flow of programs, simulator mode control, data handling and run time monitoring. Run time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.

  19. Efficient process migration in the EMPS multiprocessor system

    NARCIS (Netherlands)

    van Dijk, G.J.W.; Gils, van M.J.

    1992-01-01

    The process migration facility in the Eindhoven multiprocessor system (EMPS) is presented. In the EMPS system, mailboxes are used for interprocess communication. These mailboxes provide transparency of location for communicating processes. The major advantages of mailbox communication in the EMPS

  20. USC orthogonal multiprocessor for image processing with neural networks

    Science.gov (United States)

    Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

    1990-07-01

    This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.

  1. Multiprocessor Priority Ceiling Emulation for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Schoeberl, Martin

    2015-01-01

    Priority ceiling emulation has preferable properties on uniprocessor systems, such as avoiding priority inversion and being deadlock free. This has made it a popular locking protocol. According to the safety-critical Java specification, priority ceiling emulation is a requirement for implementations. However, implementing the protocol for multiprocessor systems is more complex, so implementations might perform worse than non-preemptive implementations. In this paper we compare two multiprocessor lock implementations with hardware support for the Java optimized processor: non-preemptive locking...

  2. Hardware locks for a real-time Java chip multiprocessor

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2016-01-01

    A software locking mechanism commonly protects shared resources for multithreaded applications. This mechanism can, especially in chip-multiprocessor systems, result in a large synchronization overhead. For real-time systems in particular, this overhead increases the worst-case execution time....... This improvement can allow a larger number of real-time tasks to be reliably scheduled on a multiprocessor real-time platform....

  3. Multiprocessor Global Scheduling on Frame-Based DVFS Systems

    OpenAIRE

    Berten, Vandy; Goossens, Joël

    2008-01-01

    In this work, we are interested in multiprocessor energy efficient systems where task durations are not known in advance but are known stochastically. More precisely we consider global scheduling algorithms for frame-based multiprocessor stochastic DVFS (Dynamic Voltage and Frequency Scaling) systems. Moreover we consider processors with a discrete set of available frequencies. We provide a global scheduling algorithm, and formally show that no deadline will ever be mi...

  4. Massively Parallel Polar Decomposition on Distributed-Memory Systems

    KAUST Repository

    Ltaief, Hatem

    2018-01-01

    We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions, introduced by Zolotarev (ZOLO) in 1877, this new PD algorithm ZOLO-PD converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to seventeen, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102,400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3X speedup, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.
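
    For context, the polar factor that QDWH and ZOLO-PD compute can be obtained for a small, well-conditioned matrix with the classical inverse-based Newton iteration sketched below; the QR-based, Zolotarev-accelerated iterations in the paper are the communication-friendly, inverse-free replacements for this naive scheme, which is shown only to illustrate what is being computed.

      import numpy as np

      def polar_newton(A, tol=1e-12, max_iter=100):
          # Newton iteration X <- (X + X^{-T}) / 2 converges to the orthogonal polar factor
          X = A.astype(float).copy()
          for _ in range(max_iter):
              X_next = 0.5 * (X + np.linalg.inv(X).T)
              if np.linalg.norm(X_next - X, "fro") < tol * np.linalg.norm(X_next, "fro"):
                  X = X_next
                  break
              X = X_next
          U = X                              # orthogonal polar factor
          H = U.T @ A                        # symmetric positive semidefinite factor
          return U, 0.5 * (H + H.T)          # symmetrize H against round-off

      A = np.random.default_rng(0).standard_normal((5, 5))
      U, H = polar_newton(A)
      print(np.allclose(A, U @ H), np.allclose(U.T @ U, np.eye(5)))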

  5. Intelligent discrete particle swarm optimization for multiprocessor task scheduling problem

    Directory of Open Access Journals (Sweden)

    S Sarathambekai

    2017-03-01

    Full Text Available Discrete particle swarm optimization is one of the most recently developed population-based meta-heuristic optimization algorithms in swarm intelligence and can be used in any discrete optimization problem. This article presents a discrete particle swarm optimization algorithm to efficiently schedule tasks in heterogeneous multiprocessor systems. All optimization algorithms share a common algorithmic step, namely population initialization. It plays a significant role because it can affect the convergence speed and also the quality of the final solution. Random initialization is the most commonly used method in the majority of evolutionary algorithms to generate solutions in the initial population. Good-quality initial solutions can help the algorithm locate the optimal solution, whereas poor ones may prevent it from finding the optimal solution. Intelligence should therefore be incorporated into generating the initial population in order to avoid premature convergence. This article presents a discrete particle swarm optimization algorithm which incorporates an opposition-based technique to generate the initial population and a greedy algorithm to balance the load of the processors. Makespan, flow time, and reliability cost are three different measures used to evaluate the efficiency of the proposed discrete particle swarm optimization algorithm for scheduling independent tasks in distributed systems. Computational simulations are done based on a set of benchmark instances to assess the performance of the proposed algorithm.
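
    The opposition-based initialization mentioned above can be sketched as follows: every random task-to-processor assignment is paired with its "opposite" (mirrored processor index), and the better half of the combined pool seeds the swarm. The cost matrix, the plain-makespan fitness and the population sizes are invented for illustration and are not the paper's exact formulation.

      import numpy as np

      rng = np.random.default_rng(1)
      n_tasks, n_procs, pop_size = 30, 5, 20
      cost = rng.uniform(1, 10, (n_tasks, n_procs))    # execution time of task t on processor p

      def makespan(assign):
          finish = np.zeros(n_procs)
          for t, p in enumerate(assign):
              finish[p] += cost[t, p]
          return finish.max()

      random_pop = rng.integers(0, n_procs, (pop_size, n_tasks))
      opposite_pop = (n_procs - 1) - random_pop        # mirrored processor indices
      both = np.vstack([random_pop, opposite_pop])
      fitness = np.array([makespan(ind) for ind in both])
      initial_population = both[np.argsort(fitness)[:pop_size]]   # keep the best half
      print("best initial makespan:", round(fitness.min(), 2))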

  6. ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

    OpenAIRE

    Piel, Éric; Marquet, Philippe; Soula, Julien; Osuna, Christophe; Dekeyser, Jean-Luc

    2005-01-01

    The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processor) systems. It allows High Performance Computing and real-time processing to be mixed. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing on (but not restricted to) the IA-64 architecture. The basic idea of ARTiS is to assign a selected se...

  7. Operating System for Runtime Reconfigurable Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2011-01-01

    Full Text Available Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits the dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g. a Network-on-Chip). Recent series of Xilinx FPGAs, such as Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, the access to these ports as well as the hardware resource management needs to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose, and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special-purpose operating system, called CAP-OS (Configuration Access Port-Operating System), which will be presented in this paper, supports the clients using the configuration port with the services of priority-based access scheduling, hardware task mapping and resource management.

  8. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems

    KAUST Repository

    Sukkari, Dalal

    2017-01-01

    This paper presents a high performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (Nakatsukasa et al. 2010; Nakatsukasa and Higham 2013), the SVD solver relies on the polar decomposition using the QR Dynamically-Weighted Halley algorithm (QDWH). Although the QDWH-based SVD algorithm performs a significant amount of extra floating-point operations compared to the traditional SVD with the one-stage bidiagonal reduction, the inherent high level of concurrency associated with Level 3 BLAS compute-bound kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology, the resulting QDWH-SVD further reduces excessive communications during the panel factorization, while increasing the degree of parallelism during the update of the trailing submatrix, as opposed to relying on the default square processor grid. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on the performance by looking at the communication and computation profiling trade-offs. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3-fold and 8-fold speedups on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD and turns out to be a competitive alternative for well and ill-conditioned matrices. We finally come up herein with a performance model based on these empirical results. Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https

  9. Development of a VME multi-processor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Takahashi, M.; Kurihara, K.; Kawamata, Y.; Akasaka, H.; Kimura, T.

    1992-01-01

    Design and initial operation results are reported of a VME multi-processor system [1] for plasma control at a large fusion device named 'the JT-60 Upgrade', utilizing three 32-bit MC88100-based RISC computers and VME components. Development of the system was stimulated by requirements for faster and more accurate computation of the plasma position and current control. The RISC computers operate at 25 MHz along with two cache memories named MC88200. We newly developed VME bus modules (an up/down counter, an analog-to-digital converter and a clock pulse generator) for measuring magnetic field and coil current and for synchronizing the processing in the three RISCs and the direct digital controllers (DDCs) of the magnet power supplies. We also verified that the speed of the data transfer between the VME bus system and the DDCs through CAMAC highways satisfies the above requirements. In the initial operation of the JT-60 Upgrade, it has been proved that the VME multi-processor system controls the plasma position and current well, with a sampling period of 250 μsec and a delay of 500 μsec. (author)

  10. Best Speed Fit EDF Scheduling for Performance Asymmetric Multiprocessors

    Directory of Open Access Journals (Sweden)

    Peng Wu

    2017-01-01

    Full Text Available In order to improve the performance of a real-time system, asymmetric multiprocessors have been proposed. The benefits of improved system performance and reduced power consumption from such architectures cannot be fully exploited unless suitable task scheduling and task allocation approaches are implemented at the operating system level. Unfortunately, most of the previous research on scheduling algorithms for performance asymmetric multiprocessors is focused on task priority assignment. They simply assign the highest priority task to the fastest processor. In this paper, we propose BSF-EDF (best speed fit for earliest deadline first) for performance asymmetric multiprocessor scheduling. This approach chooses a suitable processor rather than the fastest one, when allocating tasks. With this proposed BSF-EDF scheduling, we also derive an effective schedulability test.
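
    A hedged sketch of a "speed fit" style partitioning on an asymmetric platform: each task is placed on the slowest core on which it still fits under the per-core EDF utilization bound, so the fast cores are kept free for tasks that actually need them. This is one plausible reading of the idea described above, not the exact BSF-EDF allocation rule or schedulability test from the paper; the task set and speeds below are invented.

      def best_speed_fit(tasks, speeds):
          # tasks: list of (wcet_at_speed_1, period); speeds: per-core speed factors
          util = [0.0] * len(speeds)
          assignment = {}
          # consider slower cores first so fast cores stay free for demanding tasks
          order = sorted(range(len(speeds)), key=lambda c: speeds[c])
          for i, (wcet, period) in enumerate(tasks):
              for core in order:
                  u = (wcet / speeds[core]) / period     # utilization of this task on this core
                  if util[core] + u <= 1.0:              # EDF bound: total utilization <= 1
                      util[core] += u
                      assignment[i] = core
                      break
              else:
                  raise RuntimeError(f"task {i} does not fit on any core")
          return assignment

      print(best_speed_fit([(2, 10), (9, 10), (4, 20)], speeds=[1.0, 2.0]))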

  11. Power profiling of Cholesky and QR factorizations on distributed memory systems

    KAUST Repository

    Bosilca, George; Ltaief, Hatem; Dongarra, Jack

    2012-01-01

    with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely

  12. ParBiBit: Parallel tool for binary biclustering on modern distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Expósito, Roberto R

    2018-01-01

    Biclustering techniques are gaining attention in the analysis of large-scale datasets as they identify two-dimensional submatrices where both rows and columns are correlated. In this work we present ParBiBit, a parallel tool to accelerate the search for interesting biclusters on binary datasets, which are very popular in different fields such as genetics, marketing or text mining. It is based on the state-of-the-art sequential Java tool BiBit, which has been proven accurate by several studies, especially on scenarios that result in many large biclusters. ParBiBit uses the same methodology as BiBit (grouping the binary information into patterns) and provides the same results. Nevertheless, our tool significantly improves performance thanks to an efficient implementation based on C++11 that includes support for threads and MPI processes in order to exploit the compute capabilities of modern distributed-memory systems, which provide several multicore CPU nodes interconnected through a network. Our performance evaluation with 18 representative input datasets on two different eight-node systems shows that our tool is significantly faster than the original BiBit. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/parbibit/.

  13. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online.

  14. Recognition of simple visual images using a sparse distributed memory: Some implementations and experiments

    Science.gov (United States)

    Jaeckel, Louis A.

    1990-01-01

    Previously, a method was described of representing a class of simple visual images so that they could be used with a Sparse Distributed Memory (SDM). Herein, two possible implementations of an SDM are described, for which these images, suitably encoded, will serve both as addresses to the memory and as data to be stored in the memory. A key feature of both implementations is that a pattern that is represented as an unordered set with a variable number of members can be used as an address to the memory. In the 1st model, an image is encoded as a 9072 bit string to be used as a read or write address; the bit string may also be used as data to be stored in the memory. Another representation, in which an image is encoded as a 256 bit string, may be used with either model as data to be stored in the memory, but not as an address. In the 2nd model, an image is not represented as a vector of fixed length to be used as an address. Instead, a rule is given for determining which memory locations are to be activated in response to an encoded image. This activation rule treats the pieces of an image as an unordered set. With this model, the memory can be simulated, based on a method of computing the approximate result of a read operation.

  15. Chip-Multiprocessor Hardware Locks for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2013-01-01

    and may void a task set's schedulability. In this paper we present a hardware locking mechanism to reduce the synchronization overhead. The solution is implemented for the chip-multiprocessor version of the Java Optimized Processor in the context of safety-critical Java. The implementation is compared...

  16. Stream-processing pipelines: processing of streams on multiprocessor architecture

    NARCIS (Netherlands)

    Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

    In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four

  17. Multiprocessor based data acquisition system for radiation monitoring in nuclear reactors

    International Nuclear Information System (INIS)

    Pansare, M.G.; Narsaiah, A.; Anantha Krishnan, T.S.

    1989-01-01

    Expensive minicomputers are required for building powerful Data Acquisition Systems (DAS) capable of scanning and processing large numbers of signals in a real-time environment. However, by using inexpensive microprocessors in a multiprocessor configuration it is possible to build DASs that are as powerful as minicomputer-based systems at much lower cost. This paper describes such a multiprocessor-based DAS designed for acquiring data from various radiation monitoring instruments of a nuclear reactor. The system is built using MULTIBUS standard boards based on the Intel 8086, a 16-bit microprocessor, with local and shared memory. The system monitors up to 128 analog input channels and 64 digital input channels, and actuates up to 128 digital output contacts. The system continuously checks for the alarm condition of the input channels and displays the alarm status on an ALARM CRT. Facility has been provided for the transfer of data to a central computer. At any instant of time, the information regarding the different channels being monitored is available from the local console as well as through five remote terminals located at various places in the reactor building. (author)

  18. Job-mix modeling and system analysis of an aerospace multiprocessor.

    Science.gov (United States)

    Mallach, E. G.

    1972-01-01

    An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.

  19. Operating experience with a VMEbus multiprocessor system for data acquisition and reduction in nuclear physics

    International Nuclear Information System (INIS)

    Kutt, P.H.; Balamuth, D.P.

    1989-01-01

    A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPUs and 14 MB of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over three times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing of gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC processors to obtain an order of magnitude increase in computing power per node

  20. High speed vision processor with reconfigurable processing element array based on full-custom distributed memory

    Science.gov (United States)

    Chen, Zhe; Yang, Jie; Shi, Cong; Qin, Qi; Liu, Liyuan; Wu, Nanjian

    2016-04-01

    In this paper, a hybrid vision processor based on a compact full-custom distributed memory for near-sensor high-speed image processing is proposed. The proposed processor consists of a reconfigurable processing element (PE) array, a row processor (RP) array, and a dual-core microprocessor. The PE array includes two-dimensional processing elements with a compact full-custom distributed memory. It supports real-time reconfiguration between the PE array and the self-organized map (SOM) neural network. The vision processor is fabricated using a 0.18 µm CMOS technology. The circuit area of the distributed memory is reduced markedly to 1/3 of that of a conventional memory, so that the circuit area of the vision processor is reduced by 44.2%. Experimental results demonstrate that the proposed design achieves correct functions.

  1. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Li, Xiaoye S.; Demmel, James W.

    2002-03-27

    In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both LU decomposition and triangular solve, and we show that they are suitable for large-scale distributed memory machines.
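
    The distributed C/MPI interface of SuperLU_DIST itself is not reproduced here. As a rough, serial analogy of the kind of sparse LU factorization and triangular solve the package provides, the sketch below uses SciPy's splu, which wraps the sequential SuperLU; it should be read only as a usage illustration under that assumption, not as the SuperLU_DIST API or its static-pivoting strategy.

        # Serial illustration only: scipy.sparse.linalg.splu wraps sequential SuperLU,
        # not the distributed-memory SuperLU_DIST solver discussed above.
        import numpy as np
        import scipy.sparse as sp
        from scipy.sparse.linalg import splu

        n = 1000
        # Unsymmetric sparse test matrix: a 1-D convection-diffusion style stencil.
        main = 4.0 * np.ones(n)
        lower = -1.5 * np.ones(n - 1)
        upper = -0.5 * np.ones(n - 1)
        A = sp.diags([lower, main, upper], offsets=[-1, 0, 1], format="csc")

        b = np.ones(n)
        lu = splu(A)              # sparse LU factorization
        x = lu.solve(b)           # triangular solves reuse the factors
        print("residual:", np.linalg.norm(A @ x - b))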

  2. Cache-aware network-on-chip for chip multiprocessors

    Science.gov (United States)

    Tatas, Konstantinos; Kyriacou, Costas; Dekoulis, George; Demetriou, Demetris; Avraam, Costas; Christou, Anastasia

    2009-05-01

    This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support for these cache-related mechanisms can assist in improving system performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM, thread scheduling is done according to data availability, thus the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache-aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.

  3. Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

    OpenAIRE

    Bonifaci, Vincenzo; Brandenburg, Björn; D'Angelo, Gianlorenzo; Marchetti-Spaccamela, Alberto

    2016-01-01

    Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of "strong arbitrary processor affinity scheduling" (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...

  4. Design considerations for a multiprocessor based data acquisition system

    International Nuclear Information System (INIS)

    Tippie, J.W.; Kulaga, J.E.

    1979-01-01

    The rapid advance of digital technology has provided the systems designer with many new design options. Hardware is no longer the controlling expense. Complex operating systems provide the flexibility and development tools needed by software designers, but restrict throughput. Multiprocessor-based systems can be used to "front-end" high-throughput applications while maintaining the many advantages offered by multitasking operating systems. The design of a high-throughput data acquisition system for application in low energy nuclear physics is considered

  5. Multiprocessor Real-Time Locking Protocols for Replicated Resources

    Science.gov (United States)

    2016-07-01

    ... assignment problem, the actual identities of the allocated replicas must be known. When locking protocols are used, tasks may experience delays due to both ... replicas to execute. In prior work on replicated resources, k-exclusion locks have been used, but this restricts tasks to lock only one replica at a time. To...

  6. GOTHIC memory management : a multiprocessor shared single level store

    OpenAIRE

    Michel , Béatrice

    1990-01-01

    Gothic's purpose is to build an object-oriented fault-tolerant distributed operating system for a local area network of multiprocessor workstations. This paper describes the Gothic memory manager. It realizes the sharing of the secondary memory space among the processes running on the Gothic system. Processes on different processors can communicate by sharing permanent information. The manager implements a shared single-level storage with an invalidation protocol working on disk-pages to maintain s...

  7. Extending and implementing the Self-adaptive Virtual Processor for distributed memory architectures

    NARCIS (Netherlands)

    van Tol, M.W.; Koivisto, J.

    2011-01-01

    Many-core architectures of the future are likely to have distributed memory organizations and need fine grained concurrency management to be used effectively. The Self-adaptive Virtual Processor (SVP) is an abstract concurrent programming model which can provide this, but the model and its current

  8. 2-D fluid dynamics models for laser driven fusion on IBM 3090 vector multiprocessors

    International Nuclear Information System (INIS)

    Atzeni, S.

    1988-01-01

    Fluid-dynamics codes for laser fusion are complex research codes, consisting of many distinct modules and embodying a variety of numerical methods. They are therefore good candidates for testing general-purpose advanced computer architectures and the related software. In this paper, after a brief outline of the basic concepts of laser fusion, the implementation of the 2-D laser fusion fluid code DUED on the IBM 3090 VF vector multiprocessors is discussed. Emphasis is put on parallelization, performed by means of IBM Parallel FORTRAN (PF). It is shown how different modules have been optimized by using different features of PF: i) modules based on depth-2 nested loops exploit automatic parallelization; ii) laser light ray tracing is partitioned by scheduling the parallel ICCG algorithm (executed in parallel by appropriately synchronized parallel subroutines). Performance results are given for separate modules of the code, as well as for typical complete runs

  9. Use of the CAMAC-MULTIBUS combined protocol for organizing multi-processor operation in a crate

    International Nuclear Information System (INIS)

    Glejbman, Eh.M.

    1985-01-01

    Problems of developing electronic units for large on-line systems for the automation of nuclear physics experiments, built on the principles of distributed control and data processing, are discussed. Crates in which CAMAC (EUR-4100) modules and modules implementing the MULTIBUS protocol are located and operate simultaneously are described. This is attained by sharing the CAMAC and MULTIBUS protocols on the crate dataway. The use of job scheduler and executor modules in the MULTIBUS interface makes it possible to organize multiprocessor operation, to separate data streams, and to increase the total computational capacity in the crate

  10. Performance of the coupled thermalhydraulics/neutron kinetics code R/P/C on workstation clusters and multiprocessor systems

    International Nuclear Information System (INIS)

    Hammer, C.; Paffrath, M.; Boeer, R.; Finnemann, H.; Jackson, C.J.

    1996-01-01

    The light water reactor core simulation code PANBOX has been coupled with the transient analysis code RELAP5 for the purpose of performing plant safety analyses with a three-dimensional (3-D) neutron kinetics model. The system has been parallelized to improve the computational efficiency. The paper describes the features of this system with emphasis on performance aspects. Performance results are given for different types of parallelization, i.e., for using an automatic parallelizing compiler, using the portable PVM platform on a workstation cluster, using PVM on a shared memory multiprocessor, and for using machine-dependent interfaces. (author)

  11. Simulation of the behaviour of electron-optical systems using a parallel computer

    International Nuclear Information System (INIS)

    Balladore, J.L.; Hawkes, P.W.

    1990-01-01

    The advantage of using a multiprocessor computer for the calculation of electron-optical properties is investigated. A considerable reduction of computing time is obtained by reorganising the finite-element field computation. (orig.)

  12. Cyclic executive for safety-critical Java on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2010-01-01

    , that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors we have implemented it in the context of safety-critical Java on a Java processor....

  13. Efficient calculation of open quantum system dynamics and time-resolved spectroscopy with distributed memory HEOM (DM-HEOM).

    Science.gov (United States)

    Kramer, Tobias; Noack, Matthias; Reinefeld, Alexander; Rodríguez, Mirta; Zelinskyy, Yaroslav

    2018-06-11

    Time- and frequency-resolved optical signals provide insights into the properties of light-harvesting molecular complexes, including excitation energies, dipole strengths and orientations, as well as into the exciton energy flow through the complex. The hierarchical equations of motion (HEOM) provide a unifying theory, which allows one to study the combined effects of system-environment dissipation and non-Markovian memory without making restrictive assumptions about weak or strong couplings or separability of vibrational and electronic degrees of freedom. With increasing system size the exact solution of the open quantum system dynamics requires memory and compute resources beyond a single compute node. To overcome this barrier, we developed a scalable variant of HEOM. Our distributed memory HEOM, DM-HEOM, is a universal tool for open quantum system dynamics. It is used to accurately compute all experimentally accessible time- and frequency-resolved processes in light-harvesting molecular complexes with arbitrary system-environment couplings for a wide range of temperatures and complex sizes. © 2018 Wiley Periodicals, Inc.

  14. Multiprocessor system with multiple concurrent modes of execution

    Science.gov (United States)

    Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

    2013-12-31

    A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
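
    A minimal software sketch of the allocation scheme this record describes: a pool of speculation IDs is split into per-mode domains (TM, TLS, rollback), and speculative threads allocate and release IDs from the domain of their mode. The central state table, hardware pointers, and cache-way usage are not modeled; the class name and sizes below are illustrative assumptions.

        # Illustrative software model of per-mode speculation-ID domains; the actual
        # mechanism is implemented in hardware with a central state table and pointers.
        class SpeculationIDPool:
            def __init__(self, ids_per_domain=32, modes=("TM", "TLS", "rollback")):
                self.free = {}
                base = 0
                for mode in modes:
                    self.free[mode] = list(range(base, base + ids_per_domain))
                    base += ids_per_domain

            def allocate(self, mode):
                if not self.free[mode]:
                    raise RuntimeError(f"no free speculation IDs in domain {mode!r}")
                return self.free[mode].pop(0)

            def release(self, mode, spec_id):
                self.free[mode].append(spec_id)

        pool = SpeculationIDPool()
        tm_id = pool.allocate("TM")        # a transactional-memory speculative thread
        tls_id = pool.allocate("TLS")      # a thread-level-speculation thread
        print(tm_id, tls_id)
        pool.release("TM", tm_id)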

  15. Data Provenance for Agent-Based Models in a Distributed Memory

    Directory of Open Access Journals (Sweden)

    Delmar B. Davis

    2018-04-01

    Agent-Based Models (ABMs) assist with studying emergent collective behavior of individual entities in social, biological, economic, network, and physical systems. Data provenance can support ABMs by explaining individual agent behavior. However, there is no provenance support for ABMs in a distributed setting. The Multi-Agent Spatial Simulation (MASS) library provides a framework for simulating ABMs at fine granularity, where agents and spatial data are shared application resources in a distributed memory. We introduce a novel approach to capture ABM provenance in a distributed memory, called ProvMASS. We evaluate our technique with traditional data provenance queries and performance measures. Our results indicate that a configurable approach can capture provenance that explains coordination of distributed shared resources, simulation logic, and agent behavior while limiting performance overhead. We also show the ability to support practical analyses (e.g., agent tracking) and storage requirements for different capture configurations.

  16. Power profiling of Cholesky and QR factorizations on distributed memory systems

    KAUST Repository

    Bosilca, George

    2012-08-30

    This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2X speedup) but also in terms of energy efficiency (up to 62 %). © 2012 Springer-Verlag (outside the USA).

  17. Monte Carlo photon transport on shared memory and distributed memory parallel processors

    International Nuclear Information System (INIS)

    Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

    1987-01-01

    Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
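
    The replication strategy mentioned above can be sketched as follows: every rank runs an independent share of the histories with its own random stream, and the tallies are summed once at the end, so the only communication is a single global reduction. The sketch assumes mpi4py is available and replaces the photon physics with a toy absorption estimate; it is not the original hypercube implementation.

        # Sketch of the "replication" strategy (run e.g. mpirun -np 4 python replication.py).
        # The photon physics is replaced by a toy survival model; only the final tally
        # is communicated.
        import random
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        TOTAL_HISTORIES = 1_000_000
        local_histories = TOTAL_HISTORIES // size
        rng = random.Random(1234 + rank)          # independent stream per rank

        absorbed = 0
        for _ in range(local_histories):
            alive = True
            for _ in range(5):                    # toy "transport": 5 collisions, 70% survival each
                if rng.random() > 0.7:
                    alive = False
                    break
            if not alive:
                absorbed += 1

        total_absorbed = comm.reduce(absorbed, op=MPI.SUM, root=0)
        if rank == 0:
            print("absorption fraction:", total_absorbed / (local_histories * size))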

  18. Learning to read aloud: A neural network approach using sparse distributed memory

    Science.gov (United States)

    Joglekar, Umesh Dwarkanath

    1989-01-01

    An attempt is described to solve a text-to-phoneme mapping problem which does not appear amenable to solution by standard algorithmic procedures. Experiments based on a model of distributed processing are also described. This model (sparse distributed memory (SDM)) can be used in an iterative supervised learning mode to solve the problem. Additional improvements aimed at obtaining better performance are suggested.

  19. Two alternate proofs of Wang's lune formula for sparse distributed memory and an integral approximation

    Science.gov (United States)

    Jaeckel, Louis A.

    1988-01-01

    In Kanerva's Sparse Distributed Memory, writing to and reading from the memory are done in relation to spheres in an n-dimensional binary vector space. Thus it is important to know how many points are in the intersection of two spheres in this space. Two proofs are given of Wang's formula for spheres of unequal radii, and an integral approximation for the intersection in this case.
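
    Wang's closed-form lune formula and its integral approximation are not reproduced here, but the quantity they describe can be checked by direct combinatorics: for addresses x and y at Hamming distance d in {0,1}^n, a point that agrees with y on k of the d differing coordinates and flips m of the n-d agreeing coordinates lies at distance k+m from x and (d-k)+m from y. Summing the corresponding binomial counts gives the intersection of the two spheres. The function names and the small brute-force check below are illustrative.

        # Direct combinatorial count of points lying inside both Hamming spheres:
        # radius r1 around x and radius r2 around y, with Hamming distance d between
        # x and y in {0,1}^n.
        from math import comb
        from itertools import product

        def sphere_intersection(n, d, r1, r2):
            total = 0
            for k in range(d + 1):                  # flips among the d differing coordinates
                for m in range(n - d + 1):          # flips among the n-d agreeing coordinates
                    if k + m <= r1 and (d - k) + m <= r2:
                        total += comb(d, k) * comb(n - d, m)
            return total

        def brute_force(n, x, y, r1, r2):
            return sum(1 for z in product((0, 1), repeat=n)
                       if sum(a != b for a, b in zip(z, x)) <= r1
                       and sum(a != b for a, b in zip(z, y)) <= r2)

        x = (0, 0, 0, 0, 0, 0, 0, 0)
        y = (1, 1, 1, 0, 0, 0, 0, 0)               # d = 3
        print(sphere_intersection(8, 3, 3, 4), brute_force(8, x, y, 3, 4))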

  20. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally these modules adapt the signal levels to that of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system

  1. Thermal-Aware Scheduling for Future Chip Multiprocessors

    Directory of Open Access Journals (Sweden)

    Pedro Trancoso

    2007-04-01

    The increased complexity and operating frequency in current single-chip microprocessors is resulting in diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently additional costs for cooling, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve the performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while at the same time decreasing the chip's temperature across many different operation and configuration scenarios.
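
    The neighbour-aware placement idea can be illustrated with a small sketch: among the idle cores of a CMP grid, prefer the one whose own temperature plus a weighted average of its neighbours' temperatures is lowest. The grid size, weight, and static temperature snapshot below are illustrative assumptions, not the TAS algorithms of the paper, which also account for cooling ability and thermal dynamics.

        # Minimal sketch of a neighbour-aware thermal placement rule on a 4x4 CMP grid.
        # Weights and the static temperature snapshot are illustrative assumptions.
        import random

        ROWS, COLS = 4, 4
        NEIGHBOUR_WEIGHT = 0.5

        random.seed(1)
        temperature = {(r, c): random.uniform(50.0, 80.0) for r in range(ROWS) for c in range(COLS)}
        busy = set()

        def neighbours(core):
            r, c = core
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            return [n for n in cand if n in temperature]

        def thermal_score(core):
            neigh = neighbours(core)
            neigh_avg = sum(temperature[n] for n in neigh) / len(neigh)
            return temperature[core] + NEIGHBOUR_WEIGHT * neigh_avg

        def place_process():
            idle = [c for c in temperature if c not in busy]
            core = min(idle, key=thermal_score)   # coolest core, neighbours considered
            busy.add(core)
            return core

        for i in range(5):
            print("process", i, "->", place_process())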

  2. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    International Nuclear Information System (INIS)

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent per $300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low cost high density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX-equivalent processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs

  3. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

    Directory of Open Access Journals (Sweden)

    Yassin El-Hillali

    2009-01-01

    This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for Multiple Target Tracking (MTT) in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets) within its field of view. As the number of targets increases, the computational needs of the MTT system also increase, making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.

  4. Performance of Barotropic Ocean Models on Shared and Distributed Memory Computers

    National Research Council Canada - National Science Library

    Piacsek, S

    1998-01-01

    The efficiency of explicit time integration schemes for barotropic models of the Mediterranean was investigated in the context of the vectorization and parallel modeling approaches employed on different architectures...

  5. System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir

    2008-01-01

    is the first such attempt in the published literature. The second part of the thesis deals with the issues related to the development of system-level design methodologies for networked multiprocessor systems-on-chip at various levels of design abstraction with special focus on the modeling and design...... at the system-level. The multiprocessor modeling framework is then extended to include models of networked multiprocessor systems-on-chip which is then employed to model wireless sensor networks both at the sensor node level as well as the wireless network level. In the third and the final part, the thesis...... to the transaction-level model. The thesis, as a whole makes contributions by describing a design methodology for networked multiprocessor embedded systems at three layers of abstraction from system-level through transaction-level to the cycle accurate level as well as demonstrating it practically by implementing...

  6. The Cortex Transform as an image preprocessor for sparse distributed memory: An initial study

    Science.gov (United States)

    Olshausen, Bruno; Watson, Andrew

    1990-01-01

    An experiment is described which was designed to evaluate the use of the Cortex Transform as an image preprocessor for Sparse Distributed Memory (SDM). In the experiment, a set of images was injected with Gaussian noise, preprocessed with the Cortex Transform, and then encoded into bit patterns. The various spatial frequency bands of the Cortex Transform were encoded separately so that they could be evaluated based on their ability to properly cluster patterns belonging to the same class. The results of this study indicate that by simply encoding the low-pass band of the Cortex Transform, a very suitable input representation for the SDM can be achieved.

  7. Efficient packing of patterns in sparse distributed memory by selective weighting of input bits

    Science.gov (United States)

    Kanerva, Pentti

    1991-01-01

    When a set of patterns is stored in a distributed memory, any given storage location participates in the storage of many patterns. From the perspective of any one stored pattern, the other patterns act as noise, and such noise limits the memory's storage capacity. The more similar the retrieval cues for two patterns are, the more the patterns interfere with each other in memory, and the harder it is to separate them on retrieval. A method is described of weighting the retrieval cues to reduce such interference and thus to improve the separability of patterns that have similar cues.

  8. A database for on-line event analysis on a distributed memory machine

    CERN Document Server

    Argante, E; Van der Stok, P D V; Willers, Ian Malcolm

    1995-01-01

    Parallel in-memory databases can enhance the structuring and parallelization of programs used in High Energy Physics (HEP). Efficient database access routines are used as communication primitives which hide the communication topology, in contrast to more explicit communications like PVM or MPI. A parallel in-memory database, called SPIDER, has been implemented on a 32 node Meiko CS-2 distributed memory machine. The SPIDER primitives generate a lower overhead than the one generated by PVM or MPI. The event reconstruction program, CPREAD, of the CPLEAR experiment, has been used as a test case. Performance measurements showed that CPREAD interfaced to SPIDER can easily cope with the event rate generated by CPLEAR.

  9. Standard interfaces for program-modular multiprocessor systems

    International Nuclear Information System (INIS)

    Chernykh, E.V.

    1982-01-01

    The peculiarities of the structures of existing and newly developed standard interfaces used in automation systems for nuclear physics experiments are considered. General structural characteristics of multiprocessor system interfaces are identified. The existing CAMAC crate system and the designed COMPEX, E3S and FASTBUS interface standards are compared by capacity and relative cost. The analysis of the given data shows that the operation of any interface is most advantageous at rates close to its capacity, where the relative cost is at a minimum. In this case the advantage lies with interfaces of greater capacity, for which the relative cost grows more slowly under a moderate decrease of the exchange or request-processing rate. A higher capacity of single-cycle exchange is provided by functional dataway specialization in the interface. The conclusion is drawn that the most promising trend in the development of automation systems for high energy physics experiments is the use of the FASTBUS standard

  10. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
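
    The load-distribution idea can be sketched as a throughput-proportional split of the database sequences between one GPU worker and a configurable number of CPU workers. The throughput numbers and the align() placeholder below are illustrative assumptions, not the paper's implementation or a real Smith-Waterman kernel.

        # Sketch of throughput-proportional workload partitioning between a GPU worker
        # and n_cpus CPU workers; GCUPS figures and align() are placeholders (assumptions).
        def split_workload(n_sequences, gpu_gcups, cpu_gcups, n_cpus):
            throughputs = [gpu_gcups] + [cpu_gcups] * n_cpus
            total = sum(throughputs)
            shares = [round(n_sequences * t / total) for t in throughputs]
            shares[0] += n_sequences - sum(shares)      # give any rounding slack to the GPU
            return shares                               # [gpu_share, cpu1_share, ...]

        def align(worker, sequences):
            # Placeholder for the alignment kernel running on that worker.
            return f"{worker}: {len(sequences)} sequences"

        database = [f"seq{i}" for i in range(10_000)]
        shares = split_workload(len(database), gpu_gcups=9.0, cpu_gcups=1.5, n_cpus=3)

        start = 0
        for worker, count in zip(["GPU", "CPU1", "CPU2", "CPU3"], shares):
            print(align(worker, database[start:start + count]))
            start += count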

  11. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  12. A measurement-based performability model for a multiprocessor system

    Science.gov (United States)

    Ilsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.

    1987-01-01

    A measurement-based performability model based on real error data collected on a multiprocessor system is described. Model development from the raw error data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simple exponentials and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.
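
    The reward computation in such a semi-Markov model can be illustrated numerically: the stationary probabilities of the embedded transition chain, weighted by the mean holding times, give the long-run fraction of time in each state, and the reward-weighted average of those fractions gives the long-run reward rate. All state names, probabilities, holding times, and reward rates below are invented for illustration; they are not the measured values from the study.

        # Small numeric sketch of a semi-Markov reward computation with invented numbers.
        import numpy as np

        states = ["normal", "degraded", "recovery"]
        P = np.array([[0.00, 0.90, 0.10],       # embedded transition probabilities
                      [0.80, 0.00, 0.20],
                      [1.00, 0.00, 0.00]])
        holding = np.array([100.0, 20.0, 5.0])  # mean holding time per visit
        reward = np.array([1.0, 0.5, 0.0])      # reward (service) rate per state

        # Stationary distribution of the embedded chain: left eigenvector for eigenvalue 1.
        vals, vecs = np.linalg.eig(P.T)
        pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
        pi = pi / pi.sum()

        time_fraction = pi * holding / np.dot(pi, holding)
        print("time fractions:", dict(zip(states, np.round(time_fraction, 3))))
        print("long-run reward rate:", float(np.dot(time_fraction, reward)))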

  13. Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Traian; Pop, Paul; Eles, Petru

    2008-01-01

    We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource......, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic to such hierarchically scheduled systems: assignment...... of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system...

  14. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of the design and to evaluate its performance. A high-level language can provide maximum flexibility in this respect if constructs for handling concurrent processes and a time-mapping mechanism are added. This paper describes a novel technique for emulating the hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time-mapping mechanism. Finally, a high-level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
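
    The timing-wheel idea can be sketched as follows: each emulated hardware process is a generator that yields the delay to its next entry point, and the wheel advances simulated time slot by slot, resuming whichever subprograms are due. The slot count, delays, and producer/consumer processes are illustrative assumptions, not the scheme of the paper.

        # Minimal timing-wheel sketch: each emulated process yields the delay (in ticks)
        # until its next entry point; the wheel resumes whatever is scheduled per slot.
        NUM_SLOTS = 16

        def producer():
            for i in range(4):
                print(f"  producer writes item {i}")
                yield 3                       # next entry point 3 ticks later

        def consumer():
            for i in range(4):
                print(f"  consumer reads item {i}")
                yield 5

        wheel = [[] for _ in range(NUM_SLOTS)]

        def schedule(proc, delay, now):
            wheel[(now + delay) % NUM_SLOTS].append(proc)

        for proc in (producer(), consumer()):
            schedule(proc, 0, 0)              # both start at time 0

        for tick in range(NUM_SLOTS * 2):     # run two turns of the wheel
            slot = tick % NUM_SLOTS
            due, wheel[slot] = wheel[slot], []
            for proc in due:
                print(f"t={tick}:")
                try:
                    delay = next(proc)        # resume until the next entry point
                    schedule(proc, delay, slot)
                except StopIteration:
                    pass                      # emulated process has finished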

  15. E-Token Energy-Aware Proportionate Sharing Scheduling Algorithm for Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Pasupuleti Ramesh

    2017-01-01

    WSNs play a vital role, from small-scale healthcare surveillance systems to large-scale environmental monitoring. Their design for energy-constrained applications is a challenging issue. Sensors in WSNs are expected to run unattended for long periods. Replacing exhausted batteries is costly and is not even possible in hostile situations. Multiprocessors are used in WSNs for high-performance scientific computing, where each processor is assigned the same or a different workload. When the computational demands of the system increase, energy-efficient approaches play an important role in increasing system lifetime. Energy efficiency is commonly pursued by using a proportionate fair scheduler, but this introduces an abnormal overloading effect. In order to overcome these problems, E-token Energy-Aware Proportionate Sharing (EEAPS) scheduling is proposed here. The power consumption of each thread/task is calculated and the tasks are allotted to the multiple processors through an auctioning mechanism. The algorithm is simulated using the real-time simulator (RTSIM) and the results are tested.

  16. A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2009-01-01

    Runtime adaptivity of hardware in processor architectures is a novel trend, which is under investigation in a variety of research labs all over the world. The runtime exchange of modules, implemented on reconfigurable hardware, affects the instruction flow (e.g., in reconfigurable instruction set processors) or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn, where single/multiple data and instruction streams were combined into four types of architectures. This classification is now used as a foundation for an extended classification scheme including runtime adaptivity as a further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.

  17. Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

    OpenAIRE

    Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

    2005-01-01

    An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such syst...

  18. Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.; Schaefer, H.F. III

    1987-01-01

    For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32 processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four index integral transformation

  19. Computational Fluid Dynamics (CFD) Computations With Zonal Navier-Stokes Flow Solver (ZNSFLOW) Common High Performance Computing Scalable Software Initiative (CHSSI) Software

    National Research Council Canada - National Science Library

    Edge, Harris

    1999-01-01

    ...), computational fluid dynamics (CFD) 6 project. Under the project, a proven zonal Navier-Stokes solver was rewritten for scalable parallel performance on both shared memory and distributed memory high performance computers...

  20. An efficient communication scheme for solving Sn equations on message-passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1993-01-01

    Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations, such as global sums, etc. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks
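
    The pairwise-exchange pattern behind such schemes can be sketched with mpi4py: in step k each rank exchanges its partial sum with the rank whose ID differs in bit k, so after log2(P) steps every rank holds the global sum and no processor sits idle, in contrast to a Fan-in spanning tree that halves the number of active ranks at each step. This sketch shows the recursive-doubling form of the idea; the recursive-halving variant discussed above additionally splits the data, which matters for long vectors. It assumes a power-of-two number of ranks.

        # Pairwise-exchange global sum with mpi4py (run e.g. mpirun -np 8 python sum.py).
        # Assumes a power-of-two number of ranks; every rank stays active in every step.
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        assert size & (size - 1) == 0, "sketch assumes a power-of-two number of ranks"

        partial = float(rank + 1)          # each rank's local contribution

        step = 1
        while step < size:
            partner = rank ^ step          # rank whose ID differs in this bit
            received = comm.sendrecv(partial, dest=partner, source=partner)
            partial += received
            step <<= 1

        print(f"rank {rank}: global sum = {partial}")   # size*(size+1)/2 on every rank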

  1. A data base for on-line event analysis on a distributed memory machine

    International Nuclear Information System (INIS)

    Argante, E.; Meesters, M.R.J.; Willers, I.; Stok, P. van der

    1996-01-01

    Parallel in-memory databases can enhance the structuring and parallelization of programs used in High Energy Physics (HEP). Efficient database access routines are used as communication primitives which hide the communication topology, in contrast to more explicit communications like PVM or MPI. A parallel in-memory database, called SPIDER, has been implemented on a 32 node Meiko CS-2 distributed memory machine. The SPIDER primitives generate a lower overhead than the one generated by PVM or MPI. The event reconstruction program, CPREAD, of the CPLEAR experiment, has been used as a test case. Performance measurements showed that CPREAD interfaced to SPIDER can easily cope with the event rate generated by CPLEAR. (author)

  2. Comparison between sparsely distributed memory and Hopfield-type neural network models

    Science.gov (United States)

    Keeler, James D.

    1986-01-01

    The Sparsely Distributed Memory (SDM) model (Kanerva, 1984) is compared to Hopfield-type neural-network models. A mathematical framework for comparing the two is developed, and the capacity of each model is investigated. The capacity of the SDM can be increased independently of the dimension of the stored vectors, whereas the Hopfield capacity is limited to a fraction of this dimension. However, the total number of stored bits per matrix element is the same in the two models, as well as for extended models with higher order interactions. The models are also compared in their ability to store sequences of patterns. The SDM is extended to include time delays so that contextual information can be used to cover sequences. Finally, it is shown how a generalization of the SDM allows storage of correlated input pattern vectors.

  3. Sparse Distributed Memory: understanding the speed and robustness of expert memory

    Directory of Open Access Journals (Sweden)

    Marcelo Salhab Brogliato

    2014-04-01

    How can experts, sometimes in exacting detail, almost immediately and very precisely recall memory items from a vast repertoire? The problem in which we will be interested concerns models of theoretical neuroscience that could explain the speed and robustness of an expert's recollection. The approach is based on Sparse Distributed Memory, which has been shown to be plausible, both in a neuroscientific and in a psychological manner, in a number of ways. A crucial characteristic concerns the limits of human recollection, the 'tip-of-the-tongue' memory event, which is found at a non-linearity in the model. We expand the theoretical framework, deriving an optimization formula to solve this non-linearity. Numerical results demonstrate how a higher frequency of rehearsal, through work or study, immediately increases the robustness and speed associated with expert memory.

  4. Parallel definition of tear film maps on distributed-memory clusters for the support of dry eye diagnosis.

    Science.gov (United States)

    González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J

    2017-02-01

    The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
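
    The authors' MPI/C++11 implementation is not reproduced here; the sketch below shows the same hybrid pattern in Python, with mpi4py distributing the image list across nodes and a thread pool providing intra-node parallelism. The process_image() function is a placeholder for the tear film map computation, and the file names and worker counts are illustrative assumptions.

        # Hybrid MPI + threads pattern only (run e.g. mpirun -np 4 python hybrid.py).
        # process_image() is a placeholder, not the authors' tear film map algorithm.
        from concurrent.futures import ThreadPoolExecutor
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        def process_image(name):
            # Placeholder for the per-image tear film map definition.
            return f"{name}: map computed on rank {rank}"

        if rank == 0:
            images = [f"image_{i:02d}.png" for i in range(50)]
            chunks = [images[i::size] for i in range(size)]    # simple static partition
        else:
            chunks = None

        my_images = comm.scatter(chunks, root=0)

        with ThreadPoolExecutor(max_workers=4) as pool:        # intra-node threads
            results = list(pool.map(process_image, my_images))

        all_results = comm.gather(results, root=0)
        if rank == 0:
            print(sum(len(r) for r in all_results), "tear film maps computed")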

  5. A Case for Tamper-Resistant and Tamper-Evident Computer Systems

    National Research Council Canada - National Science Library

    Solihin, Yan

    2007-01-01

    .... These attacks attempt to snoop or modify data transfer between various chips in a computer system such as between the processor and memory, and between processors in a multiprocessor interconnect network...

  6. Supporting Multiprocessors in the Icecap Safety-Critical Java Run-Time Environment

    DEFF Research Database (Denmark)

    Zhao, Shuai; Wellings, Andy; Korsholm, Stephan Erbs

    The current version of the Safety Critical Java (SCJ) specification defines three compliance levels. Level 0 targets single processor programs while Level 1 and 2 can support multiprocessor platforms. Level 1 programs must be fully partitioned but Level 2 programs can also be more globally...... scheduled. As of yet, there is no official Reference Implementation for SCJ. However, the icecap project has produced a Safety-Critical Java Run-time Environment based on the Hardware-near Virtual Machine (HVM). This supports SCJ at all compliance levels and provides an implementation of the safety......-critical Java (javax.safetycritical) package. This is still work-in-progress and lacks certain key features. Among these is the ability to support multiprocessor platforms. In this paper, we explore two possible options to adding multiprocessor support to this environment: the “green thread” and the “native...

  7. A general model for memory interference in a multiprocessor system with memory hierarchy

    Science.gov (United States)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
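
    A small Monte Carlo sketch in the spirit of that model is given below: each processor alternates between a memory request directed at a random memory cluster (with one request served per cluster per cycle) and a fixed processing period, and the reported metric is the average number of busy clusters. The request distribution and timing parameters are illustrative assumptions; the paper itself uses an analytic queuing-network model rather than simulation.

        # Tiny Monte Carlo sketch of memory interference; parameters are illustrative.
        import random

        N_PROCESSORS, N_CLUSTERS = 16, 4
        PROCESSING_TIME, CYCLES = 2, 10_000

        random.seed(0)
        # state[p] > 0: processing cycles left; == 0: ready to issue; == -1: waiting for memory
        state = [0] * N_PROCESSORS
        waiting = [[] for _ in range(N_CLUSTERS)]   # FIFO request queue per memory cluster
        busy_sum = 0

        for _ in range(CYCLES):
            for p in range(N_PROCESSORS):
                if state[p] == 0:
                    waiting[random.randrange(N_CLUSTERS)].append(p)
                    state[p] = -1
                elif state[p] > 0:
                    state[p] -= 1
            busy_sum += sum(1 for q in waiting if q)
            for q in waiting:                       # each cluster serves one request per cycle
                if q:
                    state[q.pop(0)] = PROCESSING_TIME

        print("expected number of busy clusters:", busy_sum / CYCLES)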

  8. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    Science.gov (United States)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  9. A system-level multiprocessor system-on-chip modeling framework

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2004-01-01

    We present a system-level modeling framework to model system-on-chips (SoC) consisting of heterogeneous multiprocessors and network-on-chip communication structures in order to enable the developers of today's SoC designs to take advantage of the flexibility and scalability of network-on-chip and...... SoC design. We show how a hand-held multimedia terminal, consisting of JPEG, MP3 and GSM applications, can be modeled as a multiprocessor SoC in our framework....

  10. Mapping of H.264 decoding on a multiprocessor architecture

    Science.gov (United States)

    van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

    2003-05-01

    Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of, e.g., a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that data communication is reduced by up to 65%, directly improving the overall performance. Apart from the considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the

  11. Design of a modular digital computer system, DRL 4. [for meeting future requirements of spaceborne computers

    Science.gov (United States)

    1972-01-01

    The design is reported of an advanced modular computer system designated the Automatically Reconfigurable Modular Multiprocessor System, which anticipates requirements for higher computing capacity and reliability for future spaceborne computers. Subjects discussed include: an overview of the architecture, mission analysis, synchronous and nonsynchronous scheduling control, reliability, and data transmission.

  12. Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1997-01-01

    The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128 node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce), which utilize the Paragon's native corresponding operations based on spanning-tree routing; (2) the Bucket algorithm, wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and of the convergence test; (3) a distributed memory version of the Bucket algorithm that pushes the spatial domain decomposition one step further by actually distributing the fixed source and flux iterates over the memories of the participating processes. The conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memory to hold the entire problem arrays. Otherwise, the third scheme becomes necessary, at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model

  13. Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

    KAUST Repository

    Chávez, Gustavo

    2017-12-15

    We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has O(k N log N (log N + k^2)) arithmetic complexity and O(k N log N) memory footprint, where N is the number of degrees of freedom and k is the rank of a block in the hierarchical approximation, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method with and without hierarchical semi-separable matrices, with algebraic multigrid and with the classic cyclic reduction method. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and size of the factors, with substantially lower numerical ranks. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides. Numerical experiments show that the rank k patterns are of O(1) for the Poisson equation and of O(n) for the indefinite Helmholtz equation. The solver is ideal in situations where low-accuracy solutions are sufficient, or otherwise as a preconditioner within an iterative method.

  14. Accelerated Cyclic Reduction: A Distributed-Memory Fast Solver for Structured Linear Systems

    KAUST Repository

    Chávez, Gustavo; Turkiyyah, George; Zampini, Stefano; Ltaief, Hatem; Keyes, David E.

    2017-01-01

    We present Accelerated Cyclic Reduction (ACR), a distributed-memory fast solver for rank-compressible block tridiagonal linear systems arising from the discretization of elliptic operators, developed here for three dimensions. Algorithmic synergies between Cyclic Reduction and hierarchical matrix arithmetic operations result in a solver that has O(kN log N (log N + k^2)) arithmetic complexity and O(kN log N) memory footprint, where N is the number of degrees of freedom and k is the rank of a block in the hierarchical approximation, and which exhibits substantial concurrency. We provide a baseline for performance and applicability by comparing with the multifrontal method with and without hierarchical semi-separable matrices, with algebraic multigrid and with the classic cyclic reduction method. Over a set of large-scale elliptic systems with features of nonsymmetry and indefiniteness, the robustness of the direct solvers extends beyond that of the multigrid solver, and relative to the multifrontal approach ACR has lower or comparable execution time and size of the factors, with substantially lower numerical ranks. ACR exhibits good strong and weak scaling in a distributed context and, as with any direct solver, is advantageous for problems that require the solution of multiple right-hand sides. Numerical experiments show that the rank k patterns are of O(1) for the Poisson equation and of O(n) for the indefinite Helmholtz equation. The solver is ideal in situations where low-accuracy solutions are sufficient, or otherwise as a preconditioner within an iterative method.

  15. A scalable single-chip multi-processor architecture with on-chip RTOS kernel

    NARCIS (Netherlands)

    Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

    2003-01-01

    Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a

  16. Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    NARCIS (Netherlands)

    Moreira, O.

    2012-01-01

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for

  17. DAEDALUS: System-Level Design Methodology for Streaming Multiprocessor Embedded Systems on Chips

    NARCIS (Netherlands)

    Stefanov, T.; Pimentel, A.; Nikolov, H.; Ha, S.; Teich, J.

    2017-01-01

    The complexity of modern embedded systems, which are increasingly based on heterogeneous multiprocessor system-on-chip (MPSoC) architectures, has led to the emergence of system-level design. To cope with this design complexity, system-level design aims at raising the abstraction level of the design

  18. Abstractions for aperiodic multiprocessor scheduling of real-time stream processing applications

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.

    2015-01-01

    Embedded multiprocessor systems are often used in the domain of real-time stream processing applications to keep up with increasing power and performance requirements. Examples of such real-time stream processing applications are digital radio baseband processing and WLAN transceivers. These stream

  19. Shared random access memory resource for multiprocessor real-time systems

    International Nuclear Information System (INIS)

    Dimmler, D.G.; Hardy, W.H. II

    1977-01-01

    A shared random-access memory resource is described which is used within real-time data acquisition and control systems with multiprocessor and multibus organizations. Hardware and software aspects are discussed in a specific example where interconnections are done via a UNIBUS. The general applicability of the approach is also discussed

  20. Cache aware mapping of streaming applications on a multiprocessor system-on-chip

    NARCIS (Netherlands)

    Moonen, A.J.M.; Bekooij, M.J.G.; Berg, van den R.M.J.; Meerbergen, van J.; Sciuto, D.; Peng, Z.

    2008-01-01

    Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system- on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to a low processor

  1. 3D-TV Rendering on a Multiprocessor System on a Chip

    NARCIS (Netherlands)

    Van Eijndhoven, J.T.J.; Li, X.

    2006-01-01

    This thesis focuses on the issue of mapping 3D-TV rendering applications to a multiprocessor platform. The target platform aims to address tomorrow's multi-media consumer market. The prototype chip, called Wasabi, contains a set of TriMedia processors that communicate via a shared memory, fast

  2. An FPGA design flow for reconfigurable network-based multi-processor systems on chip

    NARCIS (Netherlands)

    Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

    2007-01-01

    Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity

  3. Multi-processor system-level synthesis for multiple applications on platform FPGA

    NARCIS (Netherlands)

    Kumar, A.; Fernando, S.D.; Ha, Y.; Mesman, B.; Corporaal, H.; Bertels, Koen

    2007-01-01

    Multiprocessor systems-on-chip (MPSoC) are being developed in increasing numbers to support the high number of applications running on modern embedded systems. Designing and programming such systems prove to be a major challenge. Most of the current design methodologies rely on creating the design

  4. Kmerind: A Flexible Parallel Library for K-mer Indexing of Biological Sequences on Distributed Memory Systems.

    Science.gov (United States)

    Pan, Tony; Flick, Patrick; Jain, Chirag; Liu, Yongchao; Aluru, Srinivas

    2017-10-09

    Counting and indexing fixed length substrings, or k-mers, in biological sequences is a key step in many bioinformatics tasks including genome alignment and mapping, genome assembly, and error correction. While advances in next generation sequencing technologies have dramatically reduced the cost and improved latency and throughput, few bioinformatics tools can efficiently process the datasets at the current generation rate of 1.8 terabases every 3 days. We present Kmerind, a high performance parallel k-mer indexing library for distributed memory environments. The Kmerind library provides a set of simple and consistent APIs with sequential semantics and parallel implementations that are designed to be flexible and extensible. Kmerind's k-mer counter performs similarly or better than the best existing k-mer counting tools even on shared memory systems. In a distributed memory environment, Kmerind counts k-mers in a 120 GB sequence read dataset in less than 13 seconds on 1024 Xeon CPU cores, and fully indexes their positions in approximately 17 seconds. Querying for 1% of the k-mers in these indices can be completed in 0.23 seconds and 28 seconds, respectively. Kmerind is the first k-mer indexing library for distributed memory environments, and the first extensible library for general k-mer indexing and counting. Kmerind is available at https://github.com/ParBLiSS/kmerind.
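
    The distributed counting pattern that such a library relies on can be pictured with the generic "owner computes" sketch below. This is not the Kmerind API; the 2-bit encoding, hashing, and buffer names are assumptions made for illustration.

        // Generic sketch of distributed k-mer counting by hashing each k-mer to an owner rank.
        #include <mpi.h>
        #include <cstdint>
        #include <string>
        #include <unordered_map>
        #include <vector>

        // Pack an A/C/G/T k-mer (k <= 32) into a 64-bit integer (assumed encoding).
        static uint64_t pack(const std::string& s, size_t pos, int k) {
            uint64_t v = 0;
            for (int i = 0; i < k; ++i) {
                uint64_t b = 0;
                switch (s[pos + i]) { case 'C': b = 1; break; case 'G': b = 2; break;
                                      case 'T': b = 3; break; default: b = 0; }
                v = (v << 2) | b;
            }
            return v;
        }

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            const int k = 21;
            std::string reads = "ACGTACGTACGTACGTACGTACGTACGT";   // stand-in for this rank's input

            // Bucket each local k-mer by its owner rank = hash(kmer) % nprocs.
            std::vector<std::vector<uint64_t>> outgoing(nprocs);
            for (size_t p = 0; p + k <= reads.size(); ++p) {
                uint64_t km = pack(reads, p, k);
                outgoing[std::hash<uint64_t>{}(km) % nprocs].push_back(km);
            }

            // Exchange bucket sizes, then the k-mers themselves.
            std::vector<int> sendcnt(nprocs), recvcnt(nprocs), sdispl(nprocs), rdispl(nprocs);
            for (int r = 0; r < nprocs; ++r) sendcnt[r] = (int)outgoing[r].size();
            MPI_Alltoall(sendcnt.data(), 1, MPI_INT, recvcnt.data(), 1, MPI_INT, MPI_COMM_WORLD);

            std::vector<uint64_t> sendbuf;
            for (int r = 0; r < nprocs; ++r) {
                sdispl[r] = (int)sendbuf.size();
                sendbuf.insert(sendbuf.end(), outgoing[r].begin(), outgoing[r].end());
            }
            int total = 0;
            for (int r = 0; r < nprocs; ++r) { rdispl[r] = total; total += recvcnt[r]; }
            std::vector<uint64_t> recvbuf(total);
            MPI_Alltoallv(sendbuf.data(), sendcnt.data(), sdispl.data(), MPI_UINT64_T,
                          recvbuf.data(), recvcnt.data(), rdispl.data(), MPI_UINT64_T, MPI_COMM_WORLD);

            // Each rank now counts only the k-mers it owns.
            std::unordered_map<uint64_t, uint32_t> counts;
            for (uint64_t km : recvbuf) ++counts[km];

            MPI_Finalize();
            return 0;
        }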

  5. A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

    Directory of Open Access Journals (Sweden)

    Sergio Briguglio

    2003-01-01

    Full Text Available A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage and OpenMP (at the intra-node one. The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
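
    The hierarchical decomposition can be sketched as follows. The paper integrates HPF at the inter-node stage and OpenMP at the intra-node stage; in this hedged sketch MPI stands in for HPF, the particle counts are assumptions, and the field solve is omitted.

        // Two-level decomposition sketch: MPI across nodes, OpenMP within a node.
        // Compile with an MPI wrapper and -fopenmp.
        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int node;
            MPI_Comm_rank(MPI_COMM_WORLD, &node);
            (void)node;                              // node index, unused in this minimal sketch

            const int nlocal = 100000;               // particles assigned to this node (assumed)
            std::vector<double> x(nlocal, 0.0), v(nlocal, 1.0);
            const double dt = 1.0e-3;

            // Lower level: the particle push for this node's share is split among
            // the processors of the node with OpenMP.
            #pragma omp parallel for
            for (int i = 0; i < nlocal; ++i)
                x[i] += v[i] * dt;

            // Higher level: grid quantities accumulated on each node are combined
            // across nodes (field solve omitted).
            double local_charge = (double)nlocal, total_charge = 0.0;
            MPI_Allreduce(&local_charge, &total_charge, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            MPI_Finalize();
            return 0;
        }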

  6. Estimating Performance of Single Bus, Shared Memory Multiprocessors

    Science.gov (United States)

    1987-05-01

    [Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queueing network models of computing systems," Computing Surveys, vol. 10, no. 3 ... [Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models," Computing Surveys, vol. 10, no. 3, September 1978, pp. 225-261

  7. Safe and Efficient Support for Embedded Multi-Processors in ADA

    Science.gov (United States)

    Ruiz, Jose F.

    2010-08-01

    New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.

  8. Resource Allocation Model for Modelling Abstract RTOS on Multiprocessor System-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2003-01-01

    Resource Allocation is an important problem in RTOS's, and has been an active area of research. Numerous approaches have been developed and many different techniques have been combined for a wide range of applications. In this paper, we address the problem of resource allocation in the context...... of modelling an abstract RTOS on multiprocessor SoC platforms. We discuss the implementation details of a simplified basic priority inheritance protocol for our abstract system model in SystemC....

  9. Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

    Directory of Open Access Journals (Sweden)

    Yahya Jan

    2012-01-01

    Full Text Available This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrated that for the massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and the memory and communication network influence on both the throughput and circuit area dominates the processors' influence. To resolve the problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures through exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed some data distribution and related data mapping schemes in the shared (global) partitioned memories with the aim of eliminating memory access conflicts, as well as to ensure that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multi-processor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well scalable memory and communication architectures even for the high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained when using the hierarchical communication networks instead of the flat networks. However, for high parallelism levels only the partitioned approach ensures the scalability in performance.

  10. Dynamic scheduling and analysis of real time systems with multiprocessors

    Directory of Open Access Journals (Sweden)

    M.D. Nashid Anjum

    2016-08-01

    Full Text Available This research work considers a scenario of cloud computing job-shop scheduling problems. We consider m real-time jobs of various lengths and n machines with different computational speeds and costs. Each job has a deadline to be met, and the profit of processing a packet of a job differs from job to job. Moreover, deadlines are either hard or soft, and a penalty is applied if a deadline is missed, where the penalty is modelled as an exponential function of time. The scheduling problem has been formulated as a mixed integer non-linear programming problem whose objective is to maximize net-profit. The formulated problem is computationally hard and not solvable in deterministic polynomial time. This research work proposes an algorithm named the Tube-tap algorithm as a solution to this scheduling optimization problem. Extensive simulation shows that the proposed algorithm outperforms existing solutions in terms of maximizing net-profit and preserving deadlines.

  11. A multitasking, multisinked, multiprocessor data acquisition front end

    International Nuclear Information System (INIS)

    Fox, R.; Au, R.; Molen, A.V.

    1989-01-01

    The authors have developed a generalized data acquisition front end system which is based on MC68020 processors running a commercial real time kernel (rhoSOS), and implemented primarily in a high level language (C). This system has been attached to the back end on-line computing system at NSCL via our high performance ETHERNET protocol. Data may be simultaneously sent to any number of back end systems. Fixed fraction sampling along links to back end computing is also supported. A nonprocedural program generator simplifies the development of experiment specific code

  12. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems

    KAUST Repository

    Sukkari, Dalal; Ltaief, Hatem; Esposito, Aniello; Keyes, David E.

    2017-01-01

    … the inherent high level of concurrency associated with Level 3 BLAS compute-bound kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology …
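
    For reference, a minimal sketch of the one-dimensional block-cyclic index mapping that, applied independently to row and column indices, yields the two-dimensional ScaLAPACK distribution mentioned above. The block size, number of processes, and the assumption that the first block lives on process 0 are illustrative choices, not values taken from the paper.

        // Block-cyclic mapping: global index -> (owning process coordinate, local index).
        #include <cstdio>

        struct Owner { int proc; int local; };

        Owner map1d(int g, int nb, int P) {
            Owner o;
            o.proc  = (g / nb) % P;                    // process coordinate owning global index g
            o.local = (g / (nb * P)) * nb + (g % nb);  // position within that process's local array
            return o;
        }

        int main() {
            // Example: global matrix row 1000, block size 64, 4 process rows.
            Owner o = map1d(1000, 64, 4);
            std::printf("row 1000 -> process row %d, local row %d\n", o.proc, o.local);
            return 0;
        }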

  13. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  14. Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

    NARCIS (Netherlands)

    Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

  15. A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

    Science.gov (United States)

    1989-02-01

    frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of ... generality for rendering curved surfaces, volume data, objects described with Constructive Solid Geometry, for rendering scenes using the radiosity ... faces and for computing a spherical radiosity lighting model (see Section 7.6). [Diagram: Custom Memory Chips, 208 bits x 128 pixels, Renderer Board]

  16. Stereo Vision and 3D Reconstruction on a Distributed Memory System

    NARCIS (Netherlands)

    Kuijpers, N.H.L.; Paar, G.; Lukkien, J.J.

    1996-01-01

    An important research topic in image processing is stereo vision. The objective is to compute a 3-dimensional representation of some scenery from two 2-dimensional digital images. Constructing a 3-dimensional representation involves finding pairs of pixels from the two images which correspond to the

  17. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

    OpenAIRE

    Shehzad, Danish; Bozkuş, Zeki

    2016-01-01

    Increase in complexity of neuronal network models escalated the efforts to make NEURON simulation environment efficient. The computational neuroscientists divided the equations into subnets amongst multiple processors for achieving better hardware performance. On parallel machines for neuronal networks, interprocessor spikes exchange consumes large section of overall simulation time. In NEURON for communication between processors Message Passing Interface (MPI) is used. MPI_Allgather collecti...

  18. Capacity for patterns and sequences in Kanerva's SDM as compared to other associative memory models. [Sparse, Distributed Memory

    Science.gov (United States)

    Keeler, James D.

    1988-01-01

    The information capacity of Kanerva's Sparse Distributed Memory (SDM) and Hopfield-type neural networks is investigated. Under the approximations used here, it is shown that the total information stored in these systems is proportional to the number of connections in the network. The proportionality constant is the same for the SDM and Hopfield-type models, independent of the particular model or the order of the model. The approximations are checked numerically. This same analysis can be used to show that the SDM can store sequences of spatiotemporal patterns, and the addition of time-delayed connections allows the retrieval of context-dependent temporal patterns. A minor modification of the SDM can be used to store correlated patterns.

  19. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Energy Technology Data Exchange (ETDEWEB)

    Ohmacht, Martin

    2017-08-15

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  20. Quality-Driven Model-Based Design of MultiProcessor Embedded Systems for Highlydemanding Applications

    DEFF Research Database (Denmark)

    Jozwiak, Lech; Madsen, Jan

    2013-01-01

    The recent spectacular progress in modern nano-dimension semiconductor technology enabled implementation of a complete complex multi-processor system on a single chip (MPSoC), global networking and mobile wire-less communication, and facilitated a fast progress in these areas. New important...... accessible or distant) objects, installations, machines or devices, or even implanted in human or animal body can serve as examples. However, many of the modern embedded application impose very stringent functional and parametric demands. Moreover, the spectacular advances in microelectronics introduced...

  1. A high speed multi-tasking, multi-processor telemetry system

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)

    1996-12-31

    This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates is also given. 11 refs., 7 figs., 4 tabs.

  2. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    Science.gov (United States)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
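
    The d-stage hypercube exchange pattern referred to above follows a standard shape: in stage i each node swaps its current labelling information with the neighbour that differs in bit i of its rank, doubling its knowledge of the domain each stage. The sketch below is illustrative, not the original iPSC/2 code; the payload size and the merge step are assumptions.

        // Dimension-by-dimension hypercube exchange (nprocs assumed to be a power of two).
        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            const int payload = 1024;                  // per-node summary of connected sets (assumed)
            std::vector<int> mine(payload, rank), theirs(payload);

            for (int bit = 1; bit < nprocs; bit <<= 1) {
                int partner = rank ^ bit;              // neighbour across dimension log2(bit)
                MPI_Sendrecv(mine.data(), payload, MPI_INT, partner, 0,
                             theirs.data(), payload, MPI_INT, partner, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                // ... merge 'theirs' into 'mine' (the label-merging step is omitted here) ...
            }

            MPI_Finalize();
            return 0;
        }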

  3. A parallel implementation of 3-d CT image reconstruction on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.; Cho, Z.H.

    1990-01-01

    In this paper, the authors describe how image reconstruction in computerized tomography (CT) can be parallelized on a message-passing multiprocessor. In particular, the results obtained from a parallel implementation of 3-D CT image reconstruction for parallel beam geometries on the Intel hypercube, iPSC/2, are presented. A two-stage pipelining approach is employed for filtering (convolution) and backprojection. The conventional sequential convolution algorithm is modified such that the symmetry of the filter kernel is fully utilized for parallelization. In the backprojection stage, the 3-D incremental algorithm, the authors' recently developed backprojection scheme which is shown to be faster than the conventional algorithm, is parallelized
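
    The kernel-symmetry idea mentioned above can be sketched as follows: since a symmetric filter satisfies h[-m] = h[m], each multiplication serves a pair of mirrored samples, roughly halving the arithmetic. This is an illustrative sketch only; the paper's exact filter and pipelining scheme are not reproduced here.

        // Convolution of one projection with a symmetric kernel h[m], m >= 0.
        #include <vector>

        std::vector<double> convolve_symmetric(const std::vector<double>& p,   // projection samples
                                               const std::vector<double>& h) { // half-kernel, h[0..M-1]
            const int n = (int)p.size(), M = (int)h.size();
            std::vector<double> q(n, 0.0);
            for (int i = 0; i < n; ++i) {
                double acc = h[0] * p[i];
                for (int m = 1; m < M; ++m) {
                    double s = 0.0;
                    if (i - m >= 0) s += p[i - m];
                    if (i + m < n)  s += p[i + m];
                    acc += h[m] * s;               // one multiply covers the +/- m sample pair
                }
                q[i] = acc;
            }
            return q;
        }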

  4. Energy-efficient fault tolerance in multiprocessor real-time systems

    Science.gov (United States)

    Guo, Yifeng

    The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  5. A survey of Tumult, a real-time multi-processor system

    International Nuclear Information System (INIS)

    Jansen, P.G.

    1986-01-01

    Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)

  6. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Science.gov (United States)

    Ohmacht, Martin

    2014-09-09

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  7. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  8. A DAFT DL_POLY distributed memory adaptation of the Smoothed Particle Mesh Ewald method

    Science.gov (United States)

    Bush, I. J.; Todorov, I. T.; Smith, W.

    2006-09-01

    The Smoothed Particle Mesh Ewald method [U. Essmann, L. Perera, M.L. Berkowtz, T. Darden, H. Lee, L.G. Pedersen, J. Chem. Phys. 103 (1995) 8577] for calculating long ranged forces in molecular simulation has been adapted for the parallel molecular dynamics code DL_POLY_3 [I.T. Todorov, W. Smith, Philos. Trans. Roy. Soc. London 362 (2004) 1835], making use of a novel 3D Fast Fourier Transform (DAFT) [I.J. Bush, The Daresbury Advanced Fourier transform, Daresbury Laboratory, 1999] that perfectly matches the Domain Decomposition (DD) parallelisation strategy [W. Smith, Comput. Phys. Comm. 62 (1991) 229; M.R.S. Pinches, D. Tildesley, W. Smith, Mol. Sim. 6 (1991) 51; D. Rapaport, Comput. Phys. Comm. 62 (1991) 217] of the DL_POLY_3 code. In this article we describe software adaptations undertaken to import this functionality and provide a review of its performance.

  9. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present...... two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks....... In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs....
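
    For orientation, the sorting bound usually associated with the parallel external-memory (PEM) model is stated below. It is written from general knowledge of the model rather than quoted from the abstract: P processors, private caches of size M, block size B, input size N.

        \[
        \mathrm{sort}_{\mathrm{PEM}}(N) \;=\; O\!\Bigl(\tfrac{N}{PB}\,\log_{M/B}\tfrac{N}{B}\Bigr)
        \quad\text{parallel block transfers.}
        \]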

  10. Numerical simulation of nuclear power plants on multiprocessor systems

    International Nuclear Information System (INIS)

    Boeer, R.; Finnemann, H.; Mueller, R.; Krapf, J.

    1990-04-01

    The present final report gives an account of work performed and results obtained at Siemens/KWU within the frame of SUPRENUM project. The coupled neutron kinetics - thermal hydraulics program system 3D-SIM for the calculation of steady-state and transient conditions in the core and in the coolant loops of light water reactors has been developed. Making efficient use of the parallel computing architecture required the implementation of novel, purpose-built software concepts which can be characterized by multigrid algorithms and parallelization by domain decomposition. The modules treating the reactor core are 3D-NEUT for neutron kinetics and 3D-THERM for thermal hydraulics, the remaining system and control components being represented by the module NLOOP. This modular structure is a main feature of 3D-SIM which allows a flexible choice of applications, employing modules individually or in combinations to fulfil a specific calculation task. In particular, 3D-SIM reactor core representation is fully 3-dimensional for both neutron diffusion and thermal-hydraulic equations, allowing coupled calculations and evaluation of crossflow between fuel elements or subchannels. Including also the loops and plant models, 3D-SIM is a powerful analytical tool for simulating pressurized plant response to a wide spectrum of events. (orig.) With 25 refs., 11 figs [de

  11. Distributed memory in a heterogeneous network, as used in the CERN-PS complex timing system

    CERN Document Server

    Kovaltsov, V I

    1995-01-01

    The Distributed Table Manager (DTM) is a fast and efficient utility for distributing named binary data structures called Tables, of arbitrary size and structure, around a heterogeneous network of computers to a set of registered clients. The Tables are transmitted over a UDP network between DTM servers in network format, where the servers perform the conversions to and from host format for local clients. The servers provide clients with synchronization mechanisms, a choice of network data flows, and table options such as keeping table disc copies, shared memory or heap memory table allocation, table read/write permissions, and table subnet broadcasting. DTM has been designed to be easily maintainable, and to automatically recover from the type of errors typically encountered in a large control system network. The DTM system is based on a three level server daemon hierarchy, in which an inter daemon protocol handles network failures, and incorporates recovery procedures which will guarantee table consistency w...

  12. Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

    NARCIS (Netherlands)

    Jan, Y.

    2012-01-01

    The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,

  13. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
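
    A hedged sketch of the SUMMA pattern that HSUMMA builds on: processes form a sqrt(P) x sqrt(P) grid, and in step k the owners of the k-th block column of A and block row of B broadcast along their grid row and column, after which every process updates its local C block. The panel width is taken equal to the block size to keep the sketch short, and the grid is assumed square; HSUMMA adds a second level of hierarchy to these broadcasts, which is not shown here.

        #include <mpi.h>
        #include <cmath>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            const int q = (int)std::lround(std::sqrt((double)nprocs)); // grid is q x q (assumed square)
            const int myrow = rank / q, mycol = rank % q;
            const int nb = 64;                                         // local block dimension (assumed)

            MPI_Comm rowcomm, colcomm;
            MPI_Comm_split(MPI_COMM_WORLD, myrow, mycol, &rowcomm);    // processes in my grid row
            MPI_Comm_split(MPI_COMM_WORLD, mycol, myrow, &colcomm);    // processes in my grid column

            std::vector<double> A(nb * nb, 1.0), B(nb * nb, 1.0), C(nb * nb, 0.0);
            std::vector<double> Abuf(nb * nb), Bbuf(nb * nb);

            for (int k = 0; k < q; ++k) {
                if (mycol == k) Abuf = A;                  // owner of block column k of A
                if (myrow == k) Bbuf = B;                  // owner of block row k of B
                MPI_Bcast(Abuf.data(), nb * nb, MPI_DOUBLE, k, rowcomm);
                MPI_Bcast(Bbuf.data(), nb * nb, MPI_DOUBLE, k, colcomm);
                for (int i = 0; i < nb; ++i)               // local update C += Abuf * Bbuf
                    for (int j = 0; j < nb; ++j) {
                        double s = 0.0;
                        for (int t = 0; t < nb; ++t) s += Abuf[i * nb + t] * Bbuf[t * nb + j];
                        C[i * nb + j] += s;
                    }
            }

            MPI_Comm_free(&rowcomm);
            MPI_Comm_free(&colcomm);
            MPI_Finalize();
            return 0;
        }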

  14. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

    2013-01-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  15. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  16. Some methods of encoding simple visual images for use with a sparse distributed memory, with applications to character recognition

    Science.gov (United States)

    Jaeckel, Louis A.

    1989-01-01

    To study the problems of encoding visual images for use with a Sparse Distributed Memory (SDM), I consider a specific class of images: those that consist of several pieces, each of which is a line segment or an arc of a circle. This class includes line drawings of characters such as letters of the alphabet. I give a method of representing a segment of an arc by five numbers in a continuous way; that is, similar arcs have similar representations. I also give methods for encoding these numbers as bit strings in an approximately continuous way. The set of possible segments and arcs may be viewed as a five-dimensional manifold M, whose structure is like a Möbius strip. An image, considered to be an unordered set of segments and arcs, is therefore represented by a set of points in M - one for each piece. I then discuss the problem of constructing a preprocessor to find the segments and arcs in these images, although a preprocessor has not been developed. I also describe a possible extension of the representation.

  17. Multi-processor system for real-time deconvolution and flow estimation in medical ultrasound

    DEFF Research Database (Denmark)

    Jensen, Jesper Lomborg; Jensen, Jørgen Arendt; Stetson, Paul F.

    1996-01-01

    of the algorithms. Many of the algorithms can only be properly evaluated in a clinical setting with real-time processing, which generally cannot be done with conventional equipment. This paper therefore presents a multi-processor system capable of performing 1.2 billion floating point operations per second on RF...... filter is used with a second time-reversed recursive estimation step. Here it is necessary to perform about 70 arithmetic operations per RF sample or about 1 billion operations per second for real-time deconvolution. Furthermore, these have to be floating point operations due to the adaptive nature...... interfaced to our previously-developed real-time sampling system that can acquire RF data at a rate of 20 MHz and simultaneously transmit the data at 20 MHz to the processing system via several parallel channels. These two systems can, thus, perform real-time processing of ultrasound data. The advantage...

  18. Commodity multi-processor systems in the ATLAS level-2 trigger

    International Nuclear Information System (INIS)

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-01-01

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems

  19. Multi-processor data acquisition and monitoring systems for particle physics

    International Nuclear Information System (INIS)

    White, V.; Burch, B.; Eng, K.; Heinicke, P.; Pyatetsky, M.; Ritchie, D.

    1983-01-01

    A high speed distributed processing system, using PDP-11 and VAX processors, is being developed at Fermilab. The acquisition of data is done using one or more PDP-11s. Additional processors are connected to provide either data logging or extra data analysis capabilities. Within this framework, functional interchangeability of PDP-11 and VAX processors and of the PDP-11 operating systems, RT-11 and RSX-11M, has been maintained. Inter-processor connections have been implemented in a general way using the 5 megabit DR11-W hardware currently selected for the purpose. Using this approach the authors have been able to make use of several existing data acquisition and analysis packages, such as RT/MULTI, in a multi-processor system

  20. Commodity multi-processor systems in the ATLAS level-2 trigger

    CERN Document Server

    Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

    2000-01-01

    Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).

  1. COMPUTING: International symposium

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

    Recent Developments in Computing, Processor, and Software Research for High Energy Physics, a four-day international symposium, was held in Guanajuato, Mexico, from 8-11 May, with 112 attendees from nine countries. The symposium was the third in a series of meetings exploring activities in leading-edge computing technology in both processor and software research and their effects on high energy physics. Topics covered included fixed-target on- and off-line reconstruction processors; lattice gauge and general theoretical processors and computing; multiprocessor projects; electron-positron collider on- and offline reconstruction processors; state-of-the-art in university computer science and industry; software research; accelerator processors; and proton-antiproton collider on and off-line reconstruction processors

  2. Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

    International Nuclear Information System (INIS)

    Chen, C.M.; Lee, S.Y.

    1995-01-01

    The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication
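
    For context, the iteration being parallelized is the standard maximum-likelihood EM update for emission tomography, written here in common notation as an assumption rather than quoted from the paper: y_i are the measured counts in tube i, c_ij is the probability that an emission from voxel j is detected in tube i, and lambda_j is the current image estimate.

        \[
        \lambda_j^{(k+1)} \;=\; \frac{\lambda_j^{(k)}}{\sum_i c_{ij}}
        \sum_i \frac{c_{ij}\, y_i}{\sum_{j'} c_{ij'}\, \lambda_{j'}^{(k)}}.
        \]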

  3. Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors

    Science.gov (United States)

    Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.

    1994-10-01

    This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications, implemented on a multi-processor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz), run-time overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead, it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as well as any static scheduling method. It also reduces the total amount of dynamic pre-emptions compared with run-time methods like deadline monotonic scheduling.

  4. Multiprocessor systems for real-time data acquisition on the Asdex upgrade and future plasma experiments

    International Nuclear Information System (INIS)

    Zilker, M.; Hallatschek, K.; Heimann, P.; Hertweck, F.

    1999-01-01

    In this paper we present our transputer-based multitop multiprocessor systems for data acquisition, which are currently used on the Asdex upgrade experiment. The bandwidth of these systems ranges from low-speed systems like the calorimetry diagnostic up to high-speed, large-data-volume systems like the soft-X-ray and Mirnov diagnostics, which collect several hundreds of megabytes of data during a plasma discharge of ∼8 s. Further, we present the multitop-MX, a newly developed system based on transputers and powerPCs, which provides real-time facilities for analysing the acquired data, generating the information necessary for dynamic adaptation of sample rates, and delivering triggers when certain events in the plasma are detected. The algorithm running on the powerPCs performs a wavelet-like time-frequency transform. In the last part we give an outlook on how to build the next generation of data acquisition systems to be used on the future plasma experiments W7-X and ITER, but also on Asdex upgrade. The hardware of these new distributed systems should be mainly based on established industry standards like the VME-bus, PCI-bus and FiberChannel, but emerging technologies like SCI (scalable coherent interconnect) should also be considered. The systems software should be well designed with object-oriented methods to simplify the maintenance process and to enable further expansions and adaptations to new problems in an easy way. (orig.)

  5. Scheduling for energy and reliability management on multiprocessor real-time systems

    Science.gov (United States)

    Qi, Xuan

    Scheduling algorithms for multiprocessor real-time systems have been studied for years with many well-recognized algorithms proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design/develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing on cluster scheduling with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy/power consumption into consideration, we developed energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budget. As the commonly deployed energy-saving technique (e.g. dynamic voltage frequency scaling (DVFS)) will significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms on reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes could preserve the system reliability while still achieving substantial energy saving.

  6. Efficient computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, David C.; Murthy, Durbha V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a parallel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effect of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.

  7. Solving the Stokes problem on a massively parallel computer

    DEFF Research Database (Denmark)

    Axelsson, Owe; Barker, Vincent A.; Neytcheva, Maya

    2001-01-01

    boundary value problem for each velocity component, are solved by the conjugate gradient method with a preconditioning based on the algebraic multi‐level iteration (AMLI) technique. The velocity is found from the computed pressure. The method is optimal in the sense that the computational work...... is proportional to the number of unknowns. Further, it is designed to exploit a massively parallel computer with distributed memory architecture. Numerical experiments on a Cray T3E computer illustrate the parallel performance of the method....
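
    As a point of reference for the solver structure described above, the sketch below shows a generic preconditioned conjugate gradient iteration. The AMLI preconditioner used in the paper is replaced here by simple diagonal scaling, and a small dense SPD matrix stands in for the per-velocity-component elliptic operator; both are assumptions made to keep the sketch self-contained.

        #include <cmath>
        #include <vector>

        using Vec = std::vector<double>;

        static double dot(const Vec& a, const Vec& b) {
            double s = 0.0;
            for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
            return s;
        }

        // Solve A x = b for a dense SPD matrix A with diagonally preconditioned CG.
        Vec pcg(const std::vector<Vec>& A, const Vec& b, int maxit = 200, double tol = 1e-10) {
            const size_t n = b.size();
            Vec x(n, 0.0), r = b, z(n), p(n), Ap(n);
            for (size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];   // diagonal preconditioner
            p = z;
            double rz = dot(r, z);
            for (int it = 0; it < maxit && std::sqrt(dot(r, r)) > tol; ++it) {
                for (size_t i = 0; i < n; ++i) {                    // Ap = A * p
                    Ap[i] = 0.0;
                    for (size_t j = 0; j < n; ++j) Ap[i] += A[i][j] * p[j];
                }
                double alpha = rz / dot(p, Ap);
                for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
                for (size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
                double rz_new = dot(r, z);
                for (size_t i = 0; i < n; ++i) p[i] = z[i] + (rz_new / rz) * p[i];
                rz = rz_new;
            }
            return x;
        }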

  8. COMPUTING

    CERN Multimedia

    M. Kasemann

    Overview In autumn the main focus was to process and handle CRAFT data and to perform the Summer08 MC production. The operational aspects were well covered by regular Computing Shifts, experts on duty and Computing Run Coordination. At the Computing Resource Board (CRB) in October a model to account for service work at Tier 2s was approved. The computing resources for 2009 were reviewed for presentation at the C-RRB. The quarterly resource monitoring is continuing. Facilities/Infrastructure operations Operations during CRAFT data taking ran fine. This proved to be a very valuable experience for T0 workflows and operations. The transfers of custodial data to most T1s went smoothly. A first round of reprocessing started at the Tier-1 centers end of November; it will take about two weeks. The Computing Shifts procedure was tested full scale during this period and proved to be very efficient: 30 Computing Shifts Persons (CSP) and 10 Computing Resources Coordinators (CRC). The shift program for the shut down w...

  9. Computer Architecture A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  10. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction CMS distributed computing system performed well during the 2011 start-up. The events in 2011 have more pile-up and are more complex than last year; this results in longer reconstruction times and harder events to simulate. Significant increases in computing capacity were delivered in April for all computing tiers, and the utilisation and load is close to the planning predictions. All computing centre tiers performed their expected functionalities. Heavy-Ion Programme The CMS Heavy-Ion Programme had a very strong showing at the Quark Matter conference. A large number of analyses were shown. The dedicated heavy-ion reconstruction facility at the Vanderbilt Tier-2 is still involved in some commissioning activities, but is available for processing and analysis. Facilities and Infrastructure Operations Facility and Infrastructure operations have been active with operations and several important deployment tasks. Facilities participated in the testing and deployment of WMAgent and WorkQueue+Request...

  11. COMPUTING

    CERN Multimedia

    P. McBride

    The Computing Project is preparing for a busy year where the primary emphasis of the project moves towards steady operations. Following the very successful completion of Computing Software and Analysis challenge, CSA06, last fall, we have reorganized and established four groups in computing area: Commissioning, User Support, Facility/Infrastructure Operations and Data Operations. These groups work closely together with groups from the Offline Project in planning for data processing and operations. Monte Carlo production has continued since CSA06, with about 30M events produced each month to be used for HLT studies and physics validation. Monte Carlo production will continue throughout the year in the preparation of large samples for physics and detector studies ramping to 50 M events/month for CSA07. Commissioning of the full CMS computing system is a major goal for 2007. Site monitoring is an important commissioning component and work is ongoing to devise CMS specific tests to be included in Service Availa...

  12. COMPUTING

    CERN Multimedia

    M. Kasemann

    Overview During the past three months activities were focused on data operations, testing and re-enforcing shift and operational procedures for data production and transfer, MC production and on user support. Planning of the computing resources in view of the new LHC calendar is ongoing. Two new task forces were created for supporting the integration work: Site Commissioning, which develops tools helping distributed sites to monitor job and data workflows, and Analysis Support, collecting the user experience and feedback during analysis activities and developing tools to increase efficiency. The development plan for DMWM for 2009/2011 was developed at the beginning of the year, based on the requirements from the Physics, Computing and Offline groups (see Offline section). The Computing management meeting at FermiLab on February 19th and 20th was an excellent opportunity for discussing the impact and addressing issues and solutions to the main challenges facing CMS computing. The lack of manpower is particul...

  13. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction

  14. Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

    DEFF Research Database (Denmark)

    Sparsø, Jens

    2012-01-01

    This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...

  15. Embedded software design and programming of multiprocessor system-on-chip simulink and system C case studies

    CERN Document Server

    Popovici, Katalin; Jerraya, Ahmed A; Wolf, Marilyn

    2010-01-01

    Current multimedia and telecom applications require complex, heterogeneous multiprocessor system on chip (MPSoC) architectures with specific communication infrastructure in order to achieve the required performance. Heterogeneous MPSoC includes different types of processing units (DSP, microcontroller, ASIP) and different communication schemes (fast links, non-standard memory organization and access). Programming an MPSoC requires the generation of efficient software running on MPSoC from a high level environment, by using the characteristics of the architecture. This task is known to be tedious...

  16. COMPUTING

    CERN Multimedia

    I. Fisk

    2013-01-01

    Computing activity had ramped down after the completion of the reprocessing of the 2012 data and parked data, but is increasing with new simulation samples for analysis and upgrade studies. Much of the Computing effort is currently involved in activities to improve the computing system in preparation for 2015. Operations Office Since the beginning of 2013, the Computing Operations team successfully re-processed the 2012 data in record time, in part by using opportunistic resources like the San Diego Supercomputer Center, which made it possible to re-process the primary datasets HTMHT and MultiJet in Run2012D much earlier than planned. The Heavy-Ion data-taking period was successfully concluded in February, collecting almost 500 T. Figure 3: Number of events per month (data) In LS1, our emphasis is to increase efficiency and flexibility of the infrastructure and operation. Computing Operations is working on separating disk and tape at the Tier-1 sites and the full implementation of the xrootd federation ...

  17. COMPUTING

    CERN Multimedia

    I. Fisk

    2010-01-01

    Introduction It has been a very active quarter in Computing with interesting progress in all areas. The activity level at the computing facilities, driven by both organised processing from data operations and user analysis, has been steadily increasing. The large-scale production of simulated events that has been progressing throughout the fall is wrapping up, and reprocessing with pile-up will continue. A large reprocessing of all the proton-proton data has just been released and another will follow shortly. The number of analysis jobs by users each day, which was already hitting the computing model expectations at the time of ICHEP, is now 33% higher. We are expecting a busy holiday break to ensure samples are ready in time for the winter conferences. Heavy Ion An activity that is still in progress is computing for the heavy-ion program. The heavy-ion events are collected without zero suppression, so the event size is much larger, at roughly 11 MB per event of RAW. The central collisions are more complex and...

  18. COMPUTING

    CERN Multimedia

    M. Kasemann, P. McBride; edited by M-C. Sawley with contributions from P. Kreuzer, D. Bonacorsi, S. Belforte, F. Wuerthwein, L. Bauerdick, K. Lassila-Perini, and M-C. Sawley

    Introduction More than seventy CMS collaborators attended the Computing and Offline Workshop in San Diego, California, April 20-24th to discuss the state of readiness of software and computing for collisions. Focus and priority were given to preparations for data taking and providing room for ample dialog between groups involved in Commissioning, Data Operations, Analysis and MC Production. Throughout the workshop, aspects of software, operating procedures and issues addressing all parts of the computing model were discussed. Plans for the CMS participation in STEP’09, the combined scale testing for all four experiments due in June 2009, were refined. The article in CMS Times by Frank Wuerthwein gave a good recap of the highly collaborative atmosphere of the workshop. Many thanks to UCSD and to the organizers for taking care of this workshop, which resulted in a long list of action items and was definitely a success. A considerable amount of effort and care is invested in the estimate of the comput...

  19. Dynamic grid refinement for partial differential equations on parallel computers

    International Nuclear Information System (INIS)

    Mccormick, S.; Quinlan, D.

    1989-01-01

    The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems. 6 refs

  20. Domain Decomposition: A Bridge between Nature and Parallel Computers

    Science.gov (United States)

    1992-09-01

    B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," S"IAM Journal of S; cientific and Statistical (’omputing, Vol. 13, 1992, pp...AD-A256 575 NASA Contractor Report 189709 ICASE Report No. 92-44 ICASE DOMAIN DECOMPOSITION: A BRIDGE BETWEEN NATURE AND PARALLEL COMPUTERS DTIC dE...effectively implemented on dis- tributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algo- rithm), a 103,201-unknown 2D elliptic

  1. Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

    Science.gov (United States)

    Russo, Vincent; Johnston, Gary; Campbell, Roy

    1988-01-01

    The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor require both efficient code in order to support the needs of high-performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture using the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.
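
    As a concrete illustration of the class-hierarchical style described above, the sketch below separates a scheduling mechanism (a base class interface) from a scheduling policy (a derived class). This is a hedged, hypothetical C++ example: the class names (Process, Scheduler, RoundRobinScheduler) and the round-robin policy are invented here for illustration and are not taken from the Choices sources.

```cpp
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical sketch of a class-hierarchical OS layer: base classes fix the
// *mechanism* (dispatch and queueing interfaces), derived classes supply the *policy*.
class Process {
public:
    virtual ~Process() = default;
    virtual void run() = 0;                       // mechanism: every process can be dispatched
};

class Scheduler {                                  // mechanism: keeps ready processes
public:
    virtual ~Scheduler() = default;
    virtual void add(std::unique_ptr<Process> p) { ready_.push_back(std::move(p)); }
    virtual Process* next() = 0;                   // policy is chosen in subclasses
protected:
    std::vector<std::unique_ptr<Process>> ready_;
};

class RoundRobinScheduler : public Scheduler {     // one concrete policy
public:
    Process* next() override {
        if (ready_.empty()) return nullptr;
        current_ = (current_ + 1) % ready_.size();
        return ready_[current_].get();
    }
private:
    std::size_t current_ = 0;
};

class IdleProcess : public Process {               // trivial example process
public:
    void run() override { std::cout << "idle\n"; }
};

int main() {
    RoundRobinScheduler sched;
    sched.add(std::make_unique<IdleProcess>());
    if (Process* p = sched.next()) p->run();       // dispatch through the base interface
}
```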

  2. COMPUTING

    CERN Multimedia

    P. McBride

    It has been a very active year for the computing project with strong contributions from members of the global community. The project has focused on site preparation and Monte Carlo production. The operations group has begun processing data from P5 as part of the global data commissioning. Improvements in transfer rates and site availability have been seen as computing sites across the globe prepare for large scale production and analysis as part of CSA07. Preparations for the upcoming Computing Software and Analysis Challenge CSA07 are progressing. Ian Fisk and Neil Geddes have been appointed as coordinators for the challenge. CSA07 will include production tests of the Tier-0 production system, reprocessing at the Tier-1 sites and Monte Carlo production at the Tier-2 sites. At the same time there will be a large analysis exercise at the Tier-2 centres. Pre-production simulation of the Monte Carlo events for the challenge is beginning. Scale tests of the Tier-0 will begin in mid-July and the challenge it...

  3. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction During the past six months, Computing participated in the STEP09 exercise, had a major involvement in the October exercise and has been working with CMS sites on improving open issues relevant for data taking. At the same time operations for MC production, real data reconstruction and re-reconstructions and data transfers at large scales were performed. STEP09 was successfully conducted in June as a joint exercise with ATLAS and the other experiments. It gave a good indication of the readiness of the WLCG infrastructure, with the two major LHC experiments stressing the reading, writing and processing of physics data. The October Exercise, in contrast, was conducted as an all-CMS exercise, where Physics, Computing and Offline worked on a common plan to exercise all steps to efficiently access and analyze data. As one of the major results, the CMS Tier-2s demonstrated that they are fully capable of performing data analysis. In recent weeks, efforts were devoted to CMS Computing readiness. All th...

  4. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction It has been a very active quarter in Computing with interesting progress in all areas. The activity level at the computing facilities, driven by both organised processing from data operations and user analysis, has been steadily increasing. The large-scale production of simulated events that has been progressing throughout the fall is wrapping-up and reprocessing with pile-up will continue. A large reprocessing of all the proton-proton data has just been released and another will follow shortly. The number of analysis jobs by users each day, that was already hitting the computing model expectations at the time of ICHEP, is now 33% higher. We are expecting a busy holiday break to ensure samples are ready in time for the winter conferences. Heavy Ion The Tier 0 infrastructure was able to repack and promptly reconstruct heavy-ion collision data. Two copies were made of the data at CERN using a large CASTOR disk pool, and the core physics sample was replicated ...

  5. COMPUTING

    CERN Multimedia

    I. Fisk

    2012-01-01

    Introduction Computing continued with a high level of activity over the winter in preparation for conferences and the start of the 2012 run. 2012 brings new challenges with a new energy, more complex events, and the need to make the best use of the available time before the Long Shutdown. We expect to be resource constrained on all tiers of the computing system in 2012 and are working to ensure the high-priority goals of CMS are not impacted. Heavy ions After a successful 2011 heavy-ion run, the programme is moving to analysis. During the run, the CAF resources were well used for prompt analysis. Since then in 2012 on average 200 job slots have been used continuously at Vanderbilt for analysis workflows. Operations Office As of 2012, the Computing Project emphasis has moved from commissioning to operation of the various systems. This is reflected in the new organisation structure where the Facilities and Data Operations tasks have been merged into a common Operations Office, which now covers everything ...

  6. COMPUTING

    CERN Multimedia

    M. Kasemann

    CCRC’08 challenges and CSA08 During the February campaign of the Common Computing readiness challenges (CCRC’08), the CMS computing team had achieved very good results. The link between the detector site and the Tier0 was tested by gradually increasing the number of parallel transfer streams well beyond the target. Tests covered the global robustness at the Tier0, processing a massive number of very large files and with a high writing speed to tapes.  Other tests covered the links between the different Tiers of the distributed infrastructure and the pre-staging and reprocessing capacity of the Tier1’s: response time, data transfer rate and success rate for Tape to Buffer staging of files kept exclusively on Tape were measured. In all cases, coordination with the sites was efficient and no serious problem was found. These successful preparations prepared the ground for the second phase of the CCRC’08 campaign, in May. The Computing Software and Analysis challen...

  7. COMPUTING

    CERN Multimedia

    I. Fisk

    2010-01-01

    Introduction The first data taking period of November produced a first scientific paper, and this is a very satisfactory step for Computing. It also gave the invaluable opportunity to learn and debrief from this first, intense period, and make the necessary adaptations. The alarm procedures between different groups (DAQ, Physics, T0 processing, Alignment/calibration, T1 and T2 communications) have been reinforced. A major effort has also been invested into remodeling and optimizing operator tasks in all activities in Computing, in parallel with the recruitment of new Cat A operators. The teams are being completed and by mid year the new tasks will have been assigned. CRB (Computing Resource Board) The Board met twice since last CMS week. In December it reviewed the experience of the November data-taking period and could measure the positive improvements made for the site readiness. It also reviewed the policy under which Tier-2 are associated with Physics Groups. Such associations are decided twice per ye...

  8. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction More than seventy CMS collaborators attended the Computing and Offline Workshop in San Diego, California, April 20-24th to discuss the state of readiness of software and computing for collisions. Focus and priority were given to preparations for data taking and providing room for ample dialog between groups involved in Commissioning, Data Operations, Analysis and MC Production. Throughout the workshop, aspects of software, operating procedures and issues addressing all parts of the computing model were discussed. Plans for the CMS participation in STEP’09, the combined scale testing for all four experiments due in June 2009, were refined. The article in CMS Times by Frank Wuerthwein gave a good recap of the highly collaborative atmosphere of the workshop. Many thanks to UCSD and to the organizers for taking care of this workshop, which resulted in a long list of action items and was definitely a success. A considerable amount of effort and care is invested in the estimate of the co...

  9. Hybrid Simulation of the Interaction of Europa's Atmosphere with the Jovian Plasma: Multiprocessor Simulations

    Science.gov (United States)

    Dols, V. J.; Delamere, P. A.; Bagenal, F.; Cassidy, T. A.; Crary, F. J.

    2014-12-01

    We model the interaction of Europa's tenuous atmosphere with the plasma of Jupiter's torus with an improved version of our hybrid plasma code. In a hybrid plasma code, the ions are treated as kinetic macro-particles moving under the Lorentz force and the electrons as a fluid, leading to a generalized formulation of Ohm's law. In this version, the spatial simulation domain is decomposed in two directions and is non-uniform in the plasma convection direction. The code is run on a multi-processor supercomputer that offers 16416 cores and 2 GB RAM per core. This new version allows us to tap into the large memory of the supercomputer and simulate the full interaction volume (Reuropa=1561km) with a high spatial resolution (50km). Compared to Io, Europa's atmosphere is about 100 times more tenuous, the ambient magnetic field is weaker and the density of incident plasma is lower. Consequently, the electrodynamic interaction is also weaker and substantial fluxes of thermal torus ions might reach and sputter the icy surface. Molecular O2 is the dominant atmospheric product of this surface sputtering. Observations of oxygen UV emissions (specifically the ratio of OI 1356A / 1304A emissions) are roughly consistent with an atmosphere that is composed predominantly of O2 with a small amount of atomic O. Galileo observations along flybys close to Europa have revealed the existence of induced currents in a conducting ocean under the icy crust. They also showed that, from flyby to flyby, the plasma interaction is very variable. Asymmetries of the plasma density and temperature in the wake of Europa were also observed and still elude a clear explanation. Galileo magnetometer data also detected ion cyclotron waves, an indication of heavy-ion pickup close to the moon. We prescribe an O2 atmosphere with a vertical density column consistent with UV observations and model the plasma properties along several Galileo flybys of the moon. We compare our results with the magnetometer...
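
    The hybrid description summarized above can be written in a generic form (this is the standard textbook formulation, not necessarily the exact equations of the authors' code, and resistive and electron-inertia terms are omitted): macro-ions advance kinetically under the Lorentz force, while the electric field follows from a generalized Ohm's law for a massless electron fluid.

```latex
% Generic hybrid-model relations (assumed standard form, not taken from the paper):
m_i \frac{d\mathbf{v}_i}{dt} = q_i\left(\mathbf{E} + \mathbf{v}_i \times \mathbf{B}\right),
\qquad
\mathbf{E} = -\,\mathbf{u}_i \times \mathbf{B}
           + \frac{(\nabla \times \mathbf{B}) \times \mathbf{B}}{\mu_0\, e\, n_e}
           - \frac{\nabla p_e}{e\, n_e}
```

    Here u_i is the bulk ion velocity, n_e the electron number density and p_e the electron pressure; the current density has been eliminated via J = (∇×B)/μ0.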

  10. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

    Directory of Open Access Journals (Sweden)

    Yaser Afshar

    Full Text Available Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
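
    The decomposition-and-communication pattern described in this abstract can be sketched in a few lines. The following is a hedged illustration, not the authors' code: a one-dimensional strip decomposition of an image across MPI ranks with a one-row halo exchange before each local iteration. Image dimensions, data types and variable names are arbitrary placeholders.

```cpp
// Minimal sketch: strip decomposition of an image across MPI ranks with halo exchange,
// the basic communication pattern a distributed segmentation algorithm needs.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int width = 1024;                  // hypothetical image width
    const int local_rows = 256;              // rows owned by this rank
    // local strip plus one halo row above and one below
    std::vector<float> strip((local_rows + 2) * width, 0.0f);

    int up   = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int down = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    // send first owned row up, receive the bottom halo row from below (and vice versa)
    MPI_Sendrecv(&strip[1 * width], width, MPI_FLOAT, up,   0,
                 &strip[(local_rows + 1) * width], width, MPI_FLOAT, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&strip[local_rows * width], width, MPI_FLOAT, down, 1,
                 &strip[0], width, MPI_FLOAT, up, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // ... local segmentation iterations on rows 1..local_rows would go here ...

    MPI_Finalize();
    return 0;
}
```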

  11. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

    Science.gov (United States)

    Afshar, Yaser; Sbalzarini, Ivo F

    2016-01-01

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.

  12. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

    Science.gov (United States)

    Afshar, Yaser; Sbalzarini, Ivo F.

    2016-01-01

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144

  13. COMPUTING

    CERN Multimedia

    2010-01-01

    Introduction Just two months after the “LHC First Physics” event of 30th March, the analysis of the O(200) million 7 TeV collision events in CMS accumulated during the first 60 days is well under way. The consistency of the CMS computing model has been confirmed during these first weeks of data taking. This model is based on a hierarchy of use-cases deployed between the different tiers and, in particular, the distribution of RECO data to T1s, who then serve data on request to T2s, along a topology known as “fat tree”. Indeed, during this period this model was further extended by almost full “mesh” commissioning, meaning that RECO data were shipped to T2s whenever possible, enabling additional physics analyses compared with the “fat tree” model. Computing activities at the CMS Analysis Facility (CAF) have been marked by a good time response for a load almost evenly shared between ALCA (Alignment and Calibration tasks - highest p...

  14. COMPUTING

    CERN Multimedia

    Contributions from I. Fisk

    2012-01-01

    Introduction The start of the 2012 run has been busy for Computing. We have reconstructed, archived, and served a larger sample of new data than in 2011, and we are in the process of producing an even larger new sample of simulations at 8 TeV. The running conditions and system performance are largely what was anticipated in the plan, thanks to the hard work and preparation of many people. Heavy ions Heavy Ions has been actively analysing data and preparing for conferences.  Operations Office Figure 6: Transfers from all sites in the last 90 days For ICHEP and the Upgrade efforts, we needed to produce and process record amounts of MC samples while supporting the very successful data-taking. This was a large burden, especially on the team members. Nevertheless the last three months were very successful and the total output was phenomenal, thanks to our dedicated site admins who keep the sites operational and the computing project members who spend countless hours nursing the...

  15. COMPUTING

    CERN Multimedia

    M. Kasemann

    Introduction A large fraction of the effort was focused during the last period into the preparation and monitoring of the February tests of Common VO Computing Readiness Challenge 08. CCRC08 is being run by the WLCG collaboration in two phases, between the centres and all experiments. The February test is dedicated to functionality tests, while the May challenge will consist of running at all centres and with full workflows. For this first period, a number of functionality checks of the computing power, data repositories and archives as well as network links are planned. This will help assess the reliability of the systems under a variety of loads, and identify possible bottlenecks. Many tests are scheduled together with other VOs, allowing the full scale stress test. The data rates (writing, accessing and transferring) are being checked under a variety of loads and operating conditions, as well as the reliability and transfer rates of the links between Tier-0 and Tier-1s. In addition, the capa...

  16. COMPUTING

    CERN Multimedia

    Matthias Kasemann

    Overview The main focus during the summer was to handle data coming from the detector and to perform Monte Carlo production. The lessons learned during the CCRC and CSA08 challenges in May were addressed by dedicated PADA campaigns lead by the Integration team. Big improvements were achieved in the stability and reliability of the CMS Tier1 and Tier2 centres by regular and systematic follow-up of faults and errors with the help of the Savannah bug tracking system. In preparation for data taking the roles of a Computing Run Coordinator and regular computing shifts monitoring the services and infrastructure as well as interfacing to the data operations tasks are being defined. The shift plan until the end of 2008 is being put together. User support worked on documentation and organized several training sessions. The ECoM task force delivered the report on “Use Cases for Start-up of pp Data-Taking” with recommendations and a set of tests to be performed for trigger rates much higher than the ...

  17. COMPUTING

    CERN Multimedia

    P. MacBride

    The Computing Software and Analysis Challenge CSA07 has been the main focus of the Computing Project for the past few months. Activities began over the summer with the preparation of the Monte Carlo data sets for the challenge and tests of the new production system at the Tier-0 at CERN. The pre-challenge Monte Carlo production was done in several steps: physics generation, detector simulation, digitization, conversion to RAW format and the samples were run through the High Level Trigger (HLT). The data was then merged into three "Soups": Chowder (ALPGEN), Stew (Filtered Pythia) and Gumbo (Pythia). The challenge officially started when the first Chowder events were reconstructed on the Tier-0 on October 3rd. The data operations teams were very busy during the challenge period. The MC production teams continued with signal production and processing while the Tier-0 and Tier-1 teams worked on splitting the Soups into Primary Data Sets (PDS), reconstruction and skimming. The storage sys...

  18. COMPUTING

    CERN Multimedia

    I. Fisk

    2013-01-01

    Computing operation has been lower as the Run 1 samples are completing and smaller samples for upgrades and preparations are ramping up. Much of the computing activity is focusing on preparations for Run 2 and improvements in data access and flexibility of using resources. Operations Office Data processing was slow in the second half of 2013 with only the legacy re-reconstruction pass of 2011 data being processed at the sites.   Figure 1: MC production and processing was more in demand with a peak of over 750 Million GEN-SIM events in a single month.   Figure 2: The transfer system worked reliably and efficiently and transferred on average close to 520 TB per week with peaks at close to 1.2 PB.   Figure 3: The volume of data moved between CMS sites in the last six months   The tape utilisation was a focus for the operation teams with frequent deletion campaigns from deprecated 7 TeV MC GEN-SIM samples to INVALID datasets, which could be cleaned up...

  19. COMPUTING

    CERN Multimedia

    I. Fisk

    2012-01-01

      Introduction Computing activity has been running at a sustained, high rate as we collect data at high luminosity, process simulation, and begin to process the parked data. The system is functional, though a number of improvements are planned during LS1. Many of the changes will impact users, we hope only in positive ways. We are trying to improve the distributed analysis tools as well as the ability to access more data samples more transparently.  Operations Office Figure 2: Number of events per month, for 2012 Since the June CMS Week, Computing Operations teams successfully completed data re-reconstruction passes and finished the CMSSW_53X MC campaign with over three billion events available in AOD format. Recorded data was successfully processed in parallel, exceeding 1.2 billion raw physics events per month for the first time in October 2012 due to the increase in data-parking rate. In parallel, large efforts were dedicated to WMAgent development and integrati...

  20. COMPUTING

    CERN Multimedia

    I. Fisk

    2011-01-01

    Introduction The Computing Team successfully completed the storage, initial processing, and distribution for analysis of proton-proton data in 2011. There are still a variety of activities ongoing to support winter conference activities and preparations for 2012. Heavy ions The heavy-ion run for 2011 started in early November and has already demonstrated good machine performance and success of some of the more advanced workflows planned for 2011. Data collection will continue until early December. Facilities and Infrastructure Operations Operational and deployment support for WMAgent and WorkQueue+Request Manager components, routinely used in production by Data Operations, are provided. The GlideInWMS and components installation are now deployed at CERN, which is added to the GlideInWMS factory placed in the US. There has been new operational collaboration between the CERN team and the UCSD GlideIn factory operators, covering each other's time zones by monitoring/debugging pilot jobs sent from the facto...

  1. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  2. COMPUTING

    CERN Multimedia

    M. Kasemann

    CMS relies on a well functioning, distributed computing infrastructure. The Site Availability Monitoring (SAM) and the Job Robot submission have been very instrumental for site commissioning in order to increase availability of more sites such that they are available to participate in CSA07 and are ready to be used for analysis. The commissioning process has been further developed, including "lessons learned" documentation via the CMS twiki. Recently the visualization, presentation and summarizing of SAM tests for sites has been redesigned, it is now developed by the central ARDA project of WLCG. Work to test the new gLite Workload Management System was performed; a 4 times increase in throughput with respect to LCG Resource Broker is observed. CMS has designed and launched a new-generation traffic load generator called "LoadTest" to commission and to keep exercised all data transfer routes in the CMS PhE-DEx topology. Since mid-February, a transfer volume of about 12 P...

  3. A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

    Science.gov (United States)

    Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

    2010-07-01

    This paper proposes a pipelined circuit architecture implemented in an FPGA, a very large scale integrated circuit (VLSI), which efficiently handles the real-time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this imaging system. Each processor undertakes its own task, coordinating its work with the others. The system on programmable chip (SOPC) in the FPGA works steadily at a global clock frequency of 96 MHz. Adequate timing margin lets the FPGA perform the NUC image pre-processing algorithm with ease, which provides a favorable guarantee for the post-processing work in the DSP. The paper also presents a hardware (HW) and software (SW) co-design in the FPGA. Thus, this architecture yields a multiprocessor image processing system and a practical solution that satisfies the system's performance requirements.
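
    For reference, the per-pixel correction that such NUC pipelines commonly implement is the classical two-point (gain/offset) model. The sketch below is a hedged, generic illustration with hypothetical names, not the paper's implementation.

```cpp
// Two-point non-uniformity correction: y_i = G_i * x_i + O_i per pixel,
// with gain G_i and offset O_i obtained from calibration frames.
#include <cstddef>
#include <vector>

void apply_nuc(const std::vector<float>& raw,
               const std::vector<float>& gain,
               const std::vector<float>& offset,
               std::vector<float>& corrected) {
    corrected.resize(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        corrected[i] = gain[i] * raw[i] + offset[i];
    }
}
```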

  4. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  5. A new communication scheme for the neutron diffusion nodal method in a distributed computing environment

    International Nuclear Information System (INIS)

    Kirk, B.L.; Azmy, Y.

    1994-01-01

    A modified scheme is developed for solving the two-dimensional nodal diffusion equations on distributed memory computers. The scheme is aimed at minimizing the volume of communication among processors while maximizing the tasks in parallel. Results show a significant improvement in parallel efficiency on the Intel iPSC/860 hypercube compared to previous algorithms

  6. Introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1985-04-01

    FORTRAN applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in wall-clock time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult
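
    The LOCK and EVENT primitives discussed above have close analogues in modern threading libraries. The sketch below is a hedged C++ illustration (not the multitasking FORTRAN interface itself): a mutex plays the role of a LOCK around shared data, and a condition variable plays the role of an EVENT that one task posts and another waits on.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <iostream>

std::mutex lock;                 // LOCK protecting the shared variable
std::condition_variable event;   // EVENT used for synchronization
bool posted = false;
int shared_value = 0;

void producer() {
    {
        std::lock_guard<std::mutex> guard(lock);
        shared_value = 42;       // update shared data under the lock
        posted = true;           // post the event
    }
    event.notify_one();
}

void consumer() {
    std::unique_lock<std::mutex> guard(lock);
    event.wait(guard, [] { return posted; });   // wait until the event is posted
    std::cout << "consumer saw " << shared_value << "\n";
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```

    Without the lock and the event, the two tasks could interleave arbitrarily and produce irreproducible results, which is exactly the debugging difficulty the abstract warns about.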

  7. The ACP [Advanced Computer Program] Branch bus and real-time applications of the ACP multiprocessor system

    International Nuclear Information System (INIS)

    Hance, R.; Areti, H.; Atac, R.

    1987-01-01

    The ACP Branchbus, a high-speed differential bus for data movement in multiprocessing and data acquisition environments, is described. This bus was designed as the central bus in the ACP multiprocessing system. In its full implementation with 16 branches and a bus switch, it will handle data rates of 160 MByte/sec and allow reliable data transmission over inter-rack distances. We also summarize applications of the ACP system in experimental data acquisition, triggering and monitoring, with special attention paid to FASTBUS environments

  8. Parallel computing by Monte Carlo codes MVP/GMVP

    International Nuclear Information System (INIS)

    Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

    2001-01-01

    General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve further speedups, we parallelized the codes on different types of parallel computing platforms or by using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel Paragon and distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. In general, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% for the PWR full-core calculation with more than 10 million histories, which took about 1.5 hours with massively parallel computing. (author)
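
    The standard pattern behind such parallel Monte Carlo runs is to distribute histories over MPI ranks and reduce the tallies at the end of each batch; the batch size per PE mentioned above directly sets the ratio of computation to this communication. The sketch below is a hedged, generic illustration (not MVP/GMVP code), with a trivial placeholder estimator.

```cpp
#include <mpi.h>
#include <random>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total_histories = 10000000;
    const long local_histories = total_histories / size;   // each rank's share (remainder ignored)

    std::mt19937_64 rng(12345 + rank);                      // independent stream per rank
    std::uniform_real_distribution<double> u(0.0, 1.0);

    double local_tally = 0.0;                                // placeholder for a real estimator
    for (long i = 0; i < local_histories; ++i)
        local_tally += u(rng);

    double global_tally = 0.0;
    MPI_Reduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("mean score = %f\n", global_tally / total_histories);

    MPI_Finalize();
    return 0;
}
```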

  9. Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

    International Nuclear Information System (INIS)

    Oehmen, C S; Cannon, W R

    2008-01-01

    Data-intensive and high-performance computing are poised to significantly impact the future of biological research which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers (such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines-from small clusters to emerging petascale machines-can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation as well as intuitive interfaces and compatibility with existing common data formats

  10. Computing possibilities in the mid 1990s

    International Nuclear Information System (INIS)

    Nash, T.

    1988-09-01

    This paper describes the kind of computing resources it may be possible to make available for experiments in high energy physics in the mid and late 1990s. We outline some of the work going on today, particularly at Fermilab's Advanced Computer Program, that projects to the future. We attempt to define areas in which coordinated R and D efforts should prove fruitful to provide for on and off-line computing in the SSC era. Because of extraordinary components anticipated from industry, we can be optimistic even to the level of predicting million VAX equivalent on-line multiprocessor/data acquisition systems for SSC detectors. Managing this scale of computing will require a new approach to large hardware and software systems. 15 refs., 6 figs

  11. A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

    Science.gov (United States)

    Sargent, Jeff Scott

    1988-01-01

    A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel approaches to controlling error in parallel cell-placement algorithms: Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits, and the performance of an implementation on the Intel iPSC/2 Hypercube is summarized. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
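
    For context, the acceptance test at the heart of simulated-annealing placement is the standard Metropolis criterion; the error-control schemes described above govern how many such moves may proceed in parallel between global position updates. The sketch below is a generic illustration of that criterion, not the paper's code.

```cpp
#include <cmath>
#include <random>

// Metropolis acceptance: improving moves are always accepted; uphill moves are
// accepted with probability exp(-delta_cost / temperature).
bool accept_move(double delta_cost, double temperature, std::mt19937& rng) {
    if (delta_cost <= 0.0) return true;
    std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng) < std::exp(-delta_cost / temperature);
}
```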

  12. ePRO-MP: A Tool for Profiling and Optimizing Energy and Performance of Mobile Multiprocessor Applications

    Directory of Open Access Journals (Sweden)

    Wonil Choi

    2009-01-01

    Full Text Available For mobile multiprocessor applications, achieving high performance with low energy consumption is a challenging task. In order to help programmers to meet these design requirements, system development tools play an important role. In this paper, we describe one such development tool, ePRO-MP, which profiles and optimizes both performance and energy consumption of multi-threaded applications running on top of Linux for ARM11 MPCore-based embedded systems. One of the key features of ePRO-MP is that it can accurately estimate the energy consumption of multi-threaded applications without requiring power measurement equipment, using a regression-based energy model. We also describe another key benefit of ePRO-MP, an automatic optimization function, using two example problems. Using the automatic optimization function, ePRO-MP can achieve high performance and low power consumption without programmer intervention. Our experimental results show that ePRO-MP can improve the performance and energy consumption by 6.1% and 4.1%, respectively, over a baseline version for the co-running applications optimization example. For the producer-consumer application optimization example, ePRO-MP improves the performance and energy consumption by 60.5% and 43.3%, respectively, over a baseline version.
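
    The regression-based energy model mentioned above can be illustrated generically: energy is estimated as a linear combination of hardware performance-counter readings with coefficients fitted offline. The counter set and coefficient values below are purely hypothetical and are not ePRO-MP's actual model.

```cpp
#include <array>

// E ~ c0 + c1*cycles + c2*cache_misses + c3*bus_accesses, coefficients fitted offline
// against measured energy (all values here are illustrative placeholders).
double estimate_energy_joules(double cycles, double cache_misses, double bus_accesses) {
    constexpr std::array<double, 4> c = {0.05, 1.2e-9, 4.0e-8, 2.5e-8};
    return c[0] + c[1] * cycles + c[2] * cache_misses + c[3] * bus_accesses;
}
```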

  13. Picture processing computer to control movement by computer provided vision

    Energy Technology Data Exchange (ETDEWEB)

    Graefe, V

    1983-01-01

    The author introduces a multiprocessor system which has been specially developed to enable mechanical devices to interpret pictures presented in real time. The separate processors within this system operate simultaneously and independently. By means of freely moveable windows the processors can concentrate on those parts of the picture that are relevant to the control problem. If a machine is to make a correct response to its observation of a picture of moving objects, it must be able to follow the picture sequence, step by step, in real time. As the usual serially operating processors are too slow for such a task, the author describes three models of a special picture processing computer which it has been necessary to develop. 3 references.

  14. Multi-processor system for real-time flow estimation in medical ultrasound imaging

    DEFF Research Database (Denmark)

    Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

    1997-01-01

    the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors.In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...

  15. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated; the parallel algorithm and the parallelization of programs on parallel computers, both with shared memory and with distributed memory, are discussed. By analyzing the inherent structure of the mathematical and physical model of photon transport in light of the architecture of parallel computers, using a divide-and-conquer strategy, adjusting the algorithm structure of the program, resolving data dependencies, and identifying components suitable for parallel execution as large-grain parallel subtasks, the sequential computation of photon transport is efficiently transformed into parallel and vector computation. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup has been obtained

  16. HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

    Science.gov (United States)

    Gilliland, M. C.; Smith, B. J.; Calvert, W.

    1976-01-01

    The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
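
    The full-empty synchronization described above can be mimicked in software. The sketch below is a hedged C++ analogue (HEP realized this in hardware, with a semaphore bit per memory word): a cell may be written only when empty and read only when full, and each operation flips the state.

```cpp
#include <condition_variable>
#include <mutex>

// Software analogue of a full/empty synchronized memory cell.
template <typename T>
class FullEmptyCell {
public:
    void write(const T& v) {                       // blocks while the cell is full
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !full_; });
        value_ = v;
        full_ = true;
        cv_.notify_all();
    }
    T read() {                                     // blocks while the cell is empty, leaves it empty
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return full_; });
        full_ = false;
        cv_.notify_all();
        return value_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    T value_{};
    bool full_ = false;
};
```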

  17. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  18. The Computational Physics Program of the national MFE Computer Center

    International Nuclear Information System (INIS)

    Mirin, A.A.

    1989-01-01

    Since June 1974, the MFE Computer Center has been engaged in a significant computational physics effort. The principal objective of the Computational Physics Group is to develop advanced numerical models for the investigation of plasma phenomena and the simulation of present and future magnetic confinement devices. Another major objective of the group is to develop efficient algorithms and programming techniques for current and future generations of supercomputers. The Computational Physics Group has been involved in several areas of fusion research. One main area is the application of Fokker-Planck/quasilinear codes to tokamaks. Another major area is the investigation of resistive magnetohydrodynamics in three dimensions, with applications to tokamaks and compact toroids. A third area is the investigation of kinetic instabilities using a 3-D particle code; this work is often coupled with the task of numerically generating equilibria which model experimental devices. Ways to apply statistical closure approximations to study tokamak-edge plasma turbulence have been under examination, with the hope of being able to explain anomalous transport. Also, we are collaborating in an international effort to evaluate fully three-dimensional linear stability of toroidal devices. In addition to these computational physics studies, the group has developed a number of linear systems solvers for general classes of physics problems and has been making a major effort at ascertaining how to efficiently utilize multiprocessor computers. A summary of these programs are included in this paper. 6 tabs

  19. Centralized multiprocessor control system for the frascati storage rings DAΦNE

    International Nuclear Information System (INIS)

    Di Pirro, G.; Milardi, C.; Serio, M.

    1992-01-01

    We describe the status of the DANTE (DAΦne New Tools Environment) control system for the new DAΦNE Φ-factory under construction at the Frascati National Laboratories. The system is based on a centralized communication architecture for simplicity and reliability. A central processor unit coordinates all communications between the consoles and the lower level distributed processing power, and continuously updates a central memory that contains the whole machine status. We have developed a system of VME Fiber Optic interfaces allowing very fast point to point communication between distant processors. Macintosh II personal computers are used as consoles. The lower levels are all built using the VME standard. (author)

  20. Optimizing survivability of multi-state systems with multi-level protection by multi-processor genetic algorithm

    International Nuclear Information System (INIS)

    Levitin, Gregory; Dai Yuanshun; Xie Min; Leng Poh, Kim

    2003-01-01

    In this paper we consider vulnerable systems which can have different states corresponding to different combinations of available elements composing the system. Each state can be characterized by a performance rate, which is the quantitative measure of a system's ability to perform its task. Both the impact of external factors (stress) and internal causes (failures) affect system survivability, which is determined as probability of meeting a given demand. In order to increase the survivability of the system, a multi-level protection is applied to its subsystems. This means that a subsystem and its inner level of protection are in their turn protected by the protection of an outer level. This double-protected subsystem has its outer protection and so forth. In such systems, the protected subsystems can be destroyed only if all of the levels of their protection are destroyed. Each level of protection can be destroyed only if all of the outer levels of protection are destroyed. We formulate the problem of finding the structure of series-parallel multi-state system (including choice of system elements, choice of structure of multi-level protection and choice of protection methods) in order to achieve a desired level of system survivability by the minimal cost. An algorithm based on the universal generating function method is used for determination of the system survivability. A multi-processor version of genetic algorithm is used as optimization tool in order to solve the structure optimization problem. An application example is presented to illustrate the procedure presented in this paper
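
    For reference, the universal generating function (UGF) technique referred to above is usually written in the following generic form (the notation here is the standard one for this class of models and is not copied from the paper): each element is represented by a polynomial in z whose exponents are performance rates, the system UGF is obtained by composing element UGFs through the structure function, and survivability is read off as the probability of meeting the demand.

```latex
u_j(z) = \sum_{k=1}^{K_j} p_{jk}\, z^{g_{jk}}, \qquad
U(z)   = \mathop{\otimes}_{\varphi}\bigl(u_1(z), \dots, u_n(z)\bigr) = \sum_{k} p_k\, z^{g_k}, \qquad
S(w)   = \sum_{k:\, g_k \ge w} p_k
```

    Here p_{jk} is the probability that element j resides in state k with performance g_{jk}, φ is the structure function of the series-parallel system, and S(w) is the probability that the system performance meets the demand w.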

  1. A universal multiprocessor system for the fast acquisition and processing of positron camera data

    International Nuclear Information System (INIS)

    Deluigi, B.

    1982-01-01

    In this study the main components of a suitable detection system were worked out and their properties examined. For the measurement of the three-dimensional distribution of radiopharmaceuticals labeled with positron emitters in animal-experimental studies, a positron camera was first constructed. The annihilation quanta are detected by two opposing position-sensitive gamma detectors operated in coincidence. Two commercial camera heads working according to the Anger principle were rebuilt for this purpose and coupled by a special interface to form the positron camera. With this arrangement, a spatial resolution of 0.8 cm FWHM for a line source in the symmetry plane and a coincidence resolution time 2T of 16 ns FW0.1M were reached. For the three-dimensional image reconstruction from the positron-camera data, a maximum-likelihood procedure was developed and tested by a Monte Carlo procedure. With this application in view, a maximally flexible multi-microprocessor system was developed. A high computing capacity is reached because several partial problems are distributed to different processors and processed in parallel. The architecture was designed so that the system has a high fault tolerance and its computing capacity can be extended without any limit in principle. (orig./HSI) [de]
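
    Maximum-likelihood reconstruction of this kind is typically realized as an expectation-maximization (ML-EM) iteration; written generically below (this is the standard form, not necessarily the exact scheme developed in the thesis):

```latex
\lambda_j^{(n+1)} \;=\; \frac{\lambda_j^{(n)}}{\sum_i c_{ij}}
  \sum_i c_{ij}\, \frac{y_i}{\sum_k c_{ik}\, \lambda_k^{(n)}}
```

    Here y_i are the measured coincidence counts, c_{ij} is the probability that an annihilation in voxel j is detected in coincidence channel i, and λ_j is the estimated activity in voxel j. Each update is dominated by two projection sums over the system matrix, which is why distributing the partial problems over several processors, as described above, pays off.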

  2. An introduction to programming multiple-processor computers

    International Nuclear Information System (INIS)

    Hicks, H.R.; Lynch, V.E.

    1986-01-01

    Fortran applications programs can be executed on multiprocessor computers in either a unitasking (traditional) or multitasking form. The latter allows a single job to use more than one processor simultaneously, with a consequent reduction in elapsed time and, perhaps, the cost of the calculation. An introduction to programming in this environment is presented. The concepts of synchronization and data sharing using EVENTS and LOCKS are illustrated with examples. The strategy of strong synchronization and the use of synchronization templates are proposed. We emphasize that incorrect multitasking programs can produce irreproducible results, which makes debugging more difficult

  3. A heterogeneous hierarchical architecture for real-time computing

    Energy Technology Data Exchange (ETDEWEB)

    Skroch, D.A.; Fornaro, R.J.

    1988-12-01

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H²ART) and system software for program loading and interprocessor communication.

  4. Parallel computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, D. C.; Murthy, D. V.

    1991-01-01

    Aeroelastic analysis is multidisciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic analysis capability on a distributed-memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a three-dimensional unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent are demonstrated using 32 processors. The effects of subtask ordering, problem size and network topology are presented. A comparison to results on a shared-memory computer indicates that higher speedup is achieved on the distributed-memory system.

  5. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.
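    A hedged sketch of the coarse-grained parallelization strategy such ports typically use: split the sequence database into chunks, score each chunk in a separate worker process, then merge the hit lists. The `score_chunk` function is a stand-in for the real BLAST search and every name here is illustrative:

```python
# Hedged sketch: database-partitioned parallel sequence search.
from multiprocessing import Pool

def score_chunk(args):
    query, chunk = args
    # placeholder scoring: count 3-mers shared between query and each sequence
    kmers = {query[i:i + 3] for i in range(len(query) - 2)}
    return [(name, sum(seq[i:i + 3] in kmers for i in range(len(seq) - 2)))
            for name, seq in chunk]

def parallel_search(query, database, n_workers=4):
    chunks = [database[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        parts = pool.map(score_chunk, [(query, c) for c in chunks])
    hits = [hit for part in parts for hit in part]
    return sorted(hits, key=lambda h: h[1], reverse=True)

if __name__ == "__main__":
    db = [("seq%d" % i, "MKTAYIAKQR" * 5) for i in range(1000)]
    print(parallel_search("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", db)[:3])
```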

  6. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN- or C-programmable pipelined 20 MFlops (peak), 10 MByte single-board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications such as lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction. 10 refs., 7 figs

  7. Asynchronous and corrected-asynchronous numerical solutions of parabolic PDES on MIMD multiprocessors

    Science.gov (United States)

    Amitai, Dganit; Averbuch, Amir; Itzikowitz, Samuel; Turkel, Eli

    1991-01-01

    A major problem in achieving significant speed-up on parallel machines is the overhead involved with synchronizing the concurrent processes. Removing the synchronization constraint has the potential of speeding up the computation. The authors present asynchronous (AS) and corrected-asynchronous (CA) finite difference schemes for the multi-dimensional heat equation. Although the discussion concentrates on the Euler scheme for the solution of the heat equation, it has the potential for being extended to other schemes and other parabolic partial differential equations (PDEs). These schemes are analyzed and implemented on the shared-memory multi-user Sequent Balance machine. Numerical results for one- and two-dimensional problems are presented. It is shown experimentally that the synchronization penalty can be about 50 percent of run time; in most cases, the asynchronous scheme runs twice as fast as the parallel synchronous scheme. In general, the efficiency of the parallel schemes increases with processor load, with the time level, and with the problem dimension. The efficiency of the AS may reach 90 percent and over, but it provides accurate results only for steady-state values. The CA, on the other hand, is less efficient, but provides more accurate results for intermediate (non-steady-state) values.
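    For reference, a minimal synchronized explicit Euler step for the 1-D heat equation; the asynchronous variants discussed above would let each worker keep stepping with whatever neighbour values it last saw instead of waiting at a barrier. The grid size, mesh ratio r and initial condition are illustrative:

```python
# Hedged sketch: synchronous explicit Euler step for u_t = u_xx.
import numpy as np

def euler_step(u, r=0.25):
    """u_new[i] = u[i] + r * (u[i-1] - 2*u[i] + u[i+1]), Dirichlet boundaries."""
    u_new = u.copy()
    u_new[1:-1] = u[1:-1] + r * (u[:-2] - 2 * u[1:-1] + u[2:])
    return u_new

u = np.zeros(101)
u[50] = 1.0                      # initial heat spike
for _ in range(200):
    u = euler_step(u)            # every point advances to the same time level
```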

  8. A Self Consistent Multiprocessor Space Charge Algorithm that is Almost Embarrassingly Parallel

    International Nuclear Information System (INIS)

    Nissen, Edward; Erdelyi, B.; Manikonda, S.L.

    2012-01-01

    We present a space charge code that is self-consistent, massively parallelizable, and requires very little communication between computer nodes, making the calculation almost embarrassingly parallel. This method is implemented in the code COSY Infinity, where the differential algebras used in this code are important to the algorithm's proper functioning. The method works by calculating the self-consistent space charge distribution using the statistical moments of the test particles, and converting them into polynomial series coefficients. These coefficients are combined with differential algebraic integrals to form the potential and electric fields. The result is a map which contains the effects of space charge. This method allows for massive parallelization since its statistics-based solver does not require any binning of particles, and only requires a vector containing the partial sums of the statistical moments for the different nodes to be passed. All other calculations are done independently. The resulting maps can be used to analyze the system using normal form analysis, as well as advance particles in numbers and at speeds that were previously impossible.
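    A hedged sketch of the "pass only partial moment sums" idea: every node accumulates raw sums over its local test particles and a single reduction yields the global moments, with no binning or particle exchange. mpi4py is assumed to be available; the moment order and particle data are illustrative:

```python
# Hedged sketch: distributed accumulation of statistical moments.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
local = np.random.randn(10_000, 2)              # local test particles (x, y)

# partial sums: count, sum x, sum y, sum x^2, sum x*y, sum y^2
partial = np.array([float(len(local)),
                    local[:, 0].sum(), local[:, 1].sum(),
                    (local[:, 0] ** 2).sum(),
                    (local[:, 0] * local[:, 1]).sum(),
                    (local[:, 1] ** 2).sum()])

totals = np.zeros_like(partial)
comm.Allreduce(partial, totals, op=MPI.SUM)     # the only inter-node communication

n = totals[0]
mean_x, mean_y = totals[1] / n, totals[2] / n
var_x = totals[3] / n - mean_x ** 2             # central moments feed the
var_y = totals[5] / n - mean_y ** 2             # polynomial series coefficients
```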

  9. Sierra toolkit computational mesh conceptual model

    International Nuclear Information System (INIS)

    Baur, David G.; Edwards, Harold Carter; Cochran, William K.; Williams, Alan B.; Sjaardema, Gregory D.

    2010-01-01

    The Sierra Toolkit computational mesh is a software library intended to support massively parallel multi-physics computations on dynamically changing unstructured meshes. This domain of intended use is inherently complex due to distributed memory parallelism, parallel scalability, heterogeneity of physics, heterogeneous discretization of an unstructured mesh, and runtime adaptation of the mesh. Management of this inherent complexity begins with a conceptual analysis and modeling of this domain of intended use; i.e., development of a domain model. The Sierra Toolkit computational mesh software library is designed and implemented based upon this domain model. Software developers using, maintaining, or extending the Sierra Toolkit computational mesh library must be familiar with the concepts/domain model presented in this report.

  10. The computational physics program of the National MFE Computer Center

    International Nuclear Information System (INIS)

    Mirin, A.A.

    1988-01-01

    The principal objective of the Computational Physics Group is to develop advanced numerical models for the investigation of plasma phenomena and the simulation of present and future magnetic confinement devices. Another major objective of the group is to develop efficient algorithms and programming techniques for current and future generations of supercomputers. The computational physics group is involved in several areas of fusion research. One main area is the application of Fokker-Planck/quasilinear codes to tokamaks. Another is the investigation of resistive magnetohydrodynamics in three dimensions, with applications to compact toroids. A third is the investigation of kinetic instabilities using a 3-D particle code. This work is often coupled with the task of numerically generating equilibria which model experimental devices. Ways to apply statistical closure approximations to study tokamak-edge plasma turbulence are being examined. In addition to these computational physics studies, the group has developed a number of linear systems solvers for general classes of physics problems and has been making a major effort at ascertaining how to efficiently utilize multiprocessor computers

  11. Study of an analog/logic processor for the design of an auto patch hybrid computer

    International Nuclear Information System (INIS)

    Koched, Hassen

    1976-01-01

    This paper presents the experimental study of an analog multiprocessor designed at SES/CEN-Saclay. An application of such a device as a basic component of an auto-patch hybrid computer is presented. First, a description of the processor and a presentation of the theoretical concepts which governed its design are given. Experiments on a hybrid computer are then presented. Finally, different systems of automatic patching are presented and suitably modified for the use of such a processor. (author) [fr

  12. Visualization and Data Analysis for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.

  13. Parallel implementations of 2D explicit Euler solvers

    International Nuclear Information System (INIS)

    Giraud, L.; Manzini, G.

    1996-01-01

    In this work we present a subdomain partitioning strategy applied to an explicit high-resolution Euler solver. We describe the design of a portable parallel multi-domain code suitable for parallel environments. We present several implementations on a representative range of MIMD computers that include shared-memory multiprocessors, distributed virtual shared-memory computers, as well as networks of workstations. Computational results are given to illustrate the efficiency, the scalability, and the limitations of the different approaches. We also discuss the effect of the communication protocol on the optimal domain partitioning strategy for the distributed-memory computers
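    A minimal sketch of the subdomain idea for a distributed-memory machine: each rank owns a contiguous block of a 1-D grid plus two ghost cells, exchanges the ghost cells with its neighbours, then applies a local explicit update. mpi4py is assumed, and the stencil is a generic averaging step rather than the Euler solver of the paper:

```python
# Hedged sketch: 1-D domain decomposition with ghost-cell exchange.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local_n = 100
u = np.zeros(local_n + 2)                # interior cells plus two ghost cells
if rank == 0:
    u[1] = 1.0                           # some initial/boundary condition

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(50):
    # exchange ghost cells with the neighbouring subdomains
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # local explicit update on interior cells only
    u[1:-1] = 0.5 * (u[:-2] + u[2:])
```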

  14. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    Science.gov (United States)

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is written through the first-level cache to the second-level cache. After the write-through, the corresponding line is deleted from the first-level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second-level cache. The second-level cache keeps track of multiple versions of data where more than one speculative thread is running in parallel, while the first-level cache does not hold any of the versions during speculation. A switch allows choosing between modes of operation of a speculation-blind first-level cache.

  15. TCW: transcriptome computational workbench.

    Science.gov (United States)

    Soderlund, Carol; Nelson, William; Willer, Mark; Gang, David R

    2013-01-01

    The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the transcriptome. TCW is freely available from www.agcol.arizona.edu/software/tcw.

  16. Refficientlib: an efficient load-rebalanced adaptive mesh refinement algorithm for high-performance computational physics meshes

    OpenAIRE

    Baiges Aznar, Joan; Bayona Roa, Camilo Andrés

    2017-01-01

    In this paper we present a novel algorithm for adaptive mesh refinement in computational physics meshes in a distributed memory parallel setting. The proposed method is developed for nodally based parallel domain partitions where the nodes of the mesh belong to a single processor, whereas the elements can belong to multiple processors. Some of the main features of the algorithm presented in this paper a...

  17. Particle orbit tracking on a parallel computer: Hypertrack

    International Nuclear Information System (INIS)

    Cole, B.; Bourianoff, G.; Pilat, F.; Talman, R.

    1991-05-01

    A program has been written which performs particle orbit tracking on the Intel iPSC/860 distributed memory parallel computer. The tracking is performed using a thin element approach. A brief description of the structure and performance of the code is presented, along with applications of the code to the analysis of accelerator lattices for the SSC. The concept of "ensemble tracking", i.e. the tracking of ensemble averages of noninteracting particles, such as the emittance, is presented. Preliminary results of such studies will be presented. 2 refs., 6 figs

  18. Super computer displays future of the earth

    International Nuclear Information System (INIS)

    Yokokawa, Mitsuo; Tani, Keiji

    2000-01-01

    The Science and Technology Agency has promoted a project for estimating fluctuations of the Earth's environment since fiscal 1997. As part of this series, it is developing a very high-speed parallel computer, the 'Earth Simulator', with 5 TFLOPS of effective performance (40 TFLOPS peak). The hardware, basic software and application software are outlined. The hardware is a distributed-memory parallel computer with a single-stage crossbar network and a main storage capacity of 10 TB. The basic software has a hierarchical structure comprising the operating system, compiler, and operation and management software. In the Earth Simulator, the 640 nodes are each connected to magnetic disk units so that input/output of the calculations is performed in parallel, one of the most important development items. The Earth Simulator project is also developing the NJR (NASDA-JAMSTEC-RIST) program, a joint model library system for atmospheric and oceanic general circulation. An example analysis showed the global distribution of daily rainfall over the Earth. (S.Y.)

  19. Fermilab advanced computer program multi-microprocessor project

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Biel, J.

    1985-06-01

    Fermilab's Advanced Computer Program is constructing a powerful 128 node multi-microprocessor system for data analysis in high-energy physics. The system will use commercial 32-bit microprocessors programmed in Fortran-77. Extensive software supports easy migration of user applications from a uniprocessor environment to the multiprocessor and provides sophisticated program development, debugging, and error handling and recovery tools. This system is designed to be readily copied, providing computing cost effectiveness of below $2200 per VAX 11/780 equivalent. The low cost, commercial availability, compatibility with off-line analysis programs, and high data bandwidths (up to 160 MByte/sec) make the system an ideal choice for applications to on-line triggers as well as an offline data processor

  20. Remarks on the development of a multiblock three-dimensional Euler code for out of core and multiprocessor calculations

    International Nuclear Information System (INIS)

    Jameson, A.; Leicher, S.; Dawson, J.; Tel Aviv Univ., Israel)

    1985-01-01

    A multiblock modification of the FLO57 code for three-dimensional wing calculations is described and demonstrated. The theoretical basis of the multistage time-stepping algorithm is reviewed; the multiblock grid structure is explained; and results from a computation of vortical flow past a delta wing, using 2.5 x 10^6 grid points and performed on a Cray X/MP computer with a 128-Mword solid-state storage device, are presented graphically. 6 references

  1. A RISC multiprocessor event trigger for the data acquisition system of the H1 experiment at HERA

    International Nuclear Information System (INIS)

    Campbell, A.J.

    1991-09-01

    In late 1991 HERA will for the first time collide stored beams of electrons and protons. This paper describes the multiprocessor system, based on modern reduced instruction set (RISC) processors, for online event filtering and reconstruction installed within the data acquisition system of the H1 experiment. Data is processed at a continuous average rate of ∼ 6 Mbytes/s in parallel by ∼ 20 R3000 VMEbus-based monoboard computers providing some 400 MIPS of computing power. (author)

  2. A RISC multiprocessor event trigger for the data acquisition system of the H1 experiment at HERA

    International Nuclear Information System (INIS)

    Campbell, A.J.

    1992-01-01

    In late 1991 HERA will for the first time collide stored beams of electrons and protons. This paper describes the multiple-RISC processor system for online event filtering and reconstruction installed within the data acquisition system of the H1 experiment. Data is processed at a continuous average rate of ∼6 Mbytes/s in parallel by ∼20 R3000 VMEbus-based monoboard computers providing some 400 MIPS of computing power

  3. Building the Teraflops/Petabytes Production Computing Center

    International Nuclear Information System (INIS)

    Kramer, William T.C.; Lucas, Don; Simon, Horst D.

    1999-01-01

    In just one decade, the 1990s, supercomputer centers have undergone two fundamental transitions which require rethinking their operation and their role in high performance computing. The first transition in the early to mid-1990s resulted from a technology change in high performance computing architecture. Highly parallel distributed memory machines built from commodity parts increased the operational complexity of the supercomputer center, and required the introduction of intellectual services as equally important components of the center. The second transition is happening in the late 1990s as centers are introducing loosely coupled clusters of SMPs as their premier high performance computing platforms, while dealing with an ever-increasing volume of data. In addition, increasing network bandwidth enables new modes of use of a supercomputer center, in particular, computational grid applications. In this paper we describe what steps NERSC is taking to address these issues and stay at the leading edge of supercomputing centers.

  4. Multicore Challenges and Benefits for High Performance Scientific Computing

    Directory of Open Access Journals (Sweden)

    Ida M.B. Nielsen

    2008-01-01

    Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
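    A hedged sketch of a hybrid distributed-memory matrix multiply in the spirit described above: MPI ranks each own a block of rows of A, compute their slice of C = A·B with multi-threaded BLAS inside the node, and gather the result. mpi4py and numpy are assumed; the matrix size and the block-row layout are illustrative choices:

```python
# Hedged sketch: message passing between nodes, threaded BLAS within a node.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n = 1024                                   # global dimension, assumed divisible by size

if rank == 0:
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
else:
    A, B = None, np.empty((n, n))

comm.Bcast(B, root=0)                      # every rank needs all of B
A_local = np.empty((n // size, n))
comm.Scatter(A, A_local, root=0)           # one block of rows of A per rank

C_local = A_local @ B                      # threaded BLAS uses the cores of the node

C = np.empty((n, n)) if rank == 0 else None
comm.Gather(C_local, C, root=0)            # assemble the distributed result on rank 0
```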

  5. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  6. Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

    Energy Technology Data Exchange (ETDEWEB)

    De Supinski, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Caliga, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-09-28

    The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a cost-effective solution for large-scale scientific computing.

  7. Development of a parallel DBMS on the basis of PostgreSQL

    OpenAIRE

    Pan, C.

    2011-01-01

    The paper describes the architecture and design of the PargreSQL parallel database management system (DBMS) for distributed-memory multiprocessors. PargreSQL is based upon the PostgreSQL open-source DBMS and exploits partitioned parallelism.
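    A generic illustration of the partitioned parallelism such a DBMS exploits: rows are hash-partitioned over workers, each worker aggregates its own fragment, and the partial aggregates are merged. This is a sketch of the general technique, not the PargreSQL implementation:

```python
# Hedged sketch: hash-partitioned parallel GROUP BY aggregation.
from multiprocessing import Pool
from collections import Counter

rows = [("books", 12.0), ("music", 7.5), ("books", 3.2), ("video", 9.9)] * 1000

def partition(rows, n_parts):
    parts = [[] for _ in range(n_parts)]
    for key, value in rows:
        parts[hash(key) % n_parts].append((key, value))   # exclusive fragments
    return parts

def local_aggregate(part):
    totals = Counter()
    for key, value in part:
        totals[key] += value        # SELECT key, SUM(value) ... GROUP BY key
    return totals

if __name__ == "__main__":
    with Pool(4) as pool:
        partials = pool.map(local_aggregate, partition(rows, 4))
    merged = sum(partials, Counter())   # combine the per-fragment aggregates
    print(dict(merged))
```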

  8. Centaure: an heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low-level structure of Centaure in a multi-SIMD fashion. (author) [fr

  9. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    Eden is a parallel extension of the lazy functional language Haskell providing dynamic process creation and automatic data exchange. As a Haskell extension, Eden takes a high-level approach to parallel programming and thereby simplifies parallel program development. The current implementation is ...

  10. TCW: transcriptome computational workbench.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    BACKGROUND: The analysis of transcriptome data involves many steps and various programs, along with organization of large amounts of data and results. Without a methodical approach for storage, analysis and query, the resulting ad hoc analysis can lead to human error, loss of data and results, inefficient use of time, and lack of verifiability, repeatability, and extensibility. METHODOLOGY: The Transcriptome Computational Workbench (TCW) provides Java graphical interfaces for methodical analysis for both single and comparative transcriptome data without the use of a reference genome (e.g. for non-model organisms). The singleTCW interface steps the user through importing transcript sequences (e.g. Illumina) or assembling long sequences (e.g. Sanger, 454, transcripts), annotating the sequences, and performing differential expression analysis using published statistical programs in R. The data, metadata, and results are stored in a MySQL database. The multiTCW interface builds a comparison database by importing sequence and annotation from one or more single TCW databases, executes the ESTscan program to translate the sequences into proteins, and then incorporates one or more clusterings, where the clustering options are to execute the orthoMCL program, compute transitive closure, or import clusters. Both singleTCW and multiTCW allow extensive query and display of the results, where singleTCW displays the alignment of annotation hits to transcript sequences, and multiTCW displays multiple transcript alignments with MUSCLE or pairwise alignments. The query programs can be executed on the desktop for fastest analysis, or from the web for sharing the results. CONCLUSION: It is now affordable to buy a multi-processor machine, and easy to install Java and MySQL. By simply downloading the TCW, the user can interactively analyze, query and view their data. The TCW allows in-depth data mining of the results, which can lead to a better understanding of the

  11. High-reliability logic system evaluation of a programmed multiprocessor solution. Application in the nuclear reactor safety field

    International Nuclear Information System (INIS)

    Lallement, Dominique.

    1979-01-01

    Nuclear reactors are monitored by several systems in combination. The hydraulic and mechanical limitations on the equipment and the heat transfer requirements in the core define a reliable working range for the boiler, with certain safety margins. The control system tends to keep the power plant within this working range. The protection system covers all the electrical and mechanical equipment needed to safeguard the boiler in the event of abnormal transients or accidents accounted for in the design of the plant. On units in service, protection is handled by hard-wired automatic systems. For better reliability and safety of operation, greater flexibility of use (modularity, adaptability) and improved start-up criteria through data processing, the tendency is to use digital programmed systems. Computers are already present in control systems, but their introduction into protection systems meets with some reticence on the part of the nuclear safety authorities. A study on the replacement of conventional by digital protection systems is presented. From choices partly made on the principles which should govern the hardware and software of a protection system, the reliability of different structures and elements was examined and an experimental model built with its simulator and test facilities. A prototype based on these options and studies is being built and is to be set up on one of the CEN-G reactors for tests [fr

  12. Contributing to the design of run-time systems dedicated to high performance computing

    International Nuclear Information System (INIS)

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multiprocessors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high-level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  13. Optical interconnection networks for high-performance computing systems

    International Nuclear Information System (INIS)

    Biberman, Aleksandr; Bergman, Keren

    2012-01-01

    Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. (review article)

  14. Parallel Computer System for 3D Visualization Stereo on GPU

    Science.gov (United States)

    Al-Oraiqat, Anas M.; Zori, Sergii A.

    2018-03-01

    This paper proposes the organization of a parallel computer system based on Graphics Processing Units (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing-ray intersections with scene objects. The system allows a significant increase in productivity for 3D stereo synthesis of photorealistic quality. A generalized procedure for 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions in the GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of the size and configuration of the Compute Unified Device Architecture (CUDA) grid on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as an optimized configuration of the CUDA computing grid.

  15. Cross-scale Efficient Tensor Contractions for Coupled Cluster Computations Through Multiple Programming Model Backends

    Energy Technology Data Exchange (ETDEWEB)

    Ibrahim, Khaled Z. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Epifanovsky, Evgeny [Q-Chem, Inc., Pleasanton, CA (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Krylov, Anna I. [Univ. of Southern California, Los Angeles, CA (United States). Dept. of Chemistry

    2016-07-26

    Coupled-cluster methods provide highly accurate models of molecular structure by explicit numerical calculation of tensors representing the correlation between electrons. These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations. While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation and the complexity of contractions. The resulting algorithms are irregular and their parallelization has been previously achieved via the use of dynamic scheduling or specialized data decompositions. We introduce our efforts to extend the Libtensor framework to work in the distributed memory environment in a scalable and energy efficient manner. We achieve up to 240x speedup compared with the best optimized shared memory implementation. We attain scalability to hundreds of thousands of compute cores on three distributed-memory architectures (Cray XC30 and XC40, BlueGene/Q), and on a heterogeneous GPU-CPU system (Cray XK7). As the bottlenecks shift from compute-bound DGEMMs to communication-bound collectives as the size of the molecular system scales, we adopt two radically different parallelization approaches for handling load imbalance. Nevertheless, we preserve a unified interface to both programming models to maintain the productivity of computational quantum chemists.

  16. Wing-Body Aeroelasticity Using Finite-Difference Fluid/Finite-Element Structural Equations on Parallel Computers

    Science.gov (United States)

    Byun, Chansup; Guruswamy, Guru P.; Kutler, Paul (Technical Monitor)

    1994-01-01

    In recent years significant advances have been made for parallel computers in both hardware and software. Now parallel computers have become viable tools in computational mechanics. Many application codes developed on conventional computers have been modified to benefit from parallel computers. Significant speedups in some areas have been achieved by parallel computations. For single-discipline use of both fluid dynamics and structural dynamics, computations have been made on wing-body configurations using parallel computers. However, only a limited amount of work has been completed in combining these two disciplines for multidisciplinary applications. The prime reason is the increased level of complication associated with a multidisciplinary approach. In this work, procedures to compute aeroelasticity on parallel computers using direct coupling of fluid and structural equations will be investigated for wing-body configurations. The parallel computer selected for computations is an Intel iPSC/860 computer which is a distributed-memory, multiple-instruction, multiple data (MIMD) computer with 128 processors. In this study, the computational efficiency issues of parallel integration of both fluid and structural equations will be investigated in detail. The fluid and structural domains will be modeled using finite-difference and finite-element approaches, respectively. Results from the parallel computer will be compared with those from the conventional computers using a single processor. This study will provide an efficient computational tool for the aeroelastic analysis of wing-body structures on MIMD type parallel computers.

  17. Recent Developments in Computed Tomography

    International Nuclear Information System (INIS)

    Braunstein, D.; Dafni, E.; Levene, S.; Malamud, G.; Shapiro, O.; Shechter, G.; Zahavi, O.

    1999-01-01

    Computerized tomography has become, during the past few years, one of the most widely used modalities in X-ray diagnosis. Its clinical applications have spread to various fields, such as interventional guidance, cardiac imaging, computer-aided surgery, etc. The first, second-generation CT scanners consisted of a rotate-rotate system of a detector array and an X-ray tube. These scanners were capable of acquiring individual single slices, each taking several seconds. The slow scanning rate and the then poor computing power limited the application range of these scanners to relatively stable organs and short body coverage at a given resolution. Further drawbacks of these machines were weak X-ray sources and low-efficiency gas detectors. In the late 1980s the first helical scanners were introduced by Siemens. Based on continuous patient-couch movement during gantry rotation, much faster scans could be obtained, significantly increasing the volume coverage achievable in a given time. In 1992 the first dual-slice scanners, equipped with high-efficiency solid-state detectors, were introduced by Elscint. Acquiring data simultaneously from two detector arrays doubled the efficiency of the scan. Faster computers and stronger X-ray sources further improved performance, allowing for a new range of clinical applications. Yet the need for even faster machines and bigger volume coverage led to further R and D efforts by the leading CT manufacturers. In order to meet the most demanding clinical needs, innovative two-dimensional 4-row solid-state detector arrays were developed, together with faster-rotating machines and bigger X-ray tubes, all demanding extremely accurate and robust mechanical construction. In parallel, multi-processor custom computers were built to allow on-line reconstruction of the growing amounts of raw data. Four-slice helical scanners rotating at 0.5 s per cycle are being tested nowadays in several clinics all over the world. This talk

  18. VLSI Based Multiprocessor Communications Networks.

    Science.gov (United States)

    1982-09-01

    Networks". The contract began on September 1,1980 and was approved on scientific /technical grounds for a duration of three years. Incremental funding was...values for the individual delays will vary from comunicating modules (ij) are shown in Figure 4 module to module due to processing and fabrication

  19. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models

  20. MEDUSA - An overset grid flow solver for network-based parallel computer systems

    Science.gov (United States)

    Smith, Merritt H.; Pallis, Jani M.

    1993-01-01

    Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray Y-MP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.

  1. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    Science.gov (United States)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  2. Computer Sciences and Data Systems, volume 1

    Science.gov (United States)

    1987-01-01

    Topics addressed include: software engineering; university grants; institutes; concurrent processing; sparse distributed memory; distributed operating systems; intelligent data management processes; expert system for image analysis; fault tolerant software; and architecture research.

  3. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    International Nuclear Information System (INIS)

    Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-01-01

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  4. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    Energy Technology Data Exchange (ETDEWEB)

    Oelerich, Jan Oliver, E-mail: jan.oliver.oelerich@physik.uni-marburg.de; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-06-15

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  5. Performance evaluation of scientific programs on advanced architecture computers

    International Nuclear Information System (INIS)

    Walker, D.W.; Messina, P.; Baille, C.F.

    1988-01-01

    Recently a number of advanced architecture machines have become commercially available. These new machines promise better cost-performance than traditional computers, and some of them have the potential of competing with current supercomputers, such as the Cray X/MP, in terms of maximum performance. This paper describes an on-going project to evaluate a broad range of advanced architecture computers using a number of complete scientific application programs. The computers to be evaluated include distributed-memory machines such as the NCUBE, INTEL and Caltech/JPL hypercubes and the MEIKO computing surface; shared-memory, bus architecture machines such as the Sequent Balance and the Alliant; very long instruction word machines such as the Multiflow Trace 7/200 computer; traditional supercomputers such as the Cray X/MP and Cray-2; and SIMD machines such as the Connection Machine. Currently 11 application codes from a number of scientific disciplines have been selected, although it is not intended to run all codes on all machines. Results are presented for two of the codes (QCD and missile tracking), and future work is proposed

  6. Optical Computing

    OpenAIRE

    Woods, Damien; Naughton, Thomas J.

    2008-01-01

    We consider optical computers that encode data using images and compute by transforming such images. We give an overview of a number of such optical computing architectures, including descriptions of the type of hardware commonly used in optical computing, as well as some of the computational efficiencies of optical devices. We go on to discuss optical computing from the point of view of computational complexity theory, with the aim of putting some old, and some very recent, re...

  7. Novel computational approaches for the analysis of cosmic magnetic fields

    Energy Technology Data Exchange (ETDEWEB)

    Saveliev, Andrey [Universitaet Hamburg, Hamburg (Germany); Keldysh Institut, Moskau (Russian Federation)

    2016-07-01

    In order to give a consistent picture of cosmic, i.e. galactic and extragalactic, magnetic fields, different approaches are possible and often even necessary. Here we present three of them: First, a semianalytic analysis of the time evolution of primordial magnetic fields from which their properties and, subsequently, the nature of present-day intergalactic magnetic fields may be deduced. Second, the use of high-performance computing infrastructure by developing powerful algorithms for (magneto-)hydrodynamic simulations and applying them to astrophysical problems. We are currently developing a code which applies kinetic schemes in massively parallel computing on high performance multiprocessor systems in a new way to calculate both hydro- and electrodynamic quantities. Finally, as a third approach, astroparticle physics might be used, as magnetic fields leave imprints of their properties on charged particles traversing them. Here we focus on electromagnetic cascades by developing software based on CRPropa which simulates the propagation of particles from such cascades through the intergalactic medium in three dimensions. This may in particular be used to obtain information about the helicity of extragalactic magnetic fields.

  8. Computer group

    International Nuclear Information System (INIS)

    Bauer, H.; Black, I.; Heusler, A.; Hoeptner, G.; Krafft, F.; Lang, R.; Moellenkamp, R.; Mueller, W.; Mueller, W.F.; Schati, C.; Schmidt, A.; Schwind, D.; Weber, G.

    1983-01-01

    The computer group has been reorganized to take charge of the general purpose computers DEC10 and VAX and the computer network (Dataswitch, DECnet, IBM - connections to GSI and IPP, preparation for Datex-P). (orig.)

  9. Computer Engineers.

    Science.gov (United States)

    Moncarz, Roger

    2000-01-01

    Looks at computer engineers and describes their job, employment outlook, earnings, and training and qualifications. Provides a list of resources related to computer engineering careers and the computer industry. (JOW)

  10. Radon transform computer

    International Nuclear Information System (INIS)

    Current, W.; Hurst, P.; Ford, G.; Shieh, E.; Agi, I.; Nguyen, C.; Peterson, D.

    1990-01-01

    In this final report, we summarize some of our results from September 1989 to October 1990. The design, construction, and testing of a four-processor prototype multi-processor (RTP) board using TI TMS320C25 DSP chips has been completed. We are now finishing the extensive detailed final documentation of the RTP hardware and software. This documentation will be provided to Steve Azevedo when we return the borrowed workstation and deliver the RTP to LLNL. A summary of the test results is in Section II. The design of our fully custom CMOS VLSI chip has been completed: the chip has been designed, the layout completed, and the chip is now going through its final pre-fabrication simulations. The present status of the custom chip design activity will be summarized in the separately submitted "Final Report on the 'Real-Time Multi-Dimensional Processing Hardware Designs' Project." Evaluations of the hardware requirements for fast filtering of data for filtered backprojection have been completed and are summarized. We briefly summarize the test results of the TMS320 multi-processor prototype RTP board evaluation

  11. Computer Music

    Science.gov (United States)

    Cook, Perry R.

    This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and the study of perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.).

  12. Many-core technologies: The move to energy-efficient, high-throughput x86 computing (TFLOPS on a chip)

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    With Moore's Law alive and well, more and more parallelism is introduced into all computing platforms at all levels of integration and programming to achieve higher performance and energy efficiency. Especially in the area of High-Performance Computing (HPC) users can entertain a combination of different hardware and software parallel architectures and programming environments. Those technologies range from vectorization and SIMD computation over shared memory multi-threading (e.g. OpenMP) to distributed memory message passing (e.g. MPI) on cluster systems. We will discuss HPC industry trends and Intel's approach to it from processor/system architectures and research activities to hardware and software tools technologies. This includes the recently announced new Intel(r) Many Integrated Core (MIC) architecture for highly-parallel workloads and general purpose, energy efficient TFLOPS performance, some of its architectural features and its programming environment. At the end we will have a br...

  13. Using high performance interconnects in a distributed computing and mass storage environment

    International Nuclear Information System (INIS)

    Ernst, M.

    1994-01-01

    Detector Collaborations of the HERA Experiments typically involve more than 500 physicists from a few dozen institutes. These physicists require access to large amounts of data in a fully transparent manner. Important issues include distributed mass storage management systems in a distributed and heterogeneous computing environment. At the very center of a distributed system, including tens of CPUs and network-attached mass storage peripherals, are the communication links. Today scientists are witnessing an integration of computing and communication technology, with the 'network' becoming the computer. This contribution reports on a centrally operated computing facility for the HERA Experiments at DESY, including symmetric multiprocessor machines (84 processors), presently more than 400 GByte of magnetic disk and 40 TB of automated tape storage, tied together by a HIPPI 'network'. Focusing on the high performance interconnect technology, details will be provided about the HIPPI-based 'backplane' configured around a 20 Gigabit/s Multi Media Router and the performance and efficiency of the related computer interfaces

  14. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    International Nuclear Information System (INIS)

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-01-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
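
    The preconditioned conjugate gradient iteration at the heart of solvers of this kind can be summarized compactly. The following is a minimal Python/NumPy sketch of a generic preconditioned CG loop, offered only as an illustration of the technique; it is not the DANTE implementation, and the small test matrix and Jacobi preconditioner are illustrative assumptions:

        import numpy as np

        def preconditioned_cg(A, b, M_inv, tol=1e-8, max_iter=1000):
            """Solve A x = b for a symmetric positive-definite A using a
            preconditioned conjugate gradient iteration. M_inv is a function
            that applies the preconditioner inverse to a vector."""
            x = np.zeros_like(b)
            r = b - A @ x                  # initial residual
            z = M_inv(r)                   # preconditioned residual
            p = z.copy()
            rz_old = r @ z
            for _ in range(max_iter):
                Ap = A @ p
                alpha = rz_old / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                z = M_inv(r)
                rz_new = r @ z
                p = z + (rz_new / rz_old) * p
                rz_old = rz_new
            return x

        # Example with a simple diagonal (Jacobi) preconditioner
        A = np.array([[4.0, 1.0], [1.0, 3.0]])
        b = np.array([1.0, 2.0])
        x = preconditioned_cg(A, b, M_inv=lambda v: v / np.diag(A))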

  15. Analog computing

    CERN Document Server

    Ulmann, Bernd

    2013-01-01

    This book is a comprehensive introduction to analog computing. As most textbooks about this powerful computing paradigm date back to the 1960s and 1970s, it fills a void and forges a bridge from the early days of analog computing to future applications. The idea of analog computing is not new. In fact, this computing paradigm is nearly forgotten, although it offers a path to both high-speed and low-power computing, which are in even more demand now than they were back in the heyday of electronic analog computers.

  16. Computational composites

    DEFF Research Database (Denmark)

    Vallgårda, Anna K. A.; Redström, Johan

    2007-01-01

    Computational composite is introduced as a new type of composite material. Arguing that this is not just a metaphorical maneuver, we provide an analysis of computational technology as material in design, which shows how computers share important characteristics with other materials used in design...... and architecture. We argue that the notion of computational composites provides a precise understanding of the computer as material, and of how computations need to be combined with other materials to come to expression as material. Besides working as an analysis of computers from a designer’s point of view......, the notion of computational composites may also provide a link for computer science and human-computer interaction to an increasingly rapid development and use of new materials in design and architecture....

  17. Quantum Computing

    OpenAIRE

    Scarani, Valerio

    1998-01-01

    The aim of this thesis was to explain what quantum computing is. The information for the thesis was gathered from books, scientific publications, and news articles. The analysis of the information revealed that quantum computing can be broken down into three areas: theories behind quantum computing explaining the structure of a quantum computer, known quantum algorithms, and the actual physical realizations of a quantum computer. The thesis reveals that moving from classical memor...

  18. Parallel computing for homogeneous diffusion and transport equations in neutronics

    International Nuclear Information System (INIS)

    Pinchedez, K.

    1999-06-01

    Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performance of the parallel algorithms has been studied experimentally by implementation on a Cray T3D and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion eigenvalue problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with the sub-domains. The eigenvalue problem is then expanded on a family composed, on the one hand, of eigenfunctions associated with the sub-domains and, on the other hand, of functions corresponding to the contribution from the interfaces between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
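
    Dominant-eigenvalue computations of the kind parallelized here are usually built around some variant of the power (outer) iteration. As a purely illustrative sketch, and not the CRONOS mixed finite element solver, a basic power iteration in Python/NumPy looks like this:

        import numpy as np

        def power_iteration(A, tol=1e-10, max_iter=10000):
            """Return an estimate of the dominant eigenvalue and eigenvector of A."""
            x = np.ones(A.shape[0])
            x /= np.linalg.norm(x)
            prev = 0.0
            for _ in range(max_iter):
                y = A @ x                      # one sweep of the operator
                eigenvalue = x @ y             # Rayleigh quotient estimate
                x = y / np.linalg.norm(y)
                if abs(eigenvalue - prev) < tol:
                    break
                prev = eigenvalue
            return eigenvalue, x

        # Small example on a symmetric matrix (illustrative only)
        A = np.array([[2.0, 1.0], [1.0, 3.0]])
        dominant_lambda, mode = power_iteration(A)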

  19. Exploiting Data Sparsity for Large-Scale Matrix Computations

    KAUST Repository

    Akbudak, Kadir; Ltaief, Hatem; Mikhalev, Aleksandr; Charara, Ali; Keyes, David E.

    2018-01-01

    Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user-productivity. Performance comparisons and memory footprint on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.
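
    The tile decomposition that task-based libraries of this kind build on can be illustrated with a plain dense tile Cholesky loop. The sketch below, in Python/NumPy, shows only the per-tile task pattern (POTRF, TRSM, SYRK/GEMM); it is a hedged illustration that omits the low-rank compression and the StarPU runtime HiCMA actually uses, and the tile size and test matrix are assumptions:

        import numpy as np
        from scipy.linalg import solve_triangular

        def tile_cholesky(A, nb):
            """In-place lower-triangular tile Cholesky of a symmetric
            positive-definite matrix A, using nb x nb dense tiles (nb must
            divide A's dimension). Each block operation corresponds to one
            task in a task-based runtime."""
            n = A.shape[0]
            nt = n // nb
            for k in range(nt):
                kk = slice(k * nb, (k + 1) * nb)
                A[kk, kk] = np.linalg.cholesky(A[kk, kk])   # POTRF on diagonal tile
                for i in range(k + 1, nt):
                    ii = slice(i * nb, (i + 1) * nb)
                    # TRSM: A[ii,kk] <- A[ii,kk] * L[kk,kk]^-T
                    A[ii, kk] = solve_triangular(A[kk, kk], A[ii, kk].T, lower=True).T
                for i in range(k + 1, nt):
                    ii = slice(i * nb, (i + 1) * nb)
                    for j in range(k + 1, i + 1):
                        jj = slice(j * nb, (j + 1) * nb)
                        # SYRK/GEMM: trailing-matrix update of the lower tiles
                        A[ii, jj] -= A[ii, kk] @ A[jj, kk].T
            return np.tril(A)

        # Tiny example on a random SPD matrix with 2x2 tiles
        rng = np.random.default_rng(0)
        B = rng.standard_normal((4, 4))
        A = B @ B.T + 4 * np.eye(4)
        L = tile_cholesky(A.copy(), nb=2)
        assert np.allclose(L @ L.T, A)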

  20. Exploiting Data Sparsity for Large-Scale Matrix Computations

    KAUST Repository

    Akbudak, Kadir

    2018-02-24

    Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user-productivity. Performance comparisons and memory footprint on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.

  1. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably make the solutions derived from them even more appealing in the future. Efficient algorithms for sorting schemes that are encountered in a number of ...

  2. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Science.gov (United States)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
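
    The block decomposition with boundary exchange described above follows a standard message-passing pattern. As a hedged illustration only (Python with mpi4py rather than the paper's Java/MPJ Express code, and a 1-D row of cells rather than triangular automata), the pattern might look like this:

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        N = 1024                      # total number of cells (illustrative; assumes size divides N)
        local_n = N // size           # cells owned by this process
        u = np.zeros(local_n + 2)     # local state plus one ghost (halo) cell on each side
        u[1:-1] = rank                # dummy initial state

        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        for step in range(10):
            # exchange halo cells with neighbouring processes
            comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
            comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
            # local update rule (placeholder for the cellular-automaton transition)
            u[1:-1] = 0.5 * (u[:-2] + u[2:])

    Run with, for example, mpirun -np 4 python halo_demo.py (file name illustrative); each rank updates only its own block and exchanges one ghost cell per neighbour per step, which is the same strategy whether the ranks sit on one SMP node or across a cluster.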

  3. Computational Medicine

    DEFF Research Database (Denmark)

    Nygaard, Jens Vinge

    2017-01-01

    The Health Technology Program at Aarhus University applies computational biology to investigate the heterogeneity of tumours.

  4. Grid Computing

    Indian Academy of Sciences (India)

    A computing grid interconnects resources such as high performance computers, scientific databases, and computer-controlled scientific instruments of cooperating organizations, each of which is autonomous. It precedes and is quite different from cloud computing, which provides computing resources by vendors to customers ...

  5. Green Computing

    Directory of Open Access Journals (Sweden)

    K. Shalini

    2013-01-01

    Full Text Available Green computing is all about using computers in a smarter and more eco-friendly way. It is the environmentally responsible use of computers and related resources, which includes the implementation of energy-efficient central processing units, servers and peripherals, as well as reduced resource consumption and proper disposal of electronic waste. Computers certainly make up a large part of many people's lives and traditionally are extremely damaging to the environment. Manufacturers of computers and their parts have been espousing the green cause to help protect the environment from computers and electronic waste in any way. Research continues into key areas such as making the use of computers as energy-efficient as possible, and designing algorithms and systems for efficiency-related computer technologies.

  6. Quantum computers and quantum computations

    International Nuclear Information System (INIS)

    Valiev, Kamil' A

    2005-01-01

    This review outlines the principles of operation of quantum computers and their elements. The theory of ideal computers that do not interact with the environment and are immune to quantum decohering processes is presented. Decohering processes in quantum computers are investigated. The review considers methods for correcting quantum computing errors arising from the decoherence of the state of the quantum computer, as well as possible methods for the suppression of the decohering processes. A brief enumeration of proposed quantum computer realizations concludes the review. (reviews of topical problems)

  7. Quantum Computing for Computer Architects

    CERN Document Server

    Metodi, Tzvetan

    2011-01-01

    Quantum computers can (in theory) solve certain problems far faster than a classical computer running any known classical algorithm. While existing technologies for building quantum computers are in their infancy, it is not too early to consider their scalability and reliability in the context of the design of large-scale quantum computers. To architect such systems, one must understand what it takes to design and model a balanced, fault-tolerant quantum computer architecture. The goal of this lecture is to provide architectural abstractions for the design of a quantum computer and to explore

  8. Pervasive Computing

    NARCIS (Netherlands)

    Silvis-Cividjian, N.

    This book provides a concise introduction to Pervasive Computing, otherwise known as Internet of Things (IoT) and Ubiquitous Computing (Ubicomp) which addresses the seamless integration of computing systems within everyday objects. By introducing the core topics and exploring assistive pervasive

  9. Computational vision

    CERN Document Server

    Wechsler, Harry

    1990-01-01

    The book is suitable for advanced courses in computer vision and image processing. In addition to providing an overall view of computational vision, it contains extensive material on topics that are not usually covered in computer vision texts (including parallel distributed processing and neural networks) and considers many real applications.

  10. Spatial Computation

    Science.gov (United States)

    2003-12-01

    Computation and today’s microprocessors with the approach to operating system architecture, and the controversy between microkernels and monolithic kernels... Both Spatial Computation and microkernels break a relatively monolithic architecture into individual lightweight pieces, well specialized... for their particular functionality. Spatial Computation removes global signals and control, in the same way microkernels remove the global address

  11. Design and Implementation of a Multiprocessor Systems-on-Chip (MPSoC) Interconnected by a Networks-on-Chip (NoC)

    Directory of Open Access Journals (Sweden)

    Wilson Mauricio Chicaiza

    2013-11-01

    Full Text Available This paper presents a brief characterization of the communication media used in multiprocessor architectures. The main objective of this characterization is to present a new communication model based on packet switching, known as Networks-on-Chip (NoC). The paper describes a network architecture called the Hermes NoC, which was interconnected with a Multiprocessor System-on-Chip (MPSoC) composed of four MicroBlaze processors. This connection was achieved through the design and development of a Network Interface written in VHDL code. By means of this Network Interface, the MicroBlaze processors were able to interact with the Hermes switches in order to create a multiprocessor architecture interconnected by a NoC. For comparison purposes, another multiprocessor architecture interconnected by buses was also created. For both architectures, a steganography application was developed in which two processors work simultaneously. Unfortunately, it was not possible to measure latency and energy consumption directly for this application, so simulators were used to estimate these measurements.

  12. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines

  13. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  14. Human Computation

    CERN Multimedia

    CERN. Geneva

    2008-01-01

    What if people could play computer games and accomplish work without even realizing it? What if billions of people collaborated to solve important problems for humanity or generate training data for computers? My work aims at a general paradigm for doing exactly that: utilizing human processing power to solve computational problems in a distributed manner. In particular, I focus on harnessing human time and energy for addressing problems that computers cannot yet solve. Although computers have advanced dramatically in many respects over the last 50 years, they still do not possess the basic conceptual intelligence or perceptual capabilities...

  15. Quantum computation

    International Nuclear Information System (INIS)

    Deutsch, D.

    1992-01-01

    As computers become ever more complex, they inevitably become smaller. This leads to a need for components which are fabricated and operate on increasingly smaller size scales. Quantum theory is already taken into account in microelectronics design. This article explores how quantum theory will need to be incorporated into computers in the future in order to give their components functionality. Computation tasks which depend on quantum effects will become possible. Physicists may have to reconsider their perspective on computation in the light of understanding developed in connection with universal quantum computers. (UK)

  16. Computer software.

    Science.gov (United States)

    Rosenthal, L E

    1986-10-01

    Software is the component in a computer system that permits the hardware to perform the various functions that a computer system is capable of doing. The history of software and its development can be traced to the early nineteenth century. All computer systems are designed to utilize the "stored program concept" as first developed by Charles Babbage in the 1850s. The concept was lost until the mid-1940s, when modern computers made their appearance. Today, because of the complex and myriad tasks that a computer system can perform, there has been a differentiation of types of software. There is software designed to perform specific business applications. There is software that controls the overall operation of a computer system. And there is software that is designed to carry out specialized tasks. Regardless of type, software is the most critical component of any computer system. Without it, all one has is a collection of circuits, transistors, and silicon chips.

  17. Computer sciences

    Science.gov (United States)

    Smith, Paul H.

    1988-01-01

    The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.

  18. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1994-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to parallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect addressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect addressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network
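
    The idea of replacing gather/sort-based regime selection with masked, data-parallel operations can be illustrated outside Fortran as well. The following NumPy sketch is a loose analogue of the Fortran 90 WHERE approach described above; the regimes, thresholds, and coefficient formulas are invented for illustration and are not TRAC's actual correlations:

        import numpy as np

        # Per-node flow state (illustrative data, not TRAC variables)
        rng = np.random.default_rng(1)
        void_fraction = rng.uniform(0.0, 1.0, size=1_000_000)
        velocity = rng.uniform(0.1, 10.0, size=1_000_000)

        h = np.zeros_like(void_fraction)   # heat-transfer coefficient per node

        # Regime masks play the role of the WHERE construct: every node is
        # updated with uniform data-parallel operations, with no gather/scatter
        # or sorting of nodes into regime-specific subsets.
        single_phase = void_fraction < 0.2
        annular      = void_fraction > 0.8
        transition   = ~(single_phase | annular)

        h = np.where(single_phase, 100.0 * velocity**0.8, h)
        h = np.where(annular,      50.0 * velocity**0.6,  h)
        h = np.where(transition,   75.0 * velocity**0.7,  h)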

  19. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  20. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  1. High-speed computation of the EM algorithm for PET image reconstruction

    International Nuclear Information System (INIS)

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1994-01-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution backprojection algorithms. However, two major drawbacks have impeded the routine use of the EM algorithm, namely, the long computational time due to slow convergence and the large memory required for the storage of the image, projection data and the probability matrix. In this study, the authors attempt to solve these two problems by parallelizing the EM algorithm on a multiprocessor system. The authors have implemented an extended hypercube (EH) architecture for the high-speed computation of the EM algorithm using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs). The authors discuss and compare the performance of the EM algorithm on a 386/387 machine, CD 4360 mainframe, and on the EH system. The results show that the computational speed performance of an EH using DSP chips as PEs executing the EM image reconstruction algorithm is about 130 times better than that of the CD 4360 mainframe. The EH topology is expandable with a larger number of PEs
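
    The EM (MLEM) update that dominates the computational cost referred to above has a compact form: each iteration needs one forward projection, one ratio against the measured data, and one backprojection. The following Python/NumPy sketch shows the basic algorithm only, with a random system matrix standing in for a real projection geometry; it is an illustration of the update, not the extended-hypercube DSP implementation:

        import numpy as np

        def mlem(system_matrix, measured, n_iter=50):
            """Maximum-likelihood EM reconstruction.

            system_matrix : (n_bins, n_pixels) probability matrix p_ij
            measured      : (n_bins,) measured projection counts
            """
            n_bins, n_pixels = system_matrix.shape
            image = np.ones(n_pixels)                    # uniform initial estimate
            sensitivity = system_matrix.sum(axis=0)      # sum_i p_ij per pixel
            for _ in range(n_iter):
                forward = system_matrix @ image          # expected counts per bin
                ratio = measured / np.maximum(forward, 1e-12)
                image *= (system_matrix.T @ ratio) / np.maximum(sensitivity, 1e-12)
            return image

        # Tiny synthetic example (illustrative geometry only)
        rng = np.random.default_rng(0)
        P = rng.uniform(0.0, 1.0, size=(64, 16))
        true_image = rng.uniform(0.0, 5.0, size=16)
        counts = rng.poisson(P @ true_image)
        recon = mlem(P, counts)

    Both the forward projection and the backprojection in this loop are matrix-vector products, which is why the algorithm maps naturally onto multiprocessor hardware by partitioning rows of the probability matrix across processing elements.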

  2. Development of a software for a multi-processor system aimed at the on-line control of nuclear physics experiments

    International Nuclear Information System (INIS)

    Poggioli, Jean Renaud

    1984-01-01

    This research thesis reports the development of software for an acquisition computer aimed at the on-line control of nuclear physics experiments. An original architecture, based on the assignment of a processor to each fundamental task, enables the implementation of a high performance system. In order to free the user from programming constraints, the author developed software for the dynamic generation of acquisition and processing codes. These codes are created from a data base which is programmed by the user using a language close to the physical reality. Procedures for interactive control of the experiment are thus simplified by displaying function menus on the operator terminal. The author discusses possible hardware improvements and possible extensions of the system [fr

  3. Computer programming and computer systems

    CERN Document Server

    Hassitt, Anthony

    1966-01-01

    Computer Programming and Computer Systems imparts a "reading knowledge" of computer systems. This book describes the aspects of machine-language programming, monitor systems, computer hardware, and advanced programming that every thorough programmer should be acquainted with. This text discusses the automatic electronic digital computers, symbolic language, Reverse Polish Notation, and Fortran into assembly language. The routine for reading blocked tapes, dimension statements in subroutines, general-purpose input routine, and efficient use of memory are also elaborated. This publication is inten

  4. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  5. Organic Computing

    CERN Document Server

    Würtz, Rolf P

    2008-01-01

    Organic Computing is a research field emerging around the conviction that problems of organization in complex systems in computer science, telecommunications, neurobiology, molecular biology, ethology, and possibly even sociology can be tackled scientifically in a unified way. From the computer science point of view, the apparent ease in which living systems solve computationally difficult problems makes it inevitable to adopt strategies observed in nature for creating information processing machinery. In this book, the major ideas behind Organic Computing are delineated, together with a sparse sample of computational projects undertaken in this new field. Biological metaphors include evolution, neural networks, gene-regulatory networks, networks of brain modules, hormone system, insect swarms, and ant colonies. Applications are as diverse as system design, optimization, artificial growth, task allocation, clustering, routing, face recognition, and sign language understanding.

  6. Computational biomechanics

    International Nuclear Information System (INIS)

    Ethier, C.R.

    2004-01-01

    Computational biomechanics is a fast-growing field that integrates modern biological techniques and computer modelling to solve problems of medical and biological interest. Modelling of blood flow in the large arteries is the best-known application of computational biomechanics, but there are many others. Described here is work being carried out in the laboratory on the modelling of blood flow in the coronary arteries and on the transport of viral particles in the eye. (author)

  7. Multiscale Methods, Parallel Computation, and Neural Networks for Real-Time Computer Vision.

    Science.gov (United States)

    Battiti, Roberto

    1990-01-01

    This thesis presents new algorithms for low and intermediate level computer vision. The guiding ideas in the presented approach are those of hierarchical and adaptive processing, concurrent computation, and supervised learning. Processing of the visual data at different resolutions is used not only to reduce the amount of computation necessary to reach the fixed point, but also to produce a more accurate estimation of the desired parameters. The presented adaptive multiple scale technique is applied to the problem of motion field estimation. Different parts of the image are analyzed at a resolution that is chosen in order to minimize the error in the coefficients of the differential equations to be solved. Tests with video-acquired images show that velocity estimation is more accurate over a wide range of motion with respect to the homogeneous scheme. In some cases introduction of explicit discontinuities coupled to the continuous variables can be used to avoid propagation of visual information from areas corresponding to objects with different physical and/or kinematic properties. The human visual system uses concurrent computation in order to process the vast amount of visual data in "real-time." Although with different technological constraints, parallel computation can be used efficiently for computer vision. All the presented algorithms have been implemented on medium grain distributed memory multicomputers with a speed-up approximately proportional to the number of processors used. A simple two-dimensional domain decomposition assigns regions of the multiresolution pyramid to the different processors. The inter-processor communication needed during the solution process is proportional to the linear dimension of the assigned domain, so that efficiency is close to 100% if a large region is assigned to each processor. Finally, learning algorithms are shown to be a viable technique to engineer computer vision systems for different applications starting from

  8. Computational Composites

    DEFF Research Database (Denmark)

    Vallgårda, Anna K. A.

    to understand the computer as a material like any other material we would use for design, like wood, aluminum, or plastic. That as soon as the computer forms a composition with other materials it becomes just as approachable and inspiring as other smart materials. I present a series of investigations of what...... Computational Composite, and Telltale). Through the investigations, I show how the computer can be understood as a material and how it partakes in a new strand of materials whose expressions come to be in context. I uncover some of their essential material properties and potential expressions. I develop a way...

  9. Functional requirements document for the Earth Observing System Data and Information System (EOSDIS) Scientific Computing Facilities (SCF) of the NASA/MSFC Earth Science and Applications Division, 1992

    Science.gov (United States)

    Botts, Michael E.; Phillips, Ron J.; Parker, John V.; Wright, Patrick D.

    1992-01-01

    Five scientists at MSFC/ESAD have EOS SCF investigator status. Each SCF has unique tasks which require the establishment of a computing facility dedicated to accomplishing those tasks. A SCF Working Group was established at ESAD with the charter of defining the computing requirements of the individual SCFs and recommending options for meeting these requirements. The primary goal of the working group was to determine which computing needs can be satisfied using either shared resources or separate but compatible resources, and which needs require unique individual resources. The requirements investigated included CPU-intensive vector and scalar processing, visualization, data storage, connectivity, and I/O peripherals. A review of computer industry directions and a market survey of computing hardware provided information regarding important industry standards and candidate computing platforms. It was determined that the total SCF computing requirements might be most effectively met using a hierarchy consisting of shared and individual resources. This hierarchy is composed of five major system types: (1) a supercomputer class vector processor; (2) a high-end scalar multiprocessor workstation; (3) a file server; (4) a few medium- to high-end visualization workstations; and (5) several low- to medium-range personal graphics workstations. Specific recommendations for meeting the needs of each of these types are presented.

  10. Computers as Components Principles of Embedded Computing System Design

    CERN Document Server

    Wolf, Wayne

    2008-01-01

    This book was the first to bring essential knowledge on embedded systems technology and techniques under a single cover. This second edition has been updated to the state-of-the-art by reworking and expanding performance analysis with more examples and exercises, and coverage of electronic systems now focuses on the latest applications. Researchers, students, and savvy professionals schooled in hardware or software design, will value Wayne Wolf's integrated engineering design approach.The second edition gives a more comprehensive view of multiprocessors including VLIW and superscalar archite

  11. GPGPU COMPUTING

    Directory of Open Access Journals (Sweden)

    BOGDAN OANCEA

    2012-05-01

    Full Text Available Since the first idea of using GPUs for general-purpose computing, things have evolved over the years and now there are several approaches to GPU programming. GPU computing practically began with the introduction of CUDA (Compute Unified Device Architecture) by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used together with the hardware that they provide. A new emerging standard, OpenCL (Open Computing Language), tries to unify different GPU general computing API implementations and provides a framework for writing programs executed across heterogeneous platforms consisting of both CPUs and GPUs. OpenCL provides parallel computing using task-based and data-based parallelism. In this paper we will focus on the CUDA parallel computing architecture and programming model introduced by NVIDIA. We will present the benefits of the CUDA programming model. We will also compare the two main approaches, CUDA and AMD APP (STREAM), and the new framework, OpenCL, that tries to unify the GPGPU computing models.

  12. Quantum Computing

    Indian Academy of Sciences (India)

    Quantum Computing - Building Blocks of a Quantum Computer. C S Vijay, Vishal Gupta. Resonance – Journal of Science Education, General Article, Volume 5, Issue 9, September 2000, pp 69-81.

  13. Platform computing

    CERN Multimedia

    2002-01-01

    "Platform Computing releases first grid-enabled workload management solution for IBM eServer Intel and UNIX high performance computing clusters. This Out-of-the-box solution maximizes the performance and capability of applications on IBM HPC clusters" (1/2 page) .

  14. Quantum Computing

    Indian Academy of Sciences (India)

    In the first part of this article, we had looked at how quantum physics can be harnessed to make the building blocks of a quantum computer. In this concluding part, we look at algorithms which can exploit the power of this computational device, and some practical difficulties in building such a device. Quantum Algorithms.

  15. Quantum computing

    OpenAIRE

    Burba, M.; Lapitskaya, T.

    2017-01-01

    This article gives an elementary introduction to quantum computing. It is a draft for a book chapter of the "Handbook of Nature-Inspired and Innovative Computing", Eds. A. Zomaya, G.J. Milburn, J. Dongarra, D. Bader, R. Brent, M. Eshaghian-Wilner, F. Seredynski (Springer, Berlin Heidelberg New York, 2006).

  16. Computational Pathology

    Science.gov (United States)

    Louis, David N.; Feldman, Michael; Carter, Alexis B.; Dighe, Anand S.; Pfeifer, John D.; Bry, Lynn; Almeida, Jonas S.; Saltz, Joel; Braun, Jonathan; Tomaszewski, John E.; Gilbertson, John R.; Sinard, John H.; Gerber, Georg K.; Galli, Stephen J.; Golden, Jeffrey A.; Becich, Michael J.

    2016-01-01

    Context We define the scope and needs within the new discipline of computational pathology, a discipline critical to the future of both the practice of pathology and, more broadly, medical practice in general. Objective To define the scope and needs of computational pathology. Data Sources A meeting was convened in Boston, Massachusetts, in July 2014 prior to the annual Association of Pathology Chairs meeting, and it was attended by a variety of pathologists, including individuals highly invested in pathology informatics as well as chairs of pathology departments. Conclusions The meeting made recommendations to promote computational pathology, including clearly defining the field and articulating its value propositions; asserting that the value propositions for health care systems must include means to incorporate robust computational approaches to implement data-driven methods that aid in guiding individual and population health care; leveraging computational pathology as a center for data interpretation in modern health care systems; stating that realizing the value proposition will require working with institutional administrations, other departments, and pathology colleagues; declaring that a robust pipeline should be fostered that trains and develops future computational pathologists, for those with both pathology and non-pathology backgrounds; and deciding that computational pathology should serve as a hub for data-related research in health care systems. The dissemination of these recommendations to pathology and bioinformatics departments should help facilitate the development of computational pathology. PMID:26098131

  17. Cloud Computing

    DEFF Research Database (Denmark)

    Krogh, Simon

    2013-01-01

    with technological changes, the paradigmatic pendulum has swung between increased centralization on one side and a focus on distributed computing that pushes IT power out to end users on the other. With the introduction of outsourcing and cloud computing, centralization in large data centers is again dominating...... the IT scene. In line with the views presented by Nicolas Carr in 2003 (Carr, 2003), it is a popular assumption that cloud computing will be the next utility (like water, electricity and gas) (Buyya, Yeo, Venugopal, Broberg, & Brandic, 2009). However, this assumption disregards the fact that most IT production......), for instance, in establishing and maintaining trust between the involved parties (Sabherwal, 1999). So far, research in cloud computing has neglected this perspective and focused entirely on aspects relating to technology, economy, security and legal questions. While the core technologies of cloud computing (e...

  18. Computability theory

    CERN Document Server

    Weber, Rebecca

    2012-01-01

    What can we compute--even with unlimited resources? Is everything within reach? Or are computations necessarily drastically limited, not just in practice, but theoretically? These questions are at the heart of computability theory. The goal of this book is to give the reader a firm grounding in the fundamentals of computability theory and an overview of currently active areas of research, such as reverse mathematics and algorithmic randomness. Turing machines and partial recursive functions are explored in detail, and vital tools and concepts including coding, uniformity, and diagonalization are described explicitly. From there the material continues with universal machines, the halting problem, parametrization and the recursion theorem, and thence to computability for sets, enumerability, and Turing reduction and degrees. A few more advanced topics round out the book before the chapter on areas of research. The text is designed to be self-contained, with an entire chapter of preliminary material including re...

  19. Computational Streetscapes

    Directory of Open Access Journals (Sweden)

    Paul M. Torrens

    2016-09-01

    Full Text Available Streetscapes have presented a long-standing interest in many fields. Recently, there has been a resurgence of attention on streetscape issues, catalyzed in large part by computing. Because of computing, there is more understanding, vistas, data, and analysis of and on streetscape phenomena than ever before. This diversity of lenses trained on streetscapes permits us to address long-standing questions, such as how people use information while mobile, how interactions with people and things occur on streets, how we might safeguard crowds, how we can design services to assist pedestrians, and how we could better support special populations as they traverse cities. Amid each of these avenues of inquiry, computing is facilitating new ways of posing these questions, particularly by expanding the scope of what-if exploration that is possible. With assistance from computing, consideration of streetscapes now reaches across scales, from the neurological interactions that form among place cells in the brain up to informatics that afford real-time views of activity over whole urban spaces. For some streetscape phenomena, computing allows us to build realistic but synthetic facsimiles in computation, which can function as artificial laboratories for testing ideas. In this paper, I review the domain science for studying streetscapes from vantages in physics, urban studies, animation and the visual arts, psychology, biology, and behavioral geography. I also review the computational developments shaping streetscape science, with particular emphasis on modeling and simulation as informed by data acquisition and generation, data models, path-planning heuristics, artificial intelligence for navigation and way-finding, timing, synthetic vision, steering routines, kinematics, and geometrical treatment of collision detection and avoidance. I also discuss the implications that the advances in computing streetscapes might have on emerging developments in cyber

  20. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, realistic physics, for example full 3D reconstruction, takes too long to process all the data in the clinic and is not feasible on a common reconstruction machine because of the large memory required by complex physical models. We propose a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable such large-scale techniques. Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with a common 2D projection model and with realistic 3D lines of response. Since processing became slower (by up to a factor of 6) after a certain number of iterations, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed and single-processed images at the same iterations were below the significant digits of floating point numbers, about 6 bits. Two processors showed good parallel computing efficiency (a speed-up of 1.96). The slow-down phenomenon was solved by vectorization using SSE. Through this study, a realistic parallel computing system for clinical use was established, able to reconstruct with ample memory using realistic physical models that cannot be simplified

  1. Development of superconductor electronics technology for high-end computing

    Energy Technology Data Exchange (ETDEWEB)

    Silver, A [Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109-8099 (United States); Kleinsasser, A [Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109-8099 (United States); Kerber, G [Northrop Grumman Space Technology, One Space Park, Redondo Beach, CA 90278 (United States); Herr, Q [Northrop Grumman Space Technology, One Space Park, Redondo Beach, CA 90278 (United States); Dorojevets, M [Department of Electrical and Computer Engineering, SUNY-Stony Brook, NY 11794-2350 (United States); Bunyk, P [Northrop Grumman Space Technology, One Space Park, Redondo Beach, CA 90278 (United States); Abelson, L [Northrop Grumman Space Technology, One Space Park, Redondo Beach, CA 90278 (United States)

    2003-12-01

    This paper describes our programme to develop and demonstrate ultra-high performance single flux quantum (SFQ) VLSI technology that will enable superconducting digital processors for petaFLOPS-scale computing. In the hybrid technology, multi-threaded architecture, the computational engine to power a petaFLOPS machine at affordable power will consist of 4096 SFQ multi-chip processors, with 50 to 100 GHz clock frequency and associated cryogenic RAM. We present the superconducting technology requirements, progress to date and our plan to meet these requirements. We improved SFQ Nb VLSI by two generations, to an 8 kA cm^-2, 1.25 μm junction process, incorporated new CAD tools into our methodology, demonstrated methods for recycling the bias current and data communication at speeds up to 60 Gb s^-1, both on and between chips through passive transmission lines. FLUX-1 is the most ambitious project implemented in SFQ technology to date, a prototype general-purpose 8 bit microprocessor chip. We are testing the FLUX-1 chip (5K gates, 20 GHz clock) and designing a 32 bit floating-point SFQ multiplier with vector-register memory. We report correct operation of the complete stripline-connected gate library with large bias margins, as well as several larger functional units used in FLUX-1. The next stage will be an SFQ multi-processor machine. Important challenges include further reducing chip supply current and on-chip power dissipation, developing at least 64 kbit, sub-nanosecond cryogenic RAM chips, developing thermally and electrically efficient high data rate cryogenic-to-ambient input/output technology and improving Nb VLSI to increase gate density.

  2. Development of superconductor electronics technology for high-end computing

    International Nuclear Information System (INIS)

    Silver, A; Kleinsasser, A; Kerber, G; Herr, Q; Dorojevets, M; Bunyk, P; Abelson, L

    2003-01-01

    This paper describes our programme to develop and demonstrate ultra-high performance single flux quantum (SFQ) VLSI technology that will enable superconducting digital processors for petaFLOPS-scale computing. In the hybrid technology, multi-threaded architecture, the computational engine to power a petaFLOPS machine at affordable power will consist of 4096 SFQ multi-chip processors, with 50 to 100 GHz clock frequency and associated cryogenic RAM. We present the superconducting technology requirements, progress to date and our plan to meet these requirements. We improved SFQ Nb VLSI by two generations, to an 8 kA cm^-2, 1.25 μm junction process, incorporated new CAD tools into our methodology, demonstrated methods for recycling the bias current and data communication at speeds up to 60 Gb s^-1, both on and between chips through passive transmission lines. FLUX-1 is the most ambitious project implemented in SFQ technology to date, a prototype general-purpose 8 bit microprocessor chip. We are testing the FLUX-1 chip (5K gates, 20 GHz clock) and designing a 32 bit floating-point SFQ multiplier with vector-register memory. We report correct operation of the complete stripline-connected gate library with large bias margins, as well as several larger functional units used in FLUX-1. The next stage will be an SFQ multi-processor machine. Important challenges include further reducing chip supply current and on-chip power dissipation, developing at least 64 kbit, sub-nanosecond cryogenic RAM chips, developing thermally and electrically efficient high data rate cryogenic-to-ambient input/output technology and improving Nb VLSI to increase gate density

  3. COMPUTATIONAL THINKING

    Directory of Open Access Journals (Sweden)

    Evgeniy K. Khenner

    2016-01-01

    Full Text Available Abstract. The aim of the research is to draw the attention of the educational community to the phenomenon of computational thinking, which has been actively discussed in the foreign scientific and educational literature over the last decade, and to substantiate its importance, practical utility and right to recognition in Russian education. Methods. The research is based on the analysis of foreign studies of the phenomenon of computational thinking and the ways it is formed in the process of education, and on comparing the notion of «computational thinking» with related concepts used in the Russian scientific and pedagogical literature. Results. The concept of «computational thinking» is analyzed from the point of view of intuitive understanding as well as scientific and applied aspects. It is shown how computational thinking has evolved along with the development of computer hardware and software. The practice-oriented interpretation of computational thinking which is dominant among educators is described, along with some ways of forming it. It is shown that computational thinking is a metasubject result of general education as well as one of its tools. From the point of view of the author, purposeful development of computational thinking should be one of the tasks of Russian education. Scientific novelty. The author gives a theoretical justification of the role of computational thinking schemes as metasubject results of learning. The dynamics of the development of this concept is described; this process is connected with the evolution of computer and information technologies as well as the increasing number of tasks whose effective solution requires computational thinking. The author substantiates the claim that including «computational thinking» in the set of pedagogical concepts used in the national education system fills an existing gap. Practical significance. A new metasubject result of education associated with

  4. Low Overhead Real-Time Computing With General Purpose Operating Systems

    National Research Council Canada - National Science Library

    Raymond, Michael

    2004-01-01

    .... In larger systems and more recently, general-purpose operating systems such as SGI IRIX and Linux are used for new projects because they already have multiprocessor and device driver support as well as a large user base...

  5. Computer interfacing

    CERN Document Server

    Dixey, Graham

    1994-01-01

    This book explains how computers interact with the world around them and therefore how to make them a useful tool. Topics covered include descriptions of all the components that make up a computer, principles of data exchange, interaction with peripherals, serial communication, input devices, recording methods, computer-controlled motors, and printers.In an informative and straightforward manner, Graham Dixey describes how to turn what might seem an incomprehensible 'black box' PC into a powerful and enjoyable tool that can help you in all areas of your work and leisure. With plenty of handy

  6. Computational physics

    CERN Document Server

    Newman, Mark

    2013-01-01

    A complete introduction to the field of computational physics, with examples and exercises in the Python programming language. Computers play a central role in virtually every major physics discovery today, from astrophysics and particle physics to biophysics and condensed matter. This book explains the fundamentals of computational physics and describes in simple terms the techniques that every physicist should know, such as finite difference methods, numerical quadrature, and the fast Fourier transform. The book offers a complete introduction to the topic at the undergraduate level, and is also suitable for the advanced student or researcher who wants to learn the foundational elements of this important field.

  7. Computational physics

    Energy Technology Data Exchange (ETDEWEB)

    Anon.

    1987-01-15

    Computers have for many years played a vital role in the acquisition and treatment of experimental data, but they have more recently taken up a much more extended role in physics research. The numerical and algebraic calculations now performed on modern computers make it possible to explore consequences of basic theories in a way which goes beyond the limits of both analytic insight and experimental investigation. This was brought out clearly at the Conference on Perspectives in Computational Physics, held at the International Centre for Theoretical Physics, Trieste, Italy, from 29-31 October.

  8. Cloud Computing

    CERN Document Server

    Baun, Christian; Nimis, Jens; Tai, Stefan

    2011-01-01

    Cloud computing is a buzz-word in today's information technology (IT) that nobody can escape. But what is really behind it? There are many interpretations of this term, but no standardized or even uniform definition. Instead, as a result of the multi-faceted viewpoints and the diverse interests expressed by the various stakeholders, cloud computing is perceived as a rather fuzzy concept. With this book, the authors deliver an overview of cloud computing architecture, services, and applications. Their aim is to bring readers up to date on this technology and thus to provide a common basis for d

  9. Computational Viscoelasticity

    CERN Document Server

    Marques, Severino P C

    2012-01-01

    This text is a guide to solving problems in which viscoelasticity is present using existing commercial computational codes. The book gives information on the codes' structure and use, data preparation and output interpretation and verification. The first part of the book introduces the reader to the subject and provides the models, equations and notation to be used in the computational applications. The second part presents the most important computational techniques: the finite element formulation, the boundary element formulation, and the solution of viscoelastic problems with Abaqus.

  10. Optical computing.

    Science.gov (United States)

    Stroke, G. W.

    1972-01-01

    Applications of the optical computer include an approach for increasing the sharpness of images obtained from the most powerful electron microscopes and fingerprint/credit card identification. The information-handling capability of the various optical computing processes is very great. Modern synthetic-aperture radars scan upward of 100,000 resolvable elements per second. Fields which have assumed major importance on the basis of optical computing principles are optical image deblurring, coherent side-looking synthetic-aperture radar, and correlative pattern recognition. Some examples of the most dramatic image deblurring results are shown.

  11. Computational physics

    International Nuclear Information System (INIS)

    Anon.

    1987-01-01

    Computers have for many years played a vital role in the acquisition and treatment of experimental data, but they have more recently taken up a much more extended role in physics research. The numerical and algebraic calculations now performed on modern computers make it possible to explore consequences of basic theories in a way which goes beyond the limits of both analytic insight and experimental investigation. This was brought out clearly at the Conference on Perspectives in Computational Physics, held at the International Centre for Theoretical Physics, Trieste, Italy, from 29-31 October

  12. Phenomenological Computation?

    DEFF Research Database (Denmark)

    Brier, Søren

    2014-01-01

    Open peer commentary on the article “Info-computational Constructivism and Cognition” by Gordana Dodig-Crnkovic. Upshot: The main problems with info-computationalism are: (1) Its basic concept of natural computing has neither been defined theoretically nor implemented practically. (2) It cannot encompass human concepts of subjective experience and intersubjective meaningful communication, which prevents it from being genuinely transdisciplinary. (3) Philosophically, it does not sufficiently accept the deep ontological differences between various paradigms such as von Foerster’s second-order...

  13. High Performance Polar Decomposition on Distributed Memory Systems

    KAUST Repository

    Sukkari, Dalal E.; Ltaief, Hatem; Keyes, David E.

    2016-01-01

    The polar decomposition of a dense matrix is an important operation in linear algebra. It can be directly calculated through the singular value decomposition (SVD) or iteratively using the QR dynamically-weighted Halley algorithm (QDWH). The former

  14. Shark: Fast Data Analysis Using Coarse-grained Distributed Memory

    Science.gov (United States)

    2013-05-01

    ...often MySQL or Derby) with a namespace for tables, table metadata, and partition information. Table data is stored in an HDFS directory, while a... saving time and space for large data sets. This is achieved with support for custom SerDe (serialization/deserialization) Java interface implementations

  15. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations
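
    To make the efficiency figures quoted above concrete, the following minimal sketch (not the authors' performance model) shows how speedup and parallel efficiency are typically computed from measured run times; the timing values are made-up placeholders and Python is used purely for illustration.

    # Sketch: speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p,
    # computed from wall-clock measurements. All numbers below are illustrative only.
    def speedup(t_serial, t_parallel):
        return t_serial / t_parallel

    def efficiency(t_serial, t_parallel, p):
        return speedup(t_serial, t_parallel) / p

    if __name__ == "__main__":
        t1 = 120.0                                # hypothetical single-processor time (s)
        timings = {2: 61.0, 4: 31.5, 8: 16.4}     # hypothetical times on p processors
        for p, tp in timings.items():
            print(f"p={p}: S={speedup(t1, tp):.2f}, E={100 * efficiency(t1, tp, p):.1f}%")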

  16. Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

    International Nuclear Information System (INIS)

    Azad, Ariful; Buluc, Aydn; Pothen, Alex

    2016-01-01

    It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting path is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.
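
    For readers unfamiliar with the augmenting-path idea underlying the algorithm above, here is a minimal single-source augmenting-path (Kuhn-style) bipartite matching sketch. It is not the authors' multi-source, tree-grafting method, it ignores all parallelism, and the toy graph is invented for the example.

    # Sketch: maximum cardinality bipartite matching via single-source augmenting paths.
    def max_bipartite_matching(adj, n_left, n_right):
        """adj[u] lists right-side vertices adjacent to left vertex u."""
        match_right = [-1] * n_right      # match_right[v] = left vertex matched to v, or -1

        def try_augment(u, visited):
            for v in adj[u]:
                if not visited[v]:
                    visited[v] = True
                    # v is free, or the vertex currently matched to v can be re-matched
                    if match_right[v] == -1 or try_augment(match_right[v], visited):
                        match_right[v] = u
                        return True
            return False

        matching_size = 0
        for u in range(n_left):
            if try_augment(u, [False] * n_right):
                matching_size += 1
        return matching_size, match_right

    if __name__ == "__main__":
        adj = [[0, 1], [0], [1, 2]]       # toy graph: 3 left and 3 right vertices
        size, match = max_bipartite_matching(adj, 3, 3)
        print(size, match)                # expected: 3 [1, 0, 2]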

  17. A parallel solution-adaptive scheme for predicting multi-phase core flows in solid propellant rocket motors

    International Nuclear Information System (INIS)

    Sachdev, J.S.; Groth, C.P.T.; Gottlieb, J.J.

    2003-01-01

    The development of a parallel adaptive mesh refinement (AMR) scheme is described for solving the governing equations for multi-phase (gas-particle) core flows in solid propellant rocket motors (SRM). An Eulerian formulation is used to describe the coupled motion between the gas and particle phases. A cell-centred upwind finite-volume discretization and the use of limited solution reconstruction, Riemann solver based flux functions for the gas and particle phases, and explicit multi-stage time-stepping allows for high solution accuracy and computational robustness. A Riemann problem is formulated for prescribing boundary data at the burning surface. Efficient and scalable parallel implementations are achieved with domain decomposition on distributed memory multiprocessor architectures. Numerical results are described to demonstrate the capabilities of the approach for predicting SRM core flows. (author)
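
    The domain-decomposition pattern mentioned above can be illustrated with a minimal 1-D halo (ghost-cell) exchange. The sketch below uses mpi4py (assumed to be available) and a placeholder averaging update; it is not the authors' AMR scheme or flux functions.

    # Sketch: 1-D block domain decomposition with halo exchange using mpi4py.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 8                                   # interior cells owned by this rank
    u = np.full(n_local + 2, float(rank))         # +2 ghost cells, dummy initial data

    left  = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Exchange ghost cells with neighbours (Sendrecv avoids deadlock).
    comm.Sendrecv(sendbuf=u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
    comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

    # A simple explicit update on interior cells (placeholder for a real flux scheme).
    u[1:-1] = 0.5 * (u[:-2] + u[2:])
    print(f"rank {rank}: interior after one update = {u[1:-1]}")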

  18. Essentials of cloud computing

    CERN Document Server

    Chandrasekaran, K

    2014-01-01

    Foreword. Preface. Computing Paradigms: Learning Objectives; Preamble; High-Performance Computing; Parallel Computing; Distributed Computing; Cluster Computing; Grid Computing; Cloud Computing; Biocomputing; Mobile Computing; Quantum Computing; Optical Computing; Nanocomputing; Network Computing; Summary; Review Points; Review Questions; Further Reading. Cloud Computing Fundamentals: Learning Objectives; Preamble; Motivation for Cloud Computing; The Need for Cloud Computing; Defining Cloud Computing; NIST Definition of Cloud Computing; Cloud Computing Is a Service; Cloud Computing Is a Platform; 5-4-3 Principles of Cloud Computing; Five Essential Charact

  19. Personal Computers.

    Science.gov (United States)

    Toong, Hoo-min D.; Gupta, Amar

    1982-01-01

    Describes the hardware, software, applications, and current proliferation of personal computers (microcomputers). Includes discussions of microprocessors, memory, output (including printers), application programs, the microcomputer industry, and major microcomputer manufacturers (Apple, Radio Shack, Commodore, and IBM). (JN)

  20. Computational Literacy

    DEFF Research Database (Denmark)

    Chongtay, Rocio; Robering, Klaus

    2016-01-01

    In recent years, there has been a growing interest in and recognition of the importance of Computational Literacy, a skill generally considered to be necessary for success in the 21st century. While much research has concentrated on requirements, tools, and teaching methodologies for the acquisition of Computational Literacy at basic educational levels, focus on higher levels of education has been much less prominent. The present paper considers the case of courses for higher education programs within the Humanities. A model is proposed which conceives of Computational Literacy as a layered...

  1. Computing Religion

    DEFF Research Database (Denmark)

    Nielbo, Kristoffer Laigaard; Braxton, Donald M.; Upal, Afzal

    2012-01-01

    The computational approach has become an invaluable tool in many fields that are directly relevant to research in religious phenomena. Yet the use of computational tools is almost absent in the study of religion. Given that religion is a cluster of interrelated phenomena and that research concerning these phenomena should strive for multilevel analysis, this article argues that the computational approach offers new methodological and theoretical opportunities to the study of religion. We argue that the computational approach offers (1) an intermediary step between any theoretical construct and its targeted empirical space and (2) a new kind of data which allows the researcher to observe abstract constructs, estimate likely outcomes, and optimize empirical designs. Because sophisticated multilevel research is a collaborative project, we also seek to introduce to scholars of religion some...

  2. Computational Controversy

    NARCIS (Netherlands)

    Timmermans, Benjamin; Kuhn, Tobias; Beelen, Kaspar; Aroyo, Lora

    2017-01-01

    Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have

  3. Grid Computing

    Indian Academy of Sciences (India)

    IAS Admin

    emergence of supercomputers led to the use of computer simulation as an .... Scientific and engineering applications (e.g., TeraGrid secure gateway). Collaborative ... Encryption, privacy, protection from malicious software. Physical Layer.

  4. Computer tomographs

    International Nuclear Information System (INIS)

    Niedzwiedzki, M.

    1982-01-01

    Physical foundations and the developments in the transmission and emission computer tomography are presented. On the basis of the available literature and private communications a comparison is made of the various transmission tomographs. A new technique of computer emission tomography ECT, unknown in Poland, is described. The evaluation of two methods of ECT, namely those of positron and single photon emission tomography is made. (author)

  5. Computational sustainability

    CERN Document Server

    Kersting, Kristian; Morik, Katharina

    2016-01-01

    The book at hand gives an overview of the state-of-the-art research in Computational Sustainability as well as case studies of different application scenarios. This covers topics such as renewable energy supply, energy storage and e-mobility, efficiency in data centers and networks, sustainable food and water supply, sustainable health, industrial production and quality, etc. The book describes computational methods and possible application scenarios.

  6. Computing farms

    International Nuclear Information System (INIS)

    Yeh, G.P.

    2000-01-01

    High-energy physics, nuclear physics, space sciences, and many other fields have large challenges in computing. In recent years, PCs have achieved performance comparable to the high-end UNIX workstations, at a small fraction of the price. We review the development and broad applications of commodity PCs as the solution to CPU needs, and look forward to the important and exciting future of large-scale PC computing

  7. Computational chemistry

    Science.gov (United States)

    Arnold, J. O.

    1987-01-01

    With the advent of supercomputers and modern computational chemistry algorithms and codes, a powerful tool was created to help fill NASA's continuing need for information on the properties of matter in hostile or unusual environments. Computational resources provided under the National Aerodynamics Simulator (NAS) program were a cornerstone for recent advancements in this field. Properties of gases, materials, and their interactions can be determined from solutions of the governing equations. In the case of gases, for example, radiative transition probabilities per particle, bond-dissociation energies, and rates of simple chemical reactions can be determined computationally as reliably as from experiment. The data are proving to be quite valuable in providing inputs to real-gas flow simulation codes used to compute aerothermodynamic loads on NASA's aeroassist orbital transfer vehicles and a host of problems related to the National Aerospace Plane Program. Although more approximate, similar solutions can be obtained for ensembles of atoms simulating small particles of materials with and without the presence of gases. Computational chemistry has applications in studying catalysis and the properties of polymers, all of interest to various NASA missions, including those previously mentioned. In addition to discussing these applications of computational chemistry within NASA, the governing equations and the need for supercomputers for their solution are outlined.

  8. Proceedings: Sisal `93

    Energy Technology Data Exchange (ETDEWEB)

    Feo, J.T. [ed.

    1993-10-01

    This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for the functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  9. Computational creativity

    Directory of Open Access Journals (Sweden)

    López de Mántaras Badia, Ramon

    2013-12-01

    Full Text Available New technologies, and in particular artificial intelligence, are drastically changing the nature of creative processes. Computers are playing very significant roles in creative activities such as music, architecture, fine arts, and science. Indeed, the computer is already a canvas, a brush, a musical instrument, and so on. However, we believe that we must aim at more ambitious relations between computers and creativity. Rather than just seeing the computer as a tool to help human creators, we could see it as a creative entity in its own right. This view has triggered a new subfield of Artificial Intelligence called Computational Creativity. This article addresses the question of the possibility of achieving computational creativity through some examples of computer programs capable of replicating some aspects of creative behavior in the fields of music and science.

  10. Quantum computing

    International Nuclear Information System (INIS)

    Steane, Andrew

    1998-01-01

    The subject of quantum computing brings together ideas from classical information theory, computer science, and quantum physics. This review aims to summarize not just quantum computing, but the whole subject of quantum information theory. Information can be identified as the most general thing which must propagate from a cause to an effect. It therefore has a fundamentally important role in the science of physics. However, the mathematical treatment of information, especially information processing, is quite recent, dating from the mid-20th century. This has meant that the full significance of information as a basic concept in physics is only now being discovered. This is especially true in quantum mechanics. The theory of quantum information and computing puts this significance on a firm footing, and has led to some profound and exciting new insights into the natural world. Among these are the use of quantum states to permit the secure transmission of classical information (quantum cryptography), the use of quantum entanglement to permit reliable transmission of quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible noise processes (quantum error correction), and the use of controlled quantum evolution for efficient computation (quantum computation). The common theme of all these insights is the use of quantum entanglement as a computational resource. It turns out that information theory and quantum mechanics fit together very well. In order to explain their relationship, this review begins with an introduction to classical information theory and computer science, including Shannon's theorem, error correcting codes, Turing machines and computational complexity. The principles of quantum mechanics are then outlined, and the Einstein, Podolsky and Rosen (EPR) experiment described. The EPR-Bell correlations, and quantum entanglement in general, form the essential new ingredient which distinguishes quantum from

  11. Quantum computing

    Energy Technology Data Exchange (ETDEWEB)

    Steane, Andrew [Department of Atomic and Laser Physics, University of Oxford, Clarendon Laboratory, Oxford (United Kingdom)

    1998-02-01

    The subject of quantum computing brings together ideas from classical information theory, computer science, and quantum physics. This review aims to summarize not just quantum computing, but the whole subject of quantum information theory. Information can be identified as the most general thing which must propagate from a cause to an effect. It therefore has a fundamentally important role in the science of physics. However, the mathematical treatment of information, especially information processing, is quite recent, dating from the mid-20th century. This has meant that the full significance of information as a basic concept in physics is only now being discovered. This is especially true in quantum mechanics. The theory of quantum information and computing puts this significance on a firm footing, and has led to some profound and exciting new insights into the natural world. Among these are the use of quantum states to permit the secure transmission of classical information (quantum cryptography), the use of quantum entanglement to permit reliable transmission of quantum states (teleportation), the possibility of preserving quantum coherence in the presence of irreversible noise processes (quantum error correction), and the use of controlled quantum evolution for efficient computation (quantum computation). The common theme of all these insights is the use of quantum entanglement as a computational resource. It turns out that information theory and quantum mechanics fit together very well. In order to explain their relationship, this review begins with an introduction to classical information theory and computer science, including Shannon's theorem, error correcting codes, Turing machines and computational complexity. The principles of quantum mechanics are then outlined, and the Einstein, Podolsky and Rosen (EPR) experiment described. The EPR-Bell correlations, and quantum entanglement in general, form the essential new ingredient which distinguishes quantum from

  12. Report on evaluation of research and development of superhigh-function electronic computers; Chokoseino denshi keisanki no kenkyu kaihatsu ni kansuru hyoka hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1973-02-20

    Described herein is the development of superhigh-function electronic computers. This was a 6-year joint project, begun in FY 1966 by government, industrial and academic circles, with the objective of developing standard large-size computers comparable with the world's highest-performing machines by the beginning of the 1970s. The computers developed by this project met almost all of the specifications of the world's representative large-size commercial computers, partly surpassing them. In particular, the integration of the virtual memory, buffer memory and multi-processor functions, which were considered to be the central technical features of the next generation of computers, into one system was a uniquely Japanese concept, not seen in other countries. Other developments considered to have great ripple effects are the LSIs and the techniques for utilizing and mounting them and for improving their reliability. The development of magnetic discs is another notable result for the peripheral devices. Development of the input/output devices was started to support the inputting, outputting and reading of Chinese characters, which are characteristic of Japan. The software developed has sufficient functions for common use and is considered to be a world-leading large-size operating system, although its evaluation largely awaits actual results in use. (NEDO)

  13. Multiparty Computations

    DEFF Research Database (Denmark)

    Dziembowski, Stefan

    In this thesis we study the problem of doing Verifiable Secret Sharing (VSS) and Multiparty Computations in a model where private channels between the players and a broadcast channel are available. The adversary is active, adaptive and has unbounded computing power. The thesis is based on two... up to a polynomial time black-box reduction, the complexity of adaptively secure VSS is the same as that of ordinary secret sharing (SS), where security is only required against a passive, static adversary. Previously, such a connection was only known for linear secret sharing and VSS schemes. We then show... here and discuss other problems caused by the adaptiveness. All protocols in the thesis are formally specified and the proofs of their security are given. [1] Ronald Cramer, Ivan Damgård, Stefan Dziembowski, Martin Hirt, and Tal Rabin. Efficient multiparty computations with dishonest minority...
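
    For readers unfamiliar with the "ordinary secret sharing (SS)" that adaptively secure VSS is related to above, the following is a minimal Shamir-style (t, n) sharing sketch over a prime field. It implements no verifiability and no adversary model, contains nothing specific to this thesis, and the field modulus is an arbitrary illustrative choice.

    # Sketch: (t, n) Shamir secret sharing over a prime field.
    import random

    P = 2**61 - 1   # Mersenne prime used as the field modulus (illustrative choice)

    def share(secret, t, n, rng=random):
        """Split `secret` into n shares so that any t of them reconstruct it."""
        coeffs = [secret] + [rng.randrange(P) for _ in range(t - 1)]
        def f(x):
            return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
        return [(x, f(x)) for x in range(1, n + 1)]

    def reconstruct(shares):
        """Lagrange interpolation at x = 0 over the prime field."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = num * (-xj) % P
                    den = den * (xi - xj) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P
        return secret

    if __name__ == "__main__":
        shares = share(123456789, t=3, n=5)
        print(reconstruct(shares[:3]))   # any 3 shares recover 123456789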

  14. Scientific computing

    CERN Document Server

    Trangenstein, John A

    2017-01-01

    This is the third of three volumes providing a comprehensive presentation of the fundamentals of scientific computing. This volume discusses topics that depend more on calculus than linear algebra, in order to prepare the reader for solving differential equations. This book and its companions show how to determine the quality of computational results, and how to measure the relative efficiency of competing methods. Readers learn how to determine the maximum attainable accuracy of algorithms, and how to select the best method for computing problems. This book also discusses programming in several languages, including C++, Fortran and MATLAB. There are 90 examples, 200 exercises, 36 algorithms, 40 interactive JavaScript programs, 91 references to software programs and 1 case study. Topics are introduced with goals, literature references and links to public software. There are descriptions of the current algorithms in GSLIB and MATLAB. This book could be used for a second course in numerical methods, for either ...

  15. Computational Psychiatry

    Science.gov (United States)

    Wang, Xiao-Jing; Krystal, John H.

    2014-01-01

    Psychiatric disorders such as autism and schizophrenia arise from abnormalities in brain systems that underlie cognitive, emotional and social functions. The brain is enormously complex and its abundant feedback loops on multiple scales preclude intuitive explication of circuit functions. In close interplay with experiments, theory and computational modeling are essential for understanding how, precisely, neural circuits generate flexible behaviors and their impairments give rise to psychiatric symptoms. This Perspective highlights recent progress in applying computational neuroscience to the study of mental disorders. We outline basic approaches, including identification of core deficits that cut across disease categories, biologically-realistic modeling bridging cellular and synaptic mechanisms with behavior, model-aided diagnosis. The need for new research strategies in psychiatry is urgent. Computational psychiatry potentially provides powerful tools for elucidating pathophysiology that may inform both diagnosis and treatment. To achieve this promise will require investment in cross-disciplinary training and research in this nascent field. PMID:25442941

  16. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
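
    A minimal illustration of two of the distribution strategies named above (PLP and URAN) is sketched below as plain Python scheduling logic; the function names and workload are invented for the example, and this is not the x2f_mpi interface.

    # Sketch: assigning a batch of chemistry queries to ranks under PLP and URAN.
    import random

    def plp(queries_per_rank):
        """PLP: every rank keeps (and evaluates) exactly the queries it generated."""
        return {rank: list(q) for rank, q in queries_per_rank.items()}

    def uran(queries_per_rank, n_ranks, seed=0):
        """URAN: pool all queries and reassign them uniformly at random,
        which tends to balance load when per-query costs vary strongly."""
        rng = random.Random(seed)
        assignment = {rank: [] for rank in range(n_ranks)}
        for q in (q for qs in queries_per_rank.values() for q in qs):
            assignment[rng.randrange(n_ranks)].append(q)
        return assignment

    if __name__ == "__main__":
        # Hypothetical workload: rank 0 generated many more queries than the others.
        work = {0: list(range(12)), 1: list(range(12, 15)), 2: list(range(15, 17))}
        print("PLP sizes :", {r: len(q) for r, q in plp(work).items()})
        print("URAN sizes:", {r: len(q) for r, q in uran(work, 3).items()})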

  17. Computational artifacts

    DEFF Research Database (Denmark)

    Schmidt, Kjeld; Bansler, Jørgen P.

    2016-01-01

    The key concern of CSCW research is that of understanding computing technologies in the social context of their use, that is, as integral features of our practices and our lives, and to think of their design and implementation under that perspective. However, the question of the nature of that which is actually integrated in our practices is often discussed in confusing ways, if at all. The article aims to try to clarify the issue and in doing so revisits and reconsiders the notion of ‘computational artifact’.

  18. Computer security

    CERN Document Server

    Gollmann, Dieter

    2011-01-01

    A completely up-to-date resource on computer security Assuming no previous experience in the field of computer security, this must-have book walks you through the many essential aspects of this vast topic, from the newest advances in software and technology to the most recent information on Web applications security. This new edition includes sections on Windows NT, CORBA, and Java and discusses cross-site scripting and JavaScript hacking as well as SQL injection. Serving as a helpful introduction, this self-study guide is a wonderful starting point for examining the variety of competing sec

  19. Cloud Computing

    CERN Document Server

    Antonopoulos, Nick

    2010-01-01

    Cloud computing has recently emerged as a subject of substantial industrial and academic interest, though its meaning and scope is hotly debated. For some researchers, clouds are a natural evolution towards the full commercialisation of grid systems, while others dismiss the term as a mere re-branding of existing pay-per-use technologies. From either perspective, 'cloud' is now the label of choice for accountable pay-per-use access to third party applications and computational resources on a massive scale. Clouds support patterns of less predictable resource use for applications and services a

  20. Computational Logistics

    DEFF Research Database (Denmark)

    Pacino, Dario; Voss, Stefan; Jensen, Rune Møller

    2013-01-01

    This book constitutes the refereed proceedings of the 4th International Conference on Computational Logistics, ICCL 2013, held in Copenhagen, Denmark, in September 2013. The 19 papers presented in this volume were carefully reviewed and selected for inclusion in the book. They are organized in topical sections named: maritime shipping, road transport, vehicle routing problems, aviation applications, and logistics and supply chain management.

  1. Computational Logistics

    DEFF Research Database (Denmark)

    This book constitutes the refereed proceedings of the 4th International Conference on Computational Logistics, ICCL 2013, held in Copenhagen, Denmark, in September 2013. The 19 papers presented in this volume were carefully reviewed and selected for inclusion in the book. They are organized in topical sections named: maritime shipping, road transport, vehicle routing problems, aviation applications, and logistics and supply chain management.

  2. Computational engineering

    CERN Document Server

    2014-01-01

    The book presents state-of-the-art works in computational engineering. Focus is on mathematical modeling, numerical simulation, experimental validation and visualization in engineering sciences. In particular, the following topics are presented: constitutive models and their implementation into finite element codes, numerical models in nonlinear elasto-dynamics including seismic excitations, multiphase models in structural engineering and multiscale models of materials systems, sensitivity and reliability analysis of engineering structures, the application of scientific computing in urban water management and hydraulic engineering, and the application of genetic algorithms for the registration of laser scanner point clouds.

  3. Computer busses

    CERN Document Server

    Buchanan, William

    2000-01-01

    As more and more equipment is interface or 'bus' driven, either by the use of controllers or directly from PCs, the question of which bus to use is becoming increasingly important both in industry and in the office. 'Computer Busses' has been designed to help choose the best type of bus for the particular application. There are several books which cover individual busses, but none which provide a complete guide to computer busses. The author provides a basic theory of busses and draws examples and applications from real bus case studies. Busses are analysed using a top-down approach, helpin

  4. Reconfigurable Computing

    CERN Document Server

    Cardoso, Joao MP

    2011-01-01

    As the complexity of modern embedded systems increases, it becomes less practical to design monolithic processing platforms. As a result, reconfigurable computing is being adopted widely for more flexible design. Reconfigurable Computers offer the spatial parallelism and fine-grained customizability of application-specific circuits with the postfabrication programmability of software. To make the most of this unique combination of performance and flexibility, designers need to be aware of both hardware and software issues. FPGA users must think not only about the gates needed to perform a comp

  5. The discrete-dipole-approximation code ADDA: capabilities and known limitations

    NARCIS (Netherlands)

    Yurkin, M.A.; Hoekstra, A.G.

    2011-01-01

    The open-source code ADDA is described, which implements the discrete dipole approximation (DDA), a method to simulate light scattering by finite 3D objects of arbitrary shape and composition. Besides standard sequential execution, ADDA can run on a multiprocessor distributed-memory system,

  6. Riemannian computing in computer vision

    CERN Document Server

    Srivastava, Anuj

    2016-01-01

    This book presents a comprehensive treatise on Riemannian geometric computations and related statistical inferences in several computer vision problems. This edited volume includes chapter contributions from leading figures in the field of computer vision who are applying Riemannian geometric approaches in problems such as face recognition, activity recognition, object detection, biomedical image analysis, and structure-from-motion. Some of the mathematical entities that necessitate a geometric analysis include rotation matrices (e.g. in modeling camera motion), stick figures (e.g. for activity recognition), subspace comparisons (e.g. in face recognition), symmetric positive-definite matrices (e.g. in diffusion tensor imaging), and function-spaces (e.g. in studying shapes of closed contours). · Illustrates Riemannian computing theory on applications in computer vision, machine learning, and robotics. · Emphasis on algorithmic advances that will allow re-application in other...

  7. Statistical Computing

    Indian Academy of Sciences (India)

    inference and finite population sampling. Sudhakar Kunte. Elements of statistical computing are discussed in this series. ... which captain gets an option to decide whether to field first or bat first ... may of course not be fair, in the sense that the team which wins ... describe two methods of drawing a random number between 0.

  8. Computational biology

    DEFF Research Database (Denmark)

    Hartmann, Lars Røeboe; Jones, Neil; Simonsen, Jakob Grue

    2011-01-01

    Computation via biological devices has been the subject of close scrutiny since von Neumann’s early work some 60 years ago. In spite of the many relevant works in this field, the notion of programming biological devices seems to be, at best, ill-defined. While many devices are claimed or proved t...

  9. Computing News

    CERN Multimedia

    McCubbin, N

    2001-01-01

    We are still five years from the first LHC data, so we have plenty of time to get the computing into shape, don't we? Well, yes and no: there is time, but there's an awful lot to do! The recently-completed CERN Review of LHC Computing gives the flavour of the LHC computing challenge. The hardware scale for each of the LHC experiments is millions of 'SpecInt95' (SI95) units of cpu power and tens of PetaBytes of data storage. PCs today are about 20-30SI95, and expected to be about 100 SI95 by 2005, so it's a lot of PCs. This hardware will be distributed across several 'Regional Centres' of various sizes, connected by high-speed networks. How to realise this in an orderly and timely fashion is now being discussed in earnest by CERN, Funding Agencies, and the LHC experiments. Mixed in with this is, of course, the GRID concept...but that's a topic for another day! Of course hardware, networks and the GRID constitute just one part of the computing. Most of the ATLAS effort is spent on software development. What we ...

  10. Quantum Computation

    Indian Academy of Sciences (India)

    Quantum Computation – Particle and Wave Aspects of Algorithms. Apoorva Patel. General Article, Resonance – Journal of Science Education, Volume 16, Issue 9, September 2011, pp. 821–835.

  11. Cloud computing.

    Science.gov (United States)

    Wink, Diane M

    2012-01-01

    In this bimonthly series, the author examines how nurse educators can use Internet and Web-based technologies such as search, communication, and collaborative writing tools; social networking and social bookmarking sites; virtual worlds; and Web-based teaching and learning programs. This article describes how cloud computing can be used in nursing education.

  12. Computer Recreations.

    Science.gov (United States)

    Dewdney, A. K.

    1988-01-01

    Describes the creation of the computer program "BOUNCE," designed to simulate a weighted piston coming into equilibrium with a cloud of bouncing balls. The model follows the ideal gas law. Utilizes the critical event technique to create the model. Discusses another program, "BOOM," which simulates a chain reaction. (CW)

  13. [Grid computing

    CERN Multimedia

    Wolinsky, H

    2003-01-01

    "Turn on a water spigot, and it's like tapping a bottomless barrel of water. Ditto for electricity: Flip the switch, and the supply is endless. But computing is another matter. Even with the Internet revolution enabling us to connect in new ways, we are still limited to self-contained systems running locally stored software, limited by corporate, institutional and geographic boundaries" (1 page).

  14. Computational Finance

    DEFF Research Database (Denmark)

    Rasmussen, Lykke

    One of the major challenges in todays post-crisis finance environment is calculating the sensitivities of complex products for hedging and risk management. Historically, these derivatives have been determined using bump-and-revalue, but due to the increasing magnitude of these computations does...

  15. Optical Computing

    Indian Academy of Sciences (India)

    Optical computing technology is, in general, developing in two directions. One approach is ... current support in many places, with private companies as well as governments in several countries encouraging such research work. For example, much ... which enables more information to be carried and data to be processed.

  16. The fast Amsterdam multiprocessor (FAMP) operation system

    International Nuclear Information System (INIS)

    Gosman, D.; Hertzberger, L.O.; Holthuizen, D.J.; Por, G.J.A.; Schoorel, M.

    1981-01-01

    The Fast Amsterdam Multi Processor system (FAMP system) has been developed for on-line filtering and second-stage triggering. The system is based on the MC 68000 microprocessor from MOTOROLA. In this report we describe: the FAMP operating system software, the features of the slaves and the supervisor in the FAMP operating system, the communication between supervisor and slaves using the dual-port memories, and the communication between user programs and the operating system. The hardware as well as the applications of the system will be described elsewhere. (orig.)

  17. Communications Patterns in a Symbolic Multiprocessor.

    Science.gov (United States)

    1987-06-01

    ...instruction references that Multilisp programs make. The cache hit ratio is greatest when instruction references have a high degree of locality. Another... future touches hit an undetermined future. The only exception is Consim, in which one third of future touches hit undetermined futures. Task... Cambridge, MA, June 1985. [52] S. Sugimoto, K. Agusa, K. Tabata, and Y. Ohno. A multi-microprocessor system for concurrent Lisp. In Proceedings of...

  18. Decay-usage scheduling in multiprocessors

    NARCIS (Netherlands)

    Epema, D.H.J.

    1998-01-01

    Decay-usage scheduling is a priority-aging time-sharing scheduling policy capable of dealing with a workload of both interactive and batch jobs by decreasing the priority of a job when it acquires CPU time, and by increasing its priority when it does not use the CPU. In this article we deal with
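
    The core decay-usage rule described above can be sketched in a few lines; the decay factor, weight and base priority below are arbitrary illustrative values, not the parameters analysed in the article.

    # Sketch: decay-usage priority aging -- accumulated CPU usage decays periodically,
    # and a job's priority worsens as its (decayed) usage grows.
    DECAY = 0.5          # fraction of accumulated usage kept at each decay event
    WEIGHT = 1.0         # how strongly recent CPU usage worsens the priority
    BASE_PRIORITY = 0.0  # nominal priority of a job that has used no CPU

    class Job:
        def __init__(self, name):
            self.name = name
            self.usage = 0.0                    # decayed CPU usage

        def charge(self, cpu_time):
            """Account CPU time consumed during the last quantum."""
            self.usage += cpu_time

        def decay(self):
            """Applied periodically (e.g., once per second) to all jobs."""
            self.usage *= DECAY

        def priority(self):
            """Lower value = better priority, as in classical UNIX-style schedulers."""
            return BASE_PRIORITY + WEIGHT * self.usage

    if __name__ == "__main__":
        batch, interactive = Job("batch"), Job("interactive")
        for tick in range(5):
            batch.charge(1.0)                   # batch job runs every quantum
            if tick == 0:
                interactive.charge(0.2)         # interactive job used the CPU only briefly
            for j in (batch, interactive):
                j.decay()
            print(tick, round(batch.priority(), 3), round(interactive.priority(), 3))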

  19. Architecture of an acquisition system-multiprocessors

    International Nuclear Information System (INIS)

    Postec, H.

    1987-07-01

    To keep up with the huge increase in the number of parameters involved in nuclear detection systems, acquisition systems are becoming larger and must offer very good speed. At GANIL, four detection systems have been installed in the Nautilus reaction chamber, leading to experimental configurations with 700 parameters to process. Faced with the limitations of the present acquisition system, a device better suited to reading out a large number of channels proved necessary. Functionalities already operating in other systems and hardware already in use have been chosen; specific technical solutions were also developed to use the most recent techniques and to take into account the four-detection-system structure of the device [fr

  20. Efficient thermal management for multiprocessor systems

    OpenAIRE

    Coşkun, Ayşe Kıvılcım

    2009-01-01

    High temperatures and large thermal variations on the die create severe challenges in system reliability, performance, leakage power, and cooling costs. Designing for worst-case thermal conditions is highly costly and time-consuming. Therefore, dynamic thermal management methods are needed to maintain safe temperature levels during execution. Conventional management techniques sacrifice performance to control temperature and only consider the hot spots, neglecting the effects of thermal varia...

  1. Computable Frames in Computable Banach Spaces

    Directory of Open Access Journals (Sweden)

    S.K. Kaushik

    2016-06-01

    Full Text Available We develop some parts of the frame theory in Banach spaces from the point of view of Computable Analysis. We define computable M-basis and use it to construct a computable Banach space of scalar valued sequences. Computable Xd frames and computable Banach frames are also defined and computable versions of sufficient conditions for their existence are obtained.

  2. Algebraic computing

    International Nuclear Information System (INIS)

    MacCallum, M.A.H.

    1990-01-01

    The implementation of a new computer algebra system is time consuming: designers of general purpose algebra systems usually say it takes about 50 man-years to create a mature and fully functional system. Hence the range of available systems and their capabilities changes little between one general relativity meeting and the next, despite which there have been significant changes in the period since the last report. The introductory remarks aim to give a brief survey of the capabilities of the principal available systems and to highlight one or two trends. A reference to the most recent full survey of computer algebra in relativity is given, along with brief descriptions of Maple, REDUCE, SHEEP and other applications. (author)

  3. MPSalsa a finite element computer program for reacting flow problems. Part 2 - user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Salinger, A.; Devine, K.; Hennigan, G.; Moffat, H. [and others

    1996-09-01

    This manual describes the use of MPSalsa, an unstructured finite element (FE) code for solving chemically reacting flow problems on massively parallel computers. MPSalsa has been written to enable the rigorous modeling of the complex geometry and physics found in engineering systems that exhibit coupled fluid flow, heat transfer, mass transfer, and detailed reactions. In addition, considerable effort has been made to ensure that the code makes efficient use of the computational resources of massively parallel (MP), distributed memory architectures in a way that is nearly transparent to the user. The result is the ability to simultaneously model both three-dimensional geometries and flow as well as detailed reaction chemistry in a timely manner on MP computers, an ability we believe to be unique. MPSalsa has been designed to allow the experienced researcher considerable flexibility in modeling a system. Any combination of the momentum equations, energy balance, and an arbitrary number of species mass balances can be solved. The physical and transport properties can be specified as constants, as functions, or taken from the Chemkin library and associated database. Any of the standard set of boundary conditions and source terms can be adapted by writing user functions, for which templates and examples exist.

  4. Extending the length and time scales of Gram–Schmidt Lyapunov vector computations

    Energy Technology Data Exchange (ETDEWEB)

    Costa, Anthony B., E-mail: acosta@northwestern.edu [Department of Chemistry, Northwestern University, Evanston, IL 60208 (United States); Green, Jason R., E-mail: jason.green@umb.edu [Department of Chemistry, Northwestern University, Evanston, IL 60208 (United States); Department of Chemistry, University of Massachusetts Boston, Boston, MA 02125 (United States)

    2013-08-01

    Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N² (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 between Intel Nehalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra.
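
    The multiply-then-QR cycle at the core of the Gram–Schmidt procedure described above can be sketched with NumPy on a toy constant Jacobian; the distributed ScaLAPACK and GPU MAGMA implementations in the paper perform the same cycle on much larger tangent-space matrices. The example below is illustrative only, not the authors' code.

    # Sketch: Lyapunov exponents and Gram-Schmidt vectors via repeated propagate-then-QR.
    import numpy as np

    def lyapunov_exponents(jacobian, n_steps=2000):
        dim = jacobian.shape[0]
        Q = np.eye(dim)                        # current orthonormal tangent vectors
        log_growth = np.zeros(dim)
        for _ in range(n_steps):
            Q, R = np.linalg.qr(jacobian @ Q)  # propagate, then re-orthonormalize
            # Fix signs so the diagonal of R is positive (QR is unique only up to signs).
            signs = np.sign(np.diag(R))
            Q, R = Q * signs, R * signs[:, None]
            log_growth += np.log(np.abs(np.diag(R)))
        return log_growth / n_steps, Q         # per-step exponents and final GS vectors

    if __name__ == "__main__":
        J = np.diag([2.0, 1.0, 0.5])           # toy constant Jacobian with known exponents
        exps, vecs = lyapunov_exponents(J)
        print(np.round(exps, 3))               # expected: log(2), 0, log(0.5)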

  5. Extending the length and time scales of Gram–Schmidt Lyapunov vector computations

    International Nuclear Information System (INIS)

    Costa, Anthony B.; Green, Jason R.

    2013-01-01

    Lyapunov vectors have found growing interest recently due to their ability to characterize systems out of thermodynamic equilibrium. The computation of orthogonal Gram–Schmidt vectors requires multiplication and QR decomposition of large matrices, which grow as N² (with the particle count). This expense has limited such calculations to relatively small systems and short time scales. Here, we detail two implementations of an algorithm for computing Gram–Schmidt vectors. The first is a distributed-memory message-passing method using Scalapack. The second uses the newly-released MAGMA library for GPUs. We compare the performance of both codes for Lennard–Jones fluids from N=100 to 1300 between Intel Nehalem/Infiniband DDR and NVIDIA C2050 architectures. To our best knowledge, these are the largest systems for which the Gram–Schmidt Lyapunov vectors have been computed, and the first time their calculation has been GPU-accelerated. We conclude that Lyapunov vector calculations can be significantly extended in length and time by leveraging the power of GPU-accelerated linear algebra

  6. Computational Controversy

    OpenAIRE

    Timmermans, Benjamin; Kuhn, Tobias; Beelen, Kaspar; Aroyo, Lora

    2017-01-01

    Climate change, vaccination, abortion, Trump: Many topics are surrounded by fierce controversies. The nature of such heated debates and their elements have been studied extensively in the social science literature. More recently, various computational approaches to controversy analysis have appeared, using new data sources such as Wikipedia, which help us now better understand these phenomena. However, compared to what social sciences have discovered about such debates, the existing computati...

  7. Computed tomography

    International Nuclear Information System (INIS)

    Andre, M.; Resnick, D.

    1988-01-01

    Computed tomography (CT) has matured into a reliable and prominent tool for study of the musculoskeletal system. When it was introduced in 1973, it was unique in many ways and posed a challenge to interpretation. It is in these unique features, however, that its advantages lie in comparison with conventional techniques. These advantages will be described in a spectrum of important applications in orthopedics and rheumatology

  8. Computed radiography

    International Nuclear Information System (INIS)

    Pupchek, G.

    2004-01-01

    Computed radiography (CR) is an image acquisition process that is used to create digital, 2-dimensional radiographs. CR employs a photostimulable phosphor-based imaging plate, replacing the standard x-ray film and intensifying screen combination. Conventional radiographic exposure equipment is used with no modification required to the existing system. CR can transform an analog x-ray department into a digital one and eliminates the need for chemicals, water, darkrooms and film processor headaches. (author)

  9. Computational universes

    International Nuclear Information System (INIS)

    Svozil, Karl

    2005-01-01

    Suspicions that the world might be some sort of a machine or algorithm existing 'in the mind' of some symbolic number cruncher have lingered from antiquity. Although popular at times, the most radical forms of this idea never reached mainstream. Modern developments in physics and computer science have lent support to the thesis, but empirical evidence is needed before it can begin to replace our contemporary world view

  10. Monitoring of Computing Resource Use of Active Software Releases in ATLAS

    CERN Document Server

    Limosani, Antonio; The ATLAS collaboration

    2016-01-01

    The LHC is the world's most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as "MemoryMonitor", to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and displayed...

  11. Monitoring of computing resource use of active software releases at ATLAS

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00219183; The ATLAS collaboration

    2017-01-01

    The LHC is the world’s most powerful particle accelerator, colliding protons at centre of mass energy of 13 TeV. As the energy and frequency of collisions has grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and dis...

  12. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with more accurate physical models, and improve statistics, as more particle tracks can be simulated in a shorter response time.
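
    The parallelization pattern the abstract describes, independent particle histories spread across MPI ranks, each rank with its own random stream, and a final reduction of the tallies, can be sketched as below. The physics loop, the tally, and the simple per-rank seeding are illustrative placeholders for the MC4 transport kernel and the SPRNG/DCMT streams.

        /* Hedged sketch of embarrassingly parallel Monte Carlo transport over MPI:
         * each rank simulates an independent share of particle histories with its
         * own random stream, and the energy-deposition tally is reduced on rank 0.
         * The "physics" here is a placeholder, not the MC4 transport kernel. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const long n_total = 10000000;                /* total particle histories */
            long n_local = n_total / size + (rank < n_total % size);  /* share work */

            /* Independent streams: a real code would use SPRNG or DCMT here;
             * rand_r (POSIX) is only a stand-in for the sketch. */
            unsigned int seed = 12345u + 17u * (unsigned)rank;

            double local_edep = 0.0;                      /* local tally */
            for (long i = 0; i < n_local; ++i) {
                double r = (double)rand_r(&seed) / RAND_MAX;
                local_edep += r;                          /* placeholder for physics */
            }

            double total_edep = 0.0;
            MPI_Reduce(&local_edep, &total_edep, 1, MPI_DOUBLE, MPI_SUM,
                       0, MPI_COMM_WORLD);

            if (rank == 0)
                printf("mean deposited energy per history: %f\n", total_edep / n_total);

            MPI_Finalize();
            return 0;
        }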

  13. Customizable computing

    CERN Document Server

    Chen, Yu-Ting; Gill, Michael; Reinman, Glenn; Xiao, Bingjun

    2015-01-01

    Since the end of Dennard scaling in the early 2000s, improving the energy efficiency of computation has been the main concern of the research community and industry. The large energy-efficiency gap between general-purpose processors and application-specific integrated circuits (ASICs) motivates the exploration of customizable architectures, where one can adapt the architecture to the workload. In this Synthesis lecture, we present an overview of and introduction to recent developments in energy-efficient customizable architectures, including customizable cores and accelerators, on-chip memory

  14. Computed tomography

    International Nuclear Information System (INIS)

    Wells, P.; Davis, J.; Morgan, M.

    1994-01-01

    X-ray or gamma-ray transmission computed tomography (CT) is a powerful non-destructive evaluation (NDE) technique that produces two-dimensional cross-sectional images of an object without the need to physically section it. CT is also known by the acronym CAT, for computerised axial tomography. This review article presents a brief historical perspective on CT, its current status and the underlying physics. The mathematical fundamentals of computed tomography are developed for the simplest transmission CT modality. A description of CT scanner instrumentation is provided with an emphasis on radiation sources and systems. Examples of CT images are shown indicating the range of materials that can be scanned and the spatial and contrast resolutions that may be achieved. Attention is also given to the occurrence, interpretation and minimisation of various image artefacts that may arise. A final brief section is devoted to the principles and potential of a range of more recently developed tomographic modalities including diffraction CT, positron emission CT and seismic tomography. 57 refs., 2 tabs., 14 figs
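
    For the simplest transmission modality discussed above, each measurement reduces to a line integral of the linear attenuation coefficient, which is the starting point of every reconstruction algorithm. In the usual notation (Beer-Lambert law plus the Radon transform of the attenuation map, with filtered back projection as the standard inverse):

        % Beer-Lambert attenuation along a ray L through the object:
        %   I = I_0 \exp\!\left(-\int_L \mu(x,y)\, \mathrm{d}\ell\right),
        % so each measurement yields a line integral (projection) of \mu.
        % Collecting projections at angle \theta and detector offset s gives the
        % Radon transform, which reconstruction algorithms such as filtered back
        % projection invert to recover \mu(x,y):
        \[
          p(\theta, s) \;=\; -\ln\frac{I}{I_0}
          \;=\; \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty}
                \mu(x,y)\,\delta(x\cos\theta + y\sin\theta - s)\,\mathrm{d}x\,\mathrm{d}y .
        \]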

  15. Computing Services and Assured Computing

    Science.gov (United States)

    2006-05-01

    fighters’ ability to execute the mission.” We run IT systems that provide medical care, pay the warfighters, and manage maintenance... users • 1,400 applications • 18 facilities • 180 software vendors • 18,000+ copies of executive software products • virtually every type of mainframe and...

  16. Application of parallel computing techniques to a large-scale reservoir simulation

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-01-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from the intensive computational requirements of detailed modeling investigations of real-world reservoirs. This paper presents the application of a massively parallel version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.
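
    The distributed-memory strategy underlying such codes, in which each processor owns a block of gridblocks and exchanges interface values with its neighbours every iteration, can be illustrated with a one-dimensional halo exchange. The sketch below is generic: the diffusive update stands in for the actual TOUGH2 flow and heat equations, and the decomposition is simplified to a 1-D slab per MPI rank.

        /* Hedged 1-D halo-exchange sketch of the domain-decomposition idea used by
         * parallel reservoir simulators: each MPI rank owns a slab of cells plus one
         * ghost cell on each side, exchanged with its neighbours every iteration.
         * The update rule is a generic placeholder, not the TOUGH2 equations. */
        #include <mpi.h>

        #define N_LOCAL 1000   /* cells owned by each rank */

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            /* u[0] and u[N_LOCAL+1] are ghost cells holding neighbour values. */
            double u[N_LOCAL + 2], unew[N_LOCAL + 2];
            for (int i = 0; i < N_LOCAL + 2; ++i)
                u[i] = (rank == 0 && i == 1) ? 1.0 : 0.0;   /* simple initial state */

            for (int iter = 0; iter < 100; ++iter) {
                /* Exchange interface values with both neighbours. */
                MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                             &u[N_LOCAL + 1], 1, MPI_DOUBLE, right, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Sendrecv(&u[N_LOCAL],     1, MPI_DOUBLE, right, 1,
                             &u[0],           1, MPI_DOUBLE, left,  1,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

                /* Generic diffusive update standing in for the real flow equations. */
                for (int i = 1; i <= N_LOCAL; ++i)
                    unew[i] = u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
                for (int i = 1; i <= N_LOCAL; ++i)
                    u[i] = unew[i];
            }

            MPI_Finalize();
            return 0;
        }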

  17. FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.

    Science.gov (United States)

    Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora

    2013-09-01

    In this paper, we present a distributed computing system, called DCMARK, aimed at solving partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the solution of the differential equation system into many parallel integration operations executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to studying the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200-cell Korteweg-de Vries (KdV) equation solver and compare simulations conducted on a high-performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it reduces the elaboration time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC host. An intuitive graphical user interface allows us to change the calculation parameters and plot the results.
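
    The one-processor-per-equation mapping described above is possible because each discretized KdV cell needs only a few neighbouring values per step. A minimal software sketch of that locally interconnected update (periodic boundaries, a simple explicit time step, and illustrative step sizes, rather than the DCMARK arithmetic or integration scheme) is:

        /* Hedged sketch of the locally interconnected update behind a cellular
         * (one-processor-per-cell) KdV solver: each of NC cells advances its own
         * state using only a few neighbouring values, with periodic boundaries.
         * A simple explicit step is used for illustration; the hardware version
         * would map each cell's update to one of the FPGA processors. */
        #include <math.h>
        #include <stdio.h>

        #define NC 200          /* number of cells, as in the paper's test case */

        int main(void)
        {
            double u[NC], unew[NC];
            const double dx = 0.5, dt = 1e-4;   /* illustrative step sizes */

            /* Soliton-like initial condition: u = 0.5 sech^2(0.5 x). */
            for (int i = 0; i < NC; ++i) {
                double x = (i - NC / 2) * dx;
                u[i] = 0.5 / (cosh(0.5 * x) * cosh(0.5 * x));
            }

            for (int step = 0; step < 2000; ++step) {
                for (int i = 0; i < NC; ++i) {
                    /* Periodic neighbour indices: a cell only sees i±1 and i±2. */
                    int im2 = (i - 2 + NC) % NC, im1 = (i - 1 + NC) % NC;
                    int ip1 = (i + 1) % NC,      ip2 = (i + 2) % NC;

                    double ux   = (u[ip1] - u[im1]) / (2.0 * dx);
                    double uxxx = (u[ip2] - 2.0 * u[ip1] + 2.0 * u[im1] - u[im2])
                                  / (2.0 * dx * dx * dx);
                    /* KdV: u_t = -6 u u_x - u_xxx (explicit Euler step). */
                    unew[i] = u[i] + dt * (-6.0 * u[i] * ux - uxxx);
                }
                for (int i = 0; i < NC; ++i) u[i] = unew[i];
            }

            printf("u[NC/2] after integration: %f\n", u[NC / 2]);
            return 0;
        }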

  18. Computational neuroscience

    CERN Document Server

    Blackwell, Kim L

    2014-01-01

    Progress in Molecular Biology and Translational Science provides a forum for discussion of new discoveries, approaches, and ideas in molecular biology. It contains contributions from leaders in their fields and abundant references. This volume brings together different aspects of, and approaches to, molecular and multi-scale modeling, with applications to a diverse range of neurological diseases. Mathematical and computational modeling offers a powerful approach for examining the interaction between molecular pathways and ionic channels in producing neuron electrical activity. It is well accepted that non-linear interactions among diverse ionic channels can produce unexpected neuron behavior and hinder a deep understanding of how ion channel mutations bring about abnormal behavior and disease. Interactions with the diverse signaling pathways activated by G protein-coupled receptors or calcium influx add an additional level of complexity. Modeling is an approach to integrate myriad data sources into a cohesiv...

  19. Social Computing

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    The past decade has witnessed a momentous transformation in the way people interact with each other. Content is now co-produced, shared, classified, and rated by millions of people, while attention has become the ephemeral and valuable resource that everyone seeks to acquire. This talk will describe how social attention determines the production and consumption of content within both the scientific community and social media, how its dynamics can be used to predict the future and the role that social media plays in setting the public agenda. About the speaker Bernardo Huberman is a Senior HP Fellow and Director of the Social Computing Lab at Hewlett Packard Laboratories. He received his Ph.D. in Physics from the University of Pennsylvania, and is currently a Consulting Professor in the Department of Applied Physics at Stanford University. He originally worked in condensed matter physics, ranging from superionic conductors to two-dimensional superfluids, and made contributions to the theory of critical p...

  20. Computer networks

    Directory of Open Access Journals (Sweden)

    N. U. Ahmed

    2002-01-01

    In this paper, we construct a new dynamic model for the Token Bucket (TB) algorithm used in computer networks and use a systems approach for its analysis. This model is then augmented by adding a dynamic model for a multiplexor at an access node where the TB exercises a policing function. In the model, traffic policing, multiplexing and network utilization are formally defined. Based on the model, we study such issues as quality of service (QoS), traffic sizing and network dimensioning. We also propose an algorithm using feedback control to improve QoS and network utilization. Applying MPEG video traces as the input traffic to the model, we verify the usefulness and effectiveness of our model.
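
    As a concrete reference for the policing function discussed above, a minimal token-bucket conformance check can be written as below; the field names, units and parameter values are illustrative choices rather than the paper's dynamic model.

        /* Minimal token-bucket policer sketch: tokens accumulate at rate `rate`
         * up to a bucket depth `depth`; a packet of `len` bytes conforms only if
         * enough tokens are available when it arrives. */
        #include <stdio.h>

        typedef struct {
            double rate;     /* token fill rate, bytes per second */
            double depth;    /* bucket capacity, bytes            */
            double tokens;   /* current token level, bytes        */
            double last;     /* time of last update, seconds      */
        } token_bucket;

        /* Returns 1 if the packet conforms (is admitted), 0 if it is policed. */
        int tb_conform(token_bucket *tb, double now, double len)
        {
            tb->tokens += (now - tb->last) * tb->rate;   /* refill since last packet */
            if (tb->tokens > tb->depth) tb->tokens = tb->depth;
            tb->last = now;

            if (tb->tokens >= len) {                     /* enough credit: admit */
                tb->tokens -= len;
                return 1;
            }
            return 0;                                    /* violates the traffic profile */
        }

        int main(void)
        {
            /* Roughly 1 Mbit/s sustained rate with a 15 kB burst allowance. */
            token_bucket tb = { .rate = 125000.0, .depth = 15000.0,
                                .tokens = 15000.0, .last = 0.0 };
            double t = 0.0;
            for (int i = 0; i < 20; ++i, t += 0.01)      /* 1500-byte packets every 10 ms */
                printf("pkt %2d at t=%.2fs: %s\n", i, t,
                       tb_conform(&tb, t, 1500.0) ? "conforms" : "policed");
            return 0;
        }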