WorldWideScience

Sample records for supercomputers confronts parallelism

  1. Ultrascalable petaflop parallel supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Blumrich, Matthias A. (Ridgefield, CT); Chen, Dong (Croton On Hudson, NY); Chiu, George (Cross River, NY); Cipolla, Thomas M. (Katonah, NY); Coteus, Paul W. (Yorktown Heights, NY); Gara, Alan G. (Mount Kisco, NY); Giampapa, Mark E. (Irvington, NY); Hall, Shawn (Pleasantville, NY); Haring, Rudolf A. (Cortlandt Manor, NY); Heidelberger, Philip (Cortlandt Manor, NY); Kopcsay, Gerard V. (Yorktown Heights, NY); Ohmacht, Martin (Yorktown Heights, NY); Salapura, Valentina (Chappaqua, NY); Sugavanam, Krishnan (Mahopac, NY); Takken, Todd (Brewster, NY)

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing, including a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm to optimize processing performance. A DMA engine facilitates message passing among the nodes without expending processing resources at the node.

  2. Parallel supercomputers for lattice gauge theory.

    Science.gov (United States)

    Brown, F R; Christ, N H

    1988-03-18

    During the past 10 years, particle physicists have increasingly employed numerical simulation to answer fundamental theoretical questions about the properties of quarks and gluons. The enormous computer resources required by quantum chromodynamic calculations have inspired the design and construction of very powerful, highly parallel, dedicated computers optimized for this work. This article gives a brief description of the numerical structure and current status of these large-scale lattice gauge theory calculations, with emphasis on the computational demands they make. The architecture, present state, and potential of these special-purpose supercomputers is described. It is argued that a numerical solution of low energy quantum chromodynamics may well be achieved by these machines.

  3. Multi-petascale highly efficient parallel supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, that allows for a maximum packaging density of processing nodes from an interconnect point of view. The supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources, enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that maximizes the throughput of packet communications between nodes and minimizes latency.

  4. Applications of parallel supercomputers: Scientific results and computer science lessons

    Energy Technology Data Exchange (ETDEWEB)

    Fox, G.C.

    1989-07-12

    Parallel computing has come of age, with several commercial and in-house systems that deliver supercomputer performance. We illustrate this with several major computations completed or underway at Caltech on hypercubes, transputer arrays, and the SIMD Connection Machine CM-2 and AMT DAP. Applications covered are lattice gauge theory, computational fluid dynamics, subatomic string dynamics, statistical and condensed matter physics, theoretical and experimental astronomy, quantum chemistry, plasma physics, grain dynamics, computer chess, graphics ray tracing, and Kalman filters. We use these applications to compare the performance of several advanced-architecture computers, including the conventional CRAY and ETA-10 supercomputers. We describe which problems are suitable for which computers in terms of a matching between problem and computer architecture. This is part of a set of lessons we draw for hardware, software, and performance. We speculate on the emergence of new academic disciplines motivated by the growing importance of computers. 138 refs., 23 figs., 10 tabs.

  5. Numerical simulations of astrophysical problems on massively parallel supercomputers

    Science.gov (United States)

    Kulikov, Igor; Chernykh, Igor; Glinsky, Boris

    2016-10-01

    In this paper, we present the latest version of our numerical model for simulating the dynamics of astrophysical objects, and a new realization of our AstroPhi code for Intel Xeon Phi based RSC PetaStream supercomputers. The co-design of a computational model for the description of astrophysical objects is described. The parallel implementation and scalability tests of the AstroPhi code are presented. We achieve 73% weak-scaling efficiency using 256 Intel Xeon Phi accelerators with 61,440 threads.
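
    For reference, a weak-scaling efficiency of this kind is conventionally computed with the work per accelerator held fixed; assuming that standard definition, the quoted figure corresponds to

      E_weak(P) = T(1) / T(P),   here E_weak(256) = 0.73,

    i.e. the 256-accelerator run on the correspondingly larger problem took roughly 1/0.73 ≈ 1.37 times as long as the single-accelerator baseline.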

  6. AENEAS A Custom-built Parallel Supercomputer for Quantum Gravity

    CERN Document Server

    Hamber, H W

    1998-01-01

    Accurate Quantum Gravity calculations, based on the simplicial lattice formulation, are computationally very demanding and require vast amounts of computer resources. A custom-made 64-node parallel supercomputer capable of performing up to $2 \times 10^{10}$ floating point operations per second has been assembled entirely out of commodity components, and has been operational for the last ten months. It will allow the numerical computation of a variety of quantities of physical interest in quantum gravity and related field theories, including the estimate of the critical exponents in the vicinity of the ultraviolet fixed point to an accuracy of a few percent.

  7. Refinement of herpesvirus B-capsid structure on parallel supercomputers.

    Science.gov (United States)

    Zhou, Z H; Chiu, W; Haskell, K; Spears, H; Jakana, J; Rixon, F J; Scott, L R

    1998-01-01

    Electron cryomicroscopy and icosahedral reconstruction are used to obtain the three-dimensional structure of the 1250-A-diameter herpesvirus B-capsid. The centers and orientations of particles in focal pairs of 400-kV, spot-scan micrographs are determined and iteratively refined by common-lines-based local and global refinement procedures. We describe the rationale behind choosing shared-memory multiprocessor computers for executing the global refinement, which is the most computationally intensive step in the reconstruction procedure. This refinement has been implemented on three different shared-memory supercomputers. The speedup and efficiency are evaluated by using test data sets with different numbers of particles and processors. Using this parallel refinement program, we refine the herpesvirus B-capsid from 355-particle images to 13-A resolution. The map shows new structural features and interactions of the protein subunits in the three distinct morphological units: penton, hexon, and triplex of this T = 16 icosahedral particle.

  8. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    In this research a computer program, Trubal version 1.51, based on the Discrete Element Method, was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with its Single Instruction Multiple Data (SIMD) architecture. This was due to the communication overhead involving global array reductions, global array broadcasts, and random data movement. Therefore, the dynamic memory algorithm in the Trubal program was converted to a static memory arrangement, and the program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM)." Simulating two physical triaxial experiments and comparing the simulation results with Trubal simulations validated the TPM program. With a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  9. Using the multistage cube network topology in parallel supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Siegel, H.J.; Nation, W.G. (Purdue Univ., Lafayette, IN (USA). School of Electrical Engineering); Kruskal, C.P. (Maryland Univ., College Park, MD (USA). Dept. of Computer Science); Napolitano, L.M. Jr. (Sandia National Labs., Livermore, CA (USA))

    1989-12-01

    A variety of approaches to designing the interconnection network to support communications among the processors and memories of supercomputers employing large-scale parallel processing have been proposed and/or implemented. These approaches are often based on the multistage cube topology. This topology is the subject of much ongoing research and study because of the ways in which the multistage cube can be used. The attributes of the topology that make it useful are described. These include O(N log2 N) cost for an N input/output network, decentralized control, a variety of implementation options, good data permuting capability to support single instruction stream/multiple data stream (SIMD) parallelism, good throughput to support multiple instruction stream/multiple data stream (MIMD) parallelism, and ability to be partitioned into independent subnetworks to support reconfigurable systems. Examples of existing systems that use multistage cube networks are overviewed. The multistage cube topology can be converted into a single-stage network by associating with each switch in the network a processor (and a memory). Properties of systems that use the multistage cube network in this way are also examined.
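
    The decentralized control mentioned above is what makes the multistage cube practical at scale: each 2x2 switch can set itself from one bit of the destination address. The fragment below is a minimal illustrative sketch of such destination-tag self-routing, not code from the paper; the stage-to-bit ordering shown is one common convention and varies between designs.

      /* Destination-tag self-routing through an N-input multistage cube
       * network built from log2(N) stages of 2x2 switches (illustrative). */
      #include <stdio.h>

      static void route(unsigned src, unsigned dst, unsigned stages)
      {
          printf("route %u -> %u:", src, dst);
          for (unsigned s = 0; s < stages; ++s) {
              /* One destination bit selects the output port at each stage:
               * 0 = upper ("straight") output, 1 = lower ("exchange") output. */
              unsigned port = (dst >> (stages - 1 - s)) & 1u;
              printf("  stage %u -> port %u", s, port);
          }
          printf("\n");
      }

      int main(void)
      {
          /* 8-input network -> log2(8) = 3 stages of switches. */
          route(2, 5, 3);
          route(7, 0, 3);
          return 0;
      }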

  10. Requirements for supercomputing in energy research: The transition to massively parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  11. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    OpenAIRE

    A. Gunzinger; Bäumle, B.; Frey, M.; Klebl, M.; Kocheisen, M.; Kohler, P.; Morel, R.; Müller, U.; Rosenthal, M.

    1996-01-01

    At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the Electronics Laboratory supercomputer is absolutely on par with conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of 400, and price is reduced by a factor of 100.

  12. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    Directory of Open Access Journals (Sweden)

    A. Gunzinger

    1996-01-01

    At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the Electronics Laboratory supercomputer is absolutely on par with conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of 400, and price is reduced by a factor of 100. Software development is a key issue of such parallel systems. This article focuses on the programming environment of the MUSIC system and on its applications.

  13. Design and performance characterization of electronic structure calculations on massively parallel supercomputers

    DEFF Research Database (Denmark)

    Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth

    2013-01-01

    Density function theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community's ...

  14. ATLAS FTK a - very complex - custom parallel supercomputer

    CERN Document Server

    Kimura, Naoki; The ATLAS collaboration

    2016-01-01

    In the ever-increasing pile-up LHC environment, advanced techniques of analysing the data are implemented in order to increase the rate of relevant physics processes with respect to background processes. The Fast TracKer (FTK) is a track-finding implementation at hardware level that is designed to deliver full-scan tracks with $p_{T}$ above 1 GeV to the ATLAS trigger system for every L1 accept (at a maximum rate of 100 kHz). In order to achieve this performance a highly parallel system was designed, and it is now under installation in ATLAS. At the beginning of 2016 it will provide tracks for the trigger system in a region covering the central part of the ATLAS detector, and during the year its coverage will be extended to the full detector. The system relies on matching hits coming from the silicon tracking detectors against 1 billion patterns stored in specially designed ASIC chips (Associative Memory - AM06). In a first stage coarse resolution hits are matched against the patterns and the accepted h...

  15. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
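
    The data structures named above (sequences plus sorted k-mer lists) can be illustrated with a small, generic example; the code below is not the progressiveMauve or BG/P implementation, just a sketch of packing DNA k-mers into integers and sorting them, which is the usual starting point for such a list.

      /* Sketch: encode all k-mers of a DNA string as 2-bit-packed integers
       * and sort them (generic illustration, not the BG/P code). */
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <stdint.h>

      static int base2bits(char c)
      {
          switch (c) {
          case 'A': return 0;
          case 'C': return 1;
          case 'G': return 2;
          case 'T': return 3;
          default:  return -1;          /* ambiguous base: skip this k-mer */
          }
      }

      static int cmp_u64(const void *a, const void *b)
      {
          uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
          return (x > y) - (x < y);
      }

      int main(void)
      {
          const char *seq = "ACGTACGTGGA";
          const int k = 4;
          size_t n = strlen(seq), m = 0;
          uint64_t *kmers = malloc(n * sizeof *kmers);

          for (size_t i = 0; i + k <= n; ++i) {
              uint64_t code = 0;
              int ok = 1;
              for (int j = 0; j < k; ++j) {
                  int b = base2bits(seq[i + j]);
                  if (b < 0) { ok = 0; break; }
                  code = (code << 2) | (uint64_t)b;
              }
              if (ok) kmers[m++] = code;    /* keep only fully determined k-mers */
          }
          qsort(kmers, m, sizeof *kmers, cmp_u64);

          for (size_t i = 0; i < m; ++i)
              printf("%llu\n", (unsigned long long)kmers[i]);
          free(kmers);
          return 0;
      }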

  16. Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer.

    Science.gov (United States)

    Suplatov, Dmitry; Popova, Nina; Zhumatiy, Sergey; Voevodin, Vladimir; Švedas, Vytas

    2016-04-01

    Rapid expansion of online resources providing access to genomic, structural, and functional information associated with biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes due to systematic analysis of large datasets. This, however, requires novel strategies to optimally utilize computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations which take at most several hours to analyze a common input on a modern desktop station; however, due to multiple invocations for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a wise parallel implementation of resource-hungry methods as well as a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new computer software mpiWrapper has been developed to accommodate non-parallel implementations of scientific algorithms within the parallel supercomputing environment. The Message Passing Interface has been implemented to exchange information between nodes. Two specialized threads - one for task management and communication, and another for subtask execution - are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. The mpiWrapper can be used to launch all conventional Linux applications without the need to modify their original source codes and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper.
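
    The published mpiWrapper uses two threads per processing unit (task management plus subtask execution) and supports resubmission on node failure; the sketch below is a deliberately simpler, single-threaded master/worker pattern in C MPI that conveys the same idea of farming out independent command-line subtasks to ranks. The command strings and the "./analyze" executable are hypothetical.

      /* Minimal master/worker sketch in MPI C for running independent
       * command-line subtasks in parallel (illustrative only). */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      #define TAG_TASK 1
      #define TAG_DONE 2

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const char *tasks[] = {            /* hypothetical subtask commands */
              "./analyze input_001.dat",
              "./analyze input_002.dat",
              "./analyze input_003.dat",
          };
          const int ntasks = 3;

          if (rank == 0) {                   /* master: hand out tasks, then stop signals */
              int next = 0, done;
              MPI_Status st;
              for (int finished = 0; finished < size - 1; ) {
                  MPI_Recv(&done, 1, MPI_INT, MPI_ANY_SOURCE, TAG_DONE, MPI_COMM_WORLD, &st);
                  if (next < ntasks) {
                      MPI_Send(tasks[next], (int)strlen(tasks[next]) + 1, MPI_CHAR,
                               st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
                      ++next;
                  } else {
                      MPI_Send("", 1, MPI_CHAR, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
                      ++finished;
                  }
              }
          } else {                           /* worker: request work, run it via the shell */
              char cmd[256];
              int ready = 1;
              for (;;) {
                  MPI_Send(&ready, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
                  MPI_Recv(cmd, sizeof cmd, MPI_CHAR, 0, TAG_TASK, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  if (cmd[0] == '\0') break; /* empty command = no more work */
                  system(cmd);               /* launch the non-parallel program */
              }
          }
          MPI_Finalize();
          return 0;
      }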

  17. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    Science.gov (United States)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real-world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP

  18. An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

    Directory of Open Access Journals (Sweden)

    Shugang Jiang

    2015-01-01

    It may not be a challenge to run a Finite-Difference Time-Domain (FDTD) code for electromagnetic simulations on a supercomputer with more than 10 thousand CPU cores; however, to make the FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through MPI (message passing interface) virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan, and a complex microstrip antenna array with nearly 2000 elements, are performed very efficiently using a maximum of 10240 CPU cores.
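
    The paper builds its communication model on MPI virtual topologies; the fragment below is a generic sketch (not the authors' topology optimization) of creating a 3-D Cartesian communicator for an FDTD-style domain decomposition and querying the neighbours needed for halo exchange.

      /* Generic sketch: 3-D Cartesian virtual topology for a block
       * domain decomposition (not the topology model from the paper). */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int size;
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          int dims[3] = {0, 0, 0};           /* let MPI factor the process count */
          MPI_Dims_create(size, 3, dims);
          int periods[3] = {0, 0, 0};        /* non-periodic boundaries */

          MPI_Comm cart;
          MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1 /* allow reorder */, &cart);

          int crank, coords[3], xlo, xhi;
          MPI_Comm_rank(cart, &crank);       /* rank may differ after reordering */
          MPI_Cart_coords(cart, crank, 3, coords);
          MPI_Cart_shift(cart, 0, 1, &xlo, &xhi);   /* x-neighbours for halo exchange */

          printf("rank %d at (%d,%d,%d), x-neighbours %d/%d\n",
                 crank, coords[0], coords[1], coords[2], xlo, xhi);

          MPI_Comm_free(&cart);
          MPI_Finalize();
          return 0;
      }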

  19. A CPU/MIC Collaborated Parallel Framework for GROMACS on Tianhe-2 Supercomputer.

    Science.gov (United States)

    Peng, Shaoliang; Yang, Shunyun; Su, Wenhe; Zhang, Xiaoyu; Zhang, Tenglilang; Liu, Weiguo; Zhao, Xingming

    2017-06-16

    Molecular Dynamics (MD) is the simulation of the dynamic behavior of atoms and molecules. As the most popular software for molecular dynamics, GROMACS cannot work on large-scale data because of limited computing resources. In this paper, we propose a CPU and Intel® Xeon Phi Many Integrated Core (MIC) collaborated parallel framework to accelerate GROMACS using the offload mode on a MIC coprocessor, with which the performance of GROMACS is improved significantly, especially on the Tianhe-2 supercomputer. Furthermore, we optimize GROMACS so that it can run on both the CPU and MIC at the same time. In addition, we accelerate multi-node GROMACS so that it can be used in practice. Benchmarking on real data, our accelerated GROMACS performs very well and reduces computation time significantly. Source code: https://github.com/tianhe2/gromacs-mic.
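
    As a rough, hypothetical illustration of the offload mode referred to above (not GROMACS's actual kernels or build system), Intel's compiler offload pragmas ship a marked region and its data to the Xeon Phi coprocessor and copy results back:

      /* Hypothetical offload example; requires an Intel compiler with
       * MIC offload support, and is not GROMACS code. */
      #include <stdio.h>

      #define N 1000000
      static float a[N], b[N];

      int main(void)
      {
          for (int i = 0; i < N; ++i) a[i] = (float)i;

          /* The loop and the statically sized arrays are offloaded to
           * coprocessor 0; without a coprocessor the region typically
           * falls back to the host CPU. */
          #pragma offload target(mic:0) in(a) out(b)
          for (int i = 0; i < N; ++i)
              b[i] = 2.0f * a[i];

          printf("b[10] = %f\n", b[10]);
          return 0;
      }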

  20. A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

    Science.gov (United States)

    de Ladurantaye, Vincent; Lavoie, Jean; Bergeron, Jocelyn; Parenteau, Maxime; Lu, Huizhong; Pichevar, Ramin; Rouat, Jean

    2012-02-01

    A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for 2 implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA), running on clusters of multicore supercomputers and NVIDIA graphical processing units respectively. A global spiking list that represents at each instant the state of the neural network is described. This list indexes each neuron that fires during the current simulation time so that the influence of their spikes is simultaneously processed on all computing units. Our implementation shows a good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, thus paving the road towards real-life applications based on networks of spiking neurons. MPI offers a better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost-to-performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel cluster with 64 nodes (128 cores).
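
    The global spiking list described above has a natural MPI realization: each rank contributes the indices of its neurons that fired in the current time step, and every rank receives the concatenated list. The code below is a generic sketch of that collective step, not the ODLM source.

      /* Generic sketch: gather the indices of locally fired neurons into a
       * global spike list visible to every rank. */
      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Pretend each rank owns 100 neurons and three of them fired. */
          int local_spikes[3] = { rank * 100 + 7, rank * 100 + 42, rank * 100 + 99 };
          int nlocal = 3;

          int *counts = malloc(size * sizeof *counts);
          int *displs = malloc(size * sizeof *displs);
          MPI_Allgather(&nlocal, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD);

          int total = 0;
          for (int i = 0; i < size; ++i) { displs[i] = total; total += counts[i]; }

          int *global_spikes = malloc(total * sizeof *global_spikes);
          MPI_Allgatherv(local_spikes, nlocal, MPI_INT,
                         global_spikes, counts, displs, MPI_INT, MPI_COMM_WORLD);

          if (rank == 0)
              printf("global spike list holds %d entries this time step\n", total);

          free(counts); free(displs); free(global_spikes);
          MPI_Finalize();
          return 0;
      }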

  1. Massively-parallel electrical-conductivity imaging of hydrocarbons using the Blue Gene/L supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Commer, M.; Newman, G.A.; Carazzone, J.J.; Dickens, T.A.; Green,K.E.; Wahrmund, L.A.; Willen, D.E.; Shiu, J.

    2007-05-16

    Large-scale controlled source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment, utilizing 32,768 tasks/processors on the IBM Watson Research Blue Gene/L (BG/L) supercomputer. Over a 24-hour period, we were able to image a large scale marine CSEM field data set that previously required over four months of computing time on distributed clusters utilizing 1024 tasks on an Infiniband fabric. The total initial data misfit could be decreased by 67 percent within 72 completed inversion iterations, indicating an electrically resistive region in the southern survey area below a depth of 1500 m below the seafloor. The major part of the residual misfit stems from transmitter parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.

  2. MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.

    Science.gov (United States)

    Katouda, Michio; Nakajima, Takahito

    2013-12-10

    A new algorithm for massively parallel calculations of the electron correlation energy of large molecules based on the resolution of identity second-order Møller-Plesset perturbation (RI-MP2) technique is developed and implemented into the quantum chemistry software NTChem. In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers. An in-core storage scheme for the intermediate data of three-center electron repulsion integrals utilizing the distributed memory is developed to eliminate input/output (I/O) overhead. The parallel performance of the algorithm is tested on massively parallel supercomputers such as the K computer (using up to 45 992 central processing unit (CPU) cores) and a commodity Intel Xeon cluster (using up to 8192 CPU cores). The parallel RI-MP2/cc-pVTZ calculation of two-layer nanographene sheets (C150H30)2 (number of atomic orbitals is 9640) is performed using 8991 nodes and 71 288 CPU cores of the K computer.

  3. The PVM (Parallel Virtual Machine) system: Supercomputer level concurrent computation on a network of IBM RS/6000 power stations

    Energy Technology Data Exchange (ETDEWEB)

    Sunderam, V.S. (Emory Univ., Atlanta, GA (USA). Dept. of Mathematics and Computer Science); Geist, G.A. (Oak Ridge National Lab., TN (USA))

    1991-01-01

    The PVM (Parallel Virtual Machine) system enables supercomputer level concurrent computations to be performed on interconnected networks of heterogeneous computer systems. Specifically, a network of 13 IBM RS/6000 powerstations has been successfully used to execute production quality runs of superconductor modeling codes at more than 250 Mflops. This work demonstrates the effectiveness of cooperative concurrent processing for high performance applications, and shows that supercomputer level computations may be attained at a fraction of the cost on distributed computing platforms. This paper describes the PVM programming environment and user facilities, as they apply to hardware platforms comprising a network of IBM RS/6000 powerstations. The salient design features of PVM will be discussed, including heterogeneity, scalability, multilanguage support, provisions for fault tolerance, the use of multiprocessors and scalar machines, an interactive graphical front end, and support for profiling, tracing, and visual analysis. The PVM system has been used extensively, and a range of production quality concurrent applications have been successfully executed using PVM on a variety of networked platforms. The paper will mention representative examples, and discuss two in detail. The first is a material sciences problem that was originally developed on a Cray 2. This application code calculates the electronic structure of metallic alloys from first principles and is based on the KKR-CPA algorithm. The second is a molecular dynamics simulation for calculating materials properties. Performance results for both applications on networks of RS/6000 powerstations will be presented, and accompanied by discussions of the other advantages of PVM and its potential as a complement or alternative to conventional supercomputers.
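
    As a minimal, hedged illustration of the PVM programming style described above (not the superconductor or KKR-CPA codes), a master task can spawn workers on the virtual machine, pack and send them work, and unpack their results; "worker" is a hypothetical executable name.

      /* Minimal PVM master sketch (illustrative only). */
      #include <stdio.h>
      #include "pvm3.h"

      int main(void)
      {
          int tids[4];
          /* Spawn up to 4 instances of the (hypothetical) "worker" program. */
          int n = pvm_spawn("worker", (char **)0, PvmTaskDefault, "", 4, tids);

          for (int i = 0; i < n; ++i) {      /* hand out one work unit per task */
              pvm_initsend(PvmDataDefault);
              pvm_pkint(&i, 1, 1);
              pvm_send(tids[i], 1);          /* message tag 1 = work */
          }

          double sum = 0.0, part;
          for (int i = 0; i < n; ++i) {      /* gather partial results */
              pvm_recv(-1, 2);               /* tag 2 = result, from any task */
              pvm_upkdouble(&part, 1, 1);
              sum += part;
          }
          printf("sum of %d partial results: %g\n", n, sum);

          pvm_exit();
          return 0;
      }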

  4. An efficient highly parallel implementation of a large air pollution model on an IBM blue gene supercomputer

    Science.gov (United States)

    Ostromsky, Tz.; Georgiev, K.; Zlatev, Z.

    2012-10-01

    In this paper we discuss the efficient distributed-memory parallelization strategy of the Unified Danish Eulerian Model (UNI-DEM). We apply an improved decomposition strategy to the spatial domain in order to get more parallel tasks (based on the larger number of subdomains) with less communications between them (due to optimization of the overlapping area when the advection-diffusion problem is solved numerically). This kind of rectangular block partitioning (with a square-shape trend) allows us not only to increase significantly the number of potential parallel tasks, but also to reduce the local memory requirements per task, which is critical for the distributed-memory implementation of the higher-resolution/finer-grid versions of UNI-DEM on some parallel systems, and particularly on the IBM BlueGene/P platform - our target hardware. We will show by experiments that our new parallel implementation can use rather efficiently the resources of the powerful IBM BlueGene/P supercomputer, the largest in Bulgaria, up to its full capacity. It turned out to be extremely useful in the large and computationally expensive numerical experiments, carried out to calculate some initial data for sensitivity analysis of the Danish Eulerian model.

  5. Time Parallel Solution of Linear Partial Differential Equations on the Intel Touchstone Delta Supercomputer

    Science.gov (United States)

    Toomarian, N.; Fijany, A.; Barhen, J.

    1993-01-01

    Evolutionary partial differential equations are usually solved by discretization in time and space, and by applying a marching-in-time procedure to data and algorithms potentially parallelized in the spatial domain.

  6. Parallel Supercomputing PC Cluster and Some Physical Results in Lattice QCD

    Institute of Scientific and Technical Information of China (English)

    LUO Xiang-Qian; MEI Zhong-Hao; Eric B. Gregory; YANG Jie-Chao; WANG Yu-Li; LIN Yin

    2003-01-01

    We describe the construction of a high performance parallel computer composed of PC components, and present some physical results for light hadron and hybrid meson masses from lattice QCD. We also show that the smearing technique is very useful for improving the spectrum calculations.

  7. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    Directory of Open Access Journals (Sweden)

    Mark James Abraham

    2015-09-01

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  8. Advancements and performance of iterative methods in industrial applications codes on CRAY parallel/vector supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Poole, G.; Heroux, M. [Engineering Applications Group, Eagan, MN (United States)]

    1994-12-31

    This paper will focus on recent work in two widely used industrial applications codes with iterative methods. The ANSYS program, a general purpose finite element code widely used in structural analysis applications, has now added an iterative solver option. Some results are given from real applications comparing performance with the traditional parallel/vector frontal solver used in ANSYS. Discussion of the applicability of iterative solvers as a general purpose solver will include the topics of robustness, as well as memory requirements and CPU performance. The FIDAP program is a widely used CFD code which uses iterative solvers routinely. A brief description of preconditioners used and some performance enhancements for CRAY parallel/vector systems is given. The solution of large-scale applications in structures and CFD includes examples from industry problems solved on CRAY systems.
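
    To make the discussion of iterative solvers concrete, the self-contained sketch below implements a Jacobi-preconditioned conjugate gradient iteration on a tiny dense symmetric positive definite system; it illustrates the class of kernel discussed, not the ANSYS or FIDAP implementations.

      /* Jacobi-preconditioned conjugate gradient on a small dense SPD system. */
      #include <stdio.h>
      #include <math.h>

      #define N 3

      static void matvec(const double A[N][N], const double x[N], double y[N])
      {
          for (int i = 0; i < N; ++i) {
              y[i] = 0.0;
              for (int j = 0; j < N; ++j) y[i] += A[i][j] * x[j];
          }
      }

      static double dot(const double a[N], const double b[N])
      {
          double s = 0.0;
          for (int i = 0; i < N; ++i) s += a[i] * b[i];
          return s;
      }

      int main(void)
      {
          double A[N][N] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};   /* SPD test matrix */
          double b[N] = {1, 2, 3}, x[N] = {0, 0, 0};
          double r[N], z[N], p[N], Ap[N];

          matvec(A, x, Ap);
          for (int i = 0; i < N; ++i) r[i] = b[i] - Ap[i];      /* initial residual */
          for (int i = 0; i < N; ++i) z[i] = r[i] / A[i][i];    /* Jacobi preconditioner */
          for (int i = 0; i < N; ++i) p[i] = z[i];
          double rz = dot(r, z);

          for (int it = 0; it < 100 && sqrt(dot(r, r)) > 1e-12; ++it) {
              matvec(A, p, Ap);
              double alpha = rz / dot(p, Ap);
              for (int i = 0; i < N; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
              for (int i = 0; i < N; ++i) z[i] = r[i] / A[i][i];
              double rz_new = dot(r, z);
              double beta = rz_new / rz;
              rz = rz_new;
              for (int i = 0; i < N; ++i) p[i] = z[i] + beta * p[i];
          }
          printf("x = (%g, %g, %g)\n", x[0], x[1], x[2]);
          return 0;
      }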

  9. Modeling cardiovascular hemodynamics using the lattice Boltzmann method on massively parallel supercomputers

    Science.gov (United States)

    Randles, Amanda Elizabeth

    the modeling of fluids in vessels with smaller diameters and a method for introducing the deformational forces exerted on the arterial flows from the movement of the heart by borrowing concepts from cosmodynamics are presented. These additional forces have a great impact on the endothelial shear stress. Third, the fluid model is extended to not only recover Navier-Stokes hydrodynamics, but also a wider range of Knudsen numbers, which is especially important in micro- and nano-scale flows. The tradeoffs of many optimization methods such as the use of deep halo level ghost cells that, alongside hybrid programming models, reduce the impact of such higher-order models and enable efficient modeling of extreme regimes of computational fluid dynamics are discussed. Fourth, the extension of these models to other research questions like clogging in microfluidic devices and determining the severity of coarctation of the aorta is presented. Through this work, a validation of these methods by taking real patient data and the measured pressure value before the narrowing of the aorta and predicting the pressure drop across the coarctation is shown. Comparison with the measured pressure drop in vivo highlights the accuracy and potential impact of such patient-specific simulations. Finally, a method to enable the simulation of longer trajectories in time by discretizing both spatially and temporally is presented. In this method, a serial coarse iterator is used to initialize data at discrete time steps for a fine model that runs in parallel. This coarse solver is based on a larger time step and typically a coarser discretization in space. Iterative refinement enables the compute-intensive fine iterator to be modeled with temporal parallelization. The algorithm consists of a series of predictor-corrector iterations completing when the results have converged within a certain tolerance. Combined, these developments allow large fluid models to be simulated for longer time durations
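
    The dissertation is built on the lattice Boltzmann method; as a generic reference point (not code from the thesis), a single D2Q9 BGK collision-and-streaming step on a small periodic grid looks roughly as follows.

      /* Minimal single-step D2Q9 BGK lattice Boltzmann sketch on a small
       * periodic grid; illustrates the collision/streaming structure only. */
      #include <stdio.h>

      #define NX 8
      #define NY 8
      #define Q  9

      static const int    cx[Q] = {0, 1, 0,-1, 0, 1,-1,-1, 1};
      static const int    cy[Q] = {0, 0, 1, 0,-1, 1, 1,-1,-1};
      static const double w[Q]  = {4.0/9, 1.0/9, 1.0/9, 1.0/9, 1.0/9,
                                   1.0/36, 1.0/36, 1.0/36, 1.0/36};

      static double f[NX][NY][Q], ftmp[NX][NY][Q];

      int main(void)
      {
          const double tau = 0.8;                       /* BGK relaxation time */

          /* Initialize at rest with unit density. */
          for (int x = 0; x < NX; ++x)
              for (int y = 0; y < NY; ++y)
                  for (int q = 0; q < Q; ++q) f[x][y][q] = w[q];

          /* One collision + streaming step. */
          for (int x = 0; x < NX; ++x)
              for (int y = 0; y < NY; ++y) {
                  double rho = 0, ux = 0, uy = 0;
                  for (int q = 0; q < Q; ++q) {         /* macroscopic moments */
                      rho += f[x][y][q];
                      ux  += cx[q] * f[x][y][q];
                      uy  += cy[q] * f[x][y][q];
                  }
                  ux /= rho; uy /= rho;
                  for (int q = 0; q < Q; ++q) {
                      double cu  = cx[q] * ux + cy[q] * uy;
                      double feq = w[q] * rho * (1 + 3*cu + 4.5*cu*cu - 1.5*(ux*ux + uy*uy));
                      double post = f[x][y][q] - (f[x][y][q] - feq) / tau;   /* collide */
                      int xn = (x + cx[q] + NX) % NX;                        /* stream */
                      int yn = (y + cy[q] + NY) % NY;
                      ftmp[xn][yn][q] = post;
                  }
              }

          printf("f[0][0][0] after one step: %g\n", ftmp[0][0][0]);
          return 0;
      }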

  10. Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2011-03-29

    The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used for data sharing among the cores that comprise a node and MPI can be used for the communication between nodes. In this paper, we use the SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we compare the performance of the hybrid SP and BT with their MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58%, on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
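
    The hybrid model described above, MPI between nodes and OpenMP within a node, follows the pattern sketched below; this is a generic illustration, not the NPB SP/BT source.

      /* Generic hybrid MPI/OpenMP sketch: OpenMP threads share a node's work,
       * MPI combines the per-node results. */
      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          int provided;
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          double local = 0.0;
          #pragma omp parallel for reduction(+:local)   /* threads within the node */
          for (int i = 0; i < 1000000; ++i)
              local += 1.0e-6;

          double global = 0.0;                          /* MPI across nodes */
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

          if (rank == 0)
              printf("global sum = %g (threads per rank: %d)\n",
                     global, omp_get_max_threads());

          MPI_Finalize();
          return 0;
      }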

  11. Supercomputational science

    CERN Document Server

    Wilson, S

    1990-01-01

    In contemporary research, the supercomputer now ranks, along with radio telescopes, particle accelerators and the other apparatus of "big science", as an expensive resource, which is nevertheless essential for state of the art research. Supercomputers are usually provided as shared central facilities. However, unlike telescopes and accelerators, they find a wide range of applications which extends across a broad spectrum of research activity. The difference in performance between a "good" and a "bad" computer program on a traditional serial computer may be a factor of two or three, but on a contemporary supercomputer it can easily be a factor of one hundred or even more! Furthermore, this factor is likely to increase with future generations of machines. In keeping with the large capital and recurrent costs of these machines, it is appropriate to devote effort to training and familiarization so that supercomputers are employed to best effect. This volume records the lectures delivered at a Summer School ...

  12. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-01-01

    This report discusses the following topics on supercomputer debugging: distributed debugging; user interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  13. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-12-31

    This report discusses the following topics on supercomputer debugging: distributed debugging; user interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  14. Energy sciences supercomputing 1990

    Energy Technology Data Exchange (ETDEWEB)

    Mirin, A.A.; Kaiper, G.V. (eds.)

    1990-01-01

    This report contains papers on the following topics: meeting the computational challenge; lattice gauge theory: probing the standard model; supercomputing for the superconducting super collider; an overview of ongoing studies in climate model diagnosis and intercomparison; MHD simulation of the fueling of a tokamak fusion reactor through the injection of compact toroids; gyrokinetic particle simulation of tokamak plasmas; analyzing chaos: a visual essay in nonlinear dynamics; supercomputing and research in theoretical chemistry; Monte Carlo simulations of light nuclei; parallel processing; and scientists of the future: learning by doing.

  15. Statistical correlations and risk analyses techniques for a diving dual phase bubble model and data bank using massively parallel supercomputers.

    Science.gov (United States)

    Wienke, B R; O'Leary, T R

    2008-05-01

    Linking model and data, we detail the LANL diving reduced gradient bubble model (RGBM), dynamical principles, and correlation with data in the LANL Data Bank. Table, profile, and meter risks are obtained from likelihood analysis and quoted for air, nitrox, helitrox no-decompression time limits, repetitive dive tables, and selected mixed gas and repetitive profiles. Application analyses include the EXPLORER decompression meter algorithm, NAUI tables, University of Wisconsin Seafood Diver tables, comparative NAUI, PADI, Oceanic NDLs and repetitive dives, comparative nitrogen and helium mixed gas risks, USS Perry deep rebreather (RB) exploration dive, world record open circuit (OC) dive, and Woodville Karst Plain Project (WKPP) extreme cave exploration profiles. The algorithm has seen extensive and utilitarian application in mixed gas diving, both in recreational and technical sectors, and forms the bases for released tables and decompression meters used by scientific, commercial, and research divers. The LANL Data Bank is described, and the methods used to deduce risk are detailed. Risk functions for dissolved gas and bubbles are summarized. Parameters that can be used to estimate profile risk are tallied. To fit data, a modified Levenberg-Marquardt routine is employed with L2 error norm. Appendices sketch the numerical methods, and list reports from field testing for (real) mixed gas diving. A Monte Carlo-like sampling scheme for fast numerical analysis of the data is also detailed, as a coupled variance reduction technique and additional check on the canonical approach to estimating diving risk. The method suggests alternatives to the canonical approach. This work represents a first time correlation effort linking a dynamical bubble model with deep stop data. Supercomputing resources are requisite to connect model and data in application.

  16. Scalable fine-grained parallelization of plane-wave-based ab initio molecular dynamics for large supercomputers.

    Science.gov (United States)

    Vadali, Ramkumar V; Shi, Yan; Kumar, Sameer; Kale, Laxmikant V; Tuckerman, Mark E; Martyna, Glenn J

    2004-12-01

    Many systems of great importance in material science, chemistry, solid-state physics, and biophysics require forces generated from an electronic structure calculation, as opposed to an empirically derived force law to describe their properties adequately. The use of such forces as input to Newton's equations of motion forms the basis of the ab initio molecular dynamics method, which is able to treat the dynamics of chemical bond-breaking and -forming events. However, a very large number of electronic structure calculations must be performed to compute an ab initio molecular dynamics trajectory, making the efficiency as well as the accuracy of the electronic structure representation critical issues. One efficient and accurate electronic structure method is the generalized gradient approximation to the Kohn-Sham density functional theory implemented using a plane-wave basis set and atomic pseudopotentials. The marriage of the gradient-corrected density functional approach with molecular dynamics, as pioneered by Car and Parrinello (R. Car and M. Parrinello, Phys Rev Lett 1985, 55, 2471), has been demonstrated to be capable of elucidating the atomic scale structure and dynamics underlying many complex systems at finite temperature. However, despite the relative efficiency of this approach, it has not been possible to obtain parallel scaling of the technique beyond several hundred processors on moderately sized systems using standard approaches. Consequently, the time scales that can be accessed and the degree of phase space sampling are severely limited. To take advantage of next generation computer platforms with thousands of processors such as IBM's BlueGene, a novel scalable parallelization strategy for Car-Parrinello molecular dynamics is developed using the concept of processor virtualization as embodied by the Charm++ parallel programming system. Charm++ allows the diverse elements of a Car-Parrinello molecular dynamics calculation to be interleaved with low

  17. Comparing Clusters and Supercomputers for Lattice QCD

    CERN Document Server

    Gottlieb, S

    2001-01-01

    Since the development of the Beowulf project to build a parallel computer from commodity PC components, there have been many such clusters built. The MILC QCD code has been run on a variety of clusters and supercomputers. Key design features are identified, and the cost effectiveness of clusters and supercomputers are compared.

  18. Low Cost Supercomputer for Applications in Physics

    Science.gov (United States)

    Ahmed, Maqsood; Ahmed, Rashid; Saeed, M. Alam; Rashid, Haris; Fazal-e-Aleem

    2007-02-01

    Using parallel processing technique and commodity hardware, Beowulf supercomputers can be built at a much lower cost. Research organizations and educational institutions are using this technique to build their own high performance clusters. In this paper we discuss the architecture and design of Beowulf supercomputer and our own experience of building BURRAQ cluster.

  19. Microprocessors: from desktops to supercomputers.

    Science.gov (United States)

    Baskett, F; Hennessy, J L

    1993-08-13

    Continuing improvements in integrated circuit technology and computer architecture have driven microprocessors to performance levels that rival those of supercomputers, at a fraction of the price. The use of sophisticated memory hierarchies enables microprocessor-based machines to have very large memories built from commodity dynamic random access memory while retaining the high bandwidth and low access time needed in a high-performance machine. Parallel processors composed of these high-performance microprocessors are becoming the supercomputing technology of choice for scientific and engineering applications. The challenges for these new supercomputers have been in developing multiprocessor architectures that are easy to program and that deliver high performance without extraordinary programming efforts by users. Recent progress in multiprocessor architecture has led to ways to meet these challenges.

  20. Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP v1.0) in a massively parallel supercomputing environment - a case study on JUQUEEN (IBM Blue Gene/Q)

    Science.gov (United States)

    Gasper, F.; Goergen, K.; Shrestha, P.; Sulis, M.; Rihani, J.; Geimer, M.; Kollet, S.

    2014-10-01

    Continental-scale hyper-resolution simulations constitute a grand challenge in characterizing nonlinear feedbacks of states and fluxes of the coupled water, energy, and biogeochemical cycles of terrestrial systems. Tackling this challenge requires advanced coupling and supercomputing technologies for earth system models that are discussed in this study, utilizing the example of the implementation of the newly developed Terrestrial Systems Modeling Platform (TerrSysMP v1.0) on JUQUEEN (IBM Blue Gene/Q) of the Jülich Supercomputing Centre, Germany. The applied coupling strategies rely on the Multiple Program Multiple Data (MPMD) paradigm using the OASIS suite of external couplers, and require memory and load balancing considerations in the exchange of the coupling fields between different component models and the allocation of computational resources, respectively. Using the advanced profiling and tracing tool Scalasca to determine an optimum load balancing leads to a 19% speedup. In massively parallel supercomputer environments, the coupler OASIS-MCT is recommended, which resolves memory limitations that may be significant in case of very large computational domains and exchange fields as they occur in these specific test cases and in many applications in terrestrial research. However, model I/O and initialization in the petascale range still require major attention, as they constitute true big data challenges in light of future exascale computing resources. Based on a factor-two speedup due to compiler optimizations, a refactored coupling interface using OASIS-MCT and an optimum load balancing, the problem size in a weak scaling study can be increased by a factor of 64 from 512 to 32 768 processes while maintaining parallel efficiencies above 80% for the component models.

  1. Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP) in a massively parallel supercomputing environment – a case study on JUQUEEN (IBM Blue Gene/Q)

    Directory of Open Access Journals (Sweden)

    F. Gasper

    2014-06-01

    Continental-scale hyper-resolution simulations constitute a grand challenge in characterizing non-linear feedbacks of states and fluxes of the coupled water, energy, and biogeochemical cycles of terrestrial systems. Tackling this challenge requires advanced coupling and supercomputing technologies for earth system models that are discussed in this study, utilizing the example of the implementation of the newly developed Terrestrial Systems Modeling Platform (TerrSysMP) on JUQUEEN (IBM Blue Gene/Q) of the Jülich Supercomputing Centre, Germany. The applied coupling strategies rely on the Multiple Program Multiple Data (MPMD) paradigm and require memory and load balancing considerations in the exchange of the coupling fields between different component models and allocation of computational resources, respectively. These considerations can be reached with advanced profiling and tracing tools leading to the efficient use of massively parallel computing environments, which is then mainly determined by the parallel performance of individual component models. However, the problem of model I/O and initialization in the peta-scale range requires major attention, because this constitutes a true big data challenge in the perspective of future exa-scale capabilities, which is unsolved.

  2. Implementation and scaling of the fully coupled Terrestrial Systems Modeling Platform (TerrSysMP) in a massively parallel supercomputing environment - a case study on JUQUEEN (IBM Blue Gene/Q)

    Science.gov (United States)

    Gasper, F.; Goergen, K.; Kollet, S.; Shrestha, P.; Sulis, M.; Rihani, J.; Geimer, M.

    2014-06-01

    Continental-scale hyper-resolution simulations constitute a grand challenge in characterizing non-linear feedbacks of states and fluxes of the coupled water, energy, and biogeochemical cycles of terrestrial systems. Tackling this challenge requires advanced coupling and supercomputing technologies for earth system models that are discussed in this study, utilizing the example of the implementation of the newly developed Terrestrial Systems Modeling Platform (TerrSysMP) on JUQUEEN (IBM Blue Gene/Q) of the Jülich Supercomputing Centre, Germany. The applied coupling strategies rely on the Multiple Program Multiple Data (MPMD) paradigm and require memory and load balancing considerations in the exchange of the coupling fields between different component models and allocation of computational resources, respectively. These considerations can be reached with advanced profiling and tracing tools leading to the efficient use of massively parallel computing environments, which is then mainly determined by the parallel performance of individual component models. However, the problem of model I/O and initialization in the peta-scale range requires major attention, because this constitutes a true big data challenge in the perspective of future exa-scale capabilities, which is unsolved.

  3. KAUST Supercomputing Laboratory

    KAUST Repository

    Bailey, April Renee

    2011-11-15

    KAUST has partnered with IBM to establish a Supercomputing Research Center. KAUST is hosting the Shaheen supercomputer, named after the Arabian falcon famed for its swiftness of flight. This 16-rack IBM Blue Gene/P system is equipped with 4 gigabytes of memory per node and capable of 222 teraflops, making the KAUST campus the site of one of the world’s fastest supercomputers in an academic environment. KAUST is targeting petaflop capability within 3 years.

  4. Emerging supercomputer architectures

    Energy Technology Data Exchange (ETDEWEB)

    Messina, P.C.

    1987-01-01

    This paper will examine the current and near future trends for commercially available high-performance computers with architectures that differ from the mainstream "supercomputer" systems in use for the last few years. These emerging supercomputer architectures are just beginning to have an impact on the field of high performance computing. 7 refs., 1 tab.

  5. GREEN SUPERCOMPUTING IN A DESKTOP BOX

    Energy Technology Data Exchange (ETDEWEB)

    HSU, CHUNG-HSING [Los Alamos National Laboratory; FENG, WU-CHUN [NON LANL; CHING, AVERY [NON LANL

    2007-01-17

    The computer workstation, introduced by Sun Microsystems in 1982, was the tool of choice for scientists and engineers as an interactive computing environment for the development of scientific codes. However, by the mid-1990s, the performance of workstations began to lag behind high-end commodity PCs. This, coupled with the disappearance of BSD-based operating systems in workstations and the emergence of Linux as an open-source operating system for PCs, arguably led to the demise of the workstation as we knew it. Around the same time, computational scientists started to leverage PCs running Linux to create a commodity-based (Beowulf) cluster that provided dedicated computer cycles, i.e., supercomputing for the rest of us, as a cost-effective alternative to large supercomputers, i.e., supercomputing for the few. However, as the cluster movement has matured, with respect to cluster hardware and open-source software, these clusters have become much more like their large-scale supercomputing brethren - a shared (and power-hungry) datacenter resource that must reside in a machine-cooled room in order to operate properly. Consequently, the above observations, when coupled with the ever-increasing performance gap between the PC and cluster supercomputer, provide the motivation for a 'green' desktop supercomputer - a turnkey solution that provides an interactive and parallel computing environment with the approximate form factor of a Sun SPARCstation 1 'pizza box' workstation. In this paper, the authors present the hardware and software architecture of such a solution as well as its prowess as a developmental platform for parallel codes. In short, imagine a 12-node personal desktop supercomputer that achieves 14 Gflops on Linpack but sips only 185 watts of power at load, resulting in a performance-power ratio that is over 300% better than the reference SMP platform.

  6. NSF Commits to Supercomputers.

    Science.gov (United States)

    Waldrop, M. Mitchell

    1985-01-01

    The National Science Foundation (NSF) has allocated at least $200 million over the next five years to support four new supercomputer centers. Issues and trends related to this NSF initiative are examined. (JN)

  7. A Hybrid Parallel Strategy Based on String Graph Theory to Improve De Novo DNA Assembly on the TianHe-2 Supercomputer.

    Science.gov (United States)

    Zhang, Feng; Liao, Xiangke; Peng, Shaoliang; Cui, Yingbo; Wang, Bingqiang; Zhu, Xiaoqian; Liu, Jie

    2016-06-01

    The de novo assembly of DNA sequences is increasingly important for biological research in the genomic era. More than one decade after the Human Genome Project, some challenges still exist and new solutions are being explored to improve de novo assembly of genomes. The string graph assembler (SGA), based on string graph theory, is a new method/tool developed to address these challenges. In this paper, based on an in-depth analysis of SGA, we prove that SGA-based sequence de novo assembly is an NP-complete problem. According to our analysis, SGA outperforms other similar methods/tools in memory consumption but costs much more time, of which 60-70% is spent on index construction. Based on this analysis, we introduce a hybrid parallel optimization algorithm and implement it in TianHe-2's parallel framework. Simulations are performed with different datasets. For small datasets the optimized solution is 3.06 times faster than before, and for medium-sized datasets 1.60 times faster. The results demonstrate a clear performance improvement with linear scalability for parallel FM-index construction, and thus contribute significantly to improving the efficiency of de novo assembly of DNA sequences.
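
    As background for the index-construction bottleneck described above, the following is a minimal, single-process sketch of an FM-index with backward search, the data structure SGA builds on. It uses a naive suffix sort in plain Python and is not the hybrid parallel TianHe-2 implementation; the example text and pattern are arbitrary.

      # Toy FM-index: naive suffix sort, BWT, and backward search for counting
      # pattern occurrences. Illustrative only; real assemblers build this index
      # with far more efficient (and, in the paper above, parallel) algorithms.
      def build_fm_index(text):
          text += "$"                                   # unique terminator
          sa = sorted(range(len(text)), key=lambda i: text[i:])
          bwt = "".join(text[i - 1] for i in sa)        # Burrows-Wheeler transform
          alphabet = sorted(set(text))
          c_table, running = {}, 0                      # C[c]: # of chars smaller than c
          for ch in alphabet:
              c_table[ch] = running
              running += text.count(ch)
          occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
          for i, ch in enumerate(bwt):                  # occ[c][i]: # of c in bwt[:i]
              for a in alphabet:
                  occ[a][i + 1] = occ[a][i] + (1 if a == ch else 0)
          return c_table, occ, len(bwt)

      def count_occurrences(pattern, c_table, occ, n):
          lo, hi = 0, n                                 # current suffix-array interval
          for ch in reversed(pattern):                  # backward search
              if ch not in c_table:
                  return 0
              lo = c_table[ch] + occ[ch][lo]
              hi = c_table[ch] + occ[ch][hi]
              if lo >= hi:
                  return 0
          return hi - lo

      index = build_fm_index("GATTACAGATTACA")
      print(count_occurrences("ATTA", *index))          # -> 2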

  8. Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Bohm, E. [Thomas M. Siebel Center, Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States). Dept of Computer Science; Bhatele, A. [Thomas M. Siebel Center, Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States). Dept of Computer Science; Kale, L. V. [Thomas M. Siebel Center, Univ. of Illinois at Urbana-Champaign, Urbana, IL (United States). Dept of Computer Science; Tuckerman, M. E. [New York Univ., NY (United States). Dept. of Chemistry and Courant Institute of Mathematical Sciences; Kumar, S. [IBM T. J. Watson Research Center, Yorktown Heights, NY (United States). IBM Research Division; Gunnels, J. A. [IBM T. J. Watson Research Center, Yorktown Heights, NY (United States). IBM Research Division; Martyna, G. J. [IBM T. J. Watson Research Center, Yorktown Heights, NY (United States). IBM Research Division

    2008-01-01

    Important scientific problems can be treated via ab initio-based molecular modeling approaches, wherein atomic forces are derived from an energy function that explicitly considers the electrons. The Car-Parrinello ab initio molecular dynamics (CPAIMD) method is widely used to study small systems containing on the order of 10 to 10^3 atoms. However, the impact of CPAIMD has been limited until recently because of difficulties inherent to scaling the technique beyond processor numbers about equal to the number of electronic states. CPAIMD computations involve a large number of interdependent phases with high interprocessor communication overhead. These phases require the evaluation of various transforms and non-square matrix multiplications that require large interprocessor data movement when efficiently parallelized. Using the Charm++ parallel programming language and runtime system, the phases are discretized into a large number of virtual processors, which are, in turn, mapped flexibly onto physical processors, thereby allowing interleaving of work. Algorithmic and IBM Blue Gene/L(tm) system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states in small systems consisting of 24 to 768 atoms (32 to 1,024 electronic states) in order to demonstrate fine-grained parallelism. The largest systems studied scaled well across the entire machine (20,480 nodes).
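
    The over-decomposition idea described above, in which work is split into many more virtual processors than physical processors so that phases can be interleaved, can be illustrated schematically. This is not Charm++ or CPAIMD code; it is a plain-Python stand-in where sleeps model communication latency and a small arithmetic loop models local compute, and the chunk counts and timings are invented.

      # Schematic over-decomposition: 32 "virtual processors" mapped onto 4 workers,
      # so latency in one chunk overlaps computation in others.
      from concurrent.futures import ThreadPoolExecutor
      import time

      PHYSICAL_WORKERS = 4
      VIRTUAL_PROCESSORS = 32

      def chunk(work_id):
          time.sleep(0.02)                          # stand-in for communication latency
          return sum(i * i for i in range(10_000))  # stand-in for local compute

      start = time.perf_counter()
      with ThreadPoolExecutor(max_workers=PHYSICAL_WORKERS) as pool:
          results = list(pool.map(chunk, range(VIRTUAL_PROCESSORS)))
      print(f"{len(results)} chunks finished in {time.perf_counter() - start:.2f}s")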

  9. Confronting Reality

    Institute of Scientific and Technical Information of China (English)

    2005-01-01

    Confronting Reality will change the way you think about and run your business. It is the first book that shows how to connect the big picture of the new era of business with the nitty-gritty of what to do about it. Through a completely new way to understand and use the business model as the primary tool for confronting reality, a breakthrough that will become the management innovation of this decade, you'll know sooner rather than later whether your fundamental business premise is under assault and where your best opportunities lie.

  10. Supercomputers to transform Science

    CERN Multimedia

    2006-01-01

    "New insights into the structure of space and time, climate modeling, and the design of novel drugs, are but a few of the many research areas that will be transforned by the installation of three supercomputers at the Unversity of Bristol." (1/2 page)

  11. Petaflop supercomputers of China

    Institute of Scientific and Technical Information of China (English)

    Guoliang CHEN

    2010-01-01

    After ten years of development, high performance computing (HPC) in China has made remarkable progress. In November, 2010, the NUDT Tianhe-1A and the Dawning Nebulae respectively claimed the 1st and 3rd places in the Top500 Supercomputers List; this recognizes internationally the level that China has achieved in high performance computer manufacturing.

  12. Confronting Stereotypes

    Science.gov (United States)

    Buswell, Carol

    2011-01-01

    People confront stereotypes every day, both in and out of the classroom. Some ideas have been carried in the collective memory and classroom textbooks for so long they are generally recognized as fact. Many are constantly being reinforced by personal experiences, family discussions, and Hollywood productions as well. The distinct advantage to…

  13. Confronting Stereotypes

    Science.gov (United States)

    Buswell, Carol

    2011-01-01

    People confront stereotypes every day, both in and out of the classroom. Some ideas have been carried in the collective memory and classroom textbooks for so long they are generally recognized as fact. Many are constantly being reinforced by personal experiences, family discussions, and Hollywood productions as well. The distinct advantage to…

  14. Introduction to Reconfigurable Supercomputing

    CERN Document Server

    Lanzagorta, Marco; Rosenberg, Robert

    2010-01-01

    This book covers technologies, applications, tools, languages, procedures, advantages, and disadvantages of reconfigurable supercomputing using Field Programmable Gate Arrays (FPGAs). The target audience is the community of users of High Performance Computers (HPC) who may benefit from porting their applications into a reconfigurable environment. As such, this book is intended to guide the HPC user through the many algorithmic considerations, hardware alternatives, usability issues, programming languages, and design tools that need to be understood before embarking on the creation of reconfigur

  15. Enabling department-scale supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Greenberg, D.S.; Hart, W.E.; Phillips, C.A.

    1997-11-01

    The Department of Energy (DOE) national laboratories have one of the longest and most consistent histories of supercomputer use. The authors summarize the architecture of DOE's new supercomputers that are being built for the Accelerated Strategic Computing Initiative (ASCI). The authors then argue that in the near future scaled-down versions of these supercomputers with petaflop-per-weekend capabilities could become widely available to hundreds of research and engineering departments. The availability of such computational resources will allow simulation of physical phenomena to become a full-fledged third branch of scientific exploration, along with theory and experimentation. They describe the ASCI and other supercomputer applications at Sandia National Laboratories, and discuss which lessons learned from Sandia's long history of supercomputing can be applied in this new setting.

  16. Simulating Galactic Winds on Supercomputers

    Science.gov (United States)

    Schneider, Evan

    2017-01-01

    Galactic winds are a ubiquitous feature of rapidly star-forming galaxies. Observations of nearby galaxies have shown that winds are complex, multiphase phenomena, comprised of outflowing gas at a large range of densities, temperatures, and velocities. Describing how starburst-driven outflows originate, evolve, and affect the circumgalactic medium and gas supply of galaxies is an important challenge for theories of galaxy evolution. In this talk, I will discuss how we are using a new hydrodynamics code, Cholla, to improve our understanding of galactic winds. Cholla is a massively parallel, GPU-based code that takes advantage of specialized hardware on the newest generation of supercomputers. With Cholla, we can perform large, three-dimensional simulations of multiphase outflows, allowing us to track the coupling of mass and momentum between gas phases across hundreds of parsecs at sub-parsec resolution. The results of our recent simulations demonstrate that the evolution of cool gas in galactic winds is highly dependent on the initial structure of embedded clouds. In particular, we find that turbulent density structures lead to more efficient mass transfer from cool to hot phases of the wind. I will discuss the implications of our results both for the incorporation of winds into cosmological simulations, and for interpretations of observed multiphase winds and the circumgalactic medium of nearby galaxies.

  17. Supercomputing '91; Proceedings of the 4th Annual Conference on High Performance Computing, Albuquerque, NM, Nov. 18-22, 1991

    Science.gov (United States)

    1991-01-01

    Various papers on supercomputing are presented. The general topics addressed include: program analysis/data dependence, memory access, distributed memory code generation, numerical algorithms, supercomputer benchmarks, latency tolerance, parallel programming, applications, processor design, networks, performance tools, mapping and scheduling, characterization affecting performance, parallelism packaging, computing climate change, combinatorial algorithms, hardware and software performance issues, system issues. (No individual items are abstracted in this volume)

  18. Supercomputer debugging workshop `92

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.S.

    1993-02-01

    This report contains papers or viewgraphs on the following topics: The ABCs of Debugging in the 1990s; Cray Computer Corporation; Thinking Machines Corporation; Cray Research, Incorporated; Sun Microsystems, Inc; Kendall Square Research; The Effects of Register Allocation and Instruction Scheduling on Symbolic Debugging; Debugging Optimized Code: Currency Determination with Data Flow; A Debugging Tool for Parallel and Distributed Programs; Analyzing Traces of Parallel Programs Containing Semaphore Synchronization; Compile-time Support for Efficient Data Race Detection in Shared-Memory Parallel Programs; Direct Manipulation Techniques for Parallel Debuggers; Transparent Observation of XENOOPS Objects; A Parallel Software Monitor for Debugging and Performance Tools on Distributed Memory Multicomputers; Profiling Performance of Inter-Processor Communications in an iWarp Torus; The Application of Code Instrumentation Technology in the Los Alamos Debugger; and CXdb: The Road to Remote Debugging.

  19. Computational Dimensionalities of Global Supercomputing

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2013-12-01

    Full Text Available This Invited Paper pertains to the subject of my Plenary Keynote Speech at the 17th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2013), held in Orlando, Florida on July 9-12, 2013. The title of my Plenary Keynote Speech was "Dimensionalities of Computation: from Global Supercomputing to Data, Text and Web Mining", but this Invited Paper will focus only on the "Computational Dimensionalities of Global Supercomputing" and is based upon a summary of the contents of several individual articles that have been previously written with myself as lead author and published in [75], [76], [77], [78], [79], [80] and [11]. The topics of the Plenary Speech included Overview of Current Research in Global Supercomputing [75], Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing [76], Data Mining Supercomputing with SAS™ JMP® Genomics ([77], [79], [80]), and Visualization by Supercomputing Data Mining [81]. ______________________ [11.] Committee on the Future of Supercomputing, National Research Council (2003), The Future of Supercomputing: An Interim Report, ISBN-13: 978-0-309-09016-2, http://www.nap.edu/catalog/10784.html [75.] Segall, Richard S.; Zhang, Qingyu and Cook, Jeffrey S. (2013), "Overview of Current Research in Global Supercomputing", Proceedings of Forty-Fourth Meeting of Southwest Decision Sciences Institute (SWDSI), Albuquerque, NM, March 12-16, 2013. [76.] Segall, Richard S. and Zhang, Qingyu (2010), "Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing", Proceedings of 5th INFORMS Workshop on Data Mining and Health Informatics, Austin, TX, November 6, 2010. [77.] Segall, Richard S., Zhang, Qingyu and Pierce, Ryan M. (2010), "Data Mining Supercomputing with SAS™ JMP® Genomics: Research-in-Progress", Proceedings of 2010 Conference on Applied Research in Information Technology, sponsored by

  20. World's fastest supercomputer opens up to users

    Science.gov (United States)

    Xin, Ling

    2016-08-01

    China's latest supercomputer - Sunway TaihuLight - has claimed the crown as the world's fastest computer according to the latest TOP500 list, released at the International Supercomputer Conference in Frankfurt in late June.

  1. Improved Access to Supercomputers Boosts Chemical Applications.

    Science.gov (United States)

    Borman, Stu

    1989-01-01

    Supercomputing is described in terms of computing power and abilities. The increase in availability of supercomputers for use in chemical calculations and modeling is reported. Efforts of the National Science Foundation and Cray Research are highlighted. (CW)

  2. The GF11 supercomputer

    Science.gov (United States)

    Beetem, J.; Denneau, M.; Weingarten, D.

    1987-01-01

    GF11 is a parallel computer currently under construction at the IBM Yorktown Research Center. The machine incorporates 576 floating-point processors arranged in a modified SIMD architecture. Each has space for 2 Mbytes of memory and is capable of 20 Mflops, giving the total machine 1.125 Gbytes of memory and a peak of 11.52 Gflops. The floating-point processors are interconnected by a dynamically reconfigurable non-blocking switching network. At each machine cycle any of 1024 pre-selected permutations of data can be realized among the processors. The main intended application of GF11 is a class of calculations arising from quantum chromodynamics.
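
    A quick arithmetic check of the aggregate figures quoted above, assuming decimal Mflops/Gflops and binary Mbytes/Gbytes:

      processors = 576
      print(processors * 20 / 1000)   # 11.52  -> Gflops peak
      print(processors * 2 / 1024)    # 1.125  -> Gbytes of total memory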

  3. Desktop supercomputers. Advance medical imaging.

    Science.gov (United States)

    Frisiello, R S

    1991-02-01

    Medical imaging tools that radiologists as well as a wide range of clinicians and healthcare professionals have come to depend upon are emerging into the next phase of functionality. The strides being made in supercomputing technologies--including reduction of size and price--are pushing medical imaging to a new level of accuracy and functionality.

  4. The PMS project Poor Man's Supercomputer

    CERN Document Server

    Csikor, Ferenc; Hegedüs, P; Horváth, V K; Katz, S D; Piróth, A

    2001-01-01

    We briefly describe the Poor Man's Supercomputer (PMS) project that is carried out at Eotvos University, Budapest. The goal is to develop a cost effective, scalable, fast parallel computer to perform numerical calculations of physical problems that can be implemented on a lattice with nearest neighbour interactions. To reach this goal we developed the PMS architecture using PC components and designed a special, low cost communication hardware and the driver software for Linux OS. Our first implementation of the PMS includes 32 nodes (PMS1). The performance of the PMS1 was tested by Lattice Gauge Theory simulations. Using SU(3) pure gauge theory or bosonic MSSM on the PMS1 computer we obtained a price-per-sustained-performance ratio of $3/Mflops. The design of the special hardware and the communication driver are freely available upon request for non-profit organizations.

  5. The BlueGene/L Supercomputer

    CERN Document Server

    Bhanot, G V; Gara, A; Vranas, P M; Bhanot, Gyan; Chen, Dong; Gara, Alan; Vranas, Pavlos

    2002-01-01

    The architecture of the BlueGene/L massively parallel supercomputer is described. Each computing node consists of a single compute ASIC plus 256 MB of external memory. The compute ASIC integrates two 700 MHz PowerPC 440 integer CPU cores, two 2.8 Gflops floating point units, 4 MB of embedded DRAM as cache, a memory controller for external memory, six 1.4 Gbit/s bi-directional ports for a 3-dimensional torus network connection, three 2.8 Gbit/s bi-directional ports for connecting to a global tree network and a Gigabit Ethernet for I/O. 65,536 of such nodes are connected into a 3-d torus with a geometry of 32x32x64. The total peak performance of the system is 360 Teraflops and the total amount of memory is 16 TeraBytes.

  6. An assessment of worldwide supercomputer usage

    Energy Technology Data Exchange (ETDEWEB)

    Wasserman, H.J.; Simmons, M.L.; Hayes, A.H.

    1995-01-01

    This report provides a comparative study of advanced supercomputing usage in Japan and the United States as of Spring 1994. It is based on the findings of a group of US scientists whose careers have centered on programming, evaluating, and designing high-performance supercomputers for over ten years. The report is a follow-on to an assessment of supercomputing technology in Europe and Japan that was published in 1993. Whereas the previous study focused on supercomputer manufacturing capabilities, the primary focus of the current work was to compare where and how supercomputers are used. Research for this report was conducted through both literature studies and field research in Japan.

  7. High Performance Networks From Supercomputing to Cloud Computing

    CERN Document Server

    Abts, Dennis

    2011-01-01

    Datacenter networks provide the communication substrate for large parallel computer systems that form the ecosystem for high performance computing (HPC) systems and modern Internet applications. The design of new datacenter networks is motivated by an array of applications ranging from communication intensive climatology, complex material simulations and molecular dynamics to such Internet applications as Web search, language translation, collaborative Internet applications, streaming video and voice-over-IP. For both Supercomputing and Cloud Computing the network enables distributed applicati

  8. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  9. INTEGRATION OF PANDA WORKLOAD MANAGEMENT SYSTEM WITH SUPERCOMPUTERS

    Energy Technology Data Exchange (ETDEWEB)

    De, K [University of Texas at Arlington; Jha, S [Rutgers University; Maeno, T [Brookhaven National Laboratory (BNL); Mashinistov, R. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Nilsson, P [Brookhaven National Laboratory (BNL); Novikov, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Oleynik, D [University of Texas at Arlington; Panitkin, S [Brookhaven National Laboratory (BNL); Poyda, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Ryabinkin, E. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Teslyuk, A. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Tsulaia, V. [Lawrence Berkeley National Laboratory (LBNL); Velikhov, V. [Russian Research Center, Kurchatov Institute, Moscow, Russia; Wen, G. [University of Wisconsin, Madison; Wells, Jack C [ORNL; Wenaus, T [Brookhaven National Laboratory (BNL)

    2016-01-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of

  10. Integration of Panda Workload Management System with supercomputers

    Science.gov (United States)

    De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.

    2016-09-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads
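
    The light-weight MPI wrapper approach mentioned in both records above can be sketched as follows. This is an illustration in Python with mpi4py, not the actual PanDA pilot code, and the payload executable, its options, and the file names are hypothetical placeholders: each rank launches one single-threaded payload with rank-dependent arguments, so independent copies of the same workload fill a multi-core worker node inside a single batch allocation.

      # Hypothetical light-weight MPI wrapper: one single-threaded payload per rank.
      import subprocess
      import sys
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()

      payload = ["./run_simulation",            # hypothetical single-threaded executable
                 "--seed", str(1000 + rank),    # rank-dependent random seed
                 "--output", f"events_{rank:05d}.root"]

      result = subprocess.run(payload, capture_output=True, text=True)
      with open(f"wrapper_{rank:05d}.log", "w") as log:
          log.write(result.stdout)
          log.write(result.stderr)

      # The batch job succeeds only if every payload succeeded.
      codes = comm.gather(result.returncode, root=0)
      if rank == 0:
          sys.exit(max(codes))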

  11. Proceedings of the first energy research power supercomputer users symposium

    Energy Technology Data Exchange (ETDEWEB)

    1991-01-01

    The Energy Research Power Supercomputer Users Symposium was arranged to showcase the richness of science that has been pursued and accomplished in this program through the use of supercomputers and now high performance parallel computers over the last year: this report is the collection of the presentations given at the Symposium. 'Power users' were invited by the ER Supercomputer Access Committee to show that the use of these computational tools and the associated data communications network, ESNet, go beyond merely speeding up computations. Today the work often directly contributes to the advancement of the conceptual developments in their fields and the computational and network resources form the very infrastructure of today's science. The Symposium also provided an opportunity, which is rare in this day of network access to computing resources, for the invited users to compare and discuss their techniques and approaches with those used in other ER disciplines. The significance of new parallel architectures was highlighted by the interesting evening talk given by Dr. Stephen Orszag of Princeton University.

  12. TOP500 Supercomputers for June 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-06-23

    23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.

  13. TACTICS OF CONFRONTATION

    Directory of Open Access Journals (Sweden)

    Golovin M. V.

    2015-12-01

    Full Text Available As part of the investigative actions carried out in the course of the investigation of crimes, the confrontation is very important. This article reveals the essence of conducting a confrontation, the object and purpose of which is to establish the truth in the case. The investigator, in accordance with Article 192 of the Code of Criminal Procedure of the Russian Federation, has the right to decide to conduct a confrontation in cases where there are significant contradictions in previously given testimony. In conducting a confrontation, correct versions are confirmed and the versions denied by others are refuted, the real facts of the case are established, and significant contradictions in the testimony of previously interrogated persons are eliminated. In making the decision to conduct a confrontation, the investigator must be confident in the ability of the participant who gave truthful testimony to withstand the psychological pressure; this party should be prepared so as to create "immunity" against future attempts by the other party to persuade him to change his testimony. Before conducting a confrontation, the investigator must draw up a plan in which the questions are formulated, then decide the order in which the participants of the confrontation will be questioned and identify the tactics that can be applied in the course of the confrontation. The investigator also prepares a space for the confrontation, as well as audio, photo and video equipment. Violations and errors during the confrontation ultimately affect the overall result of the preliminary investigation of a specific criminal case. In this regard, clarification of the nature of the confrontation has not only theoretical but also practical importance

  14. Extending ATLAS Computing to Commercial Clouds and Supercomputers

    CERN Document Server

    Nilsson, P; The ATLAS collaboration; Filipcic, A; Klimentov, A; Maeno, T; Oleynik, D; Panitkin, S; Wenaus, T; Wu, W

    2014-01-01

    The Large Hadron Collider will resume data collection in 2015 with substantially increased computing requirements relative to its first 2009-2013 run. A near doubling of the energy and the data rate, high level of event pile-up, and detector upgrades will mean the number and complexity of events to be analyzed will increase dramatically. A naive extrapolation of the Run 1 experience would suggest that a 5-6 fold increase in computing resources are needed - impossible within the anticipated flat computing budgets in the near future. Consequently ATLAS is engaged in an ambitious program to expand its computing to all available resources, notably including opportunistic use of commercial clouds and supercomputers. Such resources present new challenges in managing heterogeneity, supporting data flows, parallelizing workflows, provisioning software, and other aspects of distributed computing, all while minimizing operational load. We will present the ATLAS experience to date with clouds and supercomputers, and des...

  15. INTEL: Intel based systems move up in supercomputing ranks

    CERN Multimedia

    2002-01-01

    "The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).

  16. High Performance Distributed Computing in a Supercomputer Environment: Computational Services and Applications Issues

    Science.gov (United States)

    Kramer, Williams T. C.; Simon, Horst D.

    1994-01-01

    This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative both to the more conventional supercomputers based on a small number of powerful vector processors and to massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.

  17. Porting Ordinary Applications to Blue Gene/Q Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy; Katz, Daniel S.; Binkowski, T. Andrew; Zhong, Xiaoliang; Heinonen, Olle; Karpeyev, Dmitry; Wilde, Michael

    2015-08-31

    Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques: sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.

  18. Optimizing Linpack Benchmark on GPU-Accelerated Petascale Supercomputer

    Institute of Scientific and Technical Information of China (English)

    Feng Wang; Can-Qun Yang; Yun-Fei Du; Juan Chen; Hui-Zhan Yi; Wei-Xia Xu

    2011-01-01

    In this paper we present the programming of the Linpack benchmark on the TianHe-1 system, the first petascale supercomputer system of China and the largest GPU-accelerated heterogeneous system ever attempted. A hybrid programming model consisting of MPI, OpenMP and streaming computing is described to exploit the task-level, thread-level and data-level parallelism of Linpack. We explain how we optimized the load distribution across the CPUs and GPUs using a two-level adaptive method and describe the implementation in detail. To overcome the low bandwidth of CPU-GPU communication, we present a software pipelining technique to hide the communication overhead. Combined with other traditional optimizations, the Linpack we developed achieved 196.7 GFLOPS on a single compute element of TianHe-1. This result is 70.1% of the peak compute capability and 3.3 times faster than the result obtained using the vendor's library. On the full configuration of TianHe-1 our optimizations resulted in a Linpack performance of 0.563 PFLOPS, which made TianHe-1 the 5th fastest supercomputer on the Top500 list in November 2009.
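
    The software-pipelining idea mentioned above, overlapping the transfer of the next data block with computation on the current one so that the communication cost is hidden, can be sketched schematically. The Python sketch below uses a background thread and a one-slot queue as stand-ins for asynchronous transfers; it is not the TianHe-1 CUDA/MPI implementation, and the timings are arbitrary.

      # Schematic software pipelining: prefetch block i+1 while computing on block i.
      import threading
      import queue
      import time

      def transfer(block_id):
          time.sleep(0.01)                  # stand-in for a host-to-device copy
          return f"block-{block_id}"

      def compute(block):
          time.sleep(0.03)                  # stand-in for the accelerated panel update
          return f"{block}-done"

      def pipeline(num_blocks):
          staged = queue.Queue(maxsize=1)   # at most one block "in flight"

          def producer():
              for b in range(num_blocks):
                  staged.put(transfer(b))   # prefetch the next block
              staged.put(None)              # sentinel: no more blocks

          threading.Thread(target=producer, daemon=True).start()
          results = []
          while (block := staged.get()) is not None:
              results.append(compute(block))   # overlaps the next transfer
          return results

      print(pipeline(4))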

  19. TOP500 Supercomputers for June 2005

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2005-06-22

    25th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/LLNL BlueGene/L and IBM gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 25th edition of the TOP500 list of the world's fastest supercomputers was released today (June 22, 2005) at the 20th International Supercomputing Conference (ISC2005) in Heidelberg, Germany.

  20. TOP500 Supercomputers for November 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-11-16

    22nd Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 22nd edition of the TOP500 list of the world's fastest supercomputers was released today (November 16, 2003). The Earth Simulator supercomputer retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s ("teraflops" or trillions of calculations per second). It was built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan.

  1. Supercomputer debugging workshop '92

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.S.

    1993-01-01

    This report contains papers or viewgraphs on the following topics: The ABCs of Debugging in the 1990s; Cray Computer Corporation; Thinking Machines Corporation; Cray Research, Incorporated; Sun Microsystems, Inc; Kendall Square Research; The Effects of Register Allocation and Instruction Scheduling on Symbolic Debugging; Debugging Optimized Code: Currency Determination with Data Flow; A Debugging Tool for Parallel and Distributed Programs; Analyzing Traces of Parallel Programs Containing Semaphore Synchronization; Compile-time Support for Efficient Data Race Detection in Shared-Memory Parallel Programs; Direct Manipulation Techniques for Parallel Debuggers; Transparent Observation of XENOOPS Objects; A Parallel Software Monitor for Debugging and Performance Tools on Distributed Memory Multicomputers; Profiling Performance of Inter-Processor Communications in an iWarp Torus; The Application of Code Instrumentation Technology in the Los Alamos Debugger; and CXdb: The Road to Remote Debugging.

  2. £16 million investment for 'virtual supercomputer'

    CERN Multimedia

    Holland, C

    2003-01-01

    "The Particle Physics and Astronomy Research Council is to spend 16million [pounds] to create a massive computing Grid, equivalent to the world's second largest supercomputer after Japan's Earth Simulator computer" (1/2 page)

  3. Supercomputers open window of opportunity for nursing.

    Science.gov (United States)

    Meintz, S L

    1993-01-01

    A window of opportunity was opened for nurse researchers with the High Performance Computing and Communications (HPCC) initiative in President Bush's 1992 fiscal-year budget. Nursing research moved into the high-performance computing environment through the University of Nevada Las Vegas/Cray Project for Nursing and Health Data Research (PNHDR). Using the CRAY YMP 2/216 supercomputer, the PNHDR established the validity of a supercomputer platform for nursing research. In addition, the research has identified a paradigm shift in statistical analysis, delineated actual and potential barriers to nursing research in a supercomputing environment, conceptualized a new branch of nursing science called Nurmetrics, and discovered new avenues for nursing research utilizing supercomputing tools.

  4. TOP500 Supercomputers for November 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-11-08

    24th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/IBM BlueGene/L and NASA/SGI's Columbia gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 24th edition of the TOP500 list of the world's fastest supercomputers was released today (November 8, 2004) at the SC2004 Conference in Pittsburgh, Pa.

  5. Misleading Performance Reporting in the Supercomputing Field

    Directory of Open Access Journals (Sweden)

    David H. Bailey

    1992-01-01

    Full Text Available In a previous humorous note, I outlined 12 ways in which performance figures for scientific supercomputers can be distorted. In this paper, the problem of potentially misleading performance reporting is discussed in detail. Included are some examples that have appeared in recent published scientific papers. This paper also includes some proposed guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion in the field of supercomputing.

  6. Confronting an Augmented Reality

    Science.gov (United States)

    Munnerley, Danny; Bacon, Matt; Wilson, Anna; Steele, James; Hedberg, John; Fitzgerald, Robert

    2012-01-01

    How can educators make use of augmented reality technologies and practices to enhance learning and why would we want to embrace such technologies anyway? How can an augmented reality help a learner confront, interpret and ultimately comprehend reality itself ? In this article, we seek to initiate a discussion that focuses on these questions, and…

  7. Confronting Islamophobia in Education

    Science.gov (United States)

    Ramarajan, Dhaya; Runell, Marcella

    2007-01-01

    The "Religion and Diversity Education" program of the Tanenbaum Center for Interreligious Understanding includes innovative training for elementary and high school educators on addressing religious pluralism in school. This paper highlights the program and its curricula which confront Islamophobia by teaching students concrete skills for living in…

  8. Confronting an Augmented Reality

    Science.gov (United States)

    Munnerley, Danny; Bacon, Matt; Wilson, Anna; Steele, James; Hedberg, John; Fitzgerald, Robert

    2012-01-01

    How can educators make use of augmented reality technologies and practices to enhance learning and why would we want to embrace such technologies anyway? How can an augmented reality help a learner confront, interpret and ultimately comprehend reality itself ? In this article, we seek to initiate a discussion that focuses on these questions, and…

  9. Confronting Ambiguity in Science

    Science.gov (United States)

    Emery, Katherine; Harlow, Danielle; Whitmer, Ali; Gaines, Steven

    2015-01-01

    People are regularly confronted with environmental and science-related issues presented to them in newspapers, on television, or even in their own doctor's office. Often the information they use to inform their decisions on matters of science may be ambiguous and contradictory. This article presents an activity that investigates how students deal…

  10. Existential Confrontation and Religiosity.

    Science.gov (United States)

    Watson, P. J.; And Others

    1988-01-01

    Tested the hypothesis that religious intrinsicness predicts failure to confront existential problems and that an interactional orientation promotes the opposite influence. Results from 94 male and 107 female college students failed to support the hypothesis that orthodox, intrinsic individuals are rigid in their approach to existential questions in life.…

  11. Storage-Intensive Supercomputing Benchmark Study

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, J; Dossa, D; Gokhale, M; Hysom, D; May, J; Pearce, R; Yoo, A

    2007-10-30

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared the performance of a software-only implementation to a GPU-accelerated one. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows
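
    As a toy illustration of the kind of per-call I/O profile such a tool collects (the real iotrace instruments Linux processes; this sketch only wraps a single Python file object, and the traced path is just an example), consider:

      # Record (timestamp, operation, offset, byte count) for each read/write call.
      import time

      class TracedFile:
          def __init__(self, path, mode="rb"):
              self._f = open(path, mode)
              self.events = []                       # (timestamp, op, offset, nbytes)

          def read(self, n=-1):
              offset = self._f.tell()
              data = self._f.read(n)
              self.events.append((time.time(), "read", offset, len(data)))
              return data

          def write(self, data):
              offset = self._f.tell()
              written = self._f.write(data)
              self.events.append((time.time(), "write", offset, written))
              return written

          def close(self):
              self._f.close()

      f = TracedFile("/etc/hostname")                # example path
      f.read()
      f.close()
      print(f.events)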

  12. Hetrogenous Parallel Computing

    OpenAIRE

    2013-01-01

    With processor core counts doubling every 18-24 months and penetrating all markets from high-end servers in supercomputers to desktops and laptops down to even mobile phones, we sit at the dawn of a world of ubiquitous parallelism, one where extracting performance via parallelism is paramount. That is, the "free lunch" to better performance, where programmers could rely on substantial increases in single-threaded performance to improve software, is over. The burden falls on developers to expl...

  13. A training program for scientific supercomputing users

    Energy Technology Data Exchange (ETDEWEB)

    Hanson, F.; Moher, T.; Sabelli, N.; Solem, A.

    1988-01-01

    There is need for a mechanism to transfer supercomputing technology into the hands of scientists and engineers in such a way that they will acquire a foundation of knowledge that will permit integration of supercomputing as a tool in their research. Most computing center training emphasizes computer-specific information about how to use a particular computer system; most academic programs teach concepts to computer scientists. Only a few brief courses and new programs are designed for computational scientists. This paper describes an eleven-week training program aimed principally at graduate and postdoctoral students in computationally-intensive fields. The program is designed to balance the specificity of computing center courses, the abstractness of computer science courses, and the personal contact of traditional apprentice approaches. It is based on the experience of computer scientists and computational scientists, and consists of seminars and clinics given by many visiting and local faculty. It covers a variety of supercomputing concepts, issues, and practices related to architecture, operating systems, software design, numerical considerations, code optimization, graphics, communications, and networks. Its research component encourages understanding of scientific computing and supercomputer hardware issues. Flexibility in thinking about computing needs is emphasized by the use of several different supercomputer architectures, such as the Cray X/MP48 at the National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign, IBM 3090 600E/VF at the Cornell National Supercomputer Facility, and Alliant FX/8 at the Advanced Computing Research Facility at Argonne National Laboratory. 11 refs., 6 tabs.

  14. Confronting Islamic Jihadist Movements

    Directory of Open Access Journals (Sweden)

    M. Afzal Upal

    2015-05-01

    Full Text Available This paper argues that in order to win the long-term fight against Islamic Jihadist movements, we must confront their ideological foundations and provide the majority of Muslims with an alternative narrative that satisfies their social identity need for positive esteem. By analysing the social identity dynamics of Western-Muslim interactions, this paper presents some novel ideas that can lead to the creation of such a narrative.

  15. JESPP: Joint Experimentation on Scalable Parallel Processors Supercomputers

    Science.gov (United States)

    2010-03-01

    for traceability of individual programmers during the tumult of operations in a simulation bay, where many operators will need to log in, use...of which remains to be apprehended. The author's experience in teaching an introductory course on Data Mining at the Viterbi School of Engineering

  16. Parametric Parallel Simulation of Discrete Event Systems on SIMD Supercomputers

    Science.gov (United States)

    1994-05-01

    [The source record here contains garbled fragments of equations (5.20) and (5.21), which give the probabilities of accepting an arrival at node i, of accepting a departure at node i that joins node j, and of a null event.] The departure rate from node j is 0 when that node is in state 0 and g otherwise.

  17. Parallel Earthquake Simulations on Large-Scale Multicore Supercomputers

    KAUST Repository

    Wu, Xingfu

    2011-01-01

    Earthquakes are one of the most destructive natural hazards on our planet Earth. Huge earthquakes striking offshore may cause devastating tsunamis, as evidenced by the 11 March 2011 Japan (moment magnitude Mw9.0) and the 26 December 2004 Sumatra (Mw9.1) earthquakes. Earthquake prediction (in terms of the precise time, place, and magnitude of a coming earthquake) is arguably unfeasible in the foreseeable future. To mitigate seismic hazards from future earthquakes in earthquake-prone areas, such as California and Japan, scientists have been using numerical simulations to study earthquake rupture propagation along faults and seismic wave propagation in the surrounding media on ever-advancing modern computers over past several decades. In particular, ground motion simulations for past and future (possible) significant earthquakes have been performed to understand factors that affect ground shaking in populated areas, and to provide ground shaking characteristics and synthetic seismograms for emergency preparation and design of earthquake-resistant structures. These simulation results can guide the development of more rational seismic provisions, leading to safer, more efficient, and economical structures in earthquake-prone regions.

  18. TOP500 Supercomputers for June 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-06-23

    21st Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 21st edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2003). The Earth Simulator supercomputer built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (teraflops or trillions of calculations per second), retains the number one position. The number 2 position is held by the re-measured ASCI Q system at Los Alamos National Laboratory. With 13.88 Tflop/s, it is the second system ever to exceed the 10 Tflop/s mark. ASCI Q was built by Hewlett-Packard and is based on the AlphaServer SC computer system.

  19. TOP500 Supercomputers for June 2002

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2002-06-20

    19th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 19th edition of the TOP500 list of the world's fastest supercomputers was released today (June 20, 2002). The recently installed Earth Simulator supercomputer at the Earth Simulator Center in Yokohama, Japan, is as expected the clear new number 1. Its performance of 35.86 Tflop/s (trillions of calculations per second) running the Linpack benchmark is almost five times higher than the performance of the now No.2 IBM ASCI White system at Lawrence Livermore National Laboratory (7.2 Tflop/s). This powerful leapfrogging to the top by a system so much faster than the previous top system is unparalleled in the history of the TOP500.

  20. TOP500 Supercomputers for November 2002

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2002-11-15

    20th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 20th edition of the TOP500 list of the world's fastest supercomputers was released today (November 15, 2002). The Earth Simulator supercomputer installed earlier this year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (trillions of calculations per second), retains the number one position. The No.2 and No.3 positions are held by two new, identical ASCI Q systems at Los Alamos National Laboratory (7.73 Tflop/s each). These systems are built by Hewlett-Packard and based on the AlphaServer SC computer system.

  1. Input/output behavior of supercomputing applications

    Science.gov (United States)

    Miller, Ethan L.

    1991-01-01

    The collection and analysis of supercomputer I/O traces and their use in a collection of buffering and caching simulations are described. This serves two purposes. First, it gives a model of how individual applications running on supercomputers request file system I/O, allowing system designers to optimize I/O hardware and file system algorithms to that model. Second, the buffering simulations show what resources are needed to maximize the CPU utilization of a supercomputer given a very bursty I/O request rate. By using read-ahead and write-behind in a large solid-state disk, one or two applications were sufficient to fully utilize a Cray Y-MP CPU.
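
    A minimal trace-driven sketch of the read-ahead idea discussed above: with prefetching into a fast buffer, the fetch of block i+1 overlaps the computation on block i, so the CPU rarely stalls. The timings below are arbitrary illustrative constants, not measurements from the study.

      COMPUTE = 5.0     # ms of CPU work per block
      FETCH = 4.0       # ms to fetch one block from disk into the buffer

      def utilization(num_blocks, read_ahead):
          cpu_busy, clock, buffer_ready_at = 0.0, 0.0, 0.0   # block 0 assumed staged
          for _ in range(num_blocks):
              if not read_ahead:
                  clock += FETCH                       # CPU stalls for the whole fetch
              else:
                  clock = max(clock, buffer_ready_at)  # stall only if prefetch unfinished
                  buffer_ready_at = clock + FETCH      # start prefetching the next block
              clock += COMPUTE
              cpu_busy += COMPUTE
          return cpu_busy / clock

      print(f"CPU utilization without read-ahead: {utilization(1000, False):.2f}")
      print(f"CPU utilization with read-ahead:    {utilization(1000, True):.2f}")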

  2. GPUs: An Oasis in the Supercomputing Desert

    CERN Document Server

    Kamleh, Waseem

    2012-01-01

    A novel metric is introduced to compare the supercomputing resources available to academic researchers on a national basis. Data from the supercomputing Top 500 and the top 500 universities in the Academic Ranking of World Universities (ARWU) are combined to form the proposed "500/500" score for a given country. Australia scores poorly in the 500/500 metric when compared with other countries with a similar ARWU ranking, an indication that HPC-based researchers in Australia are at a relative disadvantage with respect to their overseas competitors. For HPC problems where single precision is sufficient, commodity GPUs provide a cost-effective means of quenching the computational thirst of otherwise parched Lattice practitioners traversing the Australian supercomputing desert. We explore some of the more difficult terrain in single precision territory, finding that BiCGStab is unreliable in single precision at large lattice sizes. We test the CGNE and CGNR forms of the conjugate gradient method on the normal equa...
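
    For reference, CGNR (the conjugate gradient method applied to the normal equations A^T A x = A^T b) can be sketched in a few lines. The version below uses NumPy on a small, well-conditioned random test matrix rather than a lattice Dirac operator, and simply reports the relative residual reached in float32 versus float64.

      import numpy as np

      def cgnr(A, b, tol=1e-6, max_iter=500):
          x = np.zeros(A.shape[1], dtype=A.dtype)
          r = b - A @ x
          z = A.T @ r                      # residual of the normal equations
          p = z.copy()
          for _ in range(max_iter):
              w = A @ p
              alpha = (z @ z) / (w @ w)
              x += alpha * p
              r -= alpha * w
              if np.linalg.norm(r) < tol * np.linalg.norm(b):
                  break
              z_new = A.T @ r
              beta = (z_new @ z_new) / (z @ z)
              p = z_new + beta * p
              z = z_new
          return x, np.linalg.norm(b - A @ x) / np.linalg.norm(b)

      rng = np.random.default_rng(0)
      A = rng.standard_normal((200, 200)) + 5 * np.eye(200)   # toy system, not a Dirac matrix
      b = rng.standard_normal(200)
      for dtype in (np.float32, np.float64):
          _, rel_res = cgnr(A.astype(dtype), b.astype(dtype))
          print(dtype.__name__, rel_res)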

  3. Floating point arithmetic in future supercomputers

    Science.gov (United States)

    Bailey, David H.; Barton, John T.; Simon, Horst D.; Fouts, Martin J.

    1989-01-01

    Considerations in the floating-point design of a supercomputer are discussed. Particular attention is given to word size, hardware support for extended precision, format, and accuracy characteristics. These issues are discussed from the perspective of the Numerical Aerodynamic Simulation Systems Division at NASA Ames. The features believed to be most important for a future supercomputer floating-point design include: (1) a 64-bit IEEE floating-point format with 11 exponent bits, 52 mantissa bits, and one sign bit and (2) hardware support for reasonably fast double-precision arithmetic.
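
    The favored 64-bit format above, with 1 sign bit, 11 exponent bits, and 52 mantissa bits, is what IEEE 754 double precision specifies; a small Python snippet can decompose a value into those fields.

      import struct

      def decompose(x):
          bits = struct.unpack(">Q", struct.pack(">d", x))[0]
          sign = bits >> 63
          exponent = (bits >> 52) & 0x7FF          # 11 bits, biased by 1023
          mantissa = bits & ((1 << 52) - 1)        # 52 bits
          return sign, exponent - 1023, mantissa

      # -6.5 is -1.625 * 2**2, so sign 1 and unbiased exponent 2.
      print(decompose(-6.5))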

  4. Confronting the disruptive physician.

    Science.gov (United States)

    Linney, B J

    1997-01-01

    Ignoring disruptive behavior is no longer an option in today's changing health care environment. Competition and managed care have caused more organizations to deal with the disruptive physician, rather than look the other way as many did in years past. But it's not an easy task, possibly the toughest of your management career. How should you confront a disruptive physician? By having clearly stated expectations for physician behavior and policies in place for dealing with problem physicians, organizations have a context from which to address the situation.

  5. Confronting an augmented reality

    Directory of Open Access Journals (Sweden)

    John Hedberg

    2012-08-01

    Full Text Available How can educators make use of augmented reality technologies and practices to enhance learning and why would we want to embrace such technologies anyway? How can an augmented reality help a learner confront, interpret and ultimately comprehend reality itself? In this article, we seek to initiate a discussion that focuses on these questions, and suggest that they be used as drivers for research into effective educational applications of augmented reality. We discuss how multi-modal, sensorial augmentation of reality links to existing theories of education and learning, focusing on ideas of cognitive dissonance and the confrontation of new realities implied by exposure to new and varied perspectives. We also discuss connections with broader debates brought on by the social and cultural changes wrought by the increased digitalisation of our lives, especially the concept of the extended mind. Rather than offer a prescription for augmentation, our intention is to throw open debate and to provoke deep thinking about what interacting with and creating an augmented reality might mean for both teacher and learner.

  6. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  7. Direct exploitation of a top 500 Supercomputer for Analysis of CMS Data

    Science.gov (United States)

    Cabrillo, I.; Cabellos, L.; Marco, J.; Fernandez, J.; Gonzalez, I.

    2014-06-01

    The Altamira Supercomputer hosted at the Instituto de Fisica de Cantabria (IFCA) entered into operation in summer 2012. Its latest-generation FDR InfiniBand network, used for message passing in parallel jobs, also supports the connection to General Parallel File System (GPFS) servers, enabling efficient simultaneous processing of multiple data-demanding jobs. Sharing a common GPFS system and a single LDAP-based identification with the existing Grid clusters at IFCA allows CMS researchers to exploit the large instantaneous capacity of this supercomputer to execute analysis jobs. The detailed experience describing this opportunistic use for skimming and final analysis of CMS 2012 data for a specific physics channel, resulting in an order of magnitude reduction of the waiting time, is presented.

  8. Integration Of PanDA Workload Management System With Supercomputers for ATLAS and Data Intensive Science

    Energy Technology Data Exchange (ETDEWEB)

    De, K [University of Texas at Arlington; Jha, S [Rutgers University; Klimentov, A [Brookhaven National Laboratory (BNL); Maeno, T [Brookhaven National Laboratory (BNL); Nilsson, P [Brookhaven National Laboratory (BNL); Oleynik, D [University of Texas at Arlington; Panitkin, S [Brookhaven National Laboratory (BNL); Wells, Jack C [ORNL; Wenaus, T [Brookhaven National Laboratory (BNL)

    2016-01-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, Europe and Russia (in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF), the MIRA supercomputer at the Argonne Leadership Computing Facility (ALCF), the supercomputer at the National Research Center Kurchatov Institute, IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation
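
    The light-weight MPI wrapper idea mentioned above can be illustrated with a minimal sketch: each MPI rank launches one copy of a single-threaded payload, so a whole batch allocation is filled with independent jobs. This is a generic illustration, not the PanDA pilot code; the payload script name and its argument convention are hypothetical.

```python
# Minimal sketch of a light-weight MPI wrapper that runs independent
# single-threaded payloads in parallel, one per MPI rank.
# Launch under MPI, e.g.: mpirun -n 128 python mpi_wrapper.py
import subprocess
import sys
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank processes its own work item, e.g. one event file per rank
# (the script and file naming below are hypothetical placeholders).
payload = ["./run_payload.sh", f"--input=events_{rank:05d}.dat"]
try:
    code = subprocess.run(payload, capture_output=True, text=True).returncode
except FileNotFoundError:
    code = 127  # payload executable not present in this sketch

# Collect exit codes on rank 0 so the batch job can report overall status.
codes = comm.gather(code, root=0)
if rank == 0:
    failed = [i for i, c in enumerate(codes) if c != 0]
    print(f"{len(codes) - len(failed)} payloads succeeded, {len(failed)} failed")
    sys.exit(1 if failed else 0)
```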

  9. TSP:A Heterogeneous Multiprocessor Supercomputing System Based on i860XP

    Institute of Scientific and Technical Information of China (English)

    黄国勇; 李三立

    1994-01-01

    Numerous new RISC processors provide support for supercomputing. By using the "mini-Cray" i860 superscalar processor, an add-on board has been developed to boost the performance of a real-time system. A parallel heterogeneous multiprocessor supercomputing system, TSP, has been constructed. In this paper, we present the system design considerations and describe the architecture of the TSP and its features.

  10. Adventures in Supercomputing: An innovative program

    Energy Technology Data Exchange (ETDEWEB)

    Summers, B.G.; Hicks, H.R.; Oliver, C.E.

    1995-06-01

    Within the realm of education, seldom does an innovative program become available with the potential to change an educator's teaching methodology and serve as a spur to systemic reform. The Adventures in Supercomputing (AiS) program, sponsored by the Department of Energy, is such a program. Adventures in Supercomputing is a program for high school and middle school teachers. It has helped to change the teaching paradigm of many of the teachers involved in the program from a teacher-centered classroom to a student-centered classroom. "A student-centered classroom offers better opportunities for development of internal motivation, planning skills, goal setting and perseverance than does the traditional teacher-directed mode". Not only is the process of teaching changed, but evidence of systemic reform is beginning to surface. After describing the program, the authors discuss the teaching strategies being used and the evidence of systemic change in many of the AiS schools in Tennessee.

  11. Parallel computation with the spectral element method

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Hong

    1995-12-01

    Spectral element models for the shallow water equations and the Navier-Stokes equations have been successfully implemented on a data-parallel supercomputer, the Connection Machine model CM-5. The nonstaggered grid formulations for both models are described and shown to be especially efficient in a data-parallel computing environment.

  12. Virtualizing Super-Computation On-Board Uas

    Science.gov (United States)

    Salami, E.; Soler, J. A.; Cuadrado, R.; Barrado, C.; Pastor, E.

    2015-04-01

    Unmanned aerial systems (UAS, also known as UAV, RPAS or drones) have a great potential to support a wide variety of aerial remote sensing applications. Most UAS work by acquiring data using on-board sensors for later post-processing. Some require the data gathered to be downlinked to the ground in real time. However, depending on the volume of data and the cost of the communications, this latter option is not sustainable in the long term. This paper develops the concept of virtualizing super-computation on-board UAS, as a method to ease the operation by facilitating the downlink of high-level information products instead of raw data. Exploiting recent developments in miniaturized multi-core devices is the way to speed up on-board computation. This hardware shall satisfy size, power and weight constraints. Several technologies are appearing with promising results for high performance computing on unmanned platforms, such as the 36 cores of the TILE-Gx36 by Tilera (now EZchip) or the 64 cores of the Epiphany-IV by Adapteva. The strategy for virtualizing super-computation on-board includes the benchmarking for hardware selection, the software architecture and the communications-aware design. A parallelization strategy is given for the 36-core TILE-Gx36 for a UAS in a fire mission or in similar target-detection applications. The results are obtained for payload image processing algorithms and determine in real time the data snapshot to gather and transfer to ground according to the needs of the mission, the processing time, and the consumed watts.
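
    As a rough illustration of the kind of tile-level parallelism such a many-core payload processor invites (not the paper's TILE-Gx36 implementation; the detector, frame size and tile size below are stand-ins), a frame can be split into tiles and the tiles processed in parallel, with only the detections reported:

```python
# Illustrative sketch: split a camera frame into tiles and process them in
# parallel, reporting only high-level detections instead of raw pixels.
import numpy as np
from multiprocessing import Pool

TILE = 256  # tile edge length in pixels (assumed value)

def detect_hotspots(tile_and_origin):
    """Stand-in detector: flag tiles whose mean intensity exceeds a threshold."""
    tile, (row0, col0) = tile_and_origin
    return (row0, col0, bool(tile.mean() > 200))

def split_into_tiles(frame):
    rows, cols = frame.shape
    for r in range(0, rows, TILE):
        for c in range(0, cols, TILE):
            yield frame[r:r + TILE, c:c + TILE], (r, c)

if __name__ == "__main__":
    frame = np.random.randint(0, 256, size=(1024, 1536), dtype=np.uint8)
    with Pool() as pool:                      # one worker per available core
        results = pool.map(detect_hotspots, split_into_tiles(frame))
    detections = [(r, c) for r, c, hit in results if hit]
    print(f"{len(detections)} candidate tiles out of {len(results)}")
```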

  13. Navigating between Dialogue and Confrontation

    DEFF Research Database (Denmark)

    Thuesen, Frederik

    2011-01-01

    Interviews with political or organizational elites often constitute a valuable source of knowledge about organizational culture, values, and strategies. However, such interviews often confront the researcher with either reluctant or control-seeking respondents, especially for sensitive issues...... phronesis, Aristotle’s concept of practical rationality. A phronetic approach, involving reflections on the link between reason and emotions, is well suited for handling both dialogue and confrontation in the interview process. Empirically, the paper draws on interviews with representatives of trade unions...

  14. Data-intensive computing on numerically-insensitive supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Ahrens, James P [Los Alamos National Laboratory; Fasel, Patricia K [Los Alamos National Laboratory; Habib, Salman [Los Alamos National Laboratory; Heitmann, Katrin [Los Alamos National Laboratory; Lo, Li - Ta [Los Alamos National Laboratory; Patchett, John M [Los Alamos National Laboratory; Williams, Sean J [Los Alamos National Laboratory; Woodring, Jonathan L [Los Alamos National Laboratory; Wu, Joshua [Los Alamos National Laboratory; Hsu, Chung - Hsing [ONL

    2010-12-03

    With the advent of the era of petascale supercomputing, via the delivery of the Roadrunner supercomputing platform at Los Alamos National Laboratory, there is a pressing need to address the problem of visualizing massive petascale-sized results. In this presentation, I discuss progress on a number of approaches including in-situ analysis, multi-resolution out-of-core streaming and interactive rendering on the supercomputing platform. These approaches are placed in context by the emerging area of data-intensive supercomputing.

  15. Explaining the Gap between Theoretical Peak Performance and Real Performance for Supercomputer Architectures

    Directory of Open Access Journals (Sweden)

    W. Schönauer

    1994-01-01

    Full Text Available The basic architectures of vector and parallel computers and their properties are presented followed by a discussion of memory size and arithmetic operations in the context of memory bandwidth. For a single operation micromeasurements of the vector triad for the IBM 3090 VF and the CRAY Y-MP/8 are presented, revealing in detail the losses for this operation. The global performance of a whole supercomputer is then considered by identifying reduction factors that reduce the theoretical peak performance to the poor real performance. The responsibilities of the manufacturer and of the user for these losses are discussed. The price-performance ratio for different architectures as of January 1991 is briefly mentioned. Finally a user-friendly architecture for a supercomputer is proposed.
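
    For reference, the vector triad referred to above is commonly defined as the kernel a(i) = b(i) + c(i)*d(i), whose measured rate exposes the gap between peak and memory-bound performance. The sketch below is only a present-day illustration of the measurement idea; array sizes and repeat counts are arbitrary choices, and NumPy timings are merely indicative.

```python
# Minimal vector triad micro-benchmark a[i] = b[i] + c[i]*d[i].
import time
import numpy as np

def triad_mflops(n, repeats=20):
    b = np.random.rand(n)
    c = np.random.rand(n)
    d = np.random.rand(n)
    a = np.empty_like(b)
    start = time.perf_counter()
    for _ in range(repeats):
        np.add(b, c * d, out=a)      # 2 floating-point operations per element
    elapsed = time.perf_counter() - start
    return 2.0 * n * repeats / elapsed / 1e6

for n in (1_000, 100_000, 10_000_000):   # small, cache-sized, memory-bound
    print(f"N = {n:>10,d}: {triad_mflops(n):8.1f} MFLOP/s")
```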

  16. An implementation of a parallel MOL solver on the Intel gamma parallel computer

    Energy Technology Data Exchange (ETDEWEB)

    Lawkins, W.F.; Payne, J.S.

    1992-06-17

    An implicit parallel method-of-lines solver that has been implemented on the MIMD Intel Gamma prototype supercomputer is discussed. The strategy for implementation is to execute the ODE solver sequentially and to do the numerical linear algebra in parallel. Performance studies for this implementation are presented.

  17. Supercomputing Centers and Electricity Service Providers

    DEFF Research Database (Denmark)

    Patki, Tapasya; Bates, Natalie; Ghatikar, Girish

    2016-01-01

    Supercomputing Centers (SCs) have high and variable power demands, which increase the challenges of the Electricity Service Providers (ESPs) with regards to efficient electricity distribution and reliable grid operation. High penetration of renewable energy generation further exacerbates this problem... from a detailed, quantitative survey-based analysis and compare the perspectives of the European grid and SCs to the ones of the United States (US). We then show that, contrary to expectation, SCs in the US are more open toward cooperating and developing demand-management strategies with their ESPs... (LRZ). We conclude that perspectives on demand management are dependent on the electricity market and pricing in the geographical region and on the degree of control that a particular SC has in terms of power-purchase negotiation.

  18. A workbench for tera-flop supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Resch, M.M.; Kuester, U.; Mueller, M.S.; Lang, U. [High Performance Computing Center Stuttgart (HLRS), Stuttgart (Germany)

    2003-07-01

    Supercomputers currently reach a peak performance in the range of TFlop/s. With but one exception - the Japanese Earth Simulator - none of these systems has so far been able to also show a level of sustained performance for a variety of applications that comes close to the peak performance. Sustained TFlop/s are therefore rarely seen. The reasons are manifold and are well known: Bandwidth and latency both for main memory and for the internal network are the key internal technical problems. Cache hierarchies with large caches can bring relief but are no remedy to the problem. However, there are not only technical problems that inhibit the full exploitation by scientists of the potential of modern supercomputers. More and more organizational issues come to the forefront. This paper shows the approach of the High Performance Computing Center Stuttgart (HLRS) to deliver a sustained performance of TFlop/s for a wide range of applications from a large group of users spread over Germany. The core of the concept is the role of the data. Around this we design a simulation workbench that hides the complexity of interacting computers, networks and file systems from the user. (authors)

  19. Seismic signal processing on heterogeneous supercomputers

    Science.gov (United States)

    Gokhberg, Alexey; Ermert, Laura; Fichtner, Andreas

    2015-04-01

    The processing of seismic signals - including the correlation of massive ambient noise data sets - represents an important part of a wide range of seismological applications. It is characterized by large data volumes as well as high computational input/output intensity. Development of efficient approaches towards seismic signal processing on emerging high performance computing systems is therefore essential. Heterogeneous supercomputing systems introduced in recent years provide numerous computing nodes interconnected via high-throughput networks, every node containing a mix of processing elements of different architectures, such as several sequential processor cores and one or a few graphical processing units (GPUs) serving as accelerators. A typical representative of such computing systems is "Piz Daint", a supercomputer of the Cray XC30 family operated by the Swiss National Supercomputing Center (CSCS), which we used in this research. Heterogeneous supercomputers provide an opportunity for a manifold increase in application performance and are more energy-efficient; however, they have much higher hardware complexity and are therefore much more difficult to program. The programming effort may be substantially reduced by the introduction of modular libraries of software components that can be reused for a wide class of seismology applications. The ultimate goal of this research is the design of a prototype for such a library suitable for implementing various seismic signal processing applications on heterogeneous systems. As a representative use case we have chosen an ambient noise correlation application. Ambient noise interferometry has developed into one of the most powerful tools to image and monitor the Earth's interior. Future applications will require the extraction of increasingly small details from noise recordings. To meet this demand, more advanced correlation techniques combined with very large data volumes are needed. This poses new computational problems that

  20. Most Social Scientists Shun Free Use of Supercomputers.

    Science.gov (United States)

    Kiernan, Vincent

    1998-01-01

    Social scientists, who frequently complain that the federal government spends too little on them, are passing up what scholars in the physical and natural sciences see as the government's best give-aways: free access to supercomputers. Some social scientists say the supercomputers are difficult to use; others find desktop computers provide…

  1. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  2. Supercomputing - Use Cases, Advances, The Future (2/2)

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the second day, we will focus on software and software paradigms driving supercomputers, workloads that need supercomputing treatment, advances in technology and possible future developments. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and i...

  3. Integration Of PanDA Workload Management System With Supercomputers for ATLAS and Data Intensive Science

    Science.gov (United States)

    Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.

    2016-10-01

    The LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in the United States, in particular with the Titan supercomputer at the Oak Ridge Leadership Computing Facility. The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers' batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on the LCFs' multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for the ALICE and ATLAS experiments, and it has been in full production for ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the

  4. A novel VLSI processor architecture for supercomputing arrays

    Science.gov (United States)

    Venkateswaran, N.; Pattabiraman, S.; Devanathan, R.; Ahmed, Ashaf; Venkataraman, S.; Ganesh, N.

    1993-01-01

    Design of the processor element for general purpose massively parallel supercomputing arrays is highly complex and cost-ineffective. To overcome this, the architecture and organization of the functional units of the processor element should be such as to suit the diverse computational structures and simplify the mapping of complex communication structures of different classes of algorithms. This demands that the computation and communication structures of different classes of algorithms be unified. While unifying the different communication structures is a difficult process, analysis of a wide class of algorithms reveals that their computation structures can be expressed in terms of basic IP, IP, OP, CM, R, SM, and MAA operations. The execution of these operations is unified on the PAcube macro-cell array. Based on this PAcube macro-cell array, we present a novel processor element called the GIPOP processor, which has dedicated functional units to perform the above operations. The architecture and organization of these functional units are such as to satisfy the two important criteria mentioned above. The structure of the macro-cell and the unification process have led to a very regular and simpler design of the GIPOP processor. The production cost of the GIPOP processor is drastically reduced as it is designed on high performance mask-programmable PAcube arrays.

  5. Will Your Next Supercomputer Come from Costco?

    Energy Technology Data Exchange (ETDEWEB)

    Farber, Rob

    2007-04-15

    A fun topic for April, one that is not an April fool’s joke, is that you can purchase a commodity 200+ Gflop (single-precision) Linux supercomputer for around $600 from your favorite electronic vendor. Yes, it’s true. Just walk in and ask for a Sony Playstation 3 (PS3), take it home and install Linux on it. IBM has provided an excellent tutorial for installing Linux and building applications at http://www-128.ibm.com/developerworks/power/library/pa-linuxps3-1. If you want to raise some eyebrows at work, then submit a purchase request for a Sony PS3 game console and watch the reactions as your paperwork wends its way through the procurement process.

  6. HPL and STREAM Benchmarks on SANAM Supercomputer

    KAUST Repository

    Bin Sulaiman, Riman A.

    2017-03-13

    SANAM supercomputer was jointly built by KACST and FIAS in 2012, ranking second that year in the Green500 list with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous accelerator-based HPC system that has 300 compute nodes. Each node includes two Intel Xeon E5-2650 CPUs, two AMD FirePro S10000 dual GPUs and 128 GiB of main memory. In this work, the seven benchmarks of HPCC were installed and configured to reassess the performance of SANAM, as part of an unpublished master thesis, after it was reassembled in the Kingdom of Saudi Arabia. We present here detailed results of the HPL and STREAM benchmarks.

  7. Multiprocessing on supercomputers for computational aerodynamics

    Science.gov (United States)

    Yarrow, Maurice; Mehta, Unmeel B.

    1991-01-01

    Little use is made of multiple processors available on current supercomputers (computers with a theoretical peak performance capability equal to 100 MFLOPS or more) to improve turnaround time in computational aerodynamics. The productivity of a computer user is directly related to this turnaround time. In a time-sharing environment, such improvement in this speed is achieved when multiple processors are used efficiently to execute an algorithm. The concept of multiple instructions and multiple data (MIMD) is applied through multitasking via a strategy that requires relatively minor modifications to an existing code for a single processor. This approach maps the available memory to multiple processors, exploiting the C-Fortran-Unix interface. The existing code is mapped without the need for developing a new algorithm. The procedure for building a code utilizing this approach is automated with the Unix stream editor.

  8. Self-Confrontation of Teachers.

    Science.gov (United States)

    Schmuck, Richard A.

    Simply presenting teachers with information about discrepancies between their ideal and their actual classroom performances does not, in itself, lead to constructive change. In part, this is because teachers confronted with such discrepancies experience dissonance which often gives rise to anxiety. This paper discusses the psychological processes…

  9. Visualization on supercomputing platform level II ASC milestone (3537-1B) results from Sandia.

    Energy Technology Data Exchange (ETDEWEB)

    Geveci, Berk (Kitware, Inc., Clifton Park, NY); Fabian, Nathan; Marion, Patrick (Kitware, Inc., Clifton Park, NY); Moreland, Kenneth D.

    2010-09-01

    This report provides documentation for the completion of the Sandia portion of the ASC Level II Visualization on the platform milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratories and Los Alamos National Laboratories. This milestone contains functionality required for performing visualization directly on a supercomputing platform, which is necessary for peta-scale visualization. Sandia's contribution concerns in-situ visualization, running a visualization in tandem with a solver. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) Performance of interactive rendering, which is the most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors (GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk, we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. Scientific simulation on parallel supercomputers is traditionally performed in four
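
    The in-situ coupling idea, running analysis in tandem with the solver instead of writing raw timesteps to disk, can be sketched generically as a callback invoked inside the simulation loop. The toy example below is not the milestone's software stack; the diffusion-like update and the statistics it computes are stand-ins.

```python
# Toy sketch of in-situ analysis: the "solver" advances a field and hands it
# to an analysis callback every few timesteps instead of writing raw data.
import numpy as np

def analyze(step, field, summaries):
    """In-situ analysis: keep only a small summary per analyzed timestep."""
    summaries.append((step, float(field.min()), float(field.max()), float(field.mean())))

def run_simulation(steps=100, analysis_interval=10):
    field = np.random.rand(256, 256)
    summaries = []
    for step in range(steps):
        # Cheap stand-in for a solver update (simple diffusion-like smoothing).
        field = 0.25 * (np.roll(field, 1, 0) + np.roll(field, -1, 0) +
                        np.roll(field, 1, 1) + np.roll(field, -1, 1))
        if step % analysis_interval == 0:
            analyze(step, field, summaries)   # coupled at runtime, no raw I/O
    return summaries

for step, lo, hi, mean in run_simulation():
    print(f"step {step:3d}: min={lo:.4f} max={hi:.4f} mean={mean:.4f}")
```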

  10. Accelerating Virtual High-Throughput Ligand Docking: current technology and case study on a petascale supercomputer.

    Science.gov (United States)

    Ellingson, Sally R; Dakshanamurthy, Sivanesan; Brown, Milton; Smith, Jeremy C; Baudry, Jerome

    2014-04-25

    In this paper we give the current state of high-throughput virtual screening. We describe a case study of using a task-parallel MPI (Message Passing Interface) version of Autodock4 [1], [2] to run a virtual high-throughput screen of one-million compounds on the Jaguar Cray XK6 Supercomputer at Oak Ridge National Laboratory. We include a description of scripts developed to increase the efficiency of the predocking file preparation and postdocking analysis. A detailed tutorial, scripts, and source code for this MPI version of Autodock4 are available online at http://www.bio.utk.edu/baudrylab/autodockmpi.htm.
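
    A task-parallel MPI screen of a large compound library is commonly organized as a master-worker farm. The mpi4py sketch below shows that general pattern only; it is not the Autodock4 MPI code, and the docking step, library size and ligand names are hypothetical placeholders.

```python
# Generic master-worker sketch for task-parallel virtual screening with MPI.
# Launch under MPI, e.g.: mpirun -n 16 python screen_farm.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
STOP = None

def dock(ligand):
    # Placeholder for launching a docking run and parsing its score.
    return ligand, hash(ligand) % 1000 / 100.0

if rank == 0:
    ligands = [f"ligand_{i:07d}" for i in range(10_000)]   # hypothetical library
    results, next_task = [], 0
    # Seed every worker with one task (assumes more ligands than workers).
    for worker in range(1, size):
        comm.send(ligands[next_task], dest=worker); next_task += 1
    while len(results) < len(ligands):
        status = MPI.Status()
        results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
        worker = status.Get_source()
        # Hand out the next task on demand, or STOP when the library is done.
        comm.send(ligands[next_task] if next_task < len(ligands) else STOP, dest=worker)
        next_task += 1
    print("best score:", min(results, key=lambda r: r[1]))
else:
    while (task := comm.recv(source=0)) is not STOP:
        comm.send(dock(task), dest=0)
```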

  11. Modern Gyrokinetic Particle-In-Cell Simulation of Fusion Plasmas on Top Supercomputers

    CERN Document Server

    Wang, Bei; Tang, William; Ibrahim, Khaled; Madduri, Kamesh; Williams, Samuel; Oliker, Leonid

    2015-01-01

    The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon...

  12. Earth and environmental science in the 1980's: Part 1: Environmental data systems, supercomputer facilities and networks

    Science.gov (United States)

    1986-01-01

    Overview descriptions of on-line environmental data systems, supercomputer facilities, and networks are presented. Each description addresses the concepts of content, capability, and user access relevant to the point of view of potential utilization by the Earth and environmental science community. The information on similar systems or facilities is presented in parallel fashion to encourage and facilitate intercomparison. In addition, summary sheets are given for each description, and a summary table precedes each section.

  13. A Scalable Prescriptive Parallel Debugging Model

    DEFF Research Database (Denmark)

    Jensen, Nicklas Bo; Quarfot Nielsen, Niklas; Lee, Gregory L.

    2015-01-01

    Debugging is a critical step in the development of any parallel program. However, the traditional interactive debugging model, where users manually step through code and inspect their application, does not scale well even for current supercomputers due to its centralized nature. While lightweight...

  14. World's biggest 'virtual supercomputer' given the go-ahead

    CERN Multimedia

    2003-01-01

    "The Particle Physics and Astronomy Research Council has today announced GBP 16 million to create a massive computing Grid, equivalent to the world's second largest supercomputer after Japan's Earth Simulator computer" (1 page).

  15. A new parallel TreeSPH code

    OpenAIRE

    Lia, Cesario; Carraro, Giovanni; Chiosi, Cesare; Voli, Marco

    1998-01-01

    In this report we describe a parallel implementation of a Tree-SPH code realized using the SHMEM libraries on the Cray T3E supercomputer at CINECA. We show the results of a 3D test to check the code's performance against its scalar version. Finally, we compare the load balancing and scalability of the code with PTreeSPH (Davé et al. 1997), the only other parallel Tree-SPH code present in the literature.

  16. A survey of parallel programming tools

    Science.gov (United States)

    Cheng, Doreen Y.

    1991-01-01

    This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.

  17. City Women Confront New Choices

    Institute of Scientific and Technical Information of China (English)

    1994-01-01

    THE Chinese economic reform has pushed society into a period of transformation. City women—especially career women—once again are confronted with new choices. Compared to Nora’s choice to leave or stay at home, as depicted in the Norwegian playwright Henrik Ibsen’s A Doll’s House, the Chinese career women’s choice is obviously more complicated. It involves social roles and the expectations society has about their gender. Several women doctoral students with the Chinese Academy of Social Sciences recently talked about this issue.

  18. Taking ASCI supercomputing to the end game.

    Energy Technology Data Exchange (ETDEWEB)

    DeBenedictis, Erik P.

    2004-03-01

    The ASCI supercomputing program is broadly defined as running physics simulations on progressively more powerful digital computers. What happens if we extrapolate the computer technology to its end? We have developed a model for key ASCI computations running on a hypothetical computer whose technology is parameterized in ways that account for advancing technology. This model includes technology information such as Moore's Law for transistor scaling and developments in cooling technology. The model also includes limits imposed by laws of physics, such as thermodynamic limits on power dissipation, limits on cooling, and the limitation of signal propagation velocity to the speed of light. We apply this model and show that ASCI computations will advance smoothly for another 10-20 years to an 'end game' defined by thermodynamic limits and the speed of light. Performance levels at the end game will vary greatly by specific problem, but will be in the Exaflops to Zettaflops range for currently anticipated problems. We have also found an architecture that would be within a constant factor of giving optimal performance at the end game. This architecture is an evolutionary derivative of the mesh-connected microprocessor (such as ASCI Red Storm or IBM Blue Gene/L). We provide designs for the necessary enhancement to microprocessor functionality and the power-efficiency of both the processor and memory system. The technology we develop in the foregoing provides a 'perfect' computer model with which we can rate the quality of realizable computer designs, both in this writing and as a way of designing future computers. This report focuses on classical computers based on irreversible digital logic, and more specifically on algorithms that simulate space computing, irreversible logic, analog computers, and other ways to address stockpile stewardship that are outside the scope of this report.
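
    Two of the physical limits named above, thermodynamic (Landauer) power floors and speed-of-light signal latency, can be estimated with a back-of-envelope sketch. The parameters (room temperature, one bit erased per operation, a 30 m machine diameter) are illustrative assumptions, not values from the report.

```python
# Back-of-envelope sketch of two physical limits mentioned above.
# All parameters are illustrative assumptions, not the report's model.
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
C = 2.99792458e8            # speed of light, m/s

def landauer_power(ops_per_second, temperature_kelvin=300.0):
    """Minimum power if each operation irreversibly erases one bit (Landauer limit)."""
    return ops_per_second * K_B * temperature_kelvin * math.log(2)

def light_speed_latency(machine_diameter_m):
    """One-way signal latency across the machine at the speed of light."""
    return machine_diameter_m / C

print(f"1 Zettaflop/s Landauer floor: {landauer_power(1e21):.2f} W")
print(f"Cross-machine latency (30 m): {light_speed_latency(30.0) * 1e9:.1f} ns")
```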

  19. Simulating functional magnetic materials on supercomputers.

    Science.gov (United States)

    Gruner, Markus Ernst; Entel, Peter

    2009-07-22

    The recent passing of the petaflop per second landmark by the Roadrunner project at the Los Alamos National Laboratory marks a preliminary peak of an impressive world-wide development in the high-performance scientific computing sector. Also, purely academic state-of-the-art supercomputers such as the IBM Blue Gene/P at Forschungszentrum Jülich allow us nowadays to investigate large systems of the order of 10³ spin polarized transition metal atoms by means of density functional theory. Three applications will be presented where large-scale ab initio calculations contribute to the understanding of key properties emerging from a close interrelation between structure and magnetism. The first two examples discuss the size dependent evolution of equilibrium structural motifs in elementary iron and binary Fe-Pt and Co-Pt transition metal nanoparticles, which are currently discussed as promising candidates for ultra-high-density magnetic data storage media. However, the preference for multiply twinned morphologies at smaller cluster sizes counteracts the formation of a single-crystalline L1₀ phase, which alone provides the required hard magnetic properties. The third application is concerned with the magnetic shape memory effect in the Ni-Mn-Ga Heusler alloy, which is a technologically relevant candidate for magnetomechanical actuators and sensors. In this material strains of up to 10% can be induced by external magnetic fields due to the field induced shifting of martensitic twin boundaries, requiring an extremely high mobility of the martensitic twin boundaries, but also the selection of the appropriate martensitic structure from the rich phase diagram.

  20. Supercomputing - Use Cases, Advances, The Future (1/2)

    CERN Document Server

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the first day, we will focus on the history and theory of supercomputing, the top500 list and the hardware that makes supercomputers tick. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and initiated projects with the private sector (e.g. HP an...

  1. Integration of PanDA workload management system with Titan supercomputer at OLCF

    Science.gov (United States)

    De, K.; Klimentov, A.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.

    2015-12-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the future LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multicore worker nodes. It also gives PanDA a new capability to collect, in real time, information about unused worker nodes on Titan, which allows precise definition of the size and duration of jobs submitted to Titan according to available free resources. This capability significantly reduces PanDA job wait time while improving Titan's utilization efficiency. This implementation was tested with a variety of Monte-Carlo workloads on Titan and is being tested on several other supercomputing platforms. Notice: This manuscript has been authored by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

  2. Confronting the coral reef crisis

    Science.gov (United States)

    Bellwood, D. R.; Hughes, T. P.; Folke, C.; Nyström, M.

    2004-06-01

    The worldwide decline of coral reefs calls for an urgent reassessment of current management practices. Confronting large-scale crises requires a major scaling-up of management efforts based on an improved understanding of the ecological processes that underlie reef resilience. Managing for improved resilience, incorporating the role of human activity in shaping ecosystems, provides a basis for coping with uncertainty, future changes and ecological surprises. Here we review the ecological roles of critical functional groups (for both corals and reef fishes) that are fundamental to understanding resilience and avoiding phase shifts from coral dominance to less desirable, degraded ecosystems. We identify striking biogeographic differences in the species richness and composition of functional groups, which highlight the vulnerability of Caribbean reef ecosystems. These findings have profound implications for restoration of degraded reefs, management of fisheries, and the focus on marine protected areas and biodiversity hotspots as priorities for conservation.

  3. Education confronts the energy dilemma

    Energy Technology Data Exchange (ETDEWEB)

    None

    1977-01-01

    The conference was convened to present a role that America's schools could play in solving or coping with the energy crisis. Eleven sessions were conducted to fulfill this concern: Our Energy Crisis and Education: A Critical Assessment; The Energy Agenda at the Office of Education; Energy Resources: Scenarios for the Future; The Moral Dilemma of Energy Education; Constraints Influencing Education's Role; Energy Education: What's Been Done to Date; Practitioners Discuss Their Future Roles, Responsibilities; Politics of Energy Education; Confronting the Energy Dilemma; The Meaning of Scarcity; and The Impact of the Carter Energy Program on American Schools. Summary reports and reactions to the conference conclude the proceedings. (MCW)

  4. Confronting the stigma of epilepsy

    Directory of Open Access Journals (Sweden)

    Sanjeev V Thomas

    2011-01-01

    Full Text Available Stigma and resultant psychosocial issues are major hurdles that people with epilepsy confront in their daily life. People with epilepsy, particularly women, living in economically weak countries are often ill equipped to handle the stigma that they experience at multiple levels. This paper offers a systematic review of the research on stigma from sociology and social psychology and details how stigma linked to epilepsy or similar conditions can result in stereotyping, prejudice and discrimination. We also briefly discuss the strategies that are most commonly utilized to mitigate stigma. Neurologists and other health care providers, social workers, support groups and policy makers working with epilepsy need to have a deep understanding of the social and cultural perceptions of epilepsy and the related stigma. It is necessary that societies establish unique determinants of stigma and set up appropriate strategies to mitigate stigma and facilitate the complete inclusion of people with epilepsy as well as mitigating any existing discrimination.

  5. An integrated distributed processing interface for supercomputers and workstations

    Energy Technology Data Exchange (ETDEWEB)

    Campbell, J.; McGavran, L.

    1989-01-01

    Access to documentation, communication between multiple processes running on heterogeneous computers, and animation of simulations of engineering problems are typically weak in most supercomputer environments. This presentation will describe how we are improving this situation in the Computer Research and Applications group at Los Alamos National Laboratory. We have developed a tool using UNIX filters and a SunView interface that allows users simple access to documentation via mouse-driven menus. We have also developed a distributed application that integrates a two-point boundary value problem on one of our Cray supercomputers. It is controlled and displayed graphically by a window interface running on a workstation screen. Our motivation for this research has been to improve the usual typewriter/static interface using language-independent controls to show the capabilities of the workstation/supercomputer combination. 8 refs.

  6. Supercomputers and biological sequence comparison algorithms.

    Science.gov (United States)

    Core, N G; Edmiston, E W; Saltz, J H; Smith, R M

    1989-12-01

    Comparison of biological (DNA or protein) sequences provides insight into molecular structure, function, and homology and is increasingly important as the available databases become larger and more numerous. One method of increasing the speed of the calculations is to perform them in parallel. We present the results of initial investigations using two dynamic programming algorithms on the Intel iPSC hypercube and the Connection Machine as well as an inexpensive, heuristically-based algorithm on the Encore Multimax.
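
    The dynamic-programming recurrence at the heart of such sequence comparisons can be sketched serially as follows. This is a generic global-alignment score, not the paper's parallel hypercube or Connection Machine code; the scoring parameters are arbitrary. Cells on one anti-diagonal of the table depend only on earlier anti-diagonals, which is what makes recurrences of this kind amenable to parallel machines.

```python
# Minimal serial sketch of the dynamic-programming recurrence underlying
# sequence comparison (global alignment score, Needleman-Wunsch style).
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap                 # cost of aligning a prefix to gaps
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
    return score[-1][-1]

print(alignment_score("GATTACA", "GCATGCU"))
```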

  7. Integration of PanDA workload management system with Titan supercomputer at OLCF

    CERN Document Server

    Panitkin, Sergey; The ATLAS collaboration; Klimentov, Alexei; Oleynik, Danila; Petrosyan, Artem; Schovancova, Jaroslava; Vaniachine, Alexandre; Wenaus, Torre

    2015-01-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently uses more than 100,000 cores at well over 100 Grid sites with a peak performance of 0.3 petaFLOPS, the next LHC data taking run will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. It also gives PanDA a new capability to collect, in real tim...

  8. Integration of PanDA workload management system with Titan supercomputer at OLCF

    CERN Document Server

    De, Kaushik; Oleynik, Danila; Panitkin, Sergey; Petrosyan, Artem; Vaniachine, Alexandre; Wenaus, Torre; Schovancova, Jaroslava

    2015-01-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the next LHC data taking run will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single-threaded workloads in parallel on Titan's multi-core worker nodes. It also gives PanDA a new capability to collect, in real time, information about unused...

  9. OpenMC:Towards Simplifying Programming for TianHe Supercomputers

    Institute of Scientific and Technical Information of China (English)

    廖湘科; 杨灿群; 唐滔; 易会战; 王锋; 吴强; 薛京灵

    2014-01-01

    Modern petascale and future exascale systems are massively heterogeneous architectures. Developing productive intra-node programming models is crucial toward addressing their programming challenge. We introduce a directive-based intra-node programming model, OpenMC, and show that this new model can achieve ease of programming, high performance, and the degree of portability desired for heterogeneous nodes, especially those in TianHe supercomputers. While existing models are geared towards offloading computations to accelerators (typically one), OpenMC aims to more uniformly and adequately exploit the potential offered by multiple CPUs and accelerators in a compute node. OpenMC achieves this by providing a unified abstraction of hardware resources as workers and facilitating the exploitation of asynchronous task parallelism on the workers. We present an overview of OpenMC, a prototyping implementation, and results from some initial comparisons with OpenMP and hand-written code in developing six applications on two types of nodes from TianHe supercomputers.

  10. Unique Methodologies for Nano/Micro Manufacturing Job Training Via Desktop Supercomputer Modeling and Simulation

    Energy Technology Data Exchange (ETDEWEB)

    Kimball, Clyde [Northern Illinois Univ., DeKalb, IL (United States); Karonis, Nicholas [Northern Illinois Univ., DeKalb, IL (United States); Lurio, Laurence [Northern Illinois Univ., DeKalb, IL (United States); Piot, Philippe [Northern Illinois Univ., DeKalb, IL (United States); Xiao, Zhili [Northern Illinois Univ., DeKalb, IL (United States); Glatz, Andreas [Northern Illinois Univ., DeKalb, IL (United States); Pohlman, Nicholas [Northern Illinois Univ., DeKalb, IL (United States); Hou, Minmei [Northern Illinois Univ., DeKalb, IL (United States); Demir, Veysel [Northern Illinois Univ., DeKalb, IL (United States); Song, Jie [Northern Illinois Univ., DeKalb, IL (United States); Duffin, Kirk [Northern Illinois Univ., DeKalb, IL (United States); Johns, Mitrick [Northern Illinois Univ., DeKalb, IL (United States); Sims, Thomas [Northern Illinois Univ., DeKalb, IL (United States); Yin, Yanbin [Northern Illinois Univ., DeKalb, IL (United States)

    2012-11-21

    This project establishes an initiative in high speed (Teraflop)/large-memory desktop supercomputing for modeling and simulation of dynamic processes important for energy and industrial applications. It provides a training ground for employment of current students in an emerging field with skills necessary to access the large supercomputing systems now present at DOE laboratories. It also provides a foundation for NIU faculty to quantum leap beyond their current small cluster facilities. The funding extends faculty and student capability to a new level of analytic skills with concomitant publication avenues. The components of the Hewlett Packard computer obtained by the DOE funds create a hybrid combination of a Graphics Processing System (12 GPU/Teraflops) and a Beowulf CPU system (144 CPU), the first expandable via the NIU GAEA system to ~60 Teraflops integrated with a 720 CPU Beowulf system. The software is based on access to the NVIDIA/CUDA library and the ability through MATLAB multiple licenses to create additional local programs. A number of existing programs are being transferred to the CPU Beowulf Cluster. Since the expertise necessary to create the parallel processing applications has recently been obtained at NIU, this effort for software development is in an early stage. The educational program has been initiated via formal tutorials and classroom curricula designed for the coming year. Specifically, the cost focus was on hardware acquisitions and appointment of graduate students for a wide range of applications in engineering, physics and computer science.

  11. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; D' Azevedo, Eduardo [ORNL; Philip, Bobby [ORNL; Worley, Patrick H [ORNL

    2016-01-01

    On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.
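
    The effect of a communication-aware mapping can be illustrated with a toy cost model and a simple greedy heuristic. The sketch below only conveys the general idea of combining a communication matrix with node distances; it is not the spectral bisection or neighbor-join-tree methods developed in this work, and the random inputs are placeholders.

```python
# Toy task-mapping sketch: tasks with the highest total traffic are assigned to
# the most "central" nodes, and the mapping is scored as bytes x hops.
import numpy as np

def greedy_mapping(comm_volume, hop_distance):
    """Return mapping[task] = node, pairing heavy communicators with near nodes."""
    n = comm_volume.shape[0]
    task_order = np.argsort(-comm_volume.sum(axis=1))   # heaviest tasks first
    node_order = np.argsort(hop_distance.sum(axis=1))   # most central nodes first
    mapping = np.empty(n, dtype=int)
    mapping[task_order] = node_order
    return mapping

def mapping_cost(comm_volume, hop_distance, mapping):
    """Total communication volume weighted by hop distance under a mapping."""
    return float((comm_volume * hop_distance[np.ix_(mapping, mapping)]).sum())

rng = np.random.default_rng(0)
n = 16
comm = rng.integers(0, 100, size=(n, n)); comm = (comm + comm.T) // 2
dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])   # 1-D chain of nodes
identity = np.arange(n)
print("default cost:", mapping_cost(comm, dist, identity))
print("greedy  cost:", mapping_cost(comm, dist, greedy_mapping(comm, dist)))
```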

  12. Argonne Leadership Computing Facility 2011 annual report : Shaping future supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Papka, M.; Messina, P.; Coffey, R.; Drugan, C. (LCF)

    2012-08-16

    The ALCF's Early Science Program aims to prepare key applications for the architecture and scale of Mira and to solidify libraries and infrastructure that will pave the way for other future production applications. Two billion core-hours have been allocated to 16 Early Science projects on Mira. The projects, in addition to promising delivery of exciting new science, are all based on state-of-the-art, petascale, parallel applications. The project teams, in collaboration with ALCF staff and IBM, have undertaken intensive efforts to adapt their software to take advantage of Mira's Blue Gene/Q architecture, which, in a number of ways, is a precursor to future high-performance-computing architecture. The Argonne Leadership Computing Facility (ALCF) enables transformative science that solves some of the most difficult challenges in biology, chemistry, energy, climate, materials, physics, and other scientific realms. Users partnering with ALCF staff have reached research milestones previously unattainable, due to the ALCF's world-class supercomputing resources and expertise in computational science. In 2011, the ALCF's commitment to providing outstanding science and leadership-class resources was honored with several prestigious awards. Research on multiscale brain blood flow simulations was named a Gordon Bell Prize finalist. Intrepid, the ALCF's BG/P system, ranked No. 1 on the Graph 500 list for the second consecutive year. The next-generation BG/Q prototype again topped the Green500 list. Skilled experts at the ALCF enable researchers to conduct breakthrough science on the Blue Gene system in key ways. The Catalyst Team matches project PIs with experienced computational scientists to maximize and accelerate research in their specific scientific domains. The Performance Engineering Team facilitates the effective use of applications on the Blue Gene system by assessing and improving the algorithms used by applications and the techniques used to

  13. PFLOTRAN: Reactive Flow & Transport Code for Use on Laptops to Leadership-Class Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, Glenn E.; Lichtner, Peter C.; Lu, Chuan; Mills, Richard T.

    2012-04-18

    PFLOTRAN, a next-generation reactive flow and transport code for modeling subsurface processes, has been designed from the ground up to run efficiently on machines ranging from leadership-class supercomputers to laptops. Based on an object-oriented design, the code is easily extensible to incorporate additional processes. It can interface seamlessly with Fortran 9X, C and C++ codes. Domain decomposition parallelism is employed, with the PETSc parallel framework used to manage parallel solvers, data structures and communication. Features of the code include a modular input file, implementation of high-performance I/O using parallel HDF5, ability to perform multiple realization simulations with multiple processors per realization in a seamless manner, and multiple modes for multiphase flow and multicomponent geochemical transport. Chemical reactions currently implemented in the code include homogeneous aqueous complexing reactions and heterogeneous mineral precipitation/dissolution, ion exchange, surface complexation and a multirate kinetic sorption model. PFLOTRAN has demonstrated petascale performance using 2^17 processor cores with over 2 billion degrees of freedom. Accomplishments achieved to date include applications to the Hanford 300 Area and modeling CO2 sequestration in deep geologic formations.
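
    PFLOTRAN itself is Fortran/PETSc code; the following Python sketch (using mpi4py and an MPI-enabled build of h5py) only illustrates the parallel-HDF5 output pattern mentioned above, in which every rank writes its own slab of a global field into one shared file. The file name, dataset name and sizes are invented for the example.

        from mpi4py import MPI
        import h5py                      # must be built against parallel HDF5 for driver="mpio"
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, nprocs = comm.Get_rank(), comm.Get_size()

        n_local = 1000                   # degrees of freedom owned by this rank
        n_global = n_local * nprocs
        local = np.full(n_local, float(rank))   # stand-in for this rank's solution slab

        # Collective open: all ranks participate; each then writes its own hyperslab.
        with h5py.File("pressure.h5", "w", driver="mpio", comm=comm) as f:
            dset = f.create_dataset("pressure", (n_global,), dtype="f8")
            start = rank * n_local
            dset[start:start + n_local] = local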

  14. Confronting shibboleths of dental education.

    Science.gov (United States)

    Masella, Richard S

    2005-10-01

    Shibboleths are common expressions presented as indisputable truths. When used in educational discussions, they reflect "motherhood and apple pie" viewpoints and tend to bring debate to a halt. Use of shibboleths may precede a desired imposition of "locksteps" in educational programming and are easily perceived as paternalistic by recipients. Nine shibboleths are presented as common beliefs of dental faculty and administrators. Evidence contradicting the veracity of the "obvious truths" is offered. The traditional "splendid isolation" of dentistry contributes to parochialism and belief in false shibboleths. Sound principles of higher and health professions education, student learning, and dental practice apply to dental education as to all health disciplines. Student passivity in dental education is not the best preparation for proficiency in dental practice. The master teacher possesses a repertoire of methodologies specific to meeting defined educational objectives. Active learning experiences bear close resemblances to professional duties and responsibilities and internally motivate future doctors of dental medicine. The difficulty in achieving curricular change leads to curricular entrenchment. Dentistry and dental education should not trade their ethical high ground for the relatively low ethical standards of the business world. Principles of professional ethics should govern relationships between dentists, whether within the dental school workplace or in practice. Suggestions are made on how to confront shibboleths in dental school settings.

  15. Recent results from the Swinburne supercomputer software correlator

    Science.gov (United States)

    Tingay, Steven; et al.

    I will describe the development of software correlators on the Swinburne Beowulf supercomputer and recent work using the Cray XD-1 machine. I will also describe recent Australian and global VLBI experiments that have been processed on the Swinburne software correlator, along with imaging results from these data. The role of the software correlator in Australia's eVLBI project will be discussed.

  16. Flux-Level Transit Injection Experiments with NASA Pleiades Supercomputer

    Science.gov (United States)

    Li, Jie; Burke, Christopher J.; Catanzarite, Joseph; Seader, Shawn; Haas, Michael R.; Batalha, Natalie; Henze, Christopher; Christiansen, Jessie; Kepler Project, NASA Advanced Supercomputing Division

    2016-06-01

    Flux-Level Transit Injection (FLTI) experiments are executed with NASA's Pleiades supercomputer for the Kepler Mission. The latest release (9.3, January 2016) of the Kepler Science Operations Center Pipeline is used in the FLTI experiments. Their purpose is to validate the Analytic Completeness Model (ACM), which can be computed for all Kepler target stars, thereby enabling exoplanet occurrence rate studies. Pleiades, a facility of NASA's Advanced Supercomputing Division, is one of the world's most powerful supercomputers and represents NASA's state-of-the-art technology. We discuss the details of implementing the FLTI experiments on the Pleiades supercomputer. For example, taking into account that ~16 injections are generated by one core of the Pleiades processors in an hour, the “shallow” FLTI experiment, in which ~2000 injections are required per target star, can be done for 16% of all Kepler target stars in about 200 hours. Stripping down the transit search to bare bones, i.e. only searching adjacent high/low periods at high/low pulse durations, makes the computationally intensive FLTI experiments affordable. The design of the FLTI experiments and the analysis of the resulting data are presented in “Validating an Analytic Completeness Model for Kepler Target Stars Based on Flux-level Transit Injection Experiments” by Catanzarite et al. (#2494058). Kepler was selected as the 10th mission of the Discovery Program. Funding for the Kepler Mission has been provided by the NASA Science Mission Directorate.
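
    The quoted throughput can be checked with a short back-of-the-envelope calculation. The abstract does not state the total number of Kepler target stars, so the ~200,000 figure below is only an assumed round number used to illustrate the scaling; the other rates (~16 injections per core-hour, ~2000 injections per star, 16% of targets, ~200 hours) are taken from the text.

        injections_per_core_hour = 16
        injections_per_star = 2000
        fraction_of_targets = 0.16
        wall_clock_hours = 200
        kepler_targets = 200_000            # assumed total; not stated in the abstract

        stars = kepler_targets * fraction_of_targets                      # 32,000 stars
        core_hours = stars * injections_per_star / injections_per_core_hour
        cores_needed = core_hours / wall_clock_hours
        print(f"{core_hours:.2e} core-hours -> ~{cores_needed:.0f} cores busy for {wall_clock_hours} h")
        # 4.00e+06 core-hours -> ~20000 cores busy for 200 h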

  17. Access to Supercomputers. Higher Education Panel Report 69.

    Science.gov (United States)

    Holmstrom, Engin Inel

    This survey was conducted to provide the National Science Foundation with baseline information on current computer use in the nation's major research universities, including the actual and potential use of supercomputers. Questionnaires were sent to 207 doctorate-granting institutions; after follow-ups, 167 institutions (91% of the institutions…

  18. The Sky's the Limit When Super Students Meet Supercomputers.

    Science.gov (United States)

    Trotter, Andrew

    1991-01-01

    In a few select high schools in the U.S., supercomputers are allowing talented students to attempt sophisticated research projects using simultaneous simulations of nature, culture, and technology not achievable by ordinary microcomputers. Schools can get their students online by entering contests and seeking grants and partnerships with…

  19. A Framework for HI Spectral Source Finding Using Distributed-Memory Supercomputing

    CERN Document Server

    Westerlund, Stefan

    2014-01-01

    The latest generation of radio astronomy interferometers will conduct all sky surveys with data products consisting of petabytes of spectral line data. Traditional approaches to identifying and parameterising the astrophysical sources within this data will not scale to datasets of this magnitude, since the performance of workstations will not keep up with the real-time generation of data. For this reason, it is necessary to employ high performance computing systems consisting of a large number of processors connected by a high-bandwidth network. In order to make use of such supercomputers substantial modifications must be made to serial source finding code. To ease the transition, this work presents the Scalable Source Finder Framework, a framework providing storage access, networking communication and data composition functionality, which can support a wide range of source finding algorithms provided they can be applied to subsets of the entire image. Additionally, the Parallel Gaussian Source Finder was imp...

  20. Supercomputations and big-data analysis in strong-field ultrafast optical physics: filamentation of high-peak-power ultrashort laser pulses

    Science.gov (United States)

    Voronin, A. A.; Panchenko, V. Ya; Zheltikov, A. M.

    2016-06-01

    High-intensity ultrashort laser pulses propagating in gas media or in condensed matter undergo complex nonlinear spatiotemporal evolution where temporal transformations of optical field waveforms are strongly coupled to intricate beam dynamics and ultrafast field-induced ionization processes. At the level of laser peak powers orders of magnitude above the critical power of self-focusing, the beam exhibits modulation instabilities, producing random field hot spots and breaking up into multiple noise-seeded filaments. This problem is described by a (3 + 1)-dimensional nonlinear field evolution equation, which needs to be solved jointly with the equation for ultrafast ionization of a medium. Analysis of this problem, which is equivalent to solving a billion-dimensional evolution problem, is only possible by means of supercomputer simulations augmented with coordinated big-data processing of large volumes of information acquired through theory-guiding experiments and supercomputations. Here, we review the main challenges of supercomputations and big-data processing encountered in strong-field ultrafast optical physics and discuss strategies to confront these challenges.

  1. Development of parallel mathematical subroutine library and large scale numerical simulation

    Energy Technology Data Exchange (ETDEWEB)

    Shimizu, Futoshi [Japan Atomic Energy Research Inst., Tokyo (Japan)

    1998-03-01

    In recent years, parallel computers, namely parallel supercomputers and workstation clusters, have come into use for large-scale numerical simulations in the field of computational science. Because parallel programming is more difficult than programming serial computers, efficient parallel programs can more readily be developed by incorporating parallelized numerical subroutines. In the Japan Atomic Energy Research Institute (JAERI), a portable mathematical subroutine library using MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) is being developed for distributed-memory parallel computers. In this report, we present an overview of the parallel library and its application to the parallelization of tight-binding molecular dynamics. (author)
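
    As a flavor of the kind of parallelized numerical subroutine such a library provides, here is a minimal mpi4py example of a distributed dot product over block-partitioned vectors. It is an illustrative sketch only, not code from the JAERI library, and the vector sizes are arbitrary.

        from mpi4py import MPI
        import numpy as np

        def pdot(x_local, y_local, comm=MPI.COMM_WORLD):
            """Global dot product of vectors whose blocks are distributed across ranks."""
            return comm.allreduce(np.dot(x_local, y_local), op=MPI.SUM)

        if __name__ == "__main__":
            comm = MPI.COMM_WORLD
            x = np.full(4, 1.0)              # this rank's block of x
            y = np.full(4, 2.0)              # this rank's block of y
            result = pdot(x, y, comm)        # = 8 * number of ranks
            if comm.Get_rank() == 0:
                print(result)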

  2. The QCDOC supercomputer: hardware, software, and performance

    CERN Document Server

    Boyle, P A; Wettig, T

    2003-01-01

    An overview is given of the QCDOC architecture, a massively parallel and highly scalable computer optimized for lattice QCD using system-on-a-chip technology. The heart of a single node is the PowerPC-based QCDOC ASIC, developed in collaboration with IBM Research, with a peak speed of 1 GFlop/s. The nodes communicate via high-speed serial links in a 6-dimensional mesh with nearest-neighbor connections. We find that highly optimized four-dimensional QCD code obtains over 50% efficiency in cycle accurate simulations of QCDOC, even for problems of fixed computational difficulty run on tens of thousands of nodes. We also provide an overview of the QCDOC operating system, which manages and runs QCDOC applications on partitions of variable dimensionality. Finally, the SciDAC activity for QCDOC and the message-passing interface QMP specified as a part of the SciDAC effort are discussed for QCDOC. We explain how to make optimal use of QMP routines on QCDOC in conjunction with existing C and C++ lattice QCD codes, inc...

  3. Design considerations for parallel graphics libraries

    Science.gov (United States)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  4. Cell-based Adaptive Mesh Refinement on the GPU with Applications to Exascale Supercomputing

    Science.gov (United States)

    Trujillo, Dennis; Robey, Robert; Davis, Neal; Nicholaeff, David

    2011-10-01

    We present an OpenCL implementation of a cell-based adaptive mesh refinement (AMR) scheme for the shallow water equations. The challenges associated with ensuring locality in the algorithm architecture, so as to fully exploit the massive number of parallel threads on the GPU, are discussed. This includes a proof of concept that a cell-based AMR code can be effectively implemented, even on a small scale, in the memory and threading model provided by OpenCL. Additionally, the program requires dynamic memory in order to properly implement the mesh; as this is not supported in the OpenCL 1.1 standard, a combination of CPU memory management and GPU computation effectively implements a dynamic memory allocation scheme. Load balancing is achieved through a new stencil-based implementation of a space-filling curve, eliminating the need for a complete recalculation of the indexing on the mesh. A Cartesian grid hash table scheme to allow fast parallel neighbor accesses is also discussed. Finally, the relative speedup of the GPU-enabled AMR code is compared to the original serial version. We conclude that parallelization using the GPU provides significant speedup for typical numerical applications and is feasible for scientific applications in the next generation of supercomputing.
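
    For orientation, the sketch below shows a generic Z-order (Morton) index for 2-D cells, the standard way a space-filling curve is used for load balancing: cells sorted by this index stay spatially clustered, so contiguous chunks of the sorted list form reasonable work assignments. The paper's own stencil-based formulation avoids recomputing the full indexing; this bit-interleaving version is only the textbook variant.

        def morton2d(i, j, bits=16):
            """Interleave the bits of cell coordinates (i, j) into a 1-D curve index."""
            code = 0
            for b in range(bits):
                code |= ((i >> b) & 1) << (2 * b)        # even bit positions from i
                code |= ((j >> b) & 1) << (2 * b + 1)    # odd bit positions from j
            return code

        # Cells sorted by Morton index stay spatially clustered, so contiguous chunks
        # of the sorted list make reasonable per-device work assignments.
        cells = [(i, j) for i in range(4) for j in range(4)]
        print(sorted(cells, key=lambda c: morton2d(*c)))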

  5. A user-friendly web portal for T-Coffee on supercomputers

    Directory of Open Access Journals (Sweden)

    Koetsier Jos

    2011-05-01

    Background Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of T-Coffee that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.

  6. A user-friendly web portal for T-Coffee on supercomputers.

    Science.gov (United States)

    Rius, Josep; Cores, Fernando; Solsona, Francesc; van Hemert, Jano I; Koetsier, Jos; Notredame, Cedric

    2011-05-12

    Parallel T-Coffee (PTC) was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of large-scale sequence alignments. It can be run on distributed memory clusters, allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. In this paper we show how PTC can be easily deployed and controlled on a supercomputer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications, and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of T-Coffee that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.

  7. Integration of Titan supercomputer at OLCF with ATLAS production system

    CERN Document Server

    Panitkin, Sergey; The ATLAS collaboration

    2016-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA-managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this talk we will describe a project aimed at the integration of the ATLAS Production System with the Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA Pilot framework for job...

  8. Supercomputers ready for use as discovery machines for neuroscience

    OpenAIRE

    Kunkel, Susanne; Schmidt, Maximilian; Helias, Moritz; Eppler, Jochen Martin; Igarashi, Jun; Masumoto, Gen; Fukai, Tomoki; Ishii, Shin; Plesser, Hans Ekkehard; Morrison, Abigail; Diesmann, Markus

    2013-01-01

    NEST is a widely used tool to simulate biological spiking neural networks [1]. The simulator is subject to continuous development, which is driven by the requirements of the current neuroscientific questions. At present, a major part of the software development focuses on the improvement of the simulator's fundamental data structures in order to enable brain-scale simulations on supercomputers such as the Blue Gene system in Jülich and the K computer in Kobe. Based on our memory-u...

  9. Scientists turn to supercomputers for knowledge about universe

    CERN Multimedia

    White, G

    2003-01-01

    The DOE is funding the computers at the Center for Astrophysical Thermonuclear Flashes which is based at the University of Chicago and uses supercomputers at the nation's weapons labs to study explosions in and on certain stars. The DOE is picking up the project's bill in the hope that the work will help the agency learn to better simulate the blasts of nuclear warheads (1 page).

  10. Study of ATLAS TRT performance with GRID and supercomputers

    Science.gov (United States)

    Krasnopevtsev, D. V.; Klimentov, A. A.; Mashinistov, R. Yu.; Belyaev, N. L.; Ryabinkin, E. A.

    2016-09-01

    One of the most important tasks in ATLAS physics analysis is the reconstruction of proton-proton events with a large number of interactions in the Transition Radiation Tracker. The paper includes Transition Radiation Tracker performance results obtained using the ATLAS GRID and the Kurchatov Institute's Data Processing Center, including its Tier-1 grid site and supercomputer, as well as an analysis of CPU efficiency during these studies.

  11. Calculation of Free Energy Landscape in Multi-Dimensions with Hamiltonian-Exchange Umbrella Sampling on Petascale Supercomputer.

    Science.gov (United States)

    Jiang, Wei; Luo, Yun; Maragliano, Luca; Roux, Benoît

    2012-11-13

    An extremely scalable computational strategy is described for calculations of the potential of mean force (PMF) in multidimensions on massively distributed supercomputers. The approach involves coupling thousands of umbrella sampling (US) simulation windows distributed to cover the space of order parameters with a Hamiltonian molecular dynamics replica-exchange (H-REMD) algorithm to enhance the sampling of each simulation. In the present application, US/H-REMD is carried out in a two-dimensional (2D) space and exchanges are attempted alternately along the two axes corresponding to the two order parameters. The US/H-REMD strategy is implemented on the basis of a parallel/parallel multiple copy protocol at the MPI level, and therefore can fully exploit the computing power of large-scale supercomputers. Here the novel technique is illustrated using the leadership supercomputer IBM Blue Gene/P with an application to a typical biomolecular calculation of general interest, namely the binding of calcium ions to the small protein Calbindin D9k. The free energy landscape associated with two order parameters, the distance between the ion and its binding pocket and the root-mean-square deviation (rmsd) of the binding pocket relative to the crystal structure, was calculated using the US/H-REMD method. The results are then used to estimate the absolute binding free energy of the calcium ion to Calbindin D9k. The tests demonstrate that the 2D US/H-REMD scheme greatly accelerates the configurational sampling of the binding pocket, thereby improving the convergence of the potential of mean force calculation.
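
    The elementary step of the US/H-REMD scheme described above is a Metropolis swap attempt between two neighboring umbrella windows. The following Python sketch shows that acceptance test for harmonic biases on a one-dimensional order parameter; the window centers, force constant and temperature are arbitrary example values, and the production implementation of course runs at the MPI level inside the MD engine rather than in Python.

        import math
        import random

        def attempt_swap(x_i, x_j, bias_i, bias_j, beta):
            """Metropolis test: bias_k(x) is window k's umbrella potential at configuration x."""
            delta = beta * ((bias_i(x_j) + bias_j(x_i)) - (bias_i(x_i) + bias_j(x_j)))
            if delta <= 0.0 or random.random() < math.exp(-delta):
                return x_j, x_i              # accepted: the two windows swap configurations
            return x_i, x_j                  # rejected: each window keeps its configuration

        # Example: harmonic umbrella biases on a 1-D distance order parameter.
        k = 10.0                             # force constant (arbitrary units)
        beta = 1.0 / 0.596                   # roughly 1/kT at 300 K in kcal/mol
        bias_a = lambda x: 0.5 * k * (x - 3.0) ** 2      # window centered at 3.0
        bias_b = lambda x: 0.5 * k * (x - 3.5) ** 2      # neighboring window at 3.5
        print(attempt_swap(3.1, 3.4, bias_a, bias_b, beta))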

  12. From Thread to Transcontinental Computer: Disturbing Lessons in Distributed Supercomputing

    CERN Document Server

    Groen, Derek

    2015-01-01

    We describe the political and technical complications encountered during the astronomical CosmoGrid project. CosmoGrid is a numerical study on the formation of large scale structure in the universe. The simulations are challenging due to the enormous dynamic range in spatial and temporal coordinates, as well as the enormous computer resources required. In CosmoGrid we dealt with the computational requirements by connecting up to four supercomputers via an optical network and making them operate as a single machine. This was challenging, if only for the fact that the supercomputers of our choice are separated by half the planet: three of them are scattered across Europe and the fourth one is in Tokyo. The co-scheduling of multiple computers and the 'gridification' of the code enabled us to achieve an efficiency of up to 93% for this distributed intercontinental supercomputer. In this work, we find that high-performance computing on a grid can be done much more effectively if the sites involved are will...

  13. Extracting the Textual and Temporal Structure of Supercomputing Logs

    Energy Technology Data Exchange (ETDEWEB)

    Jain, S; Singh, I; Chandra, A; Zhang, Z; Bronevetsky, G

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format makes it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.
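
    A toy version of the core idea above, in Python: reduce each log message to a syntactic template by masking variable fields (numbers, addresses), then group messages that share a template. The paper's online clustering and temporal-correlation analysis are considerably more sophisticated, and the sample log lines below are invented.

        import re
        from collections import defaultdict

        def template(message):
            """Mask variable fields so messages with the same structure map to one template."""
            msg = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", message)   # addresses
            msg = re.sub(r"\d+", "<NUM>", msg)                  # counters, node ids
            return msg

        logs = [
            "node 1423 link error count 7",
            "node 88 link error count 2",
            "kernel panic at 0x3fa2c000",
        ]

        groups = defaultdict(list)
        for line in logs:
            groups[template(line)].append(line)

        for tpl, members in groups.items():
            print(f"{len(members):2d}  {tpl}")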

  14. An efficient parallel algorithm for matrix-vector multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
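
    For contrast with the paper's hypercube-oriented two-dimensional decomposition (which is what yields the O(n/√p + log(p)) communication cost), the sketch below shows the simpler one-dimensional row-block matrix-vector product with mpi4py, where each rank gathers the full vector. It is meant only to make the kernel concrete under those simplifying assumptions, not to reproduce the paper's algorithm.

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, p = comm.Get_rank(), comm.Get_size()

        n = 8 * p                            # global size, divisible by p for simplicity
        rows = n // p
        A_local = np.random.rand(rows, n)    # this rank's block of matrix rows
        x_local = np.random.rand(rows)       # this rank's block of the input vector

        # Every rank needs the full vector: O(n) words gathered here, versus
        # O(n/sqrt(p)) per rank in the 2-D hypercube scheme of the paper.
        x = np.empty(n)
        comm.Allgather(x_local, x)

        y_local = A_local @ x                # this rank's rows of y = A x
        if rank == 0:
            print(y_local.shape)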

  15. Efficient development of memory bounded geo-applications to scale on modern supercomputers

    Science.gov (United States)

    Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric

    2016-04-01

    Numerical modeling is a key tool in the geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extent of the investigated domain can vary strongly in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and therefore permit very high-resolution simulations. We propose an efficient approach to solve memory-bounded real-world applications on modern supercomputer architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problems: we have developed a Stokes solver for glacier flow and a poromechanical solver, including complex rheologies, for nonlinear waves in stressed porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks, and the coupled poromechanical solver explains previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases
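
    A minimal serial sketch of the numerical core described above: a regular Cartesian grid, a finite-difference residual, and damped (pseudo-transient) residual updates, here for a plain Laplace problem with made-up grid size and damping parameters. The production solvers wrap the same loop in per-GPU kernels and point-to-point MPI halo exchanges.

        import numpy as np

        nx, ny = 128, 128
        dx = 1.0 / (nx - 1)
        p = np.zeros((nx, ny))               # unknown field (e.g. a pressure)
        p[0, :] = 1.0                        # Dirichlet boundary value on one edge
        dpdt = np.zeros((nx - 2, ny - 2))    # damped residual ("pseudo-velocity")
        damp, dtau = 0.9, 0.25 * dx**2       # damping factor and pseudo-time step (assumed values)

        for it in range(20_000):
            lap = (p[2:, 1:-1] + p[:-2, 1:-1] + p[1:-1, 2:] + p[1:-1, :-2]
                   - 4.0 * p[1:-1, 1:-1]) / dx**2          # finite-difference residual
            dpdt = damp * dpdt + lap                        # damped ("preconditioned") update
            p[1:-1, 1:-1] += dtau * dpdt
            if np.max(np.abs(lap)) * dx**2 < 1e-8:          # converged to the discrete solution
                break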

  16. Benchmarking Further Single Board Computers for Building a Mini Supercomputer for Simulation of Telecommunication Systems

    Directory of Open Access Journals (Sweden)

    Gábor Lencse

    2016-01-01

    Parallel Discrete Event Simulation (PDES) with the conservative synchronization method can be efficiently used for the performance analysis of telecommunication systems because of their good lookahead properties. For PDES, a cost-effective execution platform may be built by using single board computers (SBCs), which offer relatively high computation capacity compared to their price or power consumption, and especially to the space they take up. A benchmarking method is proposed and its operation is demonstrated by benchmarking ten different SBCs, namely Banana Pi, Beaglebone Black, Cubieboard2, Odroid-C1+, Odroid-U3+, Odroid-XU3 Lite, Orange Pi Plus, Radxa Rock Lite, Raspberry Pi Model B+, and Raspberry Pi 2 Model B+. Their benchmarking results are compared to find out which one should be used for building a mini supercomputer for parallel discrete-event simulation of telecommunication systems. The SBCs are also used to build a heterogeneous cluster and the performance of the cluster is tested, too.

  17. Dust modelling and forecasting in the Barcelona Supercomputing Center: Activities and developments

    Energy Technology Data Exchange (ETDEWEB)

    Perez, C; Baldasano, J M; Jimenez-Guerrero, P; Jorba, O; Haustein, K; Basart, S [Earth Sciences Department. Barcelona Supercomputing Center. Barcelona (Spain)]; Cuevas, E [Izaña Atmospheric Research Center. Agencia Estatal de Meteorologia, Tenerife (Spain)]; Nickovic, S [Atmospheric Research and Environment Branch, World Meteorological Organization, Geneva (Switzerland)], E-mail: carlos.perez@bsc.es

    2009-03-01

    The Barcelona Supercomputing Center (BSC) is the National Supercomputer Facility in Spain, hosting MareNostrum, one of the most powerful Supercomputers in Europe. The Earth Sciences Department of BSC operates daily regional dust and air quality forecasts and conducts intensive modelling research for short-term operational prediction. This contribution summarizes the latest developments and current activities in the field of sand and dust storm modelling and forecasting.

  18. Recognizing, Confronting, and Eliminating Workplace Bullying.

    Science.gov (United States)

    Berry, Peggy Ann; Gillespie, Gordon L; Fisher, Bonnie S; Gormley, Denise K

    2016-07-01

    Workplace bullying (WPB) behaviors negatively affect nurse productivity, satisfaction, and retention, and hinder safe patient care. The purpose of this article is to define WPB, differentiate between incivility and WPB, and recommend actions to prevent WPB behaviors. Informed occupational and environmental health nurses and nurse leaders must recognize, confront, and eliminate WPB in their facilities and organizations. Recognizing, confronting, and eliminating WPB behaviors in health care is a crucial first step toward sustained improvements in patient care quality and the health and safety of health care employees. © 2016 The Author(s).

  19. Parallel CFD design on network-based computer

    Science.gov (United States)

    Cheung, Samson

    1995-01-01

    Combining multiple engineering workstations into a network-based heterogeneous parallel computer allows application of aerodynamic optimization with advanced computational fluid dynamics codes, which can be computationally expensive on mainframe supercomputers. This paper introduces a nonlinear quasi-Newton optimizer designed for this network-based heterogeneous parallel computing environment, utilizing software called Parallel Virtual Machine (PVM). The paper also introduces the methodology behind coupling a Parabolized Navier-Stokes flow solver to the nonlinear optimizer. This parallel optimization package is applied to reduce the wave drag of a body of revolution and a wing/body configuration, with resulting drag reductions of 5% to 6%.

  20. Combined Scheduling and Mapping for Scalable Computing with Parallel Tasks

    Directory of Open Access Journals (Sweden)

    Jörg Dümmler

    2012-01-01

    Recent and future parallel clusters and supercomputers use symmetric multiprocessors (SMPs) and multi-core processors as basic nodes, providing a huge amount of parallel resources. These systems often have hierarchically structured interconnection networks combining computing resources at different levels, starting with the interconnect within multi-core processors up to the interconnection network combining nodes of the cluster or supercomputer. The challenge for the programmer is that these computing resources should be utilized efficiently by exploiting the available degree of parallelism of the application program and by structuring the application in a way which is sensitive to the heterogeneous interconnect. In this article, we pursue a parallel programming method using parallel tasks to structure parallel implementations. A parallel task can be executed by multiple processors or cores and, for each activation of a parallel task, the actual number of executing cores can be adapted to the specific execution situation. In particular, we propose a new combined scheduling and mapping technique for parallel tasks with dependencies that takes the hierarchical structure of modern multi-core clusters into account. An experimental evaluation shows that the presented programming approach can lead to a significantly higher performance compared to standard data parallel implementations.

  1. Confronting the Economic Model with the Data

    DEFF Research Database (Denmark)

    Johansen, Søren

    2006-01-01

    Econometrics is about confronting economic models with the data. In doing so it is crucial to choose a statistical model that not only contains the economic model as a submodel, but also contains the data generating process. When this is the case, the statistical model can be analyzed by likeliho...

  2. Macroeconomic Issues Confronting the Next President.

    Science.gov (United States)

    Solow, Robert M.

    1988-01-01

    Identifies economic issues that confronted the United States in the late 1980's and discusses how the president might deal with them. Highlights the following issues: recession, rising price levels, the budget deficit, international trade imbalance, and revival of U.S. long-term growth. (GEA)

  3. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    Directory of Open Access Journals (Sweden)

    Jérôme Frisch

    2015-05-01

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  4. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    KAUST Repository

    Frisch, Jérôme

    2015-05-22

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  5. Massively parallel processing of remotely sensed hyperspectral images

    Science.gov (United States)

    Plaza, Javier; Plaza, Antonio; Valencia, David; Paz, Abel

    2009-08-01

    In this paper, we develop several parallel techniques for hyperspectral image processing that have been specifically designed to be run on massively parallel systems. The techniques developed cover the three relevant areas of hyperspectral image processing: 1) spectral mixture analysis, a popular approach to characterize mixed pixels in hyperspectral data addressed in this work via efficient implementation of a morphological algorithm for automatic identification of pure spectral signatures or endmembers from the input data; 2) supervised classification of hyperspectral data using multi-layer perceptron neural networks with back-propagation learning; and 3) automatic target detection in the hyperspectral data using orthogonal subspace projection concepts. The scalability of the proposed parallel techniques is investigated using Barcelona Supercomputing Center's MareNostrum facility, one of the most powerful supercomputers in Europe.
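
    As an illustration of the third technique listed above, here is a compact numpy version of an orthogonal subspace projection (OSP) detector: project each pixel spectrum onto the orthogonal complement of the background subspace and correlate with the target signature. The array sizes and random data are placeholders, and in the parallel codes each processor would evaluate this on its own block of pixels.

        import numpy as np

        def osp_scores(X, U, d):
            """X: (bands, pixels) image, U: (bands, k) background signatures, d: (bands,) target."""
            bands = X.shape[0]
            P = np.eye(bands) - U @ np.linalg.pinv(U)   # projector onto the complement of span(U)
            return (d @ P @ X) / (d @ P @ d)            # normalized detector output per pixel

        rng = np.random.default_rng(0)
        X = rng.random((50, 1000))                      # 50 bands, 1000 pixels (synthetic)
        U = rng.random((50, 3))                         # 3 background signatures
        d = rng.random(50)                              # target signature
        print(osp_scores(X, U, d).shape)                # (1000,)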

  6. Parallel plasma fluid turbulence calculations

    Energy Technology Data Exchange (ETDEWEB)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-12-31

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  7. A special purpose silicon compiler for designing supercomputing VLSI systems

    Science.gov (United States)

    Venkateswaran, N.; Murugavel, P.; Kamakoti, V.; Shankarraman, M. J.; Rangarajan, S.; Mallikarjun, M.; Karthikeyan, B.; Prabhakar, T. S.; Satish, V.; Venkatasubramaniam, P. R.

    1991-01-01

    Design of general/special purpose supercomputing VLSI systems for numeric algorithm execution involves tackling two important aspects, namely their computational and communication complexities. Development of software tools for designing such systems itself becomes complex. Hence a novel design methodology has to be developed. For designing such complex systems a special purpose silicon compiler is needed in which: the computational and communicational structures of different numeric algorithms should be taken into account to simplify the silicon compiler design, the approach is macrocell based, and the software tools at different levels (algorithm down to the VLSI circuit layout) should be integrated. In this paper a special purpose silicon (SPS) compiler based on PACUBE macrocell VLSI arrays for designing supercomputing VLSI systems is presented. It is shown that turn-around time and silicon real estate are reduced compared with silicon compilers based on PLA's, SLA's, and gate arrays. The first two silicon compiler characteristics mentioned above enable the SPS compiler to perform systolic mapping (at the macrocell level) of algorithms whose computational structures are of GIPOP (generalized inner product outer product) form. Direct systolic mapping on PLA's, SLA's, and gate arrays is very difficult as they are micro-cell based. A novel GIPOP processor is under development using this special purpose silicon compiler.

  8. Solidification in a Supercomputer: From Crystal Nuclei to Dendrite Assemblages

    Science.gov (United States)

    Shibuta, Yasushi; Ohno, Munekazu; Takaki, Tomohiro

    2015-08-01

    Thanks to the recent progress in high-performance computational environments, the range of applications of computational metallurgy is expanding rapidly. In this paper, cutting-edge simulations of solidification from atomic to microstructural levels performed on a graphics processing unit (GPU) architecture are introduced with a brief introduction to advances in computational studies on solidification. In particular, million-atom molecular dynamics simulations captured the spontaneous evolution of anisotropy in a solid nucleus in an undercooled melt and homogeneous nucleation without any inducing factor, which is followed by grain growth. At the microstructural level, the quantitative phase-field model has been gaining importance as a powerful tool for predicting solidification microstructures. In this paper, the convergence behavior of simulation results obtained with this model is discussed, in detail. Such convergence ensures the reliability of results of phase-field simulations. Using the quantitative phase-field model, the competitive growth of dendrite assemblages during the directional solidification of a binary alloy bicrystal at the millimeter scale is examined by performing two- and three-dimensional large-scale simulations by multi-GPU computation on the supercomputer, TSUBAME2.5. This cutting-edge approach using a GPU supercomputer is opening a new phase in computational metallurgy.

  9. The TeraGyroid Experiment – Supercomputing 2003

    Directory of Open Access Journals (Sweden)

    R.J. Blake

    2005-01-01

    Amphiphiles are molecules with hydrophobic tails and hydrophilic heads. When dispersed in solvents, they self assemble into complex mesophases including the beautiful cubic gyroid phase. The goal of the TeraGyroid experiment was to study defect pathways and dynamics in these gyroids. The UK's supercomputing and USA's TeraGrid facilities were coupled together, through a dedicated high-speed network, into a single computational Grid for research work that peaked around the Supercomputing 2003 conference. The gyroids were modeled using lattice Boltzmann methods, with parameter spaces explored using many 128^3 and other grid-point simulations, this data being used to inform the world's largest three-dimensional time-dependent simulation, with 1024^3 grid points. The experiment generated some 2 TBytes of useful data. In terms of Grid technology the project demonstrated the migration of simulations (using Globus middleware) to and fro across the Atlantic, exploiting the availability of resources. Integration of the systems accelerated the time to insight. Distributed visualisation of the output datasets enabled the parameter space of the interactions within the complex fluid to be explored from a number of sites, informed by discourse over the Access Grid. The project was sponsored by EPSRC (UK) and NSF (USA), with trans-Atlantic optical bandwidth provided by British Telecommunications.

  10. Calibrating Building Energy Models Using Supercomputer Trained Machine Learning Agents

    Energy Technology Data Exchange (ETDEWEB)

    Sanyal, Jibonananda [ORNL]; New, Joshua Ryan [ORNL]; Edwards, Richard [ORNL]; Parker, Lynne Edwards [ORNL]

    2014-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus often extends to a few thousand parameters, which have to be calibrated manually by an expert for realistic energy modeling. This makes calibration challenging and expensive, rendering building energy modeling infeasible for smaller projects. In this paper, we describe the Autotune research, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers, and the results are subsequently used to train machine learning algorithms to generate agents. These agents, once created, can then run in a fraction of the time, thereby allowing cost-effective calibration of building models.
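
    A generic sketch of the "agent" idea in Python: fit a fast machine-learning surrogate on a table of (input parameters → simulated output) pairs of the kind the supercomputer runs produce, then use it for cheap evaluations during calibration. The model choice, feature names and synthetic data below are placeholders, not the Autotune project's actual pipeline.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(42)
        # Stand-in for the supercomputer-generated training table:
        # columns: [insulation R-value, window U-factor, infiltration rate]
        X = rng.uniform([10, 0.2, 0.1], [40, 0.6, 1.0], size=(5000, 3))
        y = 1e4 / X[:, 0] + 2e3 * X[:, 1] + 5e2 * X[:, 2] + rng.normal(0, 50, 5000)  # fake annual kWh

        agent = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

        # Once trained, the agent evaluates candidate parameter sets in milliseconds,
        # which is what makes calibration against measured data affordable.
        candidate = np.array([[25.0, 0.35, 0.4]])
        print(agent.predict(candidate))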

  11. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  12. Who confronts prejudice?: the role of implicit theories in the motivation to confront prejudice.

    Science.gov (United States)

    Rattan, Aneeta; Dweck, Carol S

    2010-07-01

    Despite the possible costs, confronting prejudice can have important benefits, ranging from the well-being of the target of prejudice to social change. What, then, motivates targets of prejudice to confront people who express explicit bias? In three studies, we tested the hypothesis that targets who hold an incremental theory of personality (i.e., the belief that people can change) are more likely to confront prejudice than targets who hold an entity theory of personality (i.e., the belief that people have fixed traits). In Study 1, targets' beliefs about the malleability of personality predicted whether they spontaneously confronted an individual who expressed bias. In Study 2, targets who held more of an incremental theory reported that they would be more likely to confront prejudice and less likely to withdraw from future interactions with an individual who expressed prejudice. In Study 3, we manipulated implicit theories and replicated these findings. By highlighting the central role that implicit theories of personality play in targets' motivation to confront prejudice, this research has important implications for intergroup relations and social change.

  13. High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers

    Science.gov (United States)

    Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas

    2017-04-01

    Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources, have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in recent years combine conventional multi-core CPUs with GPU accelerators and provide an opportunity for a manifold increase in computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide interested external researchers regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for
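
    The computational core mentioned above, in miniature: the cross-correlation of ambient-noise records from two stations, from which source maps are assembled pair by pair. The sketch uses plain numpy on synthetic traces with an imposed 40-sample delay; the real service does this for all station pairs, with preprocessing, on GPUs.

        import numpy as np

        def noise_crosscorrelation(u1, u2, max_lag):
            """Cross-correlation of two equal-length traces for lags in [-max_lag, max_lag] samples."""
            full = np.correlate(u1 - u1.mean(), u2 - u2.mean(), mode="full")
            mid = len(u1) - 1                          # index of zero lag in 'full'
            return full[mid - max_lag: mid + max_lag + 1]

        fs = 20.0                                      # samples per second
        t = np.arange(0, 600, 1 / fs)                  # ten minutes of synthetic "noise"
        rng = np.random.default_rng(1)
        n1 = rng.standard_normal(t.size)
        n2 = np.roll(n1, 40) + 0.5 * rng.standard_normal(t.size)   # delayed, noisier copy
        cc = noise_crosscorrelation(n1, n2, max_lag=200)
        print(int(np.argmax(np.abs(cc))) - 200)        # recovered lag (about -40 samples here)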

  14. Automatic discovery of the communication network topology for building a supercomputer model

    Science.gov (United States)

    Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim

    2016-10-01

    The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.
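
    A toy illustration of turning discovered neighbor tables into the kind of graph representation such a model uses, with networkx. The switch and node names and the neighbor tables below are made up, and the actual Octotron discovery mechanism and model format are not shown here.

        import networkx as nx

        # Hypothetical result of the discovery step: switch -> list of attached devices.
        neighbor_tables = {
            "eth-switch-1": ["node-001", "node-002", "eth-switch-2"],
            "eth-switch-2": ["node-003", "node-004", "eth-switch-1"],
        }

        g = nx.Graph()
        for switch, neighbors in neighbor_tables.items():
            g.add_node(switch, kind="switch")
            for dev in neighbors:
                if not g.has_node(dev):
                    g.add_node(dev, kind="switch" if dev.startswith("eth-switch") else "node")
                g.add_edge(switch, dev)

        print(sorted(g.edges()))   # component interconnections for the monitoring model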

  15. Goal preference shapes confrontations of sexism.

    Science.gov (United States)

    Mallett, Robyn K; Melchiori, Kala J

    2014-05-01

    Although most women assume they would confront sexism, assertive responses are rare. We test whether women's preference for respect or liking during interpersonal interactions explains this surprising tendency. Women report preferring respect relative to liking after being asked sexist, compared with inappropriate, questions during a virtual job interview (Study 1, n = 149). Women's responses to sexism increase in assertiveness along with their preference for being respected, and a respect-preference mediates the relation between the type of questions and response assertiveness (Studies 1 and 2). In Study 2 (n = 105), women's responses to sexist questions are more assertive when the sense of belonging is enhanced with a belonging manipulation. Moreover, preference for respect mediates the effect of the type of questions on response assertiveness, but only when belonging needs are met. Thus the likelihood of confrontation depends on the goal to be respected outweighing the goal to be liked.

  16. CHILD WITNESSES AND THE CONFRONTATION CLAUSE.

    Science.gov (United States)

    Lyon, Thomas D; Dente, Julia A

    2012-01-01

    After the Supreme Court's ruling in Crawford v. Washington that a criminal defendant's right to confront the witnesses against him is violated by the admission of testimonial hearsay that has not been cross-examined, lower courts have overturned convictions in which hearsay from children was admitted after child witnesses were either unwilling or unable to testify. A review of social scientific evidence regarding the dynamics of child sexual abuse suggests a means for facilitating the fair receipt of children's evidence. Courts should hold that defendants have forfeited their confrontation rights if they exploited a child's vulnerabilities such that they could reasonably anticipate that the child would be unavailable to testify. Exploitation includes choosing victims on the basis of their filial dependency, their vulnerability, or their immaturity, as well as taking actions that create or accentuate those vulnerabilities.

  17. 369 TFlop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Swaminarayan, Sriram [Los Alamos National Laboratory]; Germann, Timothy C [Los Alamos National Laboratory]; Kadau, Kai [Los Alamos National Laboratory]; Fossum, Gordon C [IBM CORPORATION]

    2008-01-01

    The authors present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so that there are four MPI ranks per node, each with one Opteron and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons are used to direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 MFlop/s per watt at a price of approximately 3.69 MFlop/s per dollar. They demonstrate an initial target application, the jetting and ejection of material from a shocked surface.
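
    For orientation, a small serial numpy routine for the Lennard-Jones pair forces used in the benchmark, with an O(N^2) loop and a simple cutoff; SPaSM itself is a domain-decomposed, cell-list code tuned for the Cell SPUs, so this sketch only pins down the physics of the benchmark, not the implementation.

        import numpy as np

        def lj_forces(pos, eps=1.0, sigma=1.0, rcut=2.5):
            """pos: (N, 3) positions; returns (N, 3) Lennard-Jones forces, O(N^2) for clarity."""
            forces = np.zeros_like(pos)
            for i in range(len(pos) - 1):
                rij = pos[i + 1:] - pos[i]                   # vectors from particle i to each j > i
                r2 = np.einsum("ij,ij->i", rij, rij)
                inv_r2 = np.where(r2 < rcut**2, sigma**2 / r2, 0.0)   # cutoff applied here
                inv_r6 = inv_r2**3
                fmag = 24.0 * eps * (2.0 * inv_r6**2 - inv_r6) * inv_r2 / sigma**2
                fij = fmag[:, None] * rij                    # force of i on each j
                forces[i + 1:] += fij
                forces[i] -= fij.sum(axis=0)                 # Newton's third law
            return forces

        print(lj_forces(np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [10.0, 0.0, 0.0]])))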

  18. Integration of Titan supercomputer at OLCF with ATLAS Production System

    CERN Document Server

    Barreiro Megino, Fernando Harald; The ATLAS collaboration

    2017-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of petabytes of data, and the rate of data processing already exceeds an exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this talk we will describe a project aimed at integration of the ATLAS Production System with the Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA Pilot framework for ...

  19. Lectures in Supercomputational Neurosciences Dynamics in Complex Brain Networks

    CERN Document Server

    Graben, Peter beim; Thiel, Marco; Kurths, Jürgen

    2008-01-01

    Computational Neuroscience is a burgeoning field of research where only the combined effort of neuroscientists, biologists, psychologists, physicists, mathematicians, computer scientists, engineers and other specialists, e.g. from linguistics and medicine, seem to be able to expand the limits of our knowledge. The present volume is an introduction, largely from the physicists' perspective, to the subject matter with in-depth contributions by system neuroscientists. A conceptual model for complex networks of neurons is introduced that incorporates many important features of the real brain, such as various types of neurons, various brain areas, inhibitory and excitatory coupling and the plasticity of the network. The computational implementation on supercomputers, which is introduced and discussed in detail in this book, will enable the readers to modify and adapt the algorithm for their own research. Worked-out examples of applications are presented for networks of Morris-Lecar neurons to model the cortical co...

  20. Modeling the weather with a data flow supercomputer

    Science.gov (United States)

    Dennis, J. B.; Gao, G.-R.; Todd, K. W.

    1984-01-01

    A static concept of data flow architecture is considered for a supercomputer for weather modeling. The machine level instructions are loaded into specific memory locations before computation is initiated, with only one instruction active at a time. The machine would have processing element, functional unit, array memory, memory routing and distribution routing network elements all contained on microprocessors. A value-oriented algorithmic language (VAL) would be employed and would have, as basic operations, simple functions deriving results from operand values. Details of the machine language format, computations with an array and file processing procedures are outlined. A global weather model is discussed in terms of a static architecture and the potential computation rate is analyzed. The results indicate that detailed design studies are warranted to quantify costs and parts fabrication requirements.

  1. Toward the Graphics Turing Scale on a Blue Gene Supercomputer

    CERN Document Server

    McGuigan, Michael

    2008-01-01

    We investigate raytracing performance that can be achieved on a class of Blue Gene supercomputers. We measure an 822-fold speedup over a Pentium IV on a 6144-processor Blue Gene/L. We measure the computational performance as a function of number of processors and problem size to determine the scaling performance of the raytracing calculation on the Blue Gene. We find nontrivial scaling behavior at large number of processors. We discuss applications of this technology to scientific visualization with advanced lighting and high resolution. We utilize three racks of a Blue Gene/L in our calculations, which is less than three percent of the capacity of the world's largest Blue Gene computer.

  2. Direct numerical simulation of turbulence using GPU accelerated supercomputers

    Science.gov (United States)

    Khajeh-Saeed, Ali; Blair Perot, J.

    2013-02-01

    Direct numerical simulations of turbulence are optimized for up to 192 graphics processors. The results from two large GPU clusters are compared to the performance of corresponding CPU clusters. A number of important algorithm changes are necessary to access the full computational power of graphics processors and these adaptations are discussed. It is shown that the handling of subdomain communication becomes even more critical when using GPU based supercomputers. The potential for overlap of MPI communication with GPU computation is analyzed and then optimized. Detailed timings reveal that the internal calculations are now so efficient that the operations related to MPI communication are the primary scaling bottleneck at all but the very largest problem sizes that can fit on the hardware. This work gives a glimpse of the CFD performance issues that will dominate many hardware platforms in the near future.
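
    The overlap of MPI communication with computation referred to above can be illustrated with a small halo-exchange sketch. The sketch below uses mpi4py and a NumPy stencil on the CPU as a stand-in for the GPU kernels in the paper; the 1-D domain, array sizes, and tag values are illustrative assumptions, not details from the work.

```python
# Sketch: overlapping a halo exchange with interior computation (1-D stencil).
# Assumes mpi4py and NumPy; run e.g. with `mpiexec -n 4 python overlap.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size   # periodic neighbours

n = 1 << 20                          # local grid points (illustrative size)
u = np.random.rand(n + 2)            # one ghost cell on each side
unew = np.empty_like(u)

# 1) Post non-blocking receives/sends for the ghost cells.
reqs = [comm.Irecv(u[0:1],   source=left,  tag=0),
        comm.Irecv(u[-1:],   source=right, tag=1),
        comm.Isend(u[1:2],   dest=left,    tag=1),
        comm.Isend(u[-2:-1], dest=right,   tag=0)]

# 2) Update interior points while the messages are in flight
#    (on a GPU system this is where the device kernel would run).
unew[2:-2] = 0.5 * (u[1:-3] + u[3:-1])

# 3) Complete the exchange, then update the two boundary points.
MPI.Request.Waitall(reqs)
unew[1] = 0.5 * (u[0] + u[2])
unew[-2] = 0.5 * (u[-3] + u[-1])
```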

  3. Internal computational fluid mechanics on supercomputers for aerospace propulsion systems

    Science.gov (United States)

    Andersen, Bernhard H.; Benson, Thomas J.

    1987-01-01

    The accurate calculation of three-dimensional internal flowfields for application towards aerospace propulsion systems requires computational resources available only on supercomputers. A survey is presented of three-dimensional calculations of hypersonic, transonic, and subsonic internal flowfields conducted at the Lewis Research Center. A steady state Parabolized Navier-Stokes (PNS) solution of flow in a Mach 5.0, mixed compression inlet, a Navier-Stokes solution of flow in the vicinity of a terminal shock, and a PNS solution of flow in a diffusing S-bend with vortex generators are presented and discussed. All of these calculations were performed on either the NAS Cray-2 or the Lewis Research Center Cray XMP.

  4. Solving global shallow water equations on heterogeneous supercomputers.

    Science.gov (United States)

    Fu, Haohuan; Gan, Lin; Yang, Chao; Xue, Wei; Wang, Lanning; Wang, Xinliang; Huang, Xiaomeng; Yang, Guangwen

    2017-01-01

    The scientific demand for more accurate modeling of the climate system calls for more computing power to support higher resolutions, inclusion of more component models, more complicated physics schemes, and larger ensembles. As the recent improvements in computing power mostly come from the increasing number of nodes in a system and the integration of heterogeneous accelerators, how to scale the computing problems onto more nodes and various kinds of accelerators has become a challenge for the model development. This paper describes our efforts on developing a highly scalable framework for performing global atmospheric modeling on heterogeneous supercomputers equipped with various accelerators, such as GPU (Graphic Processing Unit), MIC (Many Integrated Core), and FPGA (Field Programmable Gate Arrays) cards. We propose a generalized partition scheme of the problem domain, so as to keep a balanced utilization of both CPU resources and accelerator resources. With optimizations on both computing and memory access patterns, we manage to achieve around 8 to 20 times speedup when comparing one hybrid GPU or MIC node with one CPU node with 12 cores. Using customized FPGA-based data-flow engines, we see the potential to gain another 5 to 8 times improvement on performance. On heterogeneous supercomputers, such as Tianhe-1A and Tianhe-2, our framework is capable of achieving ideally linear scaling efficiency, and sustained double-precision performance of 581 Tflops on Tianhe-1A (using 3750 nodes) and 3.74 Pflops on Tianhe-2 (using 8644 nodes). Our study also provides an evaluation on the programming paradigm of various accelerator architectures (GPU, MIC, FPGA) for performing global atmospheric simulation, to form a picture about both the potential performance benefits and the programming efforts involved.
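
    The balanced CPU/accelerator partitioning idea can be sketched as a simple proportional split. The function below divides a block of grid rows among devices in proportion to measured throughputs; it is a much simplified stand-in for the generalized partition scheme described above, and the device names and throughput figures are illustrative assumptions only.

```python
# Sketch: split `nrows` grid rows among devices in proportion to their
# measured throughput, so CPU cores and accelerators finish at similar times.
def partition_rows(nrows, throughput):
    """throughput: dict device -> sustained rows/second (measured offline)."""
    total = sum(throughput.values())
    shares = {dev: int(nrows * t / total) for dev, t in throughput.items()}
    # Hand any rounding remainder to the fastest device.
    remainder = nrows - sum(shares.values())
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += remainder
    return shares

# Hypothetical numbers: one 12-core CPU socket vs. one accelerator card
# that is roughly 10x faster on this kernel.
print(partition_rows(4096, {"cpu": 1.0, "mic": 10.0}))
# -> {'cpu': 372, 'mic': 3724}
```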

  5. Selecting Simulation Models when Predicting Parallel Program Behaviour

    OpenAIRE

    Broberg, Magnus; Lundberg, Lars; Grahn, Håkan

    2002-01-01

    The use of multiprocessors is an important way to increase the performance of a supercomputing program. This means that the program has to be parallelized to make use of the multiple processors. The parallelization is unfortunately not an easy task. Development tools supporting parallel programs are important. Further, it is the customer that decides the number of processors in the target machine, and as a result the developer has to make sure that the program runs efficiently on any numbe...

  6. Plasmonics and the parallel programming problem

    Science.gov (United States)

    Vishkin, Uzi; Smolyaninov, Igor; Davis, Chris

    2007-02-01

    While many parallel computers have been built, it has generally been too difficult to program them. Now, all computers are effectively becoming parallel machines. Biannual doubling in the number of cores on a single chip, or faster, over the coming decade is planned by most computer vendors. Thus, the parallel programming problem is becoming more critical. The only known solution to the parallel programming problem in the theory of computer science is through a parallel algorithmic theory called PRAM. Unfortunately, some of the PRAM theory assumptions regarding the bandwidth between processors and memories did not properly reflect a parallel computer that could be built in previous decades. Reaching memories, or other processors in a multi-processor organization, required off-chip connections through pins on the boundary of each electric chip. Using the number of transistors that is becoming available on chip, on-chip architectures that adequately support the PRAM are becoming possible. However, the bandwidth of off-chip connections remains insufficient and the latency remains too high. This creates a bottleneck at the boundary of the chip for a PRAM-On-Chip architecture. This also prevents scalability to larger "supercomputing" organizations spanning across many processing chips that can handle massive amounts of data. Instead of connections through pins and wires, power-efficient CMOS-compatible on-chip conversion to plasmonic nanowaveguides is introduced for improved latency and bandwidth. Proper incorporation of our ideas offer exciting avenues to resolving the parallel programming problem, and an alternative way for building faster, more useable and much more compact supercomputers.

  7. Algorithm comparison and benchmarking using a parallel spectral transform shallow water model

    Energy Technology Data Exchange (ETDEWEB)

    Worley, P.H. [Oak Ridge National Lab., TN (United States); Foster, I.T.; Toonen, B. [Argonne National Lab., IL (United States)

    1995-04-01

    In recent years, a number of computer vendors have produced supercomputers based on a massively parallel processing (MPP) architecture. These computers have been shown to be competitive in performance with conventional vector supercomputers for some applications. As spectral weather and climate models are heavy users of vector supercomputers, it is interesting to determine how these models perform on MPPs, and which MPPs are best suited to the execution of spectral models. The benchmarking of MPPs is complicated by the fact that different algorithms may be more efficient on different architectures. Hence, a comprehensive benchmarking effort must answer two related questions: which algorithm is most efficient on each computer and how do the most efficient algorithms compare on different computers. In general, these are difficult questions to answer because of the high cost associated with implementing and evaluating a range of different parallel algorithms on each MPP platform.

  8. Parallelization of Kinetic Theory Simulations

    CERN Document Server

    Howell, Jim; Colbry, Dirk; Pickett, Rodney; Staber, Alec; Sagert, Irina; Strother, Terrance

    2013-01-01

    Numerical studies of shock waves in large scale systems via kinetic simulations with millions of particles are too computationally demanding to be processed in serial. In this work we focus on optimizing the parallel performance of a kinetic Monte Carlo code for astrophysical simulations such as core-collapse supernovae. Our goal is to attain a flexible program that scales well with the architecture of modern supercomputers. This approach requires a hybrid model of programming that combines a message passing interface (MPI) with a multithreading model (OpenMP) in C++. We report on our approach to implement the hybrid design into the kinetic code and show first results which demonstrate a significant gain in performance when many processors are applied.
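
    The hybrid design described above (MPI between ranks, a threading model within each rank) is implemented by the authors in C++ with OpenMP; the sketch below mimics only the structure of that design, using mpi4py and a Python thread pool in place of MPI+OpenMP. The particle counts and the collision kernel are placeholder assumptions.

```python
# Structural sketch of one hybrid-parallel kinetic Monte Carlo step:
# MPI ranks own disjoint blocks of particles; within a rank, a thread pool
# processes sub-batches (standing in for the OpenMP threads of the paper).
from concurrent.futures import ThreadPoolExecutor
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
particles = rng.random((100_000, 3))        # placeholder particle positions

def collide(batch):
    """Placeholder collision kernel: counts 'accepted' collisions."""
    return int((batch[:, 0] < 0.5).sum())

# Split the local particles into per-thread batches and process them.
batches = np.array_split(particles, 8)      # 8 threads per rank (illustrative)
with ThreadPoolExecutor(max_workers=8) as pool:
    local_collisions = sum(pool.map(collide, batches))

# Combine the per-rank tallies across MPI ranks.
total = comm.reduce(local_collisions, op=MPI.SUM, root=0)
if rank == 0:
    print("accepted collisions:", total)
```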

  9. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid

    2014-03-04

    © 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
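
    The hierarchical idea, grouping processes so that collective communication happens first inside small groups and then only between group leaders, can be sketched with MPI sub-communicators. The group size and the reduced quantity below are illustrative assumptions; this is not the authors' SUMMA implementation.

```python
# Sketch: two-level (hierarchical) reduction using MPI sub-communicators.
# Ranks are grouped; each group reduces locally, then group leaders reduce
# among themselves, mirroring the communication hierarchy described above.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

GROUP = 4                                   # processes per group (assumption)
group_comm = comm.Split(color=rank // GROUP, key=rank)

# Leaders (local rank 0 of each group) form their own communicator.
is_leader = group_comm.Get_rank() == 0
leader_comm = comm.Split(color=0 if is_leader else MPI.UNDEFINED, key=rank)

local_block = np.full(1000, float(rank))    # stand-in for a partial result block

# Level 1: reduce inside each group.
group_sum = np.empty_like(local_block)
group_comm.Reduce(local_block, group_sum, op=MPI.SUM, root=0)

# Level 2: reduce across group leaders only.
if is_leader:
    global_sum = np.empty_like(group_sum)
    leader_comm.Reduce(group_sum, global_sum, op=MPI.SUM, root=0)
    if leader_comm.Get_rank() == 0:
        print("global sum of first entry:", global_sum[0])
```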

  10. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and consumes substantial development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design is given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework is provided.

  11. Parallel visualization on leadership computing resources

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

    2009-07-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using techniques similar to those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  12. Electric Propulsion Plume Simulations Using Parallel Computer

    Directory of Open Access Journals (Sweden)

    Joseph Wang

    2007-01-01

    Full Text Available A parallel, three-dimensional electrostatic PIC code is developed for large-scale electric propulsion simulations using parallel supercomputers. This code uses a newly developed immersed-finite-element particle-in-cell (IFE-PIC) algorithm designed to handle complex boundary conditions accurately while maintaining the computational speed of the standard PIC code. Domain decomposition is used in both field solve and particle push to divide the computation among processors. Two simulation studies are presented to demonstrate the capability of the code. The first is a full particle simulation of a near-thruster plume using the real ion-to-electron mass ratio. The second is a high-resolution simulation of multiple ion thruster plume interactions for a realistic spacecraft using a domain enclosing the entire solar array panel. Performance benchmarks show that the IFE-PIC achieves a high parallel efficiency of ≥ 90%
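
    The domain-decomposed particle push mentioned above can be sketched as: each rank advances its own particles, then hands off those that crossed out of its subdomain to the neighbouring rank. The 1-D decomposition, particle counts, and periodic domain below are illustrative assumptions, not details of the IFE-PIC code.

```python
# Sketch: 1-D domain-decomposed particle push with migration of particles
# that leave the local subdomain (simplified stand-in for a 3-D PIC step).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns the slab [rank, rank+1) of a periodic domain of length `size`.
lo, hi = float(rank), float(rank + 1)
rng = np.random.default_rng(rank)
x = lo + rng.random(10_000)                 # positions inside the local slab
v = rng.normal(scale=0.05, size=x.size)     # velocities (placeholder)

# Push particles, then find the ones that crossed a subdomain boundary.
x += v
go_left, go_right = x < lo, x >= hi
stay = ~(go_left | go_right)

left = (rank - 1) % size
right = (rank + 1) % size

# Exchange migrating particles with the two neighbours (positions wrapped).
from_right = comm.sendrecv(x[go_left] % size, left, source=right)
from_left = comm.sendrecv(x[go_right] % size, right, source=left)

x = np.concatenate([x[stay], from_left, from_right])
print(f"rank {rank}: now holds {x.size} particles")
```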

  13. Non-preconditioned conjugate gradient on cell and FPGA-based hybrid supercomputer nodes

    Energy Technology Data Exchange (ETDEWEB)

    Dubois, David H [Los Alamos National Laboratory; Dubois, Andrew J [Los Alamos National Laboratory; Boorman, Thomas M [Los Alamos National Laboratory; Connor, Carolyn M [Los Alamos National Laboratory

    2009-03-10

    This work presents a detailed implementation of a double precision, Non-Preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.

  14. Non-preconditioned conjugate gradient on cell and FPGA based hybrid supercomputer nodes

    Energy Technology Data Exchange (ETDEWEB)

    Dubois, David H [Los Alamos National Laboratory; Dubois, Andrew J [Los Alamos National Laboratory; Boorman, Thomas M [Los Alamos National Laboratory; Connor, Carolyn M [Los Alamos National Laboratory

    2009-01-01

    This work presents a detailed implementation of a double precision, non-preconditioned, Conjugate Gradient algorithm on a Roadrunner heterogeneous supercomputer node. These nodes utilize the Cell Broadband Engine Architecture™ in conjunction with x86 Opteron™ processors from AMD. We implement a common Conjugate Gradient algorithm, on a variety of systems, to compare and contrast performance. Implementation results are presented for the Roadrunner hybrid supercomputer, SRC Computers, Inc. MAPStation SRC-6 FPGA enhanced hybrid supercomputer, and AMD Opteron only. In all hybrid implementations wall clock time is measured, including all transfer overhead and compute timings.
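
    For reference, the non-preconditioned Conjugate Gradient iteration benchmarked in these two records follows the standard textbook recurrence; a minimal NumPy version is sketched below. The dense test matrix and tolerance are illustrative, and no Cell or FPGA offload is shown.

```python
# Minimal non-preconditioned Conjugate Gradient for a symmetric positive
# definite matrix A, following the classical recurrence (no preconditioner).
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x                # residual
    p = r.copy()                 # search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small SPD test problem (illustrative).
n = 200
M = np.random.rand(n, n)
A = M @ M.T + n * np.eye(n)
b = np.random.rand(n)
x = conjugate_gradient(A, b)
print("residual norm:", np.linalg.norm(b - A @ x))
```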

  15. Systems biology: confronting the complexity of cancer.

    Science.gov (United States)

    Gentles, Andrew J; Gallahan, Daniel

    2011-09-15

    The AACR-NCI Conference "Systems Biology: Confronting the Complexity of Cancer" took place from February 27 to March 2, 2011, in San Diego, CA. Several themes resonated during the meeting, notably (i) the need for better methods to distill insights from large-scale networks, (ii) the importance of integrating multiple data types in constructing more realistic models, (iii) challenges in translating insights about tumorigenic mechanisms into therapeutic interventions, and (iv) the role of the tumor microenvironment, at the physical, cellular, and molecular levels. The meeting highlighted concrete applications of systems biology to cancer, and the value of collaboration between interdisciplinary researchers in attacking formidable problems.

  16. Dark Radiation Confronting LHC in Z' Models

    CERN Document Server

    Solaguren-Beascoa, A

    2012-01-01

    Recent cosmological data favour additional relativistic degrees of freedom beyond the three active neutrinos and photons, often referred to as "dark radiation". Extensions of the SM involving TeV-scale Z' gauge bosons generically contain superweakly interacting light right-handed neutrinos which can constitute this dark radiation. In this letter we confront the requirement on the parameters of the E6 Z' models to account for the present evidence of dark radiation with the already existing constraints from searches for new neutral gauge bosons at LHC7.

  17. Parallel implementation of an algorithm for Delaunay triangulation

    Science.gov (United States)

    Merriam, Marshal L.

    1992-01-01

    The theory and practice of implementing Tanemura's algorithm for 3D Delaunay triangulation on Intel's Gamma prototype, a 128 processor MIMD computer, is described. Efficient implementation of Tanemura's algorithm on a conventional, vector processing supercomputer is problematic. It does not vectorize to any significant degree and requires indirect addressing. Efficient implementation on a parallel architecture is possible, however. Speeds in excess of 20 times a single processor Cray Y-MP are realized on 128 processors of the Intel Gamma prototype.

  18. Performance and Scalability of the NAS Parallel Benchmarks in Java

    Science.gov (United States)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  19. Implementation of the NAS Parallel Benchmarks in Java

    Science.gov (United States)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

  20. Match and Move, an Approach to Data Parallel Computing

    Science.gov (United States)

    1992-10-01

    Blelloch, Siddhartha Chatterjee, Jay Sippelstein, and Marco Zagha. CVL: a C Vector Library. School of Computer Science, Carnegie Mellon University...[CBZ90] Siddhartha Chatterjee, Guy E. Blelloch, and Marco Zagha. Scan primitives for vector computers. In Proceedings Supercomputing '90, November 1990...[Cha91] Siddhartha Chatterjee. Compiling data-parallel programs for efficient execution on shared-memory multiprocessors. PhD thesis, Carnegie Mellon

  1. Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing.

    Science.gov (United States)

    Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.

    1998-01-01

    Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…

  2. ANALYSES ON SYSTEMATIC CONFRONTATION OF FIGHTER AIRCRAFT

    Institute of Scientific and Technical Information of China (English)

    Huai Jinpeng; Wu Zhe; Huang Jun

    2002-01-01

    Analyses of the systematic confrontation between two military forces are the highest hierarchy in operational effectiveness studies of weapon systems. The physical model for tactical many-on-many engagements of an aerial warfare with heterogeneous fighter aircraft is established. On the basis of the Lanchester multivariate equations of the square law, a mathematical model corresponding to the established physical model is given. A superiority parameter is then derived directly from the mathematical model. In view of the high-tech conditions of modern warfare, the concept of a superiority parameter that more truly reflects the essence of an air-to-air engagement is further formulated. The attrition coefficients, which are key to the differential equations, are determined by using tactics of random target assignment and the air-to-air capability index of the fighter aircraft. Hereby, taking the mathematical model and superiority parameter as cores, calculations and analyses of complicated systemic problems such as evaluation of battle superiority, prognostication of combat process and optimization of collocations have been accomplished. Results indicate that a classical combat theory, with its recent developments, has found new applications in military operations research for complicated confrontation analysis issues.
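
    The Lanchester square law referenced above may be unfamiliar; its basic two-force form (before the heterogeneous, multivariate generalization used in the paper) is sketched below in common textbook notation, which is an assumption of this note rather than the paper's own symbols.

```latex
% Lanchester aimed-fire ("square law") model: B(t), R(t) are force strengths,
% b, r > 0 the attrition (effectiveness) coefficients of sides Blue and Red.
\[
  \frac{dB}{dt} = -r\,R(t), \qquad \frac{dR}{dt} = -b\,B(t).
\]
% Dividing one equation by the other and integrating yields the invariant
\[
  b\,B(t)^{2} - r\,R(t)^{2} = b\,B(0)^{2} - r\,R(0)^{2},
\]
% so Blue prevails when b B(0)^2 > r R(0)^2: combat power scales with the
% square of force size, weighted by the attrition coefficients, which is why
% superiority parameters are built from capability-weighted squared strengths.
```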

  3. When Professors Confront Themselves: Towards a Theoretical Conceptualization of Video Self-Confrontation in Higher Education.

    Science.gov (United States)

    Perlberg, Arye

    1983-01-01

    Two explanations of the underlying process in faculty self-evaluation by videotape recording are outlined and integrated into one conceptualization. One theory is based on affect: self-confrontation, dissonance, stress, distress, and eustress. The second is based on a cognitive and information processing approach and includes feedback,…

  4. Associative memories for supercomputers. Final report, July 1989-January 1991

    Energy Technology Data Exchange (ETDEWEB)

    Esener, S.C.; Marchand, P.; Krishnamoorthy, A.

    1992-12-01

    A motionless head 2-D parallel readout system for optical disks is presented. Its unique features are discussed and it is compared to various parallel access optical storage media. The motionless-head parallel read-out system for optical disks is shown to meet current and near-term future requirements for high performance secondary storage. In order to select a memory architecture compatible with the motionless-head disk, inner-product and outer-product associative memory algorithms are compared in terms of their storage requirements, search times, system complexities, and fault tolerance. Based on this comparison, the page serial, bit-parallel inner-product method is shown to be well suited to implementation with parallel readout optical disk, and opto-electronic XNOR gate arrays, using for instance the Si/PLZT technology. Finally, the associative memory system design is presented. Keywords: memory, associative memory, optical disks.
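
    The page-serial, bit-parallel inner-product search described above amounts to comparing a binary query against each stored word with XNOR and counting matching bits. The sketch below does this in NumPy on packed bytes; the word width, page size, and noise model are illustrative assumptions, and the optical-disk readout and Si/PLZT gate arrays are not modeled.

```python
# Sketch: bit-parallel inner-product associative recall.
# Stored words are binary; similarity to a query is the number of matching
# bits, computed as popcount(XNOR(word, query)).
import numpy as np

rng = np.random.default_rng(0)
WORD_BYTES = 128                       # 1024-bit words (assumption)
N_WORDS = 4096                         # words per "page" (assumption)

memory = rng.integers(0, 256, size=(N_WORDS, WORD_BYTES), dtype=np.uint8)

# Build a noisy copy of word 1234: flip bit 0 of roughly half the bytes.
noise = rng.integers(0, 256, WORD_BYTES, dtype=np.uint8) & 1
query = memory[1234] ^ noise

# XNOR = NOT XOR; count matching bits per stored word.
xnor = ~(memory ^ query)
match_bits = np.unpackbits(xnor, axis=1).sum(axis=1)

best = int(np.argmax(match_bits))
print("best match:", best, "matching bits:", int(match_bits[best]))
# Expected to recover index 1234, since random words match only ~512 bits.
```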

  5. Novel Supercomputing Approaches for High Performance Linear Algebra Using FPGAs Project

    Data.gov (United States)

    National Aeronautics and Space Administration — Supercomputing plays a major role in many areas of science and engineering, and it has had tremendous impact for decades in areas such as aerospace, defense, energy,...

  6. SUPERCOMPUTERS FOR AIDING ECONOMIC PROCESSES WITH REFERENCE TO THE FINANCIAL SECTOR

    Directory of Open Access Journals (Sweden)

    Jerzy Balicki

    2014-12-01

    Full Text Available The article discusses the use of supercomputers to support business processes with particular emphasis on the financial sector. A reference was made to the selected projects that support economic development. In particular, we propose the use of supercomputers to perform artificial intelligence methods in banking. The proposed methods combined with modern technology enable a significant increase in the competitiveness of enterprises and banks by adding new functionality.

  7. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    Science.gov (United States)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Ryabinkin, E.; Wenaus, T.

    2016-02-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  8. Developing and Deploying Advanced Algorithms to Novel Supercomputing Hardware

    CERN Document Server

    Brunner, Robert J; Myers, Adam D

    2007-01-01

    The objective of our research is to demonstrate the practical usage and orders of magnitude speedup of real-world applications by using alternative technologies to support high performance computing. Currently, the main barrier to the widespread adoption of this technology is the lack of development tools and case studies that typically impedes non-specialists who might otherwise develop applications that could leverage these technologies. By partnering with the Innovative Systems Laboratory at the National Center for Supercomputing Applications, we have obtained access to several novel technologies, including several Field-Programmable Gate Array (FPGA) systems, NVidia Graphics Processing Units (GPUs), and the STI Cell BE platform. Our goal is to not only demonstrate the capabilities of these systems, but to also serve as guides for others to follow in our path. To date, we have explored the efficacy of the SRC-6 MAP-C and MAP-E and SGI RASC Athena and RC100 reconfigurable computing platforms in supporting a two-point co...

  9. Numerical infinities and infinitesimals in a new supercomputing framework

    Science.gov (United States)

    Sergeyev, Yaroslav D.

    2016-06-01

    Traditional computers are able to work numerically with finite numbers only. The Infinity Computer patented recently in USA and EU gets over this limitation. In fact, it is a computational device of a new kind able to work numerically not only with finite quantities but with infinities and infinitesimals, as well. The new supercomputing methodology is not related to non-standard analysis and does not use either Cantor's infinite cardinals or ordinals. It is founded on Euclid's Common Notion 5 saying `The whole is greater than the part'. This postulate is applied to all numbers (finite, infinite, and infinitesimal) and to all sets and processes (finite and infinite). It is shown that it becomes possible to write down finite, infinite, and infinitesimal numbers by a finite number of symbols as numerals belonging to a positional numeral system with an infinite radix described by a specific ad hoc introduced axiom. Numerous examples of the usage of the introduced computational tools are given during the lecture. In particular, algorithms for solving optimization problems and ODEs are considered among the computational applications of the Infinity Computer. Numerical experiments executed on a software prototype of the Infinity Computer are discussed.

  10. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    Science.gov (United States)

    Fluke, Christopher J.; Barnes, David G.; Barsdell, Benjamin R.; Hassan, Amr H.

    2011-01-01

    General-purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks and make the investment of time and effort to become early adopters of GPGPU in astronomy, stand to reap great benefits.

  11. Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    CERN Document Server

    Fluke, Christopher J; Barsdell, Benjamin R; Hassan, Amr H

    2010-01-01

    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best-practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks, and make the investment of time and effort to become early adopters of GPGPU in astronomy, s...

  12. Developing Fortran Code for Kriging on the Stampede Supercomputer

    Science.gov (United States)

    Hodgess, Erin

    2016-04-01

    Kriging is easily accessible in the open source statistical language R (R Core Team, 2015) in the gstat (Pebesma, 2004) package. It works very well, but can be slow on large data sets, particularly if the prediction space is large as well. We are working on the Stampede supercomputer at the Texas Advanced Computing Center to develop code using a combination of R and the Message Passing Interface (MPI) bindings to Fortran. We have a function similar to the autofitVariogram found in the automap (Hiemstra et al., 2008) package and it is very effective. We are comparing R with MPI/Fortran, MPI/Fortran alone, and R with the Rmpi package, which uses bindings to C. We will present results from simulation studies and real-world examples. References: Hiemstra, P.H., Pebesma, E.J., Twenhofel, C.J.W. and G.B.M. Heuvelink, 2008. Real-time automatic interpolation of ambient gamma dose rates from the Dutch Radioactivity Monitoring Network. Computers and Geosciences, accepted for publication. Pebesma, E.J., 2004. Multivariable geostatistics in S: the gstat package. Computers and Geosciences, 30: 683-691. R Core Team, 2015. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

  13. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    Directory of Open Access Journals (Sweden)

    De K.

    2016-01-01

    Full Text Available The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers the PanDA (Production and Distributed Analysis) Workload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF), is being realized within the BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF's Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  14. Supercomputers ready for use as discovery machines for neuroscience

    Directory of Open Access Journals (Sweden)

    Moritz Helias

    2012-11-01

    Full Text Available NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10^8 neurons and 10^12 synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum-filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi-interactive working style and render simulations on this scale a practical tool for computational neuroscience.

  15. Supercomputers ready for use as discovery machines for neuroscience.

    Science.gov (United States)

    Helias, Moritz; Kunkel, Susanne; Masumoto, Gen; Igarashi, Jun; Eppler, Jochen Martin; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus

    2012-01-01

    NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10(8) neurons and 10(12) synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi interactive working style and render simulations on this scale a practical tool for computational neuroscience.

  16. Scalable geocomputation: evolving an environmental model building platform from single-core to supercomputers

    Science.gov (United States)

    Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek

    2017-04-01

    There is an increasing demand to run environmental models on a big scale: simulations over large areas at high resolution. The heterogeneity of available computing hardware such as multi-core CPUs, GPUs or supercomputers potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and the implementation thereof in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge; their emphasis is (and should be) on exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs to have built-in capabilities to make full usage of the available hardware. Developing such a framework providing understandable code for domain scientists and being runtime efficient at the same time poses several challenges for developers of such a framework. For example, optimisations can be performed on individual operations or the whole model, or tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of available hardware whichever combination of model building blocks scientists use. We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling and demonstrate 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks and 2) parallelisation of about 50 of these building blocks using

  17. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  18. Parents’ experience confronting child burning situation

    Directory of Open Access Journals (Sweden)

    Valdira Vieira de Oliveira

    2015-05-01

    Full Text Available Objective: to understand experiences of parents in a child burning situation during the hospitalization process. Methods: phenomenological research in view of Martin Heidegger, held with seven assisting parents at a pediatrics unit of a general hospital in Montes Claros. The information was obtained by phenomenological interview, containing the question guide: “What does it mean to you being with a son who is suffering with burns?”. Results: during the experience, parents revealed anguish, fear, helplessness, concerns and expectations of “being-in-the-world”. Conclusion: respect, understanding and care from the health team were fundamental for the adaptation and the confrontation demanded by the consequent suffering of the event.

  19. Confronting Phantom Dark Energy with Observations

    CERN Document Server

    Wang, Pao-Yu; Chen, Pisin

    2012-01-01

    We confront two types of phantom dark energy potential with observational data. The models we consider are the power-law potential, $V\\propto {\\phi}^{\\mu}$, and the exponential potential, $V\\propto \\exp({\\lambda}{\\phi}/{M_P})$. We fit the models to the latest observations from SN-Ia, CMB and BAO, and obtain tight constraints on parameter spaces. Furthermore, we apply the goodness-of-fit and the information criteria to compare the fitting results from phantom models with that from the cosmological constant and the quintessence models presented in our previous work. The results show that the cosmological constant is statistically most preferred, while the phantom dark energy fits slightly better than the quintessence does.

  20. The four-meter confrontation visual field test.

    OpenAIRE

    Kodsi, S R; Younge, B. R.

    1992-01-01

    The 4-m confrontation visual field test has been successfully used at the Mayo Clinic for many years in addition to the standard 0.5-m confrontation visual field test. The 4-m confrontation visual field test is a test of macular function and can identify small central or paracentral scotomas that the examiner may not find when the patient is tested only at 0.5 m. Also, macular sparing in homonymous hemianopias and quadrantanopias may be identified with the 4-m confrontation visual field test....

  1. Parallel biocomputing

    Directory of Open Access Journals (Sweden)

    Witte John S

    2011-03-01

    Full Text Available Abstract Background With the advent of high throughput genomics and high-resolution imaging techniques, there is a growing necessity in biology and medicine for parallel computing, and with the low cost of computing, it is now cost-effective for even small labs or individuals to build their own personal computation cluster. Methods Here we briefly describe how to use commodity hardware to build a low-cost, high-performance compute cluster, and provide an in-depth example and sample code for parallel execution of R jobs using MOSIX, a mature extension of the Linux kernel for parallel computing. A similar process can be used with other cluster platform software. Results As a statistical genetics example, we use our cluster to run a simulated eQTL experiment. Because eQTL is computationally intensive, and is conceptually easy to parallelize, like many statistics/genetics applications, parallel execution with MOSIX gives a linear speedup in analysis time with little additional effort. Conclusions We have used MOSIX to run a wide variety of software programs in parallel with good results. The limitations and benefits of using MOSIX are discussed and compared to other platforms.

  2. Enabling Loosely-Coupled Serial Job Execution on the IBM BlueGene/P Supercomputer and the SiCortex SC5832

    CERN Document Server

    Raicu, Ioan; Wilde, Mike; Foster, Ian

    2008-01-01

    Our work addresses enabling the execution of highly parallel computations composed of loosely coupled serial jobs with no modifications to the respective applications, on large-scale systems. This approach allows new, and potentially far larger, classes of applications to leverage systems such as the IBM Blue Gene/P supercomputer and similar emerging petascale architectures. We present here the challenges of I/O performance encountered in making this model practical, and show results using both micro-benchmarks and real applications on two large-scale systems, the BG/P and the SiCortex SC5832. Our preliminary benchmarks show that we can scale to 4096 processors on the Blue Gene/P and 5832 processors on the SiCortex with high efficiency, and can achieve thousands of tasks/sec sustained execution rates for parallel workloads of ordinary serial applications. We measured applications from two domains, economic energy modeling and molecular dynamics.
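
    The many-task pattern described above, a dispatcher feeding loosely coupled serial jobs to a pool of workers, can be sketched as a simple MPI task farm. mpi4py, the placeholder "serial job", and the task counts below are illustrative stand-ins for the middleware used in the paper.

```python
# Sketch: master/worker task farm for loosely coupled serial jobs.
# Rank 0 hands out task ids; workers run them independently.
# Assumes at least two ranks and TASKS >= number of workers;
# run e.g. `mpiexec -n 8 python taskfarm.py`.
from mpi4py import MPI

def serial_job(task_id):
    """Placeholder for an ordinary serial application invocation."""
    return sum(i * i for i in range(task_id % 1000))

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TASKS = 10_000
STOP = -1

if rank == 0:                               # dispatcher
    status = MPI.Status()
    next_task, done = 0, 0
    # Prime each worker with one task, then keep them all busy.
    for w in range(1, size):
        comm.send(next_task, dest=w)
        next_task += 1
    while done < TASKS:
        comm.recv(source=MPI.ANY_SOURCE, status=status)
        done += 1
        dest = status.Get_source()
        comm.send(next_task if next_task < TASKS else STOP, dest=dest)
        next_task += 1
    print("all tasks completed")
else:                                       # worker
    while True:
        task = comm.recv(source=0)
        if task == STOP:
            break
        comm.send(serial_job(task), dest=0)
```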

  3. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pis. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.
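
    The structure of parallel evolutionary optimization, evaluating a population of candidate networks concurrently in each generation, is sketched below with Python's multiprocessing standing in for the paper's DANNA simulator and its Titan or Beowulf deployments. The genome encoding, fitness function, and selection scheme are placeholder assumptions.

```python
# Sketch: an EO loop with parallel fitness evaluation of the population.
# multiprocessing stands in for the multicore / massively parallel back ends.
import random
from multiprocessing import Pool

GENOME_LEN, POP_SIZE, GENERATIONS = 64, 128, 20

def fitness(genome):
    """Placeholder fitness: count of 1s (a real run would simulate a network)."""
    return sum(genome)

def mutate(genome, rate=0.02):
    return [1 - g if random.random() < rate else g for g in genome]

def evolve():
    pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
           for _ in range(POP_SIZE)]
    with Pool() as pool:                      # one worker per available core
        for gen in range(GENERATIONS):
            scores = pool.map(fitness, pop)   # parallel evaluation step
            ranked = [g for _, g in sorted(zip(scores, pop), reverse=True)]
            parents = ranked[:POP_SIZE // 4]  # truncation selection
            pop = parents + [mutate(random.choice(parents))
                             for _ in range(POP_SIZE - len(parents))]
        print("best fitness:", max(pool.map(fitness, pop)))

if __name__ == "__main__":
    evolve()
```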

  4. 28 CFR 552.23 - Confrontation avoidance procedures.

    Science.gov (United States)

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Confrontation avoidance procedures. 552... MANAGEMENT CUSTODY Use of Force and Application of Restraints on Inmates § 552.23 Confrontation avoidance... information about the inmate and the immediate situation. Based on their assessment of that information,...

  5. Confronting predictive texture zeros in lepton mass matrices with current data

    CERN Document Server

    Cebola, Luís M; Felipe, Ricardo González

    2015-01-01

    Several popular Ansätze of lepton mass matrices that contain texture zeros are confronted with current neutrino observational data. We perform a systematic $\chi^2$-analysis in a wide class of schemes, considering arbitrary Hermitian charged lepton mass matrices and symmetric mass matrices for Majorana neutrinos or Hermitian mass matrices for Dirac neutrinos. Our study reveals that several patterns are still consistent with all the observations at 68.27% confidence level, while some others are disfavored or excluded by the experimental data. The well-known Frampton-Glashow-Marfatia two-zero textures, hybrid textures and parallel structures, among others, are considered.

  6. PARALLEL STABILIZATION

    Institute of Scientific and Technical Information of China (English)

    J.L.LIONS

    1999-01-01

    A new algorithm for the stabilization of (possibly turbulent, chaotic) distributed systems, governed by linear or non linear systems of equations is presented. The SPA (Stabilization Parallel Algorithm) is based on a systematic parallel decomposition of the problem (related to arbitrarily overlapping decomposition of domains) and on a penalty argument. SPA is presented here for the case of linear parabolic equations with distributed or boundary control. It extends to practically all linear and non linear evolution equations, as it will be presented in several other publications.

  7. Performance Analysis and Scaling Behavior of the Terrestrial Systems Modeling Platform TerrSysMP in Large-Scale Supercomputing Environments

    Science.gov (United States)

    Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.

    2013-12-01

    In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high degree of efficiency in the utilization of e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, but all the more important when complex coupled component models are to be analysed. Here we want to present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed

  8. Confronting Higgcision with Electric Dipole Moments

    CERN Document Server

    Cheung, Kingman; Senaha, Eibun; Tseng, Po-Yan

    2014-01-01

    Current data on the signal strengths and angular spectrum of the 125.5 GeV Higgs boson still allow a CP-mixed state, namely, the pseudoscalar coupling to the top quark can be as sizable as the scalar coupling: $C_u^S \\approx C_u^P =1/2$. CP violation can then arise and manifest in sizable electric dipole moments (EDMs). In the framework of two-Higgs-doublet models, we not only update the Higgs precision (Higgcision) study on the couplings with the most updated Higgs signal strength data, but also compute all the Higgs-mediated contributions from the 125.5 GeV Higgs boson to the EDMs, and confront the allowed parameter space against the existing constraints from the EDM measurements of Thallium, neutron, Mercury, and Thorium monoxide. We found that the combined EDM constraints restrict the pseudoscalar coupling to be less than about $10^{-2}$, unless there are contributions from other Higgs bosons, supersymmetric particles, or other exotic particles that delicately cancel the current Higgs-mediated contributio...

  9. Gauge-flation confronted with Planck

    CERN Document Server

    Namba, Ryo; Peloso, Marco

    2013-01-01

    Gauge-flation is a recently proposed model in which inflation is driven solely by a non-Abelian gauge field thanks to a specific higher order derivative operator. The nature of the operator is such that it does not introduce ghosts. We compute the cosmological scalar and tensor perturbations for this model, improving over an existing computation. We then confront these results with the Planck data. The model is characterized by the quantity \gamma = (g^2 Q^2)/H^2 (where g is the gauge coupling constant, Q the vector vev, and H the Hubble rate). For \gamma < 2, the scalar perturbations show a strong tachyonic instability. In the stable region, the scalar power spectrum n_s is too low at small \gamma, while the tensor-to-scalar ratio r is too high at large \gamma. No value of \gamma leads to acceptable values for n_s and r, and so the model is ruled out by the CMB data. The same behavior with \gamma was obtained in Chromo-natural inflation, a model in which inflation is driven by a pseudo-scalar coupled to a...

  10. Confronting mortality: faith and meaning across cultures.

    Science.gov (United States)

    Paulson, Steve; Kellehear, Allan; Kripal, Jeffrey J; Leary, Lani

    2014-11-01

    Despite advances in technology and medicine, death itself remains an immutable certainty. Indeed, the acceptance and understanding of our mortality are among the enduring metaphysical challenges that have confronted human beings from the beginning of time. How have we sought to cope with the inevitability of our mortality? How do various cultural and social representations of mortality shape and influence the way in which we understand and approach death? To what extent do personal beliefs and convictions about the meaning of life or the notion of an afterlife affect how we perceive and experience the process of death and dying? Steve Paulson, executive producer and host of To the Best of Our Knowledge, moderated a discussion on death, dying, and what lies beyond that included psychologist Lani Leary, professor of philosophy and religion Jeffrey J. Kripal, and sociologist Allan Kellehear. The following is an edited transcript of the discussion that occurred February 5, 7:00-8:30 pm, at the New York Academy of Sciences in New York City.

  11. Dementia Care: Confronting Myths in Clinical Management.

    Science.gov (United States)

    Neitch, Shirley M; Meadows, Charles; Patton-Tackett, Eva; Yingling, Kevin W

    2016-01-01

    Every day, patients with dementia, their families, and their physicians face the enormous challenges of this pervasive life-changing condition. Seeking help, often grasping at straws, victims and their care providers are confronted with misinformation and myths when they search the internet or other sources. When Persons with Dementia (PWD) and their caregivers believe and/or act on false information, proper treatment may be delayed, and ultimately damage can be done. In this paper, we review commonly misunderstood issues encountered in caring for PWD. Our goal is to equip Primary Care Practitioners (PCPs) with accurate information to share with patients and families, to improve the outcomes of PWD to the greatest extent possible. While there are innumerable myths about dementia and its causes and treatments, we are going to focus on the most common false claims or misunderstandings which we hear in our Internal Medicine practice at Marshall Health. We offer suggestions for busy practitioners approaching some of the more common issues with patients and families in a clinic setting.

  12. Confronting Higgcision with electric dipole moments

    Science.gov (United States)

    Cheung, Kingman; Lee, Jae Sik; Senaha, Eibun; Tseng, Po-Yan

    2014-06-01

    Current data on the signal strengths and angular spectrum of the 125.5 GeV Higgs boson still allow a CP-mixed state, namely, the pseudoscalar coupling to the top quark can be as sizable as the scalar coupling: C_u^S ≈ C_u^P = 1/2. CP violation can then arise and manifest in sizable electric dipole moments (EDMs). In the framework of two-Higgs-doublet models, we not only update the Higgs precision (Higgcision) study on the couplings with the most updated Higgs signal strength data, but also compute all the Higgs-mediated contributions from the 125.5 GeV Higgs boson to the EDMs, and confront the allowed parameter space against the existing constraints from the EDM measurements of Thallium, neutron, Mercury, and Thorium monoxide. We found that the combined EDM constraints restrict the pseudoscalar coupling to be less than about 10^{-2}, unless there are contributions from other Higgs bosons, supersymmetric particles, or other exotic particles that delicately cancel the current Higgs-mediated contributions.

  13. Cyberdyn supercomputer - a tool for imaging geodynamic processes

    Science.gov (United States)

    Pomeran, Mihai; Manea, Vlad; Besutiu, Lucian; Zlagnean, Luminita

    2014-05-01

    More and more physical processes developing within the deep interior of our planet, but with significant impact on the Earth's shape and structure, are becoming subject to numerical modelling using high performance computing facilities. Nowadays, an increasing number of research centers worldwide decide to make use of such powerful and fast computers for simulating complex phenomena involving fluid dynamics and to gain deeper insight into intricate problems of Earth's evolution. With the CYBERDYN cybernetic infrastructure (CCI), the Solid Earth Dynamics Department in the Institute of Geodynamics of the Romanian Academy boldly steps into the 21st century by entering the research area of computational geodynamics. The project that made this advancement possible has been jointly supported by the EU and the Romanian Government through the Structural and Cohesion Funds. It lasted for about three years, ending in October 2013. CCI is basically a modern high performance Beowulf-type supercomputer (HPCC), combined with a high performance visualization cluster (HPVC) and a GeoWall. The infrastructure is mainly structured around 1344 cores and 3 TB of RAM. The high speed interconnect is provided by a Qlogic InfiniBand switch, able to transfer up to 40 Gbps. The CCI storage component is a 40 TB Panasas NAS. The operating system is Linux (CentOS). For control and maintenance, the Bright Cluster Manager package is used. The SGE job scheduler manages the job queues. CCI has been designed for a theoretical peak performance of up to 11.2 TFlops. Speed tests showed that a high resolution numerical model (256 × 256 × 128 FEM elements) could be resolved at a mean computational speed of one time step per 30 seconds, by employing only a fraction of the computing power (20%). After passing the mandatory tests, the CCI has been involved in numerical modelling of various scenarios related to the East Carpathians tectonic and geodynamic evolution, including the Neogene magmatic activity, and the intriguing

  14. When Do We Confront? Perceptions of Costs and Benefits Predict Confronting Discrimination on Behalf of the Self and Others

    Science.gov (United States)

    Good, Jessica J.; Moss-Racusin, Corinne A.; Sanchez, Diana T.

    2012-01-01

    Across two studies, we tested whether perceived social costs and benefits of confrontation would similarly predict confronting discrimination both when it was experienced and when it was observed as directed at others. Female undergraduate participants were asked to recall past experiences and observations of sexism, as well as their confronting…

  15. Parallel Worlds

    DEFF Research Database (Denmark)

    Steno, Anne Mia

    2013-01-01

    as a symbol of something else, for instance as a way of handling uncertainty in difficult times, magical practice should also be seen as an emic concept. In this context, understanding the existence of two parallel universes, the profane and the magic, is important because the witches’ movements across...

  16. Partial Overhaul and Initial Parallel Optimization of KINETICS, a Coupled Dynamics and Chemistry Atmosphere Model

    Science.gov (United States)

    Nguyen, Howard; Willacy, Karen; Allen, Mark

    2012-01-01

    KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.

  17. Data mining method for anomaly detection in the supercomputer task flow

    Science.gov (United States)

    Voevodin, Vadim; Voevodin, Vladimir; Shaikhislamov, Denis; Nikitenko, Dmitry

    2016-10-01

    The efficiency of most supercomputer applications is extremely low. At the same time, the user rarely even suspects that their applications may be wasting computing resources. Software tools need to be developed to help detect inefficient applications and report them to the users. We suggest an algorithm for detecting anomalies in the supercomputer's task flow, based on data mining methods. System monitoring is used to calculate integral characteristics for every job executed, and the data is used as input for our classification method based on the Random Forest algorithm. The proposed approach can currently classify the application as one of three classes - normal, suspicious and definitely anomalous. The proposed approach has been demonstrated on actual applications running on the "Lomonosov" supercomputer.
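    A minimal sketch of the kind of classification step described above; the feature names, labels and synthetic data are hypothetical stand-ins for the integral per-job monitoring characteristics, and none of the Lomonosov-specific workflow is reproduced.

```python
# Illustrative three-class job classification with a Random Forest
# (scikit-learn); the four "integral characteristics" here are made up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical per-job features: CPU load, memory bandwidth, network traffic,
# I/O rate (synthetic values, for the sketch only).
X = rng.random((1000, 4))
y = rng.integers(0, 3, size=1000)        # 0=normal, 1=suspicious, 2=anomalous

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```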

  18. Confronting Misinformation in Climate Change Higher Education

    Science.gov (United States)

    Bedford, D. P.

    2012-12-01

    Among the many challenges faced by climate change educators is the highly politicized nature of the subject matter (e.g. McCright and Dunlap, 2011) and the associated misinformation from key media outlets and websites (e.g. see Oreskes and Conway, 2010). Students typically do not enter the classroom as 'blank slates', but often have already formed some opinion about climate change which may or may not be based on reputable sources. Further, many students have lives outside the classroom and/or off campus, and even those who do live in an isolated bubble of campus life will eventually graduate. Thus, providing students with a level of climate change knowledge and understanding robust enough to cope with misinformation may be an important goal for educators. This paper presents a case study of the direct use of climate change misinformation as a college-level classroom activity. Some research from other fields (notably psychology) has found that directly addressing misconceptions in the classroom can be the most effective means of dispelling them (Kowalski and Taylor, 2009). However, directly confronting misinformation in the classroom carries inherent risks, such as reinforcing misconceptions (e.g. Cook and Lewandowsky, 2011). This paper therefore considers approaches to minimizing those risks while attempting to maximize the possible benefits. This paper argues that use of misinformation as a teaching tool can provide useful exercises in critical thinking, testing of content knowledge, and consideration of the nature of science. Cook, J. and S. Lewandowsky. 2011. The Debunking Handbook. Online publication available www.skepticalscience.com/docs/Debunking_Handbook.pdf. Accessed 7 July 2012. Kowalski, P. and A.K. Taylor. 2009. DOI: 10.1080/00986280902959986. McCright, A., and R.T. Dunlap. 2011. The politicization of climate change and polarization in the American public's views of global warming, 2001-2010. The Sociological Quarterly 52:2, 155-194. Oreskes, N. and E

  19. Gauge-flation confronted with Planck

    Energy Technology Data Exchange (ETDEWEB)

    Namba, Ryo; Dimastrogiovanni, Emanuela; Peloso, Marco, E-mail: namba@physics.umn.edu, E-mail: ema@physics.umn.edu, E-mail: peloso@physics.umn.edu [School of Physics and Astronomy, University of Minnesota, Minneapolis, 55455 (United States)

    2013-11-01

    Gauge-flation is a recently proposed model in which inflation is driven solely by a non-Abelian gauge field thanks to a specific higher order derivative operator. The nature of the operator is such that it does not introduce ghosts. We compute the cosmological scalar and tensor perturbations for this model, improving over an existing computation. We then confront these results with the Planck data. The model is characterized by the quantity γ ≡ g²Q²/H² (where g is the gauge coupling constant, Q the vector vev, and H the Hubble rate). For γ < 2, the scalar perturbations show a strong tachyonic instability. In the stable region, the scalar power spectrum n_s is too low at small γ, while the tensor-to-scalar ratio r is too high at large γ. No value of γ leads to acceptable values for n_s and r, and so the model is ruled out by the CMB data. The same behavior with γ was obtained in Chromo-natural inflation, a model in which inflation is driven by a pseudo-scalar coupled to a non-Abelian gauge field. When the pseudo-scalar can be integrated out, one recovers the model of Gauge-flation plus corrections. It was shown that this identification is very accurate at the background level, but differences emerged in the literature concerning the perturbations of the two models. On the contrary, our results show that the analogy between the two models continues to be accurate also at the perturbative level.

  20. schwimmbad: A uniform interface to parallel processing pools in Python

    Science.gov (United States)

    Price-Whelan, Adrian M.; Foreman-Mackey, Daniel

    2017-09-01

    Many scientific and computing problems require doing some calculation on all elements of some data set. If the calculations can be executed in parallel (i.e. without any communication between calculations), these problems are said to be perfectly parallel. On computers with multiple processing cores, these tasks can be distributed and executed in parallel to greatly improve performance. A common paradigm for handling these distributed computing problems is to use a processing "pool": the "tasks" (the data) are passed in bulk to the pool, and the pool handles distributing the tasks to a number of worker processes when available. schwimmbad provides a uniform interface to parallel processing pools and enables switching easily between local development (e.g., serial processing or with multiprocessing) and deployment on a cluster or supercomputer (via, e.g., MPI or JobLib).
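    A short usage sketch of the pool interface described above. The worker function and task list are placeholders, and the particular pool class and keyword shown are an assumption; the schwimmbad documentation should be consulted for the exact options.

```python
# Minimal schwimmbad sketch: the same worker runs unchanged whether the pool
# is serial, multiprocessing-based, or MPI-based (pool choice is illustrative).
from schwimmbad import MultiPool

def worker(task):
    # Placeholder per-element calculation; perfectly parallel, no
    # communication between tasks.
    return task ** 2

def main():
    tasks = list(range(100))
    # Swapping MultiPool for SerialPool or MPIPool should leave the worker
    # and the map call untouched.
    with MultiPool(processes=4) as pool:
        results = list(pool.map(worker, tasks))
    print(sum(results))

if __name__ == "__main__":
    main()
```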

  1. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  2. Guide to dataflow supercomputing: basic concepts, case studies, and a detailed example

    CERN Document Server

    Milutinovic, Veljko; Trifunovic, Nemanja; Giorgi, Roberto

    2015-01-01

    This unique text/reference describes an exciting and novel approach to supercomputing in the DataFlow paradigm. The major advantages and applications of this approach are clearly described, and a detailed explanation of the programming model is provided using simple yet effective examples. The work is developed from a series of lecture courses taught by the authors in more than 40 universities across more than 20 countries, and from research carried out by Maxeler Technologies, Inc. Topics and features: presents a thorough introduction to DataFlow supercomputing for big data problems; revie

  3. Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a Large-Scale Multithreaded BlueGene/Q Supercomputer

    KAUST Repository

    Wu, Xingfu

    2013-07-01

    In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne National Laboratory, and quantify the performance gap resulting from using different numbers of threads per node. We use performance tools and MPI profile and trace libraries available on the supercomputer to analyze and compare the performance of these hybrid scientific applications with an increasing number of OpenMP threads per node, and find that increasing the number of threads beyond some point saturates or worsens the performance of these hybrid applications. For the strong-scaling hybrid scientific applications such as SP-MZ, BT-MZ, PEQdyna and PMLB, using 32 threads per node results in much better application efficiency than using 64 threads per node, and as the number of threads per node increases, the FPU (Floating Point Unit) percentage decreases, while the MPI percentage (except for PMLB) and IPC (Instructions Per Cycle) per core (except for BT-MZ) increase. For the weak-scaling hybrid scientific application such as GTC, the performance trend (relative speedup) is very similar with an increasing number of threads per node no matter how many nodes (32, 128, 512) are used. © 2013 IEEE.

  4. High-Performance Computing: Industry Uses of Supercomputers and High-Speed Networks. Report to Congressional Requesters.

    Science.gov (United States)

    General Accounting Office, Washington, DC. Information Management and Technology Div.

    This report was prepared in response to a request for information on supercomputers and high-speed networks from the Senate Committee on Commerce, Science, and Transportation, and the House Committee on Science, Space, and Technology. The following information was requested: (1) examples of how various industries are using supercomputers to…

  5. A Parallel Tree-SPH code for Galaxy Formation

    CERN Document Server

    Lia, C; Lia, Cesario; Carraro, Giovanni

    1999-01-01

    We describe a new implementation of a parallel Tree-SPH code with the aim to simulate Galaxy Formation and Evolution. The code has been parallelized using SHMEM, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Supercomputing Center (Bologna, Italy). The code combines the Smoothed Particle Hydrodynamics (SPH) method to solve hydro-dynamical equations with the popular Barnes and Hut (1986) tree-code to perform gravity calculation with a NlogN scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al(1998)[MNRAS 297, 1021]. Parallelization is achieved by distributing particles among processors according to a work-load criterion. Benchmarks, in terms of load-balance and scalability, of the code are analyzed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 20,000 particles on 8 processors. The code turns out to be balanced at more than the 95% level. ...

  6. Parallel TreeSPH: A Tool for Galaxy Formation

    CERN Document Server

    Lia, C; Lia, Cesario; Carraro, Giovanni

    1999-01-01

    We describe a new implementation of a parallel Tree-SPH code with the aim to simulate Galaxy Formation and Evolution. The code has been parallelized using SHMEM, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Supercomputing Center (Bologna, Italy). The code combines the Smoothed Particle Hydrodynamics (SPH) method to solve hydro-dynamical equations with the popular Barnes and Hut (1986) tree-code to perform gravity calculation with a $N \times \log N$ scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al (1998)[MNRAS 297, 1021]. Parallelization is achieved by distributing particles among processors according to a work-load criterion. Benchmarks, in terms of load-balance and scalability, of the code are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using $2 \times 10^{4}$ particles on 8 processors. The code results balanced a...

  7. A new parallel n-body gravity solver TPM

    CERN Document Server

    Xu, G

    1994-01-01

    We have developed a gravity solver based on combining the well developed Particle-Mesh (PM) method and TREE methods. It is designed for and has been implemented on parallel computer architectures. The new code can deal with tens of millions of particles on current computers, with the calculation done on a parallel supercomputer or a group of workstations. Typically, the spatial resolution is enhanced by more than a factor of 20 over the pure PM code with mass resolution retained at nearly the PM level. This code runs much faster than a pure TREE code with the same number of particles and maintains almost the same resolution in high density regions. Multiple time step integration has also been implemented with the code, with second order time accuracy. The performance of the code has been checked in several kinds of parallel computer configurations, including IBM SP1, SGI Challenge and a group of workstations, with the speedup of the parallel code on a 32 processor IBM SP2 supercomputer nearly linear (efficienc...

  8. Comparison of neuronal spike exchange methods on a Blue Gene/P supercomputer.

    Science.gov (United States)

    Hines, Michael; Kumar, Sameer; Schürmann, Felix

    2011-01-01

    For neural network simulations on parallel machines, interprocessor spike communication can be a significant portion of the total simulation time. The performance of several spike exchange methods using a Blue Gene/P (BG/P) supercomputer has been tested with 8-128 K cores using randomly connected networks of up to 32 M cells with 1 k connections per cell and 4 M cells with 10 k connections per cell, i.e., on the order of 4·10^10 connections (K is 1024, M is 1024^2, and k is 1000). The spike exchange methods used are the standard Message Passing Interface (MPI) collective, MPI_Allgather, and several variants of the non-blocking Multisend method either implemented via non-blocking MPI_Isend, or exploiting the possibility of very low overhead direct memory access (DMA) communication available on the BG/P. In all cases, the worst performing method was that using MPI_Isend due to the high overhead of initiating a spike communication. The two best performing methods-the persistent Multisend method using the Record-Replay feature of the Deep Computing Messaging Framework DCMF_Multicast; and a two-phase multisend in which a DCMF_Multicast is used to first send to a subset of phase one destination cores, which then pass it on to their subset of phase two destination cores-had similar performance with very low overhead for the initiation of spike communication. Departure from ideal scaling for the Multisend methods is almost completely due to load imbalance caused by the large variation in number of cells that fire on each processor in the interval between synchronization. Spike exchange time itself is negligible since transmission overlaps with computation and is handled by a DMA controller. We conclude that ideal performance scaling will be ultimately limited by imbalance between incoming processor spikes between synchronization intervals. Thus, counterintuitively, maximization of load balance requires that the distribution of cells on processors should not reflect
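    A toy sketch of the simplest of the exchange methods mentioned (the MPI_Allgather variant), written with mpi4py; spike counts and payloads are synthetic, and none of the BG/P-specific Multisend/DCMF machinery is represented.

```python
# Toy all-to-all spike exchange with MPI_Allgather (mpi4py). Each rank gathers
# the spikes fired locally in the last interval and exchanges them with every
# other rank; cell ids and firing counts here are synthetic.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

MAX_SPIKES = 64                        # fixed-size buffer per rank (toy choice)
local = np.full(MAX_SPIKES, -1, dtype='i')
n_fired = int(np.random.default_rng(rank).integers(0, MAX_SPIKES))
local[:n_fired] = np.arange(n_fired)   # fake global ids of cells that fired here

recv = np.empty(MAX_SPIKES * size, dtype='i')
comm.Allgather([local, MPI.INT], [recv, MPI.INT])

spikes = recv[recv >= 0]               # drop padding; these drive local synapses
print(f"rank {rank}: received {spikes.size} spikes in total")
```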

  9. Confronting Endometriosis | NIH MedlinePlus the Magazine

    Science.gov (United States)

    Past Issues / Summer 2016. Would you share your personal history with endometriosis? My symptoms started with my first period when ...

  10. Fireball and cannonball models of gamma ray bursts confront observations

    OpenAIRE

    Dar, Arnon

    2005-01-01

    The two leading contenders for the theory of gamma-ray bursts (GRBs) and their afterglows, the Fireball and Cannonball models, are compared and their predictions are confronted, within space limitations, with key GRB observations, including recent observations with SWIFT.

  11. Confronting quasi-exponential inflation with WMAP seven

    Energy Technology Data Exchange (ETDEWEB)

    Pal, Barun Kumar; Pal, Supratik; Basu, B., E-mail: barunp1985@rediffmail.com, E-mail: pal@th.physik.uni-bonn.de, E-mail: banasri@isical.ac.in [Physics and Applied Mathematics Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata 700 108 (India)

    2012-04-01

    We confront quasi-exponential models of inflation with the WMAP seven-year dataset using the Hamilton-Jacobi formalism. With a phenomenological Hubble parameter representing quasi-exponential inflation, we develop the formalism and subject the analysis to confrontation with WMAP seven using the publicly available code CAMB. The observable parameters are found to fare extremely well with WMAP seven. We also obtain a tensor-to-scalar amplitude ratio which may be detectable by PLANCK.
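    For reference, the Hamilton-Jacobi formalism invoked above treats the Hubble parameter H as a function of the inflaton field φ. In terms of the reduced Planck mass M_Pl, the standard relations (textbook form, not the paper's specific phenomenological choice of H) read:

```latex
\dot{\phi} = -2 M_{\rm Pl}^{2}\, H'(\phi), \qquad
\left[H'(\phi)\right]^{2} - \frac{3}{2 M_{\rm Pl}^{2}}\, H^{2}(\phi)
  = -\frac{1}{2 M_{\rm Pl}^{4}}\, V(\phi), \qquad
\epsilon_{H} \equiv 2 M_{\rm Pl}^{2} \left(\frac{H'(\phi)}{H(\phi)}\right)^{2},
```

    with inflation proceeding while ε_H < 1; specifying H(φ) phenomenologically then fixes both the background evolution and the potential.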

  12. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  13. Argonne National Lab deploys Force10 networks' massively dense ethernet switch for supercomputing cluster

    CERN Multimedia

    2003-01-01

    "Force10 Networks, Inc. today announced that Argonne National Laboratory (Argonne, IL) has successfully deployed Force10 E-Series switch/routers to connect to the TeraGrid, the world's largest supercomputing grid, sponsored by the National Science Foundation (NSF)" (1/2 page).

  14. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2013-12-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  15. The impact of the U.S. supercomputing initiative will be global

    Energy Technology Data Exchange (ETDEWEB)

    Crawford, Dona [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-01-15

    Last July, President Obama issued an executive order that created a coordinated federal strategy for HPC research, development, and deployment called the U.S. National Strategic Computing Initiative (NSCI). However, this bold, necessary step toward building the next generation of supercomputers has inaugurated a new era for U.S. high performance computing (HPC).

  16. Congressional Panel Seeks To Curb Access of Foreign Students to U.S. Supercomputers.

    Science.gov (United States)

    Kiernan, Vincent

    1999-01-01

    Fearing security problems, a congressional committee on Chinese espionage recommends that foreign students and other foreign nationals be barred from using supercomputers at national laboratories unless they first obtain export licenses from the federal government. University officials dispute the data on which the report is based and find the…

  17. [Experience in simulating the structural and dynamic features of small proteins using table supercomputers].

    Science.gov (United States)

    Kondrat'ev, M S; Kabanov, A V; Komarov, V M; Khechinashvili, N N; Samchenko, A A

    2011-01-01

    The results of theoretical studies of the structural and dynamic features of peptides and small proteins are presented, carried out with quantum chemical and molecular dynamics methods on high-performance graphics workstations ("table supercomputers") using distributed calculations with the CUDA technology.

  18. The science of computing - Parallel computation

    Science.gov (United States)

    Denning, P. J.

    1985-01-01

    Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concomitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.

  19. Parallel computing in atmospheric chemistry models

    Energy Technology Data Exchange (ETDEWEB)

    Rotman, D. [Lawrence Livermore National Lab., CA (United States). Atmospheric Sciences Div.]

    1996-02-01

    Studies of atmospheric chemistry are of high scientific interest, involve computations that are complex and intense, and require enormous amounts of I/O. Current supercomputer computational capabilities are limiting the studies of stratospheric and tropospheric chemistry and will certainly not be able to handle the upcoming coupled chemistry/climate models. To enable such calculations, the authors have developed a computing framework that allows computations on a wide range of computational platforms, including massively parallel machines. Because of the fast paced changes in this field, the modeling framework and scientific modules have been developed to be highly portable and efficient. Here, the authors present the important features of the framework and focus on the atmospheric chemistry module, named IMPACT, and its capabilities. Applications of IMPACT to aircraft studies will be presented.

  20. Parallelization of the Coupled Earthquake Model

    Science.gov (United States)

    Block, Gary; Li, P. Peggy; Song, Yuhe T.

    2007-01-01

    This Web-based tsunami simulation system allows users to remotely run a model on JPL's supercomputers for a given undersea earthquake. At the time of this reporting, predicting tsunamis on the Internet had never been done before. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern, high-performance computers and constrained by remotely sensed data, they are able to provide early warnings for those coastal regions at risk. The software is capable of testing NASA's satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.

  1. Internal fluid mechanics research on supercomputers for aerospace propulsion systems

    Science.gov (United States)

    Miller, Brent A.; Anderson, Bernhard H.; Szuch, John R.

    1988-01-01

    The Internal Fluid Mechanics Division of the NASA Lewis Research Center is combining the key elements of computational fluid dynamics, aerothermodynamic experiments, and advanced computational technology to bring internal computational fluid mechanics (ICFM) to a state of practical application for aerospace propulsion systems. The strategies used to achieve this goal are to: (1) pursue an understanding of flow physics, surface heat transfer, and combustion via analysis and fundamental experiments, (2) incorporate improved understanding of these phenomena into verified 3-D CFD codes, and (3) utilize state-of-the-art computational technology to enhance experimental and CFD research. Presented is an overview of the ICFM program in high-speed propulsion, including work in inlets, turbomachinery, and chemical reacting flows. Ongoing efforts to integrate new computer technologies, such as parallel computing and artificial intelligence, into high-speed aeropropulsion research are described.

  2. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  3. Ada compiler validation summary report. Certificate number: 891116W1.10191. Intel Corporation, IPSC/2 Ada, Release 1.1, IPSC/2 parallel supercomputer, system resource manager host and IPSC/2 parallel supercomputer, CX-1 nodes target

    Energy Technology Data Exchange (ETDEWEB)

    1989-11-16

    This VSR documents the results of the validation testing performed on an Ada compiler. Testing was carried out for the following purposes: To attempt to identify any language constructs supported by the compiler that do not conform to the Ada Standard; To attempt to identify any language constructs not supported by the compiler but required by the Ada Standard; and To determine that the implementation-dependent behavior is allowed by the Ada Standard. Testing of this compiler was conducted by SofTech, Inc. under the direction of the AVF according to procedures established by the Ada Joint Program Office and administered by the Ada Validation Organization (AVO). On-site testing was completed 16 November 1989 at Aloha, OR.

  4. Lattice gauge theory on the Intel parallel scientific computer

    Energy Technology Data Exchange (ETDEWEB)

    Gottlieb, S. (Department of Physics, Indiana University, Bloomington, IN (USA))

    1990-08-01

    Intel Scientific Computers (ISC) has just started producing its third generation of parallel computer, the iPSC/860. Based on the i860 chip that has a peak performance of 80 Mflops and with a current maximum of 128 nodes, this computer should achieve speeds in excess of those obtainable on conventional vector supercomputers. The hardware, software and computing techniques appropriate for lattice gauge theory calculations are described. The differences between a staggered fermion conjugate gradient program written under CANOPY and for the iPSC are detailed.
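    For context, the kernel at the heart of such lattice calculations is a Krylov solve; below is a generic conjugate gradient sketch for a symmetric positive-definite operator, not the staggered-fermion CANOPY or iPSC implementation, which applies the Dirac operator matrix-free and in parallel.

```python
# Generic conjugate gradient solver sketch (illustrative only).
import numpy as np

def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A given as a mat-vec."""
    x = np.zeros_like(b)
    r = b - apply_A(x)
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Small dense example standing in for the lattice operator.
rng = np.random.default_rng(0)
M = rng.random((50, 50))
A = M @ M.T + 50 * np.eye(50)           # SPD by construction
b = rng.random(50)
x = conjugate_gradient(lambda v: A @ v, b)
print("residual norm:", np.linalg.norm(A @ x - b))
```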

  5. A tool for simulating parallel branch-and-bound methods

    Science.gov (United States)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution. Therefore, the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
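    A very small sketch of the idea of replacing the real B&B run by a stochastic branching process over simulated processors; the parameters, branching law, logical-time bookkeeping and the naive load balancing rule are all hypothetical simplifications, not the authors' tool.

```python
# Toy simulation of a branch-and-bound search as a stochastic branching
# process over P simulated processors (illustrative only).
import random

def simulate(P=4, initial_nodes=8, p_branch=0.6, max_steps=10_000, seed=1):
    random.seed(seed)
    queues = [[] for _ in range(P)]
    for i in range(initial_nodes):           # seed the search tree round-robin
        queues[i % P].append(0)              # store node depth as the "work"
    steps, processed = 0, 0
    while any(queues) and steps < max_steps:
        steps += 1                           # one unit of logical time
        for p in range(P):
            if queues[p]:
                depth = queues[p].pop()
                processed += 1
                # Branching probability decays with depth, so the process
                # eventually dies out (stand-in for pruning by bounds).
                if random.random() < p_branch / (1 + 0.1 * depth):
                    queues[p] += [depth + 1, depth + 1]
        # Naive load balancing: busiest rank ships one node to the idlest.
        busy = max(range(P), key=lambda q: len(queues[q]))
        idle = min(range(P), key=lambda q: len(queues[q]))
        if len(queues[busy]) > len(queues[idle]) + 1:
            queues[idle].append(queues[busy].pop())
    return steps, processed

print(simulate())
```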

  6. Factors that influence the confrontation of spinal cord injury

    Directory of Open Access Journals (Sweden)

    Montserrat Melchor Arteaga

    2011-05-01

    Full Text Available Definitions of spinal cord injury agree that one consequence of the injury is the loss, to varying degrees, of autonomic function; this causes a change in the lifestyle of patients and their families. In spinal injury, the initial priority is the recovery or maintenance of vital organ functions and the physical stabilization of the person. Later, the priority becomes rehabilitation and adaptation, which should be integrated at all levels: physical, psychological and social. According to Callista Roy, confrontation is an important variable for understanding the effect of stress on health and disease, and on the maintenance or recovery of health. The ways in which patients confront the disease are the confrontation strategies. They are defined as the thoughts and actions that persons put in place to deal with adverse changes, and they are grouped into three categories: problems, emotions and avoidance. There are other factors that influence the use of strategies, among them personality. According to Eysenck, personality is determined by the functional interaction of four factors: cognitive (intelligence), conative (character), affective (temperament) and somatic (constitution). With this study we want to identify the factors that influence the confrontation of spinal cord injury, analyse the possible relations between them, and develop specific tools, based on the most determinant factors, to achieve effective confrontation of this type of disease.

  7. Interactive steering of supercomputing simulation for aerodynamic noise radiated from square cylinder; Supercomputer wo mochiita steering system ni yoru kakuchu kara hoshasareru kurikion no suchi kaiseki

    Energy Technology Data Exchange (ETDEWEB)

    Yokono, Y. [Toshiba Corp., Tokyo (Japan); Fujita, H. [Tokyo Inst. of Technology, Tokyo (Japan). Precision Engineering Lab.

    1995-03-25

    This paper describes extensive computer simulation for aerodynamic noise radiated from a square cylinder using an interactive steering supercomputing simulation system. The unsteady incompressible three-dimensional Navier-Stokes equations are solved by the finite volume method using a steering system which can visualize the numerical process during calculation and alter the numerical parameters. Using the fluctuating surface pressure of the square cylinder, the farfield sound pressure is calculated based on Lighthill-Curle's equation. The results are compared with those of low noise wind tunnel experiments, and good agreement is observed for the peak spectrum frequency of the sound pressure level. 14 refs., 10 figs.
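    For orientation, the compact-body, low-Mach-number form of Curle's result, which relates the far-field acoustic pressure to the unsteady surface force, is commonly quoted as below; this is a textbook approximation, not the exact expression evaluated in the paper, and the sign depends on the convention chosen for the surface normal.

```latex
p'(\mathbf{x},t) \;\approx\; \frac{x_i}{4\pi c_0\, |\mathbf{x}|^{2}}\,
\frac{\partial F_i}{\partial t}\!\left(t - \frac{|\mathbf{x}|}{c_0}\right),
\qquad
F_i(t) = \oint_{S} p(\mathbf{y},t)\, n_i \,\mathrm{d}S(\mathbf{y}),
```

    where F_i(t) is the instantaneous unsteady force associated with the fluctuating surface pressure on the cylinder, c_0 is the speed of sound, and the quadrupole (volume) term is neglected at low Mach number.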

  8. Exploring new frontiers in statistical physics with a new, parallel Wang-Landau framework

    CERN Document Server

    Vogel, Thomas; Wüst, Thomas; Landau, David P

    2013-01-01

    Combining traditional Wang-Landau sampling for multiple replica systems with an exchange of densities of states between replicas, we describe a general framework for simulations on massively parallel Petaflop supercomputers. The advantages and general applicability of the method for simulations of complex systems are demonstrated for the classical 2D Potts spin model featuring a strong first-order transition and the self-assembly of lipid bilayers in amphiphilic solutions in a continuous model.
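    To make the sampling ingredient concrete, here is a minimal single-replica Wang-Landau sketch for a small 2D Ising model; the replica exchange of densities of states and the massively parallel framework itself are not shown, and all parameters are illustrative.

```python
# Minimal single-replica Wang-Landau sampler for a small 2D Ising model
# (illustrative; the parallel framework exchanges ln g(E) between replicas).
import numpy as np

L = 8
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(L, L))

def total_energy(s):
    return -int(np.sum(s * (np.roll(s, 1, 0) + np.roll(s, 1, 1))))

def delta_E(s, i, j):
    nb = s[(i+1) % L, j] + s[(i-1) % L, j] + s[i, (j+1) % L] + s[i, (j-1) % L]
    return int(2 * s[i, j] * nb)

ln_g, hist = {}, {}          # running estimates of ln g(E) and the visit histogram
ln_f = 1.0                   # modification factor f = e^{ln_f}
E = total_energy(spins)

while ln_f > 1e-4:           # loose stopping rule for the sketch
    for _ in range(10_000):
        i, j = rng.integers(L, size=2)
        E_new = E + delta_E(spins, i, j)
        # Accept with probability min(1, g(E)/g(E_new)).
        if rng.random() < np.exp(ln_g.get(E, 0.0) - ln_g.get(E_new, 0.0)):
            spins[i, j] *= -1
            E = E_new
        ln_g[E] = ln_g.get(E, 0.0) + ln_f
        hist[E] = hist.get(E, 0) + 1
    counts = np.array(list(hist.values()))
    if counts.min() > 0.8 * counts.mean():   # crude flatness check
        hist.clear()
        ln_f /= 2.0          # f -> sqrt(f), i.e. ln f is halved

print("visited", len(ln_g), "energy levels; final ln f =", ln_f)
```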

  9. Exploring new frontiers in statistical physics with a new, parallel Wang-Landau framework

    Science.gov (United States)

    Vogel, Thomas; Li, Ying Wai; Wüst, Thomas; Landau, David P.

    2014-03-01

    Combining traditional Wang-Landau sampling for multiple replica systems with an exchange of densities of states between replicas, we describe a general framework for simulations on massively parallel Petaflop supercomputers. The advantages and general applicability of the method for simulations of complex systems are demonstrated for the classical 2D Potts spin model featuring a strong first-order transition and the self-assembly of lipid bilayers in amphiphilic solutions in a continuous model.

  10. Exploring new frontiers in statistical physics with a new, parallel Wang-Landau framework

    Energy Technology Data Exchange (ETDEWEB)

    Vogel, Thomas [University of Georgia, Athens, GA]; Li, Ying Wai [ORNL]; Wuest, Thomas [Swiss Federal Research Institute, Switzerland]; Landau, David P [University of Georgia, Athens, GA]

    2014-01-01

    Combining traditional Wang-Landau sampling for multiple replica systems with an exchange of densities of states between replicas, we describe a general framework for simulations on massively parallel Petaflop supercomputers. The advantages and general applicability of the method for simulations of complex systems are demonstrated for the classical 2D Potts spin model featuring a strong first-order transition and the self-assembly of lipid bilayers in amphiphilic solutions in a continuous model.

  11. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    Science.gov (United States)

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment, which maps raw sequencing data to a reference genome, is the central step in sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2, now the world's fastest supercomputer, is equipped with three MIC coprocessors per compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: firstly, parallelization of data I/O and read alignment by a three-stage parallel pipeline; secondly, parallelization enabled by MIC coprocessor technology; thirdly, inter-node parallelization implemented with MPI. In this paper, we demonstrate that B-MIC outperforms BWA by a combination of those techniques using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on the Intel MIC, and it can achieve more than a fivefold speedup over the original BWA while maintaining alignment precision.

  12. ParCAT: Parallel Climate Analysis Toolkit

    Energy Technology Data Exchange (ETDEWEB)

    Smith, Brian E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Steed, Chad A. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Ricciuto, Daniel M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Thornton, Peter E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Wehner, Michael [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-01-01

    Climate science is employing increasingly complex models and simulations to analyze the past and predict the future of Earth's climate. This growth in complexity is creating a widening gap between the data being produced and the ability to analyze the datasets. Parallel computing tools are necessary to analyze, compare, and interpret the simulation data. The Parallel Climate Analysis Toolkit (ParCAT) provides basic tools to efficiently use parallel computing techniques to make analysis of these datasets manageable. The toolkit provides the ability to compute spatio-temporal means, differences between runs or differences between averages of runs, and histograms of the values in a data set. ParCAT is implemented as a command-line utility written in C. This allows for easy integration into other tools and allows for use in scripts. This also makes it possible to run ParCAT on many platforms from laptops to supercomputers. ParCAT outputs NetCDF files so it is compatible with existing utilities such as Panoply and UV-CDAT. This paper describes ParCAT and presents results from some example runs on the Titan system at ORNL.
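    A small sketch of the kind of reduction described above (a spatio-temporal mean, a time-mean map and a histogram), written here with netCDF4/numpy in Python rather than the toolkit's C command-line interface; the file and variable names are hypothetical.

```python
# Sketch of a ParCAT-style reduction in Python (illustrative only; ParCAT
# itself is a C command-line utility). File and variable names are made up.
import numpy as np
from netCDF4 import Dataset

with Dataset("run1.nc") as nc:               # hypothetical model output file
    tas = nc.variables["tas"][:]             # e.g. (time, lat, lon) temperature

spatio_temporal_mean = float(tas.mean())     # single number over all dimensions
time_mean_map = tas.mean(axis=0)             # (lat, lon) climatology
counts, edges = np.histogram(np.asarray(tas).ravel(), bins=50)

print("mean over run:", spatio_temporal_mean)
```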

  13. Software development strategies for parallel computer architectures

    Science.gov (United States)

    Gruber, Ralf; Cooper, W. Anthony; Beniston, Martin; Gengler, Marc; Merazzi, Silvio

    1991-09-01

    As pragmatic users of high performance supercomputers, we believe that, nowadays, parallel computer architectures with distributed memories are not yet mature enough to be used by a wide range of application engineers. A big effort should be made to bring these very promising computers closer to the users. One major flaw of massively parallel machines is that the programmer has to take care of the data flow himself, which is often different on different parallel computers. To overcome this problem, we propose that data structures be standardized. The data base then can become an integrated part of the system and the data flow for a given algorithm can be easily prescribed. Fixing data structures forces the computer manufacturer to adapt his machine to users' demands rather than, as happens now, forcing the user to adapt to the innovative computer science approach of the computer manufacturer. In this paper, we present the data standards chosen for our ASTRID programming platform for research scientists and engineers, as well as a plasma physics application which won the Cray Gigaflop Performance Awards in 1989 and 1990 and which was successfully ported to an INTEL iPSC/2 hypercube.

  14. Confronting moral distress in Nursing: recognizing nurses as moral agents

    Directory of Open Access Journals (Sweden)

    Franco A. Carnevale

    2013-09-01

    Full Text Available The concept of moral distress has brought forth a substantively different way of understanding some of the difficulties confronted by nurses in their practice. This concept highlights that nurses' distress can be an indication of their conscientious moral engagement with their professional practice when they confront practices or an environment that impedes them from acting according to their own ethical standards. Moral distress can be an indicator of problems in nurses' practice environments. This concept is described and related to moral agency in nursing practice. Selected research on moral distress is reviewed, followed by a discussion of recommendations for addressing this problem.

  15. The TianHe-1A Supercomputer: Its Hardware and Software

    Institute of Scientific and Technical Information of China (English)

    Xue-Jun Yang; Xiang-Ke Liao; Kai Lu; Qing-Feng Hu; Jun-Qiang Song; Jin-Shu Su

    2011-01-01

    This paper presents an overview of the TianHe-1A (TH-1A) supercomputer, which is built by the National University of Defense Technology of China (NUDT). TH-1A adopts a hybrid architecture by integrating CPUs and GPUs, and its interconnect network is a proprietary high-speed communication network. The theoretical peak performance of TH-1A is 4700 TFlops, and its LINPACK test result is 2566 TFlops. It was ranked No. 1 on the TOP500 list released in November 2010. TH-1A is now deployed in the National Supercomputer Center in Tianjin and provides high performance computing services. TH-1A has played an important role in many applications, such as oil exploration, weather forecasting, and biomedical research.

  16. HACC: Simulating Sky Surveys on State-of-the-Art Supercomputing Architectures

    CERN Document Server

    Habib, Salman; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukic, Zarija; Sehrish, Saba; Liao, Wei-keng

    2014-01-01

    Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of prog...

  17. Sandia's network for Supercomputing '95: Validating the progress of Asynchronous Transfer Mode (ATM) switching

    Energy Technology Data Exchange (ETDEWEB)

    Pratt, T.J.; Vahle, O.; Gossage, S.A.

    1996-04-01

    The Advanced Networking Integration Department at Sandia National Laboratories has used the annual Supercomputing conference sponsored by the IEEE and ACM for the past three years as a forum to demonstrate and focus communication and networking developments. For Supercomputing '95, Sandia elected: to demonstrate the functionality and capability of an AT&T Globeview 20 Gbps Asynchronous Transfer Mode (ATM) switch, which represents the core of Sandia's corporate network; to build and utilize a three-node 622 megabit per second Paragon network; and to extend the DOD's ACTS ATM Internet from Sandia, New Mexico to the conference's show floor in San Diego, California, for video demonstrations. This paper documents those accomplishments, discusses the details of their implementation, and describes how these demonstrations support Sandia's overall strategies in ATM networking.

  18. Supercomputer and cluster performance modeling and analysis efforts: 2004-2006.

    Energy Technology Data Exchange (ETDEWEB)

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold (Hal) Edward; Stevenson, Joel O.; Benner, Robert E., Jr.; Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results from running benchmarks and applications to extract performance characteristics and comparisons, as well as from modeling efforts, during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  19. BSMBench: a flexible and scalable supercomputer benchmark from computational particle physics

    CERN Document Server

    Bennett, Ed; Del Debbio, Luigi; Jordan, Kirk; Patella, Agostino; Pica, Claudio; Rago, Antonio

    2016-01-01

    Benchmarking plays a central role in the evaluation of High Performance Computing architectures. Several benchmarks have been designed that allow users to stress various components of supercomputers. In order for the figures they provide to be useful, benchmarks need to be representative of the most common real-world scenarios. In this work, we introduce BSMBench, a benchmarking suite derived from Monte Carlo code used in computational particle physics. The advantage of this suite (which can be freely downloaded from http://www.bsmbench.org/) over others is the capacity to vary the relative importance of computation and communication. This enables the tests to simulate various practical situations. To showcase BSMBench, we perform a wide range of tests on various architectures, from desktop computers to state-of-the-art supercomputers, and discuss the corresponding results. Possible future directions of development of the benchmark are also outlined.

  20. Towards 21st century stellar models: Star clusters, supercomputing and asteroseismology

    Science.gov (United States)

    Campbell, S. W.; Constantino, T. N.; D'Orazi, V.; Meakin, C.; Stello, D.; Christensen-Dalsgaard, J.; Kuehn, C.; De Silva, G. M.; Arnett, W. D.; Lattanzio, J. C.; MacLean, B. T.

    2016-09-01

    Stellar models provide a vital basis for many aspects of astronomy and astrophysics. Recent advances in observational astronomy - through asteroseismology, precision photometry, high-resolution spectroscopy, and large-scale surveys - are placing stellar models under greater quantitative scrutiny than ever. The model limitations are being exposed and the next generation of stellar models is needed as soon as possible. The current uncertainties in the models propagate to the later phases of stellar evolution, hindering our understanding of stellar populations and chemical evolution. Here we give a brief overview of the evolution, importance, and substantial uncertainties of core helium burning stars in particular and then briefly discuss a range of methods, both theoretical and observational, that we are using to advance the modelling. This study uses observational data from HST, VLT, AAT, and Kepler, and supercomputing resources in Australia provided by the National Computational Infrastructure (NCI) and Pawsey Supercomputing Centre.

  1. Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Meneses, Esteban [University of Pittsburgh; Ni, Xiang [University of Illinois at Urbana-Champaign; Jones, Terry R [ORNL; Maxwell, Don E [ORNL

    2015-01-01

    The unprecedented computational power of current supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that clouds the future of supercomputing. A system that comprises many components has a high chance to fail, and to fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of fault tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan's log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross-correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan.

  2. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.

    Energy Technology Data Exchange (ETDEWEB)

    Younge, Andrew J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Pedretti, Kevin [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Grant, Ryan [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Brightwell, Ron [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed computing models. However, system software for such capabilities on the latest extreme-scale DOE supercomputers needs to be enhanced to more appropriately support these types of emerging software ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifically, we have deployed the KVM hypervisor within Cray's Compute Node Linux on an XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead of our solution using HPC benchmarks, evaluating both single-node performance and weak scaling of a 32-node virtual cluster. Overall, we find that single-node performance of our solution using KVM on a Cray is very efficient, with near-native performance. However, overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, effectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

  3. US Department of Energy High School Student Supercomputing Honors Program: A follow-up assessment

    Energy Technology Data Exchange (ETDEWEB)

    1987-01-01

    The US DOE High School Student Supercomputing Honors Program was designed to recognize high school students with superior skills in mathematics and computer science and to provide them with formal training and experience with advanced computer equipment. This document reports on the participants who attended the first such program, which was held at the National Magnetic Fusion Energy Computer Center at the Lawrence Livermore National Laboratory (LLNL) during August 1985.

  4. Application of Supercomputer Technologies for Simulation Of Socio-Economic Systems

    Directory of Open Access Journals (Sweden)

    Vladimir Valentinovich Okrepilov

    2015-06-01

    To date, extensive experience has been accumulated in the investigation of problems related to quality, the assessment of management systems, and the modeling of economic system sustainability. These studies have created the basis for the development of a new research area, the Economics of Quality, whose tools make it possible to use model simulation to construct mathematical models that adequately reflect the role of quality in the natural, technical, and social regularities of the functioning of complex socio-economic systems. In our firm belief, the extensive application and development of such models, together with system modeling using supercomputer technologies, will bring research on socio-economic systems to an essentially new level. Moreover, the current research makes a significant contribution to the model simulation of multi-agent social systems and, no less importantly, belongs to the priority areas of science and technology development in our country. This article is devoted to the application of supercomputer technologies in the social sciences, first of all to the technical realization of large-scale agent-focused models (AFM). The essence of this tool is that, owing to the increase in computing power, it has become possible to describe the behavior of many separate fragments of a complex system, as socio-economic systems are. The article also discusses the experience of foreign scientists and practitioners in running AFM on supercomputers, as well as an AFM developed at CEMI RAS; the stages and methods of efficiently mapping the calculating kernel of a multi-agent system onto the architecture of a modern supercomputer are analyzed. Experiments based on model simulation for forecasting the population of St. Petersburg under three scenarios, as one of the major factors influencing the development of the socio-economic system and the quality of life of the population, are presented in the

  5. Multiscale Hy3S: Hybrid stochastic simulation for supercomputers

    Directory of Open Access Journals (Sweden)

    Kaznessis Yiannis N

    2006-02-01

    Background: Stochastic simulation has become a useful tool to both study natural biological systems and design new synthetic ones. By capturing the intrinsic molecular fluctuations of "small" systems, these simulations produce a more accurate picture of single cell dynamics, including interesting phenomena missed by deterministic methods, such as noise-induced oscillations and transitions between stable states. However, the computational cost of the original stochastic simulation algorithm can be high, motivating the use of hybrid stochastic methods. Hybrid stochastic methods partition the system into multiple subsets and describe each subset as a different representation, such as a jump Markov, Poisson, continuous Markov, or deterministic process. By applying valid approximations and self-consistently merging disparate descriptions, a method can be considerably faster, while retaining accuracy. In this paper, we describe Hy3S, a collection of multiscale simulation programs. Results: Building on our previous work on developing novel hybrid stochastic algorithms, we have created the Hy3S software package to enable scientists and engineers to both study and design extremely large well-mixed biological systems with many thousands of reactions and chemical species. We have added adaptive stochastic numerical integrators to permit the robust simulation of dynamically stiff biological systems. In addition, Hy3S has many useful features, including embarrassingly parallelized simulations with MPI; special discrete events, such as transcriptional and translation elongation and cell division; mid-simulation perturbations in both the number of molecules of species and reaction kinetic parameters; combinatorial variation of both initial conditions and kinetic parameters to enable sensitivity analysis; use of NetCDF optimized binary format to quickly read and write large datasets; and a simple graphical user interface, written in Matlab, to help users
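
    The jump-Markov subsets in such hybrid schemes are typically advanced with a Gillespie-style stochastic simulation algorithm (SSA). The following minimal Python sketch of one direct-method SSA step is included only to illustrate that building block; it is not taken from Hy3S, and the birth-death propensities in the usage example are hypothetical.

```python
import numpy as np

def ssa_step(x, propensities, stoich, rng):
    """One Gillespie direct-method step for a jump-Markov subsystem.
    x: copy-number vector; propensities: list of functions a_j(x);
    stoich: (n_reactions, n_species) stoichiometry matrix."""
    a = np.array([a_j(x) for a_j in propensities])
    a0 = a.sum()
    if a0 == 0.0:
        return x, np.inf                    # no reaction can fire
    tau = rng.exponential(1.0 / a0)         # waiting time to the next event
    j = rng.choice(len(a), p=a / a0)        # index of the firing reaction
    return x + stoich[j], tau

# Usage: a simple birth-death process with hypothetical rate constants.
rng = np.random.default_rng(0)
propensities = [lambda x: 10.0, lambda x: 0.1 * x[0]]   # birth, death
stoich = np.array([[+1], [-1]])
x, t = np.array([0]), 0.0
while t < 50.0:
    x, tau = ssa_step(x, propensities, stoich, rng)
    t += tau
```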

  6. Study on Parallel Computing

    Institute of Scientific and Technical Information of China (English)

    Guo-Liang Chen; Guang-Zhong Sun; Yun-Quan Zhang; Ze-Yao Mo

    2006-01-01

    In this paper, we present a general survey of parallel computing. The main contents include the parallel computer system, which is the hardware platform of parallel computing; the parallel algorithm, which is its theoretical basis; and parallel programming, which is its software support. After that, we also introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture - algorithm - programming - application". Only in this way can parallel computing research develop continuously and become more realistic.

  7. Building more powerful less expensive supercomputers using Processing-In-Memory (PIM) LDRD final report.

    Energy Technology Data Exchange (ETDEWEB)

    Murphy, Richard C.

    2009-09-01

    This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follows: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.

  8. Confronting Bias through Teaching: Insights from Social Psychology

    Science.gov (United States)

    Crittle, Chelsea; Maddox, Keith B.

    2017-01-01

    Research in social psychology has the potential to address real-world issues involving racial stereotyping, prejudice, and discrimination. Literature on confrontation suggests that addressing racism can be seen as a persuasive act that will allow for more effective interpersonal interactions. In this article, we explore the persuasive…

  9. Performance analysis of high quality parallel preconditioners applied to 3D finite element structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kolotilina, L.; Nikishin, A.; Yeremin, A. [and others

    1994-12-31

    The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are the implicit preconditioners and the explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally high quality but require solution of lower and upper triangular systems of equations per iteration which are difficult to parallelize without deteriorating the convergence rate. The explicit type of preconditionings (e.g. polynomial preconditioners or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized but their preconditioning qualities are less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.
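
    The appeal of explicit approximate-inverse preconditioners is that applying them costs only sparse matrix-vector products, which parallelize naturally. The sketch below shows a preconditioned conjugate gradient loop written against an explicitly applied M^{-1} = G^T G; the diagonal G used in the example is only a self-contained stand-in, not the FSAI construction studied by the authors.

```python
import numpy as np

def pcg(A, b, apply_Minv, tol=1e-8, maxit=500):
    """Preconditioned CG where the preconditioner is applied explicitly
    (matrix-vector products only, no triangular solves per iteration)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Stand-in approximate inverse factor G (Jacobi-like); a real FSAI G is a
# sparse lower-triangular matrix with a prescribed sparsity pattern.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
G = np.diag(1.0 / np.sqrt(np.diag(A)))
x = pcg(A, np.ones(3), lambda r: G.T @ (G @ r))
```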

  10. Survey of new vector computers: The CRAY 1S from CRAY research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

    Science.gov (United States)

    Gentzsch, W.

    1982-01-01

    Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.

  11. (Studies of ocean predictability at decade to century time scales using a global ocean general circulation model in a parallel competing environment). [Large Scale Geostrophic Model

    Energy Technology Data Exchange (ETDEWEB)

    1992-03-10

    The first phase of the proposed work is largely completed on schedule. Scientists at the San Diego Supercomputer Center (SDSC) succeeded in putting a version of the Hamburg isopycnal coordinate ocean model (OPYC) onto the INTEL parallel computer. Due to the slow run speeds of the OPYC on the parallel machine, another ocean model is being used during the first part of phase 2. The model chosen is the Large Scale Geostrophic (LSG) model from the Max Planck Institute.

  12. [Studies of ocean predictability at decade to century time scales using a global ocean general circulation model in a parallel competing environment]. Progress report

    Energy Technology Data Exchange (ETDEWEB)

    1992-03-10

    The first phase of the proposed work is largely completed on schedule. Scientists at the San Diego Supercomputer Center (SDSC) succeeded in putting a version of the Hamburg isopycnal coordinate ocean model (OPYC) onto the INTEL parallel computer. Due to the slow run speeds of the OPYC on the parallel machine, another ocean model is being used during the first part of phase 2. The model chosen is the Large Scale Geostrophic (LSG) model from the Max Planck Institute.

  13. Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

    Energy Technology Data Exchange (ETDEWEB)

    Kononenko, Oleksiy [SLAC National Accelerator Lab., Menlo Park, CA (United States)

    2015-03-26

    ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

  14. A Hybrid Decomposition Parallel Implementation of the Car-Parrinello Method

    CERN Document Server

    Wiggs, J K; Wiggs, James K.; Jonsson, Hannes

    1994-01-01

    We have developed a flexible hybrid decomposition parallel implementation of the first-principles molecular dynamics algorithm of Car and Parrinello. The code allows the problem to be decomposed either spatially, over the electronic orbitals, or any combination of the two. Performance statistics for 32, 64, 128 and 512 Si atom runs on the Touchstone Delta and Intel Paragon parallel supercomputers and comparison with the performance of an optimized code running the smaller systems on the Cray Y-MP and C90 are presented.

  15. CODE BLUE: Three dimensional massively-parallel simulation of multi-scale configurations

    Science.gov (United States)

    Juric, Damir; Kahouadji, Lyes; Chergui, Jalel; Shin, Seungwon; Craster, Richard; Matar, Omar

    2016-11-01

    We present recent progress on BLUE, a solver for massively parallel simulations of fully three-dimensional multiphase flows which runs on a variety of computer architectures from laptops to supercomputers and on 131072 threads or more (limited only by the availability to us of more threads). The code is wholly written in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We developed parallel GMRES and multigrid iterative solvers suited to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. Particular attention is drawn to the details and performance of the parallel Multigrid solver. EPSRC UK Programme Grant MEMPHIS (EP/K003976/1).
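
    The domain-decomposition strategy described above relies on each MPI rank exchanging a layer of ghost cells with its neighbours every step. A minimal 1-D halo-exchange sketch in Python (mpi4py) conveys the pattern; BLUE itself is written in Fortran 2003 and decomposes a 3-D domain, so this is an illustration of the idea rather than its code.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a 1-D slab plus one ghost cell at either end.
n_local = 8
u = np.full(n_local + 2, float(rank))

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send boundary interior cells to neighbours, receive into ghost cells.
comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[:1], source=left)

# A stencil update of the interior can now use the refreshed ghost values.
u[1:-1] = 0.5 * (u[:-2] + u[2:])
```

    Saved as, say, halo.py, this would run with mpirun -n 4 python halo.py; a 3-D decomposition repeats the same exchange along each coordinate direction.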

  16. Streamline Integration using MPI-Hybrid Parallelism on a Large Multi-Core Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Camp, David; Garth, Christoph; Childs, Hank; Pugmire, Dave; Joy, Kenneth I.

    2010-11-01

    Streamline computation in a very large vector field data set represents a significant challenge due to the non-local and data-dependent nature of streamline integration. In this paper, we conduct a study of the performance characteristics of hybrid parallel programming and execution as applied to streamline integration on a large, multi-core platform. With multi-core processors now prevalent in clusters and supercomputers, there is a need to understand the impact of these hybrid systems in order to make the best implementation choice. We use two MPI-based distribution approaches based on established parallelization paradigms, parallelize-over-seeds and parallelize-over-blocks, and present a novel MPI-hybrid algorithm for each approach to compute streamlines. Our findings indicate that the work sharing between cores in the proposed MPI-hybrid parallel implementation results in much improved performance and consumes less communication and I/O bandwidth than a traditional, non-hybrid distributed implementation.

  17. Streamline integration using MPI-hybrid parallelism on a large multicore architecture.

    Science.gov (United States)

    Camp, David; Garth, Christoph; Childs, Hank; Pugmire, Dave; Joy, Kenneth I

    2011-11-01

    Streamline computation in a very large vector field data set represents a significant challenge due to the nonlocal and data-dependent nature of streamline integration. In this paper, we conduct a study of the performance characteristics of hybrid parallel programming and execution as applied to streamline integration on a large, multicore platform. With multicore processors now prevalent in clusters and supercomputers, there is a need to understand the impact of these hybrid systems in order to make the best implementation choice. We use two MPI-based distribution approaches based on established parallelization paradigms, parallelize over seeds and parallelize over blocks, and present a novel MPI-hybrid algorithm for each approach to compute streamlines. Our findings indicate that the work sharing between cores in the proposed MPI-hybrid parallel implementation results in much improved performance and consumes less communication and I/O bandwidth than a traditional, nonhybrid distributed implementation.
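
    Of the two paradigms, parallelize-over-seeds is the simpler: seed points are distributed across ranks and each rank integrates its streamlines independently. The sketch below assumes an analytically defined velocity field so it stays self-contained; the paper's implementation additionally partitions large block-decomposed data and shares work among cores within a rank.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def velocity(p):
    # Analytic stand-in field (rigid rotation); real runs sample a huge data set.
    x, y = p
    return np.array([-y, x])

def integrate(seed, h=0.01, steps=1000):
    """Trace one streamline from a seed point with forward Euler steps."""
    p = np.asarray(seed, dtype=float)
    path = [p.copy()]
    for _ in range(steps):
        p = p + h * velocity(p)
        path.append(p.copy())
    return np.array(path)

# Parallelize-over-seeds: round-robin assignment of seeds to MPI ranks.
angles = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
seeds = [(np.cos(a), np.sin(a)) for a in angles]
my_lines = [integrate(s) for i, s in enumerate(seeds) if i % size == rank]

# Collect the traced lines on rank 0 for I/O or rendering.
all_lines = comm.gather(my_lines, root=0)
```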

  18. A parallel finite-difference method for computational aerodynamics

    Science.gov (United States)

    Swisshelm, Julie M.

    1989-01-01

    A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.

  19. Efficient Multidimensional Data Redistribution for Resizable Parallel Computations

    CERN Document Server

    Sudarsan, Rajesh

    2007-01-01

    Traditional parallel schedulers running on cluster supercomputers support only static scheduling, where the number of processors allocated to an application remains fixed throughout the execution of the job. This results in under-utilization of idle system resources thereby decreasing overall system throughput. In our research, we have developed a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executing on distributed memory platforms. The resizing library in ReSHAPE includes support for releasing and acquiring processors and efficiently redistributing application state to a new set of processors. In this paper, we derive an algorithm for redistributing two-dimensional block-cyclic arrays from $P$ to $Q$ processors, organized as 2-D processor grids. The algorithm ensures a contention-free communication schedule for data redistribution if $P_r \leq Q_r$ and $P_c \leq Q_c$. In other cases, the algorithm implements circular row and column shifts on the communicat...
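
    The raw ingredient of any such redistribution schedule is the block-cyclic owner computation: which rank holds a given global index before and after resizing. The sketch below shows only the standard 1-D mapping, with hypothetical helper names; the ReSHAPE algorithm organizes the resulting source/destination pairs into a contention-free schedule over 2-D $P_r \times P_c$ grids.

```python
def owner_and_offset(i_global, block, nprocs):
    """Owner rank and local offset of a global index under a 1-D
    block-cyclic distribution with the given block size."""
    blk, pos = divmod(i_global, block)
    return blk % nprocs, (blk // nprocs) * block + pos

def redistribution_pairs(n, block, p, q):
    """(source rank with p procs, destination rank with q procs) per element --
    the information a redistribution schedule has to organize into rounds."""
    return [(owner_and_offset(i, block, p)[0],
             owner_and_offset(i, block, q)[0]) for i in range(n)]

# Example: 16 elements, block size 2, resized from P=2 to Q=4 processes.
print(redistribution_pairs(16, 2, 2, 4))
```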

  20. SiGN-SSM: open source parallel software for estimating gene networks with state space models.

    Science.gov (United States)

    Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru

    2011-04-15

    SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint effective to stabilize the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under the GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM are available on our web site. tamada@ims.u-tokyo.ac.jp.

  1. Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format.

    Science.gov (United States)

    Bustamam, Alhadi; Burrage, Kevin; Hamilton, Nicholas A

    2012-01-01

    Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with the increasingly vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA for implementing a massively parallel computing environment on the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU efficiently lowers latency, thus circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized the ELLPACK-R sparse format to allow effective and fine-grain massively parallel processing to cope with the sparse nature of interaction network data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on a CPU. Thus, large-scale parallel computation on off-the-shelf desktop machines, which was previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
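
    ELLPACK-R pads every row to the longest row length but keeps an explicit row-length array so each GPU thread can stop at its row's true end. A small NumPy sketch of the storage scheme and the corresponding sparse matrix-vector product is given below; it mirrors the layout only and is not the CUDA-MCL kernel itself.

```python
import numpy as np

def to_ellpack_r(A):
    """Build ELLPACK-R arrays (val, col, rl) from a dense matrix."""
    n = A.shape[0]
    rows = [np.nonzero(A[i])[0] for i in range(n)]
    rl = np.array([len(r) for r in rows])          # true row lengths
    width = rl.max()
    val = np.zeros((n, width))
    col = np.zeros((n, width), dtype=int)
    for i, r in enumerate(rows):
        val[i, :rl[i]] = A[i, r]
        col[i, :rl[i]] = r
    return val, col, rl

def ellpack_r_spmv(val, col, rl, x):
    """y = A @ x; on a GPU the loop over i is mapped one thread per row."""
    y = np.zeros(val.shape[0])
    for i in range(val.shape[0]):
        k = rl[i]
        y[i] = val[i, :k] @ x[col[i, :k]]
    return y

A = np.array([[4.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])
print(ellpack_r_spmv(*to_ellpack_r(A), x), A @ x)   # the two should agree
```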

  2. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  3. Confronting Violence, Improving Women's Lives Special Display Opens at NLM | NIH MedlinePlus the Magazine

    Science.gov (United States)

    Confronting Violence, Improving Women's Lives is a special display on view in the NLM History of Medicine Division. Photo courtesy of Lisa Helfert.

  4. A Perfect Storm: The Dynamics Confronting U.S. Agribusiness

    Science.gov (United States)

    2008-01-01

    ... should assign one existing federal agency the lead in addressing the water resource challenges that face America in general and agribusiness in ... These cuts will have a particular impact on food relief efforts in Latin America, Central Asia and Africa. With the United States supplying half ...

  5. Academic Challenges That Chinese Overseas Students in the UK Confront

    Institute of Scientific and Technical Information of China (English)

    王佳妮

    2014-01-01

    Under the education globalization tendency, studying abroad, which is a core topic in education, has more and more frequently appeared in the international environment in recent years. Study will never be an easy social activity, needless to say study abroad, which involves people from different cultures. This essay focuses on the academic challenges that the Chinese overseas students in UK confront in their studies.

  6. Academic Challenges That Chinese Overseas Students in the UK Confront

    Institute of Scientific and Technical Information of China (English)

    王佳妮

    2014-01-01

    Under the education globalization tendency, studying abroad, which is a core topic in education, has more and more frequently appeared in the international environment in recent years. Study will never be an easy social activity, needless to say study abroad, which involves people from different cultures. This essay focuses on the academic challenges that the Chinese overseas students in UK confront in their studies.

  7. Performance evaluation of interconnected logistics networks confronted to hub disruptions

    OpenAIRE

    2016-01-01

    This paper investigates the performance of interconnected logistics networks confronted with disruptions at the hub level. With traditional supply chain network design, companies define and optimize their own logistics networks, resulting in current logistics systems being a set of independent, heterogeneous logistics networks. The concept of PI aims to integrate independent logistics networks into a global, open, interconnected system. Prior research has shown that the new organ...

  8. Views of Discrimination among Individuals Confronting Genetic Disease

    OpenAIRE

    Klitzman, Robert

    2010-01-01

    Though the US passed the Genetic Information Non-Discrimination Act, many questions remain of how individuals confronting genetic disease view and experience possible discrimination. We interviewed, for 2 hours each, 64 individuals who had, or were at risk for, Huntington’s Disease, breast cancer, or Alpha-1 antitrypsin deficiency. Discrimination can be implicit, indirect and subtle, rather than explicit, direct and overt; and be hard to prove. Patients may be treated “differently” and unfair...

  9. HACC: Simulating sky surveys on state-of-the-art supercomputing architectures

    Science.gov (United States)

    Habib, Salman; Pope, Adrian; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukić, Zarija; Sehrish, Saba; Liao, Wei-keng

    2016-01-01

    Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.

  10. Massively Parallel Phase-Field Simulations for Ternary Eutectic Directional Solidification

    CERN Document Server

    Bauer, Martin; Steinmetz, Philipp; Jainta, Marcus; Berghoff, Marco; Schornbaum, Florian; Godenschwager, Christian; Köstler, Harald; Nestler, Britta; Rüde, Ulrich

    2015-01-01

    Microstructures forming during ternary eutectic directional solidification processes have significant influence on the macroscopic mechanical properties of metal alloys. For a realistic simulation, we use the well established thermodynamically consistent phase-field method and improve it with a new grand potential formulation to couple the concentration evolution. This extension is very compute intensive due to a temperature dependent diffusive concentration. We significantly extend previous simulations that have used simpler phase-field models or were performed on smaller domain sizes. The new method has been implemented within the massively parallel HPC framework waLBerla that is designed to exploit current supercomputers efficiently. We apply various optimization techniques, including buffering techniques, explicit SIMD kernel vectorization, and communication hiding. Simulations utilizing up to 262,144 cores have been run on three different supercomputing architectures and weak scalability results are show...

  11. Stop Harassment! Men’s reactions to victims’ confrontation

    Directory of Open Access Journals (Sweden)

    M. Carmen Herrera

    2014-07-01

    Sexual harassment is one of the most widespread forms of gender violence. Perceptions of sexual harassment depend on gender, context, the perceivers' ideology, and a host of other factors. Research has underscored the importance of coping strategies in raising a victim's self-confidence by making her feel that she plays an active role in overcoming her own problems. The aim of this study was to assess men's perceptions of sexual harassment in relation to different victim responses. The study involved 101 men who were administered a questionnaire focusing on two of the most frequent types of harassment (gender harassment vs. unwanted sexual attention) and victim response (confrontation vs. non-confrontation), both of which were manipulated. Moreover, the influences of ideological variables, ambivalent sexism, and the acceptance of myths of sexual harassment on perception were also assessed. The results highlight the complexities involved in recognizing certain behaviors as harassment and the implications of different victim responses to incidents of harassment. As the coping strategies used by women to confront harassment entail drawbacks that pose problems or hinder them, the design and implementation of prevention and/or education programs should strive to raise awareness among men and women to further their understanding of this construct.

  12. Development of the general interpolants method for the CYBER 200 series of supercomputers

    Science.gov (United States)

    Stalnaker, J. F.; Robinson, M. A.; Spradley, L. W.; Kurzius, S. C.; Thoenes, J.

    1988-01-01

    The General Interpolants Method (GIM) is a 3-D, time-dependent, hybrid procedure for generating numerical analogs of the conservation laws. This study is directed toward the development and application of the GIM computer code for fluid dynamic research applications as implemented for the Cyber 200 series of supercomputers. Elliptic and quasi-parabolic versions of the GIM code are discussed. Turbulence models, both algebraic and differential-equation based, were added to the basic viscous code. An equilibrium reacting chemistry model and an implicit finite difference scheme are also included.

  13. A New Hydrodynamic Model for Numerical Simulation of Interacting Galaxies on Intel Xeon Phi Supercomputers

    Science.gov (United States)

    Kulikov, Igor; Chernykh, Igor; Tutukov, Alexander

    2016-05-01

    This paper presents a new hydrodynamic model of interacting galaxies based on the joint solution of multicomponent hydrodynamic equations, first moments of the collisionless Boltzmann equation and the Poisson equation for gravity. Using this model, it is possible to formulate a unified numerical method for solving hyperbolic equations. This numerical method has been implemented for hybrid supercomputers with Intel Xeon Phi accelerators. The collision of spiral and disk galaxies considering the star formation process, supernova feedback and molecular hydrogen formation is shown as a simulation result.

  14. Scheduling Supercomputers.

    Science.gov (United States)

    1983-02-01

    no task is scheduled with overlap. Let numpi be the total number of preemptions and idle slots of size at most to that are introduced. We see that if...no usable block remains on Qm-*, then numpi < m-k. Otherwise, numpi ! m-k-1. If j>n when this procedure terminates, then all tasks have been scheduled

  15. Grassroots Supercomputing

    CERN Multimedia

    Buchanan, Mark

    2005-01-01

    What started out as a way for SETI to plow through its piles of radio-signal data from deep space has turned into a powerful research tool as computer users across the globe donate their screen-saver time to projects as diverse as climate-change prediction, gravitational-wave searches, and protein folding (4 pages)

  16. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  17. Parallel processing of atmospheric chemistry calculations: Preliminary considerations

    Energy Technology Data Exchange (ETDEWEB)

    Elliott, S.; Jones, P.

    1995-01-01

    Global climate calculations are already saturating the class of modern vector supercomputers with only a few central processing units. Increased resolution and inclusion of routines to deal with biogeochemical portions of the terrestrial climate system will soon demand massively parallel approaches. The atmospheric photochemistry ensemble is intimately linked to climate through the trace greenhouse gases ozone and methane, and modules for representing it are being attached to global three-dimensional transport and GCM frameworks. Atmospheric kinetics involve dozens of highly interactive tracers and so will accentuate the need for parallel processing of earth system simulations. In the present text we lay some of the groundwork for the addition of atmospheric kinetics packages to GCM and global scale atmospheric models on multiply parallel computers. The discussion is tailored for consumption by the photochemical modelling community. After a review of numerical atmospheric chemistry methods, we examine how kinetics can be implemented on a parallel computer. We concentrate especially on data layout and flexibility and how these can be implemented in various programming models. We conclude that chemistry can be implemented rather easily within existing frameworks of several parallel atmospheric models. However, memory limitations may preclude high resolution studies of global chemistry.

  18. A parallel implementation of kriging with a trend

    Energy Technology Data Exchange (ETDEWEB)

    Gajraj, A.; Joubert, W.; Jones, J.

    1997-11-01

    This paper describes the parallelization of the GSLIB ktb3dm code. The code is parallelized using the message passing paradigm, Parallel Virtual Machine (PVM), under a Multiple Instructions, Multiple Data (MIMD) architecture. The code performance is analyzed using different grid sizes of 5x5x1, 50x50x1, 100x100x1 and 500x500x1 with 1, 2, 4, 8 and in some cases 16 processors on the Cray T3D supercomputer. The parallelization effort focused on the main kriging do loop. The results confirm that there is a substantial benefit to be derived in terms of CPU time savings (or execution speed) by using the parallel version of the code, especially when considering larger grids. Additionally, speed-up and scalability analyses show that actual speed-up is close to theoretical, while the code scales appropriately within the 1 to 16 processor range tested. The kriging of a quarter-million grid cell system fell from over 9 CPU minutes on one Cray T3D processor to about 1.25 CPU minutes on 16 processors on the same machine.
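
    From the timings quoted above, the usual speedup and efficiency figures follow directly; the short calculation below reproduces them (roughly 9 CPU minutes on one T3D processor versus about 1.25 minutes on 16).

```python
def speedup_and_efficiency(t_serial, t_parallel, nprocs):
    """Parallel speedup S = T1 / Tp and efficiency E = S / p."""
    s = t_serial / t_parallel
    return s, s / nprocs

s, e = speedup_and_efficiency(9.0, 1.25, 16)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")   # about 7.2x and 45%
```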

  19. Parallelism in computations in quantum and statistical mechanics

    Science.gov (United States)

    Clementi, E.; Corongiu, G.; Detrich, J. H.

    1985-07-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems.

  20. Study on Large-Scale Parallel Finite-Difference Time-Domain Method for Electromagnetic Computation

    Institute of Scientific and Technical Information of China (English)

    江树刚; 张玉; 赵勋旺

    2015-01-01

    This study examines the performance and applications of the large-scale parallel Finite-Difference Time-Domain (FDTD) method on Chinese supercomputer platforms. The parallel efficiency is tested on the Magic Cube supercomputer, which ranked as the 10th fastest supercomputer in the world in November 2008; the Sunway BlueLight MPP supercomputer, the first large-scale parallel supercomputer built with Chinese-made CPUs; and the Tianhe-2 supercomputer, currently ranked first in the world. The algorithm is run on these three supercomputers with up to 10,000, 100,000, and 300,000 CPU cores, respectively. The parallel efficiency exceeds 50% in all test configurations, indicating good scalability of the method. Simulations of the radiation characteristics of multiple microstrip antenna arrays and the scattering characteristics of a large airplane demonstrate that the parallel FDTD method can perform accurate and efficient electromagnetic simulation of complex problems on supercomputers with different architectures.
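
    The kernel that such a parallel FDTD code distributes over hundreds of thousands of cores is the leapfrog Yee update. A serial 1-D toy version in Python is shown below purely to fix ideas; the work reported above is a full 3-D, MPI-parallel implementation.

```python
import numpy as np

# Minimal 1-D FDTD (Yee) leapfrog loop in normalized units.
nx, nt = 200, 500
c = 1.0                       # Courant number (stability requires c <= 1)
ez = np.zeros(nx)             # electric field samples
hy = np.zeros(nx - 1)         # magnetic field, staggered by half a cell

for n in range(nt):
    hy += c * (ez[1:] - ez[:-1])                      # update H from curl of E
    ez[1:-1] += c * (hy[1:] - hy[:-1])                # update E from curl of H
    ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)    # soft Gaussian source
```

    A distributed version assigns each rank a slab of cells and exchanges a single boundary field value with each neighbour every time step, which is what keeps the communication cost low relative to the update work.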

  1. Feynman diagrams sampling for quantum field theories on the QPACE 2 supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Rappl, Florian

    2016-08-01

    This work discusses the application of Feynman diagram sampling in quantum field theories. The method uses a computer simulation to sample the diagrammatic space obtained in a series expansion. For running large physical simulations powerful computers are obligatory, effectively splitting the thesis in two parts. The first part deals with the method of Feynman diagram sampling. Here the theoretical background of the method itself is discussed. Additionally, important statistical concepts and the theory of the strong force, quantum chromodynamics, are introduced. This sets the context of the simulations. We create and evaluate a variety of models to estimate the applicability of diagrammatic methods. The method is then applied to sample the perturbative expansion of the vertex correction. In the end we obtain the value for the anomalous magnetic moment of the electron. The second part looks at the QPACE 2 supercomputer. This includes a short introduction to supercomputers in general, as well as a closer look at the architecture and the cooling system of QPACE 2. Guiding benchmarks of the InfiniBand network are presented. At the core of this part, a collection of best practices and useful programming concepts are outlined, which enables the development of efficient, yet easily portable, applications for the QPACE 2 system.

  2. PREFACE: HITES 2012: 'Horizons of Innovative Theories, Experiments, and Supercomputing in Nuclear Physics'

    Science.gov (United States)

    Hecht, K. T.

    2012-12-01

    This volume contains the contributions of the speakers of an international conference in honor of Jerry Draayer's 70th birthday, entitled 'Horizons of Innovative Theories, Experiments and Supercomputing in Nuclear Physics'. The list of contributors includes not only international experts in these fields, but also many former collaborators, former graduate students, and former postdoctoral fellows of Jerry Draayer, stressing innovative theories such as special symmetries and supercomputing, both of particular interest to Jerry. The organizers of the conference intended to honor Jerry Draayer not only for his seminal contributions in these fields, but also for his administrative skills at the departmental, university, national and international levels. Signed: Ted Hecht, University of Michigan. Scientific Advisory Committee: Ani Aprahamian (University of Notre Dame), Baha Balantekin (University of Wisconsin), Bruce Barrett (University of Arizona), Umit Catalyurek (Ohio State University), David Dean (Oak Ridge National Laboratory), Jutta Escher, Chair (Lawrence Livermore National Laboratory), Jorge Hirsch (UNAM, Mexico), David Rowe (University of Toronto), Brad Sherill (Michigan State University), Joel Tohline (Louisiana State University), Edward Zganjar (Louisiana State University). Organizing Committee: Jeff Blackmon (Louisiana State University), Mark Caprio (University of Notre Dame), Tomas Dytrych (Louisiana State University), Ana Georgieva (INRNE, Bulgaria), Kristina Launey, Co-chair (Louisiana State University), Gabriella Popa (Ohio University Zanesville), James Vary, Co-chair (Iowa State University). Local Organizing Committee: Laura Linhardt (Louisiana State University), Charlie Rasco (Louisiana State University), Karen Richard, Coordinator (Louisiana State University).

  3. Graph visualization for the analysis of the structure and dynamics of extreme-scale supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Berkbigler, K. P. (Kathryn P.); Bush, B. W. (Brian W.); Davis, Kei,; Hoisie, A. (Adolfy); Smith, S. A. (Steve A.)

    2002-01-01

    We are exploring the development and application of information visualization techniques for the analysis of new extreme-scale supercomputer architectures. Modern supercomputers typically comprise very large clusters of commodity SMPs interconnected by possibly dense and often nonstandard networks. The scale, complexity, and inherent nonlocality of the structure and dynamics of this hardware, and the systems and applications distributed over it, challenge traditional analysis methods. As part of the a la carte team at Los Alamos National Laboratory, which is simulating these advanced architectures, we are exploring advanced visualization techniques and creating tools to provide intuitive exploration, discovery, and analysis of these simulations. This work complements existing and emerging algorithmic analysis tools. Here we give background on the problem domain, a description of a prototypical computer architecture of interest (on the order of 10,000 processors connected by a quaternary fat-tree network), and presentations of several visualizations of the simulation data that make clear the flow of data in the interconnection network.

  4. Groundwater cooling of a supercomputer in Perth, Western Australia: hydrogeological simulations and thermal sustainability

    Science.gov (United States)

    Sheldon, Heather A.; Schaubs, Peter M.; Rachakonda, Praveen K.; Trefry, Michael G.; Reid, Lynn B.; Lester, Daniel R.; Metcalfe, Guy; Poulet, Thomas; Regenauer-Lieb, Klaus

    2015-12-01

    Groundwater cooling (GWC) is a sustainable alternative to conventional cooling technologies for supercomputers. A GWC system has been implemented for the Pawsey Supercomputing Centre in Perth, Western Australia. Groundwater is extracted from the Mullaloo Aquifer at 20.8 °C and passes through a heat exchanger before returning to the same aquifer. Hydrogeological simulations of the GWC system were used to assess its performance and sustainability. Simulations were run with cooling capacities of 0.5 or 2.5 megawatts thermal (MWth), with scenarios representing various combinations of pumping rate, injection temperature and hydrogeological parameter values. The simulated system generates a thermal plume in the Mullaloo Aquifer and overlying Superficial Aquifer. Thermal breakthrough (transfer of heat from injection to production wells) occurred in 2.7-4.3 years for a 2.5 MWth system. Shielding (reinjection of cool groundwater between the injection and production wells) resulted in earlier thermal breakthrough but reduced the rate of temperature increase after breakthrough, such that shielding was beneficial after approximately 5 years of pumping. Increasing injection temperature was preferable to increasing flow rate for maintaining cooling capacity after thermal breakthrough. Thermal impacts on existing wells were small, with up to 10 wells experiencing a temperature increase ≥ 0.1 °C (largest increase 6 °C).

  5. Visualization at Supercomputing Centers: The Tale of Little Big Iron and the Three Skinny Guys

    Energy Technology Data Exchange (ETDEWEB)

    Bethel, E. Wes; van Rosendale, John; Southard, Dale; Gaither, Kelly; Childs, Hank; Brugger, Eric; Ahern, Sean

    2010-12-01

    Supercomputing Centers (SCs) are unique resources that aim to enable scientific knowledge discovery through the use of large computational resources, the Big Iron. Design, acquisition, installation, and management of the Big Iron are activities that are carefully planned and monitored. Since these Big Iron systems produce a tsunami of data, it is natural to co-locate visualization and analysis infrastructure as part of the same facility. This infrastructure consists of hardware (Little Iron) and staff (Skinny Guys). Our collective experience suggests that the design, acquisition, installation, and management of the Little Iron and Skinny Guys do not receive the same level of treatment as that of the Big Iron. The main focus of this article is to explore different aspects of planning, designing, fielding, and maintaining the visualization and analysis infrastructure at supercomputing centers. Some of the questions we explore in this article include: "How should the Little Iron be sized to adequately support visualization and analysis of data coming off the Big Iron? What sort of capabilities does it need to have?" Related questions concern the size of the visualization support staff: "How big should a visualization program be (number of persons) and what should the staff do?" and "How much of the visualization should be provided as a support service, and how much should applications scientists be expected to do on their own?"

  6. Frequently updated noise threat maps created with use of supercomputing grid

    Directory of Open Access Journals (Sweden)

    Szczodrak Maciej

    2014-09-01

    Innovative supercomputing grid services devoted to noise threat evaluation are presented. The services described in this paper concern two issues: the first is related to noise mapping, while the second focuses on assessment of the noise dose and its influence on the human hearing system. The discussed services were developed within the PL-Grid Plus Infrastructure, which brings together Polish academic supercomputer centers. Selected experimental results achieved by the usage of the proposed services are presented. The assessment of environmental noise threats includes creation of noise maps using either offline or online data, acquired through a grid of monitoring stations. A concept of estimating the source model parameters based on the measured sound level, for the purpose of creating frequently updated noise maps, is presented. Connecting the noise mapping grid service with a distributed sensor network enables noise maps to be updated automatically for a specified time period. Moreover, a unique attribute of the developed software is the estimation of the auditory effects evoked by exposure to noise. The estimation method uses a modified psychoacoustic model of hearing and is based on the calculated noise level values and the given exposure period. Potential use scenarios of the grid services for research or educational purposes are introduced. Presentation of the predicted hearing threshold shift caused by exposure to excessive noise can raise public awareness of noise threats.

  7. Supercomputer Assisted Generation of Machine Learning Agents for the Calibration of Building Energy Models

    Energy Technology Data Exchange (ETDEWEB)

    Sanyal, Jibonananda [ORNL; New, Joshua Ryan [ORNL; Edwards, Richard [ORNL

    2013-01-01

    Building Energy Modeling (BEM) is an approach to model the energy usage in buildings for design and retrofit purposes. EnergyPlus is the flagship Department of Energy software that performs BEM for different types of buildings. The input to EnergyPlus can often extend to the order of a few thousand parameters, which have to be calibrated manually by an expert for realistic energy modeling. This makes calibration challenging and expensive, rendering building energy modeling unfeasible for smaller projects. In this paper, we describe the "Autotune" research, which employs machine learning algorithms to generate agents for the different kinds of standard reference buildings in the U.S. building stock. The parametric space and the variety of building locations and types make this a challenging computational problem necessitating the use of supercomputers. Millions of EnergyPlus simulations are run on supercomputers; the results are subsequently used to train machine learning algorithms to generate agents. These agents, once created, can then run in a fraction of the time, thereby allowing cost-effective calibration of building models.
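
    A minimal sketch of the surrogate-modelling idea behind such agents: train a fast regression model on a table of (input parameters, simulated output) pairs so that later evaluations avoid a full simulation. The data below are synthetic stand-ins, not EnergyPlus runs:

      # Train a fast surrogate "agent" on precomputed simulation results so that
      # later parameter evaluations take milliseconds instead of a full simulation.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      X = rng.uniform(0.0, 1.0, size=(5000, 12))    # 12 hypothetical building parameters
      y = X @ rng.uniform(0.5, 2.0, size=12) + 0.1 * rng.standard_normal(5000)  # fake energy use

      surrogate = RandomForestRegressor(n_estimators=100, n_jobs=-1).fit(X, y)
      print("predicted energy use for one candidate:", surrogate.predict(X[:1])[0])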

  8. Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Bethel, E. Wes; Leinweber, David; Ruebel, Oliver; Wu, Kesheng

    2011-09-16

    This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently the key mechanism for preventing catastrophic market action is the "circuit breaker." We believe a more graduated approach, similar to the "yellow light" approach in motorsports to slow down traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a version of the volume Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event on May 6, 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions.
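
    The Herfindahl-Hirschman Index mentioned above is the sum of squared market shares, here taken over trading venues' volume shares as a fragmentation proxy. A minimal sketch with illustrative numbers (not data from the study):

      # Volume-based Herfindahl-Hirschman Index (HHI): sum of squared volume shares.
      # Values range from 1/N (volume spread evenly over N venues) up to 1.0
      # (all volume concentrated on a single venue).

      def herfindahl_index(volumes):
          total = float(sum(volumes))
          shares = [v / total for v in volumes]
          return sum(s * s for s in shares)

      # Illustrative per-venue traded volumes for one time window.
      venue_volumes = [1_200_000, 800_000, 450_000, 300_000, 150_000]
      print(f"HHI = {herfindahl_index(venue_volumes):.3f}")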

  9. Architecture, implementation and parallelization of the software to search for periodic gravitational wave signals

    CERN Document Server

    Poghosyan, Gevorg; Streit, Achim; Bejger, Michał; Królak, Andrzej

    2014-01-01

    The parallelization, design and scalability of the \sky code to search for periodic gravitational waves from rotating neutron stars are discussed. The code is based on an efficient implementation of the F-statistic using the Fast Fourier Transform algorithm. To perform an analysis of data from the advanced LIGO and Virgo gravitational wave detectors' network, which will start operating in 2015, hundreds of millions of CPU hours will be required - code that exploits the potential of massively parallel supercomputers is therefore mandatory. We have parallelized the code using the Message Passing Interface standard and implemented a mechanism for combining the searches at different sky positions and frequency bands into one extremely scalable program. A parallel I/O interface is used to avoid bottlenecks when writing the generated data to the file system. This allowed us to develop a highly scalable computational code that enables data analysis at large scale on acceptable time scales. Benchmarking of the c...
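
    A minimal mpi4py sketch of the kind of static work partitioning described above, in which independent (sky position, frequency band) searches are distributed over MPI ranks and gathered at the end; the search function and index ranges are placeholders, not the authors' implementation:

      # Distribute independent (sky position, frequency band) searches over MPI ranks.
      # Run with, e.g.: mpiexec -n 4 python search_sketch.py
      from itertools import product
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      sky_positions = range(100)     # placeholder sky-position indices
      frequency_bands = range(32)    # placeholder frequency-band indices
      tasks = list(product(sky_positions, frequency_bands))

      def search(sky, band):
          # Placeholder for the F-statistic search of one cell of parameter space.
          return (sky, band, 0.0)

      # Round-robin assignment: rank r handles tasks r, r + size, r + 2*size, ...
      local_results = [search(s, b) for i, (s, b) in enumerate(tasks) if i % size == rank]

      # Gather per-rank results on rank 0 (a stand-in for the parallel I/O stage).
      all_results = comm.gather(local_results, root=0)
      if rank == 0:
          print(f"collected {sum(len(r) for r in all_results)} results from {size} ranks")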

  10. A Solver for Massively Parallel Direct Numerical Simulation of Three-Dimensional Multiphase Flows

    CERN Document Server

    Shin, S; Juric, D

    2014-01-01

    We present a new solver for massively parallel simulations of fully three-dimensional multiphase flows. The solver runs on a variety of computer architectures from laptops to supercomputers and on 65536 threads or more (limited only by the availability to us of more threads). The code is wholly written by the authors in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of the LCRM hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We discuss the implementation of this interface method and its particular suitability to distributed processing where all operations are carried out locally on distributed subdomains. We have developed parallel GMRES and Multigrid iterative solvers suited to the linear systems arising from the implicit solution of the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across flu...
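
    As an illustration of the domain decomposition pattern the solver relies on (not the authors' Fortran 2003/MPI implementation), a minimal 1D mpi4py sketch that partitions a grid across ranks and exchanges one layer of ghost cells with each neighbour:

      # 1D domain decomposition with ghost-cell exchange between neighbouring ranks.
      # Run with, e.g.: mpiexec -n 4 python halo_sketch.py
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_local = 16                              # interior cells owned by this rank
      u = np.full(n_local + 2, float(rank))     # +2 ghost cells, filled with the rank id

      left = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      # Send interior boundary values to neighbours; receive theirs into the ghost cells.
      comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
      comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

      # Each rank can now apply a local stencil update to its interior cells u[1:-1].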

  11. The Optimization of a Shaped-Charge Design Using Parallel Computers

    Energy Technology Data Exchange (ETDEWEB)

    GARDNER,DAVID R.; VAUGHAN,COURTENAY T.

    1999-11-01

    Current supercomputers use large parallel arrays of tightly coupled processors to achieve levels of performance far surpassing conventional vector supercomputers. Shock-wave physics codes have been developed for these new supercomputers at Sandia National Laboratories and elsewhere. These parallel codes run fast enough on many simulations to consider using them to study the effects of varying design parameters on the performance of models of conventional munitions and other complex systems. Such studies may be directed by optimization software to improve the performance of the modeled system. Using a shaped-charge jet design as an archetypal test case and the CTH parallel shock-wave physics code controlled by the Dakota optimization software, we explored the use of automatic optimization tools to optimize the design for conventional munitions. We used a scheme in which a lower resolution computational mesh was used to identify candidate optimal solutions, which were then verified using a higher resolution mesh. We identified three optimal solutions for the model and a region of the design domain where the jet tip speed is nearly optimal, indicating the possibility of a robust design. Based on this study we identified some of the difficulties in using high-fidelity models with optimization software to develop improved designs. These include developing robust algorithms for the objective function and constraints and mitigating the effects of numerical noise in them. We conclude that optimization software running high-fidelity models of physical systems using parallel shock-wave physics codes to find improved designs can be a valuable tool for designers. While the current state of algorithm and software development does not permit routine, "black box" optimization of designs, the effort involved in using the existing tools may well be worth the improvement achieved in designs.
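
    A schematic of the two-resolution strategy described above: screen many candidate designs with a cheap coarse-mesh objective, then re-evaluate only the most promising ones with the expensive fine-mesh objective. The objective functions here are simple stand-ins, not the CTH/Dakota setup:

      # Two-level optimization sketch: coarse screening followed by fine verification.
      import random

      def coarse_objective(x):
          # Stand-in for a low-resolution simulation (cheap but noisy).
          return (x - 2.0) ** 2 + random.uniform(-0.2, 0.2)

      def fine_objective(x):
          # Stand-in for a high-resolution simulation (accurate but expensive).
          return (x - 2.0) ** 2

      candidates = [random.uniform(-5.0, 5.0) for _ in range(200)]
      screened = sorted(candidates, key=coarse_objective)[:5]   # candidate optima
      best = min(screened, key=fine_objective)                  # verified optimum
      print(f"verified design parameter: {best:.3f}")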

  12. CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures

    Directory of Open Access Journals (Sweden)

    Marek Blazewicz

    2011-01-01

    Full Text Available With the recent advent of new heterogeneous computing architectures there is still a lack of parallel problem solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when it comes to implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to this need we extend our previous work on a parallel programming framework for CUDA – CaCUDA – so that it now supports OpenCL. We present CaKernel – a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI while CaKernel manages the code running on Graphics Processing Units (GPUs) and the interactions between them. As a non-trivial test case we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
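
    Stencil computations of the kind CaKernel targets update every grid point from a fixed neighbourhood of values. A minimal NumPy sketch of one Jacobi-style sweep of the 5-point Laplacian stencil (illustrative only, not CaKernel/Cactus code):

      # One Jacobi relaxation sweep: each interior point becomes the average of its
      # four neighbours -- the archetypal 5-point stencil behind many PDE solvers.
      import numpy as np

      def jacobi_sweep(u):
          v = u.copy()
          v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                  u[1:-1, :-2] + u[1:-1, 2:])
          return v

      u = np.zeros((64, 64))
      u[0, :] = 1.0               # fixed boundary condition along one edge
      for _ in range(100):
          u = jacobi_sweep(u)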

  13. A Parallel Implementation of Unscheduled Flow Control in Interconnected Power Systems

    Directory of Open Access Journals (Sweden)

    G. Ozdemir Dag

    2012-01-01

    Full Text Available The unscheduled power flow problem needs to be minimized or controlled as soon as possible in a deregulated power system, since transmission systems are mostly operated at their power-carrying limits or very close to them. The time spent on simulations to determine the current states of all system and control variables of the interconnected power system matters, because taking action quickly after an equipment failure or any other undesired event can be critical. Using supercomputing facilities and parallel computing techniques together decreases the computation time greatly. In this study, a parallel implementation of a multiobjective optimization approach based on both genetic algorithms and fuzzy decision making to manage unscheduled flows is presented. Parallel computation techniques are applied using supercomputers (high-performance computers). The proposed method is applied to the IEEE 300 bus test system. Two different cases for some parameters of the GA are considered to demonstrate the benefit of the parallel computation technique. The simulation results are then presented.
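
    The costly step in such GA-based studies is usually the fitness evaluation of each individual, which is embarrassingly parallel across the population. A minimal sketch using Python's multiprocessing; the fitness function is a placeholder, not a power-flow evaluation:

      # Evaluate the fitness of a GA population in parallel across worker processes.
      import random
      from multiprocessing import Pool

      def fitness(individual):
          # Placeholder for the multiobjective unscheduled-flow evaluation, which
          # would normally require running a power-flow simulation.
          return sum((x - 0.5) ** 2 for x in individual)

      if __name__ == "__main__":
          population = [[random.random() for _ in range(10)] for _ in range(256)]
          with Pool(processes=8) as pool:
              scores = pool.map(fitness, population)
          print(f"best fitness this generation: {min(scores):.4f}")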

  14. MANAGERIAL PROBLEMS CONFRONTED BY EXECUTIVE CHEFS IN HOTELS

    Directory of Open Access Journals (Sweden)

    Kemal BIRDIR

    2014-07-01

    Full Text Available The study was conducted to determine the managerial problems confronted by executive chefs working at 4- and 5-star hotels in Turkey. A survey developed by the researchers was employed as the data collection tool. Answers given by participants were analyzed using t-test and ANOVA analyses in order to determine whether there are significant differences of opinion amongst executive chefs (expressed as average figures) dependent upon such variables as their age, gender, educational status and the star status of the hotel in which they worked. The study results showed that the most important problem confronting executive chefs was "finding educated/trained kitchen personnel." On the specific problem, "responsibility and authority is not clear within the kitchen," there was a significant difference of opinion by gender of the executive chefs. Moreover, there was a significant difference of opinion, dependent upon the star status of the hotels in which the chefs worked, on the problem of whether or not "the working hours of kitchen personnel are too long." The findings suggest that there are important problems confronted by executive chefs, and that male and female executive chefs have different opinions on the magnitude of some specific problems. Whereas there are various reports and similar publications discussing problems faced by executive chefs, the present study is the first in the literature that solely explores the managerial problems experienced in a kitchen context.

  15. Computing Parallelism in Discourse

    CERN Document Server

    Gardent, C; Gardent, Claire; Kohlhase, Michael

    1997-01-01

    Although much has been said about parallelism in discourse, a formal, computational theory of parallelism structure is still outstanding. In this paper, we present a theory which, given two parallel utterances, predicts which elements are parallel. The theory consists of a sorted, higher-order abductive calculus, and we show that it reconciles the insights of discourse theories of parallelism with those of Higher-Order Unification approaches to discourse semantics, thereby providing a natural framework in which to capture the effect of parallelism on discourse semantics.

  16. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another setting where such algorithms apply is shared-memory SIMD (single instruction stream, multiple data stream) computers, in which the whole sequence to be sorted can fit in the
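
    One of the classic algorithms for the linear-array model covered by texts of this kind is odd-even transposition sort; a minimal sequential emulation of its compare-exchange phases follows (on a parallel machine, each phase's pairs are handled simultaneously):

      # Odd-even transposition sort: n phases of compare-exchange on alternating
      # (even/odd indexed) neighbour pairs -- the textbook sort for a linear array.
      def odd_even_sort(a):
          a = list(a)
          n = len(a)
          for phase in range(n):
              for i in range(phase % 2, n - 1, 2):     # independent pairs: a parallel
                  if a[i] > a[i + 1]:                  # machine does them all at once
                      a[i], a[i + 1] = a[i + 1], a[i]
          return a

      print(odd_even_sort([5, 3, 8, 1, 9, 2, 7]))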

  17. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  18. Confronting the Consequences of a Permanent Changing Environment

    Directory of Open Access Journals (Sweden)

    Raluca Ioana Vosloban

    2013-05-01

    Full Text Available Businesses and governments choose how they wish to deal with change. Whether the change is organizational, technological, political, financial or even personal, carrying on with business as usual is likely to lead down a declining path. The authors of this paper offer a set of tools for confronting and understanding the consequences of this era of permanent change by building strengths and seeking opportunities within organizations (private or public) and within the family (including friends). The work environment and the personal life of the individual have a common requirement: adaptability, coping efficiently with change, an ability demanded of the third-millennium human being.

  19. Political humor as a confrontational tool against the syrian regime

    OpenAIRE

    Camps-Febrer, Blanca

    2012-01-01

    The aim of this working paper is to analyze the inclusion of political humor into the set of actions used by opponents to the Syrian regime during the first year of a state-wide uprising in 2011. The research argues that although political humor has traditionally been seen mainly as a concealed voice against dominant elites, it can nevertheless take a confrontational stance and challenge a regime. In this paper we assess the role of political humor in challenging the legitimacy of the Syrian...

  20. Confronting Uncertainty in Life Cycle Assessment Used for Decision Support

    DEFF Research Database (Denmark)

    Herrmann, Ivan Tengbjerg; Hauschild, Michael Zwicky; Sohn, Michael D.

    2014-01-01

    The aim of this article is to help confront uncertainty in life cycle assessments (LCAs) used for decision support. LCAs offer a quantitative approach to assess environmental effects of products, technologies, and services and are conducted by an LCA practitioner or analyst (AN) to support...... be described as a variance simulation based on individual data points used in an LCA. This article develops and proposes a taxonomy for LCAs based on extensive research in the LCA, management, and economic literature. This taxonomy can be used ex ante to support planning and communication between an AN and DM...

  1. Federal Council on Science, Engineering and Technology: Committee on Computer Research and Applications, Subcommittee on Science and Engineering Computing: The US Supercomputer Industry

    Energy Technology Data Exchange (ETDEWEB)

    1987-12-01

    The Federal Coordinating Council on Science, Engineering, and Technology (FCCSET) Committee on Supercomputing was chartered by the Director of the Office of Science and Technology Policy in 1982 to examine the status of supercomputing in the United States and to recommend a role for the Federal Government in the development of this technology. In this study, the FCCSET Committee (now called the Subcommittee on Science and Engineering Computing of the FCCSET Committee on Computer Research and Applications) reports on the status of the supercomputer industry and addresses changes that have occurred since issuance of the 1983 and 1985 reports. The review is based upon periodic meetings with and site visits to supercomputer manufacturers and consultation with experts in high performance scientific computing. White papers have been contributed to this report by industry leaders and supercomputer experts.

  2. QCMPI: A parallel environment for quantum computing

    Science.gov (United States)

    Tabakin, Frank; Juliá-Díaz, Bruno

    2009-06-01

    http://cpc.cs.qub.ac.uk/summaries/AECS_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 4866 No. of bytes in distributed program, including test data, etc.: 42 114 Distribution format: tar.gz Programming language: Fortran 90 and MPI Computer: Any system that supports Fortran 90 and MPI Operating system: developed and tested at the Pittsburgh Supercomputer Center, at the Barcelona Supercomputer (BSC/CNS) and on multi-processor Macs and PCs. For cases where distributed density matrix evaluation is invoked, the BLACS and SCALAPACK packages are needed. Has the code been vectorized or parallelized?: Yes Classification: 4.15 External routines: LAPACK, SCALAPACK, BLACS Nature of problem: Analysis of quantum computation algorithms and the effects of noise. Solution method: A Fortran 90/MPI package is provided that contains modular commands to create and analyze quantum circuits. Shor's factorization and Grover's search algorithms are explained in detail. Procedures for distributing state vector amplitudes over processors and for solving concurrent (multiverse) cases with noise effects are implemented. Density matrix and entropy evaluations are provided in both single and parallel versions. Running time: Test run takes less than 1 minute using 2 processors.

  3. The Lived Experience of Iranian Women Confronting Breast Cancer Diagnosis

    Directory of Open Access Journals (Sweden)

    Esmat Mehrabi

    2016-03-01

    Full Text Available Introduction: The population of breast cancer survivors is growing; nevertheless, survivors encounter many cancer-related problems in their lives, especially after early diagnosis, and have to deal with these problems. Beyond the disease itself, several socio-cultural factors may affect how patients confront this challenge and the way they deal with it. The present study was carried out to provide a clear understanding of Iranian women's lived experience of confronting a breast cancer diagnosis and the coping strategies they applied to deal with it. Methods: This study used a qualitative phenomenological design. Data gathering was done through purposive sampling, using semi-structured, in-depth interviews with 18 women who had survived breast cancer. The transcribed interviews were analyzed using Van Manen's thematic analysis approach. Results: Two main themes emerged from the interviews: "emotional turbulence" and "threat control". The first comprised three subthemes: uncertainty, perceived worries, and living with fears. The second included risk control, recurrence control, immediately seeking help, seeking support, and resort to spirituality. Conclusion: Emotional response was the immediate reaction to the cancer diagnosis. Although a variety of emotions were not uncommon during the post-treatment period, patients' perceptions changed over time and problem-focused coping strategies took their place. Although women may experience a degree of improvement and adjustment to the illness, the emotional problems are not necessarily resolved; they may persist and only gradually give way to positive outcomes.

  4. Parallel processing ITS

    Energy Technology Data Exchange (ETDEWEB)

    Fan, W.C.; Halbleib, J.A. Sr.

    1996-09-01

    This report provides a users' guide for parallel processing ITS on a UNIX workstation network, a shared-memory multiprocessor or a massively parallel processor. The parallelized version of ITS is based on a master/slave model with message passing. Parallel issues such as random number generation, load balancing, and communication software are briefly discussed. Timing results for example problems are presented for demonstration purposes.
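
    A minimal mpi4py sketch of the master/slave pattern with message passing and dynamic load balancing that the report describes; the work units, tags and batch counts are illustrative placeholders, not ITS physics:

      # Master/worker pattern: rank 0 hands out work units on demand; workers ask for
      # more as they finish, which balances uneven per-batch run times.
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      TAG_WORK, TAG_STOP = 1, 2

      if rank == 0:                                  # master
          batches = list(range(100))                 # placeholder work units
          results, active = [], comm.Get_size() - 1
          while active > 0:
              status = MPI.Status()
              res = comm.recv(source=MPI.ANY_SOURCE, status=status)
              if res is not None:
                  results.append(res)
              if batches:
                  comm.send(batches.pop(), dest=status.Get_source(), tag=TAG_WORK)
              else:
                  comm.send(None, dest=status.Get_source(), tag=TAG_STOP)
                  active -= 1
          print(f"master collected {len(results)} results")
      else:                                          # worker
          comm.send(None, dest=0)                    # announce readiness
          while True:
              status = MPI.Status()
              batch = comm.recv(source=0, status=status)
              if status.Get_tag() == TAG_STOP:
                  break
              comm.send(batch * batch, dest=0)       # placeholder result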

  5. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  6. Developing Parallel Programs

    Directory of Open Access Journals (Sweden)

    Ranjan Sen

    2012-09-01

    Full Text Available Parallel programming is an extension of sequential programming; today, it is becoming the mainstream paradigm in day-to-day information processing. Its aim is to build the fastest programs on parallel computers. The methodologies for developing a parallel program can be put into integrated frameworks. Development focuses on algorithms, languages, and how the program is deployed on the parallel computer.

  7. Toward an automated parallel computing environment for geosciences

    Science.gov (United States)

    Zhang, Huai; Liu, Mian; Shi, Yaolin; Yuen, David A.; Yan, Zhenzhen; Liang, Guoping

    2007-08-01

    Software for geodynamic modeling has not kept up with the fast growing computing hardware and network resources. In the past decade supercomputing power has become available to most researchers in the form of affordable Beowulf clusters and other parallel computer platforms. However, to take full advantage of such computing power requires developing parallel algorithms and associated software, a task that is often too daunting for geoscience modelers whose main expertise is in geosciences. We introduce here an automated parallel computing environment built on open-source algorithms and libraries. Users interact with this computing environment by specifying the partial differential equations, solvers, and model-specific properties using an English-like modeling language in the input files. The system then automatically generates the finite element codes that can be run on distributed or shared memory parallel machines. This system is dynamic and flexible, allowing users to address different problems in geosciences. It is capable of providing web-based services, enabling users to generate source codes online. This unique feature will facilitate high-performance computing to be integrated with distributed data grids in the emerging cyber-infrastructures for geosciences. In this paper we discuss the principles of this automated modeling environment and provide examples to demonstrate its versatility.

  8. Diskless supercomputers: Scalable, reliable I/O for the Tera-Op technology base

    Science.gov (United States)

    Katz, Randy H.; Ousterhout, John K.; Patterson, David A.

    1993-01-01

    Computing is seeing an unprecedented improvement in performance; over the last five years there has been an order-of-magnitude improvement in the speeds of workstation CPUs. At least another order of magnitude seems likely in the next five years, to machines with 500 MIPS or more. The goal of the ARPA Teraop program is to realize even larger, more powerful machines, executing as many as a trillion operations per second. Unfortunately, we have seen no comparable breakthroughs in I/O performance; the speeds of I/O devices and the hardware and software architectures for managing them have not changed substantially in many years. We have completed a program of research to demonstrate hardware and software I/O architectures capable of supporting the kinds of internetworked 'visualization' workstations and supercomputers that will appear in the mid 1990s. The project had three overall goals: high performance, high reliability, and a scalable, multipurpose system.

  9. Large-scale integrated super-computing platform for next generation virtual drug discovery.

    Science.gov (United States)

    Mitchell, Wayne; Matsumoto, Shunji

    2011-08-01

    Traditional drug discovery starts by experimentally screening chemical libraries to find hit compounds that bind to protein targets, modulating their activity. Subsequent rounds of iterative chemical derivatization and rescreening are conducted to enhance the potency, selectivity, and pharmacological properties of hit compounds. Although computational docking of ligands to targets has been used to augment the empirical discovery process, its historical effectiveness has been limited because of the poor correlation of ligand dock scores and experimentally determined binding constants. Recent progress in super-computing, coupled to theoretical insights, allows the calculation of the Gibbs free energy, and therefore accurate binding constants, for unusually large ligand-receptor systems. This advance extends the potential of virtual drug discovery. A specific embodiment of the technology, integrating de novo, abstract fragment-based drug design, sophisticated molecular simulation, and the ability to calculate thermodynamic binding constants with unprecedented accuracy, is discussed. Copyright © 2011 Elsevier Ltd. All rights reserved.
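
    The link between a computed Gibbs free energy and a binding constant is the standard relation ΔG = -RT ln K_a (equivalently ΔG = RT ln K_d). A small sketch converting an illustrative free energy into a dissociation constant:

      # Convert a binding free energy (kcal/mol) into a dissociation constant:
      # Kd = exp(Delta_G / (R * T)), with Delta_G negative for favourable binding.
      import math

      R = 1.987204e-3        # gas constant, kcal/(mol*K)
      T = 298.15             # temperature, K

      def kd_from_delta_g(delta_g_kcal_per_mol):
          return math.exp(delta_g_kcal_per_mol / (R * T))

      print(f"Kd for dG = -10 kcal/mol: {kd_from_delta_g(-10.0):.2e} M")   # ~5e-8 M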

  10. Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility

    Energy Technology Data Exchange (ETDEWEB)

    Gallarno, George [Christian Brothers University; Rogers, James H [ORNL; Maxwell, Don E [ORNL

    2015-01-01

    The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large scale. The world's second fastest supercomputer for open science, Titan, has more than 18,000 GPUs that computational scientists use to perform scientific simulations and data analysis. Understanding of GPU reliability characteristics, however, is still in its nascent stage since GPUs have only recently been deployed at large scale. This paper presents a detailed study of GPU errors and their impact on system operations and applications, describing experiences with the 18,688 GPUs on the Titan supercomputer as well as lessons learned in the process of efficient operation of GPUs at scale. These experiences are helpful to HPC sites which already have large-scale GPU clusters or plan to deploy GPUs in the future.

  11. Operational numerical weather prediction on a GPU-accelerated cluster supercomputer

    Science.gov (United States)

    Lapillonne, Xavier; Fuhrer, Oliver; Spörri, Pascal; Osuna, Carlos; Walser, André; Arteaga, Andrea; Gysi, Tobias; Rüdisühli, Stefan; Osterried, Katherine; Schulthess, Thomas

    2016-04-01

    The local area weather prediction model COSMO is used at MeteoSwiss to provide high resolution numerical weather predictions over the Alpine region. In order to benefit from the latest developments in computer technology, the model was optimized and adapted to run on Graphics Processing Units (GPUs). Thanks to these model adaptations and the acquisition of a dedicated hybrid supercomputer, a new set of operational applications has been introduced at MeteoSwiss: COSMO-1 (1 km deterministic), COSMO-E (2 km ensemble) and KENDA (data assimilation). These new applications correspond to a 40x increase in computational load compared to the previous operational setup. We present an overview of the approach used to port the COSMO model to GPUs, together with a detailed description of, and performance results on, the new hybrid Cray CS-Storm computer, Piz Kesch.

  12. Mixed precision numerical weather prediction on hybrid GPU-CPU supercomputers

    Science.gov (United States)

    Lapillonne, Xavier; Osuna, Carlos; Spoerri, Pascal; Osterried, Katherine; Charpilloz, Christophe; Fuhrer, Oliver

    2017-04-01

    A new version of the climate and weather model COSMO has been developed that runs faster on traditional high performance computing systems with CPUs as well as on heterogeneous architectures using graphics processing units (GPUs). The model was in addition adapted to be able to run in "single precision" mode. After discussing the key changes introduced in this new model version and the tools used in the porting approach, we present three applications, namely the MeteoSwiss operational weather prediction system, COSMO-LEPS and the CALMO project, which already take advantage of the performance improvement, up to a factor of 4, by running on the GPU system and using the single precision mode. We discuss how the code changes open new perspectives for scientific research and can give researchers access to a new class of supercomputers.

  13. Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Widener, Patrick (University of New Mexico); Jaconette, Steven (Northwestern University); Bridges, Patrick G. (University of New Mexico); Xia, Lei (Northwestern University); Dinda, Peter (Northwestern University); Cui, Zheng.; Lange, John (Northwestern University); Hudson, Trammell B.; Levenhagen, Michael J.; Pedretti, Kevin Thomas Tauke; Brightwell, Ronald Brian

    2009-09-01

    Palacios and Kitten are new open source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

  14. Dawning Nebulae: A PetaFLOPS Supercomputer with a Heterogeneous Structure

    Institute of Scientific and Technical Information of China (English)

    Ning-Hui Sun; Jing Xing; Zhi-Gang Huo; Guang-Ming Tan; Jin Xiong; Bo Li; Can Ma

    2011-01-01

    Dawning Nebulae is a heterogeneous system composed of 9280 multi-core x86 CPUs and 4640 NVIDIA Fermi GPUs. With a Linpack performance of 1.271 petaFLOPS, it was ranked second in the TOP500 List released in June 2010. In this paper, key issues in the system design of Dawning Nebulae are introduced. System tuning methodologies aimed at the petaFLOPS Linpack result are presented, including algorithmic optimization and communication improvement. The design of its file I/O subsystem, including HVFS and the underlying DCFS3, is also described. Performance evaluations show that the Linpack efficiency of each node reaches 69.89%, and 1024-node aggregate read and write bandwidths exceed 100 GB/s and 70 GB/s, respectively. The success of Dawning Nebulae has demonstrated the viability of the CPU/GPU heterogeneous structure for future supercomputer designs.

  15. Scalability Test of multiscale fluid-platelet model for three top supercomputers

    Science.gov (United States)

    Zhang, Peng; Zhang, Na; Gao, Chao; Zhang, Li; Gao, Yuxiang; Deng, Yuefan; Bluestein, Danny

    2016-07-01

    We have tested the scalability of three supercomputers: the Tianhe-2, Stampede and CS-Storm, with multiscale fluid-platelet simulations, in which a highly resolved and efficient numerical model for the nanoscale biophysics of platelets in microscale viscous biofluids is considered. Three experiments with varying problem sizes were performed: Exp-S, a 680,718-particle single-platelet system; Exp-M, a 2,722,872-particle 4-platelet system; and Exp-L, a 10,891,488-particle 16-platelet system. Our implementation of the multiple time-stepping (MTS) algorithm improved on the performance of single time-stepping (STS) in all experiments. Using MTS, our model achieved simulation rates of 12.5, 25.0 and 35.5 μs/day for Exp-S and 9.09, 6.25 and 14.29 μs/day for Exp-M on Tianhe-2, CS-Storm 16-K80 and Stampede K20, respectively. The best rate for Exp-L was 6.25 μs/day on Stampede. With current advanced HPC resources, the simulation rates achieved by our algorithms bring within reach complex multiscale simulations for solving vexing problems at the interface of biology and engineering, such as thrombosis in blood flow, which combines millisecond-scale hematology with microscale blood flow at resolutions of micro-to-nanoscale cellular components of platelets. This study of the performance characteristics of supercomputers running advanced computational algorithms that offer an optimal trade-off for enhanced computational performance demonstrates that such simulations are feasible with currently available HPC resources.

  16. Harnessing Petaflop-Scale Multi-Core Supercomputing for Problems in Space Science

    Science.gov (United States)

    Albright, B. J.; Yin, L.; Bowers, K. J.; Daughton, W.; Bergen, B.; Kwan, T. J.

    2008-12-01

    The particle-in-cell kinetic plasma code VPIC has been migrated successfully to the world's fastest supercomputer, Roadrunner, a hybrid multi-core platform built by IBM for the Los Alamos National Laboratory. How this was achieved will be described, and examples of state-of-the-art calculations in space science, in particular the study of magnetic reconnection, will be presented. With VPIC on Roadrunner, we have performed, for the first time, plasma PIC calculations with over one trillion particles, >100× larger than calculations considered "heroic" by community standards. This allows examination of physics at unprecedented scale and fidelity. Roadrunner is an example of an emerging paradigm in supercomputing: the trend toward multi-core systems with deep hierarchies, where memory bandwidth optimization is vital to achieving high performance. Getting VPIC to perform well on such systems is a formidable challenge: the core algorithm is memory bandwidth limited with a low compute-to-data ratio and requires random access to memory in its inner loop. That we were able to get VPIC to perform and scale well, achieving >0.374 Pflop/s and linear weak scaling on real physics problems on up to the full 12240-core Roadrunner machine, bodes well for harnessing these machines for our community's needs in the future. Many of the design considerations encountered carry over to other multi-core and accelerated (e.g., via GPU) platforms, and we modified VPIC with flexibility in mind. These will be summarized, and strategies for how one might adapt a code for such platforms will be shared. Work performed under the auspices of the U.S. DOE by the LANS LLC Los Alamos National Laboratory. Dr. Bowers is a LANL Guest Scientist; he is presently at D. E. Shaw Research LLC, 120 W 45th Street, 39th Floor, New York, NY 10036.

  17. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    Science.gov (United States)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The prediction of their behavior is carried out by means of computational models whose underlying mathematical models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized versions of such PDEs it is necessary to apply highly parallelized supercomputers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is the kind of discretizations used in them. Recently, I. Herrera and co-workers, using 'non-overlapping discretizations', have produced the DVS-Software which overcomes this limitation [2]. The DVS-Software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90% or so [3]). It is therefore very suitable for effectively exploiting the most advanced parallel supercomputers available at present. In a parallel talk at this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MODFLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM). REFERENCES: [1] Herrera, Ismael and George F. Pinder, "Mathematical Modelling in Science and Engineering: An Axiomatic Approach", John Wiley, 243p., 2012. [2] Herrera, I., de la Cruz, L.M. and Rosas-Medina, A., "Non-Overlapping Discretization Methods for Partial Differential Equations", NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num.21852. (Open source) [3] Herrera, I. and Contreras, Iván, "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity", Geofísica Internacional, 2015 (in press).

  18. An Experimental Study of the Clinical Acquisition of Behavioral Principles by Videotape Self-Confrontation. Final Report.

    Science.gov (United States)

    Denver Univ., CO.

    To determine the effect of videotape self-confrontation as a training device for speech clinicians, 30 students participated in a 12 month study. Ten experimental subjects were assigned to single confrontation, 10 to double confrontation, and 10 were control subjects. Each confrontation subject used a therapy matrix and scored his therapy session…

  19. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of available processor cores compared to its sequential counterpart, thereby taking full advantage of multicore parallelism. The parallel buffer tree is a search tree data structure that supports the batched parallel processing of a sequence of N insertions, deletions, membership queries, and range queries...

  20. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: replicated data decomposition, spatial decomposition, and force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
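
    A minimal mpi4py sketch of the replicated data approach mentioned above: every rank stores all particle positions, computes pair forces for its own slice of atoms, and an allreduce combines the per-rank partial force arrays. The pair force used here is a toy expression, not a physical potential:

      # Replicated-data force evaluation: all N positions on every rank; rank r
      # computes forces for atoms i with i % size == r; Allreduce merges the
      # partial arrays (entries for atoms owned by other ranks stay zero).
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      N = 256
      rng = np.random.default_rng(42)            # same seed -> identical replicated data
      pos = rng.uniform(0.0, 10.0, size=(N, 3))

      partial = np.zeros_like(pos)
      for i in range(rank, N, size):             # this rank's share of atoms
          r = pos[i] - pos                       # displacements to all other atoms
          d2 = np.einsum("ij,ij->i", r, r)
          d2[i] = np.inf                         # exclude self-interaction
          partial[i] = np.sum(r / d2[:, None] ** 2, axis=0)   # toy repulsive pair force

      forces = np.zeros_like(partial)
      comm.Allreduce(partial, forces, op=MPI.SUM)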

  1. Invariants for Parallel Mapping

    Institute of Scientific and Technical Information of China (English)

    YIN Yajun; WU Jiye; FAN Qinshan; HUANG Kezhi

    2009-01-01

    This paper analyzes the geometric quantities that remain unchanged during parallel mapping (i.e., mapping from a reference curved surface to a parallel surface with identical normal direction). The second gradient operator, the second class of integral theorems, the Gauss-curvature-based integral theorems, and the core property of parallel mapping are used to derive a series of parallel mapping invariants or geometrically conserved quantities. These include not only local mapping invariants but also global mapping invariants found to exist both in a curved surface and along curves on the curved surface. The parallel mapping invariants are used to identify important transformations between the reference surface and parallel surfaces. These mapping invariants and transformations have potential applications in geometry, physics, biomechanics, and mechanics in which various dynamic processes occur along or between parallel surfaces.

  2. Efficient biased random bit generation for parallel processing

    Energy Technology Data Exchange (ETDEWEB)

    Slone, D.M.

    1994-09-28

    A lattice gas automaton was implemented on a massively parallel machine (the BBN TC2000) and a vector supercomputer (the CRAY C90). The automaton models Burgers' equation ρ_t + ρρ_x = νρ_xx in one dimension. The lattice gas evolves by advecting and colliding pseudo-particles on a 1-dimensional, periodic grid. The specific rules for colliding particles are stochastic in nature and require the generation of many billions of random numbers to create the random bits necessary for the lattice gas. The goal of the thesis was to speed up the process of generating the random bits and thereby lessen the computational bottleneck of the automaton.
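
    A minimal vectorized sketch of biased random bit generation of the kind such an automaton needs: bits that are 1 with probability p, produced in bulk so that each parallel worker can draw from its own independent stream (the bias value is illustrative):

      # Generate biased random bits (P[bit = 1] = p) in bulk with NumPy.
      # Independent per-worker streams can be obtained via SeedSequence.spawn.
      import numpy as np

      def biased_bits(p, n, seed=0):
          rng = np.random.default_rng(seed)
          return (rng.random(n) < p).astype(np.uint8)

      bits = biased_bits(p=0.3, n=1_000_000)
      print(f"empirical bias: {bits.mean():.4f}")   # close to 0.30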

  3. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    Science.gov (United States)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.

  4. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    model requires the programmer to think a bit differently, but at the same time the implemented algorithms will perform very well, as shown by the initial tests presented. In the second part of this thesis, I will change focus from the CELL-BE architecture to the more traditional x86 architecture...... of the more exotic though short-lived heterogeneous CELL Broadband Engine (CELL-BE) architecture added to this shift. Furthermore, the use of cluster computers made of commodity hardware and specialized supercomputers has greatly increased in both industry and the academic world. Finally...... as if they were a single machine. In between are a number of tools helping the programmers handle communication, share data, run loops in parallel, handle algorithms mining huge amounts of data, etc. Even though most of them do a good job performance-wise, almost all of them require that the programmers learn

  5. Parallelism and pipelining in high-speed digital simulators

    Science.gov (United States)

    Karplus, W. J.

    1983-01-01

    The attainment of high computing speed, as measured by computational throughput, is seen as one of the most challenging requirements. It is noted that high speed is cardinal in several distinct classes of applications. These classes are then discussed; they comprise (1) the real-time simulation of dynamic systems, (2) distributed parameter systems, and (3) mixed lumped and distributed systems. From the 1950s on, the quest for high speed in digital simulators concentrated on overcoming the limitations imposed by the so-called von Neumann bottleneck. Two major architectural approaches have made it possible to circumvent this bottleneck and attain high speeds. These are pipelining and parallelism. Supercomputers, peripheral array processors, and microcomputer networks are then discussed.

  6. Confronting Theoretical Predictions With Experimental Data; Fitting Strategy For Multi-Dimensional Distributions

    Directory of Open Access Journals (Sweden)

    Tomasz Przedziński

    2015-01-01

    Full Text Available After a Resonance Chiral Lagrangian (RχL) model was developed to describe hadronic τ lepton decays [18], the model was confronted with experimental data. This was accomplished using a fitting framework which was developed to take into account the complexity of the model and to ensure the numerical stability of the algorithms used in the fitting. Since the model used in the fit contained 15 parameters and only three 1-dimensional distributions were available, we could expect multiple local minima or even whole regions of equal potential to appear. Our methods had to thoroughly explore the whole parameter space and ensure, as well as possible, that the result is a global minimum. This paper is focused on the technical aspects of the fitting strategy used. The first approach was based on the re-weighting algorithm published in [17] and produced results in around two weeks. A later approach, with an improved theoretical model and a simple parallelization algorithm based on the Inter-Process Communication (IPC) methods of the UNIX system, reduced computation time to 2-3 days. Additional approximations were introduced to the model, decreasing the time to obtain preliminary results to 8 hours. This allowed better validation of the results, leading to a more robust analysis published in [12].

  7. Confronting hip resurfacing and big femoral head replacement gait analysis

    Directory of Open Access Journals (Sweden)

    Panagiotis K. Karampinas

    2014-03-01

    Full Text Available Improved hip kinematics and bone preservation have been reported after resurfacing total hip replacement (THRS). On the other hand, hip kinematics with standard total hip replacement (THR) is optimized with large diameter femoral heads (BFH-THR). The purpose of this study is to evaluate the functional outcomes of THRS and BFH-THR and correlate these results to bone preservation or the large femoral heads. Thirty-one patients were included in the study. Gait speed, postural balance, proprioception and overall performance were assessed. Our results demonstrated a non-statistically significant improvement in gait, postural balance and proprioception in the THRS group compared to the BFH-THR group. THRS provides outcomes identical to traditional BFH-THR. The choice of THRS as a bone-preserving procedure in younger patients remains to be evaluated.

  8. TUBULORETICULAR STRUCTURE AND CYLINDRICAL CONFRONTING CISTERNAE IN LUPUS NEPHRITIS

    Institute of Scientific and Technical Information of China (English)

    陈振斌; 梁平; 余英豪; 谢福安; 陈莲云

    1998-01-01

    Objective. To investigate the pathological significance of tubuloreticular structures (TRS) and cylindrical confronting cisternae (CCC) in patients with lupus nephritis. Methods. An electron microscopical study of 24 renal biopsy specimens from patients with lupus nephritis was carried out, with particular emphasis on two endoplasmic reticulum (ER)-related structures. Results. TRS was found in 18 cases, and CCC in 10 of them. TRS often appeared in the capillary endothelium and did not correlate well with the activity index of lupus nephritis; CCC appeared frequently in monocytes/macrophages and lymphocytes and correlated well with both the activity index and the amount of interstitial immune deposits. Conclusion. It is suggested that TRS and CCC derive from inward "budding" of the ER membrane; the morphogenesis and morphologic variations of CCC are discussed. Both TRS and CCC are pathognomonic, though not specific, changes. They may be helpful in the pathologic diagnosis of lupus nephritis when properly combined with certain clinical and pathological features.

  9. Confronting product life thinking with product life cycle analysis

    DEFF Research Database (Denmark)

    McAloone, Tim C.

    2001-01-01

    Industry is increasingly being confronted with the need to consider the whole life cycle effects of its products, in order to make environmental improvements of any significance. There is a danger that naive environmental decisions are made, due to apparent lack of data or actual lack of insight....... This paper describes a case study, where a class of students was presented with a product from the Danish company, Danfoss A/S, and given the task of carrying out an initial environmental evaluation of the product. This evaluation consisted of both a "life cycle check" and an exercise where the students were...... to "read" the environment out of the product, in order to systematically, quickly and efficiently come to some design recommendations for the company. The phrases "LCA" and "product life thinking" will be described and differentiated and a pattern identified for their cooperative effect in use....

  10. The latent confrontation: The Korean peninsula’s uncertain future

    Directory of Open Access Journals (Sweden)

    Asier Blas Mendoza

    2007-10-01

    Full Text Available This article covers the changes that have taken place in inter-Korean relations since the fall of the Soviet regimes. In the first few years following the fall of the iron curtain, the Korean perimeter became a scenario of confrontation that seemed to perpetuate the problem. However, in the second half of the '90s, north-east Asia began to undergo a real change that resulted in public contacts between the two Koreas. The new game that was officially opened by the "Sunshine policy" led to a deep-seated rethinking of foreign policy by both states, and opened a new chapter in inter-Korean relations that has clearly demonstrated the important dimension and repercussions of the conflict in the geo-strategic framework of the entire East Asian area, as well as in international politics.

  11. Israel's prophets and their confrontation with the Canaanite religion

    Directory of Open Access Journals (Sweden)

    Arvid S. Kapelrud

    1969-01-01

    Full Text Available The confrontation between Israelites and Canaanites is presented in the Old Testament historical books as a violent encounter, a battle which was waged between tribes who invaded Canaan from Egypt and the people who already lived in the land and neighbouring territories. The narrators who composed the Deuteronomistic historical work (Deuteronomy, Joshua, Judges, I and II Samuel, I and II Kings) have, with background in the situation in which they themselves lived, described it such that Moses gave the order for a complete extermination of other peoples and tribes in those regions where they settled down (Deut. 7:x ff.). The intention was that the tribes should not be tempted to take up the inhabitants' religion, thereby disrupting the domestic, harmonious unity among the Israelites and starting an internal disintegration.

  12. Ukrainian-russian relations in the context of geopolitical confrontation

    Directory of Open Access Journals (Sweden)

    Oleksandr S. Vonsovych

    2016-02-01

    Full Text Available The article is devoted to Ukrainian-Russian relations in the context of geopolitical confrontation. Located in the geographical center of Europe, Ukraine has an important geopolitical and geostrategic position in pan-European processes, as well as in processes that relate to the security sphere. It was determined that the Russian aggression against our country manifests itself in the form of direct and intermediary use of armed force against its sovereignty and territorial integrity. It is proved that the state of the negotiation process, its results and the decisions taken do not promote the settlement of the conflict but, on the contrary, result in further complication of the situation. The article argues that the initiator of the sanctions aimed at the international isolation of Russia was the US government, under whose pressure the countries of the European Union joined the sanctions at the risk of incurring economic losses. On the one hand, the sanctions policy towards Russia slowed down the development of the European Union; on the other hand, the continuation of sanctions against Russia is due to the close strategic cooperation between the EU countries and the USA. It is proved that an equally important factor in the development of Ukrainian-Russian relations will be the results of the presidential elections in the United States of America in 2016. The election results will determine the contours of the new geopolitical confrontation between Russia and the USA, and what place Ukraine will take in it: either remaining a bargaining chip in Russia's geopolitical plans, or becoming a real strategic partner of the USA. The second option is more acceptable, because under these conditions there will be more chances of settling the contradictions of the Ukrainian-Russian conflict and stabilizing the domestic situation.

  13. Hybrid Parallel Programming Models for AMR Neutron Monte-Carlo Transport

    Science.gov (United States)

    Dureau, David; Poëtte, Gaël

    2014-06-01

    This paper deals with High Performance Computing (HPC) applied to neutron transport theory on complex geometries, thanks to both an Adaptive Mesh Refinement (AMR) algorithm and a Monte-Carlo (MC) solver. Several parallelism models are presented and analyzed in this context, among them shared-memory and distributed-memory ones such as domain replication and domain decomposition, together with hybrid strategies. The study is illustrated by weak and strong scalability tests on complex benchmarks on several thousands of cores, thanks to the petaflopic supercomputer Tera100.
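
    The simplest of these models is domain replication: every rank holds the full problem, simulates an independent batch of particle histories with its own random stream, and the tallies are summed at the end. A minimal mpi4py sketch in which the "tally" is a toy Monte-Carlo estimate of π rather than a neutron transport tally:

      # Domain replication Monte Carlo: each rank runs independent histories with
      # its own random seed; a reduction sums the tallies across ranks at the end.
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      histories_per_rank = 1_000_000
      rng = np.random.default_rng(12345 + rank)         # distinct stream per rank

      x, y = rng.random(histories_per_rank), rng.random(histories_per_rank)
      local_hits = int(np.count_nonzero(x * x + y * y < 1.0))

      total_hits = comm.reduce(local_hits, op=MPI.SUM, root=0)
      if rank == 0:
          estimate = 4.0 * total_hits / (histories_per_rank * size)
          print(f"pi estimate from {size} replicated domains: {estimate:.5f}")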

  14. Adapting the serial Alpgen event generator to simulate LHC collisions on millions of parallel threads

    CERN Document Server

    Childers, J T; LeCompte, T J; Papka, M E; Benjamin, D P

    2015-01-01

    As the LHC moves to higher energies and luminosity, the demand for computing resources increases accordingly and will soon outpace the growth of the Worldwide LHC Computing Grid. To meet this greater demand, event generation Monte Carlo was targeted for adaptation to run on Mira, the supercomputer at the Argonne Leadership Computing Facility. Alpgen is a Monte Carlo event generation application that is used by LHC experiments in the simulation of collisions that take place in the Large Hadron Collider. This paper details the process by which Alpgen was adapted from a single-processor serial application to a large-scale parallel application, and the performance that was achieved.

  15. Behavior change following self-confrontation: a test of the value-mediation hypothesis.

    Science.gov (United States)

    Grube, J W; Rankin, W L; Greenstein, T N; Kearney, K A

    1977-04-01

    This study presents a reanalysis of data from Rokeach's self-confrontation experiments using path analytic techniques. Contrary to Rokeach's interpretations, findings indicate that behavior changes following self-confrontation are not primarily mediated through changes in value priorities. Rather, the available data suggest that the self-confrontation process involves the resolution of inconsistencies between behaviors and self-conceptions that are revealed during the treatment session. The authors interpret these findings within the framework of Rokeach's general theory of self-dissatisfaction and cognitive-behavioral change. Suggestions for future directions in self-confrontation research are offered.

  16. A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    CERN Document Server

    Pattabiraman, Bharath; Liao, Wei-Keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A

    2012-01-01

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and introducing a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functi...
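
    One building block named above, a parallel sort (used, for example, to order particles before neighbor searches), is easy to illustrate in a shared-memory setting. The following sketch is not the authors' algorithm; it is a minimal standard-library C++ example in which each thread sorts one chunk and the sorted chunks are then merged.

        // Shared-memory parallel sort sketch: each thread sorts one chunk of
        // the array, then the sorted chunks are merged pairwise (serially).
        #include <algorithm>
        #include <cstdio>
        #include <random>
        #include <thread>
        #include <vector>

        void parallel_sort(std::vector<double>& v, unsigned nthreads) {
            nthreads = std::max(1u, nthreads);
            const std::size_t chunk = (v.size() + nthreads - 1) / nthreads;
            std::vector<std::thread> workers;

            // Phase 1: sort each chunk independently.
            for (unsigned t = 0; t < nthreads; ++t) {
                auto first = v.begin() + std::min(v.size(), t * chunk);
                auto last  = v.begin() + std::min(v.size(), (t + 1) * chunk);
                workers.emplace_back([first, last] { std::sort(first, last); });
            }
            for (auto& w : workers) w.join();

            // Phase 2: fold the sorted chunks together one by one.
            for (unsigned t = 1; t < nthreads; ++t) {
                auto mid  = v.begin() + std::min(v.size(), t * chunk);
                auto last = v.begin() + std::min(v.size(), (t + 1) * chunk);
                std::inplace_merge(v.begin(), mid, last);
            }
        }

        int main() {
            std::vector<double> radii(1 << 20);
            std::mt19937_64 rng(42);
            std::uniform_real_distribution<double> u(0.0, 1.0);
            for (auto& r : radii) r = u(rng);

            parallel_sort(radii, std::thread::hardware_concurrency());
            std::printf("sorted: %s\n",
                        std::is_sorted(radii.begin(), radii.end()) ? "yes" : "no");
            return 0;
        }

    A distributed-memory code such as the one described above would instead use an MPI-based sample sort or similar, but the split-sort-merge structure is the same.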

  17. Assessing the Need for Supercomputing Resources Within the Pacific Area of Responsibility

    Science.gov (United States)

    2015-05-26

    ...subparts that can be solved in parallel, for example, when searching a large database for records satisfying a particular condition. To solve this, the database can be divided into smaller shards that are distributed to many nodes, and each node performs the search in parallel with the rest. The desired records are then collected after the parallel searches are completed. However, during the time that the search is being performed on each shard...
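
    The shard-and-gather pattern described in this excerpt can be made concrete with a small example. The sketch below is purely illustrative (the shard contents, the search_shard function and the keyword are invented for the example): each shard of a toy "database" is searched by a separate asynchronous task, and the matching records are gathered once all tasks finish.

        // Toy illustration of the shard-and-gather pattern: a "database" is
        // split into shards, each shard is searched concurrently, and the
        // matching records are collected afterwards.
        #include <future>
        #include <string>
        #include <vector>
        #include <cstdio>

        using Shard = std::vector<std::string>;

        // Search one shard for records containing a keyword.
        std::vector<std::string> search_shard(const Shard& shard, const std::string& key) {
            std::vector<std::string> hits;
            for (const auto& rec : shard)
                if (rec.find(key) != std::string::npos)
                    hits.push_back(rec);
            return hits;
        }

        int main() {
            std::vector<Shard> shards = {
                {"alpha supercomputer", "beta workstation"},
                {"gamma supercomputer cluster", "delta laptop"},
                {"epsilon grid", "zeta supercomputer node"}};

            // Launch one asynchronous search per shard.
            std::vector<std::future<std::vector<std::string>>> jobs;
            for (const auto& s : shards)
                jobs.push_back(std::async(std::launch::async, search_shard,
                                          std::cref(s), std::string("supercomputer")));

            // Gather the partial results once all searches complete.
            std::vector<std::string> results;
            for (auto& j : jobs) {
                auto part = j.get();
                results.insert(results.end(), part.begin(), part.end());
            }
            for (const auto& r : results) std::printf("%s\n", r.c_str());
            return 0;
        }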

  18. Parallelization in Modern C++

    CERN Document Server

    CERN. Geneva

    2016-01-01

    The traditionally used and well-established parallel programming models OpenMP and MPI both target lower-level parallelism and are meant to be as language-agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher-level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the C++ language across industry and the wider community has also spurred a remarkable number of standardization proposals and technical specifications. Those efforts, however, have so far failed to build a vision of how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
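
    To ground the terminology, the task-based and asynchronous styles mentioned here can already be written, if somewhat verbosely, with what C++11/14 provide out of the box (std::async and std::future); true continuations (a future::then) are not part of those standards, which is one of the integration gaps such talks discuss. A minimal sketch:

        // Two independent tasks reduce the two halves of an array; the
        // "continuation" that combines them has to be written by hand as a
        // blocking get() on both futures.
        #include <future>
        #include <numeric>
        #include <vector>
        #include <cstdio>

        int main() {
            std::vector<double> data(1 << 22, 1.0);
            const auto mid = data.begin() + data.size() / 2;

            auto lo = std::async(std::launch::async, [&] {
                return std::accumulate(data.begin(), mid, 0.0); });
            auto hi = std::async(std::launch::async, [&] {
                return std::accumulate(mid, data.end(), 0.0); });

            const double sum = lo.get() + hi.get();
            std::printf("sum = %f\n", sum);
            return 0;
        }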

  19. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  20. Parallel digital forensics infrastructure.

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, Lorie M. (New Mexico Tech, Socorro, NM); Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.

  1. Introduction to Parallel Computing

    Science.gov (United States)

    1992-05-01

    [Report excerpt; a table of machine characteristics (languages: C, Ada, C++, data-parallel FORTRAN, FORTRAN-90 (late 1992); topology: 2D mesh of node boards, one application processor per node board; development tools) did not survive extraction.] ...As parallel machines become the wave of the present, tools are increasingly needed to assist programmers in creating parallel tasks and coordinating their activities. Linda was designed to be such a tool. Linda was designed with three important goals in mind: to be portable, efficient, and easy to use.

  2. Parallel Wolff Cluster Algorithms

    Science.gov (United States)

    Bae, S.; Ko, S. H.; Coddington, P. D.

    The Wolff single-cluster algorithm is the most efficient method known for Monte Carlo simulation of many spin models. Due to the irregular size, shape and position of the Wolff clusters, this method does not easily lend itself to efficient parallel implementation, so that simulations using this method have thus far been confined to workstations and vector machines. Here we present two parallel implementations of this algorithm, and show that one gives fairly good performance on a MIMD parallel computer.
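
    For readers unfamiliar with the method, a serial reference version of one Wolff single-cluster update for the 2D Ising model is sketched below (J = 1, periodic boundaries, C++17). It is included only to make the algorithm concrete; it is not the parallel implementation described above, and the irregularly shaped cluster it grows is exactly what makes that parallelization hard.

        // One Wolff single-cluster update: pick a seed site, then grow a
        // cluster over same-spin neighbours, adding each tested bond with
        // probability 1 - exp(-2*beta), flipping sites as they are added.
        #include <cmath>
        #include <cstdio>
        #include <random>
        #include <utility>
        #include <vector>

        struct Ising2D {
            int L;                         // linear lattice size
            std::vector<int> spin;         // +1 / -1 spins, row-major
            std::mt19937_64 rng{12345};

            explicit Ising2D(int L_) : L(L_), spin(L_ * L_, 1) {}
            int idx(int x, int y) const { return ((x + L) % L) + L * ((y + L) % L); }

            // Grow and flip one cluster at inverse temperature beta; return its size.
            std::size_t wolff_update(double beta) {
                const double p_add = 1.0 - std::exp(-2.0 * beta);
                std::uniform_int_distribution<int> site(0, L - 1);
                std::uniform_real_distribution<double> u(0.0, 1.0);

                const int x0 = site(rng), y0 = site(rng);
                const int seed_spin = spin[idx(x0, y0)];
                std::vector<std::pair<int, int>> stack{{x0, y0}};
                spin[idx(x0, y0)] = -seed_spin;          // flipping marks "visited"
                std::size_t cluster_size = 1;

                while (!stack.empty()) {
                    auto [x, y] = stack.back();
                    stack.pop_back();
                    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
                    for (int d = 0; d < 4; ++d) {
                        const int nx = x + dx[d], ny = y + dy[d];
                        if (spin[idx(nx, ny)] == seed_spin && u(rng) < p_add) {
                            spin[idx(nx, ny)] = -seed_spin;
                            stack.push_back({nx, ny});
                            ++cluster_size;
                        }
                    }
                }
                return cluster_size;
            }
        };

        int main() {
            Ising2D model(64);
            const double beta = 0.44;                    // near the 2D critical point
            for (int sweep = 0; sweep < 100; ++sweep)
                std::printf("cluster size = %zu\n", model.wolff_update(beta));
            return 0;
        }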

  3. Practical Parallel Rendering

    CERN Document Server

    Chalmers, Alan

    2002-01-01

    Meeting the growing demands for speed and quality in rendering computer graphics images requires new techniques. Practical parallel rendering provides one of the most practical solutions. This book addresses the basic issues of rendering within a parallel or distributed computing environment, and considers the strengths and weaknesses of multiprocessor machines and networked render farms for graphics rendering. Case studies of working applications demonstrate, in detail, practical ways of dealing with complex issues involved in parallel processing.

  4. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include sorting, searching, optimization, and matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are reductions, prefix scans, and ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.
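
    Of the patterns listed, the prefix scan is the least obvious to parallelize, so a concrete sketch may help. The following is a generic two-phase shared-memory inclusive scan in C++ (not taken from the presentation): each thread scans its own block, the per-block totals are combined, and the resulting offsets are added back in parallel.

        // Two-phase parallel inclusive scan (prefix sum).
        // Phase 1: each thread scans its own block.
        // Phase 2: block offsets are combined serially and added back in parallel.
        #include <algorithm>
        #include <cstdio>
        #include <thread>
        #include <vector>

        void parallel_inclusive_scan(std::vector<double>& v, unsigned nthreads) {
            const std::size_t n = v.size();
            const std::size_t block = (n + nthreads - 1) / nthreads;
            std::vector<double> block_sum(nthreads, 0.0);
            std::vector<std::thread> pool;

            // Phase 1: local scans.
            for (unsigned t = 0; t < nthreads; ++t)
                pool.emplace_back([&, t] {
                    std::size_t lo = std::min(n, t * block), hi = std::min(n, lo + block);
                    double run = 0.0;
                    for (std::size_t i = lo; i < hi; ++i) { run += v[i]; v[i] = run; }
                    block_sum[t] = run;
                });
            for (auto& th : pool) th.join();
            pool.clear();

            // Exclusive scan of the per-block totals (cheap, done serially).
            std::vector<double> offset(nthreads, 0.0);
            for (unsigned t = 1; t < nthreads; ++t)
                offset[t] = offset[t - 1] + block_sum[t - 1];

            // Phase 2: add each block's offset to its elements.
            for (unsigned t = 1; t < nthreads; ++t)
                pool.emplace_back([&, t] {
                    std::size_t lo = std::min(n, t * block), hi = std::min(n, lo + block);
                    for (std::size_t i = lo; i < hi; ++i) v[i] += offset[t];
                });
            for (auto& th : pool) th.join();
        }

        int main() {
            std::vector<double> v(1000, 1.0);
            parallel_inclusive_scan(v, 4);
            std::printf("last prefix sum = %f (expected 1000)\n", v.back());
            return 0;
        }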

  5. Approach of generating parallel programs from parallelized algorithm design strategies

    Institute of Scientific and Technical Information of China (English)

    WAN Jian-yi; LI Xiao-ying

    2008-01-01

    Today, parallel programming is dominated by message passing libraries, such as the message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from a problem specification, an abstract programming language+ (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining this with parallel architectures, the parallelism implicit in the parallelized algorithm design strategies is exploited. Through implementation and transformation, a C++ and parallel virtual machine (CPPVM) parallel program is finally generated. The parallelized branch and bound (B&B) and parallelized divide and conquer (D&C) algorithm design strategies are studied in this article as examples, and the approach is illustrated with a case study.
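
    The idea of a design-strategy skeleton can be illustrated directly in C++ without any code generation: the divide/base/combine steps are passed in as callables and the skeleton decides where to spawn parallel tasks. This sketch is not the Apla+/CPPVM tool chain described in the article, only a hand-written analogue of a parallelized D&C skeleton, here instantiated as a parallel sum.

        // A tiny divide-and-conquer "skeleton": the strategy is supplied as
        // callables (base case and combine), the skeleton handles recursion
        // and task spawning over an index range [lo, hi).
        #include <cstdio>
        #include <future>
        #include <numeric>
        #include <vector>

        template <class Base, class Combine>
        double divide_and_conquer(std::size_t lo, std::size_t hi, std::size_t grain,
                                  Base base, Combine combine) {
            if (hi - lo <= grain)
                return base(lo, hi);                     // small enough: solve directly
            const std::size_t mid = lo + (hi - lo) / 2;
            auto left = std::async(std::launch::async, [&] {
                return divide_and_conquer(lo, mid, grain, base, combine); });
            double right = divide_and_conquer(mid, hi, grain, base, combine);
            return combine(left.get(), right);
        }

        int main() {
            std::vector<double> v(1 << 20, 0.5);
            // Instantiate the skeleton as a parallel sum.
            double sum = divide_and_conquer(
                0, v.size(), 1 << 16,
                [&](std::size_t a, std::size_t b) {
                    return std::accumulate(v.begin() + a, v.begin() + b, 0.0); },
                [](double a, double b) { return a + b; });
            std::printf("sum = %f (expected %f)\n", sum, 0.5 * v.size());
            return 0;
        }

    A branch-and-bound skeleton would have the same shape, with a shared bound replacing the simple combine step.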

  6. CREATIVITY OF V.G. RASPUTIN AND V.V. KILYAKOV: THE CONCEPT OF CONFRONTATION OF EPOCHS

    Directory of Open Access Journals (Sweden)

    K. I. CHICHKINA

    2015-01-01

    Full Text Available This article highlights the philosophical reflection of artistic parallels in the creative works of writers of modern Russian literature, V.V. Kilyakov and V.G. Rasputin, by the example of selected works. The story "Farewell to Matyora" by V.G. Rasputin and the story "Restless" by V.V. Kilyakov vividly express the concept of the confrontation of epochs when comparing the spiritual and moral tragedies of the protagonists. The replacement of eternal human values by the pursuit of profit and material achievement has deprived people of spiritual ideals. Honor, duty, patriotism, faith in God, and respect for elders are replaced by dishonesty, vanity, immorality, the trampling of holy truths, and wastefulness in the pursuit of wealth. An unbreakable wall of misunderstanding and denial has appeared between the moral and immoral epochs. The idea of the compared works is not only to show this gap and its depth. The authors' position is clear: they cannot accept what is happening, and through their works they try to encourage the young generation to look back into the past. The objective of this article is to describe the concept of the confrontation of epochs in the story by V.V. Kilyakov and to draw a parallel with the work by V.G. Rasputin in the context of the spiritual and moral contradictions of villages and cities.

  7. Parallel time domain solvers for electrically large transient scattering problems

    KAUST Repository

    Liu, Yang

    2014-09-26

    Marching-on-in-time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time-advancing electric surface current densities by iteratively solving sparse systems of equations at all time steps. In contrast to their finite difference and finite element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementation of time domain integral equation solvers, specifically those that leverage the multilevel plane-wave time-domain (PWTD) algorithm on modern manycore computer architectures, including graphics processing units (GPUs) and distributed-memory supercomputers. The GPU-based implementation achieves at least one order of magnitude speedup compared to serial implementations, while the distributed parallel implementation is highly scalable to thousands of compute nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.

  8. NAS Parallel Benchmark Results 11-96. 1.0

    Science.gov (United States)

    Bailey, David H.; Bailey, David; Chancellor, Marisa K. (Technical Monitor)

    1997-01-01

    The NAS Parallel Benchmarks have been developed at NASA Ames Research Center to study the performance of parallel supercomputers. The eight benchmark problems are specified in a "pencil and paper" fashion. In other words, the complete details of the problem to be solved are given in a technical document, and except for a few restrictions, benchmarkers are free to select the language constructs and implementation techniques best suited for a particular system. These results represent the best results that have been reported to us by the vendors for the specific systems listed. In this report, we present new NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin200, and SGI Origin2000. We also report High Performance Fortran (HPF) based NPB results for IBM SP2 Wide Nodes, HP/Convex Exemplar SPP2000, and SGI/CRAY T3D. These results have been submitted by Applied Parallel Research (APR) and Portland Group Inc. (PGI). We also present sustained performance per dollar for Class B LU, SP and BT benchmarks.

  9. Performance Evaluation Methodologies and Tools for Massively Parallel Programs

    Science.gov (United States)

    Yan, Jerry C.; Sarukkai, Sekhar; Tucker, Deanne (Technical Monitor)

    1994-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. The recent introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance tool technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g., AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  10. Parallel Computation of the Regional Ocean Modeling System (ROMS)

    Energy Technology Data Exchange (ETDEWEB)

    Wang, P; Song, Y T; Chao, Y; Zhang, H

    2005-04-05

    The Regional Ocean Modeling System (ROMS) is a regional ocean general circulation modeling system solving the free surface, hydrostatic, primitive equations over varying topography. It is free software distributed world-wide for studying both complex coastal ocean problems and the basin-to-global scale ocean circulation. The original ROMS code could only be run on shared-memory systems. With the increasing need to simulate larger model domains with finer resolutions and on a variety of computer platforms, there is a need in the ocean-modeling community to have a ROMS code that can be run on any parallel computer ranging from 10 to hundreds of processors. Recently, we have explored parallelization for ROMS using the MPI programming model. In this paper, an efficient parallelization strategy for such a large-scale scientific software package, based on an existing shared-memory computing model, is presented. In addition, scientific applications and data-performance issues on a couple of SGI systems, including Columbia, the world's third-fastest supercomputer, are discussed.
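
    The kind of parallelization described here typically rests on a horizontal (2D) domain decomposition: each MPI process owns one tile of the model grid and exchanges narrow halo regions with its neighbours every time step. The sketch below is not the ROMS implementation; it is a minimal C++/MPI illustration of such a decomposition using a Cartesian communicator, with only the east/west halo exchange shown for brevity.

        // Minimal 2D domain-decomposition sketch: build a Cartesian process
        // grid, find neighbours, and exchange one-cell-wide east/west halos.
        #include <mpi.h>
        #include <cstdio>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int nranks; MPI_Comm_size(MPI_COMM_WORLD, &nranks);

            // Let MPI pick a 2D process grid and build a Cartesian communicator.
            int dims[2] = {0, 0}, periods[2] = {0, 0};
            MPI_Dims_create(nranks, 2, dims);
            MPI_Comm cart;
            MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

            int rank, west, east, south, north;
            MPI_Comm_rank(cart, &rank);
            MPI_Cart_shift(cart, 0, 1, &west, &east);    // neighbours in x
            MPI_Cart_shift(cart, 1, 1, &south, &north);  // neighbours in y (exchange omitted)

            // Local tile with a one-cell halo on each side.
            const int nx = 64, ny = 64;
            std::vector<double> field((nx + 2) * (ny + 2), static_cast<double>(rank));
            auto at = [&](int i, int j) -> double& { return field[i + (nx + 2) * j]; };

            // Pack boundary columns, exchange with east/west neighbours, unpack halos.
            std::vector<double> send_e(ny), recv_w(ny), send_w(ny), recv_e(ny);
            for (int j = 0; j < ny; ++j) { send_e[j] = at(nx, j + 1); send_w[j] = at(1, j + 1); }
            MPI_Sendrecv(send_e.data(), ny, MPI_DOUBLE, east, 0,
                         recv_w.data(), ny, MPI_DOUBLE, west, 0, cart, MPI_STATUS_IGNORE);
            MPI_Sendrecv(send_w.data(), ny, MPI_DOUBLE, west, 1,
                         recv_e.data(), ny, MPI_DOUBLE, east, 1, cart, MPI_STATUS_IGNORE);
            if (west != MPI_PROC_NULL) for (int j = 0; j < ny; ++j) at(0, j + 1) = recv_w[j];
            if (east != MPI_PROC_NULL) for (int j = 0; j < ny; ++j) at(nx + 1, j + 1) = recv_e[j];

            if (rank == 0)
                std::printf("halo exchange done on a %d x %d process grid\n", dims[0], dims[1]);
            MPI_Finalize();
            return 0;
        }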

  11. Middleware Architecture for the Interconnection of Distributed and Parallel Systems

    Directory of Open Access Journals (Sweden)

    Ovidiu Gherman

    2012-01-01

    Full Text Available Grid computing is a fast-evolving technology, bringing more computing power to its users. Two main directions are observable: creating dedicated supercomputers for scientific and commercial tasks, and creating distributed commodity-based systems. The first are usually much more expensive, but have the advantage of performance, better control, and uniformity of platforms. The second are more affordable but lack flexibility and ease of maintenance. Computing needs that require supplementary computing power for certain periods of time are better satisfied by interconnecting available resources than by buying new, expensive ones. But interconnecting platforms, sometimes radically different ones, can be a difficult task. The proliferation of hybrid parallel computing systems can make this even more complicated, because it brings into contact systems with different operating flows at the parallelism level. In this context, the present article proposes a new middleware architecture that can connect multiple parallel or distributed resources of different types, allowing unified resource utilization and reservation for the user's jobs. The new architecture is described functionally and structurally.

  12. A New Parallel Method for Binary Black Hole Simulations

    Directory of Open Access Journals (Sweden)

    Quan Yang

    2016-01-01

    Full Text Available Simulating binary black hole (BBH) systems is a computationally intensive problem that can lead to great scientific discoveries. How to expose more parallelism to take advantage of the large number of computing resources of modern supercomputers is the key to achieving high performance for BBH simulations. In this paper, we propose a scalable MPM (Mesh-based Parallel Method) which can exploit both inter- and intra-mesh parallelism to improve the performance of BBH simulation. At the same time, we also leverage GPUs to accelerate the computation. Different kinds of performance tests are conducted on Blue Waters. Compared with the existing method, our MPM improves the performance from a 5x speedup (relative to the normalized speed of 32 MPI processes) to an 8x speedup. For the GPU-accelerated version, our MPM improves the performance from a 12x speedup to a 28x speedup. Experimental results also show that when only CPU computing resources or only limited GPU computing resources are available, our MPM can employ two special scheduling mechanisms to achieve better performance. Furthermore, our scalable GPU-accelerated MPM achieves almost ideal weak scaling up to 2048 GPU computing nodes, which enables our software to handle even larger BBH simulations efficiently.

  13. Rule transgressions in groups : The conditional nature of newcomers' willingness to confront deviance

    NARCIS (Netherlands)

    Jetten, Jolanda; Hornsey, Matthew J.; Spears, Russell; Haslam, S. Alexander; Cowell, Eleanor

    2010-01-01

    We provide evidence that, compared to old-timers, newcomers' intentions to confront deviants are more sensitive to the social context when they are confronted with rule violations. Female rugby players (N = 71) were asked for their disapproval of, and willingness to sanction, ingroup and outgroup members who

  14. Confrontation Avoidance in Island Cultures: Some Observations in Guam and Micronesia.

    Science.gov (United States)

    Ford, C. Christopher; Carey, Edwin

    The avoidance of interpersonal confrontations is a behavioral pattern which seems to be common to small homogeneous island cultures. At the University of Guam, a quasi-experimental study was designed to investigate this confrontation avoidance phenomenon in Guam and Micronesia. This paper presents a description of that study and a discussion of…

  15. Rule transgressions in groups : The conditional nature of newcomers' willingness to confront deviance

    NARCIS (Netherlands)

    Jetten, Jolanda; Hornsey, Matthew J.; Spears, Russell; Haslam, S. Alexander; Cowell, Eleanor

    We provide evidence that, compared to old-timers, newcomers' intentions to confront deviants are more sensitive to the social context when they are confronted with rule violations. Female rugby players (N = 71) were asked for their disapproval of, and willingness to sanction, ingroup and outgroup members who

  16. Sex-Role Orientation and the Willingness to Confront Existential Issues.

    Science.gov (United States)

    Stevens, Michael J.; And Others

    1990-01-01

    Examined relationship between sex-role orientation and willingness to confront existential issues in undergraduate students (N=88). Results indicated masculine men and feminine women reported the least willingness to confront existential issues; androgynous men were most receptive to facing existential issues. (Author/ABL)

  17. Patterns For Parallel Programming

    CERN Document Server

    Mattson, Timothy G; Massingill, Berna L

    2005-01-01

    From grids and clusters to next-generation game consoles, parallel computing is going mainstream. Innovations such as Hyper-Threading Technology, HyperTransport Technology, and multicore microprocessors from IBM, Intel, and Sun are accelerating the movement's growth. Only one thing is missing: programmers with the skills to meet the soaring demand for parallel software.

  18. Parallel scheduling algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Dekel, E.; Sahni, S.

    1983-01-01

    Parallel algorithms are given for scheduling problems such as scheduling to minimize the number of tardy jobs, job sequencing with deadlines, scheduling to minimize earliness and tardiness penalties, channel assignment, and minimizing the mean finish time. The shared memory model of parallel computers is used to obtain fast algorithms. 26 references.

  19. Fast wavelet packet transform-based algorithm for numerical solution of image restoration problems in a parallel environment

    Science.gov (United States)

    Carracciuolo, Luisa; D'Amore, Luisa; Murli, Almerico

    1998-10-01

    We explore the filtering properties of wavelet functions in order to develop accurate and efficient numerical algorithms for image restoration problems. We propose a parallel implementation for MIMD distributed memory environments. The key insight of our approach is the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms as computational building blocks and of the Basic Linear Algebra Communication Subprograms as communication building blocks for advanced architecture computers. The use of these low-level mathematical software libraries guarantees the development of efficient, portable and scalable high-level algorithms and hides many details of the parallelism from the user's point of view. Numerical experiments on a simulated image restoration application are shown. The parallel software has been tested on a 12-node IBM SP2 available at the Center for Research on Parallel Computing and Supercomputers in Naples, Italy.

  20. The company's mainframes join CERN's openlab for DataGrid apps and are pivotal in a new $22 million Supercomputer in the U.K.

    CERN Multimedia

    2002-01-01

    Hewlett-Packard has installed a supercomputer system valued at more than $22 million at the Wellcome Trust Sanger Institute (WTSI) in the U.K. HP has also joined the CERN openlab for DataGrid applications (1 page).

  1. Research center Juelich to install Germany's most powerful supercomputer new IBM System for science and research will achieve 5.8 trillion computations per second

    CERN Multimedia

    2002-01-01

    "The Research Center Juelich, Germany, and IBM today announced that they have signed a contract for the delivery and installation of a new IBM supercomputer at the Central Institute for Applied Mathematics" (1/2 page).

  2. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  3. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    Energy Technology Data Exchange (ETDEWEB)

    PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time response of electromagnetic fields and low-density plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.

  4. Assessment techniques for a learning-centered curriculum: evaluation design for adventures in supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Helland, B. [Ames Lab., IA (United States); Summers, B.G. [Oak Ridge National Lab., TN (United States)

    1996-09-01

    As the classroom paradigm shifts from being teacher-centered to being learner-centered, student assessments are evolving from typical paper and pencil testing to other methods of evaluation. Students should be probed for understanding, reasoning, and critical thinking abilities rather than their ability to return memorized facts. The assessment of the Department of Energy's pilot program, Adventures in Supercomputing (AiS), offers one example of assessment techniques developed for learner-centered curricula. This assessment has employed a variety of methods to collect student data. Methods of assessment used were traditional testing, performance testing, interviews, short questionnaires via email, and student presentations of projects. The data obtained from these sources have been analyzed by a professional assessment team at the Center for Children and Technology. The results have been used to improve the AiS curriculum and establish the quality of the overall AiS program. This paper will discuss the various methods of assessment used and the results.

  5. Distributed computing as a virtual supercomputer: Tools to run and manage large-scale BOINC simulations

    Science.gov (United States)

    Giorgino, Toni; Harvey, M. J.; de Fabritiis, Gianni

    2010-08-01

    Distributed computing (DC) projects tackle large computational problems by exploiting the donated processing power of thousands of volunteered computers, connected through the Internet. To efficiently employ the computational resources of one of the world's largest DC efforts, GPUGRID, the project scientists require tools that handle hundreds of thousands of tasks which run asynchronously and generate gigabytes of data every day. We describe RBoinc, an interface that allows computational scientists to embed the DC methodology into the daily work-flow of high-throughput experiments. By extending the Berkeley Open Infrastructure for Network Computing (BOINC), the leading open-source middleware for current DC projects, with mechanisms to submit and manage large-scale distributed computations from individual workstations, RBoinc turns distributed grids into cost-effective virtual resources that can be employed by researchers in work-flows similar to conventional supercomputers. The GPUGRID project is currently using RBoinc for all of its in silico experiments based on molecular dynamics methods, including the determination of binding free energies and free energy profiles in all-atom models of biomolecules.

  6. Some Challenges the Management Confronts with, in the Financial Institutions

    Directory of Open Access Journals (Sweden)

    Laurentiu Mihai Treapat

    2014-02-01

    Full Text Available In this paper, we analyze some features and components of management in general, and of management in the financial area in particular. Special attention is given to how managers cope with certain risks that could affect their activity. By trying to identify, from practice, the kinds of difficulties management faces in its work, we arrive at interesting conclusions and, furthermore, at optimal solutions. Some data already exist as a result of earlier work by specialists (Dănilă and Berea, 2000, pp. 39-48), while other elements can be foreseen as specific to the beginning of the 3rd millennium, which started with what the rating agencies seem to regard as the most serious economic decline and prolonged recession risk in post-World War II history. We evaluate the challenges management has confronted lately, while subject to pressures and to the need for radical changes that arrive with astonishing speed and are intensified by the shareholders' pressing need to protect their capital. Findings reveal that, in any business enterprise, the shareholders' strategy and the management's objectives are to win new clients, enlarge the market share, create added value and, on this basis, maximize profits. We consider that the volatile and fluctuating nature of the raw material the banks operate with, namely money, makes management in this area a particular one, marked by specific features, which we analyze in the following pages.

  7. The Piggle: confrontations with non-existence in childhood.

    Science.gov (United States)

    Charles, M

    1999-08-01

    The author interweaves pieces from D. W. Winnicott's 'The Piggle: An Account of the Psychoanalytic Treatment of a Little Girl' with her own experiences and case material, to explore how confrontations with existence/non-existence can become terrifying realities that overwhelm the child's resources. In the pages of 'The Piggle', we can see a young child 'playing' with the unfathomable notions of beginnings and endings in the wake of the birth of a sibling, which also heralds the death of the extant mother/child relationship. In the case material these issues are explored from the perspective of an adult who had been unable to resolve them without great loss of self. In both cases, parental failures impeded the child's ability to integrate his or her experiences. The work of Matte-Blanco provides a useful terminology for understanding condensations and equivalences in the material; as affect intensifies, the rules of conventional logic recede, and sameness threatens to annihilate distinctions between good and bad, or self and other. Containing these dichotomies within the analytic setting facilitates their integration, in a movement from the fragmentation of the paranoid-schizoid position towards the reconciliation and renunciation of the depressive position.

  8. Views of discrimination among individuals confronting genetic disease.

    Science.gov (United States)

    Klitzman, Robert

    2010-02-01

    Though the US passed the Genetic Information Non-Discrimination Act, many questions remain of how individuals confronting genetic disease view and experience possible discrimination. We interviewed, for 2 hours each, 64 individuals who had, or were at risk for, Huntington's Disease, breast cancer, or Alpha-1 antitrypsin deficiency. Discrimination can be implicit, indirect and subtle, rather than explicit, direct and overt; and be hard to prove. Patients may be treated "differently" and unfairly, raising questions of how to define "discrimination", and "appropriate accommodation". Patients were often unclear and wary about legislation. Fears and experiences of discrimination can shape testing, treatment, and disclosure. Discrimination can be subjective, and take various forms. Searches for only objective evidence of it may be inherently difficult. Providers need to be aware of, and prepared to address, subtle and indirect discrimination; ambiguities, confusion and potential limitations concerning current legislation; and needs for education about these laws. Policies are needed to prevent discrimination in life, long-term care, and disability insurance, not covered by GINA.

  9. [30 years' development of economic theories in confrontation with facts].

    Science.gov (United States)

    Levy, F

    1994-01-01

    In 1969, R. Frisch and J. Tinbergen received the first Nobel Prize in Economics. Two hundred years after Quesnay's "economic tables", economics was at last considered a science. During the last thirty years, economics has not lost its scientific reputation but, confronted with different situations in the world and in France, it has been unable to bring appropriate solutions to every problem. Nowadays, economics is quite far from the promises born after the Second World War: indeed, until the 1974 world crisis, Keynes' theory (from a macro-economic point of view) and Walras' formalisation of the markets (from a micro-economic point of view) were sufficient to give satisfying representations of any economic reality and mechanism. Nevertheless, these theories were unable to deal with both growing unemployment and rising inflation in the late 70's and early 80's. Today, economics offers a wide variety of effective analyses of society. Game theory, behaviour theory and econometrics are new means used to build new kinds of representations of reality. Therefore, even if economics can no longer be considered an exact science, it nevertheless remains scientific.

  10. Confronting uncertainty in wildlife management: performance of grizzly bear management.

    Directory of Open Access Journals (Sweden)

    Kyle A Artelle

    Full Text Available Scientific management of wildlife requires confronting the complexities of natural and social systems. Uncertainty poses a central problem. Whereas the importance of considering uncertainty has been widely discussed, studies of the effects of unaddressed uncertainty on real management systems have been rare. We examined the effects of outcome uncertainty and components of biological uncertainty on hunt management performance, illustrated with grizzly bears (Ursus arctos horribilis) in British Columbia, Canada. We found that both forms of uncertainty can have serious impacts on management performance. Outcome uncertainty alone--discrepancy between expected and realized mortality levels--led to excess mortality in 19% of cases (population-years) examined. Accounting for uncertainty around estimated biological parameters (i.e., biological uncertainty) revealed that excess mortality might have occurred in up to 70% of cases. We offer a general method for identifying targets for exploited species that incorporates uncertainty and maintains the probability of exceeding mortality limits below specified thresholds. Setting targets in our focal system using this method at thresholds of 25% and 5% probability of overmortality would require average target mortality reductions of 47% and 81%, respectively. Application of our transparent and generalizable framework to this or other systems could improve management performance in the presence of uncertainty.

  11. Confronting uncertainty in wildlife management: performance of grizzly bear management.

    Science.gov (United States)

    Artelle, Kyle A; Anderson, Sean C; Cooper, Andrew B; Paquet, Paul C; Reynolds, John D; Darimont, Chris T

    2013-01-01

    Scientific management of wildlife requires confronting the complexities of natural and social systems. Uncertainty poses a central problem. Whereas the importance of considering uncertainty has been widely discussed, studies of the effects of unaddressed uncertainty on real management systems have been rare. We examined the effects of outcome uncertainty and components of biological uncertainty on hunt management performance, illustrated with grizzly bears (Ursus arctos horribilis) in British Columbia, Canada. We found that both forms of uncertainty can have serious impacts on management performance. Outcome uncertainty alone--discrepancy between expected and realized mortality levels--led to excess mortality in 19% of cases (population-years) examined. Accounting for uncertainty around estimated biological parameters (i.e., biological uncertainty) revealed that excess mortality might have occurred in up to 70% of cases. We offer a general method for identifying targets for exploited species that incorporates uncertainty and maintains the probability of exceeding mortality limits below specified thresholds. Setting targets in our focal system using this method at thresholds of 25% and 5% probability of overmortality would require average target mortality reductions of 47% and 81%, respectively. Application of our transparent and generalizable framework to this or other systems could improve management performance in the presence of uncertainty.

  12. Bringing ATLAS production to HPC resources - A use case with the Hydra supercomputer of the Max Planck Society

    Science.gov (United States)

    Kennedy, J. A.; Kluth, S.; Mazzaferro, L.; Walker, Rodney

    2015-12-01

    The possible usage of HPC resources by ATLAS is now becoming viable due to the changing nature of these systems, and it is also very attractive due to the need for increasing amounts of simulated data. In recent years the architecture of HPC systems has evolved, moving away from specialized monolithic systems to a more generic Linux-type platform. This change means that the deployment of non-HPC-specific codes has become much easier. The timing of this evolution perfectly suits the needs of ATLAS and opens a new window of opportunity. The ATLAS experiment at CERN will begin a period of high-luminosity data taking in 2015. This high-luminosity phase will be accompanied by a need for increasing amounts of simulated data, which is expected to exceed the capabilities of the current Grid infrastructure. ATLAS aims to address this need by opportunistically accessing resources such as cloud and HPC systems. This paper presents the results of a pilot project undertaken by ATLAS and the MPP/RZG to provide access to the Hydra supercomputer facility. Hydra, the supercomputer of the Max Planck Society, is a Linux-based system with over 80000 cores and 4000 physical nodes located at the RZG near Munich. This paper describes the work undertaken to integrate Hydra into the ATLAS production system by using the Nordugrid ARC-CE and other standard Grid components. The customization of these components and the strategies for HPC usage are discussed, as well as possibilities for future directions.

  13. Algorithms and parallel computing

    CERN Document Server

    Gebali, Fayez

    2011-01-01

    There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to

  14. Parallel Programming Paradigms

    Science.gov (United States)

    1987-07-01

    [Report documentation page; little of the abstract survives extraction. Recoverable details: "Parallel Programming Paradigms", Philip Arne Nelson, Department of Computer ..., supported in part by grant 8416878 and by Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328.]

  15. GPAW - massively parallel electronic structure calculations with Python-based software.

    Energy Technology Data Exchange (ETDEWEB)

    Enkovaara, J.; Romero, N.; Shende, S.; Mortensen, J. (LCF)

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages such as Python can increase the productivity of the programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity-enhancing features together with good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW using the combination of the Python and C programming languages. While the chosen approach works well on standard workstations and in Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python-based software.

  16. Hybrid Numerical Solvers for Massively Parallel Eigenvalue Computation and Their Benchmark with Electronic Structure Calculations

    CERN Document Server

    Imachi, Hiroto

    2015-01-01

    Optimally hybrid numerical solvers were constructed for the massively parallel generalized eigenvalue problem (GEP). The strong scaling benchmark was carried out on the K computer and other supercomputers for electronic structure calculation problems with matrix sizes of M = 10^4-10^6 and up to 10^5 cores. The GEP procedure is decomposed into two subprocedures: the reducer to the standard eigenvalue problem (SEP) and the solver of the SEP. A hybrid solver is constructed when a routine is chosen for each subprocedure from the three parallel solver libraries ScaLAPACK, ELPA and EigenExa. The hybrid solvers with the two newer libraries, ELPA and EigenExa, give better benchmark results than the conventional ScaLAPACK library. The detailed analysis of the results implies that the reducer can be a bottleneck in next-generation (exa-scale) supercomputers, which indicates guidance for future research. The code was developed as a middleware and a mini-application and will appear online.

  17. Petascale Parallelization of the Gyrokinetic Toroidal Code

    Energy Technology Data Exchange (ETDEWEB)

    Ethier, Stephane; Adams, Mark; Carter, Jonathan; Oliker, Leonid

    2010-05-01

    The Gyrokinetic Toroidal Code (GTC) is a global, three-dimensional particle-in-cell application developed to study microturbulence in tokamak fusion devices. The global capability of GTC is unique, allowing researchers to systematically analyze important dynamics such as turbulence spreading. In this work we examine a new radial domain decomposition approach to allow scalability onto the latest generation of petascale systems. Extensive performance evaluation is conducted on three high performance computing systems: the IBM BG/P, the Cray XT4, and an Intel Xeon cluster. Overall results show that the radial decomposition approach dramatically increases scalability while reducing the memory footprint, allowing for fusion device simulations at an unprecedented scale. After a decade where high-end computing (HEC) was dominated by the rapid pace of improvements to processor frequencies, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs is a key step towards making effective petascale computing a reality. In this work, we examine a new parallelization scheme for the Gyrokinetic Toroidal Code (GTC) micro-turbulence fusion application. Extensive scalability results and analysis are presented on three HEC systems: the IBM BlueGene/P (BG/P) at Argonne National Laboratory, the Cray XT4 at Lawrence Berkeley National Laboratory, and an Intel Xeon cluster at Lawrence Livermore National Laboratory. Overall results indicate that the new radial decomposition approach successfully attains unprecedented scalability to 131,072 BG/P cores by overcoming the memory limitations of the previous approach. The new version is well suited to utilize emerging petascale resources to access new regimes of physical phenomena.

  18. Parallel programming with PCN

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  19. Algorithmic support for commodity-based parallel computing systems.

    Energy Technology Data Exchange (ETDEWEB)

    Leung, Vitus Joseph; Bender, Michael A. (State University of New York, Stony Brook, NY); Bunde, David P. (University of Illinois, Urbana, IL); Phillips, Cynthia Ann

    2003-10-01

    The Computational Plant or Cplant is a commodity-based distributed-memory supercomputer under development at Sandia National Laboratories. Distributed-memory supercomputers run many parallel programs simultaneously. Users submit their programs to a job queue. When a job is scheduled to run, it is assigned to a set of available processors. Job runtime depends not only on the number of processors but also on the particular set of processors assigned to it. Jobs should be allocated to localized clusters of processors to minimize communication costs and to avoid bandwidth contention caused by overlapping jobs. This report introduces new allocation strategies and performance metrics based on space-filling curves and one-dimensional allocation strategies. These algorithms are general and simple. Preliminary simulations and Cplant experiments indicate that both space-filling curves and one-dimensional packing improve processor locality compared to the sorted free list strategy previously used on Cplant. These new allocation strategies are implemented in Release 2.0 of the Cplant System Software that was phased into the Cplant systems at Sandia by May 2002. Experimental results then demonstrated that the average number of communication hops between the processors allocated to a job strongly correlates with the job's completion time. This report also gives processor-allocation algorithms for minimizing the average number of communication hops between the assigned processors for grid architectures. The associated clustering problem is as follows: Given n points in R^d, find k points that minimize their average pairwise L1 distance. Exact and approximate algorithms are given for these optimization problems. One of these algorithms has been implemented on Cplant and will be included in Cplant System Software, Version 2.1, to be released. In more preliminary work, we suggest improvements to the scheduler separate from the allocator.
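
    The space-filling-curve idea mentioned in this report is easy to illustrate: if the free nodes of a 2D mesh are ordered along a Morton (Z-order) curve, a contiguous run of that ordering tends to be a spatially compact set of nodes, which keeps the average pairwise L1 distance of an allocation small. The sketch below only illustrates that principle; the curve choice and the greedy "first k" selection are assumptions for the example, not the report's algorithms.

        // Space-filling-curve allocation sketch: order free nodes of a 2D mesh
        // along a Morton (Z-order) curve and give a job a contiguous run of it.
        #include <algorithm>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        struct Node { int x, y; };

        // Interleave the bits of (x, y) to obtain the Morton (Z-order) index.
        std::uint64_t morton(std::uint32_t x, std::uint32_t y) {
            std::uint64_t key = 0;
            for (int b = 0; b < 32; ++b) {
                key |= (std::uint64_t(x >> b) & 1u) << (2 * b);
                key |= (std::uint64_t(y >> b) & 1u) << (2 * b + 1);
            }
            return key;
        }

        // Pick k free nodes for a job: take a contiguous window of the
        // Morton-ordered free list (here simply the first k entries).
        std::vector<Node> allocate(std::vector<Node> free_nodes, std::size_t k) {
            std::sort(free_nodes.begin(), free_nodes.end(),
                      [](const Node& a, const Node& b) {
                          return morton(a.x, a.y) < morton(b.x, b.y); });
            free_nodes.resize(std::min(k, free_nodes.size()));
            return free_nodes;
        }

        int main() {
            std::vector<Node> free_nodes;
            for (int y = 0; y < 8; ++y)
                for (int x = 0; x < 8; ++x)
                    free_nodes.push_back({x, y});
            for (const Node& n : allocate(free_nodes, 4))
                std::printf("(%d,%d)\n", n.x, n.y);   // expected: the 2x2 block at the origin
            return 0;
        }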

  20. Parallel Software Model Checking

    Science.gov (United States)

    2015-01-08

    [Report documentation page and slide fragments; little content survives extraction. Recoverable details: "Parallel Software Model Checking", January 2015, Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA 15213; team members include Sagar Chaki and Arie Gurfinkel.]

  1. Continuous parallel coordinates.

    Science.gov (United States)

    Heinrich, Julian; Weiskopf, Daniel

    2009-01-01

    Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes, defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data.

  2. Influence of Earth crust composition on continental collision style in Precambrian conditions: Results of supercomputer modelling

    Science.gov (United States)

    Zavyalov, Sergey; Zakharov, Vladimir

    2016-04-01

    A number of issues concerning Precambrian geodynamics remain unsolved because of the uncertainty of many physical (thermal regime, lithosphere thickness, crust thickness, etc.) and chemical (mantle composition, crust composition) parameters, which differed considerably from present-day values. In this work, we show results of supercomputer-based numerical modelling using a petrological and thermomechanical 2D model, which simulates the process of collision between two continental plates, each 80-160 km thick, with convergence rates ranging from 5 to 15 cm/year. In the model, the upper mantle temperature is 150-200 °C higher than the modern value, while the continental crust radiogenic heat production is higher than the present value by a factor of 1.5. These settings correspond to Archean conditions. The present study investigates the dependence of collision style on various continental crust parameters, especially on crust composition. The following three archetypal settings of continental crust composition are examined: 1) completely felsic continental crust; 2) basic lower crust and felsic upper crust; 3) basic upper crust and felsic lower crust (hereinafter referred to as inverted crust). Modeling results show that collision with completely felsic crust is unlikely. In the case of basic lower crust, continental subduction and subsequent exhumation of continental rocks can take place; therefore, formation of ultra-high-pressure metamorphic rocks is possible. Continental subduction also occurs in the case of inverted continental crust. However, in the latter case, the exhumation of felsic rocks is blocked by the upper basic layer and their subsequent interaction depends on their volume ratio. Thus, if the total inverted crust thickness is about 15 km and the thicknesses of the two layers are equal, felsic rocks cannot be exhumed. If the total thickness is 30 to 40 km and that of the felsic layer is 20 to 25 km, it breaks through the basic layer leading to

  3. Architecture, implementation and parallelization of the software to search for periodic gravitational wave signals

    Science.gov (United States)

    Poghosyan, G.; Matta, S.; Streit, A.; Bejger, M.; Królak, A.

    2015-03-01

    The parallelization, design and scalability of the PolGrawAllSky code to search for periodic gravitational waves from rotating neutron stars is discussed. The code is based on an efficient implementation of the F-statistic using the Fast Fourier Transform algorithm. To perform an analysis of data from the advanced LIGO and Virgo gravitational wave detectors' network, which will start operating in 2015, hundreds of millions of CPU hours will be required; a code utilizing the potential of massively parallel supercomputers is therefore mandatory. We have parallelized the code using the Message Passing Interface standard and implemented a mechanism for combining the searches at different sky positions and frequency bands into one extremely scalable program. The parallel I/O interface is used to avoid bottlenecks when writing the generated data to the file system. This allowed us to develop a highly scalable computational code that enables data analysis at large scales on acceptable time scales. Benchmarking of the code on a Cray XE6 system was performed to show the efficiency of our parallelization concept and to demonstrate scaling up to 50 thousand cores in parallel.

  4. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    Energy Technology Data Exchange (ETDEWEB)

    Kononenko, O. [SLAC National Accelerator Lab., Menlo Park, CA (United States)

    2015-02-17

    ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical studies. By effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency-domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities, in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)

  5. Parallel Approach for Time Series Analysis with General Regression Neural Networks

    Directory of Open Access Journals (Sweden)

    J.C. Cuevas-Tello

    2012-04-01

    Full Text Available The accuracy of time delay estimation from pairs of irregularly sampled time series is of great relevance in astrophysics. However, computational time is also important because large data sets must be studied. Besides introducing a new approach for time delay estimation, this paper presents a parallel approach to obtain a fast algorithm for time delay estimation. The neural network architecture that we use is the General Regression Neural Network (GRNN). For the parallel approach, we use the Message Passing Interface (MPI) on a Beowulf-type cluster and on a Cray supercomputer, and we also use the Compute Unified Device Architecture (CUDA™) language on Graphics Processing Units (GPUs). We demonstrate that, with our approach, fast algorithms can be obtained for time delay estimation on large data sets with the same accuracy as state-of-the-art methods.
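
    For reference, the estimator that a General Regression Neural Network evaluates for each query point is the standard Specht-style kernel-weighted average shown below; this is the textbook form, not necessarily the exact variant or distance measure used by the authors:

        \[
        \hat{y}(\mathbf{x}) \;=\; \frac{\sum_{i=1}^{n} y_i \,\exp\!\bigl(-D_i^2 / 2\sigma^2\bigr)}
                                       {\sum_{i=1}^{n} \exp\!\bigl(-D_i^2 / 2\sigma^2\bigr)},
        \qquad
        D_i^2 = (\mathbf{x}-\mathbf{x}_i)^{\top}(\mathbf{x}-\mathbf{x}_i).
        \]

    The n per-sample terms are independent, which is what makes the evaluation map naturally onto MPI ranks or GPU threads.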

  6. Parallel symbolic state-space exploration is difficult, but what is the alternative?

    CERN Document Server

    Ciardo, Gianfranco; Jin, Xiaoqing; 10.4204/EPTCS.14.1

    2009-01-01

    State-space exploration is an essential step in many modeling and analysis problems. Its goal is to find the states reachable from the initial state of a discrete-state model. The state space can be used to answer important questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a starting point for sophisticated investigations expressed in temporal logic. Unfortunately, the state space is often so large that ordinary explicit data structures and sequential algorithms cannot cope, prompting the exploration of (1) parallel approaches using multiple processors, from simple workstation networks to shared-memory supercomputers, to satisfy large memory and runtime requirements and (2) symbolic approaches using decision diagrams to encode the large structured sets and relations manipulated during state-space generation. Both approaches have merits and limitations. Parallel explicit state-space generation is challenging, but almost linear speedup can be achieved; however, the analysis is...

  7. Animation of interactive fluid flow visualization tools on a data parallel machine

    Energy Technology Data Exchange (ETDEWEB)

    Sethian, J.A. (California Univ., Berkeley, CA (USA). Dept. of Mathematics); Salem, J.B. (Thinking Machines Corp., Cambridge, MA (USA))

    1989-01-01

    The authors describe a new graphics environment for essentially real-time interactive visualization of computational fluid mechanics. The researcher may interactively examine fluid data on a graphics display using animated flow visualization diagnostics that mimic those in the experimental laboratory. These tools include display of moving color contours for scalar fields, smoke or dye injection of passive particles to identify coherent flow structures, and bubble-wire tracers for velocity profiles, as well as three-dimensional interactive rotation, zoom, and pan. The system is implemented on a data parallel supercomputer attached to a framebuffer. Since most fluid visualization techniques are highly parallel in nature, this allows rapid animation of fluid motion. The authors demonstrate the interactive graphics fluid flow system by analyzing data generated by numerical simulations of viscous, incompressible, laminar and turbulent flow over a backward-facing step and in a closed cavity. Input parameters are menu-driven, and images are updated at nine frames per second.

  8. Task parallel sensitivity analysis and parameter estimation of groundwater simulations through the SALSSA framework

    Energy Technology Data Exchange (ETDEWEB)

    Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

    2010-07-15

    The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, sophisticated graphical user interface, and underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines taking advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task parallel parameter estimation using the PEST framework.

  9. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

    Science.gov (United States)

    Hepburn, I.; Chen, W.; De Schutter, E.

    2016-08-01

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
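
    As a toy illustration of reaction-diffusion operator splitting, the C sketch below advances diffusion first and then applies the reaction term in each time step. It is a minimal 1-D, regular-grid example, not the authors' tetrahedral-mesh implementation, and the decay rate, diffusion constant and grid size are invented.

        #include <stdio.h>

        #define N   100       /* number of grid cells (illustrative)            */
        #define DT  0.01      /* time step                                      */
        #define D   0.1       /* diffusion coefficient                          */
        #define K   0.5       /* first-order decay rate for the reaction step   */

        int main(void)
        {
            double u[N] = {0}, unew[N];
            u[N / 2] = 1.0;                    /* initial pulse in the middle   */

            for (int step = 0; step < 1000; ++step) {
                /* 1) Diffusion half of the split: explicit finite differences. */
                for (int i = 0; i < N; ++i) {
                    double left  = (i > 0)     ? u[i - 1] : u[i];
                    double right = (i < N - 1) ? u[i + 1] : u[i];
                    unew[i] = u[i] + D * DT * (left - 2.0 * u[i] + right);
                }
                /* 2) Reaction half of the split: local first-order decay.      */
                for (int i = 0; i < N; ++i)
                    u[i] = unew[i] - K * DT * unew[i];
            }
            printf("center concentration after splitting steps: %g\n", u[N / 2]);
            return 0;
        }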

  10. Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

    CERN Document Server

    Chen, Weiliang

    2016-01-01

    Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of simulated models and morphologies have exceeded the capacity of any serial implementation. This led to development of parallel solutions that benefit from the boost in performance of modern large-scale supercomputers. In this paper, we describe an MPI-based, parallel Operator-Splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its usage in real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecul...

  11. Operational mesoscale atmospheric dispersion prediction using a parallel computing cluster

    Indian Academy of Sciences (India)

    C V Srinivas; R Venkatesan; N V Muralidharan; Someshwar Das; Hari Dass; P Eswara Kumar

    2006-06-01

    An operational atmospheric dispersion prediction system is implemented on a cluster supercomputer for Online Emergency Response at the Kalpakkam nuclear site. This numerical system constitutes a parallel version of a nested-grid mesoscale meteorological model, MM5, coupled to a random walk particle dispersion model, FLEXPART. The system provides a 48-hour forecast of the local weather and radioactive plume dispersion due to hypothetical airborne releases in a range of 100 km around the site. The parallel code was implemented on different cluster configurations such as distributed and shared memory systems. A 16-node dual Xeon distributed memory gigabit Ethernet cluster has been found sufficient for operational applications. The runtime of a triple-nested domain MM5 is about 4 h for a 24 h forecast. The system has been operated continuously for a few months and results were posted on the IMSc home page. Initial and periodic boundary condition data for MM5 are provided by NCMRWF, New Delhi. An alternative source is found to be NCEP, USA. These two sources provide the input data to the operational models at different spatial and temporal resolutions using different assimilation methods. A comparative study of the forecast results obtained using these two data sources is presented for present operational use. Improvement is noticed in rainfall forecasts that used NCEP data, probably because of its higher spatial and temporal resolution.

  12. Perm Web: remote parallel and distributed volume visualization

    Energy Technology Data Exchange (ETDEWEB)

    Wittenbrink, C.M.; Kim, K.; Story, J.; Pang, A.; Hollerbach, K.; Max, N.

    1997-01-01

    In this paper we present a system for visualizing volume data from remote supercomputers (PermWeb). We have developed both parallel volume rendering algorithms, and the World Wide Web software for accessing the data at the remote sites. The implementation uses Hypertext Markup Language (HTML), Java, and Common Gateway Interface (CGI) scripts to connect World Wide Web (WWW) servers/clients to our volume renderers. The front ends are interactive Java classes for specification of view, shading, and classification inputs. We present performance results, and implementation details for connections to our computing resources at the University of California Santa Cruz including a MasPar MP-2, SGI Reality Engine-RE2, and SGI Challenge machines. We apply the system to the task of visualizing trabecular bone from finite element simulations. Fast volume rendering on remote compute servers through a web interface allows us to increase the accessibility of the results to more users. User interface issues, overviews of parallel algorithm developments, and overall system interfaces and protocols are presented. Access is available through Uniform Resource Locator (URL) http://www.cse.ucsc.edu/research/slvg/. 26 refs., 7 figs.

  13. The Heat is On! Confronting Climate Change in the Classroom

    Science.gov (United States)

    Bowman, R.; Atwood-Blaine, D.

    2008-12-01

    This paper discusses a professional development workshop for K-12 science teachers entitled "The Heat is On! Confronting Climate Change in the Classroom." The workshop was conducted by the Center for Remote Sensing of Ice Sheets (CReSIS), whose primary goal is to understand and predict the role of polar ice sheets in sea level change. The specific objectives of this summer workshop were twofold: first, to address the need for advancement in science, technology, engineering and mathematics (STEM) education and, second, to address the need for science teacher training in climate change science. Twenty-eight Kansas teachers completed four pre-workshop assignments online in Moodle and attended a one-week workshop. The workshop included lecture presentations by scientists (both face-to-face and via video-conference) and collaboration between teachers and scientists to create online inquiry-based lessons on the water budget, remote sensing, climate data, and glacial modeling. Follow-up opportunities are communicated via the CReSIS Teachers listserv to maintain and further develop the collegial connections and collaborations established during the workshop. Both qualitative and quantitative evaluation results indicate that this workshop was particularly effective in the following four areas: 1) creating meaningful connections between K-12 teachers and CReSIS scientists; 2) integrating distance-learning technologies to facilitate the social construction of knowledge; 3) increasing teachers' content understanding of climate change and its impacts on the cryosphere and global sea level; and 4) increasing teachers' self-efficacy beliefs about teaching climate science. Evaluation methods included formative content understanding assessments (via "clickers") during each scientist's presentation, a qualitative evaluation survey administered at the end of the workshop, and two quantitative evaluation instruments administered pre- and post-workshop. The first of these

  14. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Deng, Xiaogang; Zhang, Lilun [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Fang, Jianbin [Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands); Wang, Guangxue; Jiang, Yi [State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China); Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua [College of Computer Science, National University of Defense Technology, Changsha 410073 (China)

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when CPUs and accelerators must collaborate to fully tap the potential of heterogeneous systems. In this paper, using a tri-level hybrid and heterogeneous programming model that combines MPI, OpenMP and CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact finite difference schemes, WCNS and HDCS, that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we let the CPU and GPU collaborate on HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, while the collaborative approach improves performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and we overlap the collaborative computation and communication as far as possible using advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To the best of our knowledge, these are the largest-scale CPU–GPU collaborative simulations
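
    The tri-level MPI + OpenMP + CUDA model mentioned above layers message passing across nodes, threading within a node, and kernels on the accelerator. The stripped-down C sketch below shows only the first two levels as a general pattern; it is not HOSTA, the block count and work function are invented, and the CUDA level is omitted for brevity.

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        /* Hypothetical per-block work; in a real CFD code this would be a flux
           evaluation or a CUDA kernel launch for the block owned by this rank. */
        static double process_block(int block_id)
        {
            return (double)block_id * 0.5;
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            const int total_blocks = 64;          /* invented multi-block count */
            double local_sum = 0.0;

            /* Level 1: MPI distributes grid blocks across processes (round robin).
               Level 2: OpenMP threads share the blocks owned by this rank.        */
            #pragma omp parallel for reduction(+ : local_sum)
            for (int b = rank; b < total_blocks; b += nprocs)
                local_sum += process_block(b);

            double global_sum = 0.0;
            MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("global residual-like sum: %g\n", global_sum);

            MPI_Finalize();
            return 0;
        }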

  15. Book review: Confronting the Shadow Education System: What Government Policies for What Private Tutoring?

    Directory of Open Access Journals (Sweden)

    Keith Watson

    2012-10-01

    Full Text Available Book Review: Confronting the Shadow Education System: What Government Policies for What Private Tutoring? By Mark Bray (2009, 135 pp., ISBN: 978-92-803-1333-8). Paris: IIEP/UNESCO Publishing.

  16. Parallel Magnetic Resonance Imaging

    CERN Document Server

    Uecker, Martin

    2015-01-01

    The main disadvantage of Magnetic Resonance Imaging (MRI) is its long scan times and, in consequence, its sensitivity to motion. By exploiting the complementary information from multiple receive coils, parallel imaging is able to recover images from under-sampled k-space data and to accelerate the measurement. Because parallel magnetic resonance imaging can be used to accelerate basically any imaging sequence, it has many important applications. Parallel imaging brought a fundamental shift in image reconstruction: image reconstruction changed from a simple direct Fourier transform to the solution of an ill-conditioned inverse problem. This work gives an overview of image reconstruction from the perspective of inverse problems. After introducing basic concepts such as regularization, discretization, and iterative reconstruction, advanced topics are discussed, including algorithms for auto-calibration, the connection to approximation theory, and the combination with compressed sensing.

  17. Parallel optical sampler

    Energy Technology Data Exchange (ETDEWEB)

    Tauke-Pedretti, Anna; Skogen, Erik J; Vawter, Gregory A

    2014-05-20

    An optical sampler includes first and second 1×n optical beam splitters that split an input optical sampling signal and an optical analog input signal into n parallel channels, respectively; a plurality of optical delay elements providing n parallel delayed input optical sampling signals; n photodiodes converting the n parallel optical analog input signals into n respective electrical output signals; and n optical modulators modulating the input optical sampling signal or the optical analog input signal by the respective electrical output signals and providing n successive optical samples of the optical analog input signal. A plurality of output photodiodes and eADCs convert the n successive optical samples to n successive digital samples. The optical modulator may be a photodiode-interconnected Mach-Zehnder modulator. A method of sampling the optical analog input signal is disclosed.

  18. Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

    Science.gov (United States)

    Logan, Terry G.

    1994-01-01

    The purpose of this study is to investigate the performance of integral equation computations using a numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of the computational performance of the MPP CM-5 computer and a conventional Cray Y-MP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Performance results are obtained on the CM-5 with 32, 64, and 128 nodes along with those on the Cray Y-MP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code nearly matches or outperforms the equivalent serial FORTRAN code for some cases.

  19. SPINning parallel systems software.

    Energy Technology Data Exchange (ETDEWEB)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-03-15

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

  20. Coarrays for Parallel Processing

    Science.gov (United States)

    Snyder, W. Van

    2011-01-01

    The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language?" Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages: X10, Chapel, UPC, Titanium, and class libraries being developed for C++ share the same conceptual framework.

  1. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you implement these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  2. ADAPTATION OF PARALLEL VIRTUAL MACHINES MECHANISMS TO PARALLEL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Zafer DEMİR

    2001-02-01

    Full Text Available In this study, Parallel Virtual Machine (PVM) is first reviewed. Since it is based upon parallel processing, it is similar in principle to parallel systems in terms of architecture. Parallel Virtual Machine is neither an operating system nor a programming language; it is a specific software tool that supports heterogeneous parallel systems, taking advantage of the features of both to bring users closer to parallel systems. Since tasks can be executed in parallel by Parallel Virtual Machine, there is an important similarity between PVM and distributed systems with multiple processors. In this study, these relations are examined by making use of the master-slave programming technique. In conclusion, PVM is tested with a simple factorial computation on a distributed system to observe its adaptation to parallel architectures.
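
    As an illustration of the master-slave pattern mentioned in the abstract, the hedged C sketch below distributes a small factorial-style workload from a master rank to worker ranks. Note that it uses MPI rather than the PVM API described in the study (a deliberate substitution for brevity), and the task values are invented.

        #include <mpi.h>
        #include <stdio.h>

        /* Worker task: compute n! (fits in unsigned long long for small n). */
        static unsigned long long factorial(int n)
        {
            unsigned long long f = 1ULL;
            for (int i = 2; i <= n; ++i)
                f *= (unsigned long long)i;
            return f;
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, nprocs;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            if (rank == 0) {                       /* master */
                for (int w = 1; w < nprocs; ++w) { /* hand each worker one n */
                    int n = 5 + w;                 /* invented task values   */
                    MPI_Send(&n, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
                }
                for (int w = 1; w < nprocs; ++w) {
                    unsigned long long result;
                    MPI_Recv(&result, 1, MPI_UNSIGNED_LONG_LONG, w, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    printf("worker %d returned %llu\n", w, result);
                }
            } else {                               /* slave/worker */
                int n;
                MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                unsigned long long result = factorial(n);
                MPI_Send(&result, 1, MPI_UNSIGNED_LONG_LONG, 0, 0, MPI_COMM_WORLD);
            }

            MPI_Finalize();
            return 0;
        }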

  3. Northeast Parallel Architectures Center (NPAC)

    Science.gov (United States)

    1992-07-01

    Supercomputing '89, Reno, Nevada, November 1989. Mohamed, A.G., Valentine, D.T., and Hessel, R.E., Users Guide to Some Useful Mathematical Software (report). Stein, Visiting Researcher, presentation to SCCS/NPAC Meeting, January 22, 1991. SCCS 46: Mohamed, A. Gaber, Valentine, Daniel T., and Hessel, Richard E., "Numerical..." [The remainder of this record is report-number and distribution-list residue from the scanned original.]

  4. When do high and low status group members support confrontation? The role of perceived pervasiveness of prejudice.

    Science.gov (United States)

    Kahn, Kimberly Barsamian; Barreto, Manuela; Kaiser, Cheryl R; Rego, Marco Silva

    2016-03-01

    This paper examines how perceived pervasiveness of prejudice differentially affects high and low status group members' support for a low status group member who confronts. In Experiment 1 (N = 228), men and women read a text describing sexism as rare or as pervasive and subsequently indicated their support for a woman who confronted or did not confront a sexist remark. Experiment 2 (N = 324) specified the underlying process using a self-affirmation manipulation. Results show that men were more supportive of confrontation when sexism was perceived to be rare than when it was pervasive. By contrast, women tended to prefer confrontation when sexism was pervasive relative to when it was rare. Personal self-affirmation decreased men's and increased women's support for confrontation when prejudice was rare, suggesting that men's and women's support for confrontation when prejudice is rare is driven by personal impression management considerations. Implications for understanding how members of low and high status groups respond to prejudice are discussed.

  5. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    Energy Technology Data Exchange (ETDEWEB)

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one-, two-, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with the EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, and the mathematical and numerical methods used

  6. Task-parallel message passing interface implementation of Autodock4 for docking of very large databases of compounds using high-performance super-computers.

    Science.gov (United States)

    Collignon, Barbara; Schulz, Roland; Smith, Jeremy C; Baudry, Jerome

    2011-04-30

    A message passing interface (MPI)-based implementation (Autodock4.lga.MPI) of the grid-based docking program Autodock4 has been developed to allow simultaneous and independent docking of multiple compounds on up to thousands of central processing units (CPUs) using the Lamarckian genetic algorithm. The MPI version reads a single binary file containing precalculated grids that represent the protein-ligand interactions, i.e., van der Waals, electrostatic, and desolvation potentials, and needs only two input parameter files for the entire docking run. In comparison, the serial version of Autodock4 reads ASCII grid files and requires one parameter file per compound. The modifications performed result in significantly reduced input/output activity compared with the serial version. Autodock4.lga.MPI scales up to 8192 CPUs with a maximal overhead of 16.3%, of which two thirds is due to input/output operations and one third originates from MPI operations. The optimal docking strategy, which minimizes docking CPU time without lowering the quality of the database enrichments, comprises the docking of ligands preordered from the most to the least flexible and the assignment of the number of energy evaluations as a function of the number of rotatable bonds. In 24 h, on 8192 high-performance computing CPUs, the present MPI version would allow docking to a rigid protein of about 300K small flexible compounds or 11 million rigid compounds.

  7. Building a Parallel Supercomputer with a Pile-of-PCs Cluster

    Institute of Scientific and Technical Information of China (English)

    黎康保; 陶文正; 许丽华; 黎文楼

    2000-01-01

    In the United States, universities, large laboratories, and research institutions jointly developed the Beowulf pile-of-PCs supercomputer. This achievement shows that a supercomputer can be built from a cluster of commodity PCs, which presents both a challenge and an opportunity for China. Building on a study of Beowulf, this paper discusses the structural principles of PC clusters, the operating system platform, parallel computing program design, and parallel communication program design.

  8. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
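
    For readers unfamiliar with the seeding step being parallelized, the compact C sketch below shows serial k-means++ seed selection (D² weighting) on a small 1-D data set; it illustrates the algorithm itself, not the released GPU/OpenMP/XMT code, and the data and k are invented.

        #include <stdio.h>
        #include <stdlib.h>
        #include <float.h>

        #define N 8   /* number of points (illustrative) */
        #define K 3   /* number of seeds to pick         */

        int main(void)
        {
            double x[N] = {0.1, 0.2, 0.15, 5.0, 5.2, 9.8, 10.0, 10.1};
            double centers[K];
            double d2[N];        /* squared distance to the nearest chosen seed */
            srand(42);

            centers[0] = x[rand() % N];          /* first seed: uniform at random */

            for (int c = 1; c < K; ++c) {
                double total = 0.0;
                for (int i = 0; i < N; ++i) {    /* D^2 to the nearest seed so far */
                    double best = DBL_MAX;
                    for (int j = 0; j < c; ++j) {
                        double d = x[i] - centers[j];
                        if (d * d < best) best = d * d;
                    }
                    d2[i] = best;
                    total += best;
                }
                /* Sample the next seed with probability proportional to D^2.
                   (This distance loop is what the parallel ports accelerate.)  */
                double r = ((double)rand() / RAND_MAX) * total, acc = 0.0;
                int pick = N - 1;
                for (int i = 0; i < N; ++i) {
                    acc += d2[i];
                    if (acc >= r) { pick = i; break; }
                }
                centers[c] = x[pick];
            }

            for (int c = 0; c < K; ++c)
                printf("seed %d: %g\n", c, centers[c]);
            return 0;
        }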

  9. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, M.

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  10. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  11. Parallel and Distributed Databases

    NARCIS (Netherlands)

    Hiemstra, Djoerd; Kemper, Alfons; Prieto, Manuel; Szalay, Alex

    2009-01-01

    Euro-Par Topic 5 addresses data management issues in parallel and distributed computing. Advances in data management (storage, access, querying, retrieval, mining) are inherent to current and future information systems. Today, accessing large volumes of information is a reality: Data-intensive appli

  12. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    2001-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were implemented

  13. Implementation of Parallel Algorithms

    Science.gov (United States)

    1991-09-30

    Lecture Notes in Computer Science, Warwick, England, July 16-20... Lecture Notes in Computer Science, Springer-Verlag, Bangalore, India, December 1990. J. Reif, J. Canny, and A. Page, "An Exact Algorithm for Kinodynamic...Parallel Algorithms and its Impact on Computational Geometry", in Optimal Algorithms, H. Djidjev, editor, Springer-Verlag Lecture Notes in Computer Science

  14. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able, at least not yet, to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  15. Parallel programming with PCN

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  16. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    Science.gov (United States)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  17. Proposal of a Desk-Side Supercomputer with Reconfigurable Data-Paths Using Rapid Single-Flux-Quantum Circuits

    Science.gov (United States)

    Takagi, Naofumi; Murakami, Kazuaki; Fujimaki, Akira; Yoshikawa, Nobuyuki; Inoue, Koji; Honda, Hiroaki

    We propose a desk-side supercomputer with large-scale reconfigurable data-paths (LSRDPs) using superconducting rapid single-flux-quantum (RSFQ) circuits. It has several computing units, each of which consists of a general-purpose microprocessor, an LSRDP and a memory. An LSRDP consists of a large number (e.g., a few thousand) of floating-point units (FPUs) and operand routing networks (ORNs) which connect the FPUs. We reconfigure the LSRDP to fit a computation, i.e., a group of floating-point operations that appears in a ‘for’ loop of a numerical program, by setting the routes in the ORNs before execution of the loop. We propose to implement the LSRDPs with RSFQ circuits; the processors and the memories can be implemented with semiconductor technology. We expect that a 10 TFLOPS supercomputer, as well as its refrigerating engine, will be housed in a desk-side rack, using a near-future RSFQ process technology, such as a 0.35 μm process.

  18. A Brief Analysis of Key Applications at Supercomputing Centers

    Institute of Scientific and Technical Information of China (English)

    党岗; 程志全

    2013-01-01

    At present, most of China's national supercomputing centers follow a construction model of local government investment with market-oriented application development. Local governments are more concerned with high-performance computing applications and services for local enterprises and institutions, so supercomputing centers are often used for ordinary applications and can hardly fulfill the strategic role of supercomputing. How to keep these highly capable "aircraft carriers" viable, let them win new ground, and drive technological innovation has long been a research topic in the field. This paper offers a preliminary discussion of the challenges facing the key applications of domestic supercomputing centers and proposes several suggestions for orienting those key applications toward serving local development.

  19. To Parallelize or Not to Parallelize, Speed Up Issue

    CERN Document Server

    Elnashar, Alaa Ismail

    2011-01-01

    Running parallel applications requires special and expensive processing resources to obtain the required results within a reasonable time. Before parallelizing a serial application, some analysis should be carried out to decide whether it will benefit from parallelization. In this paper we discuss the speedup gained from parallelization using the Message Passing Interface (MPI), weighing the overhead of parallelization against the parallel speedup obtained. We also propose an experimental method to predict the speedup of MPI applications.
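
    A common first-order way to reason about whether parallelization will pay off is Amdahl's law (a standard textbook bound, not the prediction method proposed in this paper): if a fraction f of the serial runtime is parallelizable over p processes, the ideal speedup is

        \[
        S(p) \;=\; \frac{1}{(1-f) + f/p}.
        \]

    For example, with f = 0.9 and p = 16 the bound is S = 1/(0.1 + 0.9/16) = 6.4, before any MPI communication overhead is accounted for.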

  20. Collisionless parallel shocks

    Science.gov (United States)

    Khabibrakhmanov, I. KH.; Galeev, A. A.; Galinskii, V. L.

    1993-01-01

    Consideration is given to a collisionless parallel shock based on solitary-type solutions of the modified derivative nonlinear Schroedinger equation (MDNLS) for parallel Alfven waves. The standard derivative nonlinear Schroedinger equation is generalized in order to include the possible anisotropy of the plasma distribution and higher-order Korteweg-de Vries-type dispersion. Stationary solutions of the MDNLS are discussed. The anisotropic nature of 'adiabatic' reflections leads to an asymmetric particle distribution in the upstream as well as in the downstream regions of the shock. As a result, a nonzero heat flux appears near the front of the shock. It is shown that this causes the stochastic behavior of the nonlinear waves, which can significantly contribute to the shock thermalization.

  1. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
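
    The two-phase method described above (first decide which grid portion bounds each object, then let each processor populate its own portion) can be sketched in a few lines of C with OpenMP. The sketch below is an illustration of the idea under simplified assumptions (1-D grid, point objects, invented sizes), not the patented implementation.

        #include <omp.h>
        #include <stdio.h>

        #define NOBJ    1000         /* number of objects (illustrative)        */
        #define NPORT   4            /* number of grid portions == processors   */
        #define GRIDLEN 100.0        /* total grid extent                       */

        int main(void)
        {
            double pos[NOBJ];
            int    portion_of[NOBJ];
            int    count[NPORT] = {0};

            for (int i = 0; i < NOBJ; ++i)           /* invented object positions */
                pos[i] = (GRIDLEN * i) / NOBJ;

            /* Phase 1: each thread takes a distinct set of objects and decides
               which grid portion bounds each of its objects.                    */
            #pragma omp parallel for num_threads(NPORT)
            for (int i = 0; i < NOBJ; ++i)
                portion_of[i] = (int)(pos[i] / (GRIDLEN / NPORT));

            /* Phase 2: each thread owns one grid portion and populates it with
               the objects assigned to that portion (here we only count them).   */
            #pragma omp parallel num_threads(NPORT)
            {
                int me = omp_get_thread_num();
                for (int i = 0; i < NOBJ; ++i)
                    if (portion_of[i] == me)
                        count[me]++;
            }

            for (int p = 0; p < NPORT; ++p)
                printf("portion %d holds %d objects\n", p, count[p]);
            return 0;
        }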

  2. Parallel clustering with CFinder

    CERN Document Server

    Pollner, Peter; Vicsek, Tamas; 10.1142/S0129626412400014

    2012-01-01

    The amount of available data about complex systems is increasing every year, as measurements of larger and larger systems are collected and recorded. A natural representation of such data is given by networks, whose size follows the size of the original system. The current trend of multiple cores in computing infrastructures calls for a parallel reimplementation of earlier methods. Here we present the grid version of CFinder, which can locate overlapping communities in directed, weighted or undirected networks based on the clique percolation method (CPM). We show that the computation of the communities can be distributed among several CPUs or computers. Although switching to the parallel version does not necessarily lead to a gain in computing time, it definitely makes the community structure of extremely large networks accessible.

  3. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Full Text Available Parallel-structure moving mechanical systems are solid, fast, and accurate. Among parallel systems, Stewart platforms are notable as the oldest: fast, solid and precise. This work outlines the main elements of Stewart platforms. It begins with the platform geometry and its kinematic elements, and then presents a few items of dynamics. The primary dynamic element is the determination of the kinetic energy of the entire Stewart platform. The kinematics of the mobile platform is then described using a rotation-matrix method. If a structural motor element consists of two moving elements in relative translation, it is more convenient, for the drive train and especially for the dynamics, to represent the motor element as a single moving component. We thus have seven moving parts (the six motor elements, or legs, plus the mobile platform, numbered 7) and one fixed part.

  4. Homology, convergence and parallelism.

    Science.gov (United States)

    Ghiselin, Michael T

    2016-01-05

    Homology is a relation of correspondence between parts of parts of larger wholes. It is used when tracking objects of interest through space and time and in the context of explanatory historical narratives. Homologues can be traced through a genealogical nexus back to a common ancestral precursor. Homology being a transitive relation, homologues remain homologous however much they may come to differ. Analogy is a relationship of correspondence between parts of members of classes having no relationship of common ancestry. Although homology is often treated as an alternative to convergence, the latter is not a kind of correspondence: rather, it is one of a class of processes that also includes divergence and parallelism. These often give rise to misleading appearances (homoplasies). Parallelism can be particularly hard to detect, especially when not accompanied by divergences in some parts of the body. © 2015 The Author(s).

  5. Parallel programming with MPI

    Energy Technology Data Exchange (ETDEWEB)

    Tatebe, Osamu [Electrotechnical Lab., Tsukuba, Ibaraki (Japan)

    1998-03-01

    MPI is a practical, portable, efficient and flexible standard for message passing, which has been implemented on most MPPs and networks of workstations by machine vendors, universities and national laboratories. MPI avoids specifying how operations will take place and avoids superfluous work, in order to achieve efficiency as well as portability, and it is also designed to encourage overlapping of communication and computation to hide communication latencies. This presentation briefly explains the MPI standard, and comments on efficient parallel programming to improve performance. (author)
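
    As a minimal illustration of the message-passing style the presentation covers (a generic example, not taken from the presentation itself), the following C program sends one integer from rank 0 to rank 1:

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                int payload = 42;                      /* arbitrary example value */
                MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                int payload;
                MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("rank 1 received %d from rank 0\n", payload);
            }

            MPI_Finalize();
            return 0;
        }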

  6. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  7. Implementation of Parallel Algorithms

    Science.gov (United States)

    1993-06-30

    their socia ’ relations or to achieve some goals. For example, we define a pair-wise force law of i epulsion and attraction for a group of identical...quantization based compression schemes. Photo-refractive crystals, which provide high density recording in real time, are used as our holographic media . The...of Parallel Algorithms (J. Reif, ed.). Kluwer Academic Pu’ ishers, 1993. (4) "A Dynamic Separator Algorithm", D. Armon and J. Reif. To appear in

  8. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  9. Parallel Algorithms Derivation

    Science.gov (United States)

    1989-03-31

    Lecture Notes in Computer Science, Warwick, England, July 16-20, 1990. J. Reif and J. Storer, "A Parallel Architecture for...", The 10th Conference on Foundations of Software Technology and Theoretical Computer Science, Lecture Notes in Computer Science, Springer-Verlag...Geometry, in Optimal Algorithms, H. Djidjev, editor, Springer-Verlag Lecture Notes in Computer Science 401, 1989, 1-8. J. Reif, R. Paturi, and S.

  10. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  11. Parallel Feature Extraction System

    Institute of Scientific and Technical Information of China (English)

    MA Huimin; WANG Yan

    2003-01-01

    Very high-speed image processing is needed in some applications, especially for weapons systems. In this paper, a high-speed image feature extraction system with a parallel structure was implemented in a Complex Programmable Logic Device (CPLD); it can perform image feature extraction in several microseconds, almost without delay. The system design is presented through an application instance: a flying plane whose infrared image includes two kinds of features, a geometric shape feature in the binary image and a temperature feature in the gray image. Feature extraction is accordingly performed on these two kinds of features. Edges and areas are two of the most important features of an image. An angle often exists at the connection of different parts of the target's image, indicating where one area ends and another begins. These three key features can form the whole representation of an image. So this parallel feature extraction system includes three processing modules: edge extraction, angle extraction and area extraction. The parallel structure is realized by a group of processors: every detector is followed by one processing path, every path has the same circuit form, and all paths work together at the same time, controlled by a common clock, to realize feature extraction. The extraction system has a simple structure, small volume, high speed, and good stability against noise. It can be used in battlefield recognition systems.

  12. The Parallel C Preprocessor

    Directory of Open Access Journals (Sweden)

    Eugene D. Brooks III

    1992-01-01

    Full Text Available We describe a parallel extension of the C programming language designed for multiprocessors that provide a facility for sharing memory between processors. The programming model was initially developed on conventional shared memory machines with small processor counts such as the Sequent Balance and Alliant FX/8, but has more recently been used on a scalable massively parallel machine, the BBN TC2000. The programming model is split-join rather than fork-join. Concurrency is exploited to use a fixed number of processors more efficiently rather than to exploit more processors as in the fork-join model. Team splitting, a mechanism to split the team of processors executing a code into subteams to handle parallel subtasks, is used to provide an efficient mechanism to exploit nested concurrency. We have found the split-join programming model to have an inherent implementation advantage, compared to the fork-join model, when the number of processors in a machine becomes large.
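
    Team splitting as described above has a close analogue in MPI's communicator splitting, shown in the C sketch below. This is only an analogy to illustrate the idea of dividing a fixed set of processors into subteams for nested parallel subtasks, not the split-join syntax of the Parallel C Preprocessor itself, and the two-subteam division is arbitrary.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int world_rank, world_size;
            MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
            MPI_Comm_size(MPI_COMM_WORLD, &world_size);

            /* Split the full team into two subteams; each subteam then handles
               one parallel subtask with its own rank numbering.                */
            int color = (world_rank < world_size / 2) ? 0 : 1;
            MPI_Comm subteam;
            MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subteam);

            int sub_rank, sub_size;
            MPI_Comm_rank(subteam, &sub_rank);
            MPI_Comm_size(subteam, &sub_size);
            printf("world rank %d -> subteam %d, rank %d of %d\n",
                   world_rank, color, sub_rank, sub_size);

            MPI_Comm_free(&subteam);
            MPI_Finalize();
            return 0;
        }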

  13. Anti-parallel triplexes

    DEFF Research Database (Denmark)

    Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.

    2015-01-01

    The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When the anti-parallel TFO strand was modified with Y with one or two insertions at the end of the TFO strand, the thermal stability was increased 1.2 °C and 3 °C at pH 7.2, respectively, whereas one insertion in the middle of the TFO strand decreased the thermal stability 1.4 °C compared to the wild type oligonucleotide ... chain, especially at the end of the TFO strand. On the other hand, the thermal stability of the anti-parallel triplex was dramatically decreased when the TFO strand was modified with the LNA monomer analog Z in the middle of the TFO strand (ΔTm = -9.1 °C). Also the thermal stability decreased...

  14. A fully parallel, high precision, N-body code running on hybrid computing platforms

    CERN Document Server

    Capuzzo-Dolcetta, R; Punzo, D

    2012-01-01

    We present a new implementation of the numerical integration of the classical, gravitational N-body problem based on a high-order Hermite integration scheme with block time steps, with direct evaluation of the particle-particle forces. The main innovation of this code (called HiGPUs) is its full parallelization, exploiting both OpenMP and MPI for the multicore Central Processing Units as well as either Compute Unified Device Architecture (CUDA) or OpenCL for the hosted Graphics Processing Units. We tested both the performance and the accuracy of the code using up to 256 GPUs in the supercomputer IBM iDataPlex DX360M3 Linux Infiniband Cluster provided by the Italian supercomputing consortium CINECA, for values of N up to 8 million. We were able to follow the evolution of a system of 8 million bodies for a few crossing times, a task previously out of reach for direct summation codes. The code is freely available to the scientific community.
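
    The heart of any direct-summation code like the one described is the O(N²) pairwise force loop. The hedged C sketch below shows that kernel in its simplest softened form; it is illustrative only, serial, with invented particle data and softening, and is not the HiGPUs Hermite implementation, which also predicts and corrects positions and velocities.

        #include <math.h>
        #include <stdio.h>

        #define N   4          /* tiny particle count for illustration */
        #define EPS 1.0e-3     /* softening length (assumed)           */

        int main(void)
        {
            double x[N] = {0.0, 1.0, 0.0, 1.0};
            double y[N] = {0.0, 0.0, 1.0, 1.0};
            double z[N] = {0.0, 0.0, 0.0, 0.5};
            double m[N] = {1.0, 1.0, 2.0, 0.5};
            double ax[N] = {0}, ay[N] = {0}, az[N] = {0};

            /* Direct particle-particle summation: every body feels every other. */
            for (int i = 0; i < N; ++i) {
                for (int j = 0; j < N; ++j) {
                    if (j == i) continue;
                    double dx = x[j] - x[i], dy = y[j] - y[i], dz = z[j] - z[i];
                    double r2 = dx * dx + dy * dy + dz * dz + EPS * EPS;
                    double inv_r3 = 1.0 / (r2 * sqrt(r2));
                    ax[i] += m[j] * dx * inv_r3;   /* G = 1 in these units */
                    ay[i] += m[j] * dy * inv_r3;
                    az[i] += m[j] * dz * inv_r3;
                }
            }

            for (int i = 0; i < N; ++i)
                printf("a[%d] = (%g, %g, %g)\n", i, ax[i], ay[i], az[i]);
            return 0;
        }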

  15. Individual and Situational Factors Related to Young Women's Likelihood of Confronting Sexism in Their Everyday Lives.

    Science.gov (United States)

    Ayres, Melanie M; Friedman, Carly K; Leaper, Campbell

    2009-10-01

    Factors related to young women's reported likelihood of confronting sexism were investigated. Participants were 338 U.S. female undergraduates (M = 19 years) attending a California university. They were asked to complete questionnaire measures and to write a personal narrative about an experience with sexism. Approximately half (46%) the women reported confronting the perpetrator. Individual factors (prior experience with sexism, feminist identification, collective action) and situational factors (familiarity and status of perpetrator, type of sexism) were tested as predictors in a logistic regression. Women were less likely to report confronting sexism if (1) they did not identify as feminists, (2) the perpetrator was unfamiliar or high-status/familiar (vs. familiar/equal-status), or (3) the type of sexism involved unwanted sexual attention (vs. sexist comments).

  16. Perceptions of racial confrontation: the role of color blindness and comment ambiguity.

    Science.gov (United States)

    Zou, Linda X; Dickter, Cheryl L

    2013-01-01

    Because of its emphasis on diminishing race and avoiding racial discourse, color-blind racial ideology has been suggested to have negative consequences for modern day race relations. The current research examined the influence of color blindness and the ambiguity of a prejudiced remark on perceptions of a racial minority group member who confronts the remark. One hundred thirteen White participants responded to a vignette depicting a White character making a prejudiced comment of variable ambiguity, after which a Black target character confronted the comment. Results demonstrated that the target confronter was perceived more negatively and as responding less appropriately by participants high in color blindness, and that this effect was particularly pronounced when participants responded to the ambiguous comment. Implications for the ways in which color blindness, as an accepted norm that is endorsed across legal and educational settings, can facilitate Whites' complicity in racial inequality are discussed.

  17. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
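
    As a worked illustration of the kind of whole-number combination such tables contain (the specific resistor values here are added for illustration and are not taken from the article), the parallel-resistance relation is

        \[
        \frac{1}{R_{\text{total}}} = \frac{1}{R_1} + \frac{1}{R_2} + \cdots + \frac{1}{R_n},
        \qquad \text{e.g.}\quad
        \frac{1}{R_{\text{total}}} = \frac{1}{3\,\Omega} + \frac{1}{6\,\Omega} = \frac{1}{2\,\Omega}
        \;\Rightarrow\; R_{\text{total}} = 2\,\Omega.
        \]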

  18. Supercomputing for molecular dynamics simulations handling multi-trillion particles in nanofluidics

    CERN Document Server

    Heinecke, Alexander; Horsch, Martin; Bungartz, Hans-Joachim

    2015-01-01

    This work presents modern implementations of relevant molecular dynamics algorithms using ls1 mardyn, a simulation program for engineering applications. The text focuses strictly on HPC-related aspects, covering implementation on HPC architectures, taking Intel Xeon and Intel Xeon Phi clusters as representatives of current platforms. The work describes distributed and shared-memory parallelization on these platforms, including load balancing, with a particular focus on the efficient implementation of the compute kernels. The text also discusses the software architecture of the resulting code.

  19. Lightweight Specifications for Parallel Correctness

    Science.gov (United States)

    2012-12-05

    series (series), encryption and decryption (crypt), and LU factorization (lufact), as well as a parallel molecular dynamics simulator (moldyn), ray... The PJ benchmarks include an app computing a Monte Carlo approximation of π (pi), a parallel cryptographic key-cracking app (keysearch3), an app for parallel rendering of Mandelbrot Set images (mandelbrot), and a parallel branch-and-bound search for optimal phylogenetic trees (phylogeny

  20. Architectural Adaptability in Parallel Programming

    Science.gov (United States)

    1991-05-01

    AD-A247 516. Architectural Adaptability in Parallel Programming, Lawrence Alan Crowl, Technical Report 381, May 1991, University of Rochester, Computer Science. Submitted in Partial Fulfillment of the... in the development of their programs. In applying abstraction to parallel programming, we can use abstractions to represent potential parallelism