WorldWideScience

Sample records for supercomputers confronts parallelism

  1. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  2. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  3. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.

  4. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

  5. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-05-01

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  6. Plane-wave electronic structure calculations on a parallel supercomputer

    International Nuclear Information System (INIS)

    Nelson, J.S.; Plimpton, S.J.; Sears, M.P.

    1993-01-01

    The development of iterative solutions of Schrodinger's equation in a plane-wave (pw) basis over the last several years has coincided with great advances in the computational power available for performing the calculations. These dual developments have enabled many new and interesting condensed matter phenomena to be studied from a first-principles approach. The authors present a detailed description of the implementation on a parallel supercomputer (hypercube) of the first-order equation-of-motion solution to Schrodinger's equation, using plane-wave basis functions and ab initio separable pseudopotentials. By distributing the plane-waves across the processors of the hypercube many of the computations can be performed in parallel, resulting in decreases in the overall computation time relative to conventional vector supercomputers. This partitioning also provides ample memory for large Fast Fourier Transform (FFT) meshes and the storage of plane-wave coefficients for many hundreds of energy bands. The usefulness of the parallel techniques is demonstrated by benchmark timings for both the FFT's and iterations of the self-consistent solution of Schrodinger's equation for different sized Si unit cells of up to 512 atoms

  7. Comments on the parallelization efficiency of the Sunway TaihuLight supercomputer

    OpenAIRE

    Végh, János

    2016-01-01

    In the world of supercomputers, the large number of processors requires to minimize the inefficiencies of parallelization, which appear as a sequential part of the program from the point of view of Amdahl's law. The recently suggested new figure of merit is applied to the recently presented supercomputer, and the timeline of "Top 500" supercomputers is scrutinized using the metric. It is demonstrated, that in addition to the computing performance and power consumption, the new supercomputer i...

  8. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    Full Text Available In this research a computer program, Trubal version 1.51, based on the Discrete Element Method was converted to run on a Connection Machine (CM-5,a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating Geotechnical boundary value problems. The dynamic memory algorithm in Trubal program did not perform efficiently in CM-2 machine with the Single Instruction Multiple Data (SIMD architecture. This was due to the communication overhead involving global array reductions, global array broadcast and random data movement. Therefore, a dynamic memory algorithm in Trubal program was converted to a static memory arrangement and Trubal program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM." Simulating two physical triaxial experiments and comparing simulation results with Trubal simulations validated the TPM program. With a 512 nodes CM-5 machine TPM produced a nine-fold speedup demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  9. JINR supercomputer of the module type for event parallel analysis

    International Nuclear Information System (INIS)

    Kolpakov, I.F.; Senner, A.E.; Smirnov, V.A.

    1987-01-01

    A model of a supercomputer with 50 million of operations per second is suggested. Its realization allows one to solve JINR data analysis problems for large spectrometers (in particular DELPHY collaboration). The suggested module supercomputer is based on 32-bit commercial available microprocessor with a processing rate of about 1 MFLOPS. The processors are combined by means of VME standard busbars. MicroVAX-11 is a host computer organizing the operation of the system. Data input and output is realized via microVAX-11 computer periphery. Users' software is based on the FORTRAN-77. The supercomputer is connected with a JINR net port and all JINR users get an access to the suggested system

  10. An efficient implementation of a backpropagation learning algorithm on quadrics parallel supercomputer

    International Nuclear Information System (INIS)

    Taraglio, S.; Massaioli, F.

    1995-08-01

    A parallel implementation of a library to build and train Multi Layer Perceptrons via the Back Propagation algorithm is presented. The target machine is the SIMD massively parallel supercomputer Quadrics. Performance measures are provided on three different machines with different number of processors, for two network examples. A sample source code is given

  11. Design and performance characterization of electronic structure calculations on massively parallel supercomputers

    DEFF Research Database (Denmark)

    Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth

    2013-01-01

    Density function theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community...

  12. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    Science.gov (United States)

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.

  13. Radiation-magnetohydrodynamics of fusion plasmas on parallel supercomputers

    International Nuclear Information System (INIS)

    Yasar, O.; Moses, G.A.; Tautges, T.J.

    1993-01-01

    A parallel computational model to simulate fusion plasmas in the radiation-magnetohydrodynamics (R-MHD) framework is presented. Plasmas are often treated in a fluid dynamics context (magnetohydrodynamics, MHD), but when the flow field is coupled with the radiation field it falls into a more complex category, radiation magnetohydrodynamics (R-MHD), where the interaction between the flow field and the radiation field is nonlinear. The solution for the radiation field usually dominates the R-MHD computation. To solve for the radiation field, one usually chooses the S N discrete ordinates method (a deterministic method) rather than the Monte Carlo method if the geometry is not complex. The discrete ordinates method on a massively parallel processor (Intel iPSC/860) is implemented. The speedup is 14 for a run on 16 processors and the performance is 3.7 times better than a single CRAY YMP processor implementation. (orig./DG)

  14. Monte Carlo simulations of quantum systems on massively parallel supercomputers

    International Nuclear Information System (INIS)

    Ding, H.Q.

    1993-01-01

    A large class of quantum physics applications uses operator representations that are discrete integers by nature. This class includes magnetic properties of solids, interacting bosons modeling superfluids and Cooper pairs in superconductors, and Hubbard models for strongly correlated electrons systems. This kind of application typically uses integer data representations and the resulting algorithms are dominated entirely by integer operations. The authors implemented an efficient algorithm for one such application on the Intel Touchstone Delta and iPSC/860. The algorithm uses a multispin coding technique which allows significant data compactification and efficient vectorization of Monte Carlo updates. The algorithm regularly switches between two data decompositions, corresponding naturally to different Monte Carlo updating processes and observable measurements such that only nearest-neighbor communications are needed within a given decomposition. On 128 nodes of Intel Delta, this algorithm updates 183 million spins per second (compared to 21 million on CM-2 and 6.2 million on a Cray Y-MP). A systematic performance analysis shows a better than 90% efficiency in the parallel implementation

  15. ATLAS FTK a - very complex - custom parallel supercomputer

    CERN Document Server

    Kimura, Naoki; The ATLAS collaboration

    2016-01-01

    In the ever increasing pile-up LHC environment advanced techniques of analysing the data are implemented in order to increase the rate of relevant physics processes with respect to background processes. The Fast TracKer (FTK) is a track finding implementation at hardware level that is designed to deliver full-scan tracks with $p_{T}$ above 1GeV to the ATLAS trigger system for every L1 accept (at a maximum rate of 100kHz). In order to achieve this performance a highly parallel system was designed and now it is under installation in ATLAS. In the beginning of 2016 it will provide tracks for the trigger system in a region covering the central part of the ATLAS detector, and during the year it's coverage will be extended to the full detector coverage. The system relies on matching hits coming from the silicon tracking detectors against 1 billion patterns stored in specially designed ASICS chips (Associative memory - AM06). In a first stage coarse resolution hits are matched against the patterns and the accepted h...

  16. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  17. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offer unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements has led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.

  18. Large scale simulations of lattice QCD thermodynamics on Columbia Parallel Supercomputers

    International Nuclear Information System (INIS)

    Ohta, Shigemi

    1989-01-01

    The Columbia Parallel Supercomputer project aims at the construction of a parallel processing, multi-gigaflop computer optimized for numerical simulations of lattice QCD. The project has three stages; 16-node, 1/4GF machine completed in April 1985, 64-node, 1GF machine completed in August 1987, and 256-node, 16GF machine now under construction. The machines all share a common architecture; a two dimensional torus formed from a rectangular array of N 1 x N 2 independent and identical processors. A processor is capable of operating in a multi-instruction multi-data mode, except for periods of synchronous interprocessor communication with its four nearest neighbors. Here the thermodynamics simulations on the two working machines are reported. (orig./HSI)

  19. Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

    Energy Technology Data Exchange (ETDEWEB)

    Carey, G.F.; Young, D.M.

    1993-12-31

    The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods developed and algorithms will, however, be of wider interest.

  20. Parallel simulation of tsunami inundation on a large-scale supercomputer

    Science.gov (United States)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the

  1. An Optimized Parallel FDTD Topology for Challenging Electromagnetic Simulations on Supercomputers

    Directory of Open Access Journals (Sweden)

    Shugang Jiang

    2015-01-01

    Full Text Available It may not be a challenge to run a Finite-Difference Time-Domain (FDTD code for electromagnetic simulations on a supercomputer with more than 10 thousands of CPU cores; however, to make FDTD code work with the highest efficiency is a challenge. In this paper, the performance of parallel FDTD is optimized through MPI (message passing interface virtual topology, based on which a communication model is established. The general rules of optimal topology are presented according to the model. The performance of the method is tested and analyzed on three high performance computing platforms with different architectures in China. Simulations including an airplane with a 700-wavelength wingspan, and a complex microstrip antenna array with nearly 2000 elements are performed very efficiently using a maximum of 10240 CPU cores.

  2. A Parallel Supercomputer Implementation of a Biological Inspired Neural Network and its use for Pattern Recognition

    International Nuclear Information System (INIS)

    De Ladurantaye, Vincent; Lavoie, Jean; Bergeron, Jocelyn; Parenteau, Maxime; Lu Huizhong; Pichevar, Ramin; Rouat, Jean

    2012-01-01

    A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding by synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed and performance are compared for 2 implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA) running on clusters of multicore supercomputers and NVIDIA graphical processing units respectively. A global spiking list that represents at each instant the state of the neural network is described. This list indexes each neuron that fires during the current simulation time so that the influence of their spikes are simultaneously processed on all computing units. Our implementation shows a good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, thus paving the road towards real-life applications based on networks of spiking neurons. MPI offers a better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost to performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel with 64 notes (128 cores).

  3. De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

    2006-09-04

    We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM

  4. Three-dimensional kinetic simulations of whistler turbulence in solar wind on parallel supercomputers

    Science.gov (United States)

    Chang, Ouliang

    The objective of this dissertation is to study the physics of whistler turbulence evolution and its role in energy transport and dissipation in the solar wind plasmas through computational and theoretical investigations. This dissertation presents the first fully three-dimensional (3D) particle-in-cell (PIC) simulations of whistler turbulence forward cascade in a homogeneous, collisionless plasma with a uniform background magnetic field B o, and the first 3D PIC simulation of whistler turbulence with both forward and inverse cascades. Such computationally demanding research is made possible through the use of massively parallel, high performance electromagnetic PIC simulations on state-of-the-art supercomputers. Simulations are carried out to study characteristic properties of whistler turbulence under variable solar wind fluctuation amplitude (epsilon e) and electron beta (betae), relative contributions to energy dissipation and electron heating in whistler turbulence from the quasilinear scenario and the intermittency scenario, and whistler turbulence preferential cascading direction and wavevector anisotropy. The 3D simulations of whistler turbulence exhibit a forward cascade of fluctuations into broadband, anisotropic, turbulent spectrum at shorter wavelengths with wavevectors preferentially quasi-perpendicular to B o. The overall electron heating yields T ∥ > T⊥ for all epsilone and betae values, indicating the primary linear wave-particle interaction is Landau damping. But linear wave-particle interactions play a minor role in shaping the wavevector spectrum, whereas nonlinear wave-wave interactions are overall stronger and faster processes, and ultimately determine the wavevector anisotropy. Simulated magnetic energy spectra as function of wavenumber show a spectral break to steeper slopes, which scales as k⊥lambda e ≃ 1 independent of betae values, where lambdae is electron inertial length, qualitatively similar to solar wind observations. Specific

  5. What is supercomputing ?

    International Nuclear Information System (INIS)

    Asai, Kiyoshi

    1992-01-01

    Supercomputing means the high speed computation using a supercomputer. Supercomputers and the technical term ''supercomputing'' have spread since ten years ago. The performances of the main computers installed so far in Japan Atomic Energy Research Institute are compared. There are two methods to increase computing speed by using existing circuit elements, parallel processor system and vector processor system. CRAY-1 is the first successful vector computer. Supercomputing technology was first applied to meteorological organizations in foreign countries, and to aviation and atomic energy research institutes in Japan. The supercomputing for atomic energy depends on the trend of technical development in atomic energy, and the contents are divided into the increase of computing speed in existing simulation calculation and the acceleration of the new technical development of atomic energy. The examples of supercomputing in Japan Atomic Energy Research Institute are reported. (K.I.)

  6. On the adequacy of message-passing parallel supercomputers for solving neutron transport problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1990-01-01

    A coarse-grained, static-scheduling parallelization of the standard iterative scheme used for solving the discrete-ordinates approximation of the neutron transport equation is described. The parallel algorithm is based on a decomposition of the angular domain along the discrete ordinates, thus naturally producing a set of completely uncoupled systems of equations in each iteration. Implementation of the parallel code on Intcl's iPSC/2 hypercube, and solutions to test problems are presented as evidence of the high speedup and efficiency of the parallel code. The performance of the parallel code on the iPSC/2 is analyzed, and a model for the CPU time as a function of the problem size (order of angular quadrature) and the number of participating processors is developed and validated against measured CPU times. The performance model is used to speculate on the potential of massively parallel computers for significantly speeding up real-life transport calculations at acceptable efficiencies. We conclude that parallel computers with a few hundred processors are capable of producing large speedups at very high efficiencies in very large three-dimensional problems. 10 refs., 8 figs

  7. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    Directory of Open Access Journals (Sweden)

    Mark James Abraham

    2015-09-01

    Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  8. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL –to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: We conducted interviews with code teams and selected two codes to port; We learned how to program in the new models and ported the codes; We debugged and tuned the ported applications; We measured results, and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  9. Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2011-01-01

    The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.

  10. Performance characteristics of hybrid MPI/OpenMP implementations of NAS parallel benchmarks SP and BT on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2011-03-29

    The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.

  11. Supercomputers Of The Future

    Science.gov (United States)

    Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron

    1992-01-01

    Report evaluates supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. Predicts these fields will require computer speed greater than 10(Sup 18) floating-point operations per second (FLOP's) and memory capacity greater than 10(Sup 15) words. Also, new parallel computer architectures and new structured numerical methods will make necessary speed and capacity available.

  12. Supercomputational science

    CERN Document Server

    Wilson, S

    1990-01-01

    In contemporary research, the supercomputer now ranks, along with radio telescopes, particle accelerators and the other apparatus of "big science", as an expensive resource, which is nevertheless essential for state of the art research. Supercomputers are usually provided as shar.ed central facilities. However, unlike, telescopes and accelerators, they are find a wide range of applications which extends across a broad spectrum of research activity. The difference in performance between a "good" and a "bad" computer program on a traditional serial computer may be a factor of two or three, but on a contemporary supercomputer it can easily be a factor of one hundred or even more! Furthermore, this factor is likely to increase with future generations of machines. In keeping with the large capital and recurrent costs of these machines, it is appropriate to devote effort to training and familiarization so that supercomputers are employed to best effect. This volume records the lectures delivered at a Summer School ...

  13. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-01-01

    This report discusses the following topics on supercomputer debugging: Distributed debugging; use interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  14. Supercomputer debugging workshop 1991 proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.

    1991-12-31

    This report discusses the following topics on supercomputer debugging: Distributed debugging; use interface to debugging tools and standards; debugging optimized codes; debugging parallel codes; and debugger performance and interface as analysis tools. (LSP)

  15. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1992-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two- fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, poor load balancing will degrade efficiency on either vector or data parallel architectures when the data are organized according to spatial location. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. This document discusses why developers algorithms, such as a neural net representation, that do not exhibit algorithms, such as a neural net representation, that do not exhibit load-balancing problems

  16. Some computational challenges of developing efficient parallel algorithms for data-dependent computations in thermal-hydraulics supercomputer applications

    International Nuclear Information System (INIS)

    Woodruff, S.B.

    1994-01-01

    The Transient Reactor Analysis Code (TRAC), which features a two-fluid treatment of thermal-hydraulics, is designed to model transients in water reactors and related facilities. One of the major computational costs associated with TRAC and similar codes is calculating constitutive coefficients. Although the formulations for these coefficients are local, the costs are flow-regime- or data-dependent; i.e., the computations needed for a given spatial node often vary widely as a function of time. Consequently, a fixed, uniform assignment of nodes to prallel processors will result in degraded computational efficiency due to the poor load balancing. A standard method for treating data-dependent models on vector architectures has been to use gather operations (or indirect adressing) to sort the nodes into subsets that (temporarily) share a common computational model. However, this method is not effective on distributed memory data parallel architectures, where indirect adressing involves expensive communication overhead. Another serious problem with this method involves software engineering challenges in the areas of maintainability and extensibility. For example, an implementation that was hand-tuned to achieve good computational efficiency would have to be rewritten whenever the decision tree governing the sorting was modified. Using an example based on the calculation of the wall-to-liquid and wall-to-vapor heat-transfer coefficients for three nonboiling flow regimes, we describe how the use of the Fortran 90 WHERE construct and automatic inlining of functions can be used to ameliorate this problem while improving both efficiency and software engineering. Unfortunately, a general automatic solution to the load-balancing problem associated with data-dependent computations is not yet available for massively parallel architectures. We discuss why developers should either wait for such solutions or consider alternative numerical algorithms, such as a neural network

  17. Computational plasma physics and supercomputers

    International Nuclear Information System (INIS)

    Killeen, J.; McNamara, B.

    1984-09-01

    The Supercomputers of the 80's are introduced. They are 10 to 100 times more powerful than today's machines. The range of physics modeling in the fusion program is outlined. New machine architecture will influence particular codes, but parallel processing poses new coding difficulties. Increasing realism in simulations will require better numerics and more elaborate mathematics

  18. Confrontation clause

    Directory of Open Access Journals (Sweden)

    Tkachuk Sviatoslav

    2016-07-01

    Full Text Available The Sixth Amendment to the United States Constitution enumerates a cluster of rights granted to criminal defendants and is designed to make criminal prosecutions more accurate, fair, and legitimate. The Confrontation Clause, which states that „In all criminal prosecutions, the accused shall enjoy the right…to be confronted with the witness against him” should not be underestimated. This article seeks to analyse the evolution of the Confrontation Clause and the extent of a defendant’s right to face-to-face confrontation. The article analyse the case Crawford v. Washington, which was a key shift in the Supreme Court’s Confrontation Clause jurisprudence.

  19. KAUST Supercomputing Laboratory

    KAUST Repository

    Bailey, April Renee; Kaushik, Dinesh; Winfer, Andrew

    2011-01-01

    KAUST has partnered with IBM to establish a Supercomputing Research Center. KAUST is hosting the Shaheen supercomputer, named after the Arabian falcon famed for its swiftness of flight. This 16-rack IBM Blue Gene/P system is equipped with 4 gigabyte memory per node and capable of 222 teraflops, making KAUST campus the site of one of the world’s fastest supercomputers in an academic environment. KAUST is targeting petaflop capability within 3 years.

  20. KAUST Supercomputing Laboratory

    KAUST Repository

    Bailey, April Renee

    2011-11-15

    KAUST has partnered with IBM to establish a Supercomputing Research Center. KAUST is hosting the Shaheen supercomputer, named after the Arabian falcon famed for its swiftness of flight. This 16-rack IBM Blue Gene/P system is equipped with 4 gigabyte memory per node and capable of 222 teraflops, making KAUST campus the site of one of the world’s fastest supercomputers in an academic environment. KAUST is targeting petaflop capability within 3 years.

  1. Computational plasma physics and supercomputers. Revision 1

    International Nuclear Information System (INIS)

    Killeen, J.; McNamara, B.

    1985-01-01

    The Supercomputers of the 80's are introduced. They are 10 to 100 times more powerful than today's machines. The range of physics modeling in the fusion program is outlined. New machine architecture will influence particular models, but parallel processing poses new programming difficulties. Increasing realism in simulations will require better numerics and more elaborate mathematical models

  2. Confronting Stereotypes

    Science.gov (United States)

    Buswell, Carol

    2011-01-01

    People confront stereotypes every day, both in and out of the classroom. Some ideas have been carried in the collective memory and classroom textbooks for so long they are generally recognized as fact. Many are constantly being reinforced by personal experiences, family discussions, and Hollywood productions as well. The distinct advantage to…

  3. Japanese supercomputer technology

    International Nuclear Information System (INIS)

    Buzbee, B.L.; Ewald, R.H.; Worlton, W.J.

    1982-01-01

    In February 1982, computer scientists from the Los Alamos National Laboratory and Lawrence Livermore National Laboratory visited several Japanese computer manufacturers. The purpose of these visits was to assess the state of the art of Japanese supercomputer technology and to advise Japanese computer vendors of the needs of the US Department of Energy (DOE) for more powerful supercomputers. The Japanese foresee a domestic need for large-scale computing capabilities for nuclear fusion, image analysis for the Earth Resources Satellite, meteorological forecast, electrical power system analysis (power flow, stability, optimization), structural and thermal analysis of satellites, and very large scale integrated circuit design and simulation. To meet this need, Japan has launched an ambitious program to advance supercomputer technology. This program is described

  4. Supercomputer applications in nuclear research

    International Nuclear Information System (INIS)

    Ishiguro, Misako

    1992-01-01

    The utilization of supercomputers in Japan Atomic Energy Research Institute is mainly reported. The fields of atomic energy research which use supercomputers frequently and the contents of their computation are outlined. What is vectorizing is simply explained, and nuclear fusion, nuclear reactor physics, the hydrothermal safety of nuclear reactors, the parallel property that the atomic energy computations of fluids and others have, the algorithm for vector treatment and the effect of speed increase by vectorizing are discussed. At present Japan Atomic Energy Research Institute uses two systems of FACOM VP 2600/10 and three systems of M-780. The contents of computation changed from criticality computation around 1970, through the analysis of LOCA after the TMI accident, to nuclear fusion research, the design of new type reactors and reactor safety assessment at present. Also the method of using computers advanced from batch processing to time sharing processing, from one-dimensional to three dimensional computation, from steady, linear to unsteady nonlinear computation, from experimental analysis to numerical simulation and so on. (K.I.)

  5. Supercomputers to transform Science

    CERN Multimedia

    2006-01-01

    "New insights into the structure of space and time, climate modeling, and the design of novel drugs, are but a few of the many research areas that will be transforned by the installation of three supercomputers at the Unversity of Bristol." (1/2 page)

  6. FPS scientific and supercomputers computers in chemistry

    International Nuclear Information System (INIS)

    Curington, I.J.

    1987-01-01

    FPS Array Processors, scientific computers, and highly parallel supercomputers are used in nearly all aspects of compute-intensive computational chemistry. A survey is made of work utilizing this equipment, both published and current research. The relationship of the computer architecture to computational chemistry is discussed, with specific reference to Molecular Dynamics, Quantum Monte Carlo simulations, and Molecular Graphics applications. Recent installations of the FPS T-Series are highlighted, and examples of Molecular Graphics programs running on the FPS-5000 are shown

  7. Introduction to Reconfigurable Supercomputing

    CERN Document Server

    Lanzagorta, Marco; Rosenberg, Robert

    2010-01-01

    This book covers technologies, applications, tools, languages, procedures, advantages, and disadvantages of reconfigurable supercomputing using Field Programmable Gate Arrays (FPGAs). The target audience is the community of users of High Performance Computers (HPe who may benefit from porting their applications into a reconfigurable environment. As such, this book is intended to guide the HPC user through the many algorithmic considerations, hardware alternatives, usability issues, programming languages, and design tools that need to be understood before embarking on the creation of reconfigur

  8. A supercomputer for parallel data analysis

    International Nuclear Information System (INIS)

    Kolpakov, I.F.; Senner, A.E.; Smirnov, V.A.

    1987-01-01

    The project of a powerful multiprocessor system is proposed. The main purpose of the project is to develop a low cost computer system with a processing rate of a few tens of millions of operations per second. The system solves many problems of data analysis from high-energy physics spectrometers. It includes about 70 MOTOROLA-68020 based powerful slave microprocessor boards liaisoned through the VME crates to a host VAX micro computer. Each single microprocessor board performs the same algorithm requiring large computing time. The host computer distributes data over the microprocessor board, collects and combines obtained results. The architecture of the system easily allows one to use it in the real time mode

  9. Enabling department-scale supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Greenberg, D.S.; Hart, W.E.; Phillips, C.A.

    1997-11-01

    The Department of Energy (DOE) national laboratories have one of the longest and most consistent histories of supercomputer use. The authors summarize the architecture of DOE`s new supercomputers that are being built for the Accelerated Strategic Computing Initiative (ASCI). The authors then argue that in the near future scaled-down versions of these supercomputers with petaflop-per-weekend capabilities could become widely available to hundreds of research and engineering departments. The availability of such computational resources will allow simulation of physical phenomena to become a full-fledged third branch of scientific exploration, along with theory and experimentation. They describe the ASCI and other supercomputer applications at Sandia National Laboratories, and discuss which lessons learned from Sandia`s long history of supercomputing can be applied in this new setting.

  10. Quantum Hamiltonian Physics with Supercomputers

    International Nuclear Information System (INIS)

    Vary, James P.

    2014-01-01

    The vision of solving the nuclear many-body problem in a Hamiltonian framework with fundamental interactions tied to QCD via Chiral Perturbation Theory is gaining support. The goals are to preserve the predictive power of the underlying theory, to test fundamental symmetries with the nucleus as laboratory and to develop new understandings of the full range of complex quantum phenomena. Advances in theoretical frameworks (renormalization and many-body methods) as well as in computational resources (new algorithms and leadership-class parallel computers) signal a new generation of theory and simulations that will yield profound insights into the origins of nuclear shell structure, collective phenomena and complex reaction dynamics. Fundamental discovery opportunities also exist in such areas as physics beyond the Standard Model of Elementary Particles, the transition between hadronic and quark–gluon dominated dynamics in nuclei and signals that characterize dark matter. I will review some recent achievements and present ambitious consensus plans along with their challenges for a coming decade of research that will build new links between theory, simulations and experiment. Opportunities for graduate students to embark upon careers in the fast developing field of supercomputer simulations is also discussed

  11. Quantum Hamiltonian Physics with Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Vary, James P.

    2014-06-15

    The vision of solving the nuclear many-body problem in a Hamiltonian framework with fundamental interactions tied to QCD via Chiral Perturbation Theory is gaining support. The goals are to preserve the predictive power of the underlying theory, to test fundamental symmetries with the nucleus as laboratory and to develop new understandings of the full range of complex quantum phenomena. Advances in theoretical frameworks (renormalization and many-body methods) as well as in computational resources (new algorithms and leadership-class parallel computers) signal a new generation of theory and simulations that will yield profound insights into the origins of nuclear shell structure, collective phenomena and complex reaction dynamics. Fundamental discovery opportunities also exist in such areas as physics beyond the Standard Model of Elementary Particles, the transition between hadronic and quark–gluon dominated dynamics in nuclei and signals that characterize dark matter. I will review some recent achievements and present ambitious consensus plans along with their challenges for a coming decade of research that will build new links between theory, simulations and experiment. Opportunities for graduate students to embark upon careers in the fast developing field of supercomputer simulations is also discussed.

  12. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  13. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  14. Supercomputer debugging workshop `92

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.S.

    1993-02-01

    This report contains papers or viewgraphs on the following topics: The ABCs of Debugging in the 1990s; Cray Computer Corporation; Thinking Machines Corporation; Cray Research, Incorporated; Sun Microsystems, Inc; Kendall Square Research; The Effects of Register Allocation and Instruction Scheduling on Symbolic Debugging; Debugging Optimized Code: Currency Determination with Data Flow; A Debugging Tool for Parallel and Distributed Programs; Analyzing Traces of Parallel Programs Containing Semaphore Synchronization; Compile-time Support for Efficient Data Race Detection in Shared-Memory Parallel Programs; Direct Manipulation Techniques for Parallel Debuggers; Transparent Observation of XENOOPS Objects; A Parallel Software Monitor for Debugging and Performance Tools on Distributed Memory Multicomputers; Profiling Performance of Inter-Processor Communications in an iWarp Torus; The Application of Code Instrumentation Technology in the Los Alamos Debugger; and CXdb: The Road to Remote Debugging.

  15. Computational Dimensionalities of Global Supercomputing

    Directory of Open Access Journals (Sweden)

    Richard S. Segall

    2013-12-01

    Full Text Available This Invited Paper pertains to subject of my Plenary Keynote Speech at the 17th World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI 2013 held in Orlando, Florida on July 9-12, 2013. The title of my Plenary Keynote Speech was: "Dimensionalities of Computation: from Global Supercomputing to Data, Text and Web Mining" but this Invited Paper will focus only on the "Computational Dimensionalities of Global Supercomputing" and is based upon a summary of the contents of several individual articles that have been previously written with myself as lead author and published in [75], [76], [77], [78], [79], [80] and [11]. The topics of these of the Plenary Speech included Overview of Current Research in Global Supercomputing [75], Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing [76], Data Mining Supercomputing with SAS™ JMP® Genomics ([77], [79], [80], and Visualization by Supercomputing Data Mining [81]. ______________________ [11.] Committee on the Future of Supercomputing, National Research Council (2003, The Future of Supercomputing: An Interim Report, ISBN-13: 978-0-309-09016- 2, http://www.nap.edu/catalog/10784.html [75.] Segall, Richard S.; Zhang, Qingyu and Cook, Jeffrey S.(2013, "Overview of Current Research in Global Supercomputing", Proceedings of Forty- Fourth Meeting of Southwest Decision Sciences Institute (SWDSI, Albuquerque, NM, March 12-16, 2013. [76.] Segall, Richard S. and Zhang, Qingyu (2010, "Open-Source Software Tools for Data Mining Analysis of Genomic and Spatial Images using High Performance Computing", Proceedings of 5th INFORMS Workshop on Data Mining and Health Informatics, Austin, TX, November 6, 2010. [77.] Segall, Richard S., Zhang, Qingyu and Pierce, Ryan M.(2010, "Data Mining Supercomputing with SAS™ JMP®; Genomics: Research-in-Progress, Proceedings of 2010 Conference on Applied Research in Information Technology, sponsored by

  16. The ETA10 supercomputer system

    International Nuclear Information System (INIS)

    Swanson, C.D.

    1987-01-01

    The ETA Systems, Inc. ETA 10 is a next-generation supercomputer featuring multiprocessing, a large hierarchical memory system, high performance input/output, and network support for both batch and interactive processing. Advanced technology used in the ETA 10 includes liquid nitrogen cooled CMOS logic with 20,000 gates per chip, a single printed circuit board for each CPU, and high density static and dynamics MOS memory chips. Software for the ETA 10 includes an underlying kernel that supports multiple user environments, a new ETA FORTRAN compiler with an advanced automatic vectorizer, a multitasking library and debugging tools. Possible developments for future supercomputers from ETA Systems are discussed. (orig.)

  17. World's fastest supercomputer opens up to users

    Science.gov (United States)

    Xin, Ling

    2016-08-01

    China's latest supercomputer - Sunway TaihuLight - has claimed the crown as the world's fastest computer according to the latest TOP500 list, released at the International Supercomputer Conference in Frankfurt in late June.

  18. The GF11 supercomputer

    International Nuclear Information System (INIS)

    Beetem, J.; Weingarten, D.

    1986-01-01

    GF11 is a parallel computer currently under construction at the IBM Yorktown Research Center. The machine incorporates 576 floating-point processors arrangedin a modified SIMD architecture. Each has space for 2 Mbytes of memory and is capable of 20 Mflops, giving the total machine a peak of 1.125 Gbytes of memory and 11.52 Gflops. The floating-point processors are interconnected by a dynamically reconfigurable non-blocking switching network. At each machine cycle any of 1024 pre-selected permutations of data can be realized among the processors. The main intended application of GF11 is a class of calculations arising from quantum chromodynamics

  19. The GF11 supercomputer

    International Nuclear Information System (INIS)

    Beetem, J.; Denneau, M.; Weingarten, D.

    1985-01-01

    GF11 is a parallel computer currently under construction at the IBM Yorktown Research Center. The machine incorporates 576 floating- point processors arranged in a modified SIMD architecture. Each has space for 2 Mbytes of memory and is capable of 20 Mflops, giving the total machine a peak of 1.125 Gbytes of memory and 11.52 Gflops. The floating-point processors are interconnected by a dynamically reconfigurable nonblocking switching network. At each machine cycle any of 1024 pre-selected permutations of data can be realized among the processors. The main intended application of GF11 is a class of calculations arising from quantum chromodynamics

  20. Supercomputing and related national projects in Japan

    International Nuclear Information System (INIS)

    Miura, Kenichi

    1985-01-01

    Japanese supercomputer development activities in the industry and research projects are outlined. Architecture, technology, software, and applications of Fujitsu's Vector Processor Systems are described as an example of Japanese supercomputers. Applications of supercomputers to high energy physics are also discussed. (orig.)

  1. Mistral Supercomputer Job History Analysis

    OpenAIRE

    Zasadziński, Michał; Muntés-Mulero, Victor; Solé, Marc; Ludwig, Thomas

    2018-01-01

    In this technical report, we show insights and results of operational data analysis from petascale supercomputer Mistral, which is ranked as 42nd most powerful in the world as of January 2018. Data sources include hardware monitoring data, job scheduler history, topology, and hardware information. We explore job state sequences, spatial distribution, and electric power patterns.

  2. Supercomputers and quantum field theory

    International Nuclear Information System (INIS)

    Creutz, M.

    1985-01-01

    A review is given of why recent simulations of lattice gauge theories have resulted in substantial demands from particle theorists for supercomputer time. These calculations have yielded first principle results on non-perturbative aspects of the strong interactions. An algorithm for simulating dynamical quark fields is discussed. 14 refs

  3. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop

  4. Tryton Supercomputer Capabilities for Analysis of Massive Data Streams

    Directory of Open Access Journals (Sweden)

    Krawczyk Henryk

    2015-09-01

    Full Text Available The recently deployed supercomputer Tryton, located in the Academic Computer Center of Gdansk University of Technology, provides great means for massive parallel processing. Moreover, the status of the Center as one of the main network nodes in the PIONIER network enables the fast and reliable transfer of data produced by miscellaneous devices scattered in the area of the whole country. The typical examples of such data are streams containing radio-telescope and satellite observations. Their analysis, especially with real-time constraints, can be challenging and requires the usage of dedicated software components. We propose a solution for such parallel analysis using the supercomputer, supervised by the KASKADA platform, which with the conjunction with immerse 3D visualization techniques can be used to solve problems such as pulsar detection and chronometric or oil-spill simulation on the sea surface.

  5. The GF11 supercomputer

    International Nuclear Information System (INIS)

    Beetem, J.; Denneau, M.; Weingarten, D.

    1985-01-01

    GF11 is a parallel computer currently under construction at the Yorktown Research Center. The machine incorporates 576 floating-point processors arranged in a modified SIMD architecture. Each processor has space for 2 Mbytes of memory and is capable of 20 MFLOPS, giving the total machine a peak of 1.125 Gbytes of memory and 11.52 GFLOPS. The floating-point processors are interconnected by a dynamically reconfigurable non-blocking switching network. At each machine cycle any of 1024 pre-selected permutations of data can be realized among the processors. The main intended application of GF11 is a class of calculations arising from quantum chromodynamics, a proposed theory of the elementary particles which participate in nuclear interactions

  6. Visualizing quantum scattering on the CM-2 supercomputer

    International Nuclear Information System (INIS)

    Richardson, J.L.

    1991-01-01

    We implement parallel algorithms for solving the time-dependent Schroedinger equation on the CM-2 supercomputer. These methods are unconditionally stable as well as unitary at each time step and have the advantage of being spatially local and explicit. We show how to visualize the dynamics of quantum scattering using techniques for visualizing complex wave functions. Several scattering problems are solved to demonstrate the use of these methods. (orig.)

  7. High Performance Networks From Supercomputing to Cloud Computing

    CERN Document Server

    Abts, Dennis

    2011-01-01

    Datacenter networks provide the communication substrate for large parallel computer systems that form the ecosystem for high performance computing (HPC) systems and modern Internet applications. The design of new datacenter networks is motivated by an array of applications ranging from communication intensive climatology, complex material simulations and molecular dynamics to such Internet applications as Web search, language translation, collaborative Internet applications, streaming video and voice-over-IP. For both Supercomputing and Cloud Computing the network enables distributed applicati

  8. An assessment of worldwide supercomputer usage

    Energy Technology Data Exchange (ETDEWEB)

    Wasserman, H.J.; Simmons, M.L.; Hayes, A.H.

    1995-01-01

    This report provides a comparative study of advanced supercomputing usage in Japan and the United States as of Spring 1994. It is based on the findings of a group of US scientists whose careers have centered on programming, evaluating, and designing high-performance supercomputers for over ten years. The report is a follow-on to an assessment of supercomputing technology in Europe and Japan that was published in 1993. Whereas the previous study focused on supercomputer manufacturing capabilities, the primary focus of the current work was to compare where and how supercomputers are used. Research for this report was conducted through both literature studies and field research in Japan.

  9. Supercomputers and parallel computation. Based on the proceedings of a workshop on progress in the use of vector and array processors organised by the Institute of Mathematics and its Applications and held in Bristol, 2-3 September 1982

    International Nuclear Information System (INIS)

    Paddon, D.J.

    1984-01-01

    This book is based on the proceedings of a conference on parallel computing held in 1982. There are 18 papers which cover the following topics: VLSI parallel architectures, the theory of parallel computing and vector and array processor computing. One paper on 'Tough Problems in Reactor Design' is indexed separately. All the contributions are on research done in the United Kingdom. Although much of the experience in array processor computing is associated with the ICL distributed array processor (DAP) and this is reflected in the contributions, the research relating to the ICL DAP is relevant to all types of array processors. (UK)

  10. A visual analytics system for optimizing the performance of large-scale networks in supercomputing systems

    Directory of Open Access Journals (Sweden)

    Takanori Fujiwara

    2018-03-01

    Full Text Available The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several of the state of the art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on Dragonfly networks in order to make optimal system configurations and design choices, such as job scheduling and routing strategies. However, in order to study these temporal network behavior, we would need a tool to analyze and correlate numerous sets of multivariate time-series data collected from the Dragonfly’s multi-level hierarchies. This paper presents such a tool–a visual analytics system–that uses the Dragonfly network to investigate the temporal behavior and optimize the communication performance of a supercomputer. We coupled interactive visualization with time-series analysis methods to help reveal hidden patterns in the network behavior with respect to different parallel applications and system configurations. Our system also provides multiple coordinated views for connecting behaviors observed at different levels of the network hierarchies, which effectively helps visual analysis tasks. We demonstrate the effectiveness of the system with a set of case studies. Our system and findings can not only help improve the communication performance of supercomputing applications, but also the network performance of next-generation supercomputers. Keywords: Supercomputing, Parallel communication network, Dragonfly networks, Time-series data, Performance analysis, Visual analytics

  11. NASA Advanced Supercomputing Facility Expansion

    Science.gov (United States)

    Thigpen, William W.

    2017-01-01

    The NASA Advanced Supercomputing (NAS) Division enables advances in high-end computing technologies and in modeling and simulation methods to tackle some of the toughest science and engineering challenges facing NASA today. The name "NAS" has long been associated with leadership and innovation throughout the high-end computing (HEC) community. We play a significant role in shaping HEC standards and paradigms, and provide leadership in the areas of large-scale InfiniBand fabrics, Lustre open-source filesystems, and hyperwall technologies. We provide an integrated high-end computing environment to accelerate NASA missions and make revolutionary advances in science. Pleiades, a petaflop-scale supercomputer, is used by scientists throughout the U.S. to support NASA missions, and is ranked among the most powerful systems in the world. One of our key focus areas is in modeling and simulation to support NASA's real-world engineering applications and make fundamental advances in modeling and simulation methods.

  12. ATLAS Software Installation on Supercomputers

    CERN Document Server

    Undrus, Alexander; The ATLAS collaboration

    2018-01-01

    PowerPC and high performance computers (HPC) are important resources for computing in the ATLAS experiment. The future LHC data processing will require more resources than Grid computing, currently using approximately 100,000 cores at well over 100 sites, can provide. Supercomputers are extremely powerful as they use resources of hundreds of thousands CPUs joined together. However their architectures have different instruction sets. ATLAS binary software distributions for x86 chipsets do not fit these architectures, as emulation of these chipsets results in huge performance loss. This presentation describes the methodology of ATLAS software installation from source code on supercomputers. The installation procedure includes downloading the ATLAS code base as well as the source of about 50 external packages, such as ROOT and Geant4, followed by compilation, and rigorous unit and integration testing. The presentation reports the application of this procedure at Titan HPC and Summit PowerPC at Oak Ridge Computin...

  13. Development of seismic tomography software for hybrid supercomputers

    Science.gov (United States)

    Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

    2015-04-01

    Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on

  14. Integration of Panda Workload Management System with supercomputers

    Science.gov (United States)

    De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Nilsson, P.; Novikov, A.; Oleynik, D.; Panitkin, S.; Poyda, A.; Read, K. F.; Ryabinkin, E.; Teslyuk, A.; Velikhov, V.; Wells, J. C.; Wenaus, T.

    2016-09-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 140 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250000 cores with a peak performance of 0.3+ petaFLOPS, next LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Supercomputer at the National Research Center "Kurchatov Institute", IT4 in Ostrava, and others). The current approach utilizes a modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run singlethreaded workloads in parallel on Titan's multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads

  15. Proceedings of the first energy research power supercomputer users symposium

    International Nuclear Information System (INIS)

    1991-01-01

    The Energy Research Power Supercomputer Users Symposium was arranged to showcase the richness of science that has been pursued and accomplished in this program through the use of supercomputers and now high performance parallel computers over the last year: this report is the collection of the presentations given at the Symposium. ''Power users'' were invited by the ER Supercomputer Access Committee to show that the use of these computational tools and the associated data communications network, ESNet, go beyond merely speeding up computations. Today the work often directly contributes to the advancement of the conceptual developments in their fields and the computational and network resources form the very infrastructure of today's science. The Symposium also provided an opportunity, which is rare in this day of network access to computing resources, for the invited users to compare and discuss their techniques and approaches with those used in other ER disciplines. The significance of new parallel architectures was highlighted by the interesting evening talk given by Dr. Stephen Orszag of Princeton University

  16. Status of supercomputers in the US

    International Nuclear Information System (INIS)

    Fernbach, S.

    1985-01-01

    Current Supercomputers; that is, the Class VI machines which first became available in 1976 are being delivered in greater quantity than ever before. In addition, manufacturers are busily working on Class VII machines to be ready for delivery in CY 1987. Mainframes are being modified or designed to take on some features of the supercomputers and new companies with the intent of either competing directly in the supercomputer arena or in providing entry-level systems from which to graduate to supercomputers are springing up everywhere. Even well founded organizations like IBM and CDC are adding machines with vector instructions in their repertoires. Japanese - manufactured supercomputers are also being introduced into the U.S. Will these begin to compete with those of U.S. manufacture. Are they truly competitive. It turns out that both from the hardware and software points of view they may be superior. We may be facing the same problems in supercomputers that we faced in videosystems

  17. TOP500 Supercomputers for June 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-06-23

    23rd Edition of TOP500 List of World's Fastest Supercomputers Released: Japan's Earth Simulator Enters Third Year in Top Position MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 23rd edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2004) at the International Supercomputer Conference in Heidelberg, Germany.

  18. Extending ATLAS Computing to Commercial Clouds and Supercomputers

    CERN Document Server

    Nilsson, P; The ATLAS collaboration; Filipcic, A; Klimentov, A; Maeno, T; Oleynik, D; Panitkin, S; Wenaus, T; Wu, W

    2014-01-01

    The Large Hadron Collider will resume data collection in 2015 with substantially increased computing requirements relative to its first 2009-2013 run. A near doubling of the energy and the data rate, high level of event pile-up, and detector upgrades will mean the number and complexity of events to be analyzed will increase dramatically. A naive extrapolation of the Run 1 experience would suggest that a 5-6 fold increase in computing resources are needed - impossible within the anticipated flat computing budgets in the near future. Consequently ATLAS is engaged in an ambitious program to expand its computing to all available resources, notably including opportunistic use of commercial clouds and supercomputers. Such resources present new challenges in managing heterogeneity, supporting data flows, parallelizing workflows, provisioning software, and other aspects of distributed computing, all while minimizing operational load. We will present the ATLAS experience to date with clouds and supercomputers, and des...

  19. INTEL: Intel based systems move up in supercomputing ranks

    CERN Multimedia

    2002-01-01

    "The TOP500 supercomputer rankings released today at the Supercomputing 2002 conference show a dramatic increase in the number of Intel-based systems being deployed in high-performance computing (HPC) or supercomputing areas" (1/2 page).

  20. Massively Parallel QCD

    International Nuclear Information System (INIS)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-01-01

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results

  1. Porting Ordinary Applications to Blue Gene/Q Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Maheshwari, Ketan C.; Wozniak, Justin M.; Armstrong, Timothy; Katz, Daniel S.; Binkowski, T. Andrew; Zhong, Xiaoliang; Heinonen, Olle; Karpeyev, Dmitry; Wilde, Michael

    2015-08-31

    Efficiently porting ordinary applications to Blue Gene/Q supercomputers is a significant challenge. Codes are often originally developed without considering advanced architectures and related tool chains. Science needs frequently lead users to want to run large numbers of relatively small jobs (often called many-task computing, an ensemble, or a workflow), which can conflict with supercomputer configurations. In this paper, we discuss techniques developed to execute ordinary applications over leadership class supercomputers. We use the high-performance Swift parallel scripting framework and build two workflow execution techniques-sub-jobs and main-wrap. The sub-jobs technique, built on top of the IBM Blue Gene/Q resource manager Cobalt's sub-block jobs, lets users submit multiple, independent, repeated smaller jobs within a single larger resource block. The main-wrap technique is a scheme that enables C/C++ programs to be defined as functions that are wrapped by a high-performance Swift wrapper and that are invoked as a Swift script. We discuss the needs, benefits, technicalities, and current limitations of these techniques. We further discuss the real-world science enabled by these techniques and the results obtained.

  2. Confronting an Augmented Reality

    Science.gov (United States)

    Munnerley, Danny; Bacon, Matt; Wilson, Anna; Steele, James; Hedberg, John; Fitzgerald, Robert

    2012-01-01

    How can educators make use of augmented reality technologies and practices to enhance learning and why would we want to embrace such technologies anyway? How can an augmented reality help a learner confront, interpret and ultimately comprehend reality itself ? In this article, we seek to initiate a discussion that focuses on these questions, and…

  3. Confronting Ambiguity in Science

    Science.gov (United States)

    Emery, Katherine; Harlow, Danielle; Whitmer, Ali; Gaines, Steven

    2015-01-01

    People are regularly confronted with environmental and science-related issues presented to them in newspapers, on television, or even in their own doctor's office. Often the information they use to inform their decisions on matters of science may be ambiguous and contradictory. This article presents an activity that investigates how students deal…

  4. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    KAUST Repository

    Frisch, Jé rô me; Mundani, Ralf-Peter; Rank, Ernst; van Treeck, Christoph

    2015-01-01

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers

  5. Supercomputer algorithms for reactivity, dynamics and kinetics of small molecules

    International Nuclear Information System (INIS)

    Lagana, A.

    1989-01-01

    Even for small systems, the accurate characterization of reactive processes is so demanding of computer resources as to suggest the use of supercomputers having vector and parallel facilities. The full advantages of vector and parallel architectures can sometimes be obtained by simply modifying existing programs, vectorizing the manipulation of vectors and matrices, and requiring the parallel execution of independent tasks. More often, however, a significant time saving can be obtained only when the computer code undergoes a deeper restructuring, requiring a change in the computational strategy or, more radically, the adoption of a different theoretical treatment. This book discusses supercomputer strategies based upon act and approximate methods aimed at calculating the electronic structure and the reactive properties of small systems. The book shows how, in recent years, intense design activity has led to the ability to calculate accurate electronic structures for reactive systems, exact and high-level approximations to three-dimensional reactive dynamics, and to efficient directive and declaratory software for the modelling of complex systems

  6. A fast random number generator for the Intel Paragon supercomputer

    Science.gov (United States)

    Gutbrod, F.

    1995-06-01

    A pseudo-random number generator is presented which makes optimal use of the architecture of the i860-microprocessor and which is expected to have a very long period. It is therefore a good candidate for use on the parallel supercomputer Paragon XP. In the assembler version, it needs 6.4 cycles for a real∗4 random number. There is a FORTRAN routine which yields identical numbers up to rare and minor rounding discrepancies, and it needs 28 cycles. The FORTRAN performance on other microprocessors is somewhat better. Arguments for the quality of the generator and some numerical tests are given.

  7. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  8. TOP500 Supercomputers for June 2005

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2005-06-22

    25th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/L LNL BlueGene/L and IBM gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 25th edition of the TOP500 list of the world's fastest supercomputers was released today (June 22, 2005) at the 20th International Supercomputing Conference (ISC2005) in Heidelberg Germany.

  9. TOP500 Supercomputers for November 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-11-16

    22nd Edition of TOP500 List of World s Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 22nd edition of the TOP500 list of the worlds fastest supercomputers was released today (November 16, 2003). The Earth Simulator supercomputer retains the number one position with its Linpack benchmark performance of 35.86 Tflop/s (''teraflops'' or trillions of calculations per second). It was built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan.

  10. Supercomputer debugging workshop '92

    Energy Technology Data Exchange (ETDEWEB)

    Brown, J.S.

    1993-01-01

    This report contains papers or viewgraphs on the following topics: The ABCs of Debugging in the 1990s; Cray Computer Corporation; Thinking Machines Corporation; Cray Research, Incorporated; Sun Microsystems, Inc; Kendall Square Research; The Effects of Register Allocation and Instruction Scheduling on Symbolic Debugging; Debugging Optimized Code: Currency Determination with Data Flow; A Debugging Tool for Parallel and Distributed Programs; Analyzing Traces of Parallel Programs Containing Semaphore Synchronization; Compile-time Support for Efficient Data Race Detection in Shared-Memory Parallel Programs; Direct Manipulation Techniques for Parallel Debuggers; Transparent Observation of XENOOPS Objects; A Parallel Software Monitor for Debugging and Performance Tools on Distributed Memory Multicomputers; Profiling Performance of Inter-Processor Communications in an iWarp Torus; The Application of Code Instrumentation Technology in the Los Alamos Debugger; and CXdb: The Road to Remote Debugging.

  11. TOP500 Supercomputers for November 2004

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2004-11-08

    24th Edition of TOP500 List of World's Fastest Supercomputers Released: DOE/IBM BlueGene/L and NASA/SGI's Columbia gain Top Positions MANNHEIM, Germany; KNOXVILLE, Tenn.; BERKELEY, Calif. In what has become a closely watched event in the world of high-performance computing, the 24th edition of the TOP500 list of the worlds fastest supercomputers was released today (November 8, 2004) at the SC2004 Conference in Pittsburgh, Pa.

  12. Plasma turbulence calculations on supercomputers

    International Nuclear Information System (INIS)

    Carreras, B.A.; Charlton, L.A.; Dominguez, N.; Drake, J.B.; Garcia, L.; Leboeuf, J.N.; Lee, D.K.; Lynch, V.E.; Sidikman, K.

    1991-01-01

    Although the single-particle picture of magnetic confinement is helpful in understanding some basic physics of plasma confinement, it does not give a full description. Collective effects dominate plasma behavior. Any analysis of plasma confinement requires a self-consistent treatment of the particles and fields. The general picture is further complicated because the plasma, in general, is turbulent. The study of fluid turbulence is a rather complex field by itself. In addition to the difficulties of classical fluid turbulence, plasma turbulence studies face the problems caused by the induced magnetic turbulence, which couples field by itself. In addition to the difficulties of classical fluid turbulence, plasma turbulence studies face the problems caused by the induced magnetic turbulence, which couples back to the fluid. Since the fluid is not a perfect conductor, this turbulence can lead to changes in the topology of the magnetic field structure, causing the magnetic field lines to wander radially. Because the plasma fluid flows along field lines, they carry the particles with them, and this enhances the losses caused by collisions. The changes in topology are critical for the plasma confinement. The study of plasma turbulence and the concomitant transport is a challenging problem. Because of the importance of solving the plasma turbulence problem for controlled thermonuclear research, the high complexity of the problem, and the necessity of attacking the problem with supercomputers, the study of plasma turbulence in magnetic confinement devices is a Grand Challenge problem

  13. Storage-Intensive Supercomputing Benchmark Study

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, J; Dossa, D; Gokhale, M; Hysom, D; May, J; Pearce, R; Yoo, A

    2007-10-30

    Critical data science applications requiring frequent access to storage perform poorly on today's computing architectures. This project addresses efficient computation of data-intensive problems in national security and basic science by exploring, advancing, and applying a new form of computing called storage-intensive supercomputing (SISC). Our goal is to enable applications that simply cannot run on current systems, and, for a broad range of data-intensive problems, to deliver an order of magnitude improvement in price/performance over today's data-intensive architectures. This technical report documents much of the work done under LDRD 07-ERD-063 Storage Intensive Supercomputing during the period 05/07-09/07. The following chapters describe: (1) a new file I/O monitoring tool iotrace developed to capture the dynamic I/O profiles of Linux processes; (2) an out-of-core graph benchmark for level-set expansion of scale-free graphs; (3) an entity extraction benchmark consisting of a pipeline of eight components; and (4) an image resampling benchmark drawn from the SWarp program in the LSST data processing pipeline. The performance of the graph and entity extraction benchmarks was measured in three different scenarios: data sets residing on the NFS file server and accessed over the network; data sets stored on local disk; and data sets stored on the Fusion I/O parallel NAND Flash array. The image resampling benchmark compared performance of software-only to GPU-accelerated. In addition to the work reported here, an additional text processing application was developed that used an FPGA to accelerate n-gram profiling for language classification. The n-gram application will be presented at SC07 at the High Performance Reconfigurable Computing Technologies and Applications Workshop. The graph and entity extraction benchmarks were run on a Supermicro server housing the NAND Flash 40GB parallel disk array, the Fusion-io. The Fusion system specs are as follows

  14. Parallel Earthquake Simulations on Large-Scale Multicore Supercomputers

    KAUST Repository

    Wu, Xingfu

    2011-01-01

    Earthquakes are one of the most destructive natural hazards on our planet Earth. Hugh earthquakes striking offshore may cause devastating tsunamis, as evidenced by the 11 March 2011 Japan (moment magnitude Mw9.0) and the 26 December 2004 Sumatra (Mw9.1) earthquakes. Earthquake prediction (in terms of the precise time, place, and magnitude of a coming earthquake) is arguably unfeasible in the foreseeable future. To mitigate seismic hazards from future earthquakes in earthquake-prone areas, such as California and Japan, scientists have been using numerical simulations to study earthquake rupture propagation along faults and seismic wave propagation in the surrounding media on ever-advancing modern computers over past several decades. In particular, ground motion simulations for past and future (possible) significant earthquakes have been performed to understand factors that affect ground shaking in populated areas, and to provide ground shaking characteristics and synthetic seismograms for emergency preparation and design of earthquake-resistant structures. These simulation results can guide the development of more rational seismic provisions for leading to safer, more efficient, and economical50pt]Please provide V. Taylor author e-mail ID. structures in earthquake-prone regions.

  15. Performance analysis of job scheduling policies in parallel supercomputing environments

    Energy Technology Data Exchange (ETDEWEB)

    Naik, V.K.; Squillante, M.S. [IBM T.J. Watson Research Center, Yorktown Heights, NY (United States); Setia, S.K. [George Mason Univ., Fairfax, VA (United States). Dept. of Computer Science

    1993-12-31

    In this paper the authors analyze three general classes of scheduling policies under a workload typical of largescale scientific computing. These policies differ in the manner in which processors are partitioned among the jobs as well as the way in which jobs are prioritized for execution on the partitions. Their results indicate that existing static schemes do not perform well under varying workloads. Adaptive policies tend to make better scheduling decisions, but their ability to adjust to workload changes is limited. Dynamic partitioning policies, on the other hand, yield the best performance and can be tuned to provide desired performance differences among jobs with varying resource demands.

  16. Parallel Earthquake Simulations on Large-Scale Multicore Supercomputers

    KAUST Repository

    Wu, Xingfu; Duan, Benchun; Taylor, Valerie

    2011-01-01

    , such as California and Japan, scientists have been using numerical simulations to study earthquake rupture propagation along faults and seismic wave propagation in the surrounding media on ever-advancing modern computers over past several decades. In particular

  17. A training program for scientific supercomputing users

    Energy Technology Data Exchange (ETDEWEB)

    Hanson, F.; Moher, T.; Sabelli, N.; Solem, A.

    1988-01-01

    There is need for a mechanism to transfer supercomputing technology into the hands of scientists and engineers in such a way that they will acquire a foundation of knowledge that will permit integration of supercomputing as a tool in their research. Most computing center training emphasizes computer-specific information about how to use a particular computer system; most academic programs teach concepts to computer scientists. Only a few brief courses and new programs are designed for computational scientists. This paper describes an eleven-week training program aimed principally at graduate and postdoctoral students in computationally-intensive fields. The program is designed to balance the specificity of computing center courses, the abstractness of computer science courses, and the personal contact of traditional apprentice approaches. It is based on the experience of computer scientists and computational scientists, and consists of seminars and clinics given by many visiting and local faculty. It covers a variety of supercomputing concepts, issues, and practices related to architecture, operating systems, software design, numerical considerations, code optimization, graphics, communications, and networks. Its research component encourages understanding of scientific computing and supercomputer hardware issues. Flexibility in thinking about computing needs is emphasized by the use of several different supercomputer architectures, such as the Cray X/MP48 at the National Center for Supercomputing Applications at University of Illinois at Urbana-Champaign, IBM 3090 600E/VF at the Cornell National Supercomputer Facility, and Alliant FX/8 at the Advanced Computing Research Facility at Argonne National Laboratory. 11 refs., 6 tabs.

  18. Confronting an augmented reality

    Directory of Open Access Journals (Sweden)

    John Hedberg

    2012-08-01

    Full Text Available How can educators make use of augmented reality technologies and practices to enhance learning and why would we want to embrace such technologies anyway? How can an augmented reality help a learner confront, interpret and ultimately comprehend reality itself? In this article, we seek to initiate a discussion that focuses on these questions, and suggest that they be used as drivers for research into effective educational applications of augmented reality. We discuss how multi-modal, sensorial augmentation of reality links to existing theories of education and learning, focusing on ideas of cognitive dissonance and the confrontation of new realities implied by exposure to new and varied perspectives. We also discuss connections with broader debates brought on by the social and cultural changes wrought by the increased digitalisation of our lives, especially the concept of the extended mind. Rather than offer a prescription for augmentation, our intention is to throw open debate and to provoke deep thinking about what interacting with and creating an augmented reality might mean for both teacher and learner.

  19. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  20. TOP500 Supercomputers for June 2003

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2003-06-23

    21st Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 21st edition of the TOP500 list of the world's fastest supercomputers was released today (June 23, 2003). The Earth Simulator supercomputer built by NEC and installed last year at the Earth Simulator Center in Yokohama, Japan, with its Linpack benchmark performance of 35.86 Tflop/s (teraflops or trillions of calculations per second), retains the number one position. The number 2 position is held by the re-measured ASCI Q system at Los Alamos National Laboratory. With 13.88 Tflop/s, it is the second system ever to exceed the 10 Tflop/smark. ASCIQ was built by Hewlett-Packard and is based on the AlphaServerSC computer system.

  1. Desktop supercomputer: what can it do?

    Science.gov (United States)

    Bogdanov, A.; Degtyarev, A.; Korkhov, V.

    2017-12-01

    The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.

  2. Desktop supercomputer: what can it do?

    International Nuclear Information System (INIS)

    Bogdanov, A.; Degtyarev, A.; Korkhov, V.

    2017-01-01

    The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.

  3. TOP500 Supercomputers for June 2002

    Energy Technology Data Exchange (ETDEWEB)

    Strohmaier, Erich; Meuer, Hans W.; Dongarra, Jack; Simon, Horst D.

    2002-06-20

    19th Edition of TOP500 List of World's Fastest Supercomputers Released MANNHEIM, Germany; KNOXVILLE, Tenn.;&BERKELEY, Calif. In what has become a much-anticipated event in the world of high-performance computing, the 19th edition of the TOP500 list of the worlds fastest supercomputers was released today (June 20, 2002). The recently installed Earth Simulator supercomputer at the Earth Simulator Center in Yokohama, Japan, is as expected the clear new number 1. Its performance of 35.86 Tflop/s (trillions of calculations per second) running the Linpack benchmark is almost five times higher than the performance of the now No.2 IBM ASCI White system at Lawrence Livermore National Laboratory (7.2 Tflop/s). This powerful leap frogging to the top by a system so much faster than the previous top system is unparalleled in the history of the TOP500.

  4. Status reports of supercomputing astrophysics in Japan

    International Nuclear Information System (INIS)

    Nakamura, Takashi; Nagasawa, Mikio

    1990-01-01

    The Workshop on Supercomputing Astrophysics was held at National Laboratory for High Energy Physics (KEK, Tsukuba) from August 31 to September 2, 1989. More than 40 participants of physicists, astronomers were attendant and discussed many topics in the informal atmosphere. The main purpose of this workshop was focused on the theoretical activities in computational astrophysics in Japan. It was also aimed to promote effective collaboration between the numerical experimentists working on supercomputing technique. The various subjects of the presented papers of hydrodynamics, plasma physics, gravitating systems, radiative transfer and general relativity are all stimulating. In fact, these numerical calculations become possible now in Japan owing to the power of Japanese supercomputer such as HITAC S820, Fujitsu VP400E and NEC SX-2. (J.P.N.)

  5. The ETA systems plans for supercomputers

    International Nuclear Information System (INIS)

    Swanson, C.D.

    1987-01-01

    The ETA Systems, is a class VII supercomputer featuring multiprocessing, a large hierarchical memory system, high performance input/output, and network support for both batch and interactive processing. Advanced technology used in the ETA 10 includes liquid nitrogen cooled CMOS logic with 20,000 gates per chip, a single printed circuit board for each CPU, and high density static and dynamic MOS memory chips. Software for the ETA 10 includes an underlying kernel that supports multiple user environments, a new ETA FORTRAN compiler with an advanced automatic vectorizer, a multitasking library and debugging tools. Possible developments for future supercomputers from ETA Systems are discussed

  6. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory

  7. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid; Quintin, Jean-Noë l; Lastovetsky, Alexey

    2014-01-01

    -scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel

  8. Adaptability of supercomputers to nuclear computations

    International Nuclear Information System (INIS)

    Asai, Kiyoshi; Ishiguro, Misako; Matsuura, Toshihiko.

    1983-01-01

    Recently in the field of scientific and technical calculation, the usefulness of supercomputers represented by CRAY-1 has been recognized, and they are utilized in various countries. The rapid computation of supercomputers is based on the function of vector computation. The authors investigated the adaptability to vector computation of about 40 typical atomic energy codes for the past six years. Based on the results of investigation, the adaptability of the function of vector computation that supercomputers have to atomic energy codes, the problem regarding the utilization and the future prospect are explained. The adaptability of individual calculation codes to vector computation is largely dependent on the algorithm and program structure used for the codes. The change to high speed by pipeline vector system, the investigation in the Japan Atomic Energy Research Institute and the results, and the examples of expressing the codes for atomic energy, environmental safety and nuclear fusion by vector are reported. The magnification of speed up for 40 examples was from 1.5 to 9.0. It can be said that the adaptability of supercomputers to atomic energy codes is fairly good. (Kako, I.)

  9. SUPERCOMPUTER SIMULATION OF CRITICAL PHENOMENA IN COMPLEX SOCIAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Petrus M.A. Sloot

    2014-09-01

    Full Text Available The paper describes a problem of computer simulation of critical phenomena in complex social systems on a petascale computing systems in frames of complex networks approach. The three-layer system of nested models of complex networks is proposed including aggregated analytical model to identify critical phenomena, detailed model of individualized network dynamics and model to adjust a topological structure of a complex network. The scalable parallel algorithm covering all layers of complex networks simulation is proposed. Performance of the algorithm is studied on different supercomputing systems. The issues of software and information infrastructure of complex networks simulation are discussed including organization of distributed calculations, crawling the data in social networks and results visualization. The applications of developed methods and technologies are considered including simulation of criminal networks disruption, fast rumors spreading in social networks, evolution of financial networks and epidemics spreading.

  10. Direct exploitation of a top 500 Supercomputer for Analysis of CMS Data

    International Nuclear Information System (INIS)

    Cabrillo, I; Cabellos, L; Marco, J; Fernandez, J; Gonzalez, I

    2014-01-01

    The Altamira Supercomputer hosted at the Instituto de Fisica de Cantatbria (IFCA) entered in operation in summer 2012. Its last generation FDR Infiniband network used (for message passing) in parallel jobs, supports the connection to General Parallel File System (GPFS) servers, enabling an efficient simultaneous processing of multiple data demanding jobs. Sharing a common GPFS system and a single LDAP-based identification with the existing Grid clusters at IFCA allows CMS researchers to exploit the large instantaneous capacity of this supercomputer to execute analysis jobs. The detailed experience describing this opportunistic use for skimming and final analysis of CMS 2012 data for a specific physics channel, resulting in an order of magnitude reduction of the waiting time, is presented.

  11. Integration Of PanDA Workload Management System With Supercomputers for ATLAS and Data Intensive Science

    Energy Technology Data Exchange (ETDEWEB)

    De, K [University of Texas at Arlington; Jha, S [Rutgers University; Klimentov, A [Brookhaven National Laboratory (BNL); Maeno, T [Brookhaven National Laboratory (BNL); Nilsson, P [Brookhaven National Laboratory (BNL); Oleynik, D [University of Texas at Arlington; Panitkin, S [Brookhaven National Laboratory (BNL); Wells, Jack C [ORNL; Wenaus, T [Brookhaven National Laboratory (BNL)

    2016-01-01

    The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited for the discovery of a Higgs boson. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than Grid computing can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, Europe and Russia (in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), MIRA supercomputer at Argonne Leadership Computing Facilities (ALCF), Supercomputer at the National Research Center Kurchatov Institute , IT4 in Ostrava and others). Current approach utilizes modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on LCFs multi-core worker nodes. This implementation

  12. Mantle Convection on Modern Supercomputers

    Science.gov (United States)

    Weismüller, J.; Gmeiner, B.; Huber, M.; John, L.; Mohr, M.; Rüde, U.; Wohlmuth, B.; Bunge, H. P.

    2015-12-01

    Mantle convection is the cause for plate tectonics, the formation of mountains and oceans, and the main driving mechanism behind earthquakes. The convection process is modeled by a system of partial differential equations describing the conservation of mass, momentum and energy. Characteristic to mantle flow is the vast disparity of length scales from global to microscopic, turning mantle convection simulations into a challenging application for high-performance computing. As system size and technical complexity of the simulations continue to increase, design and implementation of simulation models for next generation large-scale architectures is handled successfully only in an interdisciplinary context. A new priority program - named SPPEXA - by the German Research Foundation (DFG) addresses this issue, and brings together computer scientists, mathematicians and application scientists around grand challenges in HPC. Here we report from the TERRA-NEO project, which is part of the high visibility SPPEXA program, and a joint effort of four research groups. TERRA-NEO develops algorithms for future HPC infrastructures, focusing on high computational efficiency and resilience in next generation mantle convection models. We present software that can resolve the Earth's mantle with up to 1012 grid points and scales efficiently to massively parallel hardware with more than 50,000 processors. We use our simulations to explore the dynamic regime of mantle convection and assess the impact of small scale processes on global mantle flow.

  13. Visualization environment of the large-scale data of JAEA's supercomputer system

    Energy Technology Data Exchange (ETDEWEB)

    Sakamoto, Kensaku [Japan Atomic Energy Agency, Center for Computational Science and e-Systems, Tokai, Ibaraki (Japan); Hoshi, Yoshiyuki [Research Organization for Information Science and Technology (RIST), Tokai, Ibaraki (Japan)

    2013-11-15

    On research and development of various fields of nuclear energy, visualization of calculated data is especially useful to understand the result of simulation in an intuitive way. Many researchers who run simulations on the supercomputer in Japan Atomic Energy Agency (JAEA) are used to transfer calculated data files from the supercomputer to their local PCs for visualization. In recent years, as the size of calculated data has gotten larger with improvement of supercomputer performance, reduction of visualization processing time as well as efficient use of JAEA network is being required. As a solution, we introduced a remote visualization system which has abilities to utilize parallel processors on the supercomputer and to reduce the usage of network resources by transferring data of intermediate visualization process. This paper reports a study on the performance of image processing with the remote visualization system. The visualization processing time is measured and the influence of network speed is evaluated by varying the drawing mode, the size of visualization data and the number of processors. Based on this study, a guideline for using the remote visualization system is provided to show how the system can be used effectively. An upgrade policy of the next system is also shown. (author)

  14. Patterns for Parallel Software Design

    CERN Document Server

    Ortega-Arjona, Jorge Luis

    2010-01-01

    Essential reading to understand patterns for parallel programming Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin

  15. Graphics supercomputer for computational fluid dynamics research

    Science.gov (United States)

    Liaw, Goang S.

    1994-11-01

    The objective of this project is to purchase a state-of-the-art graphics supercomputer to improve the Computational Fluid Dynamics (CFD) research capability at Alabama A & M University (AAMU) and to support the Air Force research projects. A cutting-edge graphics supercomputer system, Onyx VTX, from Silicon Graphics Computer Systems (SGI), was purchased and installed. Other equipment including a desktop personal computer, PC-486 DX2 with a built-in 10-BaseT Ethernet card, a 10-BaseT hub, an Apple Laser Printer Select 360, and a notebook computer from Zenith were also purchased. A reading room has been converted to a research computer lab by adding some furniture and an air conditioning unit in order to provide an appropriate working environments for researchers and the purchase equipment. All the purchased equipment were successfully installed and are fully functional. Several research projects, including two existing Air Force projects, are being performed using these facilities.

  16. Problem solving in nuclear engineering using supercomputers

    International Nuclear Information System (INIS)

    Schmidt, F.; Scheuermann, W.; Schatz, A.

    1987-01-01

    The availability of supercomputers enables the engineer to formulate new strategies for problem solving. One such strategy is the Integrated Planning and Simulation System (IPSS). With the integrated systems, simulation models with greater consistency and good agreement with actual plant data can be effectively realized. In the present work some of the basic ideas of IPSS are described as well as some of the conditions necessary to build such systems. Hardware and software characteristics as realized are outlined. (orig.) [de

  17. A supercomputing application for reactors core design and optimization

    International Nuclear Information System (INIS)

    Hourcade, Edouard; Gaudier, Fabrice; Arnaud, Gilles; Funtowiez, David; Ammar, Karim

    2010-01-01

    Advanced nuclear reactor designs are often intuition-driven processes where designers first develop or use simplified simulation tools for each physical phenomenon involved. Through the project development, complexity in each discipline increases and implementation of chaining/coupling capabilities adapted to supercomputing optimization process are often postponed to a further step so that task gets increasingly challenging. In the context of renewal in reactor designs, project of first realization are often run in parallel with advanced design although very dependant on final options. As a consequence, the development of tools to globally assess/optimize reactor core features, with the on-going design methods accuracy, is needed. This should be possible within reasonable simulation time and without advanced computer skills needed at project management scale. Also, these tools should be ready to easily cope with modeling progresses in each discipline through project life-time. An early stage development of multi-physics package adapted to supercomputing is presented. The URANIE platform, developed at CEA and based on the Data Analysis Framework ROOT, is very well adapted to this approach. It allows diversified sampling techniques (SRS, LHS, qMC), fitting tools (neuronal networks...) and optimization techniques (genetic algorithm). Also data-base management and visualization are made very easy. In this paper, we'll present the various implementing steps of this core physics tool where neutronics, thermo-hydraulics, and fuel mechanics codes are run simultaneously. A relevant example of optimization of nuclear reactor safety characteristics will be presented. Also, flexibility of URANIE tool will be illustrated with the presentation of several approaches to improve Pareto front quality. (author)

  18. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  19. Explaining the gap between theoretical peak performance and real performance for supercomputer architectures

    International Nuclear Information System (INIS)

    Schoenauer, W.; Haefner, H.

    1993-01-01

    The basic architectures of vector and parallel computers with their properties are presented. Then the memory size and the arithmetic operations in the context of memory bandwidth are discussed. For the exemplary discussion of a single operation micro-measurements of the vector triad for the IBM 3090 VF and the CRAY Y-MP/8 are presented. They reveal the details of the losses for a single operation. Then we analyze the global performance of a whole supercomputer by identifying reduction factors that bring down the theoretical peak performance to the poor real performance. The responsibilities of the manufacturer and of the user for these losses are dicussed. Then the price-performance ratio for different architectures in a snapshot of January 1991 is briefly mentioned. Finally some remarks to a user-friendly architecture for a supercomputer will be made. (orig.)

  20. A universe model confronted to observations

    International Nuclear Information System (INIS)

    Souriau, J.M.

    1982-09-01

    Present work is a detailed study of a Universe model elaborated in several steps, and some of its consequences. Absence zone in quasar spatial distribution is first described; demonstration is made it is sufficient to determine a cosmological model. Each following paragraph is concerned with a type of observation, which is confronted with the model. Universe age and density, redshift-luminosity relation for galaxies and quasars, diameter-redshift relation for radiosources, radiation isotropy at 3 0 K, matter-antimatter contact zone physics. An eventual stratification of universe parallel to this zone is more peculiarly studied; absorption lines in quasar spectra are in way interpreted, just as local super-cluster and local group of galaxies, galaxy HI region orientation, and at last neighbouring galaxy kinematics [fr

  1. Self-Confrontation of Teachers.

    Science.gov (United States)

    Schmuck, Richard A.

    Simply presenting teachers with information about discrepancies between their ideal and their actual classroom performances does not, in itself, lead to constructive change. In part, this is because teachers confronted with such discrepancies experience dissonance which often gives rise to anxiety. This paper discusses the psychological processes…

  2. A workbench for tera-flop supercomputing

    International Nuclear Information System (INIS)

    Resch, M.M.; Kuester, U.; Mueller, M.S.; Lang, U.

    2003-01-01

    Supercomputers currently reach a peak performance in the range of TFlop/s. With but one exception - the Japanese Earth Simulator - none of these systems has so far been able to also show a level of sustained performance for a variety of applications that comes close to the peak performance. Sustained TFlop/s are therefore rarely seen. The reasons are manifold and are well known: Bandwidth and latency both for main memory and for the internal network are the key internal technical problems. Cache hierarchies with large caches can bring relief but are no remedy to the problem. However, there are not only technical problems that inhibit the full exploitation by scientists of the potential of modern supercomputers. More and more organizational issues come to the forefront. This paper shows the approach of the High Performance Computing Center Stuttgart (HLRS) to deliver a sustained performance of TFlop/s for a wide range of applications from a large group of users spread over Germany. The core of the concept is the role of the data. Around this we design a simulation workbench that hides the complexity of interacting computers, networks and file systems from the user. (authors)

  3. PNNL supercomputer to become largest computing resource on the Grid

    CERN Multimedia

    2002-01-01

    Hewlett Packard announced that the US DOE Pacific Northwest National Laboratory will connect a 9.3-teraflop HP supercomputer to the DOE Science Grid. This will be the largest supercomputer attached to a computer grid anywhere in the world (1 page).

  4. Parallel adaptation of a vectorised quantumchemical program system

    International Nuclear Information System (INIS)

    Van Corler, L.C.H.; Van Lenthe, J.H.

    1987-01-01

    Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clockcycle times more than a factor 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallellism on the host computer (IBM 4381) and real parallellism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of that many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallellise the programs are shown. Also, some examples are presented to illustrate effectiveness and performance of the system in Rome for these type of calculations

  5. Supercomputing - Use Cases, Advances, The Future (2/2)

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the second day, we will focus on software and software paradigms driving supercomputers, workloads that need supercomputing treatment, advances in technology and possible future developments. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and i...

  6. Integration Of PanDA Workload Management System With Supercomputers for ATLAS and Data Intensive Science

    Science.gov (United States)

    Klimentov, A.; De, K.; Jha, S.; Maeno, T.; Nilsson, P.; Oleynik, D.; Panitkin, S.; Wells, J.; Wenaus, T.

    2016-10-01

    The.LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility. Current approach utilizes modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on LCFs multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for ALICE and ATLAS experiments and it is in full pro duction for the ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the

  7. Integration Of PanDA Workload Management System With Supercomputers for ATLAS and Data Intensive Science

    International Nuclear Information System (INIS)

    Klimentov, A; Maeno, T; Nilsson, P; Panitkin, S; Wenaus, T; De, K; Oleynik, D; Jha, S; Wells, J

    2016-01-01

    The.LHC, operating at CERN, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe. ATLAS, one of the largest collaborations ever assembled in the sciences, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses PanDA (Production and Data Analysis) Workload Management System for managing the workflow for all data processing on over 150 data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. While PanDA currently uses more than 250,000 cores with a peak performance of 0.3 petaFLOPS, LHC data taking runs require more resources than grid can possibly provide. To alleviate these challenges, LHC experiments are engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with supercomputers in United States, in particular with Titan supercomputer at Oak Ridge Leadership Computing Facility. Current approach utilizes modified PanDA pilot framework for job submission to the supercomputers batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on LCFs multi-core worker nodes. This implementation was tested with a variety of Monte-Carlo workloads on several supercomputing platforms for ALICE and ATLAS experiments and it is in full pro duction for the ATLAS since September 2015. We will present our current accomplishments with running PanDA at supercomputers and demonstrate our ability to use PanDA as a portal independent of the

  8. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  9. HPL and STREAM Benchmarks on SANAM Supercomputer

    KAUST Repository

    Bin Sulaiman, Riman A.

    2017-01-01

    SANAM supercomputer was jointly built by KACST and FIAS in 2012 ranking second that year in the Green500 list with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous accelerator-based HPC system that has 300 compute nodes. Each node includes two Intel Xeon E5?2650 CPUs, two AMD FirePro S10000 dual GPUs and 128 GiB of main memory. In this work, the seven benchmarks of HPCC were installed and configured to reassess the performance of SANAM, as part of an unpublished master thesis, after it was reassembled in the Kingdom of Saudi Arabia. We present here detailed results of HPL and STREAM benchmarks.

  10. HPL and STREAM Benchmarks on SANAM Supercomputer

    KAUST Repository

    Bin Sulaiman, Riman A.

    2017-03-13

    SANAM supercomputer was jointly built by KACST and FIAS in 2012 ranking second that year in the Green500 list with a power efficiency of 2.3 GFLOPS/W (Rohr et al., 2014). It is a heterogeneous accelerator-based HPC system that has 300 compute nodes. Each node includes two Intel Xeon E5?2650 CPUs, two AMD FirePro S10000 dual GPUs and 128 GiB of main memory. In this work, the seven benchmarks of HPCC were installed and configured to reassess the performance of SANAM, as part of an unpublished master thesis, after it was reassembled in the Kingdom of Saudi Arabia. We present here detailed results of HPL and STREAM benchmarks.

  11. Supercomputing Centers and Electricity Service Providers

    DEFF Research Database (Denmark)

    Patki, Tapasya; Bates, Natalie; Ghatikar, Girish

    2016-01-01

    from a detailed, quantitative survey-based analysis and compare the perspectives of the European grid and SCs to the ones of the United States (US). We then show that contrary to the expectation, SCs in the US are more open toward cooperating and developing demand-management strategies with their ESPs......Supercomputing Centers (SCs) have high and variable power demands, which increase the challenges of the Electricity Service Providers (ESPs) with regards to efficient electricity distribution and reliable grid operation. High penetration of renewable energy generation further exacerbates...... this problem. In order to develop a symbiotic relationship between the SCs and their ESPs and to support effective power management at all levels, it is critical to understand and analyze how the existing relationships were formed and how these are expected to evolve. In this paper, we first present results...

  12. Visualization on supercomputing platform level II ASC milestone (3537-1B) results from Sandia.

    Energy Technology Data Exchange (ETDEWEB)

    Geveci, Berk (Kitware, Inc., Clifton Park, NY); Fabian, Nathan; Marion, Patrick (Kitware, Inc., Clifton Park, NY); Moreland, Kenneth D.

    2010-09-01

    This report provides documentation for the completion of the Sandia portion of the ASC Level II Visualization on the platform milestone. This ASC Level II milestone is a joint milestone between Sandia National Laboratories and Los Alamos National Laboratories. This milestone contains functionality required for performing visualization directly on a supercomputing platform, which is necessary for peta-scale visualization. Sandia's contribution concerns in-situ visualization, running a visualization in tandem with a solver. Visualization and analysis of petascale data is limited by several factors which must be addressed as ACES delivers the Cielo platform. Two primary difficulties are: (1) Performance of interactive rendering, which is most computationally intensive portion of the visualization process. For terascale platforms, commodity clusters with graphics processors(GPUs) have been used for interactive rendering. For petascale platforms, visualization and rendering may be able to run efficiently on the supercomputer platform itself. (2) I/O bandwidth, which limits how much information can be written to disk. If we simply analyze the sparse information that is saved to disk we miss the opportunity to analyze the rich information produced every timestep by the simulation. For the first issue, we are pursuing in-situ analysis, in which simulations are coupled directly with analysis libraries at runtime. This milestone will evaluate the visualization and rendering performance of current and next generation supercomputers in contrast to GPU-based visualization clusters, and evaluate the performance of common analysis libraries coupled with the simulation that analyze and write data to disk during a running simulation. This milestone will explore, evaluate and advance the maturity level of these technologies and their applicability to problems of interest to the ASC program. Scientific simulation on parallel supercomputers is traditionally performed in four

  13. Integration of Titan supercomputer at OLCF with ATLAS Production System

    Science.gov (United States)

    Barreiro Megino, F.; De, K.; Jha, S.; Klimentov, A.; Maeno, T.; Nilsson, P.; Oleynik, D.; Padolski, S.; Panitkin, S.; Wells, J.; Wenaus, T.; ATLAS Collaboration

    2017-10-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of Petabytes of data and the rate of data processing already exceeds Exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this paper we will describe a project aimed at integration of ATLAS Production System with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA Pilot framework for job submission to Titan’s batch queues and local data management, with lightweight MPI wrappers to run single node workloads in parallel on Titan’s multi-core worker nodes. It provides for running of standard ATLAS production jobs on unused resources (backfill) on Titan. The system already allowed ATLAS to collect on Titan millions of core-hours per month, execute hundreds of thousands jobs, while simultaneously improving Titans utilization efficiency. We will discuss the details of the implementation, current experience with running the system, as well as future plans aimed at improvements in scalability and efficiency. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to

  14. Earth and environmental science in the 1980's: Part 1: Environmental data systems, supercomputer facilities and networks

    Science.gov (United States)

    1986-01-01

    Overview descriptions of on-line environmental data systems, supercomputer facilities, and networks are presented. Each description addresses the concepts of content, capability, and user access relevant to the point of view of potential utilization by the Earth and environmental science community. The information on similar systems or facilities is presented in parallel fashion to encourage and facilitate intercomparison. In addition, summary sheets are given for each description, and a summary table precedes each section.

  15. Comparison of some parallelization strategies of thermalhydraulic codes on GPUs

    International Nuclear Information System (INIS)

    Jendoubi, T.; Bergeaud, V.; Geay, A.

    2013-01-01

    Modern supercomputers architecture is now often based on hybrid concepts combining parallelism to distributed memory, parallelism to shared memory and also to GPUs (Graphic Process Units). In this work, we propose a new approach to take advantage of these graphic cards in thermohydraulics algorithms. (authors)

  16. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  17. OpenMP Performance on the Columbia Supercomputer

    Science.gov (United States)

    Haoqiang, Jin; Hood, Robert

    2005-01-01

    This presentation discusses Columbia World Class Supercomputer which is one of the world's fastest supercomputers providing 61 TFLOPs (10/20/04). Conceived, designed, built, and deployed in just 120 days. A 20-node supercomputer built on proven 512-processor nodes. The largest SGI system in the world with over 10,000 Intel Itanium 2 processors and provides the largest node size incorporating commodity parts (512) and the largest shared-memory environment (2048) with 88% efficiency tops the scalar systems on the Top500 list.

  18. Nuclear Terrorism and its Confrontation

    International Nuclear Information System (INIS)

    Al Barody, M.M.

    2006-01-01

    The whole world first knew nuclear terrorism during the second world war through the use of excessive violence that to terror exercised by one country against another, as was carried out by USA when it exploded two nuclear bombs over Hiroshima and Nagasaki t the end of the war. there are numerous types of nuclear terrorism that can be performed by individuals or organized groups for achieving political or social objectives. the definition of the term t errorism i s correlated with u sing means capable of creating a case of public dnger . that property exists in all types of direct or indirect nuclear terrorism . the present study is divided into two chapters. Chapter one deals with nuclear terrorism and consists of two sections , the first deals with the identification of the nature of nuclear terrorism an the second deals with organize nuclear terrorism on the international level. Chapter two deals with the confrontation of nuclear terrorism in two sections. the first deals with the role of the state in combating against nuclear terrorism nd the second deals with combating against nuclear terrorism on the international level. while internally it is confronted through promulgation of legislations that deal with the protection against nuclear terrorism as well as the national legal instruments for protection of nuclear materials and installation and combating illicit trafficking of nuclear materials, confrontation of nuclear terrorism on the international level is carried out through the promulgation of international convention such as that on suppression of actions of nuclear terrorism which shall be opened for signature on sept.14 -2005 according to the recommendation the general assembly of the united nations in its 59 t h session

  19. Games Based Study of Nonblind Confrontation

    Directory of Open Access Journals (Sweden)

    Yixian Yang

    2017-01-01

    Full Text Available Security confrontation is the second cornerstone of the General Theory of Security. And it can be divided into two categories: blind confrontation and nonblind confrontation between attackers and defenders. In this paper, we study the nonblind confrontation by some well-known games. We show the probability of winning and losing between the attackers and defenders from the perspective of channel capacity. We establish channel models and find that the attacker or the defender wining one time is equivalent to one bit transmitted successfully in the channel. This paper also gives unified solutions for all the nonblind confrontations.

  20. Navigating between Dialogue and Confrontation

    DEFF Research Database (Denmark)

    Thuesen, Frederik

    2011-01-01

    such as human rights and ethnic discrimination, issues that may involve strong emotions. Drawing inspiration from a qualitative methodology focusing on resistance and power, the article argues that in such situations the interviewer needs to integrate both dialogic and agonistic interview methodologies through...... phronesis, Aristotle’s concept of practical rationality. A phronetic approach, involving reflections on the link between reason and emotions, is well suited for handling both dialogue and confrontation in the interview process. Empirically, the paper draws on interviews with representatives of trade unions...... and employer organizations on the subject of human rights and ethnic discrimination in the Danish labor market....

  1. The Pawsey Supercomputer geothermal cooling project

    Science.gov (United States)

    Regenauer-Lieb, K.; Horowitz, F.; Western Australian Geothermal Centre Of Excellence, T.

    2010-12-01

    The Australian Government has funded the Pawsey supercomputer in Perth, Western Australia, providing computational infrastructure intended to support the future operations of the Australian Square Kilometre Array radiotelescope and to boost next-generation computational geosciences in Australia. Supplementary funds have been directed to the development of a geothermal exploration well to research the potential for direct heat use applications at the Pawsey Centre site. Cooling the Pawsey supercomputer may be achieved by geothermal heat exchange rather than by conventional electrical power cooling, thus reducing the carbon footprint of the Pawsey Centre and demonstrating an innovative green technology that is widely applicable in industry and urban centres across the world. The exploration well is scheduled to be completed in 2013, with drilling due to commence in the third quarter of 2011. One year is allocated to finalizing the design of the exploration, monitoring and research well. Success in the geothermal exploration and research program will result in an industrial-scale geothermal cooling facility at the Pawsey Centre, and will provide a world-class student training environment in geothermal energy systems. A similar system is partially funded and in advanced planning to provide base-load air-conditioning for the main campus of the University of Western Australia. Both systems are expected to draw ~80-95 degrees C water from aquifers lying between 2000 and 3000 meters depth from naturally permeable rocks of the Perth sedimentary basin. The geothermal water will be run through absorption chilling devices, which only require heat (as opposed to mechanical work) to power a chilled water stream adequate to meet the cooling requirements. Once the heat has been removed from the geothermal water, licensing issues require the water to be re-injected back into the aquifer system. These systems are intended to demonstrate the feasibility of powering large-scale air

  2. Modeling radiative transport in ICF plasmas on an IBM SP2 supercomputer

    International Nuclear Information System (INIS)

    Johansen, J.A.; MacFarlane, J.J.; Moses, G.A.

    1995-01-01

    At the University of Wisconsin-Madison the authors have integrated a collisional-radiative-equilibrium model into their CONRAD radiation-hydrodynamics code. This integrated package allows them to accurately simulate the transport processes involved in ICF plasmas; including the important effects of self-absorption of line-radiation. However, as they increase the amount of atomic structure utilized in their transport models, the computational demands increase nonlinearly. In an attempt to meet this increased computational demand, they have recently embarked on a mission to parallelize the CONRAD program. The parallel CONRAD development is being performed on an IBM SP2 supercomputer. The parallelism is based on a message passing paradigm, and is being implemented using PVM. At the present time they have determined that approximately 70% of the sequential program can be executed in parallel. Accordingly, they expect that the parallel version will yield a speedup on the order of three times that of the sequential version. This translates into only 10 hours of execution time for the parallel version, whereas the sequential version required 30 hours

  3. Confronting the stigma of epilepsy

    Directory of Open Access Journals (Sweden)

    Sanjeev V Thomas

    2011-01-01

    Full Text Available Stigma and resultant psychosocial issues are major hurdles that people with epilepsy confront in their daily life. People with epilepsy, particularly women, living in economically weak countries are often ill equipped to handle the stigma that they experience at multiple levels. This paper offers a systematic review of the research on stigma from sociology and social psychology and details how stigma linked to epilepsy or similar conditions can result in stereotyping, prejudice and discrimination. We also briefly discuss the strategies that are most commonly utilized to mitigate stigma. Neurologists and other health care providers, social workers, support groups and policy makers working with epilepsy need to have a deep understanding of the social and cultural perceptions of epilepsy and the related stigma. It is necessary that societies establish unique determinants of stigma and set up appropriate strategies to mitigate stigma and facilitate the complete inclusion of people with epilepsy as well as mitigating any existing discrimination.

  4. Education confronts the energy dilemma

    Energy Technology Data Exchange (ETDEWEB)

    None

    1977-01-01

    The conference was convened to present a role that America's schools could play in solving or coping with the energy crisis. Eleven sessions were conducted to fulfill this concern: Our Energy Crisis and Education: A Critical Assessment; The Energy Agenda at the Office of Education; Energy Resources: Scenarios for the Future; The Moral Dilemma of Energy Education; Constraints Influencing Education's Role; Energy Education: What's Been Done to Date; Practitioners Discuss Their Future Roles, Responsibilities; Politics of Energy Education; Confronting the Energy Dilemma; The Meaning of Scarcity; and The Impact of the Carter Energy Program on American Schools. Summary reports and reactions to the conference conclude the proceedings. (MCW)

  5. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Science.gov (United States)

    DeTar, Carleton; Gottlieb, Steven; Li, Ruizi; Toussaint, Doug

    2018-03-01

    With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  6. MILC Code Performance on High End CPU and GPU Supercomputer Clusters

    Directory of Open Access Journals (Sweden)

    DeTar Carleton

    2018-01-01

    Full Text Available With recent developments in parallel supercomputing architecture, many core, multi-core, and GPU processors are now commonplace, resulting in more levels of parallelism, memory hierarchy, and programming complexity. It has been necessary to adapt the MILC code to these new processors starting with NVIDIA GPUs, and more recently, the Intel Xeon Phi processors. We report on our efforts to port and optimize our code for the Intel Knights Landing architecture. We consider performance of the MILC code with MPI and OpenMP, and optimizations with QOPQDP and QPhiX. For the latter approach, we concentrate on the staggered conjugate gradient and gauge force. We also consider performance on recent NVIDIA GPUs using the QUDA library.

  7. Supercomputing - Use Cases, Advances, The Future (1/2)

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    Supercomputing has become a staple of science and the poster child for aggressive developments in silicon technology, energy efficiency and programming. In this series we examine the key components of supercomputing setups and the various advances – recent and past – that made headlines and delivered bigger and bigger machines. We also take a closer look at the future prospects of supercomputing, and the extent of its overlap with high throughput computing, in the context of main use cases ranging from oil exploration to market simulation. On the first day, we will focus on the history and theory of supercomputing, the top500 list and the hardware that makes supercomputers tick. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and initiated projects with the private sector (e.g. HP an...

  8. Integration of PanDA workload management system with Titan supercomputer at OLCF

    Science.gov (United States)

    De, K.; Klimentov, A.; Oleynik, D.; Panitkin, S.; Petrosyan, A.; Schovancova, J.; Vaniachine, A.; Wenaus, T.

    2015-12-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, the future LHC data taking runs will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). The current approach utilizes a modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multicore worker nodes. It also gives PanDA new capability to collect, in real time, information about unused worker nodes on Titan, which allows precise definition of the size and duration of jobs submitted to Titan according to available free resources. This capability significantly reduces PanDA job wait time while improving Titan's utilization efficiency. This implementation was tested with a variety of Monte-Carlo workloads on Titan and is being tested on several other supercomputing platforms. Notice: This manuscript has been authored, by employees of Brookhaven Science Associates, LLC under Contract No. DE-AC02-98CH10886 with the U.S. Department of Energy. The publisher by accepting the manuscript for publication acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

  9. Confronting shibboleths of dental education.

    Science.gov (United States)

    Masella, Richard S

    2005-10-01

    Shibboleths are common expressions presented as indisputable truths. When used in educational discussions, they reflect "motherhood and apple pie" viewpoints and tend to bring debate to a halt. Use of shibboleths may precede a desired imposition of "locksteps" in educational programming and are easily perceived as paternalistic by recipients. Nine shibboleths are presented as common beliefs of dental faculty and administrators. Evidence contradicting the veracity of the "obvious truths" is offered. The traditional "splendid isolation" of dentistry contributes to parochialism and belief in false shibboleths. Sound principles of higher and health professions education, student learning, and dental practice apply to dental education as to all health disciplines. Student passivity in dental education is not the best preparation for proficiency in dental practice. The master teacher possesses a repertoire of methodologies specific to meeting defined educational objectives. Active learning experiences bear close resemblances to professional duties and responsibilities and internally motivate future doctors of dental medicine. The difficulty in achieving curricular change leads to curricular entrenchment. Dentistry and dental education should not trade their ethical high ground for the relatively low ethical standards of the business world. Principles of professional ethics should govern relationships between dentists, whether within the dental school workplace or in practice. Suggestions are made on how to confront shibboleths in dental school settings.

  10. Emergency Managers Confront Climate Change

    Directory of Open Access Journals (Sweden)

    John R. Labadie

    2011-08-01

    Full Text Available Emergency managers will have to deal with the impending, uncertain, and possibly extreme effects of climate change. Yet, many emergency managers are not aware of the full range of possible effects, and they are unsure of their place in the effort to plan for, adapt to, and cope with those effects. This may partly reflect emergency mangers’ reluctance to get caught up in the rancorous—and politically-charged—debate about climate change, but it mostly is due to the worldview shared by most emergency managers. We focus on: extreme events; acute vs. chronic hazards (floods vs. droughts; a shorter event horizon (5 years vs. 75–100 years; and a shorter planning and operational cycle. This paper explores the important intersection of emergency management, environmental management, and climate change mitigation and adaptation. It examines the different definitions of terms common to all three fields, the overlapping strategies used in all three fields, and the best means of collaboration and mutual re-enforcement among the three to confront and solve the many possible futures that we may face in the climate change world.

  11. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory

  12. Status of the Fermilab lattice supercomputer project

    International Nuclear Information System (INIS)

    Mackenzie, P.; Eichten, E.; Hockney, G.

    1988-10-01

    Fermilab has completed construction of a sixteen node (320 megaflop peak speed) parallel computer for lattice gauge theory calculations. The architecture was designed to provide the highest possible cost effectiveness while maintaining a high level of programmability and constraining as little as possible the types of lattice problems which can be done on it. The machine is programmed in C. It is a prototype for a 256 node (5 gigaflop peak speed) computer which will be assembled this winter. 6 refs

  13. Integration of PanDA workload management system with Titan supercomputer at OLCF

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00300320; Klimentov, Alexei; Oleynik, Danila; Panitkin, Sergey; Petrosyan, Artem; Vaniachine, Alexandre; Wenaus, Torre; Schovancova, Jaroslava

    2015-01-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently distributes jobs to more than 100,000 cores at well over 100 Grid sites, next LHC data taking run will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modi ed PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multi-core worker nodes. It also gives PanDA new capability to collect, in real time, information about unused...

  14. Integration of PanDA workload management system with Titan supercomputer at OLCF

    CERN Document Server

    Panitkin, Sergey; The ATLAS collaboration; Klimentov, Alexei; Oleynik, Danila; Petrosyan, Artem; Schovancova, Jaroslava; Vaniachine, Alexandre; Wenaus, Torre

    2015-01-01

    The PanDA (Production and Distributed Analysis) workload management system (WMS) was developed to meet the scale and complexity of LHC distributed computing for the ATLAS experiment. While PanDA currently uses more than 100,000 cores at well over 100 Grid sites with a peak performance of 0.3 petaFLOPS, next LHC data taking run will require more resources than Grid computing can possibly provide. To alleviate these challenges, ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. We will describe a project aimed at integration of PanDA WMS with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA pilot framework for job submission to Titan's batch queues and local data management, with light-weight MPI wrappers to run single threaded workloads in parallel on Titan's multi-core worker nodes. It also gives PanDA new capability to collect, in real tim...

  15. Communication Characterization and Optimization of Applications Using Topology-Aware Task Mapping on Large Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; D' Azevedo, Eduardo [ORNL; Philip, Bobby [ORNL; Worley, Patrick H [ORNL

    2016-01-01

    On large supercomputers, the job scheduling systems may assign a non-contiguous node allocation for user applications depending on available resources. With parallel applications using MPI (Message Passing Interface), the default process ordering does not take into account the actual physical node layout available to the application. This contributes to non-locality in terms of physical network topology and impacts communication performance of the application. In order to mitigate such performance penalties, this work describes techniques to identify suitable task mapping that takes the layout of the allocated nodes as well as the application's communication behavior into account. During the first phase of this research, we instrumented and collected performance data to characterize communication behavior of critical US DOE (United States - Department of Energy) applications using an augmented version of the mpiP tool. Subsequently, we developed several reordering methods (spectral bisection, neighbor join tree etc.) to combine node layout and application communication data for optimized task placement. We developed a tool called mpiAproxy to facilitate detailed evaluation of the various reordering algorithms without requiring full application executions. This work presents a comprehensive performance evaluation (14,000 experiments) of the various task mapping techniques in lowering communication costs on Titan, the leadership class supercomputer at Oak Ridge National Laboratory.

  16. Unique Methodologies for Nano/Micro Manufacturing Job Training Via Desktop Supercomputer Modeling and Simulation

    Energy Technology Data Exchange (ETDEWEB)

    Kimball, Clyde [Northern Illinois Univ., DeKalb, IL (United States); Karonis, Nicholas [Northern Illinois Univ., DeKalb, IL (United States); Lurio, Laurence [Northern Illinois Univ., DeKalb, IL (United States); Piot, Philippe [Northern Illinois Univ., DeKalb, IL (United States); Xiao, Zhili [Northern Illinois Univ., DeKalb, IL (United States); Glatz, Andreas [Northern Illinois Univ., DeKalb, IL (United States); Pohlman, Nicholas [Northern Illinois Univ., DeKalb, IL (United States); Hou, Minmei [Northern Illinois Univ., DeKalb, IL (United States); Demir, Veysel [Northern Illinois Univ., DeKalb, IL (United States); Song, Jie [Northern Illinois Univ., DeKalb, IL (United States); Duffin, Kirk [Northern Illinois Univ., DeKalb, IL (United States); Johns, Mitrick [Northern Illinois Univ., DeKalb, IL (United States); Sims, Thomas [Northern Illinois Univ., DeKalb, IL (United States); Yin, Yanbin [Northern Illinois Univ., DeKalb, IL (United States)

    2012-11-21

    This project establishes an initiative in high speed (Teraflop)/large-memory desktop supercomputing for modeling and simulation of dynamic processes important for energy and industrial applications. It provides a training ground for employment of current students in an emerging field with skills necessary to access the large supercomputing systems now present at DOE laboratories. It also provides a foundation for NIU faculty to quantum leap beyond their current small cluster facilities. The funding extends faculty and student capability to a new level of analytic skills with concomitant publication avenues. The components of the Hewlett Packard computer obtained by the DOE funds create a hybrid combination of a Graphics Processing System (12 GPU/Teraflops) and a Beowulf CPU system (144 CPU), the first expandable via the NIU GAEA system to ~60 Teraflops integrated with a 720 CPU Beowulf system. The software is based on access to the NVIDIA/CUDA library and the ability through MATLAB multiple licenses to create additional local programs. A number of existing programs are being transferred to the CPU Beowulf Cluster. Since the expertise necessary to create the parallel processing applications has recently been obtained at NIU, this effort for software development is in an early stage. The educational program has been initiated via formal tutorials and classroom curricula designed for the coming year. Specifically, the cost focus was on hardware acquisitions and appointment of graduate students for a wide range of applications in engineering, physics and computer science.

  17. Argonne Leadership Computing Facility 2011 annual report : Shaping future supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Papka, M.; Messina, P.; Coffey, R.; Drugan, C. (LCF)

    2012-08-16

    The ALCF's Early Science Program aims to prepare key applications for the architecture and scale of Mira and to solidify libraries and infrastructure that will pave the way for other future production applications. Two billion core-hours have been allocated to 16 Early Science projects on Mira. The projects, in addition to promising delivery of exciting new science, are all based on state-of-the-art, petascale, parallel applications. The project teams, in collaboration with ALCF staff and IBM, have undertaken intensive efforts to adapt their software to take advantage of Mira's Blue Gene/Q architecture, which, in a number of ways, is a precursor to future high-performance-computing architecture. The Argonne Leadership Computing Facility (ALCF) enables transformative science that solves some of the most difficult challenges in biology, chemistry, energy, climate, materials, physics, and other scientific realms. Users partnering with ALCF staff have reached research milestones previously unattainable, due to the ALCF's world-class supercomputing resources and expertise in computation science. In 2011, the ALCF's commitment to providing outstanding science and leadership-class resources was honored with several prestigious awards. Research on multiscale brain blood flow simulations was named a Gordon Bell Prize finalist. Intrepid, the ALCF's BG/P system, ranked No. 1 on the Graph 500 list for the second consecutive year. The next-generation BG/Q prototype again topped the Green500 list. Skilled experts at the ALCF enable researchers to conduct breakthrough science on the Blue Gene system in key ways. The Catalyst Team matches project PIs with experienced computational scientists to maximize and accelerate research in their specific scientific domains. The Performance Engineering Team facilitates the effective use of applications on the Blue Gene system by assessing and improving the algorithms used by applications and the techniques used to

  18. Design considerations for parallel graphics libraries

    Science.gov (United States)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  19. PFLOTRAN: Reactive Flow & Transport Code for Use on Laptops to Leadership-Class Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Hammond, Glenn E.; Lichtner, Peter C.; Lu, Chuan; Mills, Richard T.

    2012-04-18

    PFLOTRAN, a next-generation reactive flow and transport code for modeling subsurface processes, has been designed from the ground up to run efficiently on machines ranging from leadership-class supercomputers to laptops. Based on an object-oriented design, the code is easily extensible to incorporate additional processes. It can interface seamlessly with Fortran 9X, C and C++ codes. Domain decomposition parallelism is employed, with the PETSc parallel framework used to manage parallel solvers, data structures and communication. Features of the code include a modular input file, implementation of high-performance I/O using parallel HDF5, ability to perform multiple realization simulations with multiple processors per realization in a seamless manner, and multiple modes for multiphase flow and multicomponent geochemical transport. Chemical reactions currently implemented in the code include homogeneous aqueous complexing reactions and heterogeneous mineral precipitation/dissolution, ion exchange, surface complexation and a multirate kinetic sorption model. PFLOTRAN has demonstrated petascale performance using 2{sup 17} processor cores with over 2 billion degrees of freedom. Accomplishments achieved to date include applications to the Hanford 300 Area and modeling CO{sub 2} sequestration in deep geologic formations.

  20. Convex unwraps its first grown-up supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Manuel, T.

    1988-03-03

    Convex Computer Corp.'s new supercomputer family is even more of an industry blockbuster than its first system. At a tenfold jump in performance, it's far from just an incremental upgrade over its first minisupercomputer, the C-1. The heart of the new family, the new C-2 processor, churning at 50 million floating-point operations/s, spawns a group of systems whose performance could pass for some fancy supercomputers-namely those of the Cray Research Inc. family. When added to the C-1, Convex's five new supercomputers create the C series, a six-member product group offering a performance range from 20 to 200 Mflops. They mark an important transition for Convex from a one-product high-tech startup to a multinational company with a wide-ranging product line. It's a tough transition but the Richardson, Texas, company seems to be doing it. The extended product line propels Convex into the upper end of the minisupercomputer class and nudges it into the low end of the big supercomputers. It positions Convex in an uncrowded segment of the market in the $500,000 to $1 million range offering 50 to 200 Mflops of performance. The company is making this move because the minisuper area, which it pioneered, quickly became crowded with new vendors, causing prices and gross margins to drop drastically.

  1. QCD on the BlueGene/L Supercomputer

    International Nuclear Information System (INIS)

    Bhanot, G.; Chen, D.; Gara, A.; Sexton, J.; Vranas, P.

    2005-01-01

    In June 2004 QCD was simulated for the first time at sustained speed exceeding 1 TeraFlops in the BlueGene/L supercomputer at the IBM T.J. Watson Research Lab. The implementation and performance of QCD in the BlueGene/L is presented

  2. QCD on the BlueGene/L Supercomputer

    Science.gov (United States)

    Bhanot, G.; Chen, D.; Gara, A.; Sexton, J.; Vranas, P.

    2005-03-01

    In June 2004 QCD was simulated for the first time at sustained speed exceeding 1 TeraFlops in the BlueGene/L supercomputer at the IBM T.J. Watson Research Lab. The implementation and performance of QCD in the BlueGene/L is presented.

  3. Supercomputers and the future of computational atomic scattering physics

    International Nuclear Information System (INIS)

    Younger, S.M.

    1989-01-01

    The advent of the supercomputer has opened new vistas for the computational atomic physicist. Problems of hitherto unparalleled complexity are now being examined using these new machines, and important connections with other fields of physics are being established. This talk briefly reviews some of the most important trends in computational scattering physics and suggests some exciting possibilities for the future. 7 refs., 2 figs

  4. Mathematical methods and supercomputing in nuclear applications. Proceedings. Vol. 2

    International Nuclear Information System (INIS)

    Kuesters, H.; Stein, E.; Werner, W.

    1993-04-01

    All papers of the two volumes are separately indexed in the data base. Main topics are: Progress in advanced numerical techniques, fluid mechanics, on-line systems, artificial intelligence applications, nodal methods reactor kinetics, reactor design, supercomputer architecture, probabilistic estimation of risk assessment, methods in transport theory, advances in Monte Carlo techniques, and man-machine interface. (orig.)

  5. Mathematical methods and supercomputing in nuclear applications. Proceedings. Vol. 1

    International Nuclear Information System (INIS)

    Kuesters, H.; Stein, E.; Werner, W.

    1993-04-01

    All papers of the two volumes are separately indexed in the data base. Main topics are: Progress in advanced numerical techniques, fluid mechanics, on-line systems, artificial intelligence applications, nodal methods reactor kinetics, reactor design, supercomputer architecture, probabilistic estimation of risk assessment, methods in transport theory, advances in Monte Carlo techniques, and man-machine interface. (orig.)

  6. Role of supercomputers in magnetic fusion and energy research programs

    International Nuclear Information System (INIS)

    Killeen, J.

    1985-06-01

    The importance of computer modeling in magnetic fusion (MFE) and energy research (ER) programs is discussed. The need for the most advanced supercomputers is described, and the role of the National Magnetic Fusion Energy Computer Center in meeting these needs is explained

  7. Flux-Level Transit Injection Experiments with NASA Pleiades Supercomputer

    Science.gov (United States)

    Li, Jie; Burke, Christopher J.; Catanzarite, Joseph; Seader, Shawn; Haas, Michael R.; Batalha, Natalie; Henze, Christopher; Christiansen, Jessie; Kepler Project, NASA Advanced Supercomputing Division

    2016-06-01

    Flux-Level Transit Injection (FLTI) experiments are executed with NASA's Pleiades supercomputer for the Kepler Mission. The latest release (9.3, January 2016) of the Kepler Science Operations Center Pipeline is used in the FLTI experiments. Their purpose is to validate the Analytic Completeness Model (ACM), which can be computed for all Kepler target stars, thereby enabling exoplanet occurrence rate studies. Pleiades, a facility of NASA's Advanced Supercomputing Division, is one of the world's most powerful supercomputers and represents NASA's state-of-the-art technology. We discuss the details of implementing the FLTI experiments on the Pleiades supercomputer. For example, taking into account that ~16 injections are generated by one core of the Pleiades processors in an hour, the “shallow” FLTI experiment, in which ~2000 injections are required per target star, can be done for 16% of all Kepler target stars in about 200 hours. Stripping down the transit search to bare bones, i.e. only searching adjacent high/low periods at high/low pulse durations, makes the computationally intensive FLTI experiments affordable. The design of the FLTI experiments and the analysis of the resulting data are presented in “Validating an Analytic Completeness Model for Kepler Target Stars Based on Flux-level Transit Injection Experiments” by Catanzarite et al. (#2494058).Kepler was selected as the 10th mission of the Discovery Program. Funding for the Kepler Mission has been provided by the NASA Science Mission Directorate.

  8. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  9. Reactive flow simulations in complex geometries with high-performance supercomputing

    International Nuclear Information System (INIS)

    Rehm, W.; Gerndt, M.; Jahn, W.; Vogelsang, R.; Binninger, B.; Herrmann, M.; Olivier, H.; Weber, M.

    2000-01-01

    In this paper, we report on a modern field code cluster consisting of state-of-the-art reactive Navier-Stokes- and reactive Euler solvers that has been developed on vector- and parallel supercomputers at the research center Juelich. This field code cluster is used for hydrogen safety analyses of technical systems, for example, in the field of nuclear reactor safety and conventional hydrogen demonstration plants with fuel cells. Emphasis is put on the assessment of combustion loads, which could result from slow, fast or rapid flames, including transition from deflagration to detonation. As a sample of proof tests, the special tools have been tested for specific tasks, based on the comparison of experimental and numerical results, which are in reasonable agreement. (author)

  10. Confronting New Demands : Inclusive Growth, Inclusive Trade ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Confronting New Demands : Inclusive Growth, Inclusive Trade. Policymakers, businesspeople and civil society advocates need evidence-based research to react ... understood implications, such as labour standards and intellectual property; ...

  11. Supercomputations and big-data analysis in strong-field ultrafast optical physics: filamentation of high-peak-power ultrashort laser pulses

    Science.gov (United States)

    Voronin, A. A.; Panchenko, V. Ya; Zheltikov, A. M.

    2016-06-01

    High-intensity ultrashort laser pulses propagating in gas media or in condensed matter undergo complex nonlinear spatiotemporal evolution where temporal transformations of optical field waveforms are strongly coupled to an intricate beam dynamics and ultrafast field-induced ionization processes. At the level of laser peak powers orders of magnitude above the critical power of self-focusing, the beam exhibits modulation instabilities, producing random field hot spots and breaking up into multiple noise-seeded filaments. This problem is described by a (3  +  1)-dimensional nonlinear field evolution equation, which needs to be solved jointly with the equation for ultrafast ionization of a medium. Analysis of this problem, which is equivalent to solving a billion-dimensional evolution problem, is only possible by means of supercomputer simulations augmented with coordinated big-data processing of large volumes of information acquired through theory-guiding experiments and supercomputations. Here, we review the main challenges of supercomputations and big-data processing encountered in strong-field ultrafast optical physics and discuss strategies to confront these challenges.

  12. An efficient parallel algorithm for matrix-vector multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.

  13. Parallel-Vector Algorithm For Rapid Structural Anlysis

    Science.gov (United States)

    Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

    1993-01-01

    New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.

  14. A user-friendly web portal for T-Coffee on supercomputers

    Directory of Open Access Journals (Sweden)

    Koetsier Jos

    2011-05-01

    Full Text Available Abstract Background Parallel T-Coffee (PTC was the first parallel implementation of the T-Coffee multiple sequence alignment tool. It is based on MPI and RMA mechanisms. Its purpose is to reduce the execution time of the large-scale sequence alignments. It can be run on distributed memory clusters allowing users to align data sets consisting of hundreds of proteins within a reasonable time. However, most of the potential users of this tool are not familiar with the use of grids or supercomputers. Results In this paper we show how PTC can be easily deployed and controlled on a super computer architecture using a web portal developed using Rapid. Rapid is a tool for efficiently generating standardized portlets for a wide range of applications and the approach described here is generic enough to be applied to other applications, or to deploy PTC on different HPC environments. Conclusions The PTC portal allows users to upload a large number of sequences to be aligned by the parallel version of TC that cannot be aligned by a single machine due to memory and execution time constraints. The web portal provides a user-friendly solution.

  15. Integration of Titan supercomputer at OLCF with ATLAS production system

    CERN Document Server

    Panitkin, Sergey; The ATLAS collaboration

    2016-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of Petabytes of data and the rate of data processing already exceeds Exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this talk we will describe a project aimed at integration of ATLAS Production System with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA Pilot framework for job...

  16. Integration of Titan supercomputer at OLCF with ATLAS Production System

    CERN Document Server

    AUTHOR|(SzGeCERN)643806; The ATLAS collaboration; De, Kaushik; Klimentov, Alexei; Nilsson, Paul; Oleynik, Danila; Padolski, Siarhei; Panitkin, Sergey; Wenaus, Torre

    2017-01-01

    The PanDA (Production and Distributed Analysis) workload management system was developed to meet the scale and complexity of distributed computing for the ATLAS experiment. PanDA managed resources are distributed worldwide, on hundreds of computing sites, with thousands of physicists accessing hundreds of Petabytes of data and the rate of data processing already exceeds Exabyte per year. While PanDA currently uses more than 200,000 cores at well over 100 Grid sites, future LHC data taking runs will require more resources than Grid computing can possibly provide. Additional computing and storage resources are required. Therefore ATLAS is engaged in an ambitious program to expand the current computing model to include additional resources such as the opportunistic use of supercomputers. In this paper we will describe a project aimed at integration of ATLAS Production System with Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF). Current approach utilizes modified PanDA Pilot framework for jo...

  17. Intelligent Personal Supercomputer for Solving Scientific and Technical Problems

    Directory of Open Access Journals (Sweden)

    Khimich, O.M.

    2016-09-01

    Full Text Available New domestic intellіgent personal supercomputer of hybrid architecture Inparkom_pg for the mathematical modeling of processes in the defense industry, engineering, construction, etc. was developed. Intelligent software for the automatic research and tasks of computational mathematics with approximate data of different structures was designed. Applied software to provide mathematical modeling problems in construction, welding and filtration processes was implemented.

  18. Cellular-automata supercomputers for fluid-dynamics modeling

    International Nuclear Information System (INIS)

    Margolus, N.; Toffoli, T.; Vichniac, G.

    1986-01-01

    We report recent developments in the modeling of fluid dynamics, and give experimental results (including dynamical exponents) obtained using cellular automata machines. Because of their locality and uniformity, cellular automata lend themselves to an extremely efficient physical realization; with a suitable architecture, an amount of hardware resources comparable to that of a home computer can achieve (in the simulation of cellular automata) the performance of a conventional supercomputer

  19. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  20. Extracting the Textual and Temporal Structure of Supercomputing Logs

    Energy Technology Data Exchange (ETDEWEB)

    Jain, S; Singh, I; Chandra, A; Zhang, Z; Bronevetsky, G

    2009-05-26

    Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource of information about their operational status and health. However, their massive size, complexity, and lack of standard format makes it difficult to automatically extract information that can be used to improve system management. In this work we propose a novel method to succinctly represent the contents of supercomputing logs, by using textual clustering to automatically find the syntactic structures of log messages. This information is used to automatically classify messages into semantic groups via an online clustering algorithm. Further, we describe a methodology for using the temporal proximity between groups of log messages to identify correlated events in the system. We apply our proposed methods to two large, publicly available supercomputing logs and show that our technique features nearly perfect accuracy for online log-classification and extracts meaningful structural and temporal message patterns that can be used to improve the accuracy of other log analysis techniques.

  1. Computational fluid dynamics: complex flows requiring supercomputers. January 1975-July 1988 (Citations from the INSPEC: Information Services for the Physics and Engineering Communities data base). Report for January 1975-July 1988

    International Nuclear Information System (INIS)

    1988-08-01

    This bibliography contains citations concerning computational fluid dynamics (CFD), a new method in computational science to perform complex flow simulations in three dimensions. Applications include aerodynamic design and analysis for aircraft, rockets, and missiles, and automobiles; heat-transfer studies; and combustion processes. Included are references to supercomputers, array processors, and parallel processors where needed for complete, integrated design. Also included are software packages and grid-generation techniques required to apply CFD numerical solutions. Numerical methods for fluid dynamics, not requiring supercomputers, are found in a separate published search. (Contains 83 citations fully indexed and including a title list.)

  2. Recognizing, Confronting, and Eliminating Workplace Bullying.

    Science.gov (United States)

    Berry, Peggy Ann; Gillespie, Gordon L; Fisher, Bonnie S; Gormley, Denise K

    2016-07-01

    Workplace bullying (WPB) behaviors negatively affect nurse productivity, satisfaction, and retention, and hinder safe patient care. The purpose of this article is to define WPB, differentiate between incivility and WPB, and recommend actions to prevent WPB behaviors. Informed occupational and environmental health nurses and nurse leaders must recognize, confront, and eliminate WPB in their facilities and organizations. Recognizing, confronting, and eliminating WPB behaviors in health care is a crucial first step toward sustained improvements in patient care quality and the health and safety of health care employees. © 2016 The Author(s).

  3. Confronting new technological challenges in HEP

    International Nuclear Information System (INIS)

    Savoy-Navarro, Aurore

    2000-01-01

    The new technological challenges that will have to be confronted in HEP are mainly due to the new physics issues. What is beyond the standard model? That is the question. This review will first list the physics demands in order to explore this unknown world. It will then show with appropriate examples, how the new physics will require confronting new technological challenges in: designing new accelerators, developing new detectors, designing new front-end readout systems and using the new software and hardware tools for the online readout and DAQ systems

  4. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  5. Proton decay: Numerical simulations confront grand unification

    International Nuclear Information System (INIS)

    Brower, R.C.; Maturana, G.; Giles, R.C.; Moriarty, K.J.M.; Samuel, S.

    1985-01-01

    The Grand Unified Theories of the electromagnetic, weak and strong interactions constitute a far reaching attempt to synthesize our knowledge of theoretical particle physics into a consistent and compelling whole. Unfortunately, many quantitative predictions of such unified theories are sensitive to the analytically intractible effects of the strong subnuclear theory (Quantum Chromodynamics or QCD). The consequence is that even ambitious experimental programs exploring weak and super-weak interaction effects often fail to give definitive theoretical tests. This paper describes large-scale calculations on a supercomputer which can help to overcome this gap between theoretical predictions and experimental results. Our focus here is on proton decay, though the methods described are useful for many weak processes. The basic algorithms for the numerical simulation of QCD are well known. We will discuss the advantages and challenges of applying these methods to weak transitions. The algorithms require a very large data base with regular data flow and are natural candidates for vectorization. Also, 32-bit floating point arithmetic is adequate. Thus they are most naturally approached using a supercomputer alone or in combination with a dedicated special purpose processor. (orig.)

  6. Development of a high performance eigensolver on the peta-scale next generation supercomputer system

    International Nuclear Information System (INIS)

    Imamura, Toshiyuki; Yamada, Susumu; Machida, Masahiko

    2010-01-01

    For the present supercomputer systems, a multicore and multisocket processors are necessary to build a system, and choice of interconnection is essential. In addition, for effective development of a new code, high performance, scalable, and reliable numerical software is one of the key items. ScaLAPACK and PETSc are well-known software on distributed memory parallel computer systems. It is needless to say that highly tuned software towards new architecture like many-core processors must be chosen for real computation. In this study, we present a high-performance and high-scalable eigenvalue solver towards the next-generation supercomputer system, so called 'K-computer' system. We have developed two versions, the standard version (eigen s) and enhanced performance version (eigen sx), which are developed on the T2K cluster system housed at University of Tokyo. Eigen s employs the conventional algorithms; Householder tridiagonalization, divide and conquer (DC) algorithm, and Householder back-transformation. They are carefully implemented with blocking technique and flexible two-dimensional data-distribution to reduce the overhead of memory traffic and data transfer, respectively. Eigen s performs excellently on the T2K system with 4096 cores (theoretical peak is 37.6 TFLOPS), and it shows fine performance 3.0 TFLOPS with a two hundred thousand dimensional matrix. The enhanced version, eigen sx, uses more advanced algorithms; the narrow-band reduction algorithm, DC for band matrices, and the block Householder back-transformation with WY-representation. Even though this version is still on a test stage, it shows 4.7 TFLOPS with the same dimensional matrix on eigen s. (author)

  7. High Temporal Resolution Mapping of Seismic Noise Sources Using Heterogeneous Supercomputers

    Science.gov (United States)

    Paitz, P.; Gokhberg, A.; Ermert, L. A.; Fichtner, A.

    2017-12-01

    The time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems like earthquake fault zones, volcanoes, geothermal and hydrocarbon reservoirs. We present results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service providing seismic noise source maps for Central Europe with high temporal resolution. We use source imaging methods based on the cross-correlation of seismic noise records from all seismic stations available in the region of interest. The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept to provide the interested researchers worldwide with regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for the generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise source mapping itself rests on the measurement of logarithmic amplitude ratios in suitably pre-processed noise correlations, and the use of simplified sensitivity kernels. During the implementation we addressed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service-oriented architecture for coordination of various sub-systems, and

  8. Efficient development of memory bounded geo-applications to scale on modern supercomputers

    Science.gov (United States)

    Räss, Ludovic; Omlin, Samuel; Licul, Aleksandar; Podladchikov, Yuri; Herman, Frédéric

    2016-04-01

    Numerical modeling is an actual key tool in the area of geosciences. The current challenge is to solve problems that are multi-physics and for which the length scale and the place of occurrence might not be known in advance. Also, the spatial extend of the investigated domain might strongly vary in size, ranging from millimeters for reactive transport to kilometers for glacier erosion dynamics. An efficient way to proceed is to develop simple but robust algorithms that perform well and scale on modern supercomputers and permit therefore very high-resolution simulations. We propose an efficient approach to solve memory bounded real-world applications on modern supercomputers architectures. We optimize the software to run on our newly acquired state-of-the-art GPU cluster "octopus". Our approach shows promising preliminary results on important geodynamical and geomechanical problematics: we have developed a Stokes solver for glacier flow and a poromechanical solver including complex rheologies for nonlinear waves in stressed rocks porous rocks. We solve the system of partial differential equations on a regular Cartesian grid and use an iterative finite difference scheme with preconditioning of the residuals. The MPI communication happens only locally (point-to-point); this method is known to scale linearly by construction. The "octopus" GPU cluster, which we use for the computations, has been designed to achieve maximal data transfer throughput at minimal hardware cost. It is composed of twenty compute nodes, each hosting four Nvidia Titan X GPU accelerators. These high-density nodes are interconnected with a parallel (dual-rail) FDR InfiniBand network. Our efforts show promising preliminary results for the different physics investigated. The glacier flow solver achieves good accuracy in the relevant benchmarks and the coupled poromechanical solver permits to explain previously unresolvable focused fluid flow as a natural outcome of the porosity setup. In both cases

  9. Engineering-Based Thermal CFD Simulations on Massive Parallel Systems

    KAUST Repository

    Frisch, Jérôme

    2015-05-22

    The development of parallel Computational Fluid Dynamics (CFD) codes is a challenging task that entails efficient parallelization concepts and strategies in order to achieve good scalability values when running those codes on modern supercomputers with several thousands to millions of cores. In this paper, we present a hierarchical data structure for massive parallel computations that supports the coupling of a Navier–Stokes-based fluid flow code with the Boussinesq approximation in order to address complex thermal scenarios for energy-related assessments. The newly designed data structure is specifically designed with the idea of interactive data exploration and visualization during runtime of the simulation code; a major shortcoming of traditional high-performance computing (HPC) simulation codes. We further show and discuss speed-up values obtained on one of Germany’s top-ranked supercomputers with up to 140,000 processes and present simulation results for different engineering-based thermal problems.

  10. Macroeconomic Issues Confronting the Next President.

    Science.gov (United States)

    Solow, Robert M.

    1988-01-01

    Identifies economic issues that confronted the United States in the late 1980's and discusses how the president might deal with them. Highlights the following issues: recession, rising price levels, the budget deficit, international trade imbalance, and revival of U.S. long-term growth. (GEA)

  11. Confronting the neoliberal and libertarian reconceptualisations of ...

    African Journals Online (AJOL)

    Confronting this phenomenon, this paper reviews neoliberal and libertarian understandings of educational equality and democratic education and interrogates the rationale for the justification of markets in education. In the process, I criticise the notion of possessive individualism as a principle of democratic education on the ...

  12. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and  tuning work required. This process  is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus  making a major contribution to the development of  robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  13. Benchmarking Further Single Board Computers for Building a Mini Supercomputer for Simulation of Telecommunication Systems

    Directory of Open Access Journals (Sweden)

    Gábor Lencse

    2016-01-01

    Full Text Available Parallel Discrete Event Simulation (PDES with the conservative synchronization method can be efficiently used for the performance analysis of telecommunication systems because of their good lookahead properties. For PDES, a cost effective execution platform may be built by using single board computers (SBCs, which offer relatively high computation capacity compared to their price or power consumption and especially to the space they take up. A benchmarking method is proposed and its operation is demonstrated by benchmarking ten different SBCs, namely Banana Pi, Beaglebone Black, Cubieboard2, Odroid-C1+, Odroid-U3+, Odroid-XU3 Lite, Orange Pi Plus, Radxa Rock Lite, Raspberry Pi Model B+, and Raspberry Pi 2 Model B+. Their benchmarking results are compared to find out which one should be used for building a mini supercomputer for parallel discrete-event simulation of telecommunication systems. The SBCs are also used to build a heterogeneous cluster and the performance of the cluster is tested, too.

  14. Applications of supercomputing and the utility industry: Calculation of power transfer capabilities

    International Nuclear Information System (INIS)

    Jensen, D.D.; Behling, S.R.; Betancourt, R.

    1990-01-01

    Numerical models and iterative simulation using supercomputers can furnish cost-effective answers to utility industry problems that are all but intractable using conventional computing equipment. An example of the use of supercomputers by the utility industry is the determination of power transfer capability limits for power transmission systems. This work has the goal of markedly reducing the run time of transient stability codes used to determine power distributions following major system disturbances. To date, run times of several hours on a conventional computer have been reduced to several minutes on state-of-the-art supercomputers, with further improvements anticipated to reduce run times to less than a minute. In spite of the potential advantages of supercomputers, few utilities have sufficient need for a dedicated in-house supercomputing capability. This problem is resolved using a supercomputer center serving a geographically distributed user base coupled via high speed communication networks

  15. The shape of the invisible halo: N-body simulations on parallel supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Warren, M.S.; Zurek, W.H. (Los Alamos National Lab., NM (USA)); Quinn, P.J. (Australian National Univ., Canberra (Australia). Mount Stromlo and Siding Spring Observatories); Salmon, J.K. (California Inst. of Tech., Pasadena, CA (USA))

    1990-01-01

    We study the shapes of halos and the relationship to their angular momentum content by means of N-body (N {approximately} 10{sup 6}) simulations. Results indicate that in relaxed halos with no apparent substructure: (i) the shape and orientation of the isodensity contours tends to persist throughout the virialised portion of the halo; (ii) most ({approx}70%) of the halos are prolate; (iii) the approximate direction of the angular momentum vector tends to persist throughout the halo; (iv) for spherical shells centered on the core of the halo the magnitude of the specific angular momentum is approximately proportional to their radius; (v) the shortest axis of the ellipsoid which approximates the shape of the halo tends to align with the rotation axis of the halo. This tendency is strongest in the fastest rotating halos. 13 refs., 4 figs.

  16. Dust modelling and forecasting in the Barcelona Supercomputing Center: Activities and developments

    Energy Technology Data Exchange (ETDEWEB)

    Perez, C; Baldasano, J M; Jimenez-Guerrero, P; Jorba, O; Haustein, K; Basart, S [Earth Sciences Department. Barcelona Supercomputing Center. Barcelona (Spain); Cuevas, E [Izanaa Atmospheric Research Center. Agencia Estatal de Meteorologia, Tenerife (Spain); Nickovic, S [Atmospheric Research and Environment Branch, World Meteorological Organization, Geneva (Switzerland)], E-mail: carlos.perez@bsc.es

    2009-03-01

    The Barcelona Supercomputing Center (BSC) is the National Supercomputer Facility in Spain, hosting MareNostrum, one of the most powerful Supercomputers in Europe. The Earth Sciences Department of BSC operates daily regional dust and air quality forecasts and conducts intensive modelling research for short-term operational prediction. This contribution summarizes the latest developments and current activities in the field of sand and dust storm modelling and forecasting.

  17. Dust modelling and forecasting in the Barcelona Supercomputing Center: Activities and developments

    International Nuclear Information System (INIS)

    Perez, C; Baldasano, J M; Jimenez-Guerrero, P; Jorba, O; Haustein, K; Basart, S; Cuevas, E; Nickovic, S

    2009-01-01

    The Barcelona Supercomputing Center (BSC) is the National Supercomputer Facility in Spain, hosting MareNostrum, one of the most powerful Supercomputers in Europe. The Earth Sciences Department of BSC operates daily regional dust and air quality forecasts and conducts intensive modelling research for short-term operational prediction. This contribution summarizes the latest developments and current activities in the field of sand and dust storm modelling and forecasting.

  18. Supercomputers and the mathematical modeling of high complexity problems

    International Nuclear Information System (INIS)

    Belotserkovskii, Oleg M

    2010-01-01

    This paper is a review of many works carried out by members of our scientific school in past years. The general principles of constructing numerical algorithms for high-performance computers are described. Several techniques are highlighted and these are based on the method of splitting with respect to physical processes and are widely used in computing nonlinear multidimensional processes in fluid dynamics, in studies of turbulence and hydrodynamic instabilities and in medicine and other natural sciences. The advances and developments related to the new generation of high-performance supercomputing in Russia are presented.

  19. Performance Evaluation of Supercomputers using HPCC and IMB Benchmarks

    Science.gov (United States)

    Saini, Subhash; Ciotti, Robert; Gunney, Brian T. N.; Spelce, Thomas E.; Koniges, Alice; Dossa, Don; Adamidis, Panagiotis; Rabenseifner, Rolf; Tiyyagura, Sunil R.; Mueller, Matthias; hide

    2006-01-01

    The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnect fabric of five leading supercomputers - SGI Altix BX2, Cray XI, Cray Opteron Cluster, Dell Xeon cluster, and NEC SX-8. These five systems use five different networks (SGI NUMALINK4, Cray network, Myrinet, InfiniBand, and NEC IXS). The complete set of HPCC benchmarks are run on each of these systems. Additionally, we present Intel MPI Benchmarks (IMB) results to study the performance of 11 MPI communication functions on these systems.

  20. Who confronts prejudice?: the role of implicit theories in the motivation to confront prejudice.

    Science.gov (United States)

    Rattan, Aneeta; Dweck, Carol S

    2010-07-01

    Despite the possible costs, confronting prejudice can have important benefits, ranging from the well-being of the target of prejudice to social change. What, then, motivates targets of prejudice to confront people who express explicit bias? In three studies, we tested the hypothesis that targets who hold an incremental theory of personality (i.e., the belief that people can change) are more likely to confront prejudice than targets who hold an entity theory of personality (i.e., the belief that people have fixed traits). In Study 1, targets' beliefs about the malleability of personality predicted whether they spontaneously confronted an individual who expressed bias. In Study 2, targets who held more of an incremental theory reported that they would be more likely to confront prejudice and less likely to withdraw from future interactions with an individual who expressed prejudice. In Study 3, we manipulated implicit theories and replicated these findings. By highlighting the central role that implicit theories of personality play in targets' motivation to confront prejudice, this research has important implications for intergroup relations and social change.

  1. LUCKY-TD code for solving the time-dependent transport equation with the use of parallel computations

    Energy Technology Data Exchange (ETDEWEB)

    Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)

    2016-12-15

    An algorithm for solving the time-dependent transport equation in the P{sub m}S{sub n} group approximation with the use of parallel computations is presented. The algorithm is implemented in the LUCKY-TD code for supercomputers employing the MPI standard for the data exchange between parallel processes.

  2. CHILD WITNESSES AND THE CONFRONTATION CLAUSE.

    Science.gov (United States)

    Lyon, Thomas D; Dente, Julia A

    2012-01-01

    After the Supreme Court's ruling in Crawford v. Washington that a criminal defendant's right to confront the witnesses against him is violated by the admission of testimonial hearsay that has not been cross-examined, lower courts have overturned convictions in which hearsay from children was admitted after child witnesses were either unwilling or unable to testify. A review of social scientific evidence regarding the dynamics of child sexual abuse suggests a means for facilitating the fair receipt of children's evidence. Courts should hold that defendants have forfeited their confrontation rights if they exploited a child's vulnerabilities such that they could reasonably anticipate that the child would be unavailable to testify. Exploitation includes choosing victims on the basis of their filial dependency, their vulnerability, or their immaturity, as well as taking actions that create or accentuate those vulnerabilities.

  3. Goal preference shapes confrontations of sexism.

    Science.gov (United States)

    Mallett, Robyn K; Melchiori, Kala J

    2014-05-01

    Although most women assume they would confront sexism, assertive responses are rare. We test whether women's preference for respect or liking during interpersonal interactions explains this surprising tendency. Women report preferring respect relative to liking after being asked sexist, compared with inappropriate, questions during a virtual job interview (Study 1, n = 149). Women's responses to sexism increase in assertiveness along with their preference for being respected, and a respect-preference mediates the relation between the type of questions and response assertiveness (Studies 1 and 2). In Study 2 (n = 105), women's responses to sexist questions are more assertive when the sense of belonging is enhanced with a belonging manipulation. Moreover, preference for respect mediates the effect of the type of questions on response assertiveness, but only when belonging needs are met. Thus the likelihood of confrontation depends on the goal to be respected outweighing the goal to be liked.

  4. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  5. Centralized supercomputer support for magnetic fusion energy research

    International Nuclear Information System (INIS)

    Fuss, D.; Tull, G.G.

    1984-01-01

    High-speed computers with large memories are vital to magnetic fusion energy research. Magnetohydrodynamic (MHD), transport, equilibrium, Vlasov, particle, and Fokker-Planck codes that model plasma behavior play an important role in designing experimental hardware and interpreting the resulting data, as well as in advancing plasma theory itself. The size, architecture, and software of supercomputers to run these codes are often the crucial constraints on the benefits such computational modeling can provide. Hence, vector computers such as the CRAY-1 offer a valuable research resource. To meet the computational needs of the fusion program, the National Magnetic Fusion Energy Computer Center (NMFECC) was established in 1974 at the Lawrence Livermore National Laboratory. Supercomputers at the central computing facility are linked to smaller computer centers at each of the major fusion laboratories by a satellite communication network. In addition to providing large-scale computing, the NMFECC environment stimulates collaboration and the sharing of computer codes and data among the many fusion researchers in a cost-effective manner

  6. Personal Supercomputing for Monte Carlo Simulation Using a GPU

    Energy Technology Data Exchange (ETDEWEB)

    Oh, Jae-Yong; Koo, Yang-Hyun; Lee, Byung-Ho [Korea Atomic Energy Research Institute, Daejeon (Korea, Republic of)

    2008-05-15

    Since the usability, accessibility, and maintenance of a personal computer (PC) are very good, a PC is a useful computer simulation tool for researchers. It has enough calculation power to simulate a small scale system with the improved performance of a PC's CPU. However, if a system is large or long time scale, we need a cluster computer or supercomputer. Recently great changes have occurred in the PC calculation environment. A graphic process unit (GPU) on a graphic card, only used to calculate display data, has a superior calculation capability to a PC's CPU. This GPU calculation performance is a match for the supercomputer in 2000. Although it has such a great calculation potential, it is not easy to program a simulation code for GPU due to difficult programming techniques for converting a calculation matrix to a 3D rendering image using graphic APIs. In 2006, NVIDIA provided the Software Development Kit (SDK) for the programming environment for NVIDIA's graphic cards, which is called the Compute Unified Device Architecture (CUDA). It makes the programming on the GPU easy without knowledge of the graphic APIs. This paper describes the basic architectures of NVIDIA's GPU and CUDA, and carries out a performance benchmark for the Monte Carlo simulation.

  7. Personal Supercomputing for Monte Carlo Simulation Using a GPU

    International Nuclear Information System (INIS)

    Oh, Jae-Yong; Koo, Yang-Hyun; Lee, Byung-Ho

    2008-01-01

    Since the usability, accessibility, and maintenance of a personal computer (PC) are very good, a PC is a useful computer simulation tool for researchers. It has enough calculation power to simulate a small scale system with the improved performance of a PC's CPU. However, if a system is large or long time scale, we need a cluster computer or supercomputer. Recently great changes have occurred in the PC calculation environment. A graphic process unit (GPU) on a graphic card, only used to calculate display data, has a superior calculation capability to a PC's CPU. This GPU calculation performance is a match for the supercomputer in 2000. Although it has such a great calculation potential, it is not easy to program a simulation code for GPU due to difficult programming techniques for converting a calculation matrix to a 3D rendering image using graphic APIs. In 2006, NVIDIA provided the Software Development Kit (SDK) for the programming environment for NVIDIA's graphic cards, which is called the Compute Unified Device Architecture (CUDA). It makes the programming on the GPU easy without knowledge of the graphic APIs. This paper describes the basic architectures of NVIDIA's GPU and CUDA, and carries out a performance benchmark for the Monte Carlo simulation

  8. The TeraGyroid Experiment – Supercomputing 2003

    Directory of Open Access Journals (Sweden)

    R.J. Blake

    2005-01-01

    Full Text Available Amphiphiles are molecules with hydrophobic tails and hydrophilic heads. When dispersed in solvents, they self assemble into complex mesophases including the beautiful cubic gyroid phase. The goal of the TeraGyroid experiment was to study defect pathways and dynamics in these gyroids. The UK's supercomputing and USA's TeraGrid facilities were coupled together, through a dedicated high-speed network, into a single computational Grid for research work that peaked around the Supercomputing 2003 conference. The gyroids were modeled using lattice Boltzmann methods with parameter spaces explored using many 1283 and 3grid point simulations, this data being used to inform the world's largest three-dimensional time dependent simulation with 10243-grid points. The experiment generated some 2 TBytes of useful data. In terms of Grid technology the project demonstrated the migration of simulations (using Globus middleware to and fro across the Atlantic exploiting the availability of resources. Integration of the systems accelerated the time to insight. Distributed visualisation of the output datasets enabled the parameter space of the interactions within the complex fluid to be explored from a number of sites, informed by discourse over the Access Grid. The project was sponsored by EPSRC (UK and NSF (USA with trans-Atlantic optical bandwidth provided by British Telecommunications.

  9. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  10. Algorithm comparison and benchmarking using a parallel spectra transform shallow water model

    Energy Technology Data Exchange (ETDEWEB)

    Worley, P.H. [Oak Ridge National Lab., TN (United States); Foster, I.T.; Toonen, B. [Argonne National Lab., IL (United States)

    1995-04-01

    In recent years, a number of computer vendors have produced supercomputers based on a massively parallel processing (MPP) architecture. These computers have been shown to be competitive in performance with conventional vector supercomputers for some applications. As spectral weather and climate models are heavy users of vector supercomputers, it is interesting to determine how these models perform on MPPS, and which MPPs are best suited to the execution of spectral models. The benchmarking of MPPs is complicated by the fact that different algorithms may be more efficient on different architectures. Hence, a comprehensive benchmarking effort must answer two related questions: which algorithm is most efficient on each computer and how do the most efficient algorithms compare on different computers. In general, these are difficult questions to answer because of the high cost associated with implementing and evaluating a range of different parallel algorithms on each MPP platform.

  11. Massive hybrid parallelism for fully implicit multiphysics

    International Nuclear Information System (INIS)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-01-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  12. Massive hybrid parallelism for fully implicit multiphysics

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W. [Idaho National Laboratory, 2525 N. Fremont Ave., Idaho Falls, ID 83415 (United States)

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  13. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.

  14. Parallel visualization on leadership computing resources

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

    2009-07-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  15. Parallel visualization on leadership computing resources

    International Nuclear Information System (INIS)

    Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

    2009-01-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  16. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  17. High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers

    Science.gov (United States)

    Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas

    2017-04-01

    Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in the recent years combine conventional multi-core CPU with GPU accelerators and provide an opportunity for manifold increase and computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide the interested external researchers the regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for

  18. KfK-seminar series on supercomputing und visualization from May till September 1992

    International Nuclear Information System (INIS)

    Hohenhinnebusch, W.

    1993-05-01

    During the period of may 1992 to september 1992 a series of seminars was held at KfK on several topics of supercomputing in different fields of application. The aim was to demonstrate the importance of supercomputing and visualization in numerical simulations of complex physical and technical phenomena. This report contains the collection of all submitted seminar papers. (orig./HP) [de

  19. Comprehensive efficiency analysis of supercomputer resource usage based on system monitoring data

    Science.gov (United States)

    Mamaeva, A. A.; Shaykhislamov, D. I.; Voevodin, Vad V.; Zhumatiy, S. A.

    2018-03-01

    One of the main problems of modern supercomputers is the low efficiency of their usage, which leads to the significant idle time of computational resources, and, in turn, to the decrease in speed of scientific research. This paper presents three approaches to study the efficiency of supercomputer resource usage based on monitoring data analysis. The first approach performs an analysis of computing resource utilization statistics, which allows to identify different typical classes of programs, to explore the structure of the supercomputer job flow and to track overall trends in the supercomputer behavior. The second approach is aimed specifically at analyzing off-the-shelf software packages and libraries installed on the supercomputer, since efficiency of their usage is becoming an increasingly important factor for the efficient functioning of the entire supercomputer. Within the third approach, abnormal jobs – jobs with abnormally inefficient behavior that differs significantly from the standard behavior of the overall supercomputer job flow – are being detected. For each approach, the results obtained in practice in the Supercomputer Center of Moscow State University are demonstrated.

  20. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid

    2014-03-04

    © 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in 1990s a system with few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.

  1. Automatic discovery of the communication network topology for building a supercomputer model

    Science.gov (United States)

    Sobolev, Sergey; Stefanov, Konstantin; Voevodin, Vadim

    2016-10-01

    The Research Computing Center of Lomonosov Moscow State University is developing the Octotron software suite for automatic monitoring and mitigation of emergency situations in supercomputers so as to maximize hardware reliability. The suite is based on a software model of the supercomputer. The model uses a graph to describe the computing system components and their interconnections. One of the most complex components of a supercomputer that needs to be included in the model is its communication network. This work describes the proposed approach for automatically discovering the Ethernet communication network topology in a supercomputer and its description in terms of the Octotron model. This suite automatically detects computing nodes and switches, collects information about them and identifies their interconnections. The application of this approach is demonstrated on the "Lomonosov" and "Lomonosov-2" supercomputers.

  2. Compiler and Runtime Support for Programming in Adaptive Parallel Environments

    Science.gov (United States)

    1998-10-15

    noother job is waiting for resources, and use a smaller number of processors when other jobs needresources. Setia et al. [15, 20] have shown that such...15] Vijay K. Naik, Sanjeev Setia , and Mark Squillante. Performance analysis of job scheduling policiesin parallel supercomputing environments. In...on networks ofheterogeneous workstations. Technical Report CSE-94-012, Oregon Graduate Institute of Scienceand Technology, 1994.[20] Sanjeev Setia

  3. 369 TFlop/s molecular dynamics simulations on the Roadrunner general-purpose heterogeneous supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Swaminarayan, Sriram [Los Alamos National Laboratory; Germann, Timothy C [Los Alamos National Laboratory; Kadau, Kai [Los Alamos National Laboratory; Fossum, Gordon C [IBM CORPORATION

    2008-01-01

    The authors present timing and performance numbers for a short-range parallel molecular dynamics (MD) code, SPaSM, that has been rewritten for the heterogeneous Roadrunner supercomputer. Each Roadrunner compute node consists of two AMD Opteron dual-core microprocessors and four PowerXCell 8i enhanced Cell microprocessors, so that there are four MPI ranks per node, each with one Opteron and one Cell. The interatomic forces are computed on the Cells (each with one PPU and eight SPU cores), while the Opterons are used to direct inter-rank communication and perform I/O-heavy periodic analysis, visualization, and checkpointing tasks. The performance measured for our initial implementation of a standard Lennard-Jones pair potential benchmark reached a peak of 369 Tflop/s double-precision floating-point performance on the full Roadrunner system (27.7% of peak), corresponding to 124 MFlop/Watt/s at a price of approximately 3.69 MFlops/dollar. They demonstrate an initial target application, the jetting and ejection of material from a shocked surface.

  4. Confronting tracker field quintessence with data

    International Nuclear Information System (INIS)

    Wang, Pao-Yu; Chen, Chien-Wen; Chen, Pisin

    2012-01-01

    We confront tracker field quintessence with observational data. The potentials considered in this paper include V(φ)∝φ −α , exp (M p /φ), exp (M p /φ)−1, exp (βM p /φ) and exp (γM p /φ)−1; while the data come from the latest SN Ia, CMB and BAO observations. Stringent parameter constraints are obtained. In comparison with the cosmological constant via information criteria, it is found that models with potentials exp (M p /φ), exp (M p /φ)−1 and exp (γM p /φ)−1 are not supported by the current data

  5. Dark radiation confronting LHC in Z′ models

    International Nuclear Information System (INIS)

    Solaguren-Beascoa, A.; Gonzalez-Garcia, M.C.

    2013-01-01

    Recent cosmological data favour additional relativistic degrees of freedom beyond the three active neutrinos and photons, often referred to as “dark radiation”. Extensions of the SM involving TeV-scale Z ′ gauge bosons generically contain superweakly interacting light right-handed neutrinos which can constitute this dark radiation. In this Letter we confront the requirement on the parameters of the E 6 Z ′ models to account for the present evidence of dark radiation with the already existing constraints from searches for new neutral gauge bosons at LHC7

  6. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  7. Cooperative visualization and simulation in a supercomputer environment

    International Nuclear Information System (INIS)

    Ruehle, R.; Lang, U.; Wierse, A.

    1993-01-01

    The article takes a closer look on the requirements being imposed by the idea to integrate all the components into a homogeneous software environment. To this end several methods for the distribtuion of applications in dependence of certain problem types are discussed. The currently available methods at the University of Stuttgart Computer Center for the distribution of applications are further explained. Finally the aims and characteristics of a European sponsored project, called PAGEIN, are explained, which fits perfectly into the line of developments at RUS. The aim of the project is to experiment with future cooperative working modes of aerospace scientists in a high speed distributed supercomputing environment. Project results will have an impact on the development of real future scientific application environments. (orig./DG)

  8. Lectures in Supercomputational Neurosciences Dynamics in Complex Brain Networks

    CERN Document Server

    Graben, Peter beim; Thiel, Marco; Kurths, Jürgen

    2008-01-01

    Computational Neuroscience is a burgeoning field of research where only the combined effort of neuroscientists, biologists, psychologists, physicists, mathematicians, computer scientists, engineers and other specialists, e.g. from linguistics and medicine, seem to be able to expand the limits of our knowledge. The present volume is an introduction, largely from the physicists' perspective, to the subject matter with in-depth contributions by system neuroscientists. A conceptual model for complex networks of neurons is introduced that incorporates many important features of the real brain, such as various types of neurons, various brain areas, inhibitory and excitatory coupling and the plasticity of the network. The computational implementation on supercomputers, which is introduced and discussed in detail in this book, will enable the readers to modify and adapt the algortihm for their own research. Worked-out examples of applications are presented for networks of Morris-Lecar neurons to model the cortical co...

  9. Evaluating the networking characteristics of the Cray XC-40 Intel Knights Landing-based Cori supercomputer at NERSC

    Energy Technology Data Exchange (ETDEWEB)

    Doerfler, Douglas [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Austin, Brian [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Cook, Brandon [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Deslippe, Jack [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Kandalla, Krishna [Cray Inc, Bloomington, MN (United States); Mendygral, Peter [Cray Inc, Bloomington, MN (United States)

    2017-09-12

    There are many potential issues associated with deploying the Intel Xeon Phi™ (code named Knights Landing [KNL]) manycore processor in a large-scale supercomputer. One in particular is the ability to fully utilize the high-speed communications network, given that the serial performance of a Xeon Phi TM core is a fraction of a Xeon®core. In this paper, we take a look at the trade-offs associated with allocating enough cores to fully utilize the Aries high-speed network versus cores dedicated to computation, e.g., the trade-off between MPI and OpenMP. In addition, we evaluate new features of Cray MPI in support of KNL, such as internode optimizations. We also evaluate one-sided programming models such as Unified Parallel C. We quantify the impact of the above trade-offs and features using a suite of National Energy Research Scientific Computing Center applications.

  10. Development of a Cloud Resolving Model for Heterogeneous Supercomputers

    Science.gov (United States)

    Sreepathi, S.; Norman, M. R.; Pal, A.; Hannah, W.; Ponder, C.

    2017-12-01

    A cloud resolving climate model is needed to reduce major systematic errors in climate simulations due to structural uncertainty in numerical treatments of convection - such as convective storm systems. This research describes the porting effort to enable SAM (System for Atmosphere Modeling) cloud resolving model on heterogeneous supercomputers using GPUs (Graphical Processing Units). We have isolated a standalone configuration of SAM that is targeted to be integrated into the DOE ACME (Accelerated Climate Modeling for Energy) Earth System model. We have identified key computational kernels from the model and offloaded them to a GPU using the OpenACC programming model. Furthermore, we are investigating various optimization strategies intended to enhance GPU utilization including loop fusion/fission, coalesced data access and loop refactoring to a higher abstraction level. We will present early performance results, lessons learned as well as optimization strategies. The computational platform used in this study is the Summitdev system, an early testbed that is one generation removed from Summit, the next leadership class supercomputer at Oak Ridge National Laboratory. The system contains 54 nodes wherein each node has 2 IBM POWER8 CPUs and 4 NVIDIA Tesla P100 GPUs. This work is part of a larger project, ACME-MMF component of the U.S. Department of Energy(DOE) Exascale Computing Project. The ACME-MMF approach addresses structural uncertainty in cloud processes by replacing traditional parameterizations with cloud resolving "superparameterization" within each grid cell of global climate model. Super-parameterization dramatically increases arithmetic intensity, making the MMF approach an ideal strategy to achieve good performance on emerging exascale computing architectures. The goal of the project is to integrate superparameterization into ACME, and explore its full potential to scientifically and computationally advance climate simulation and prediction.

  11. Sieving for pseudosquares and pseudocubes in parallel using doubly-focused enumeration and wheel datastructures

    OpenAIRE

    Sorenson, Jonathan P.

    2010-01-01

    We extend the known tables of pseudosquares and pseudocubes, discuss the implications of these new data on the conjectured distribution of pseudosquares and pseudocubes, and present the details of the algorithm used to do this work. Our algorithm is based on the space-saving wheel data structure combined with doubly-focused enumeration, run in parallel on a cluster supercomputer.

  12. Streaming Parallel GPU Acceleration of Large-Scale filter-based Spiking Neural Networks

    NARCIS (Netherlands)

    L.P. Slazynski (Leszek); S.M. Bohte (Sander)

    2012-01-01

    htmlabstractThe arrival of graphics processing (GPU) cards suitable for massively parallel computing promises a↵ordable large-scale neural network simulation previously only available at supercomputing facil- ities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of

  13. Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing.

    Science.gov (United States)

    Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.

    1998-01-01

    Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…

  14. Parents’ experience confronting child burning situation

    Directory of Open Access Journals (Sweden)

    Valdira Vieira de Oliveira

    2015-05-01

    Full Text Available Objective: to understand experiences of parents in a child burning situation during the hospitalization process. Methods: phenomenological research in view of Martin Heidegger, held with seven assisting parents at a pediatrics unit of a general hospital in Montes Claros. The information was obtained by phenomenological interview, containing the question guide: “What does it mean to you being with a son who is suffering with burns?”. Results: during the experience, parents revealed anguish, fear, helplessness, concerns and expectations of “being-in-the-world”. Conclusion: respect, understanding and care from the health team were fundamental for the adaptation and the confrontation demanded by the consequent suffering of the event.

  15. Parallel science and engineering applications the Charm++ approach

    CERN Document Server

    Kale, Laxmikant V

    2016-01-01

    Developed in the context of science and engineering applications, with each abstraction motivated by and further honed by specific application needs, Charm++ is a production-quality system that runs on almost all parallel computers available. Parallel Science and Engineering Applications: The Charm++ Approach surveys a diverse and scalable collection of science and engineering applications, most of which are used regularly on supercomputers by scientists to further their research. After a brief introduction to Charm++, the book presents several parallel CSE codes written in the Charm++ model, along with their underlying scientific and numerical formulations, explaining their parallelization strategies and parallel performance. These chapters demonstrate the versatility of Charm++ and its utility for a wide variety of applications, including molecular dynamics, cosmology, quantum chemistry, fracture simulations, agent-based simulations, and weather modeling. The book is intended for a wide audience of people i...

  16. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.; Michigan Univ., Ann Arbor, MI

    1991-01-01

    In order to use efficiently the new features of supercomputers, production codes, usually written 10 -20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedup in the execution times was obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize the parallel/vector architecture of a multiprocessor IBM 3090. (author)

  17. Development of a parallelization method for KENO V.a

    International Nuclear Information System (INIS)

    Basoglu, B.; Bentley, C.; Dunn, M.

    1995-01-01

    The KENO V.a codes is a widely used Monte carlo codes that is part of the SCALE modular codes system for performing standardized computer analysis of nuclear systems for licensing evaluation. In the past few years, attempts have been made to speed up KENO V.a using new generation computers. In this paper we report on the initial development of a parallel version of KENO V.a for the Kendall Square Research supercomputer (KSRI) at ORNL. Investigations thus far have shown that the parallel code provides accurate results with significantly reduced computation times relative to the conventional KENO V.a code

  18. Vectorization and parallelization of a production reactor assembly code

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.

    1991-01-01

    In order to efficiently use new features of supercomputers, production codes, usually written 10 - 20 years ago, must be tailored for modern computer architectures. We have chosen to optimize the CPM-2 code, a production reactor assembly code based on the collision probability transport method. Substantial speedups in the execution times were obtained with the parallel/vector version of the CPM-2 code. In addition, we have developed a new transfer probability method, which removes some of the modelling limitations of the collision probability method encoded in the CPM-2 code, and can fully utilize parallel/vector architecture of a multiprocessor IBM 3090. (author)

  19. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pi's. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.

  20. Impact analysis on a massively parallel computer

    International Nuclear Information System (INIS)

    Zacharia, T.; Aramayo, G.A.

    1994-01-01

    Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper

  1. The four-meter confrontation visual field test.

    OpenAIRE

    Kodsi, S R; Younge, B R

    1992-01-01

    The 4-m confrontation visual field test has been successfully used at the Mayo Clinic for many years in addition to the standard 0.5-m confrontation visual field test. The 4-m confrontation visual field test is a test of macular function and can identify small central or paracentral scotomas that the examiner may not find when the patient is tested only at 0.5 m. Also, macular sparing in homonymous hemianopias and quadrantanopias may be identified with the 4-m confrontation visual field test....

  2. Novel Supercomputing Approaches for High Performance Linear Algebra Using FPGAs, Phase II

    Data.gov (United States)

    National Aeronautics and Space Administration — Supercomputing plays a major role in many areas of science and engineering, and it has had tremendous impact for decades in areas such as aerospace, defense, energy,...

  3. To Confront Versus not to Confront: Women’s Perception of Sexual Harassment

    OpenAIRE

    María del Carmen Herrera; Antonio Herrera; Francisca Expósito

    2017-01-01

    Current research has postulated that sexual harassment is one of the most serious social problems. Perceptions of sexual harassment vary according to some factors: gender, context, and perceiver’s ideology. The strategies most commonly used by women to cope with harassment range from avoiding or ignoring the harasser to confronting the harasser or reporting the incident. The aim of this study was to explore women’s perception of sexual harassment, and to assess the implications of different v...

  4. Scalable geocomputation: evolving an environmental model building platform from single-core to supercomputers

    Science.gov (United States)

    Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek

    2017-04-01

    There is an increasing demand to run environmental models on a big scale: simulations over large areas at high resolution. The heterogeneity of available computing hardware such as multi-core CPUs, GPUs or supercomputer potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and the implementation thereof in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge, their emphasis is (and should be) on exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs to have built-in capabilities to make full usage of the available hardware. Developing such a framework providing understandable code for domain scientists and being runtime efficient at the same time poses several challenges on developers of such a framework. For example, optimisations can be performed on individual operations or the whole model, or tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of available hardware whichsoever combination of model building blocks scientists use. We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling and demonstrate 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks and 2) parallelisation of about 50 of these building blocks using

  5. SUPERCOMPUTERS FOR AIDING ECONOMIC PROCESSES WITH REFERENCE TO THE FINANCIAL SECTOR

    Directory of Open Access Journals (Sweden)

    Jerzy Balicki

    2014-12-01

    Full Text Available The article discusses the use of supercomputers to support business processes with particular emphasis on the financial sector. A reference was made to the selected projects that support economic development. In particular, we propose the use of supercomputers to perform artificial intel-ligence methods in banking. The proposed methods combined with modern technology enables a significant increase in the competitiveness of enterprises and banks by adding new functionality.

  6. Adventures in supercomputing: An innovative program for high school teachers

    Energy Technology Data Exchange (ETDEWEB)

    Oliver, C.E.; Hicks, H.R.; Summers, B.G. [Oak Ridge National Lab., TN (United States); Staten, D.G. [Wartburg Central High School, TN (United States)

    1994-12-31

    Within the realm of education, seldom does an innovative program become available with the potential to change an educator`s teaching methodology. Adventures in Supercomputing (AiS), sponsored by the U.S. Department of Energy (DOE), is such a program. It is a program for high school teachers that changes the teacher paradigm from a teacher-directed approach of teaching to a student-centered approach. {open_quotes}A student-centered classroom offers better opportunities for development of internal motivation, planning skills, goal setting and perseverance than does the traditional teacher-directed mode{close_quotes}. Not only is the process of teaching changed, but the cross-curricula integration within the AiS materials is remarkable. Written from a teacher`s perspective, this paper will describe the AiS program and its effects on teachers and students, primarily at Wartburg Central High School, in Wartburg, Tennessee. The AiS program in Tennessee is sponsored by Oak Ridge National Laboratory (ORNL).

  7. Accelerating Science Impact through Big Data Workflow Management and Supercomputing

    Directory of Open Access Journals (Sweden)

    De K.

    2016-01-01

    Full Text Available The Large Hadron Collider (LHC, operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. ATLAS, one of the largest collaborations ever assembled in the the history of science, is at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, the ATLAS experiment is relying on a heterogeneous distributed computational infrastructure. To manage the workflow for all data processing on hundreds of data centers the PanDA (Production and Distributed AnalysisWorkload Management System is used. An ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF, is realizing within BigPanDA and megaPanDA projects. These projects are now exploring how PanDA might be used for managing computing jobs that run on supercomputers including OLCF’s Titan and NRC-KI HPC2. The main idea is to reuse, as much as possible, existing components of the PanDA system that are already deployed on the LHC Grid for analysis of physics data. The next generation of PanDA will allow many data-intensive sciences employing a variety of computing platforms to benefit from ATLAS experience and proven tools in highly scalable processing.

  8. Symbolic simulation of engineering systems on a supercomputer

    International Nuclear Information System (INIS)

    Ragheb, M.; Gvillo, D.; Makowitz, H.

    1986-01-01

    Model-Based Production-Rule systems for analysis are developed for the symbolic simulation of Complex Engineering systems on a CRAY X-MP Supercomputer. The Fault-Tree and Event-Tree Analysis methodologies from Systems-Analysis are used for problem representation and are coupled to the Rule-Based System Paradigm from Knowledge Engineering to provide modelling of engineering devices. Modelling is based on knowledge of the structure and function of the device rather than on human expertise alone. To implement the methodology, we developed a production-Rule Analysis System that uses both backward-chaining and forward-chaining: HAL-1986. The inference engine uses an Induction-Deduction-Oriented antecedent-consequent logic and is programmed in Portable Standard Lisp (PSL). The inference engine is general and can accommodate general modifications and additions to the knowledge base. The methodologies used will be demonstrated using a model for the identification of faults, and subsequent recovery from abnormal situations in Nuclear Reactor Safety Analysis. The use of the exposed methodologies for the prognostication of future device responses under operational and accident conditions using coupled symbolic and procedural programming is discussed

  9. HEP Computing Tools, Grid and Supercomputers for Genome Sequencing Studies

    Science.gov (United States)

    De, K.; Klimentov, A.; Maeno, T.; Mashinistov, R.; Novikov, A.; Poyda, A.; Tertychnyy, I.; Wenaus, T.

    2017-10-01

    PanDA - Production and Distributed Analysis Workload Management System has been developed to address ATLAS experiment at LHC data processing and analysis challenges. Recently PanDA has been extended to run HEP scientific applications on Leadership Class Facilities and supercomputers. The success of the projects to use PanDA beyond HEP and Grid has drawn attention from other compute intensive sciences such as bioinformatics. Recent advances of Next Generation Genome Sequencing (NGS) technology led to increasing streams of sequencing data that need to be processed, analysed and made available for bioinformaticians worldwide. Analysis of genomes sequencing data using popular software pipeline PALEOMIX can take a month even running it on the powerful computer resource. In this paper we will describe the adaptation the PALEOMIX pipeline to run it on a distributed computing environment powered by PanDA. To run pipeline we split input files into chunks which are run separately on different nodes as separate inputs for PALEOMIX and finally merge output file, it is very similar to what it done by ATLAS to process and to simulate data. We dramatically decreased the total walltime because of jobs (re)submission automation and brokering within PanDA. Using software tools developed initially for HEP and Grid can reduce payload execution time for Mammoths DNA samples from weeks to days.

  10. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.

  11. Socratic Confrontation with Athens: an Interpretation

    Directory of Open Access Journals (Sweden)

    Narges Tajik

    2009-08-01

    Full Text Available Impiety was one of the two charges against Socrates. As a civil religion (within which politics and religion are mutually intertwined was the then-dominated religion in Athens, impiety was regarded as a civil laws violation. Thus, charge of impiety, as a political subversion, might lead Socrates to death. However, in Apology there are some signs of Socrates’ religiousness as swearing and the claim to be at service of the Polis’ formal gods and goddess which lead to the question whether Socrates were an impious person, in addition to the question concerning the reasons why Socrates was sentenced to death, while he has showed his religiousness. In this study, we argued the nature of Socrates’ religiousness and offered an interpretation of Socrates’ silent confrontation with Athenians as is described in the court and his advocacy there. Therefore, introducing the state of religion in Athens, it would be shown that Socrates goes not deep in the inspirations, but intervening personal negative accounts, argues for a private religious experience, while does not offers any substitution for the formal religion.

  12. Composite inflation confronts BICEP2 and PLANCK

    International Nuclear Information System (INIS)

    Karwan, Khamphee; Channuie, Phongpichit

    2014-01-01

    We examine observational constraints on single-field inflation in which the inflaton is a composite field stemming from a four-dimensional strongly interacting field theory. We confront the predictions with the Planck and very recent BICEP2 data. In the large non-minimal coupling regions, we discover for the minimal composite inflationary model that the predictions lie well inside the joint 68% CL for the Planck data, but is in tension with the recent BICEP2 observations. In the case of the glueball inflationary model, the predictions satisfy the Planck results. However, this model can produce a large tensor-to-scalar ratio consistent with the recent BICEP2 observations if the number of e-foldings is slightly smaller than the range commonly used. For a super Yang-Mills paradigm, we discover that the predictions satisfy the Planck data, and surprisingly a large tensor-to-scalar ratio consistent with the BICEP2 results can also be produced for an acceptable range of the number of e-foldings and of the confining scale. In the small non-minimal coupling regions, all of the models can satisfy the BICEP2 results. However, the predictions of the glueball and superglueball inflationary models cannot satisfy the observational bound on the amplitude of the curvature perturbation launched by Planck, and the techni-inflaton self-coupling in the minimal composite inflationary model is constrained to be extremely small

  13. Dementia Care: Confronting Myths in Clinical Management.

    Science.gov (United States)

    Neitch, Shirley M; Meadows, Charles; Patton-Tackett, Eva; Yingling, Kevin W

    2016-01-01

    Every day, patients with dementia, their families, and their physicians face the enormous challenges of this pervasive life-changing condition. Seeking help, often grasping at straws, victims, and their care providers are confronted with misinformation and myths when they search the internet or other sources. When Persons with Dementia (PWD) and their caregivers believe and/or act on false information, proper treatment may be delayed, and ultimately damage can be done. In this paper, we review commonly misunderstood issues encountered in caring for PWD. Our goal is to equip Primary Care Practitioners (PCPs) with accurate information to share with patients and families, to improve the outcomes of PWD to the greatest extent possible. While there are innumerable myths about dementia and its causes and treatments, we are going to focus on the most common false claims or misunderstandings which we hear in our Internal Medicine practice at Marshall Health. We offer suggestions for busy practitioners approaching some of the more common issues with patients and families in a clinic setting.

  14. Confronting Higgcision with electric dipole moments

    Energy Technology Data Exchange (ETDEWEB)

    Cheung, Kingman [Department of Physics, National Tsing Hua University, Hsinchu 300, Taiwan (China); Division of Quantum Phases and Devices, School of Physics, Konkuk University, Seoul 143-701 (Korea, Republic of); Lee, Jae Sik [Department of Physics, Chonnam National University, 300 Yongbong-dong, Buk-gu, Gwangju, 500-757 (Korea, Republic of); Senaha, Eibun [Department of Physics, Nagoya University, Nagoya 464-8602 (Japan); Tseng, Po-Yan [Department of Physics, National Tsing Hua University, Hsinchu 300, Taiwan (China)

    2014-06-26

    Current data on the signal strengths and angular spectrum of the 125.5 GeV Higgs boson still allow a CP-mixed state, namely, the pseudoscalar coupling to the top quark can be as sizable as the scalar coupling: C{sub u}{sup S}≈C{sub u}{sup P}=1/2. CP violation can then arise and manifest in sizable electric dipole moments (EDMs). In the framework of two-Higgs-doublet models, we not only update the Higgs precision (Higgcision) study on the couplings with the most updated Higgs signal strength data, but also compute all the Higgs-mediated contributions from the 125.5 GeV Higgs boson to the EDMs, and confront the allowed parameter space against the existing constraints from the EDM measurements of Thallium, neutron, Mercury, and Thorium monoxide. We found that the combined EDM constraints restrict the pseudoscalar coupling to be less than about 10{sup −2}, unless there are contributions from other Higgs bosons, supersymmetric particles, or other exotic particles that delicately cancel the current Higgs-mediated contributions.

  15. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment

  16. Confronting Perpetrators of Prejudice: The Inhibitory Effects of Social Costs

    Science.gov (United States)

    Shelton, J. Nicole; Stewart, Rebecca E.

    2004-01-01

    The purpose of this research is to investigate the extent to which social costs influence whether or not targets of prejudice confront individuals who behave in a prejudiced manner during interpersonal interactions. Consistent with our predictions, we found that although women believe they will confront perpetrators of prejudice regardless of the…

  17. Wastewater Use in Irrigated Agriculture : Confronting the Livelihood ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Wastewater Use in Irrigated Agriculture : Confronting the Livelihood and Environmental Realities. Couverture du livre Wastewater Use in Irrigated Agriculture: Confronting the Livelihood and Environmental Realities. Directeur(s) : Christopher Scott, Naser I. Faruqui et Liqa Raschid. Maison(s) d'édition : CABI, IWMI, CRDI.

  18. The Vatican & Population Growth Control: Why an American Confrontation?

    Science.gov (United States)

    Mumford, Stephen D.

    1983-01-01

    The Vatican, because of its position on population growth, threatens the security of all nations. Catholic countries with right-wing dictatorships cannot confront the Vatican on family planning and survive. U.S. Catholics must confront the Vatican on this issue. American lay Catholics must break the American church away from the Vatican control.…

  19. 28 CFR 552.23 - Confrontation avoidance procedures.

    Science.gov (United States)

    2010-07-01

    ... 28 Judicial Administration 2 2010-07-01 2010-07-01 false Confrontation avoidance procedures. 552... MANAGEMENT CUSTODY Use of Force and Application of Restraints on Inmates § 552.23 Confrontation avoidance... information about the inmate and the immediate situation. Based on their assessment of that information, they...

  20. Partial Overhaul and Initial Parallel Optimization of KINETICS, a Coupled Dynamics and Chemistry Atmosphere Model

    Science.gov (United States)

    Nguyen, Howard; Willacy, Karen; Allen, Mark

    2012-01-01

    KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.

  1. Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Villa, Oreste; Tumeo, Antonino; Secchi, Simone; Manzano Franco, Joseph B.

    2012-12-31

    Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.

  2. Parallelization for X-ray crystal structural analysis program

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

    1997-10-01

    In this report we study vectorization and parallelization for X-ray crystal structural analysis program. The target machine is NEC SX-4 which is a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it enables to obtain the 12.0 times higher performance result that the original code. Besides the above-mentioned vectorization, the parallelization by micro-task functions on SX-4 reaches 13.7 times acceleration in the part of multi-dimensional discrete Fourier transform with 14 CPUs, and 3.0 times acceleration in the whole program. Totally 35.9 times acceleration to the original 1CPU scalar version is achieved with vectorization and parallelization on SX-4. (author)

  3. Parallelization of a numerical simulation code for isotropic turbulence

    International Nuclear Information System (INIS)

    Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.

    1996-03-01

    A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed and execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, Fujitsu VPP500 with up to 16 processors is used for this calculation for Fourier modes up to 256x256x256 using 16 processors. Good scalability for number of processors is achieved when number of Fourier mode is fixed. For small Fourier modes, calculation time of the program is proportional to NlogN which is ideal complexity of calculation for 3D-FFT on vector parallel processors. It is found that the calculation performance decreases as the increase of the Fourier modes. (author)

  4. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    1997-10-01

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  5. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  6. When Do We Confront? Perceptions of Costs and Benefits Predict Confronting Discrimination on Behalf of the Self and Others

    Science.gov (United States)

    Good, Jessica J.; Moss-Racusin, Corinne A.; Sanchez, Diana T.

    2012-01-01

    Across two studies, we tested whether perceived social costs and benefits of confrontation would similarly predict confronting discrimination both when it was experienced and when it was observed as directed at others. Female undergraduate participants were asked to recall past experiences and observations of sexism, as well as their confronting…

  7. Confronting Misinformation in Climate Change Higher Education

    Science.gov (United States)

    Bedford, D. P.

    2012-12-01

    Among the many challenges faced by climate change educators is the highly politicized nature of the subject matter (e.g. McCright and Dunlap, 2011) and the associated misinformation from key media outlets and websites (e.g. see Oreskes and Conway, 2010). Students typically do not enter the classroom as 'blank slates', but often have already formed some opinion about climate change which may or may not be based on reputable sources. Further, many students have lives outside the classroom and/or off campus, and even those who do live in an isolated bubble of campus life will eventually graduate. Thus, providing students with a level of climate change knowledge and understanding robust enough to cope with misinformation may be an important goal for educators. This paper presents a case study of the direct use of climate change misinformation as a college-level classroom activity. Some research from other fields (notably psychology) has found that directly addressing misconceptions in the classroom can be the most effective means of dispelling them (Kowalski and Taylor, 2009). However, directly confronting misinformation in the classroom carries inherent risks, such as reinforcing misconceptions (e.g. Cook and Lewandowsky, 2011). This paper therefore considers approaches to minimizing those risks while attempting to maximize the possible benefits. This paper argues that use of misinformation as a teaching tool can provide useful exercises in critical thinking, testing of content knowledge, and consideration of the nature of science. Cook, J. and S. Lewandowsky. 2011. The Debunking Handbook. Online publication available www.skepticalscience.com/docs/Debunking_Handbook.pdf. Accessed 7 July 2012. Kowalski, P. and A.K. Taylor. 2009. DOI: 10.1080/00986280902959986. McCright, A., and R.T. Dunlap. 2011. The politicization of climate change and polarization in the American public's views of global warming, 2001-2010. The Sociological Quarterly 52:2, 155-194. Oreskes, N. and E

  8. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  9. Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    Science.gov (United States)

    Overman, Andrea L.; Poole, Eugene L.

    1991-01-01

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.

  10. Performance Analysis and Scaling Behavior of the Terrestrial Systems Modeling Platform TerrSysMP in Large-Scale Supercomputing Environments

    Science.gov (United States)

    Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.

    2013-12-01

    In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high-degree of efficiency in the utilization of e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, but even more so important, when complex coupled component models are to be analysed. Here we want to present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed

  11. To Confront Versus not to Confront: Women’s Perception of Sexual Harassment

    Directory of Open Access Journals (Sweden)

    María del Carmen Herrera

    2017-05-01

    Full Text Available Current research has postulated that sexual harassment is one of the most serious social problems. Perceptions of sexual harassment vary according to some factors: gender, context, and perceiver’s ideology. The strategies most commonly used by women to cope with harassment range from avoiding or ignoring the harasser to confronting the harasser or reporting the incident. The aim of this study was to explore women’s perception of sexual harassment, and to assess the implications of different victim responses to harassment. A total of 138 women were administered a questionnaire where the type of harassment, and victim response were manipulated. Moreover, the influence of ideological variables (i.e. ambivalent sexism and the acceptance of myths of sexual harassment on perception was assessed. Results show perception of sexual harassment was lower in gender harassment than in unwanted sexual attention and participants believed women who confronted their harasser would be evaluated negatively by men. Furthermore, effects of ideology on perception of harassment were found. The results underscore the complexities involved in defining certain behaviours as harassment, and the implications of different victim responses to harassment.

  12. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  13. Performance studies of the parallel VIM code

    International Nuclear Information System (INIS)

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other's jobs

  14. Confronting Female Genital Mutilation: The Role of Youth and ICTs ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    2011-07-14

    Jul 14, 2011 ... Book cover Confronting Female Genital Mutilation: The Role of ... of an innovative research and action project carried out by ENDA Tiers ... Congratulations to the first cohort of Women in Climate Change Science Fellows!

  15. Confronting quasi-exponential inflation with WMAP seven

    International Nuclear Information System (INIS)

    Pal, Barun Kumar; Pal, Supratik; Basu, B.

    2012-01-01

    We confront quasi-exponential models of inflation with WMAP seven years dataset using Hamilton Jacobi formalism. With a phenomenological Hubble parameter, representing quasi exponential inflation, we develop the formalism and subject the analysis to confrontation with WMAP seven using the publicly available code CAMB. The observable parameters are found to fair extremely well with WMAP seven. We also obtain a ratio of tensor to scalar amplitudes which may be detectable in PLANCK

  16. The Frontiers of Observational Cosmology and the Confrontation with Theory

    International Nuclear Information System (INIS)

    Longair, Malcolm

    2011-01-01

    The current state of observational cosmology and the confrontation with theory is presented. The review is divided into the following sections: - Basic observations on which the models are based. - Testing the basic assumptions made in the construction of the standard cosmological models. - Structure formation in the standard models; - Observational tests of the standard models - the confrontation with observation; - Basic problems and approaches to their solution; - Future challenges - the ESA EUCLID mission is given as an example.

  17. Parallelization of the Coupled Earthquake Model

    Science.gov (United States)

    Block, Gary; Li, P. Peggy; Song, Yuhe T.

    2007-01-01

    This Web-based tsunami simulation system allows users to remotely run a model on JPL s supercomputers for a given undersea earthquake. At the time of this reporting, predicting tsunamis on the Internet has never happened before. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern, high-performance computers and constrained by remotely sensed data, they are able to provide early warnings for those coastal regions at risk. The software is capable of testing NASA s satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.

  18. Evolution of a minimal parallel programming model

    International Nuclear Information System (INIS)

    Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

    2017-01-01

    Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.

  19. A parallel orbital-updating based plane-wave basis method for electronic structure calculations

    International Nuclear Information System (INIS)

    Pan, Yan; Dai, Xiaoying; Gironcoli, Stefano de; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui

    2017-01-01

    Highlights: • Propose three parallel orbital-updating based plane-wave basis methods for electronic structure calculations. • These new methods can avoid the generating of large scale eigenvalue problems and then reduce the computational cost. • These new methods allow for two-level parallelization which is particularly interesting for large scale parallelization. • Numerical experiments show that these new methods are reliable and efficient for large scale calculations on modern supercomputers. - Abstract: Motivated by the recently proposed parallel orbital-updating approach in real space method , we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.

  20. Guide to dataflow supercomputing basic concepts, case studies, and a detailed example

    CERN Document Server

    Milutinovic, Veljko; Trifunovic, Nemanja; Giorgi, Roberto

    2015-01-01

    This unique text/reference describes an exciting and novel approach to supercomputing in the DataFlow paradigm. The major advantages and applications of this approach are clearly described, and a detailed explanation of the programming model is provided using simple yet effective examples. The work is developed from a series of lecture courses taught by the authors in more than 40 universities across more than 20 countries, and from research carried out by Maxeler Technologies, Inc. Topics and features: presents a thorough introduction to DataFlow supercomputing for big data problems; revie

  1. Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a Large-Scale Multithreaded BlueGene/Q Supercomputer

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2013-01-01

    In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne National laboratory, and quantify the performance gap resulting from using different number of threads per node. We use performance tools and MPI profile and trace libraries available on the supercomputer to analyze and compare the performance of these hybrid scientific applications with increasing the number OpenMP threads per node, and find that increasing the number of threads to some extent saturates or worsens performance of these hybrid applications. For the strong-scaling hybrid scientific applications such as SP-MZ, BT-MZ, PEQdyna and PLMB, using 32 threads per node results in much better application efficiency than using 64 threads per node, and as increasing the number of threads per node, the FPU (Floating Point Unit) percentage decreases, and the MPI percentage (except PMLB) and IPC (Instructions per cycle) per core (except BT-MZ) increase. For the weak-scaling hybrid scientific application such as GTC, the performance trend (relative speedup) is very similar with increasing number of threads per node no matter how many nodes (32, 128, 512) are used. © 2013 IEEE.

  2. Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a Large-Scale Multithreaded BlueGene/Q Supercomputer

    KAUST Repository

    Wu, Xingfu

    2013-07-01

    In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel benchmarks Multi-Zone SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne National laboratory, and quantify the performance gap resulting from using different number of threads per node. We use performance tools and MPI profile and trace libraries available on the supercomputer to analyze and compare the performance of these hybrid scientific applications with increasing the number OpenMP threads per node, and find that increasing the number of threads to some extent saturates or worsens performance of these hybrid applications. For the strong-scaling hybrid scientific applications such as SP-MZ, BT-MZ, PEQdyna and PLMB, using 32 threads per node results in much better application efficiency than using 64 threads per node, and as increasing the number of threads per node, the FPU (Floating Point Unit) percentage decreases, and the MPI percentage (except PMLB) and IPC (Instructions per cycle) per core (except BT-MZ) increase. For the weak-scaling hybrid scientific application such as GTC, the performance trend (relative speedup) is very similar with increasing number of threads per node no matter how many nodes (32, 128, 512) are used. © 2013 IEEE.

  3. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    Computing requirement of large scale scientific computing has always been ahead of what state of the art hardware could supply in the form of supercomputers of the day. And for any single processor system the limit to increase in the computing power was realized a few years back itself. Now with the advent of parallel computing systems the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future large scale scientific computing in the penultimate decade of the present century. The author summarized trends in parallel computers and emphasize the need for a better programming environment and software tools for optimal performance. The author concludes this paper with critique on parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  4. A tool for simulating parallel branch-and-bound methods

    Science.gov (United States)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound method is known as one of the most powerful but very resource consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in parallel B&B method is the need for dynamic load redistribution. Therefore design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating parallel Branchand-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, the characteristics of the supercomputer's interconnect thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.

  5. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    International Nuclear Information System (INIS)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-01-01

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results

  6. Interactive real-time nuclear plant simulations on a UNIX based supercomputer

    International Nuclear Information System (INIS)

    Behling, S.R.

    1990-01-01

    Interactive real-time nuclear plant simulations are critically important to train nuclear power plant engineers and operators. In addition, real-time simulations can be used to test the validity and timing of plant technical specifications and operational procedures. To accurately and confidently simulate a nuclear power plant transient in real-time, sufficient computer resources must be available. Since some important transients cannot be simulated using preprogrammed responses or non-physical models, commonly used simulation techniques may not be adequate. However, the power of a supercomputer allows one to accurately calculate the behavior of nuclear power plants even during very complex transients. Many of these transients can be calculated in real-time or quicker on the fastest supercomputers. The concept of running interactive real-time nuclear power plant transients on a supercomputer has been tested. This paper describes the architecture of the simulation program, the techniques used to establish real-time synchronization, and other issues related to the use of supercomputers in a new and potentially very important area. (author)

  7. Toward a Proof of Concept Cloud Framework for Physics Applications on Blue Gene Supercomputers

    International Nuclear Information System (INIS)

    Dreher, Patrick; Scullin, William; Vouk, Mladen

    2015-01-01

    Traditional high performance supercomputers are capable of delivering large sustained state-of-the-art computational resources to physics applications over extended periods of time using batch processing mode operating environments. However, today there is an increasing demand for more complex workflows that involve large fluctuations in the levels of HPC physics computational requirements during the simulations. Some of the workflow components may also require a richer set of operating system features and schedulers than normally found in a batch oriented HPC environment. This paper reports on progress toward a proof of concept design that implements a cloud framework onto BG/P and BG/Q platforms at the Argonne Leadership Computing Facility. The BG/P implementation utilizes the Kittyhawk utility and the BG/Q platform uses an experimental heterogeneous FusedOS operating system environment. Both platforms use the Virtual Computing Laboratory as the cloud computing system embedded within the supercomputer. This proof of concept design allows a cloud to be configured so that it can capitalize on the specialized infrastructure capabilities of a supercomputer and the flexible cloud configurations without resorting to virtualization. Initial testing of the proof of concept system is done using the lattice QCD MILC code. These types of user reconfigurable environments have the potential to deliver experimental schedulers and operating systems within a working HPC environment for physics computations that may be different from the native OS and schedulers on production HPC supercomputers. (paper)

  8. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2013-01-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  9. Argonne National Lab deploys Force10 networks' massively dense ethernet switch for supercomputing cluster

    CERN Multimedia

    2003-01-01

    "Force10 Networks, Inc. today announced that Argonne National Laboratory (Argonne, IL) has successfully deployed Force10 E-Series switch/routers to connect to the TeraGrid, the world's largest supercomputing grid, sponsored by the National Science Foundation (NSF)" (1/2 page).

  10. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2013-12-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel\\'s MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  11. Fiscal 2000 report on advanced parallelized compiler technology. Outlines; 2000 nendo advanced heiretsuka compiler gijutsu hokokusho (Gaiyo hen)

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    Research and development was carried out concerning the automatic parallelized compiler technology which improves on the practical performance, cost/performance ratio, and ease of operation of the multiprocessor system now used for constructing supercomputers and expected to provide a fundamental architecture for microprocessors for the 21st century. Efforts were made to develop an automatic multigrain parallelization technology for extracting multigrain as parallelized from a program and for making full use of the same and a parallelizing tuning technology for accelerating parallelization by feeding back to the compiler the dynamic information and user knowledge to be acquired during execution. Moreover, a benchmark program was selected and studies were made to set execution rules and evaluation indexes for the establishment of technologies for subjectively evaluating the performance of parallelizing compilers for the existing commercial parallel processing computers, which was achieved through the implementation and evaluation of the 'Advanced parallelizing compiler technology research and development project.' (NEDO)

  12. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented

  13. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society

  14. MaMiCo: Transient multi-instance molecular-continuum flow simulation on supercomputers

    Science.gov (United States)

    Neumann, Philipp; Bian, Xin

    2017-11-01

    We present extensions of the macro-micro-coupling tool MaMiCo, which was designed to couple continuum fluid dynamics solvers with discrete particle dynamics. To enable local extraction of smooth flow field quantities especially on rather short time scales, sampling over an ensemble of molecular dynamics simulations is introduced. We provide details on these extensions including the transient coupling algorithm, open boundary forcing, and multi-instance sampling. Furthermore, we validate the coupling in Couette flow using different particle simulation software packages and particle models, i.e. molecular dynamics and dissipative particle dynamics. Finally, we demonstrate the parallel scalability of the molecular-continuum simulations by using up to 65 536 compute cores of the supercomputer Shaheen II located at KAUST. Program Files doi:http://dx.doi.org/10.17632/w7rgdrhb85.1 Licensing provisions: BSD 3-clause Programming language: C, C++ External routines/libraries: For compiling: SCons, MPI (optional) Subprograms used: ESPResSo, LAMMPS, ls1 mardyn, waLBerla For installation procedures of the MaMiCo interfaces, see the README files in the respective code directories located in coupling/interface/impl. Journal reference of previous version: P. Neumann, H. Flohr, R. Arora, P. Jarmatz, N. Tchipev, H.-J. Bungartz. MaMiCo: Software design for parallel molecular-continuum flow simulations, Computer Physics Communications 200: 324-335, 2016 Does the new version supersede the previous version?: Yes. The functionality of the previous version is completely retained in the new version. Nature of problem: Coupled molecular-continuum simulation for multi-resolution fluid dynamics: parts of the domain are resolved by molecular dynamics or another particle-based solver whereas large parts are covered by a mesh-based CFD solver, e.g. a lattice Boltzmann automaton. Solution method: We couple existing MD and CFD solvers via MaMiCo (macro-micro coupling tool). Data exchange and

  15. The importance of confronting a colonial, patriarchal and racist past ...

    African Journals Online (AJOL)

    The importance of confronting a colonial, patriarchal and racist past in addressing post-apartheid sexual violence. ... It also needs to redress problems of social and economic inequality that exist in South Africa as hangovers from this country's colonial and apartheid-era past. Keywords: Zuma, rape, Kipling, colonialism, ...

  16. Confrontation and Alienation: Education's Flawed Response to Religious Textbook Objections.

    Science.gov (United States)

    Balajthy, Ernest

    Recent controversies over textbooks illustrate objections held by Evangelicals to "secular humanism" in the schools. Educators automatically tend to assume that all religious objections to curricula are clear-cut attempts at censorship. This confrontational attitude on the part of educators can lead to alienation of minority religious…

  17. Challenges confronting health care workers in government's ARV ...

    African Journals Online (AJOL)

    Challenges confronting health care workers in government's ARV rollout: rights and responsibilities. ... Potchefstroom Electronic Law Journal/Potchefstroomse Elektroniese Regsblad ... Unless the rights of HCWs are recognised and their needs adequately addressed, the best laid plans of government will be at risk.

  18. Challenge, Confrontation, and Exhortation as Intentional Invitations by Professional Helpers.

    Science.gov (United States)

    Schmidt, John J.

    1996-01-01

    Examines intentional invitations that challenge, confront, exhort, and persuade people to change their behaviors. Assumes that the sender controls the "intention" and that the receiver determines the degree of "inviting." Suggests that elements of the invitational model serve as a framework to create acceptable inducements in…

  19. Clinical confrontation results of diagnostics and treatment of skin cancer

    International Nuclear Information System (INIS)

    Zikiryakhodjaev, D.Z.; Sanginov, D.R.

    2001-01-01

    In this chapter of book authors investigated the clinical confrontation results of diagnostics and treatment of skin cancer. They noted that diagnostic of skin cancer have to foresee the determination morphologic implements and degree of malignancy tumorous process why in general depend prognosis of illness

  20. Confronting quintessence models with recent high-redshift supernovae data

    International Nuclear Information System (INIS)

    Calvo, G. Barro; Maroto, A. L.

    2006-01-01

    We confront the predictions of different quintessence models with recent measurements of the luminosity distance from two sets of supernovae type Ia. In particular, we consider the 157 SNe Ia in the Gold dataset with z M -α and Ω M -w φ planes for the different models and compare their predictions with dark energy models with constant equation of state

  1. A hybrid method for the parallel computation of Green's functions

    DEFF Research Database (Denmark)

    Petersen, Dan Erik; Li, Song; Stokbro, Kurt

    2009-01-01

    of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds...... of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only...... require computing a small number of entries of the inverse matrix. Then. we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size....

  2. A parallel finite-difference method for computational aerodynamics

    International Nuclear Information System (INIS)

    Swisshelm, J.M.

    1989-01-01

    A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed. 14 refs

  3. Supercomputer and cluster performance modeling and analysis efforts:2004-2006.

    Energy Technology Data Exchange (ETDEWEB)

    Sturtevant, Judith E.; Ganti, Anand; Meyer, Harold (Hal) Edward; Stevenson, Joel O.; Benner, Robert E., Jr. (.,; .); Goudy, Susan Phelps; Doerfler, Douglas W.; Domino, Stefan Paul; Taylor, Mark A.; Malins, Robert Joseph; Scott, Ryan T.; Barnette, Daniel Wayne; Rajan, Mahesh; Ang, James Alfred; Black, Amalia Rebecca; Laub, Thomas William; Vaughan, Courtenay Thomas; Franke, Brian Claude

    2007-02-01

    This report describes efforts by the Performance Modeling and Analysis Team to investigate performance characteristics of Sandia's engineering and scientific applications on the ASC capability and advanced architecture supercomputers, and Sandia's capacity Linux clusters. Efforts to model various aspects of these computers are also discussed. The goals of these efforts are to quantify and compare Sandia's supercomputer and cluster performance characteristics; to reveal strengths and weaknesses in such systems; and to predict performance characteristics of, and provide guidelines for, future acquisitions and follow-on systems. Described herein are the results obtained from running benchmarks and applications to extract performance characteristics and comparisons, as well as modeling efforts, obtained during the time period 2004-2006. The format of the report, with hypertext links to numerous additional documents, purposefully minimizes the document size needed to disseminate the extensive results from our research.

  4. BSMBench: a flexible and scalable supercomputer benchmark from computational particle physics

    CERN Document Server

    Bennett, Ed; Del Debbio, Luigi; Jordan, Kirk; Patella, Agostino; Pica, Claudio; Rago, Antonio

    2016-01-01

    Benchmarking plays a central role in the evaluation of High Performance Computing architectures. Several benchmarks have been designed that allow users to stress various components of supercomputers. In order for the figures they provide to be useful, benchmarks need to be representative of the most common real-world scenarios. In this work, we introduce BSMBench, a benchmarking suite derived from Monte Carlo code used in computational particle physics. The advantage of this suite (which can be freely downloaded from http://www.bsmbench.org/) over others is the capacity to vary the relative importance of computation and communication. This enables the tests to simulate various practical situations. To showcase BSMBench, we perform a wide range of tests on various architectures, from desktop computers to state-of-the-art supercomputers, and discuss the corresponding results. Possible future directions of development of the benchmark are also outlined.

  5. Enabling Diverse Software Stacks on Supercomputers using High Performance Virtual Clusters.

    Energy Technology Data Exchange (ETDEWEB)

    Younge, Andrew J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Pedretti, Kevin [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Grant, Ryan [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Brightwell, Ron [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    While large-scale simulations have been the hallmark of the High Performance Computing (HPC) community for decades, Large Scale Data Analytics (LSDA) workloads are gaining attention within the scientific community not only as a processing component to large HPC simulations, but also as standalone scientific tools for knowledge discovery. With the path towards Exascale, new HPC runtime systems are also emerging in a way that differs from classical distributed com- puting models. However, system software for such capabilities on the latest extreme-scale DOE supercomputing needs to be enhanced to more appropriately support these types of emerging soft- ware ecosystems. In this paper, we propose the use of Virtual Clusters on advanced supercomputing resources to enable systems to support not only HPC workloads, but also emerging big data stacks. Specifi- cally, we have deployed the KVM hypervisor within Cray's Compute Node Linux on a XC-series supercomputer testbed. We also use libvirt and QEMU to manage and provision VMs directly on compute nodes, leveraging Ethernet-over-Aries network emulation. To our knowledge, this is the first known use of KVM on a true MPP supercomputer. We investigate the overhead our solution using HPC benchmarks, both evaluating single-node performance as well as weak scaling of a 32-node virtual cluster. Overall, we find single node performance of our solution using KVM on a Cray is very efficient with near-native performance. However overhead increases by up to 20% as virtual cluster size increases, due to limitations of the Ethernet-over-Aries bridged network. Furthermore, we deploy Apache Spark with large data analysis workloads in a Virtual Cluster, ef- fectively demonstrating how diverse software ecosystems can be supported by High Performance Virtual Clusters.

  6. Application of Supercomputer Technologies for Simulation Of Socio-Economic Systems

    Directory of Open Access Journals (Sweden)

    Vladimir Valentinovich Okrepilov

    2015-06-01

    Full Text Available To date, an extensive experience has been accumulated in investigation of problems related to quality, assessment of management systems, modeling of economic system sustainability. The performed studies have created a basis for development of a new research area — Economics of Quality. Its tools allow to use opportunities of model simulation for construction of the mathematical models adequately reflecting the role of quality in natural, technical, social regularities of functioning of the complex socio-economic systems. Extensive application and development of models, and also system modeling with use of supercomputer technologies, on our deep belief, will bring the conducted research of socio-economic systems to essentially new level. Moreover, the current scientific research makes a significant contribution to model simulation of multi-agent social systems and that is not less important, it belongs to the priority areas in development of science and technology in our country. This article is devoted to the questions of supercomputer technologies application in public sciences, first of all, — regarding technical realization of the large-scale agent-focused models (AFM. The essence of this tool is that owing to the power computer increase it has become possible to describe the behavior of many separate fragments of a difficult system, as socio-economic systems are. The article also deals with the experience of foreign scientists and practicians in launching the AFM on supercomputers, and also the example of AFM developed in CEMI RAS, stages and methods of effective calculating kernel display of multi-agent system on architecture of a modern supercomputer will be analyzed. The experiments on the basis of model simulation on forecasting the population of St. Petersburg according to three scenarios as one of the major factors influencing the development of socio-economic system and quality of life of the population are presented in the

  7. Heat dissipation computations of a HVDC ground electrode using a supercomputer

    International Nuclear Information System (INIS)

    Greiss, H.; Mukhedkar, D.; Lagace, P.J.

    1990-01-01

    This paper reports on the temperature, of soil surrounding a High Voltage Direct Current (HVDC) toroidal ground electrode of practical dimensions, in both homogeneous and non-homogeneous soils that was computed at incremental points in time using finite difference methods on a supercomputer. Curves of the response were computed and plotted at several locations within the soil in the vicinity of the ground electrode for various values of the soil parameters

  8. Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Meneses, Esteban [University of Pittsburgh; Ni, Xiang [University of Illinois at Urbana-Champaign; Jones, Terry R [ORNL; Maxwell, Don E [ORNL

    2015-01-01

    The unprecedented computational power of cur- rent supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that glooms the future of supercomputing. A system that comprises many components has a high chance to fail, and fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of fault tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan s log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan.

  9. Multiscale Hy3S: Hybrid stochastic simulation for supercomputers

    Directory of Open Access Journals (Sweden)

    Kaznessis Yiannis N

    2006-02-01

    Full Text Available Abstract Background Stochastic simulation has become a useful tool to both study natural biological systems and design new synthetic ones. By capturing the intrinsic molecular fluctuations of "small" systems, these simulations produce a more accurate picture of single cell dynamics, including interesting phenomena missed by deterministic methods, such as noise-induced oscillations and transitions between stable states. However, the computational cost of the original stochastic simulation algorithm can be high, motivating the use of hybrid stochastic methods. Hybrid stochastic methods partition the system into multiple subsets and describe each subset as a different representation, such as a jump Markov, Poisson, continuous Markov, or deterministic process. By applying valid approximations and self-consistently merging disparate descriptions, a method can be considerably faster, while retaining accuracy. In this paper, we describe Hy3S, a collection of multiscale simulation programs. Results Building on our previous work on developing novel hybrid stochastic algorithms, we have created the Hy3S software package to enable scientists and engineers to both study and design extremely large well-mixed biological systems with many thousands of reactions and chemical species. We have added adaptive stochastic numerical integrators to permit the robust simulation of dynamically stiff biological systems. In addition, Hy3S has many useful features, including embarrassingly parallelized simulations with MPI; special discrete events, such as transcriptional and translation elongation and cell division; mid-simulation perturbations in both the number of molecules of species and reaction kinetic parameters; combinatorial variation of both initial conditions and kinetic parameters to enable sensitivity analysis; use of NetCDF optimized binary format to quickly read and write large datasets; and a simple graphical user interface, written in Matlab, to help users

  10. Building more powerful less expensive supercomputers using Processing-In-Memory (PIM) LDRD final report.

    Energy Technology Data Exchange (ETDEWEB)

    Murphy, Richard C.

    2009-09-01

    This report details the accomplishments of the 'Building More Powerful Less Expensive Supercomputers Using Processing-In-Memory (PIM)' LDRD ('PIM LDRD', number 105809) for FY07-FY09. Latency dominates all levels of supercomputer design. Within a node, increasing memory latency, relative to processor cycle time, limits CPU performance. Between nodes, the same increase in relative latency impacts scalability. Processing-In-Memory (PIM) is an architecture that directly addresses this problem using enhanced chip fabrication technology and machine organization. PIMs combine high-speed logic and dense, low-latency, high-bandwidth DRAM, and lightweight threads that tolerate latency by performing useful work during memory transactions. This work examines the potential of PIM-based architectures to support mission critical Sandia applications and an emerging class of more data intensive informatics applications. This work has resulted in a stronger architecture/implementation collaboration between 1400 and 1700. Additionally, key technology components have impacted vendor roadmaps, and we are in the process of pursuing these new collaborations. This work has the potential to impact future supercomputer design and construction, reducing power and increasing performance. This final report is organized as follow: this summary chapter discusses the impact of the project (Section 1), provides an enumeration of publications and other public discussion of the work (Section 1), and concludes with a discussion of future work and impact from the project (Section 1). The appendix contains reprints of the refereed publications resulting from this work.

  11. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  12. SiGN-SSM: open source parallel software for estimating gene networks with state space models.

    Science.gov (United States)

    Tamada, Yoshinori; Yamaguchi, Rui; Imoto, Seiya; Hirose, Osamu; Yoshida, Ryo; Nagasaki, Masao; Miyano, Satoru

    2011-04-15

    SiGN-SSM is an open-source gene network estimation software able to run in parallel on PCs and massively parallel supercomputers. The software estimates a state space model (SSM), that is a statistical dynamic model suitable for analyzing short time and/or replicated time series gene expression profiles. SiGN-SSM implements a novel parameter constraint effective to stabilize the estimated models. Also, by using a supercomputer, it is able to determine the gene network structure by a statistical permutation test in a practical time. SiGN-SSM is applicable not only to analyzing temporal regulatory dependencies between genes, but also to extracting the differentially regulated genes from time series expression profiles. SiGN-SSM is distributed under GNU Affero General Public Licence (GNU AGPL) version 3 and can be downloaded at http://sign.hgc.jp/signssm/. The pre-compiled binaries for some architectures are available in addition to the source code. The pre-installed binaries are also available on the Human Genome Center supercomputer system. The online manual and the supplementary information of SiGN-SSM is available on our web site. tamada@ims.u-tokyo.ac.jp.

  13. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  14. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    Science.gov (United States)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  15. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    International Nuclear Information System (INIS)

    Andrade, Xavier; Aspuru-Guzik, Alán; Alberdi-Rodriguez, Joseba; Rubio, Angel; Strubbe, David A; Louie, Steven G; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Marques, Miguel A L

    2012-01-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. (topical review)

  16. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    International Nuclear Information System (INIS)

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-01-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) paring of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.

  17. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve efficiently large 3-D radiation transport problems. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand R interconnects capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  18. Issues confronting women participation in the construction industry

    OpenAIRE

    Aulin, Radhlinah; Jingmond, Monika

    2011-01-01

    This paper raises the issues confronting the minority cohort’s participation in the construction industry. Women in construction are seen as the wrong gender to be around for the construction occupations require not only manual dexterity but physical strength. Currently, the industry is employing less than 10% of the female in the workforce with even lower participation in crafts and trade. This paper discussed about the current women participation in construction focusing on the European Uni...

  19. The Lived Experience of Iranian Women Confronting Breast Cancer Diagnosis

    OpenAIRE

    Esmat Mehrabi; Sepideh Hajian; Masoomeh Simbar; Mohammad Hoshyari; Farid Zayeri

    2016-01-01

    Introduction: The populations who survive from breast cancer are growing; nevertheless, they mostly encounter with many cancer related problems in their life, especially after early diagnosis and have to deal with these problems. Except for the disease entity, several socio-cultural factors may affect confronting this challenge among patients and the way they deal with. Present study was carried out to prepare clear understanding of Iranian women's...

  20. Scheme with two large extra dimensions confronted with neutrino physics

    International Nuclear Information System (INIS)

    Maalampi, J.; Sipilaeinen, V.; Vilja, I.

    2003-01-01

    We investigate a particle physics model in a six-dimensional spacetime, where two extra dimensions form a torus. Particles with standard model charges are confined by interactions with a scalar field to four four-dimensional branes, two vortices accommodating ordinary type fermions and two antivortices accommodating mirror fermions. We investigate the phenomenological implications of this multibrane structure by confronting the model with neutrino physics data

  1. Stop Harassment! Men’s reactions to victims’ confrontation

    Directory of Open Access Journals (Sweden)

    M. Carmen Herrera

    2014-07-01

    Full Text Available Sexual harassment is one of the most widespread forms of gender violence. Perceptions of sexual harassment depend on gender, context, the perceivers’ ideology, and a host of other factors. Research has underscored the importance of coping strategies in raising a victim’s self-confidence by making her feel that she plays an active role in overcoming her own problems. The aim of this study was to assess the men’s perceptions of sexual harassment in relation to different victim responses. The study involved 101 men who were administered a questionnaire focusing on two of the most frequent types of harassment (gender harassment vs. unwanted sexual attention and victim response (confrontation vs. non confrontation, both of which were manipulated. Moreover, the influences of ideological variables, ambivalent sexism, and the acceptance of myths of sexual harassment on perception were also assessed. The results highlight the complexities involved in recognizing certain behaviors as harassment and the implications of different victim responses to incidents of harassment. As the coping strategies used by women to confront harassment entail drawbacks that pose problems or hinder them, the design and implementation of prevention and/or education programs should strive to raise awareness among men and women to further their understanding of this construct.

  2. Parallelism in computations in quantum and statistical mechanics

    International Nuclear Information System (INIS)

    Clementi, E.; Corongiu, G.; Detrich, J.H.

    1985-01-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341 s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)

  3. Confronting Violence, Improving Women's Lives Special Display Opens at NLM | NIH MedlinePlus the Magazine

    Science.gov (United States)

    ... of this page please turn JavaScript on. Confronting Violence, Improving Women's Lives Special Display Opens at NLM ... Medicine Division. Photo Courtesy of Lisa Helfert Confronting Violence, Improving Women's Lives is on display in the ...

  4. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  5. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the

  6. Parallel/vector algorithms for the spherical SN transport theory method

    International Nuclear Information System (INIS)

    Haghighat, A.; Mattis, R.E.

    1990-01-01

    This paper discusses vector and parallel processing of a 1-D curvilinear (i.e. spherical) S N transport theory algorithm on the Cornell National SuperComputer Facility (CNSF) IBM 3090/600E. Two different vector algorithms were developed and parallelized based on angular decomposition. It is shown that significant speedups are attainable. For example, for problems with large granularity, using 4 processors, the parallel/vector algorithm achieves speedups (for wall-clock time) of more than 4.5 relative to the old serial/scalar algorithm. Furthermore, this work has demonstrated the existing potential for the development of faster processing vector and parallel algorithms for multidimensional curvilinear geometries. (author)

  7. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  8. PENTACLE: Parallelized particle-particle particle-tree code for planet formation

    Science.gov (United States)

    Iwasawa, Masaki; Oshino, Shoichi; Fujii, Michiko S.; Hori, Yasunori

    2017-10-01

    We have newly developed a parallelized particle-particle particle-tree code for planet formation, PENTACLE, which is a parallelized hybrid N-body integrator executed on a CPU-based (super)computer. PENTACLE uses a fourth-order Hermite algorithm to calculate gravitational interactions between particles within a cut-off radius and a Barnes-Hut tree method for gravity from particles beyond. It also implements an open-source library designed for full automatic parallelization of particle simulations, FDPS (Framework for Developing Particle Simulator), to parallelize a Barnes-Hut tree algorithm for a memory-distributed supercomputer. These allow us to handle 1-10 million particles in a high-resolution N-body simulation on CPU clusters for collisional dynamics, including physical collisions in a planetesimal disc. In this paper, we show the performance and the accuracy of PENTACLE in terms of \\tilde{R}_cut and a time-step Δt. It turns out that the accuracy of a hybrid N-body simulation is controlled through Δ t / \\tilde{R}_cut and Δ t / \\tilde{R}_cut ˜ 0.1 is necessary to simulate accurately the accretion process of a planet for ≥106 yr. For all those interested in large-scale particle simulations, PENTACLE, customized for planet formation, will be freely available from https://github.com/PENTACLE-Team/PENTACLE under the MIT licence.

  9. Re-inventing electromagnetics - Supercomputing solution of Maxwell's equations via direct time integration on space grids

    International Nuclear Information System (INIS)

    Taflove, A.

    1992-01-01

    This paper summarizes the present state and future directions of applying finite-difference and finite-volume time-domain techniques for Maxwell's equations on supercomputers to model complex electromagnetic wave interactions with structures. Applications so far have been dominated by radar cross section technology, but by no means are limited to this area. In fact, the gains we have made place us on the threshold of being able to make tremendous contributions to non-defense electronics and optical technology. Some of the most interesting research in these commercial areas is summarized. 47 refs

  10. Watson will see you now: a supercomputer to help clinicians make informed treatment decisions.

    Science.gov (United States)

    Doyle-Lindrud, Susan

    2015-02-01

    IBM has collaborated with several cancer care providers to develop and train the IBM supercomputer Watson to help clinicians make informed treatment decisions. When a patient is seen in clinic, the oncologist can input all of the clinical information into the computer system. Watson will then review all of the data and recommend treatment options based on the latest evidence and guidelines. Once the oncologist makes the treatment decision, this information can be sent directly to the insurance company for approval. Watson has the ability to standardize care and accelerate the approval process, a benefit to the healthcare provider and the patient.

  11. MANAGERIAL PROBLEMS CONFRONTED BY EXECUTIVE CHEFS IN HOTELS

    Directory of Open Access Journals (Sweden)

    Kemal BIRDIR

    2014-07-01

    Full Text Available The study was conducted to determine the managerial problems confronted by executive chefs working at 4 and 5-star hotels in Turkey. A survey developed by the researchers was employed as a data collection tool. Answers given by participants were analyzed using “T-test” and “ANOVA” analyses in order to determine whether there are significant differences of opinion on the subject (collated in answers to the survey questionnaire amongst executive chefs, based on answers given by them (expressed as average figures dependent upon such variables as their “Age”, “Gender”, “Educational Status” and “Star status of the hotel within which they worked.” The study results showed that the most important problem confronting executive chefs was “finding educated/trained kitchen personnel.” On the specific problem, “responsibility and authority is not clear within the kitchen,” there was a significant difference of opinion by the gender of the executive chefs. Moreover, there was a significant difference of opinion dependent upon the star status of the hotels within which the chefs worked on the problem of whether or not “the working hours of kitchen personnel were too long.” The findings suggest that there are important problems confronted by executive chefs. Moreover, male and female executive chefs have different opinions on the magnitudes of some specific problems. Whereas there are various reports and similar publications discussing problems faced by executive chefs, the present study is the first one in the literature that solely explore the managerial problems experienced at a kitchen context.

  12. Confronting the Consequences of a Permanent Changing Environment

    Directory of Open Access Journals (Sweden)

    Raluca Ioana Vosloban

    2013-05-01

    Full Text Available Businesses and governments choose how they wish to deal with change. Whether this change is organizational, technological, political, financial etc or even individual pursuing actions as usual is likely to lead to a downward path. The authors of this paper are giving a set of tools for confronting and understanding the consequences of this era of permanent changes by building strengths and seeking opportunities within organizations (private or public and within family (including friends. The work environment and the personal life of the individual have a common point which is adaptability, coping efficiently with changes, a demanded ability of the 3rd millennium human being.

  13. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  14. Grassroots Supercomputing

    CERN Multimedia

    Buchanan, Mark

    2005-01-01

    What started out as a way for SETI to plow through its piles or radio-signal data from deep space has turned into a powerful research tool as computer users acrosse the globe donate their screen-saver time to projects as diverse as climate-change prediction, gravitational-wave searches, and protein folding (4 pages)

  15. The strategy of parallel approaches in projects with unforeseeable uncertainty: the Manhattan case in retrospect

    OpenAIRE

    Sylvain Lenfle

    2011-01-01

    International audience; This paper discusses the literature on the management of projects with unforeseeable uncertainty. Recent work demonstrates that, when confronted with unforeseeable uncertainties, managers can adopt either a learning, trial-and-error-based strategy, or a parallel approach. In the latter, different solutions are developed in parallel and the best one is chosen when enough information becomes available. Studying the case of the Manhattan Project, which historically exempl...

  16. The Lived Experience of Iranian Women Confronting Breast Cancer Diagnosis.

    Science.gov (United States)

    Mehrabi, Esmat; Hajian, Sepideh; Simbar, Masoomeh; Hoshyari, Mohammad; Zayeri, Farid

    2016-03-01

    The populations who survive from breast cancer are growing; nevertheless, they mostly encounter with many cancer related problems in their life, especially after early diagnosis and have to deal with these problems. Except for the disease entity, several socio-cultural factors may affect confronting this challenge among patients and the way they deal with. Present study was carried out to prepare clear understanding of Iranian women's lived experiences confronting breast cancer diagnosis and coping ways they applied to deal with it. This study was carried out by using qualitative phenomenological design. Data gathering was done through purposive sampling using semi-structured, in-depth interviews with 18 women who survived from breast cancer. The transcribed interviews were analyzed using Van Manen's thematic analysis approach. Two main themes were emerged from the interviews including "emotional turbulence" and "threat control". The first, comprised three sub themes including uncertainty, perceived worries, and living with fears. The second included risk control, recurrence control, immediate seeking help, seeking support and resource to spirituality. Emotional response was the immediate reflection to cancer diagnosis. However, during post-treatment period a variety of emotions were not uncommon findings, patients' perceptions have been changing along the time and problem-focused coping strategies have replaced. Although women may experience a degree of improvement and adjustment with illness, the emotional problems are not necessarily resolved, they may continue and gradually engender positive outcomes.

  17. The Lived Experience of Iranian Women Confronting Breast Cancer Diagnosis

    Directory of Open Access Journals (Sweden)

    Esmat Mehrabi

    2016-03-01

    Full Text Available Introduction: The populations who survive from breast cancer are growing; nevertheless, they mostly encounter with many cancer related problems in their life, especially after early diagnosis and have to deal with these problems. Except for the disease entity, several socio-cultural factors may affect confronting this challenge among patients and the way they deal with. Present study was carried out to prepare clear understanding of Iranian women's lived experiences confronting breast cancer diagnosis and coping ways they applied to deal with it. Methods: This study was carried out by using qualitative phenomenological design. Data gathering was done through purposive sampling using semi-structured, in-depth interviews with 18 women who survived from breast cancer. The transcribed interviews were analyzed using Van Manen’s thematic analysis approach. Results: Two main themes were emerged from the interviews including "emotional turbulence" and "threat control". The first, comprised three sub themes including uncertainty, perceived worries, and living with fears. The second included risk control, recurrence control, immediate seeking help, seeking support and resource to spirituality. Conclusion: Emotional response was the immediate reflection to cancer diagnosis. However, during post-treatment period a variety of emotions were not uncommon findings, patients' perceptions have been changing along the time and problem-focused coping strategies have replaced. Although women may experience a degree of improvement and adjustment with illness, the emotional problems are not necessarily resolved, they may continue and gradually engender positive outcomes.

  18. Synergetic Paradigm of Geopolitical Confrontation in the Postmodern Era

    Directory of Open Access Journals (Sweden)

    Sergey N. Teplyakov

    2014-01-01

    Full Text Available The article analyzes current state and mechanisms of geopolitical struggle in postmodern information age that has come. The author judges from assumption that entirely new postmodern society appeared with expansion of information technology, accompanied by cardinal changes in mechanisms of political power. Information technologies have become one of the most important factors contributing to the transformation of modern society from industrial to informational (post-industrial. In modern conditions, ensuring national and global security is a comprehensive process that includes not only measures to ensure information and economic security individually, but also such an integrated component as providing both information and economic security. The author suggests that modem geopolitical confrontation is carried out based on the synergetic paradigm. The main tool is information and energy influence on enemy system weaknesses using information space control, organizing negative information campaigns and applying economic sanctions. If the main focus of geopolitical struggle in modern era was forced expansion of the territory, in information postmodern age control over economic and information space has become priority among forms of geopolitical struggle. Military expansion of modern era becomes substituted by information and economic expansionism of postmodern using synergetic paradigm of geopolitical confrontation in order to control and capture the opponent's political space.

  19. Frequently updated noise threat maps created with use of supercomputing grid

    Directory of Open Access Journals (Sweden)

    Szczodrak Maciej

    2014-09-01

    Full Text Available An innovative supercomputing grid services devoted to noise threat evaluation were presented. The services described in this paper concern two issues, first is related to the noise mapping, while the second one focuses on assessment of the noise dose and its influence on the human hearing system. The discussed serviceswere developed within the PL-Grid Plus Infrastructure which accumulates Polish academic supercomputer centers. Selected experimental results achieved by the usage of the services proposed were presented. The assessment of the environmental noise threats includes creation of the noise maps using either ofline or online data, acquired through a grid of the monitoring stations. A concept of estimation of the source model parameters based on the measured sound level for the purpose of creating frequently updated noise maps was presented. Connecting the noise mapping grid service with a distributed sensor network enables to automatically update noise maps for a specified time period. Moreover, a unique attribute of the developed software is the estimation of the auditory effects evoked by the exposure to noise. The estimation method uses a modified psychoacoustic model of hearing and is based on the calculated noise level values and on the given exposure period. Potential use scenarios of the grid services for research or educational purpose were introduced. Presentation of the results of predicted hearing threshold shift caused by exposure to excessive noise can raise the public awareness of the noise threats.

  20. Computational fluid dynamics research at the United Technologies Research Center requiring supercomputers

    Science.gov (United States)

    Landgrebe, Anton J.

    1987-01-01

    An overview of research activities at the United Technologies Research Center (UTRC) in the area of Computational Fluid Dynamics (CFD) is presented. The requirement and use of various levels of computers, including supercomputers, for the CFD activities is described. Examples of CFD directed toward applications to helicopters, turbomachinery, heat exchangers, and the National Aerospace Plane are included. Helicopter rotor codes for the prediction of rotor and fuselage flow fields and airloads were developed with emphasis on rotor wake modeling. Airflow and airload predictions and comparisons with experimental data are presented. Examples are presented of recent parabolized Navier-Stokes and full Navier-Stokes solutions for hypersonic shock-wave/boundary layer interaction, and hydrogen/air supersonic combustion. In addition, other examples of CFD efforts in turbomachinery Navier-Stokes methodology and separated flow modeling are presented. A brief discussion of the 3-tier scientific computing environment is also presented, in which the researcher has access to workstations, mid-size computers, and supercomputers.

  1. Feynman diagrams sampling for quantum field theories on the QPACE 2 supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Rappl, Florian

    2016-08-01

    This work discusses the application of Feynman diagram sampling in quantum field theories. The method uses a computer simulation to sample the diagrammatic space obtained in a series expansion. For running large physical simulations powerful computers are obligatory, effectively splitting the thesis in two parts. The first part deals with the method of Feynman diagram sampling. Here the theoretical background of the method itself is discussed. Additionally, important statistical concepts and the theory of the strong force, quantum chromodynamics, are introduced. This sets the context of the simulations. We create and evaluate a variety of models to estimate the applicability of diagrammatic methods. The method is then applied to sample the perturbative expansion of the vertex correction. In the end we obtain the value for the anomalous magnetic moment of the electron. The second part looks at the QPACE 2 supercomputer. This includes a short introduction to supercomputers in general, as well as a closer look at the architecture and the cooling system of QPACE 2. Guiding benchmarks of the InfiniBand network are presented. At the core of this part, a collection of best practices and useful programming concepts are outlined, which enables the development of efficient, yet easily portable, applications for the QPACE 2 system.

  2. Use of high performance networks and supercomputers for real-time flight simulation

    Science.gov (United States)

    Cleveland, Jeff I., II

    1993-01-01

    In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations must be consistent in processing time and be completed in as short a time as possible. These operations include simulation mathematical model computation and data input/output to the simulators. In 1986, in response to increased demands for flight simulation performance, NASA's Langley Research Center (LaRC), working with the contractor, developed extensions to the Computer Automated Measurement and Control (CAMAC) technology which resulted in a factor of ten increase in the effective bandwidth and reduced latency of modules necessary for simulator communication. This technology extension is being used by more than 80 leading technological developers in the United States, Canada, and Europe. Included among the commercial applications are nuclear process control, power grid analysis, process monitoring, real-time simulation, and radar data acquisition. Personnel at LaRC are completing the development of the use of supercomputers for mathematical model computation to support real-time flight simulation. This includes the development of a real-time operating system and development of specialized software and hardware for the simulator network. This paper describes the data acquisition technology and the development of supercomputing for flight simulation.

  3. Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Bethel, E. Wes; Leinweber, David; Ruebel, Oliver; Wu, Kesheng

    2011-09-16

    This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently the key mechanism for preventing catastrophic market action are “circuit breakers.” We believe a more graduated approach, similar to the “yellow light” approach in motorsports to slow down traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a version of volume Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event on May 6 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions.

  4. Computational Science with the Titan Supercomputer: Early Outcomes and Lessons Learned

    Science.gov (United States)

    Wells, Jack

    2014-03-01

    Modeling and simulation with petascale computing has supercharged the process of innovation and understanding, dramatically accelerating time-to-insight and time-to-discovery. This presentation will focus on early outcomes from the Titan supercomputer at the Oak Ridge National Laboratory. Titan has over 18,000 hybrid compute nodes consisting of both CPUs and GPUs. In this presentation, I will discuss the lessons we have learned in deploying Titan and preparing applications to move from conventional CPU architectures to a hybrid machine. I will present early results of materials applications running on Titan and the implications for the research community as we prepare for exascale supercomputer in the next decade. Lastly, I will provide an overview of user programs at the Oak Ridge Leadership Computing Facility with specific information how researchers may apply for allocations of computing resources. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

  5. An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.

    Science.gov (United States)

    Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei

    2017-12-01

    Big data, cloud computing, and high-performance computing (HPC) are at the verge of convergence. Cloud computing is already playing an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion-a big data interface on the Tianhe-2 supercomputer-to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. It minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer via automated configuration. Orion follows the "allocate-when-needed" paradigm, and it avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved a satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

  6. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    Science.gov (United States)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)

  7. Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

    Energy Technology Data Exchange (ETDEWEB)

    Moryakov, A. V., E-mail: sailor@orc.ru [National Research Centre Kurchatov Institute (Russian Federation)

    2016-12-15

    An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.

  8. Distributed Memory Parallel Computing with SEAWAT

    Science.gov (United States)

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources

  9. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    Science.gov (United States)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.

  10. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  11. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded ,Vandermonde ,Toeplitz ,and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  12. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal OhOf(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....

  13. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  14. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.

  15. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  16. Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    International Nuclear Information System (INIS)

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-01-01

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5

  17. Confronting Theoretical Predictions With Experimental Data; Fitting Strategy For Multi-Dimensional Distributions

    Directory of Open Access Journals (Sweden)

    Tomasz Przedziński

    2015-01-01

    Full Text Available After developing a Resonance Chiral Lagrangian (RχL model to describe hadronic τ lepton decays [18], the model was confronted with experimental data. This was accomplished using a fitting framework which was developed to take into account the complexity of the model and to ensure the numerical stability for the algorithms used in the fitting. Since the model used in the fit contained 15 parameters and there were only three 1-dimensional distributions available, we could expect multiple local minima or even whole regions of equal potential to appear. Our methods had to thoroughly explore the whole parameter space and ensure, as well as possible, that the result is a global minimum. This paper is focused on the technical aspects of the fitting strategy used. The first approach was based on re-weighting algorithm published in [17] and produced results in around two weeks. Later approach, with improved theoretical model and simple parallelization algorithm based on Inter-Process Communication (IPC methods of UNIX system, reduced computation time down to 2-3 days. Additional approximations were introduced to the model decreasing time to obtain the preliminary results down to 8 hours. This allowed to better validate the results leading to a more robust analysis published in [12].

  18. Confronting hip resurfacing and big femoral head replacement gait analysis

    Directory of Open Access Journals (Sweden)

    Panagiotis K. Karampinas

    2014-03-01

    Full Text Available Improved hip kinematics and bone preservation have been reported after resurfacing total hip replacement (THRS. On the other hand, hip kinematics with standard total hip replacement (THR is optimized with large diameter femoral heads (BFH-THR. The purpose of this study is to evaluate the functional outcomes of THRS and BFH-THR and correlate these results to bone preservation or the large femoral heads. Thirty-one patients were included in the study. Gait speed, postural balance, proprioception and overall performance. Our results demonstrated a non-statistically significant improvement in gait, postural balance and proprioception in the THRS confronting to BFH-THR group. THRS provide identical outcomes to traditional BFH-THR. The THRS choice as bone preserving procedure in younger patients is still to be evaluated.

  19. Confronting Uncertainty in Life Cycle Assessment Used for Decision Support

    DEFF Research Database (Denmark)

    Herrmann, Ivan Tengbjerg; Hauschild, Michael Zwicky; Sohn, Michael D.

    2014-01-01

    the decision maker (DM) in making the best possible choice for the environment. At present, some DMs do not trust the LCA to be a reliable decisionsupport tool—often because DMs consider the uncertainty of an LCA to be too large. The standard evaluation of uncertainty in LCAs is an ex-post approach that can...... regarding which type of LCA study to employ for the decision context at hand. This taxonomy enables the derivation of an LCA classification matrix to clearly identify and communicate the type of a given LCA. By relating the LCA classification matrix to statistical principles, we can also rank the different......The aim of this article is to help confront uncertainty in life cycle assessments (LCAs) used for decision support. LCAs offer a quantitative approach to assess environmental effects of products, technologies, and services and are conducted by an LCA practitioner or analyst (AN) to support...

  20. Confronting the Danish sectors for food and agriculture with 'terroir'

    DEFF Research Database (Denmark)

    Stoye, Monica

    2007-01-01

    in e.g. PDO and PGI labelling. In the Roman approach, the superior product can be differentiated from all other products by its special taste, identity and/or integrated cultural elements. This definition of superiority is far from the average understanding of high food quality in a Scandinavian......The notion of ‘terroir' originates from wine production in southern Europe. It denotes a traditional approach to food, agriculture and rurality - an approach, which by some scholars has been summarized as a Roman approach. This Roman approach has exerted great influence on EU policies, resulting...... country like Denmark, where uniform products, high nutritional and hygienic levels and veterinary approval characterise an extremely export oriented food sector. However, Danish small scale food producers, who want to implement the terroir approach in their own production, increasingly confront...

  1. Defense Strategy of Aircraft Confronted with IR Guided Missile

    Directory of Open Access Journals (Sweden)

    Hesong Huang

    2017-01-01

    Full Text Available Surface-type infrared (IR decoy can simulate the IR characteristics of the target aircraft, which is one of the most effective equipment to confront IR guided missile. In the air combat, the IR guided missile poses a serious threat to the aircraft when it comes from the front of target aircraft. In this paper, firstly, the model of aircraft and surface-type IR decoy is established. To ensure their authenticity, the aircraft maneuver and radiation models based on real data of flight and exhaust system radiation in the state of different heights and different speeds are established. Secondly, the most effective avoidance maneuver is simulated when the missile comes from the front of the target aircraft. Lastly, combining maneuver with decoys, the best defense strategy is analysed when the missile comes from the front of aircraft. The result of simulation, which is authentic, is propitious to avoid the missile and improve the survivability of aircraft.

  2. Confronting hybrid inflation in supergravity with CMB data

    International Nuclear Information System (INIS)

    Jeannerot, Rachel; Postma, Marieke

    2005-01-01

    F-term GUT inflation coupled to N = 1 supergravity is confronted with CMB data. Corrections to the string mass-per-unit-length away from the Bogomolny limit are taken into account. We find that a superpotential coupling 10 -7 /N∼ -2 /N, with N the dimension of the Higgs-representation, is still compatible with the data. The parameter space is enlarged in warm inflation, as well as in the curvaton and inhomogeneous reheat scenario. F-strings formed at the end of P-term inflation are also considered. Because these strings satisfy the Bogomolny bound the bounds are stronger: the gauge coupling is constrained to the range 10 -7 -4

  3. The latent confrontation: The Korean peninsula’s uncertain future

    Directory of Open Access Journals (Sweden)

    Asier Blas Mendoza

    2007-10-01

    Full Text Available This article covers the changes that have taken place in inter-Korean relations since the fall of the Soviet regimes. In the first few years following the fall of the iron curtain, the Korean perimeter became a scenario of confrontation that seemed to perpetuate theproblem. However, in the second half of the ‘90s, north-east Asia began to undergo a real change that resulted in public contacts between the two Koreas. The new game that was officially opened by the “Sunshine policy” led to a deep-seated rethinking of foreignpolicy by both states, and opened a new chapter in inter-Korean relations that has clearly demonstrated the important dimension and repercussions of the conflict in the geo-strategic framework of the entire East Asian area, as well as in international politics.

  4. Energy condition bounds and their confrontation with supernovae data

    International Nuclear Information System (INIS)

    Lima, M. P.; Vitenti, S.; Reboucas, M. J.

    2008-01-01

    The energy conditions play an important role in the understanding of several properties of the Universe, including the current accelerating expansion phase and the possible existence of the so-called phantom fields. We show that the integrated bounds provided by the energy conditions on cosmological observables such as the distance modulus μ(z) and the lookback time t L (z) are not sufficient (or necessary) to ensure the local fulfillment of the energy conditions, making explicit the limitation of these bounds in the confrontation with observational data. We recast the energy conditions as bounds on the deceleration and normalized Hubble parameters, obtaining new bounds which are necessary and sufficient for the local fulfillment of the energy conditions. A statistical confrontation, with 1σ-3σ confidence levels, between our bounds and supernovae data from the gold and combined samples is made for the recent past. Our analyses indicate, with 3σ confidence levels, the fulfillment of both the weak energy condition (WEC) and dominant energy condition (DEC) for z≤1 and z < or approx. 0.8, respectively. In addition, they suggest a possible recent violation of the null energy condition (NEC) with 3σ, i.e. a very recent phase of superacceleration. Our analyses also show the possibility of violation of the strong energy condition (SEC) with 3σ in the recent past (z≤1), but interestingly the q(z)-best-fit curve crosses the SEC--fulfillment divider at z≅0.67, which is a value very close to the beginning of the epoch of cosmic acceleration predicted by the standard concordance flat ΛCDM scenario.

  5. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  6. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90

  7. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  8. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  9. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  10. Wavelet transform-vector quantization compression of supercomputer ocean model simulation output

    Energy Technology Data Exchange (ETDEWEB)

    Bradley, J N; Brislawn, C M

    1992-11-12

    We describe a new procedure for efficient compression of digital information for storage and transmission purposes. The algorithm involves a discrete wavelet transform subband decomposition of the data set, followed by vector quantization of the wavelet transform coefficients using application-specific vector quantizers. The new vector quantizer design procedure optimizes the assignment of both memory resources and vector dimensions to the transform subbands by minimizing an exponential rate-distortion functional subject to constraints on both overall bit-rate and encoder complexity. The wavelet-vector quantization method, which originates in digital image compression. is applicable to the compression of other multidimensional data sets possessing some degree of smoothness. In this paper we discuss the use of this technique for compressing the output of supercomputer simulations of global climate models. The data presented here comes from Semtner-Chervin global ocean models run at the National Center for Atmospheric Research and at the Los Alamos Advanced Computing Laboratory.

  11. Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Widener, Patrick (University of New Mexico); Jaconette, Steven (Northwestern University); Bridges, Patrick G. (University of New Mexico); Xia, Lei (Northwestern University); Dinda, Peter (Northwestern University); Cui, Zheng.; Lange, John (Northwestern University); Hudson, Trammell B.; Levenhagen, Michael J.; Pedretti, Kevin Thomas Tauke; Brightwell, Ronald Brian

    2009-09-01

    Palacios and Kitten are new open source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

  12. Use of QUADRICS supercomputer as embedded simulator in emergency management systems

    International Nuclear Information System (INIS)

    Bove, R.; Di Costanzo, G.; Ziparo, A.

    1996-07-01

    The experience related to the implementation of a MRBT, atmospheric spreading model with a short duration releasing, are reported. This model was implemented on a QUADRICS-Q1 supercomputer. First is reported a description of the MRBT model. It is an analytical model to study the speadings of light gases realised in the atmosphere cause incidental releasing. The solution of diffusion equation is Gaussian like. It yield the concentration of pollutant substance released. The concentration is function of space and time. Thus the QUADRICS architecture is introduced. And the implementation of the model is described. At the end it will be consider the integration of the QUADRICS-based model as simulator in a emergency management system

  13. Affordable and accurate large-scale hybrid-functional calculations on GPU-accelerated supercomputers

    Science.gov (United States)

    Ratcliff, Laura E.; Degomme, A.; Flores-Livas, José A.; Goedecker, Stefan; Genovese, Luigi

    2018-03-01

    Performing high accuracy hybrid functional calculations for condensed matter systems containing a large number of atoms is at present computationally very demanding or even out of reach if high quality basis sets are used. We present a highly optimized multiple graphics processing unit implementation of the exact exchange operator which allows one to perform fast hybrid functional density-functional theory (DFT) calculations with systematic basis sets without additional approximations for up to a thousand atoms. With this method hybrid DFT calculations of high quality become accessible on state-of-the-art supercomputers within a time-to-solution that is of the same order of magnitude as traditional semilocal-GGA functionals. The method is implemented in a portable open-source library.

  14. Research to application: Supercomputing trends for the 90's - Opportunities for interdisciplinary computations

    International Nuclear Information System (INIS)

    Shankar, V.

    1991-01-01

    The progression of supercomputing is reviewed from the point of view of computational fluid dynamics (CFD), and multidisciplinary problems impacting the design of advanced aerospace configurations are addressed. The application of full potential and Euler equations to transonic and supersonic problems in the 70s and early 80s is outlined, along with Navier-Stokes computations widespread during the late 80s and early 90s. Multidisciplinary computations currently in progress are discussed, including CFD and aeroelastic coupling for both static and dynamic flexible computations, CFD, aeroelastic, and controls coupling for flutter suppression and active control, and the development of a computational electromagnetics technology based on CFD methods. Attention is given to computational challenges standing in a way of the concept of establishing a computational environment including many technologies. 40 refs

  15. Solving sparse linear least squares problems on some supercomputers by using large dense blocks

    DEFF Research Database (Denmark)

    Hansen, Per Christian; Ostromsky, T; Sameh, A

    1997-01-01

    technique is preferable to sparse matrix technique when the matrices are not large, because the high computational speed compensates fully the disadvantages of using more arithmetic operations and more storage. For very large matrices the computations must be organized as a sequence of tasks in each......Efficient subroutines for dense matrix computations have recently been developed and are available on many high-speed computers. On some computers the speed of many dense matrix operations is near to the peak-performance. For sparse matrices storage and operations can be saved by operating only...... and storing only nonzero elements. However, the price is a great degradation of the speed of computations on supercomputers (due to the use of indirect addresses, to the need to insert new nonzeros in the sparse storage scheme, to the lack of data locality, etc.). On many high-speed computers a dense matrix...

  16. Reliability Lessons Learned From GPU Experience With The Titan Supercomputer at Oak Ridge Leadership Computing Facility

    Energy Technology Data Exchange (ETDEWEB)

    Gallarno, George [Christian Brothers University; Rogers, James H [ORNL; Maxwell, Don E [ORNL

    2015-01-01

    The high computational capability of graphics processing units (GPUs) is enabling and driving the scientific discovery process at large-scale. The world s second fastest supercomputer for open science, Titan, has more than 18,000 GPUs that computational scientists use to perform scientific simu- lations and data analysis. Understanding of GPU reliability characteristics, however, is still in its nascent stage since GPUs have only recently been deployed at large-scale. This paper presents a detailed study of GPU errors and their impact on system operations and applications, describing experiences with the 18,688 GPUs on the Titan supercom- puter as well as lessons learned in the process of efficient operation of GPUs at scale. These experiences are helpful to HPC sites which already have large-scale GPU clusters or plan to deploy GPUs in the future.

  17. EDF's experience with supercomputing and challenges ahead - towards multi-physics and multi-scale approaches

    International Nuclear Information System (INIS)

    Delbecq, J.M.; Banner, D.

    2003-01-01

    Nuclear power plants are a major asset of the EDF company. To remain so, in particular in a context of deregulation, competitiveness, safety and public acceptance are three conditions. These stakes apply both to existing plants and to future reactors. The purpose of the presentation is to explain how supercomputing can help EDF to satisfy these requirements. Three examples are described in detail: ensuring optimal use of nuclear fuel under wholly safe conditions, understanding and simulating the material deterioration mechanisms and moving forward with numerical simulation for the performance of EDF's activities. In conclusion, a broader vision of EDF long term R and D in the field of numerical simulation is given and especially of five challenges taken up by EDF together with its industrial and scientific partners. (author)

  18. Parallel time domain solvers for electrically large transient scattering problems

    KAUST Repository

    Liu, Yang

    2014-09-26

    Marching on in time (MOT)-based integral equation solvers represent an increasingly appealing avenue for analyzing transient electromagnetic interactions with large and complex structures. MOT integral equation solvers for analyzing electromagnetic scattering from perfect electrically conducting objects are obtained by enforcing electric field boundary conditions and implicitly time advance electric surface current densities by iteratively solving sparse systems of equations at all time steps. Contrary to finite difference and element competitors, these solvers apply to nonlinear and multi-scale structures comprising geometrically intricate and deep sub-wavelength features residing atop electrically large platforms. Moreover, they are high-order accurate, stable in the low- and high-frequency limits, and applicable to conducting and penetrable structures represented by highly irregular meshes. This presentation reviews some recent advances in the parallel implementations of time domain integral equation solvers, specifically those that leverage multilevel plane-wave time-domain algorithm (PWTD) on modern manycore computer architectures including graphics processing units (GPUs) and distributed memory supercomputers. The GPU-based implementation achieves at least one order of magnitude speedups compared to serial implementations while the distributed parallel implementation are highly scalable to thousands of compute-nodes. A distributed parallel PWTD kernel has been adopted to solve time domain surface/volume integral equations (TDSIE/TDVIE) for analyzing transient scattering from large and complex-shaped perfectly electrically conducting (PEC)/dielectric objects involving ten million/tens of millions of spatial unknowns.

  19. Parallel solutions of the two-group neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, K.S.; Turinsky, P.J.

    1987-01-01

    Recent efforts to adapt various numerical solution algorithms to parallel computer architectures have addressed the possibility of substantially reducing the running time of few-group neutron diffusion calculations. The authors have developed an efficient iterative parallel algorithm and an associated computer code for the rapid solution of the finite difference method representation of the two-group neutron diffusion equations on the CRAY X/MP-48 supercomputer having multi-CPUs and vector pipelines. For realistic simulation of light water reactor cores, the code employees a macroscopic depletion model with trace capability for selected fission product transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the code. The validity of the physics models used in the code were benchmarked against qualified codes and proved accurate. This work is an extension of previous work in that various feedback effects are accounted for in the system; the entire code is structured to accommodate extensive vectorization; and an additional parallelism by multitasking is achieved not only for the solution of the matrix equations associated with the inner iterations but also for the other segments of the code, e.g., outer iterations

  20. Parallel processing at the SSC: The fact and the fiction

    International Nuclear Information System (INIS)

    Bourianoff, G.; Cole, B.

    1991-10-01

    Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at SSC is based upon a ''thin element'' analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations (''FLOP''). For the SSC collider lattice consisting of 10 4 elements, performing a tracking of study for a set of 100 particles for 10 7 turns would require 2 x 10 15 FLOPS. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 x 10 7 seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are to be calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to effectively perform the full calculation motivates the migration of this calculation to the most powerful computers available. A survey of the current research into new technologies for superconducting reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not early to begin developing the capability to develop tracking codes for parallel architectures. This report discusses implementing parallel processing on the SCC

  1. Parallel file system performances in fusion data storage

    International Nuclear Information System (INIS)

    Iannone, F.; Podda, S.; Bracco, G.; Manduchi, G.; Maslennikov, A.; Migliori, S.; Wolkersdorfer, K.

    2012-01-01

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing–For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling – Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  2. Parallel file system performances in fusion data storage

    Energy Technology Data Exchange (ETDEWEB)

    Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)

    2012-12-15

    High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.

  3. BCYCLIC: A parallel block tridiagonal matrix cyclic solver

    Science.gov (United States)

    Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

    2010-09-01

    A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.

  4. The island dynamics model on parallel quadtree grids

    Science.gov (United States)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  5. Confronting as autonomy promotion: Speaking up against discrimination and psychological well-being in racial minorities.

    Science.gov (United States)

    Sanchez, Diana T; Himmelstein, Mary S; Young, Danielle M; Albuja, Analia F; Garcia, Julie A

    2016-09-01

    Few studies have considered confrontation in the context of coping with discriminatory experiences. These studies test for the first time whether confronting racial discrimination is associated with greater psychological well-being and physical health through the promotion of autonomy. In two separate samples of racial minorities who had experienced racial discrimination, confrontation was associated with greater psychological well-being, and this relationship was mediated by autonomy promotion. These findings did not extend to physical health symptoms. These studies provide preliminary evidence that confrontation may aid in the process of regaining autonomy after experiencing discrimination and therefore promote well-being. © The Author(s) 2015.

  6. Harnessing Petaflop-Scale Multi-Core Supercomputing for Problems in Space Science

    Science.gov (United States)

    Albright, B. J.; Yin, L.; Bowers, K. J.; Daughton, W.; Bergen, B.; Kwan, T. J.

    2008-12-01

    The particle-in-cell kinetic plasma code VPIC has been migrated successfully to the world's fastest supercomputer, Roadrunner, a hybrid multi-core platform built by IBM for the Los Alamos National Laboratory. How this was achieved will be described and examples of state-of-the-art calculations in space science, in particular, the study of magnetic reconnection, will be presented. With VPIC on Roadrunner, we have performed, for the first time, plasma PIC calculations with over one trillion particles, >100× larger than calculations considered "heroic" by community standards. This allows examination of physics at unprecedented scale and fidelity. Roadrunner is an example of an emerging paradigm in supercomputing: the trend toward multi-core systems with deep hierarchies and where memory bandwidth optimization is vital to achieving high performance. Getting VPIC to perform well on such systems is a formidable challenge: the core algorithm is memory bandwidth limited with low compute-to-data ratio and requires random access to memory in its inner loop. That we were able to get VPIC to perform and scale well, achieving >0.374 Pflop/s and linear weak scaling on real physics problems on up to the full 12240-core Roadrunner machine, bodes well for harnessing these machines for our community's needs in the future. Many of the design considerations encountered commute to other multi-core and accelerated (e.g., via GPU) platforms and we modified VPIC with flexibility in mind. These will be summarized and strategies for how one might adapt a code for such platforms will be shared. Work performed under the auspices of the U.S. DOE by the LANS LLC Los Alamos National Laboratory. Dr. Bowers is a LANL Guest Scientist; he is presently at D. E. Shaw Research LLC, 120 W 45th Street, 39th Floor, New York, NY 10036.

  7. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    Energy Technology Data Exchange (ETDEWEB)

    PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.

  8. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    International Nuclear Information System (INIS)

    Plimpton, Steven J.; Seidel, David B.; Pasik, Michael F.; Coats, Rebecca S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER

  9. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  10. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  11. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  12. Confronting uncertainty in wildlife management: performance of grizzly bear management.

    Science.gov (United States)

    Artelle, Kyle A; Anderson, Sean C; Cooper, Andrew B; Paquet, Paul C; Reynolds, John D; Darimont, Chris T

    2013-01-01

    Scientific management of wildlife requires confronting the complexities of natural and social systems. Uncertainty poses a central problem. Whereas the importance of considering uncertainty has been widely discussed, studies of the effects of unaddressed uncertainty on real management systems have been rare. We examined the effects of outcome uncertainty and components of biological uncertainty on hunt management performance, illustrated with grizzly bears (Ursus arctos horribilis) in British Columbia, Canada. We found that both forms of uncertainty can have serious impacts on management performance. Outcome uncertainty alone--discrepancy between expected and realized mortality levels--led to excess mortality in 19% of cases (population-years) examined. Accounting for uncertainty around estimated biological parameters (i.e., biological uncertainty) revealed that excess mortality might have occurred in up to 70% of cases. We offer a general method for identifying targets for exploited species that incorporates uncertainty and maintains the probability of exceeding mortality limits below specified thresholds. Setting targets in our focal system using this method at thresholds of 25% and 5% probability of overmortality would require average target mortality reductions of 47% and 81%, respectively. Application of our transparent and generalizable framework to this or other systems could improve management performance in the presence of uncertainty.

  13. Confronting uncertainty in wildlife management: performance of grizzly bear management.

    Directory of Open Access Journals (Sweden)

    Kyle A Artelle

    Full Text Available Scientific management of wildlife requires confronting the complexities of natural and social systems. Uncertainty poses a central problem. Whereas the importance of considering uncertainty has been widely discussed, studies of the effects of unaddressed uncertainty on real management systems have been rare. We examined the effects of outcome uncertainty and components of biological uncertainty on hunt management performance, illustrated with grizzly bears (Ursus arctos horribilis in British Columbia, Canada. We found that both forms of uncertainty can have serious impacts on management performance. Outcome uncertainty alone--discrepancy between expected and realized mortality levels--led to excess mortality in 19% of cases (population-years examined. Accounting for uncertainty around estimated biological parameters (i.e., biological uncertainty revealed that excess mortality might have occurred in up to 70% of cases. We offer a general method for identifying targets for exploited species that incorporates uncertainty and maintains the probability of exceeding mortality limits below specified thresholds. Setting targets in our focal system using this method at thresholds of 25% and 5% probability of overmortality would require average target mortality reductions of 47% and 81%, respectively. Application of our transparent and generalizable framework to this or other systems could improve management performance in the presence of uncertainty.

  14. Some Challenges the Management Confronts with, in the Financial Institutions

    Directory of Open Access Journals (Sweden)

    Laurentiu Mihai Treapat

    2014-02-01

    Full Text Available In this paper, we analyze some features and components of the management in general, and of the management in the financial area in particular. Special attention is given to how they cope with some risk which could affect their activity. Trying to find from practice what kind of difficulties the management faces in their work, for sure, we get to interesting conclusions and furthermore, to optimum solutions. We already have some data, result of some earlier preoccupations of the specialists (Dănilă and Berea, 2000 pp.39-48 while others can be foreseen as specific elements for the beginning of the 3rd millennium, that started with what the rating agencies seem to admit as the most important economic decline and prolonged recession risk within the post World War II history. We consider an evaluation of the challenges the management confronts with, lately - while subject to pressures and to the need for radical changes that come with an astonishing speed and that are enhanced by the shareholders’ desperate need to protect their capital. Findings reveal that, in any business enterprise the shareholders’ strategy and the management’s objectives are earning new clients, enlarging the market share, creating added value and on these bases, maximizing the gained profits. We consider that the volatile and fluctuant nature of the raw material the banks operate with - namely the money – turn the management in this area into a particular one, depicted by some specific features, which we analyze in the following pages.

  15. Dynamics of confrontation : Tarapur and Indo-US relations

    International Nuclear Information System (INIS)

    Banerji, Sanjukta

    1981-01-01

    Under the Tarapur Agreement signed in 1963, the United States is under contractual obligations to supply the enriched uranium fuel for the Tarapur Atomic Power Station. However, the supply of fuel has become an issue of confrontation between India and the United States after India conducted the peaceful nuclear explosion test at Pokhran in 1974. India also refused to sign the Non-Proliferation Treaty (NPT) due to its discriminatory nature. The United States insists on India's signing the NPT under the Nuclear Non-Proliferation Act passed by the American Congress in 1978. During 1977-79 period, a license for 12 tonnes was cleared after a sort of assurance that no nuclear test would be conducted. India refused to accept the full scope safeguards as stipulated in the U.S. Nuclear Non-Proliferation Act. In 1980, a presidential executive order for 40 tonnes of fuel was issued, but only one consignment of 19.6 tonnes was cleared by the American Senate. After Reagan became the U.S. President, the fuel supply completely stopped. Now discussions are taking place to terminate the Tarapur Agreement on mutually acceptable terms. (M.G.B.)

  16. Fear appeals and confronting information campaigns. [Previously: Fear-based information campaigns.

    NARCIS (Netherlands)

    2007-01-01

    Fear appeals or confronting information campaigns confront people in an often hard and sometimes even shocking way with the consequences of risky behaviour. This can have a positive impact on the attitudes and behavioural intentions of the target group, but only if key conditions are met. Those

  17. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  18. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    Science.gov (United States)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.

  19. US-CUBA RELATIONS: A NEW WAVE OF CONFRONTATION?

    Directory of Open Access Journals (Sweden)

    С Перес Бенитес

    2017-12-01

    Full Text Available The article seeks to analyze the role of the changes introduced by the administration of the former president Barack Obama in 2014-2016 into the bilateral US-Cuba relations; and the way in which the new presidential team are to reorganize this direction. The question on the attitude of Donald Trump towards currently existing policies aimed at solving the long-lasting problem with Cuban socialism is especially interesting since new US president has multiple times condemned the old ways practiced by the former establishment, but at the same time has shown readiness to act in a straight-forward and confrontational manner. One of contributors of the paper, Santiago Perez Benitez, deputy director of the Center for International Political Studies in Havana, is attempting to provide his professional expertise in granting an insider view from the Cuban side, evaluating the progress made since the 2014 and interpret the notion of the upcoming policy changes in Washington. The importance of the Cuban issue in the framework of US. policy in the Western hemisphere is explained by the fact that a solution in this sphere could help remake a negative image of Pan-American policies that haunts Washington. Cuban issue has also been long considered a possible key for reestablish-ment of trust between the United States and Latin American countries. For president Trump, quite unpopular judging by the polls, Cuban issue also has a potential to earn support of his own constituents, who strongly support lifting the embargo from Cuba. However now after certain decisions of Donald Trump the future of US-Cuban relations seems to get gloomier by the day.

  20. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  1. Research center Juelich to install Germany's most powerful supercomputer new IBM System for science and research will achieve 5.8 trillion computations per second

    CERN Multimedia

    2002-01-01

    "The Research Center Juelich, Germany, and IBM today announced that they have signed a contract for the delivery and installation of a new IBM supercomputer at the Central Institute for Applied Mathematics" (1/2 page).

  2. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin

  3. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.

  4. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    International Nuclear Information System (INIS)

    2015-01-01

    ACE3P is a 3D massively parallel simulation suite that developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical study. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R and D. A new frequency domain solver to perform mechanical harmonic response analysis of accelerator components is developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient accurate analysis of the large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)

  5. Accurate reaction-diffusion operator splitting on tetrahedral meshes for parallel stochastic molecular simulations

    Energy Technology Data Exchange (ETDEWEB)

    Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan); Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610 (Belgium); Chen, W. [Computational Neuroscience Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904 0495 (Japan)

    2016-08-07

    Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.

  6. Highly efficient parallel direct solver for solving dense complex matrix equations from method of moments

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2017-03-01

    Full Text Available Based on the vectorised and cache optimised kernel, a parallel lower upper decomposition with a novel communication avoiding pivoting scheme is developed to solve dense complex matrix equations generated by the method of moments. The fine-grain data rearrangement and assembler instructions are adopted to reduce memory accessing times and improve CPU cache utilisation, which also facilitate vectorisation of the code. Through grouping processes in a binary tree, a parallel pivoting scheme is designed to optimise the communication pattern and thus reduces the solving time of the proposed solver. Two large electromagnetic radiation problems are solved on two supercomputers, respectively, and the numerical results demonstrate that the proposed method outperforms those in open source and commercial libraries.

  7. A hybrid method for the parallel computation of Green's functions

    International Nuclear Information System (INIS)

    Petersen, Dan Erik; Li Song; Stokbro, Kurt; Sorensen, Hans Henrik B.; Hansen, Per Christian; Skelboe, Stig; Darve, Eric

    2009-01-01

    Quantum transport models for nanodevices using the non-equilibrium Green's function method require the repeated calculation of the block tridiagonal part of the Green's and lesser Green's function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size.

  8. The BlueGene/L Supercomputer and Quantum ChromoDynamics

    International Nuclear Information System (INIS)

    Vranas, P; Soltz, R

    2006-01-01

    In summary our update contains: (1) Perfect speedup sustaining 19.3% of peak for the Wilson D D-slash Dirac operator. (2) Measurements of the full Conjugate Gradient (CG) inverter that inverts the Dirac operator. The CG inverter contains two global sums over the entire machine. Nevertheless, our measurements retain perfect speedup scaling demonstrating the robustness of our methods. (3) We ran on the largest BG/L system, the LLNL 64 rack BG/L supercomputer, and obtained a sustained speed of 59.1 TFlops. Furthermore, the speedup scaling of the Dirac operator and of the CG inverter are perfect all the way up to the full size of the machine, 131,072 cores (please see Figure II). The local lattice is rather small (4 x 4 x 4 x 16) while the total lattice has been a lattice QCD vision for thermodynamic studies (a total of 128 x 128 x 256 x 32 lattice sites). This speed is about five times larger compared to the speed we quoted in our submission. As we have pointed out in our paper QCD is notoriously sensitive to network and memory latencies, has a relatively high communication to computation ratio which can not be overlapped in BGL in virtual node mode, and as an application is in a class of its own. The above results are thrilling to us and a 30 year long dream for lattice QCD

  9. Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers

    Science.gov (United States)

    Oyarzun, Guillermo; Borrell, Ricard; Gorobets, Andrey; Oliva, Assensi

    2017-10-01

    Nowadays, high performance computing (HPC) systems experience a disruptive moment with a variety of novel architectures and frameworks, without any clarity of which one is going to prevail. In this context, the portability of codes across different architectures is of major importance. This paper presents a portable implementation model based on an algebraic operational approach for direct numerical simulation (DNS) and large eddy simulation (LES) of incompressible turbulent flows using unstructured hybrid meshes. The strategy proposed consists in representing the whole time-integration algorithm using only three basic algebraic operations: sparse matrix-vector product, a linear combination of vectors and dot product. The main idea is based on decomposing the nonlinear operators into a concatenation of two SpMV operations. This provides high modularity and portability. An exhaustive analysis of the proposed implementation for hybrid CPU/GPU supercomputers has been conducted with tests using up to 128 GPUs. The main objective consists in understanding the challenges of implementing CFD codes on new architectures.

  10. Assessment techniques for a learning-centered curriculum: evaluation design for adventures in supercomputing

    Energy Technology Data Exchange (ETDEWEB)

    Helland, B. [Ames Lab., IA (United States); Summers, B.G. [Oak Ridge National Lab., TN (United States)

    1996-09-01

    As the classroom paradigm shifts from being teacher-centered to being learner-centered, student assessments are evolving from typical paper and pencil testing to other methods of evaluation. Students should be probed for understanding, reasoning, and critical thinking abilities rather than their ability to return memorized facts. The assessment of the Department of Energy`s pilot program, Adventures in Supercomputing (AiS), offers one example of assessment techniques developed for learner-centered curricula. This assessment has employed a variety of methods to collect student data. Methods of assessment used were traditional testing, performance testing, interviews, short questionnaires via email, and student presentations of projects. The data obtained from these sources have been analyzed by a professional assessment team at the Center for Children and Technology. The results have been used to improve the AiS curriculum and establish the quality of the overall AiS program. This paper will discuss the various methods of assessment used and the results.

  11. Visualization at supercomputing centers: the tale of little big iron and the three skinny guys.

    Science.gov (United States)

    Bethel, E W; van Rosendale, J; Southard, D; Gaither, K; Childs, H; Brugger, E; Ahern, S

    2011-01-01

    Supercomputing centers are unique resources that aim to enable scientific knowledge discovery by employing large computational resources-the "Big Iron." Design, acquisition, installation, and management of the Big Iron are carefully planned and monitored. Because these Big Iron systems produce a tsunami of data, it's natural to colocate the visualization and analysis infrastructure. This infrastructure consists of hardware (Little Iron) and staff (Skinny Guys). Our collective experience suggests that design, acquisition, installation, and management of the Little Iron and Skinny Guys doesn't receive the same level of treatment as that of the Big Iron. This article explores the following questions about the Little Iron: How should we size the Little Iron to adequately support visualization and analysis of data coming off the Big Iron? What sort of capabilities must it have? Related questions concern the size of visualization support staff: How big should a visualization program be-that is, how many Skinny Guys should it have? What should the staff do? How much of the visualization should be provided as a support service, and how much should applications scientists be expected to do on their own?

  12. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  13. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  14. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    1998-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform We give an introduction to the DriscollHealy algorithm using polynomial arithmetic and present experimental results on the eciency and accuracy of our implementation The algorithms were

  15. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  16. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  17. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- as least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  18. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.

  19. Parallel plate detectors

    International Nuclear Information System (INIS)

    Gardes, D.; Volkov, P.

    1981-01-01

    A 5x3cm 2 (timing only) and a 15x5cm 2 (timing and position) parallel plate avalanche counters (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr

  20. [Adolescent confronting cancer and its place in the family].

    Science.gov (United States)

    Chavand, Aurélie; Grandjean, Hélène; Vignes, Michel

    2007-04-01

    Adolescent medicine is expanding in Europe with particular attention being given to cancer of adolescents and its treatment. At a time where specialised units for adolescents are being born, it is essential to collect the current knowledge on the pathological impact of the illness in this age period whose limits themselves are often blurred (13-21 years or 15-25 years). Adolescence is a transition between childhood and adulthood, during which one seeks psychological and emotional development. Cancer, by its direct repercussion on the adolescent and also by the disorganisation of the family, can involve risks impending the process of maturation and can also be a purveyor of psychological after-affects. The occurrence of the illness can isolate the adolescent and leak to a restriction of the psychological investment. The reality of possible death can hinder the ill adolescent from developing his natural opposition to the adults who represent authority such as parents or nurses, thereby hindering access to autonomy, independence and identity construction. One can find oneself locked in a state of trouble, confusion, becoming a stranger to oneself, with an impression of distance waxing between the young patient and others. The parents find themselves weakening and must make calls on their supporters. The siblings see their daily life becoming more unsettled and find themselves confronted by parents less available and reassuring. The impact on the brothers and sisters vary depending on their age and the capacity of the parent's adaptation. From the onset, adolescents struck by cancer necessitate an adaptation of the medical staff. The medical information, the treatment and the aid-care contracts must be approved by the adolescent himself but the parent's involvement remains essential. It is necessary to create an alliance of three. Conflicts and rivalry occur frequently between parents and the medical staff. One must study the possibility of creating a place adapted to

  1. The Heat is On! Confronting Climate Change in the Classroom

    Science.gov (United States)

    Bowman, R.; Atwood-Blaine, D.

    2008-12-01

    This paper discusses a professional development workshop for K-12 science teachers entitled "The Heat is On! Confronting Climate Change in the Classroom." This workshop was conducted by the Center for Remote Sensing of Ice Sheets (CReSIS), which has the primary goal to understand and predict the role of polar ice sheets in sea level change. The specific objectives of this summer workshop were two-fold; first, to address the need for advancement in science technology engineering and mathematics (STEM) education and second, to address the need for science teacher training in climate change science. Twenty-eight Kansas teachers completed four pre-workshop assignments online in Moodle and attended a one-week workshop. The workshop included lecture presentations by scientists (both face-to-face and via video-conference) and collaboration between teachers and scientists to create online inquiry-based lessons on the water budget, remote sensing, climate data, and glacial modeling. Follow-up opportunities are communicated via the CReSIS Teachers listserv to maintain and further develop the collegial connections and collaborations established during the workshop. Both qualitative and quantitative evaluation results indicate that this workshop was particularly effective in the following four areas: 1) creating meaningful connections between K-12 teachers and CReSIS scientists; 2) integrating distance-learning technologies to facilitate the social construction of knowledge; 3) increasing teachers' content understanding of climate change and its impacts on the cryosphere and global sea level; and 4) increasing teachers' self-efficacy beliefs about teaching climate science. Evaluation methods included formative content understanding assessments (via "clickers") during each scientist's presentation, a qualitative evaluation survey administered at the end of the workshop, and two quantitative evaluation instruments administered pre- and post- workshop. The first of these

  2. Parallel family trees for transfer matrices in the Potts model

    Science.gov (United States)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    scenario it was in the range p ∈ [ 8 , 10 ] . Because of the parallel capabilities of the algorithm, a large-scale execution of the parallel family trees strategy in a supercomputer could contribute to the study of wider strip lattices.

  3. Space-charge-dominated beam dynamics simulations using the massively parallel processors (MPPs) of the Cray T3D

    International Nuclear Information System (INIS)

    Liu, H.

    1996-01-01

    Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N 2 problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Parallel Vector Processing supercomputer of the Cray C90 and an HP 735/15 high-end workstation

  4. An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    Science.gov (United States)

    Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

    2015-07-01

    This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.

  5. Performance assessment of the SIMFAP parallel cluster at IFIN-HH Bucharest

    International Nuclear Information System (INIS)

    Adam, Gh.; Adam, S.; Ayriyan, A.; Dushanov, E.; Hayryan, E.; Korenkov, V.; Lutsenko, A.; Mitsyn, V.; Sapozhnikova, T.; Sapozhnikov, A; Streltsova, O.; Buzatu, F.; Dulea, M.; Vasile, I.; Sima, A.; Visan, C.; Busa, J.; Pokorny, I.

    2008-01-01

    Performance assessment and case study outputs of the parallel SIMFAP cluster at IFIN-HH Bucharest point to its effective and reliable operation. A comparison with results on the supercomputing system in LIT-JINR Dubna adds insight on resource allocation for problem solving by parallel computing. The solution of models asking for very large numbers of knots in the discretization mesh needs the migration to high performance computing based on parallel cluster architectures. The acquisition of ready-to-use parallel computing facilities being beyond limited budgetary resources, the solution at IFIN-HH was to buy the hardware and the inter-processor network, and to implement by own efforts the open software concerning both the operating system and the parallel computing standard. The present paper provides a report demonstrating the successful solution of these tasks. The implementation of the well-known HPL (High Performance LINPACK) Benchmark points to the effective and reliable operation of the cluster. The comparison of HPL outputs obtained on parallel clusters of different magnitudes shows that there is an optimum range of the order N of the linear algebraic system over which a given parallel cluster provides optimum parallel solutions. For the SIMFAP cluster, this range can be inferred to correspond to about 1 to 2 x 10 4 linear algebraic equations. For an algorithm of polynomial complexity N α the task sharing among p processors within a parallel solution mainly follows an (N/p)α behaviour under peak performance achievement. Thus, while the problem complexity remains the same, a substantial decrease of the coefficient of the leading order of the polynomial complexity is achieved. (authors)

  6. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    International Nuclear Information System (INIS)

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-01-01

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used

  7. A confrontation with reality - Proceedings of the 19th Association for Learning Technology Conference

    NARCIS (Netherlands)

    Hawkridge, David; Verjans, Steven; Wilson, Gail

    2012-01-01

    Hawkridge, D., Verjans, S., & Wilson, G. (Eds.) (2012). A confrontation with reality - Proceedings of the 19th Association for Learning Technology Conference (ALT-C 2012). September, 11-14, 2012, Manchester, UK.

  8. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.

  9. More parallel please

    DEFF Research Database (Denmark)

    Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert

    of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language......Abstract [en] More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point......(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...

  10. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Full Text Available Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Moving mechanical systems parallel structures are solid, fast, and accurate. Between parallel systems it is to be noticed Stewart platforms, as the oldest systems, fast, solid and precise. The work outlines a few main elements of Stewart platforms. Begin with the geometry platform, kinematic elements of it, and presented then and a few items of dynamics. Dynamic primary element on it means the determination mechanism kinetic energy of the entire Stewart platforms. It is then in a record tail cinematic mobile by a method dot matrix of rotation. If a structural mottoelement consists of two moving elements which translates relative, drive train and especially dynamic it is more convenient to represent the mottoelement as a single moving components. We have thus seven moving parts (the six motoelements or feet to which is added mobile platform 7 and one fixed.

  11. Fear appeals and confronting information campaigns. [Previously: Fear-based information campaigns.

    OpenAIRE

    2007-01-01

    Fear appeals or confronting information campaigns confront people in an often hard and sometimes even shocking way with the consequences of risky behaviour. This can have a positive impact on the attitudes and behavioural intentions of the target group, but only if key conditions are met. Those conditions are that the information does not only evoke fear, but also informs the target group individuals of their personal risk and provides them with feasible and effective behavioural alternatives...

  12. NASA's Climate in a Box: Desktop Supercomputing for Open Scientific Model Development

    Science.gov (United States)

    Wojcik, G. S.; Seablom, M. S.; Lee, T. J.; McConaughy, G. R.; Syed, R.; Oloso, A.; Kemp, E. M.; Greenseid, J.; Smith, R.

    2009-12-01

    NASA's High Performance Computing Portfolio in cooperation with its Modeling, Analysis, and Prediction program intends to make its climate and earth science models more accessible to a larger community. A key goal of this effort is to open the model development and validation process to the scientific community at large such that a natural selection process is enabled and results in a more efficient scientific process. One obstacle to others using NASA models is the complexity of the models and the difficulty in learning how to use them. This situation applies not only to scientists who regularly use these models but also non-typical users who may want to use the models such as scientists from different domains, policy makers, and teachers. Another obstacle to the use of these models is that access to high performance computing (HPC) accounts, from which the models are implemented, can be restrictive with long wait times in job queues and delays caused by an arduous process of obtaining an account, especially for foreign nationals. This project explores the utility of using desktop supercomputers in providing a complete ready-to-use toolkit of climate research products to investigators and on demand access to an HPC system. One objective of this work is to pre-package NASA and NOAA models so that new users will not have to spend significant time porting the models. In addition, the prepackaged toolkit will include tools, such as workflow, visualization, social networking web sites, and analysis tools, to assist users in running the models and analyzing the data. The system architecture to be developed will allow for automatic code updates for each user and an effective means with which to deal with data that are generated. We plan to investigate several desktop systems, but our work to date has focused on a Cray CX1. Currently, we are investigating the potential capabilities of several non-traditional development environments. While most NASA and NOAA models are

  13. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  14. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  15. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  16. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    Energy Technology Data Exchange (ETDEWEB)

    Xu, Chuanfu, E-mail: xuchuanfu@nudt.edu.cn [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Deng, Xiaogang; Zhang, Lilun [College of Computer Science, National University of Defense Technology, Changsha 410073 (China); Fang, Jianbin [Parallel and Distributed Systems Group, Delft University of Technology, Delft 2628CD (Netherlands); Wang, Guangxue; Jiang, Yi [State Key Laboratory of Aerodynamics, P.O. Box 211, Mianyang 621000 (China); Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua [College of Computer Science, National University of Defense Technology, Changsha 410073 (China)

    2014-12-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To our best knowledge, those are the largest-scale CPU–GPU collaborative simulations

  17. Collaborating CPU and GPU for large-scale high-order CFD simulations with complex grids on the TianHe-1A supercomputer

    International Nuclear Information System (INIS)

    Xu, Chuanfu; Deng, Xiaogang; Zhang, Lilun; Fang, Jianbin; Wang, Guangxue; Jiang, Yi; Cao, Wei; Che, Yonggang; Wang, Yongxian; Wang, Zhenghua; Liu, Wei; Cheng, Xinghua

    2014-01-01

    Programming and optimizing complex, real-world CFD codes on current many-core accelerated HPC systems is very challenging, especially when collaborating CPUs and accelerators to fully tap the potential of heterogeneous systems. In this paper, with a tri-level hybrid and heterogeneous programming model using MPI + OpenMP + CUDA, we port and optimize our high-order multi-block structured CFD software HOSTA on the GPU-accelerated TianHe-1A supercomputer. HOSTA adopts two self-developed high-order compact definite difference schemes WCNS and HDCS that can simulate flows with complex geometries. We present a dual-level parallelization scheme for efficient multi-block computation on GPUs and perform particular kernel optimizations for high-order CFD schemes. The GPU-only approach achieves a speedup of about 1.3 when comparing one Tesla M2050 GPU with two Xeon X5670 CPUs. To achieve a greater speedup, we collaborate CPU and GPU for HOSTA instead of using a naive GPU-only approach. We present a novel scheme to balance the loads between the store-poor GPU and the store-rich CPU. Taking CPU and GPU load balance into account, we improve the maximum simulation problem size per TianHe-1A node for HOSTA by 2.3×, meanwhile the collaborative approach can improve the performance by around 45% compared to the GPU-only approach. Further, to scale HOSTA on TianHe-1A, we propose a gather/scatter optimization to minimize PCI-e data transfer times for ghost and singularity data of 3D grid blocks, and overlap the collaborative computation and communication as far as possible using some advanced CUDA and MPI features. Scalability tests show that HOSTA can achieve a parallel efficiency of above 60% on 1024 TianHe-1A nodes. With our method, we have successfully simulated an EET high-lift airfoil configuration containing 800M cells and China's large civil airplane configuration containing 150M cells. To our best knowledge, those are the largest-scale CPU–GPU collaborative simulations

  18. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.

  19. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  20. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of research is the tools to support the development of parallel programs in C/C ++. The methods and software which automates the process of designing parallel applications are proposed.

  1. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of ¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  2. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Parallel channels interactions are examined. For experimental researches of nonstationary regimes flow in three parallel vertical channels results of phenomenon analysis and mechanisms of parallel channel interaction for adiabatic condition of one-phase fluid and two-phase mixture flow are shown. (author)

  3. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  4. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  5. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of large data volumes produced in modern HEP experiments requires using maximum capabilities of modern and future many-core CPU and GPU architectures.One of such powerful feature is a SIMD instruction set, which allows packing several data items in one register and to operate on all of them, thus achievingmore operations per clock cycle. Motivated by the idea of using the SIMD unit ofmodern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased in 120000 times with 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer.  Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA frameworkprovi...

  6. Supercomputing with toys: harnessing the power of NVIDIA 8800GTX and playstation 3 for bioinformatics problem.

    Science.gov (United States)

    Wilson, Justin; Dai, Manhong; Jakupovic, Elvis; Watson, Stanley; Meng, Fan

    2007-01-01

    Modern video cards and game consoles typically have much better performance to price ratios than that of general purpose CPUs. The parallel processing capabilities of game hardware are well-suited for high throughput biomedical data analysis. Our initial results suggest that game hardware is a cost-effective platform for some computationally demanding bioinformatics problems.

  7. Parallel Computing in SCALE

    International Nuclear Information System (INIS)

    DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  8. GRADSPMHD: A parallel MHD code based on the SPH formalism

    Science.gov (United States)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇ṡB→ constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a

  9. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware such as graphics card or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of supercomputers national, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. Then we experiment the work of this research on several national supercomputers such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also do our experiments on local workstations to illustrate the interest of this research in an everyday use with local computing resources. (author) [fr

  10. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.

  11. Massively Parallel Polar Decomposition on Distributed-Memory Systems

    KAUST Repository

    Ltaief, Hatem

    2018-01-01

    We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon on the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation for the scalar sign function, which also corresponds to the polar factor for symmetric matrices, to further accelerate the QDWH convergence. Based on the Zolotarev rational functions—introduced by Zolotarev (ZOLO) in 1877— this new PD algorithm ZOLO-PD converges within two iterations even for ill-conditioned matrices, instead of the original six iterations needed for QDWH. ZOLO-PD uses the property of Zolotarev functions that optimality is maintained when two functions are composed in an appropriate manner. The resulting ZOLO-PD has a convergence rate up to seventeen, in contrast to the cubic convergence rate for QDWH. This comes at the price of higher arithmetic costs and memory footprint. These extra floating-point operations can, however, be processed in an embarrassingly parallel fashion. We demonstrate performance using up to 102, 400 cores on two supercomputers. We demonstrate that, in the presence of a large number of processing units, ZOLO-PD is able to outperform QDWH by up to 2.3X speedup, especially in situations where QDWH runs out of work, for instance, in the strong scaling mode of operation.

  12. Parallel Performance Optimizations on Unstructured Mesh-based Simulations

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-01-01

    © The Authors. Published by Elsevier B.V. This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  13. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  14. 3-D electromagnetic plasma particle simulations on the Intel Delta parallel computer

    International Nuclear Information System (INIS)

    Wang, J.; Liewer, P.C.

    1994-01-01

    A three-dimensional electromagnetic PIC code has been developed on the 512 node Intel Touchstone Delta MIMD parallel computer. This code is based on the General Concurrent PIC algorithm which uses a domain decomposition to divide the computation among the processors. The 3D simulation domain can be partitioned into 1-, 2-, or 3-dimensional sub-domains. Particles must be exchanged between processors as they move among the subdomains. The Intel Delta allows one to use this code for very-large-scale simulations (i.e. over 10 8 particles and 10 6 grid cells). The parallel efficiency of this code is measured, and the overall code performance on the Delta is compared with that on Cray supercomputers. It is shown that their code runs with a high parallel efficiency of ≥ 95% for large size problems. The particle push time achieved is 115 nsecs/particle/time step for 162 million particles on 512 nodes. Comparing with the performance on a single processor Cray C90, this represents a factor of 58 speedup. The code uses a finite-difference leap frog method for field solve which is significantly more efficient than fast fourier transforms on parallel computers. The performance of this code on the 128 node Cray T3D will also be discussed

  15. Parallel imaging microfluidic cytometer.

    Science.gov (United States)

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times for the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.

  16. Simulation of x-rays in refractive structure by the Monte Carlo method using the supercomputer SKIF

    International Nuclear Information System (INIS)

    Yaskevich, Yu.R.; Kravchenko, O.I.; Soroka, I.I.; Chembrovskij, A.G.; Kolesnik, A.S.; Serikova, N.V.; Petrov, P.V.; Kol'chevskij, N.N.

    2013-01-01

    Software 'Xray-SKIF' for the simulation of the X-rays in refractive structures by the Monte-Carlo method using the supercomputer SKIF BSU are developed. The program generates a large number of rays propagated from a source to the refractive structure. The ray trajectory under assumption of geometrical optics is calculated. Absorption is calculated for each ray inside of refractive structure. Dynamic arrays are used for results of calculation rays parameters, its restore the X-ray field distributions very fast at different position of detector. It was found that increasing the number of processors leads to proportional decreasing of calculation time: simulation of 10 8 X-rays using supercomputer with the number of processors from 1 to 30 run-times equal 3 hours and 6 minutes, respectively. 10 9 X-rays are calculated by software 'Xray-SKIF' which allows to reconstruct the X-ray field after refractive structure with a special resolution of 1 micron. (authors)

  17. Perceptions of racial confrontation: the role of color blindness and comment ambiguity.

    Science.gov (United States)

    Zou, Linda X; Dickter, Cheryl L

    2013-01-01

    Because of its emphasis on diminishing race and avoiding racial discourse, color-blind racial ideology has been suggested to have negative consequences for modern day race relations. The current research examined the influence of color blindness and the ambiguity of a prejudiced remark on perceptions of a racial minority group member who confronts the remark. One hundred thirteen White participants responded to a vignette depicting a White character making a prejudiced comment of variable ambiguity, after which a Black target character confronted the comment. Results demonstrated that the target confronter was perceived more negatively and as responding less appropriately by participants high in color blindness, and that this effect was particularly pronounced when participants responded to the ambiguous comment. Implications for the ways in which color blindness, as an accepted norm that is endorsed across legal and educational settings, can facilitate Whites' complicity in racial inequality are discussed.

  18. Advances in Supercomputing for the Modeling of Atomic Processes in Plasmas

    International Nuclear Information System (INIS)

    Ludlow, J. A.; Ballance, C. P.; Loch, S. D.; Lee, T. G.; Pindzola, M. S.; Griffin, D. C.; McLaughlin, B. M.; Colgan, J.

    2009-01-01

    An overview will be given of recent atomic and molecular collision methods developed to take advantage of modern massively parallel computers. The focus will be on direct solutions of the time-dependent Schroedinger equation for simple systems using large numerical lattices, as found in the time-dependent close-coupling method, and for configuration interaction solutions of the time-independent Schroedinger equation for more complex systems using large numbers of basis functions, as found in the R-matrix with pseudo-states method. Results from these large scale calculations are extremely useful in benchmarking less accurate theoretical methods and experimental data. To take full advantage of future petascale and exascale computing resources, it appears that even finer grain parallelism will be needed.

  19. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.

  20. Coherent 40 Gb/s SP-16QAM and 80 Gb/s PDM-16QAM in an Optimal Supercomputer Optical Switch Fabric

    DEFF Research Database (Denmark)

    Karinou, Fotini; Borkowski, Robert; Zibar, Darko

    2013-01-01

    We demonstrate, for the first time, the feasibility of using 40 Gb/s SP-16QAM and 80 Gb/s PDM-16QAM in an optimized cell switching supercomputer optical interconnect architecture based on semiconductor optical amplifiers as ON/OFF gates.......We demonstrate, for the first time, the feasibility of using 40 Gb/s SP-16QAM and 80 Gb/s PDM-16QAM in an optimized cell switching supercomputer optical interconnect architecture based on semiconductor optical amplifiers as ON/OFF gates....

  1. Parallel Dynamic Analysis of a Large-Scale Water Conveyance Tunnel under Seismic Excitation Using ALE Finite-Element Method

    Directory of Open Access Journals (Sweden)

    Xiaoqing Wang

    2016-01-01

    Full Text Available Parallel analyses about the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel’s dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE method. Nonlinear fluid–structure interaction (FSI between tunnel and inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI between soil and tunnel is dealt with by using the surface to surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on the modified recursive coordinate bisection (MRCB considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000 A using the proposed MRCB based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. Besides, factors affecting the dynamic responses are investigated. Better speedup and parallel efficiency show the scalability of the parallel method and the analysis results can be used to aid in the design of water conveyance tunnels.

  2. Confrontation Naming and Reading Abilities at Primary School: A Longitudinal Study

    Directory of Open Access Journals (Sweden)

    Chiara Luoni

    2015-01-01

    naming (i.e., the Boston Naming Test (BNT in a nonclinical sample of Italian primary school children was conducted (n=126, testing them at the end of each school year, to assess nonverbal intelligence, confrontation naming, and reading abilities. Results. Performance on the BNT emerged as a function of IQ and SES. Significant correlations between confrontation naming and reading abilities, especially comprehension, were found; BNT scores correlated better with reading fluency than with reading accuracy. Conclusions. The longitudinal data obtained in this study are discussed with regard to reading abilities, intelligence, age, gender, and socioeconomic status.

  3. A concurrent visualization system for large-scale unsteady simulations. Parallel vector performance on an NEC SX-4

    International Nuclear Information System (INIS)

    Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro

    2000-01-01

    We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)

  4. High performance shallow water kernels for parallel overland flow simulations based on FullSWOF2D

    KAUST Repository

    Wittmann, Roland

    2017-01-25

    We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high resolution datasets of large scale areas which either cannot be computed on a single node either due to limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues in terms of two major contributions. First, we demonstrate a generic step-by-step transformation of the second order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.

  5. Asynchronous Task-Based Parallelization of Algebraic Multigrid

    KAUST Repository

    AlOnazi, Amani A.

    2017-06-23

    As processor clock rates become more dynamic and workloads become more adaptive, the vulnerability to global synchronization that already complicates programming for performance in today\\'s petascale environment will be exacerbated. Algebraic multigrid (AMG), the solver of choice in many large-scale PDE-based simulations, scales well in the weak sense, with fixed problem size per node, on tightly coupled systems when loads are well balanced and core performance is reliable. However, its strong scaling to many cores within a node is challenging. Reducing synchronization and increasing concurrency are vital adaptations of AMG to hybrid architectures. Recent communication-reducing improvements to classical additive AMG by Vassilevski and Yang improve concurrency and increase communication-computation overlap, while retaining convergence properties close to those of standard multiplicative AMG, but remain bulk synchronous.We extend the Vassilevski and Yang additive AMG to asynchronous task-based parallelism using a hybrid MPI+OmpSs (from the Barcelona Supercomputer Center) within a node, along with MPI for internode communications. We implement a tiling approach to decompose the grid hierarchy into parallel units within task containers. We compare against the MPI-only BoomerAMG and the Auxiliary-space Maxwell Solver (AMS) in the hypre library for the 3D Laplacian operator and the electromagnetic diffusion, respectively. In time to solution for a full solve an MPI-OmpSs hybrid improves over an all-MPI approach in strong scaling at full core count (32 threads per single Haswell node of the Cray XC40) and maintains this per node advantage as both weak scale to thousands of cores, with MPI between nodes.

  6. Enabling parallel simulation of large-scale HPC network systems

    International Nuclear Information System (INIS)

    Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip

    2016-01-01

    Here, with the increasing complexity of today’s high-performance computing (HPC) architectures, simulation has become an indispensable tool for exploring the design space of HPC systems—in particular, networks. In order to make effective design decisions, simulations of these systems must possess the following properties: (1) have high accuracy and fidelity, (2) produce results in a timely manner, and (3) be able to analyze a broad range of network workloads. Most state-of-the-art HPC network simulation frameworks, however, are constrained in one or more of these areas. In this work, we present a simulation framework for modeling two important classes of networks used in today’s IBM and Cray supercomputers: torus and dragonfly networks. We use the Co-Design of Multi-layer Exascale Storage Architecture (CODES) simulation framework to simulate these network topologies at a flit-level detail using the Rensselaer Optimistic Simulation System (ROSS) for parallel discrete-event simulation. Our simulation framework meets all the requirements of a practical network simulation and can assist network designers in design space exploration. First, it uses validated and detailed flit-level network models to provide an accurate and high-fidelity network simulation. Second, instead of relying on serial time-stepped or traditional conservative discrete-event simulations that limit simulation scalability and efficiency, we use the optimistic event-scheduling capability of ROSS to achieve efficient and scalable HPC network simulations on today’s high-performance cluster systems. Third, our models give network designers a choice in simulating a broad range of network workloads, including HPC application workloads using detailed network traces, an ability that is rarely offered in parallel with high-fidelity network simulations

  7. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

    International Nuclear Information System (INIS)

    Hoisie, A.; Lubeck, O.; Wasserman, H.

    1998-01-01

    The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor

  8. Grid-based Parallel Data Streaming Implemented for the Gyrokinetic Toroidal Code

    International Nuclear Information System (INIS)

    Klasky, S.; Ethier, S.; Lin, Z.; Martins, K.; McCune, D.; Samtaney, R.

    2003-01-01

    We have developed a threaded parallel data streaming approach using Globus to transfer multi-terabyte simulation data from a remote supercomputer to the scientist's home analysis/visualization cluster, as the simulation executes, with negligible overhead. Data transfer experiments show that this concurrent data transfer approach is more favorable compared with writing to local disk and then transferring this data to be post-processed. The present approach is conducive to using the grid to pipeline the simulation with post-processing and visualization. We have applied this method to the Gyrokinetic Toroidal Code (GTC), a 3-dimensional particle-in-cell code used to study microturbulence in magnetic confinement fusion from first principles plasma theory

  9. Parallel photonic information processing at gigabyte per second data rates using transient states

    Science.gov (United States)

    Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo

    2013-01-01

    The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.

  10. Massively parallel simulations of strong electronic correlations: Realistic Coulomb vertex and multiplet effects

    Science.gov (United States)

    Baumgärtel, M.; Ghanem, K.; Kiani, A.; Koch, E.; Pavarini, E.; Sims, H.; Zhang, G.

    2017-07-01

    We discuss the efficient implementation of general impurity solvers for dynamical mean-field theory. We show that both Lanczos and quantum Monte Carlo in different flavors (Hirsch-Fye, continuous-time hybridization- and interaction-expansion) exhibit excellent scaling on massively parallel supercomputers. We apply these algorithms to simulate realistic model Hamiltonians including the full Coulomb vertex, crystal-field splitting, and spin-orbit interaction. We discuss how to remove the sign problem in the presence of non-diagonal crystal-field and hybridization matrices. We show how to extract the physically observable quantities from imaginary time data, in particular correlation functions and susceptibilities. Finally, we present benchmarks and applications for representative correlated systems.

  11. Parallel Computation of RCS of Electrically Large Platform with Coatings Modeled with NURBS Surfaces

    Directory of Open Access Journals (Sweden)

    Ying Yan

    2012-01-01

    Full Text Available The significance of Radar Cross Section (RCS in the military applications makes its prediction an important problem. This paper uses large-scale parallel Physical Optics (PO to realize the fast computation of RCS to electrically large targets, which are modeled by Non-Uniform Rational B-Spline (NURBS surfaces and coated with dielectric materials. Some numerical examples are presented to validate this paper’s method. In addition, 1024 CPUs are used in Shanghai Supercomputer Center (SSC to perform the simulation of a model with the maximum electrical size 1966.7 λ for the first time in China. From which, it can be found that this paper’s method can greatly speed the calculation and is capable of solving the real-life problem of RCS prediction.

  12. Implementation of a 3D plasma particle-in-cell code on a MIMD parallel computer

    International Nuclear Information System (INIS)

    Liewer, P.C.; Lyster, P.; Wang, J.

    1993-01-01

    A three-dimensional plasma particle-in-cell (PIC) code has been implemented on the Intel Delta MIMD parallel supercomputer using the General Concurrent PIC algorithm. The GCPIC algorithm uses a domain decomposition to divide the computation among the processors: A processor is assigned a subdomain and all the particles in it. Particles must be exchanged between processors as they move. Results are presented comparing the efficiency for 1-, 2- and 3-dimensional partitions of the three dimensional domain. This algorithm has been found to be very efficient even when a large fraction (e.g. 30%) of the particles must be exchanged at every time step. On the 512-node Intel Delta, up to 125 million particles have been pushed with an electrostatic push time of under 500 nsec/particle/time step

  13. GPAW - massively parallel electronic structure calculations with Python-based software

    DEFF Research Database (Denmark)

    Enkovaara, Jussi; Romero, Nichols A.; Shende, Sameer

    2011-01-01

    of the productivity enhancing features together with a good numerical performance. We have used this approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix...... popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most...... environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges...

  14. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.

  15. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved

  16. Anti-parallel triplexes

    DEFF Research Database (Denmark)

    Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.

    2015-01-01

    about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in case of nucleobases Y and Z a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one...... of the phosphate groups in Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of Watson-Crick base pair which......The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...

  17. Parallel consensual neural networks.

    Science.gov (United States)

    Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H

    1997-01-01

    A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.

  18. A Parallel Particle Swarm Optimizer

    National Research Council Canada - National Science Library

    Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D

    2003-01-01

    .... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...

  19. Seeing or moving in parallel

    DEFF Research Database (Denmark)

    Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

    2013-01-01

    a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...

  20. Numerical methods and parallel algorithms for fast transient strongly coupled fluid-structure dynamics

    International Nuclear Information System (INIS)

    Faucher, V.

    2014-01-01

    This HDR is dedicated to the research in the framework of fast transient dynamics for industrial fluid-structure systems carried in the Laboratory of Dynamic Studies from CEA, implementing new numerical methods for the modelling of complex systems and the parallel solution of large coupled problems on supercomputers. One key issue for the proposed approaches is the limitation to its minimum of the number of non-physical parameters, to cope with constraints arising from the area of usage of the concepts: safety for both nuclear applications (CEA, EDF) and aeronautics (ONERA), protection of the citizen (EC/JRC) in particular. Kinematic constraints strongly coupling structures (namely through unilateral contact) or fluid and structures (with both conformant or non-conformant meshes depending on the geometrical situation) are handled through exact methods including Lagrange Multipliers, with consequences on the solution strategy to be dealt with. This latter aspect makes EPX, the simulation code where the methods are integrated, a singular tool in the community of fast transient dynamics software. The document mainly relies on a description of the modelling needs for industrial fast transient scenarios, for nuclear applications in particular, and the proposed solutions built in the framework of the collaboration between CEA, EDF (via the LaMSID laboratory) and the LaMCoS laboratory from INSA Lyon. The main considered examples are the tearing of the fluid-filled tank after impact, the Code Disruptive Accident for a Generation IV reactor or the ruin of reinforced concrete structures under impact. Innovative models and parallel algorithms are thus proposed, allowing to carry out with robustness and performance the corresponding simulations on supercomputers made of interconnected multi-core nodes, with a strict preservation of the quality of the physical solution. This was particularly the main point of the ANR RePDyn project (2010-2013), with CEA as the pilot. (author

  1. MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

    Energy Technology Data Exchange (ETDEWEB)

    Cantalupo, Christopher; Borrill, Julian; Jaffe, Andrew; Kisner, Theodore; Stompor, Radoslaw

    2009-06-09

    MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(1011) time samples, O(108) pixels and O(104) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.

  2. Race: A scalable and elastic parallel system for discovering repeats in very long sequences

    KAUST Repository

    Mansour, Essam

    2013-08-26

    A wide range of applications, including bioinformatics, time series, and log analysis, depend on the identification of repetitions in very long sequences. The problem of finding maximal pairs subsumes most important types of repetition-finding tasks. Existing solutions require both the input sequence and its index (typically an order of magnitude larger than the input) to fit in memory. Moreover, they are serial algorithms with long execution time. Therefore, they are limited to small datasets, despite the fact that modern applications demand orders of magnitude longer sequences. In this paper we present RACE, a parallel system for finding maximal pairs in very long sequences. RACE supports parallel execution on stand-alone multicore systems, in addition to scaling to thousands of nodes on clusters or supercomputers. RACE does not require the input or the index to fit in memory; therefore, it supports very long sequences with limited memory. Moreover, it uses a novel array representation that allows for cache-efficient implementation. RACE is particularly suitable for the cloud (e.g., Amazon EC2) because, based on availability, it can scale elastically to more or fewer machines during its execution. Since scaling out introduces overheads, mainly due to load imbalance, we propose a cost model to estimate the expected speedup, based on statistics gathered through sampling. The model allows the user to select the appropriate combination of cloud resources based on the provider\\'s prices and the required deadline. We conducted extensive experimental evaluation with large real datasets and large computing infrastructures. In contrast to existing methods, RACE can handle the entire human genome on a typical desktop computer with 16GB RAM. Moreover, for a problem that takes 10 hours of serial execution, RACE finishes in 28 seconds using 2,048 nodes on an IBM BlueGene/P supercomputer.

  3. Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

    Science.gov (United States)

    Kamyar, Reza

    In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems --- in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs) --- whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grow exponentially with the progress of the sequence - meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. The existing optimization algorithms and software are only designed to use desktop computers or small cluster computers --- machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This in fact is the reason we seek parallel algorithms for setting-up and solving large SDPs on large cluster- and/or super-computers. We propose parallel algorithms for stability analysis of two classes of systems: 1) Linear systems with a large number of uncertain parameters; 2) Nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to

  4. Supercomputing for molecular dynamics simulations handling multi-trillion particles in nanofluidics

    CERN Document Server

    Heinecke, Alexander; Horsch, Martin; Bungartz, Hans-Joachim

    2015-01-01

    This work presents modern implementations of relevant molecular dynamics algorithms using ls1 mardyn, a simulation program for engineering applications. The text focuses strictly on HPC-related aspects, covering implementation on HPC architectures, taking Intel Xeon and Intel Xeon Phi clusters as representatives of current platforms. The work describes distributed and shared-memory parallelization on these platforms, including load balancing, with a particular focus on the efficient implementation of the compute kernels. The text also discusses the software-architecture of the resulting code.

  5. Citizen-sensor-networks to confront government decision-makers: Two lessons from the Netherlands

    NARCIS (Netherlands)

    Carton, L.J.; Ache, P.M.

    2017-01-01

    This paper presents one emerging social-technical innovation: The evolution of citizen-sensor-networks where citizens organize themselves from the ‘bottom up’, for the sake of confronting governance officials with measured information about environmental qualities. We have observed how

  6. Sharing skills and knowledge to confront real-world problems | CRDI ...

    International Development Research Centre (IDRC) Digital Library (Canada)

    Sharing skills and knowledge to confront real-world problems ... because 1.3 billion people cannot read English, and only 163 million people (a fraction ... childhood obesity in urban areas, caused by increases in junk food and lack of exercise.

  7. The future of Asian feminisms: confronting fundamentalisms, conflicts and neo-liberalism

    NARCIS (Netherlands)

    Katjasungkana, N.; Wieringa, S.E.

    2012-01-01

    This book on the future of Asian feminisms, confronting fundamentalisms, conflicts, and neo-liberalism is a critical contribution to the rising voices of Asian women’s studies scholars and activists. It is based on the ongoing research and advocacy work of the Kartini Asia Network, founded in 2003

  8. La Grippe and World War I: conflict participation and pandemic confrontation.

    Science.gov (United States)

    Steele, B J; Collins, C D

    2009-01-01

    This paper assesses whether a nation-state's participation in conflict influences its ability to confront global pandemic or disease. Two alternative hypotheses are proposed. First, increased levels of conflict participation lead to increased abilities of states to confront pandemics. A second and alternative hypothesis is that increased conflict participation decreases the ability of states to confront pandemics. The hypotheses are tested through the ultimate case of war and pandemic: the 1918 Influenza pandemic (Spanish Flu or 'La Grippe') that killed 20-100 million people worldwide. Using simple correlation and case illustrations, we test these hypotheses with special focus upon the ability of the participant countries to confront the pandemic. The findings suggest, in a limited and varied fashion, that while neutral countries enjoyed the lowest levels of pandemic deaths, of the participant countries greater levels of conflict participation correlate with lower levels of pandemic deaths. The paper concludes with some propositions regarding the relationship between the current 'war on terror' and prospective pandemics such as avian flu.

  9. Differentiating between Confrontive and Coercive Kinds of Parental Power-Assertive Disciplinary Practices

    Science.gov (United States)

    Baumrind, Diana

    2012-01-01

    In this essay, I differentiate between coercive and confrontive kinds of power assertion to elucidate the significantly different effects on children's well-being of authoritarian and authoritative styles of parental authority. Although both parenting styles (in contrast to the permissive style) are equally demanding, forceful, and…

  10. Perceiving and Confronting Sexism: The Causal Role of Gender Identity Salience

    Science.gov (United States)

    Wang, Katie; Dovidio, John F.

    2017-01-01

    Although many researchers have explored the relations among gender identification, discriminatory attributions, and intentions to challenge discrimination, few have examined the causal impact of gender identity salience on women’s actual responses to a sexist encounter. In the current study, we addressed this question by experimentally manipulating the salience of gender identity and assessing its impact on women’s decision to confront a sexist comment in a simulated online interaction. Female participants (N = 114) were randomly assigned to complete a short measure of either personal or collective self-esteem, which was designed to increase the salience of personal versus gender identity. They were then given the opportunity to confront a male interaction partner who expressed sexist views. Compared to those who were primed to focus on their personal identity, participants who were primed to focus on their gender identity perceived the interaction partner’s remarks as more sexist and were more likely to engage in confrontation. By highlighting the powerful role of subtle contextual cues in shaping women’s perceptions of, and responses to, sexism, our findings have important implications for the understanding of gender identity salience as an antecedent of prejudice confrontation. Online slides for instructors who want to use this article for teaching are available on PWQ’s website at http://journals.sagepub.com/page/pwq/suppl/index. PMID:29051685

  11. Post Stereotypes: Deconstructing Racial Assumptions and Biases through Visual Culture and Confrontational Pedagogy

    Science.gov (United States)

    Jung, Yuha

    2015-01-01

    The Post Stereotypes project embodies confrontational pedagogy and involves postcard artmaking designed to both solicit expression of and deconstruct students' racial, ethnic, and cultural stereotypes and assumptions. As part of the Cultural Diversity in American Art course, students created postcard art that visually represented their personal…

  12. Educating Women Students in the Academy to Confront Gender Discrimination and Contribute to Equity Afterward

    Science.gov (United States)

    Mentkowski, Marcia; Rogers, Glen

    2010-01-01

    We argue that (1) faculty and other academic professionals who educate undergraduate women in capabilities such as effective communication, teamwork, and leadership that are integrated with the disciplines (e.g., biology, history, fine arts) and professions (e.g., education, nursing, management) indirectly assist their students to confront gender…

  13. The challenges that head nurses confront on financial management today: A qualitative study

    Directory of Open Access Journals (Sweden)

    Yang Bai, RN, BA

    2017-04-01

    Conclusions: The confusion confronted by head nurses in Changsha include three aspects: managerial roles, managerial training, and managerial tools. Cooperative management model, evidence-based management training, and data-driven tools will contribute to improving the financial management capacity of nurse managers.

  14. Markets, Equality and Democratic Education: Confronting the Neoliberal and Libertarian Reconceptualisations of Education

    Science.gov (United States)

    Sung, Youl-Kwan

    2010-01-01

    The global emergence of market liberalism marks an effort to decouple the link between citizenship and the welfare state and to rearticulate people's identity as homo economicus, as independent citizens having the right to property and the freedom to choose in the marketplace. Confronting this phenomenon, this paper reviews neoliberal and…

  15. Understanding and Confronting Alcohol-Induced Risky Behavior among College Students

    Science.gov (United States)

    Dornier, Lucien J.; Fauquier, Katharine J.; Field, April R.; Budden, Michael C.

    2010-01-01

    Confronting alcohol abuse is a challenge for most higher education institutions. Each year, students are admitted to hospitals for issues arising from the misuse of alcohol. The deaths of some engaged in alcohol related activities is especially worrisome. Factors such as age and financial standing could impact the likelihood of abuse. So-called…

  16. Unraveling the age-productivity nexus : Confronting perceptions of employers and employees

    NARCIS (Netherlands)

    van Dalen, H.P.; Henkens, C.J.I.M.; Schippers, J.

    2009-01-01

    What determines the perceived productivity of young and older workers? In this study we present evidence for (Dutch) employers and employees. By confronting the perceptions of employers and employees some remarkable similarities and differences are revealed. It turns out that productivity

  17. Unraveling the age-productivity nexus: confronting perceptions of employers and employees

    NARCIS (Netherlands)

    van Dalen, H.P.; Henkens, C.J.I.M.; Schippers, J.

    2009-01-01

    What determines the perceived productivity of young and older workers? In this study we present evidence for (Dutch) employers and employees. By confronting the perceptions of employers and employees some remarkable similarities and differences are revealed. It turns out that productivity

  18. Perceiving and Confronting Sexism: The Causal Role of Gender Identity Salience.

    Science.gov (United States)

    Wang, Katie; Dovidio, John F

    2017-03-01

    Although many researchers have explored the relations among gender identification, discriminatory attributions, and intentions to challenge discrimination, few have examined the causal impact of gender identity salience on women's actual responses to a sexist encounter. In the current study, we addressed this question by experimentally manipulating the salience of gender identity and assessing its impact on women's decision to confront a sexist comment in a simulated online interaction. Female participants ( N = 114) were randomly assigned to complete a short measure of either personal or collective self-esteem, which was designed to increase the salience of personal versus gender identity. They were then given the opportunity to confront a male interaction partner who expressed sexist views. Compared to those who were primed to focus on their personal identity, participants who were primed to focus on their gender identity perceived the interaction partner's remarks as more sexist and were more likely to engage in confrontation. By highlighting the powerful role of subtle contextual cues in shaping women's perceptions of, and responses to, sexism, our findings have important implications for the understanding of gender identity salience as an antecedent of prejudice confrontation. Online slides for instructors who want to use this article for teaching are available on PWQ's website at http://journals.sagepub.com/page/pwq/suppl/index.

  19. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Kirk, B.L. [Oak Ridge National Lab., TN (United States); Sartori, E. [OCDE/OECD NEA Data Bank, Issy-les-Moulineaux (France); Viedma, L.G. de [Consejo de Seguridad Nuclear, Madrid (Spain)

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee`s Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community`s computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  20. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    International Nuclear Information System (INIS)

    Kirk, B.L.; Sartori, E.; Viedma, L.G. de

    1997-01-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management