WorldWideScience

Sample records for parallel scientific applications

  1. Scalability of Parallel Scientific Applications on the Cloud

    Directory of Open Access Journals (Sweden)

    Satish Narayana Srirama

    2011-01-01

    Full Text Available Cloud computing, with its promise of virtually infinite resources, seems well suited to solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix–vector operations and the NAS parallel benchmarks, as well as DOUG (Domain decomposition On Unstructured Grids) on the cloud. DOUG is an open source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably on the cloud. We also observed the limitations of the cloud and compared its performance with that of a cluster. However, to run scientific applications efficiently on cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, such as the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits embarrassingly parallel algorithms well. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the need for better frameworks or optimizations for MapReduce.
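
    The contrast the abstract draws between iterative and embarrassingly parallel algorithms can be sketched in a few lines. The toy map/reduce driver and Jacobi update below are illustrative assumptions, not the benchmark codes used in the paper; they only show why a solver that needs many sweeps pays the framework's per-job overhead on every iteration.

```python
# Minimal sketch (assumed, not the paper's code): why iterative methods map
# poorly onto MapReduce while embarrassingly parallel ones fit naturally.
import random
from functools import reduce

def mapreduce(map_fn, reduce_fn, records):
    """One MapReduce 'job': map every record, then reduce the partials."""
    return reduce(reduce_fn, map(map_fn, records))

# Embarrassingly parallel: a single job suffices (e.g. a global sum of squares).
data = list(range(1_000))
total = mapreduce(lambda x: x * x, lambda a, b: a + b, data)

# Iterative: each sweep of a Jacobi-style update is a separate job, so a solver
# needing dozens or hundreds of iterations pays the per-job startup and
# data-reload cost of the framework every single time.
n = 100
A = [[random.random() if i != j else float(n) for j in range(n)] for i in range(n)]
b = [1.0] * n
x = [0.0] * n
for _ in range(50):  # 50 iterations -> 50 MapReduce jobs
    x = mapreduce(
        lambda i: [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]],
        lambda acc, part: acc + part,
        list(range(n)),
    )
```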

  2. State of the art of parallel scientific visualization applications on PC clusters

    International Nuclear Information System (INIS)

    Juliachs, M.

    2004-01-01

    In this state of the art on parallel scientific visualization applications on PC clusters, we deal with both surface and volume rendering approaches. We first analyze available PC cluster configurations and existing parallel rendering software components for parallel graphics rendering. CEA/DIF has been studying cluster visualization since 2001. This report is part of a study to set up a new visualization research platform. This platform, consisting of an eight-node PC cluster under Linux and a tiled display, was installed in collaboration with Versailles-Saint-Quentin University in August 2003. (author)

  3. Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

    Science.gov (United States)

    Meng, Xiang

    The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allow optical structures to be scaled down to nanoscale size or even to the atomic level, far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties, and their interactions with quantum materials can produce important near-field and far-field optical responses. Undoubtedly, these optical properties can have many important applications, ranging from efficient and tunable light sources, detectors, filters, modulators, and high-speed all-optical switches to next-generation classical and quantum computation and biophotonic medical sensors. This emerging field, known as nanophotonics, is highly interdisciplinary, requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling, and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales, where the nature of the nanostructured matter controls the interactions. In addition, fast advancements in computing capabilities, such as parallel computing, have become a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, as the design of these devices requires extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster design processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses computing machines to analyze and solve otherwise intractable scientific challenges. In

  4. State of the art of parallel scientific visualization applications on PC clusters; Etat de l'art des applications de visualisation scientifique paralleles sur grappes de PC

    Energy Technology Data Exchange (ETDEWEB)

    Juliachs, M

    2004-07-01

    In this state of the art on parallel scientific visualization applications on PC clusters, we deal with both surface and volume rendering approaches. We first analyze available PC cluster configurations and existing parallel rendering software components for parallel graphics rendering. CEA/DIF has been studying cluster visualization since 2001. This report is part of a study to set up a new visualization research platform. This platform, consisting of an eight-node PC cluster under Linux and a tiled display, was installed in collaboration with Versailles-Saint-Quentin University in August 2003. (author)

  6. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  7. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  8. Simplifying the parallelization of scientific codes by a function-centric approach in Python

    International Nuclear Information System (INIS)

    Nilsen, Jon K; Cai Xing; Langtangen, Hans Petter; Hoeyland, Bjoern

    2010-01-01

    The purpose of this paper is to show how existing scientific software can be parallelized using a separate thin layer of Python code where all parallelization-specific tasks are implemented. We provide specific examples of such a Python code layer, which can act as templates for parallelizing a wide set of serial scientific codes. The use of Python for parallelization is motivated by the fact that the language is well suited for reusing existing serial codes programmed in other languages. The extreme flexibility of Python with regard to handling functions makes it very easy to wrap up decomposed computational tasks of a serial scientific application as Python functions. Many parallelization-specific components can be implemented as generic Python functions, which may take as input those wrapped functions that perform concrete computational tasks. The overall programming effort needed by this parallelization approach is limited, and the resulting parallel Python scripts have a compact and clean structure. The usefulness of the parallelization approach is exemplified by three different classes of application in natural and social sciences.
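
    The function-centric idea described above can be illustrated with a minimal sketch, assuming Python's standard multiprocessing module rather than the specific thin layer developed in the paper; the serial kernel and the task decomposition are made-up placeholders.

```python
# Sketch of a function-centric parallel layer (multiprocessing assumed; the
# paper's own layer and its wrapped serial kernels are not reproduced here).
from multiprocessing import Pool

def serial_kernel(subdomain):
    """Placeholder for a wrapped serial computation on one decomposed task."""
    return sum(x * x for x in subdomain)

def run_parallel(task_fn, tasks, nprocs=4):
    """Generic driver: apply a wrapped serial function to each task in parallel."""
    with Pool(processes=nprocs) as pool:
        return pool.map(task_fn, tasks)

if __name__ == "__main__":
    # Decompose the global problem into independent tasks, run them in
    # parallel, then combine the partial results in the thin Python layer.
    subdomains = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
    partials = run_parallel(serial_kernel, subdomains)
    print(sum(partials))
```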

  9. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as it is for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance becomes achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
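
    The point about efficient array-related operations can be illustrated with NumPy (an assumption here; the article benchmarks its own serial and mixed-language kernels): the vectorized form pushes the inner loop into compiled code, which is what makes both serial and parallel Python performance attainable.

```python
# Illustration (NumPy assumed, not the article's own kernels): the same
# five-point stencil as a Python loop and as a vectorized array update.
import numpy as np

def smooth_loop(u):
    """Naive nested Python loops: interpreted per-element overhead dominates."""
    v = u.copy()
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            v[i, j] = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1])
    return v

def smooth_vectorized(u):
    """Whole-array slicing: the loop runs in compiled (C/Fortran) code."""
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.random.rand(512, 512)
assert np.allclose(smooth_loop(u), smooth_vectorized(u))
```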

  10. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with the equational programming language (EPL). Our approach is based on program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  11. Vdebug: debugging tool for parallel scientific programs. Design report on vdebug

    International Nuclear Information System (INIS)

    Matsuda, Katsuyuki; Takemiya, Hiroshi

    2000-02-01

    We report on a debugging tool called vdebug which supports the debugging of parallel scientific simulation programs. It is difficult to debug scientific programs with an existing debugger because the volume of data generated by such programs is far too large for users to check when displayed as characters, which is how existing debuggers usually present data values. To alleviate this, we have developed vdebug, which enables users to check the validity of large amounts of data by displaying the data values visually. Although vdebug had previously been restricted to sequential programs, we have made it applicable to parallel programs by adding the ability to merge and visualize data distributed across the program instances on each compute node. Vdebug now works on seven kinds of parallel computers. In this report, we describe the design of vdebug. (author)

  12. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared-memory multiprocessor with at most four processors) to a simulated N-processor environment, where N is greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most heavily used codes at the Laboratory
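
    As a back-of-the-envelope check (plain Amdahl's law, not the Laboratory's simulation methodology), a measured speedup of 14.4 on 16 processors implies a serial fraction of well under one percent:

```python
# Hedged arithmetic only: infer the serial fraction consistent with Amdahl's
# law from the reported speedup (the paper's simulator is far more detailed).
N, S = 16, 14.4
# Amdahl: S = 1 / ((1 - p) + p / N); solve for the parallel fraction p.
p = (1 - 1 / S) / (1 - 1 / N)
print(f"parallel fraction p = {p:.4f}, serial fraction = {1 - p:.4f}")
# -> roughly p = 0.9926, i.e. only about 0.7% of the work is serial.
```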

  13. Parallel science and engineering applications the Charm++ approach

    CERN Document Server

    Kale, Laxmikant V

    2016-01-01

    Developed in the context of science and engineering applications, with each abstraction motivated by and further honed by specific application needs, Charm++ is a production-quality system that runs on almost all parallel computers available. Parallel Science and Engineering Applications: The Charm++ Approach surveys a diverse and scalable collection of science and engineering applications, most of which are used regularly on supercomputers by scientists to further their research. After a brief introduction to Charm++, the book presents several parallel CSE codes written in the Charm++ model, along with their underlying scientific and numerical formulations, explaining their parallelization strategies and parallel performance. These chapters demonstrate the versatility of Charm++ and its utility for a wide variety of applications, including molecular dynamics, cosmology, quantum chemistry, fracture simulations, agent-based simulations, and weather modeling. The book is intended for a wide audience of people i...

  14. Parallelization of applications for networks with homogeneous and heterogeneous processors

    International Nuclear Information System (INIS)

    Colombet, L.

    1994-01-01

    The aim of this thesis is to study and develop efficient methods for the parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of communication tools, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on the basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs
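
    The abstract does not give the thesis's exact definitions; the sketch below shows one plausible way to adapt speed-up and efficiency to heterogeneous processors, namely normalizing by the aggregate relative speed of the machines. The formula and the numbers are assumptions for illustration only, not necessarily the thesis's formulation.

```python
# Illustrative only: one possible heterogeneity-aware speed-up/efficiency
# definition (assumed; the thesis may define these quantities differently).
def hetero_metrics(t_serial_on_fastest, t_parallel, relative_speeds):
    """relative_speeds: per-node speeds normalized so the fastest node = 1.0."""
    speedup = t_serial_on_fastest / t_parallel
    # A heterogeneous "processor count": total power in fastest-node units.
    effective_procs = sum(relative_speeds)
    efficiency = speedup / effective_procs
    return speedup, efficiency

# Example: 4 nodes, two of them half as fast as the others (made-up timings).
s, e = hetero_metrics(t_serial_on_fastest=120.0, t_parallel=45.0,
                      relative_speeds=[1.0, 1.0, 0.5, 0.5])
print(f"speed-up = {s:.2f}, efficiency = {e:.2f}")  # 2.67 and 0.89
```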

  15. Language interoperability for high-performance parallel scientific components

    International Nuclear Information System (INIS)

    Elliot, N; Kohn, S; Smolinski, B

    1999-01-01

    With the increasing complexity and interdisciplinary nature of scientific applications, code reuse is becoming increasingly important in scientific computing. One method for facilitating code reuse is the use of component technologies, which have been used widely in industry. However, components have only recently worked their way into scientific computing. Language interoperability is an important underlying technology for these component architectures. In this paper, we present an approach to language interoperability for a high-performance, parallel component architecture being developed by the Common Component Architecture (CCA) group. Our approach is based on Interface Definition Language (IDL) techniques. We have developed a Scientific Interface Definition Language (SIDL), as well as bindings to C and Fortran. We have also developed a SIDL compiler and run-time library support for reference counting, reflection, object management, and exception handling (Babel). Results from using Babel to call a standard numerical solver library (written in C) from C and Fortran show that the cost of using Babel is minimal, whereas the savings in development time and the benefits of object-oriented development support for C and Fortran far outweigh the costs

  16. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    The computing requirements of large-scale scientific computing have always been ahead of what state-of-the-art hardware could supply in the form of the supercomputers of the day. For any single-processor system, the limit on the increase in computing power was reached a few years back. Now, with the advent of parallel computing systems, the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future of large-scale scientific computing in the penultimate decade of the present century. The author summarizes trends in parallel computers and emphasizes the need for a better programming environment and software tools for optimal performance. The author concludes the paper with a critique of parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  17. Dynamic file-access characteristics of a production parallel scientific workload

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1994-01-01

    Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.

  18. Load Balancing Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Pearce, Olga Tkachyshyn [Texas A & M Univ., College Station, TX (United States)

    2014-12-01

    The largest supercomputers have millions of independent processors, and concurrency levels are rapidly increasing. For ideal efficiency, developers of the simulations that run on these machines must ensure that computational work is evenly balanced among processors. Assigning work evenly is challenging because many large modern parallel codes simulate behavior of physical systems that evolve over time, and their workloads change over time. Furthermore, the cost of imbalanced load increases with scale because most large-scale scientific simulations today use a Single Program Multiple Data (SPMD) parallel programming model, and an increasing number of processors will wait for the slowest one at the synchronization points. To address load imbalance, many large-scale parallel applications use dynamic load balance algorithms to redistribute work evenly. The research objective of this dissertation is to develop methods to decide when and how to load balance the application, and to balance it effectively and affordably. We measure and evaluate the computational load of the application, and develop strategies to decide when and how to correct the imbalance. Depending on the simulation, a fast, local load balance algorithm may be suitable, or a more sophisticated and expensive algorithm may be required. We developed a model for comparison of load balance algorithms for a specific state of the simulation that enables the selection of a balancing algorithm that will minimize overall runtime.
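
    The dissertation's cost model is not reproduced here; the sketch below only illustrates, under simple assumptions, the basic quantities the abstract describes: a per-step imbalance factor and a rebalance decision that weighs the time wasted at SPMD synchronization points against an assumed cost of running the balancer.

```python
# Toy sketch (assumptions, not the dissertation's model): decide whether the
# time lost to imbalance in an SPMD step justifies paying for a rebalance.
def imbalance(loads):
    """Classic imbalance factor: max load over mean load (1.0 = perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))

def should_rebalance(loads, steps_remaining, rebalance_cost):
    """Rebalance if the projected wait time at sync points exceeds the balancer cost."""
    mean = sum(loads) / len(loads)
    wasted_per_step = max(loads) - mean      # time the faster ranks spend waiting
    return wasted_per_step * steps_remaining > rebalance_cost

loads = [9.5, 10.0, 10.2, 14.3]              # seconds of work per processor this step
print(imbalance(loads))                      # ~1.30
print(should_rebalance(loads, steps_remaining=50, rebalance_cost=60.0))  # True
```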

  19. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also includes heterogeneous collection of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  20. Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d'applications pour des reseaux de processeurs homogenes ou heterogenes

    Energy Technology Data Exchange (ETDEWEB)

    Colombet, L

    1994-10-07

    The aim of this thesis is to study and develop efficient methods for the parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of communication tools, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on the basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.

  1. Parallel Tensor Compression for Large-Scale Scientific Data.

    Energy Technology Data Exchange (ETDEWEB)

    Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)

    2015-10-01

    As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation of the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
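
    The 8 TB figure follows directly from the tensor dimensions quoted in the abstract (assuming double precision). The short check below reproduces that arithmetic and adds the storage count for a Tucker approximation with assumed core ranks; the ranks are illustrative numbers, not the paper's.

```python
# Arithmetic check of the abstract's example (double precision assumed),
# plus the storage of a Tucker approximation with assumed core ranks.
dims = (512, 512, 512, 64, 128)          # grid x grid x grid x variables x time steps
bytes_per_value = 8
full = bytes_per_value
for d in dims:
    full *= d
print(full / 2**40, "TiB")               # -> 8.0 TiB for the dense five-way tensor

# Tucker storage = core tensor + one factor matrix per mode (ranks are made up).
ranks = (64, 64, 64, 16, 32)
core = bytes_per_value
for r in ranks:
    core *= r
factors = bytes_per_value * sum(d * r for d, r in zip(dims, ranks))
print(full / (core + factors))           # compression ratio for these assumed ranks
```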

  2. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/O requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  3. Frontiers of massively parallel scientific computation

    International Nuclear Information System (INIS)

    Fischer, J.R.

    1987-07-01

    Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented

  4. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  5. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets various problems in computational physics, and its architecture has been devised to handle a variety of numerical processing. In this report, an outline of the CP-PACS and an example of programming the Kernel CG benchmark from the NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper-crossbar network, which are the two main features of the CP-PACS architecture, are described. The CP-PACS uses PUs based on a RISC processor augmented with a pseudo vector processor. Pseudo vector processing is realized as loop processing with scalar instructions. The features of the network connecting the PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part of the main loop that takes the most processing time is the product of a matrix and a vector (matvec), and the parallel processing of the matvec is explained. The computation time on the CPU is determined. For the performance evaluation, execution times, the short vector processing of the pseudo vector processor based on the slide window, and comparisons with other parallel computers are reported. (K.I.)
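
    The abstract highlights the parallel matrix-vector product at the heart of the NPB CG kernel. The sketch below shows the standard row-block decomposition of a matvec using mpi4py as an assumed modern stand-in; it illustrates the parallelization idea only and is not CP-PACS code.

```python
# Row-block parallel matvec sketch (mpi4py assumed; illustrative of the matvec
# parallelization discussed for the CG kernel, not the CP-PACS implementation).
# Run with, e.g.: mpirun -n 4 python matvec_sketch.py   (hypothetical file name)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

n = 1024                                   # global problem size (assumed divisible)
rows = n // nprocs
np.random.seed(rank)
A_local = np.random.rand(rows, n)          # this rank's block of matrix rows
x = np.empty(n)
if rank == 0:
    x = np.random.rand(n)
comm.Bcast(x, root=0)                      # every rank needs the full input vector

y_local = A_local @ x                      # local partial result: rows owned here
y = np.empty(n) if rank == 0 else None
comm.Gather(y_local, y, root=0)            # assemble y = A x on rank 0
if rank == 0:
    print("||Ax|| =", np.linalg.norm(y))
```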

  6. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  7. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Arumugam, Kamesh [Old Dominion Univ., Norfolk, VA (United States)

    2017-05-01

    Developing efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independently of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address

  8. Piecewise - Parabolic Methods for Parallel Computation with Applications to Unstable Fluid Flow in 2 and 3 Dimensions

    Energy Technology Data Exchange (ETDEWEB)

    Woodward, P. R.

    2003-03-26

    This report summarizes the results of the project entitled "Piecewise-Parabolic Methods for Parallel Computation with Applications to Unstable Fluid Flow in 2 and 3 Dimensions". The project covers a span of many years, beginning in early 1987, and over that considerable period it has provided the core funding for my research activities in scientific computation at the University of Minnesota. It has supported numerical algorithm development, application of those algorithms to fundamental fluid dynamics problems in order to demonstrate their effectiveness, and the development of scientific visualization software and systems to extract scientific understanding from those applications.

  9. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  10. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  11. BurstMem: A High-Performance Burst Buffer System for Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Teng [Auburn University, Auburn, Alabama; Oral, H Sarp [ORNL; Wang, Yandong [Auburn University, Auburn, Alabama; Settlemyer, Bradley W [ORNL; Atchley, Scott [ORNL; Yu, Weikuan [Auburn University, Auburn, Alabama

    2014-01-01

    The growth of computing power on large-scale systems requires commensurate high-bandwidth I/O systems. Many parallel file systems are designed to provide fast, sustainable I/O in response to applications' soaring requirements. To meet this need, a novel system is imperative to temporarily buffer the bursty I/O and gradually flush datasets to long-term parallel file systems. In this paper, we introduce the design of BurstMem, a high-performance burst buffer system. BurstMem provides a storage framework with efficient storage and communication management strategies. Our experiments demonstrate that BurstMem is able to speed up the I/O performance of scientific applications by a factor of up to 8.5 on leadership computer systems.

  12. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices

  13. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  14. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of research is tools to support the development of parallel programs in C/C++. Methods and software which automate the process of designing parallel applications are proposed.

  15. Collectively loading an application in a parallel computer

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

    2016-01-05

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.
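
    The patent-style steps above read naturally as a broadcast from a job-leader node. The mpi4py sketch below is an assumed illustration of that pattern, not IBM's mechanism; the stand-in byte payload replaces what would in practice be the application image read from storage.

```python
# Illustrative sketch (mpi4py assumed; not the patented implementation): a
# designated "job leader" rank obtains the application image once and
# broadcasts it, so every compute node does not hit the file system itself.
from mpi4py import MPI

comm = MPI.COMM_WORLD
leader = 0                                     # the selected job leader compute node

if comm.Get_rank() == leader:
    # In practice the leader would read the executable from shared storage;
    # a stand-in byte string keeps this sketch self-contained and runnable.
    image = b"\x7fELF" + bytes(1 << 20)
else:
    image = None

image = comm.bcast(image, root=leader)         # one read, N in-memory copies
print(f"rank {comm.Get_rank()} holds {len(image)} bytes of the application")
```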

  16. Designing scientific applications on GPUs

    CERN Document Server

    Couturier, Raphael

    2013-01-01

    Many of today's complex scientific applications now require a vast amount of computational power. General purpose graphics processing units (GPGPUs) enable researchers in a variety of fields to benefit from the computational power of all the cores available inside graphics cards. Designing Scientific Applications on GPUs shows you how to use GPUs for applications in diverse scientific fields, from physics and mathematics to computer science. The book explains the methods necessary for designing or porting your scientific appl

  17. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  18. Performance of the Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance I/O to applications that access data in patterns that have been observed to be common.

  19. Scientific Applications Performance Evaluation on Burst Buffer

    KAUST Repository

    Markomanolis, George S.

    2017-10-19

    Parallel I/O is an integral component of modern high performance computing, especially for storing and processing very large datasets, as in the case of seismic imaging, CFD, combustion and weather modeling. The storage hierarchy nowadays includes additional layers, the latest being the use of SSD-based storage as a Burst Buffer for I/O acceleration. We present an in-depth analysis of how to use the Burst Buffer for specific cases and how the internal MPI I/O aggregators operate according to the options that the user provides during job submission. We analyze the performance of a range of I/O-intensive scientific applications, at various scales, on a large installation of the Lustre parallel file system compared to an SSD-based Burst Buffer. Our results show a performance improvement over Lustre when using the Burst Buffer. Moreover, we show results from a data hierarchy library which indicate that the standard I/O approaches are not enough to get the expected performance from this technology. The performance gain in the total execution time of the studied applications is between 1.16 and 3 times compared to Lustre. One of the test cases achieved an impressive I/O throughput of 900 GB/s on the Burst Buffer.

  20. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  1. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  2. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high-level strategies. Adaptive applications have computational workloads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive applications is therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  3. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  4. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    International Nuclear Information System (INIS)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-01

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs

  5. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    Energy Technology Data Exchange (ETDEWEB)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-04

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
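
    The temporal-locality idea behind the SMARTS work, evaluating a sequence of data-parallel statements block by block rather than sweeping whole arrays once per statement, can be illustrated with a blocked NumPy loop. This is an interpretation under simple assumptions (elementwise statements, a fixed block size), not the SMARTS scheduler or POOMA code.

```python
# Blocked ("vertical") execution sketch, assumptions only: push cache-sized
# chunks through all statements before moving on, instead of one full-array
# pass per statement. This shows the locality idea, not SMARTS itself.
import numpy as np

n, block = 1 << 22, 1 << 14
a = np.random.rand(n)

# Horizontal: two full passes over memory; 'a' is long gone from cache by pass 2.
b = np.sin(a)
c = b * b + 1.0

# Vertical: each block stays hot in cache through both statements.
c2 = np.empty(n)
for start in range(0, n, block):
    blk = slice(start, start + block)
    tmp = np.sin(a[blk])
    c2[blk] = tmp * tmp + 1.0

assert np.allclose(c, c2)
```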

  6. Parallel processing is good for your scientific codes...But massively parallel processing is so much better

    International Nuclear Information System (INIS)

    Thomas, B.; Domain, Ch.; Souffez, Y.; Eon-Duval, P.

    1998-01-01

    Harnessing the power of many computers to solve difficult scientific problems concurrently is one of the most innovative trends in High Performance Computing. At EDF, we have invested in parallel computing and have achieved significant results. First we improved the processing speed of strategic codes, in order to extend their scope. Then we turned to numerical simulations at the atomic scale. These computations, which we never dreamt of before, provided us with a better understanding of metallurgical phenomena. More precisely, we were able to trace defects in alloys that are used in nuclear power plants. (author)

  7. Applications of Parallel Processing in Mobile Banking

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available The future of mobile banking will be represented by applications that support mobile banking, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Some limitations aside, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.

  8. Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

    Directory of Open Access Journals (Sweden)

    Wanrong Huang

    2017-01-01

    Full Text Available Internet applications, such as network searching, electronic commerce, and modern medical applications, produce and process massive data. Considerable data parallelism exists in the computation processes of data-intensive applications. A traversal algorithm, breadth-first search (BFS), is fundamental in many graph processing applications and metrics as graphs grow in scale. A variety of scientific programming methods have been proposed for accelerating and parallelizing BFS because of the poor temporal and spatial locality caused by inherently irregular memory access patterns. However, new parallel hardware could provide better improvement for scientific methods. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. The cores are multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary search algorithm for address mapping to unify all processor addresses. Within the limits permitted by the Graph500 test bench, after testing a 1D parallel hybrid BFS algorithm, our 8-core, 8-thread-per-core system achieved superior performance and efficiency compared with prior work under the same degree of parallelism. Our system is efficient not as a special acceleration unit but as a processor platform that deals with graph searching applications.
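
    The level-synchronous (frontier-based) BFS that underlies 1D-partitioned parallel implementations can be sketched serially in a few lines. The graph and layout below are placeholders for illustration, not the Graph500 kernel or the FPGA system described in the abstract.

```python
# Level-synchronous BFS sketch (serial, illustrative): each level's frontier is
# expanded in bulk, which is the step that 1D-partitioned parallel BFS
# distributes across processors. Not the paper's FPGA/Graph500 implementation.
from collections import defaultdict

def bfs_levels(adj, source):
    """Return the BFS level (hop distance) of every vertex reachable from source."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:                      # in a 1D layout, each rank scans
            for v in adj[u]:                    # only the adjacency rows it owns
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier                # an all-to-all exchange in parallel BFS
    return level

adj = defaultdict(list, {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []})
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3}
```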

  9. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    Full Text Available After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  10. File-System Workload on a Scientific Multiprocessor

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  11. AMRZone: A Runtime AMR Data Sharing Framework For Scientific Applications

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Wenzhao; Tang, Houjun; Harenberg, Steven; Byna, Suren; Zou, Xiaocheng; Devendran, Dharshi; Martin, Daniel; Wu, Kesheng; Dong, Bin; Klasky, Scott; Samatova, Nagiza

    2017-08-31

Frameworks that facilitate runtime data sharing across multiple applications are of great importance for scientific data analytics. Although existing frameworks work well over uniform mesh data, they cannot effectively handle adaptive mesh refinement (AMR) data. The challenges in constructing an AMR-capable framework include: (1) designing an architecture that facilitates online AMR data management; (2) achieving a load-balanced AMR data distribution for the data staging space at runtime; and (3) building an effective online index to support the unique spatial data retrieval requirements of AMR data. To address these challenges and support runtime AMR data sharing across scientific applications, we present the AMRZone framework. Experiments over real-world AMR datasets demonstrate AMRZone's effectiveness at achieving a balanced workload distribution, reading/writing large-scale datasets with thousands of parallel processes, and satisfying queries with spatial constraints. Moreover, AMRZone's performance and scalability are even comparable with existing state-of-the-art work when tested over uniform mesh data with up to 16384 cores; in the best case, our framework achieves a 46% performance improvement.

  12. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

Full Text Available The Force parallel programming language, designed for large-scale shared-memory multiprocessors, is presented. The language provides a number of parallel constructs as extensions to ordinary Fortran and is implemented as a two-level macro preprocessor to support portability across shared-memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that port to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples illustrating parallel programming approaches using the Force are also presented.

  13. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running on multi-GPU systems. Furthermore, multi-GPU systems help to overcome the GPU memory limitation for applications with a large memory footprint. Parallelizing single-GPU applications has been approached with libraries that distribute the workload at runtime; however, these impose execution overhead and are not portable. On traditional CPU systems, by contrast, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at the application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and

  14. An object-oriented programming paradigm for parallelization of computational fluid dynamics

    International Nuclear Information System (INIS)

    Ohta, Takashi.

    1997-03-01

We propose an object-oriented programming paradigm for the parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that an object-oriented design, which separates the parallel processing parts from the solver of the application, can achieve a large improvement in the maintainability of the codes as well as high portability. We design a program for the two-dimensional Euler equations according to this paradigm, and evaluate its parallel performance on an IBM SP2. (author)
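
The record only summarizes the design idea; the following C++ sketch illustrates one common way to realize it, with the solver talking to an abstract communication interface so that the parallel machinery can be swapped without touching the numerical code. The class and method names are invented for the example and are not taken from the paper.

```cpp
// Sketch of separating the parallel-processing parts from the solver:
// the solver depends only on an abstract interface, and a serial or MPI
// implementation can be substituted without changing the numerical code.
#include <iostream>
#include <vector>

class Communicator {                       // parallel-processing part, hidden behind an interface
public:
    virtual ~Communicator() = default;
    virtual void exchangeHalo(std::vector<double>& field) = 0;
    virtual double globalSum(double local) = 0;
};

class SerialCommunicator : public Communicator {   // trivial single-process implementation;
public:                                            // an MPI-based one would override the same methods
    void exchangeHalo(std::vector<double>&) override {}
    double globalSum(double local) override { return local; }
};

class EulerSolver {                        // numerical part knows nothing about MPI or threads
public:
    explicit EulerSolver(Communicator& c) : comm_(c) {}
    double step(std::vector<double>& u) {
        comm_.exchangeHalo(u);             // boundary data exchange delegated to the interface
        double residual = 0.0;
        for (double v : u) residual += v * v;
        return comm_.globalSum(residual);  // global reduction delegated as well
    }
private:
    Communicator& comm_;
};

int main() {
    SerialCommunicator comm;
    EulerSolver solver(comm);
    std::vector<double> u(8, 1.0);
    std::cout << "residual = " << solver.step(u) << '\n';
}
```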

  15. Parallel computation for solving the tridiagonal linear system of equations

    International Nuclear Information System (INIS)

    Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

    1981-09-01

Recently, applications of parallel computation to scientific calculations have increased, driven by the need for high-speed execution of large-scale programs. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We performed numerical experiments on the APU with methods for solving tridiagonal linear systems, an important problem in scientific calculations. Drawing on recent papers on parallel methods, we investigated eight of them: the Gauss elimination method, the parallel Gauss method, the accelerated parallel Gauss method, the Jacobi method, the recursive doubling method, the cyclic reduction method, the Chebyshev iteration method, and the conjugate gradient method. The computing time and accuracy of the methods were compared on the basis of the numerical experiments. As a result, the cyclic reduction method was found to be the best in both computing time and accuracy, with the Gauss elimination method second. (author)
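
For readers unfamiliar with the baseline method in this comparison, the serial Gauss elimination for tridiagonal systems (the Thomas algorithm) is sketched below; its forward and backward recurrences are exactly the sequential dependence that parallel schemes such as cyclic reduction reorganize. This is an illustrative sketch, not code from the JAERI study.

```cpp
// Serial Gauss elimination (Thomas algorithm) for a tridiagonal system.
#include <iostream>
#include <vector>

// Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]; a[0] and c[n-1] are unused.
std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                           std::vector<double> c, std::vector<double> d) {
    const std::size_t n = b.size();
    for (std::size_t i = 1; i < n; ++i) {          // forward elimination
        double m = a[i] / b[i - 1];
        b[i] -= m * c[i - 1];
        d[i] -= m * d[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (std::size_t i = n - 1; i-- > 0; )         // back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}

int main() {
    // 1D Poisson-like system: -x[i-1] + 2 x[i] - x[i+1] = 1
    std::size_t n = 8;
    std::vector<double> a(n, -1.0), b(n, 2.0), c(n, -1.0), d(n, 1.0);
    for (double xi : thomas(a, b, c, d)) std::cout << xi << ' ';
    std::cout << '\n';
}
```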

  16. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  17. Performance Analysis of Parallel Mathematical Subroutine library PARCEL

    International Nuclear Information System (INIS)

    Yamada, Susumu; Shimizu, Futoshi; Kobayashi, Kenichi; Kaburaki, Hideo; Kishida, Norio

    2000-01-01

The parallel mathematical subroutine library PARCEL (Parallel Computing Elements) has been developed by the Japan Atomic Energy Research Institute to make typical parallelized mathematical codes easy to use in application problems on distributed-memory parallel computers. PARCEL includes routines for linear equations, eigenvalue problems, pseudo-random number generation, and fast Fourier transforms. The performance results for the linear equation routines exhibit good parallelization efficiency on vector, as well as scalar, parallel computers. A comparison of the efficiency results with the PETSc (Portable, Extensible Toolkit for Scientific Computation) library is reported. (author)

  18. Conceptual Teaching Based on Scientific Storyline Method and Conceptual Change Texts: Latitude-Parallel Concepts

    Science.gov (United States)

    Uzunöz, Abdulkadir

    2018-01-01

    The purpose of this study is to identify the conceptual mistakes frequently encountered in teaching geography such as latitude-parallel concepts, and to prepare conceptual change text based on the Scientific Storyline Method, in order to resolve the identified misconceptions. In this study, the special case method, which is one of the qualitative…

  19. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced in the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  20. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced in the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  1. Scientific data analysis on data-parallel platforms.

    Energy Technology Data Exchange (ETDEWEB)

    Ulmer, Craig D.; Bayer, Gregory W.; Choe, Yung Ryn; Roe, Diana C.

    2010-09-01

    As scientific computing users migrate to petaflop platforms that promise to generate multi-terabyte datasets, there is a growing need in the community to be able to embed sophisticated analysis algorithms in the computing platforms' storage systems. Data Warehouse Appliances (DWAs) are attractive for this work, due to their ability to store and process massive datasets efficiently. While DWAs have been utilized effectively in data-mining and informatics applications, they remain largely unproven in scientific workloads. In this paper we present our experiences in adapting two mesh analysis algorithms to function on five different DWA architectures: two Netezza database appliances, an XtremeData dbX database, a LexisNexis DAS, and multiple Hadoop MapReduce clusters. The main contribution of this work is insight into the differences between these DWAs from a user's perspective. In addition, we present performance measurements for ten DWA systems to help understand the impact of different architectural trade-offs in these systems.

  2. Parallelization methods study of thermal-hydraulics codes

    International Nuclear Information System (INIS)

    Gaudart, Catherine

    2000-01-01

The variety of parallelization methods and machines offers programmers a wide range of choices. In this study we suggest, in an industrial context, some solutions drawn from the experience acquired with different parallelization methods. The study concerns several scientific codes which simulate a large variety of thermal-hydraulics phenomena. A bibliography on parallelization methods and a first analysis of the codes showed the difficulty of applying our process to the whole set of applications under study. It was therefore necessary to identify and extract a representative part of these applications and parallelization methods. The linear solver part of the codes was the natural choice; several parallelization methods had already been applied to this particular part. From these developments one could estimate the work needed for a novice programmer to parallelize his application, and the impact of the development constraints. The parallelization methods tested are the numerical library PETSc, the parallelizer PAF, the language HPF, the formalism PEI, and the communication libraries MPI and PVM. In order to test several methods on different applications while minimizing the modifications to the codes, a tool called SPS (Server of Parallel Solvers) has been developed. We describe the different constraints on code optimization in an industrial context, present the solutions provided by the SPS tool, show the development of the linear solver part with the tested parallelization methods, and finally compare the results against the imposed criteria. (author) [fr

  3. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as the Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high-performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate the newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested, based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high-order accuracy in the numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  4. Running parallel applications with topology-aware grid middleware

    NARCIS (Netherlands)

    Bar, P.; Coti, C.; Groen, D.; Herault, T.; Kravtsov, V.; Schuster, A; Swain, M.

    2009-01-01

    The concept of topology-aware grid applications is derived from parallelized computational models of complex systems that are executed on heterogeneous resources, either because they require specialized hardware for certain calculations, or because their parallelization is flexible enough to exploit

  5. Massively Parallel Computing at Sandia and Its Application to National Defense

    National Research Council Canada - National Science Library

    Dosanjh, Sudip

    1991-01-01

    Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor...

  6. Benefits of Parallel I/O in Ab Initio Nuclear Physics Calculations

    International Nuclear Information System (INIS)

    Laghave, Nikhil; Sosonkina, Masha; Maris, Pieter; Vary, James P.

    2009-01-01

Many modern scientific applications rely on highly parallel calculations that scale to tens of thousands of processors. However, most applications do not concentrate on parallelizing input/output operations. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code performing ab initio nuclear structure calculations. In this paper, we develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient input/output of large datasets, along with portability and ease of use in the downstream processing.
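
The record does not name the library here, so the following sketch uses MPI-IO only to illustrate the general idea the abstract describes: each process writes its own block of a single shared file through a collective call instead of funneling all data through one rank. The file name and block size are assumptions for the example; this is not the MFDn implementation.

```cpp
// Minimal parallel-I/O sketch with MPI-IO: every rank writes its block of a
// shared file collectively, avoiding a sequential-I/O bottleneck.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int block = 1024;                          // doubles written by each rank
    std::vector<double> data(block, static_cast<double>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "vector.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Each rank writes at its own offset; the collective variant lets the
    // MPI library aggregate requests for better bandwidth.
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * block * sizeof(double);
    MPI_File_write_at_all(fh, offset, data.data(), block, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```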

  7. III - Template Metaprogramming for massively parallel scientific computing - Templates for Iteration; Thread-level Parallelism

    CERN Multimedia

    CERN. Geneva

    2016-01-01

Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, an object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...

  8. Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a Large-Scale Multithreaded BlueGene/Q Supercomputer

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2013-01-01

In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel Benchmarks Multi-Zone codes, SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne National Laboratory, and quantify the performance gap resulting from using different numbers of threads per node. We use performance tools and MPI profile and trace libraries available on the supercomputer to analyze and compare the performance of these hybrid scientific applications with increasing numbers of OpenMP threads per node, and find that increasing the number of threads beyond a point saturates or worsens the performance of these hybrid applications. For the strong-scaling hybrid scientific applications SP-MZ, BT-MZ, PEQdyna and PMLB, using 32 threads per node results in much better application efficiency than using 64 threads per node, and as the number of threads per node increases, the FPU (Floating Point Unit) percentage decreases, while the MPI percentage (except for PMLB) and IPC (instructions per cycle) per core (except for BT-MZ) increase. For the weak-scaling hybrid scientific application GTC, the performance trend (relative speedup) is very similar with increasing numbers of threads per node no matter how many nodes (32, 128, 512) are used. © 2013 IEEE.

  9. Performance Characteristics of Hybrid MPI/OpenMP Scientific Applications on a Large-Scale Multithreaded BlueGene/Q Supercomputer

    KAUST Repository

    Wu, Xingfu

    2013-07-01

In this paper, we investigate the performance characteristics of five hybrid MPI/OpenMP scientific applications (two NAS Parallel Benchmarks Multi-Zone codes, SP-MZ and BT-MZ, an earthquake simulation PEQdyna, an aerospace application PMLB and a 3D particle-in-cell application GTC) on a large-scale multithreaded Blue Gene/Q supercomputer at Argonne National Laboratory, and quantify the performance gap resulting from using different numbers of threads per node. We use performance tools and MPI profile and trace libraries available on the supercomputer to analyze and compare the performance of these hybrid scientific applications with increasing numbers of OpenMP threads per node, and find that increasing the number of threads beyond a point saturates or worsens the performance of these hybrid applications. For the strong-scaling hybrid scientific applications SP-MZ, BT-MZ, PEQdyna and PMLB, using 32 threads per node results in much better application efficiency than using 64 threads per node, and as the number of threads per node increases, the FPU (Floating Point Unit) percentage decreases, while the MPI percentage (except for PMLB) and IPC (instructions per cycle) per core (except for BT-MZ) increase. For the weak-scaling hybrid scientific application GTC, the performance trend (relative speedup) is very similar with increasing numbers of threads per node no matter how many nodes (32, 128, 512) are used. © 2013 IEEE.
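
The hybrid MPI/OpenMP structure studied in the two records above combines process-level and thread-level parallelism. The following minimal C++ sketch shows the pattern in its simplest form: MPI ranks own disjoint data, each rank processes its share with an OpenMP loop, and a reduction combines the results. Sizes and the computation are invented for the example; it is not taken from any of the benchmarked codes.

```cpp
// Minimal hybrid MPI/OpenMP sketch: thread-level parallelism inside each rank,
// process-level parallelism across ranks. The threads-per-node count is the
// knob whose effect the paper measures.
#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    int rank = 0, nprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long n_local = 1 << 20;                  // elements owned by this rank
    std::vector<double> a(n_local, 1.0);

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)   // thread-level parallelism within the rank
    for (long i = 0; i < n_local; ++i)
        local_sum += a[i] * a[i];

    double global_sum = 0.0;                        // process-level reduction across ranks
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("ranks=%d threads/rank=%d sum=%f\n", nprocs, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}
```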

  10. A development framework for parallel CFD applications: TRIOU project

    International Nuclear Information System (INIS)

    Calvin, Ch.

    2003-01-01

We present in this paper the parallel structure of a thermal-hydraulics framework, Trio-U. This development platform has been designed to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, expressed in UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes built over the standard C++ I/O streams have been defined, which allows the developer to use the different modules of the application easily, without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines guarantees the portability of the application and allows efficient tuning of the basic communication methods in order to achieve the best performance on the target architecture. The speed-ups of parallel applications designed using the Trio-U framework are very good: we obtained, for instance, an efficiency of up to 90% on 20 processors for complex turbulent-flow Large Eddy Simulation (LES) computations. The efficiencies obtained on direct numerical simulations of two-phase flows are similar, with a speed-up of nearly 7.5 for a 3-dimensional simulation using a one-million-element mesh on 8 processors. The purpose of this paper is to focus on the main concepts and their implementation that guided the design of the parallel architecture of the code. (author)

  11. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

Full Text Available The aim of this study is to present an approach to the introduction of pipeline and parallel computing, using a model of a multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. Modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is an essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to its comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform for a constructivist learning process, thus enabling learners' experimentation with the provided programming models, building learners' competences in modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing the programming models, and the message passing interface (MPI) and OpenMP parallelization tools, have been chosen for the implementation.
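
To make the pipeline idea concrete, here is a toy two-stage software pipeline in C++: a producer stage generates items and a consumer stage processes them concurrently, coupled by a small thread-safe queue, which is the "multiphase queueing system" notion in miniature. The article itself uses C with MPI and OpenMP; this sketch only illustrates the concept, and all names are invented for the example.

```cpp
// Toy two-stage software pipeline built from threads and a FIFO channel.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

class Queue {                                  // minimal FIFO channel between pipeline stages
public:
    void push(int v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(v); }
        cv_.notify_one();
    }
    void close() {                             // upstream stage signals that it is done
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    std::optional<int> pop() {                 // empty optional means no more items will arrive
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        int v = q_.front(); q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
    bool closed_ = false;
};

int main() {
    Queue q;
    std::thread stage1([&] {                   // phase 1: produce work items
        for (int i = 0; i < 10; ++i) q.push(i);
        q.close();
    });
    std::thread stage2([&] {                   // phase 2: consume and transform them
        while (auto v = q.pop()) std::cout << (*v) * (*v) << ' ';
        std::cout << '\n';
    });
    stage1.join();
    stage2.join();
}
```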

  12. Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters

    Energy Technology Data Exchange (ETDEWEB)

    Yoo, Wucherl [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Koo, Michelle [Univ. of California, Berkeley, CA (United States); Cao, Yu [California Inst. of Technology (CalTech), Pasadena, CA (United States); Sim, Alex [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Nugent, Peter [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Berkeley, CA (United States); Wu, Kesheng [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-09-17

Big data is prevalent in HPC computing. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often require running over thousands of CPU cores and performing simultaneous data accesses, data movements, and computation. It is challenging to analyze the performance of such executions, involving terabytes or petabytes of workflow data or measurement data collected from complex workflows running over a large number of nodes with multiple parallel task executions. To help identify performance bottlenecks or debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply sophisticated statistical tools and data mining methods to the performance data. It utilizes an efficient data processing engine to allow users to interactively analyze large amounts of different types of logs and measurements. To illustrate the functionality of the big data analysis framework, we conduct case studies on workflows from an astronomy project known as the Palomar Transient Factory (PTF) and on job logs from a genome analysis scientific cluster. Our study processed many terabytes of system logs and application performance measurements collected on the HPC systems at NERSC. The implementation of our tool is generic enough to be used for analyzing the performance of other HPC systems and big data workflows.

  13. Parallel Object-Oriented Computation Applied to a Finite Element Problem

    Directory of Open Access Journals (Sweden)

    Jon B. Weissman

    1993-01-01

Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher-level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application and Mentat, discuss the implementation, and present performance results for both the Mentat and a hand-coded parallel Fortran version.

  14. Parallel computing in genomic research: advances and applications

    Directory of Open Access Journals (Sweden)

    Ocaña K

    2015-11-01

Full Text Available Kary Ocaña,1 Daniel de Oliveira2 1National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro, 2Institute of Computing, Fluminense Federal University, Niterói, Brazil Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing

  15. Numerical Platon: A unified linear equation solver interface by CEA for solving open foe scientific applications

    International Nuclear Information System (INIS)

    Secher, Bernard; Belliard, Michel; Calvin, Christophe

    2009-01-01

This paper describes a tool called 'Numerical Platon' developed by the French Atomic Energy Commission (CEA). It provides a freely available (GNU LGPL license) interface for coupling scientific computing applications to various freeware linear solver libraries (essentially PETSc, SuperLU and HyPre), together with some proprietary CEA solvers, for high-performance computers that may be used in industrial software written in various programming languages. This tool was developed as part of considerable efforts by the CEA Nuclear Energy Division in past years to promote massively parallel software and off-the-shelf parallel tools to help develop new-generation simulation codes. After the presentation of the package architecture and the available algorithms, we show examples of how Numerical Platon is used in sequential and parallel CEA codes. Compared with in-house solvers, the gain in terms of increased computation capacity or parallel performance is notable, without considerable extra development cost

  16. Numerical Platon: A unified linear equation solver interface by CEA for solving open foe scientific applications

    Energy Technology Data Exchange (ETDEWEB)

    Secher, Bernard [French Atomic Energy Commission (CEA), Nuclear Energy Division (DEN) (France); CEA Saclay DM2S/SFME/LGLS, Bat. 454, F-91191 Gif-sur-Yvette Cedex (France)], E-mail: bsecher@cea.fr; Belliard, Michel [French Atomic Energy Commission (CEA), Nuclear Energy Division (DEN) (France); CEA Cadarache DER/SSTH/LMDL, Bat. 238, F-13108 Saint-Paul-lez-Durance Cedex (France); Calvin, Christophe [French Atomic Energy Commission (CEA), Nuclear Energy Division (DEN) (France); CEA Saclay DM2S/SERMA/LLPR, Bat. 470, F-91191 Gif-sur-Yvette Cedex (France)

    2009-01-15

This paper describes a tool called 'Numerical Platon' developed by the French Atomic Energy Commission (CEA). It provides a freely available (GNU LGPL license) interface for coupling scientific computing applications to various freeware linear solver libraries (essentially PETSc, SuperLU and HyPre), together with some proprietary CEA solvers, for high-performance computers that may be used in industrial software written in various programming languages. This tool was developed as part of considerable efforts by the CEA Nuclear Energy Division in past years to promote massively parallel software and off-the-shelf parallel tools to help develop new-generation simulation codes. After the presentation of the package architecture and the available algorithms, we show examples of how Numerical Platon is used in sequential and parallel CEA codes. Compared with in-house solvers, the gain in terms of increased computation capacity or parallel performance is notable, without considerable extra development cost.

  17. Efficient Use of Distributed Systems for Scientific Applications

    Science.gov (United States)

    Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques

    2000-01-01

    Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency by up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results are given in Figure 1 for four irregular meshes with number of elements ranging from 30,269 elements for the Barth5 mesh to 11,451 elements for the Barth4 mesh. Future work with PART entails using the tool with an integrated application requiring
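
PART's actual objective function and its parallel simulated annealing search are more elaborate than can be shown here, but the heterogeneity-aware idea it rests on can be illustrated with a small hedged sketch: the predicted finish time of each partition is its computational load divided by the speed of the processor it is mapped to, and the partitioner tries to minimize the slowest one. The function and data below are invented for the example.

```cpp
// Illustration of a heterogeneity-aware partitioning objective:
// minimize the predicted finish time of the slowest partition.
#include <algorithm>
#include <iostream>
#include <vector>

// work[p]  : computational load assigned to partition p
// speed[p] : relative speed of the processor that owns partition p
double predicted_makespan(const std::vector<double>& work, const std::vector<double>& speed) {
    double worst = 0.0;
    for (std::size_t p = 0; p < work.size(); ++p)
        worst = std::max(worst, work[p] / speed[p]);   // slower processors need smaller shares
    return worst;
}

int main() {
    std::vector<double> speed{1.0, 1.0, 2.0, 2.0};      // heterogeneous processor speeds
    std::vector<double> equal{250, 250, 250, 250};      // naive equal-size partitions
    std::vector<double> weighted{167, 167, 333, 333};   // shares proportional to speed
    std::cout << "equal split:    " << predicted_makespan(equal, speed) << '\n'
              << "weighted split: " << predicted_makespan(weighted, speed) << '\n';
}
```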

  18. Design, analysis and control of cable-suspended parallel robots and its applications

    CERN Document Server

    Zi, Bin

    2017-01-01

    This book provides an essential overview of the authors’ work in the field of cable-suspended parallel robots, focusing on innovative design, mechanics, control, development and applications. It presents and analyzes several typical mechanical architectures of cable-suspended parallel robots in practical applications, including the feed cable-suspended structure for super antennae, hybrid-driven-based cable-suspended parallel robots, and cooperative cable parallel manipulators for multiple mobile cranes. It also addresses the fundamental mechanics of cable-suspended parallel robots on the basis of their typical applications, including the kinematics, dynamics and trajectory tracking control of the feed cable-suspended structure for super antennae. In addition it proposes a novel hybrid-driven-based cable-suspended parallel robot that uses integrated mechanism design methods to improve the performance of traditional cable-suspended parallel robots. A comparative study on error and performance indices of hybr...

  19. Topic 14+16: High-performance and scientific applications and extreme-scale computing (Introduction)

    KAUST Repository

    Downes, Turlough P.

    2013-01-01

    As our understanding of the world around us increases it becomes more challenging to make use of what we already know, and to increase our understanding still further. Computational modeling and simulation have become critical tools in addressing this challenge. The requirements of high-resolution, accurate modeling have outstripped the ability of desktop computers and even small clusters to provide the necessary compute power. Many applications in the scientific and engineering domains now need very large amounts of compute time, while other applications, particularly in the life sciences, frequently have large data I/O requirements. There is thus a growing need for a range of high performance applications which can utilize parallel compute systems effectively, which have efficient data handling strategies and which have the capacity to utilise current and future systems. The High Performance and Scientific Applications topic aims to highlight recent progress in the use of advanced computing and algorithms to address the varied, complex and increasing challenges of modern research throughout both the "hard" and "soft" sciences. This necessitates being able to use large numbers of compute nodes, many of which are equipped with accelerators, and to deal with difficult I/O requirements. © 2013 Springer-Verlag.

  20. Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications

    OpenAIRE

    Roloff, Eduardo; Diener, Matthias; Diaz Carreño, Emmanuell; Gaspary, Luciano Paschoal; Navaux, Philippe O.A.

    2017-01-01

    Public cloud providers offer a wide range of instance types, with different processing and interconnection speeds, as well as varying prices. Furthermore, the tasks of many parallel applications show different computational demands due to load imbalance. These differences can be exploited for improving the cost efficiency of parallel applications in many cloud environments by matching application requirements to instance types. In this paper, we introduce the concept of heterogeneous cloud sy...

  1. Optimizing transformations of stencil operations for parallel object-oriented scientific frameworks on cache-based architectures

    Energy Technology Data Exchange (ETDEWEB)

    Bassetti, F.; Davis, K.; Quinlan, D.

    1998-12-31

    High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes two optimizing transformations suitable for certain classes of numerical algorithms, one for reducing the cost of inter-processor communication, and one for improving cache utilization; demonstrates and analyzes the resulting performance gains; and indicates how these transformations are being automated.

  2. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

and the Microsoft .NET framework. Normally, one would not directly think of the .NET framework when talking about scientific applications, but Microsoft has, in the last couple of versions of .NET, introduced a number of tools for writing parallel and high-performance code. The first section examines how programmers can...

  3. Temporal locality optimizations for stencil operations for parallel object-oriented scientific frameworks on cache-based architectures

    Energy Technology Data Exchange (ETDEWEB)

    Bassetti, F.; Davis, K.; Quinlan, D.

    1998-12-01

    High-performance scientific computing relies increasingly on high-level large-scale object-oriented software frameworks to manage both algorithmic complexity and the complexities of parallelism: distributed data management, process management, inter-process communication, and load balancing. This encapsulation of data management, together with the prescribed semantics of a typical fundamental component of such object-oriented frameworks--a parallel or serial array-class library--provides an opportunity for increasingly sophisticated compile-time optimization techniques. This paper describes a technique for introducing cache blocking suitable for certain classes of numerical algorithms, demonstrates and analyzes the resulting performance gains, and indicates how this optimization transformation is being automated.
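
The two stencil-optimization records above both target locality in array-class code. As a hedged, hand-written illustration of the cache-blocking transformation they describe (which the papers apply automatically to framework code), here is a tiled 2D five-point stencil sweep in C++; the tile size and array layout are assumptions for the example.

```cpp
// Cache-blocked 2D five-point stencil sweep: each tile of the grid is updated
// completely while it is resident in cache before moving on.
#include <algorithm>
#include <iostream>
#include <vector>

void stencil_blocked(const std::vector<double>& in, std::vector<double>& out,
                     int n, int tile) {
    for (int ii = 1; ii < n - 1; ii += tile)
        for (int jj = 1; jj < n - 1; jj += tile)
            // update one cache-sized tile at a time
            for (int i = ii; i < std::min(ii + tile, n - 1); ++i)
                for (int j = jj; j < std::min(jj + tile, n - 1); ++j)
                    out[i * n + j] = 0.25 * (in[(i - 1) * n + j] + in[(i + 1) * n + j] +
                                             in[i * n + j - 1] + in[i * n + j + 1]);
}

int main() {
    const int n = 512;
    std::vector<double> a(n * n, 1.0), b(n * n, 0.0);
    stencil_blocked(a, b, n, 64);        // 64x64 tiles as an example block size
    std::cout << b[n + 1] << '\n';       // touch the result so the sweep is observable
}
```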

  4. Rapid development of scalable scientific software using a process oriented approach

    DEFF Research Database (Denmark)

    Friborg, Rune Møllegaard; Vinter, Brian

    2011-01-01

    Scientific applications are often not written with multiprocessing, cluster computing or grid computing in mind. This paper suggests using Python and PyCSP to structure scientific software through Communicating Sequential Processes. Three scientific applications are used to demonstrate the features...... of PyCSP and how networks of processes may easily be mapped into a visual representation for better understanding of the process workflow. We show that for many sequential solutions, the difficulty in implementing a parallel application is removed. The use of standard multi-threading mechanisms...

  5. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.

    2015-07-21

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.
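
The behaviour described in this record can be pictured with a small MPI sketch: every rank reports an exit status, a designated "job leader" rank reduces them (here by keeping the worst one) and reports a single aggregated status. This is only illustrative of the described idea, not the patented mechanism itself; the failing rank and status values are invented for the example.

```cpp
// Aggregating per-rank exit statuses at a leader rank with an MPI reduction.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Pretend each rank finished its share of the parallel application with
    // some status (0 = success); make one rank fail for demonstration.
    int my_status = (rank == 1) ? 3 : 0;

    const int leader = 0;                       // the "job leader" compute node
    int aggregated = 0;
    MPI_Reduce(&my_status, &aggregated, 1, MPI_INT, MPI_MAX, leader, MPI_COMM_WORLD);

    if (rank == leader)
        std::printf("aggregated exit status: %d\n", aggregated);

    MPI_Finalize();
    return 0;
}
```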

  6. Parallel, distributed and GPU computing technologies in single-particle electron microscopy.

    Science.gov (United States)

    Schmeisser, Martin; Heisen, Burkhard C; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-07-01

    Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today's technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined.

  7. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited, even though there are numerous computationally demanding programs that would significantly benefit from the application of parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  8. A Secure Web Application Providing Public Access to High-Performance Data Intensive Scientific Resources - ScalaBLAST Web Application

    International Nuclear Information System (INIS)

    Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.

    2008-01-01

This work presents the ScalaBLAST Web Application (SWA), a web-based application implemented using the PHP scripting language, the MySQL DBMS, and the Apache web server on a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computing for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology and multiple whole-genome comparisons, which, in the absence of a tool like SWA, require a heroic effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web-based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster

  9. I - Template Metaprogramming for Massively Parallel Scientific Computing - Expression Templates

    CERN Multimedia

    CERN. Geneva

    2016-01-01

Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, an object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...
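
For readers new to the idiom these lectures cover, here is a bare-bones expression-template sketch in C++ (a teaching-sized illustration, not the lecture material): "a + b + c" builds a lightweight expression object, and the single loop in the assignment operator evaluates it element by element, avoiding the temporaries a naive operator+ would create.

```cpp
// Minimal expression-template example: lazy element-wise vector addition.
#include <cstddef>
#include <iostream>
#include <vector>

template <typename E>
struct Expr {                                   // CRTP base: anything usable in an expression
    const E& self() const { return static_cast<const E&>(*this); }
};

template <typename L, typename R>
struct Add : Expr<Add<L, R>> {                  // node of the expression tree
    const L& lhs;
    const R& rhs;
    Add(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    template <typename E>
    Vec& operator=(const Expr<E>& expr) {       // single evaluation loop, no temporaries
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr.self()[i];
        return *this;
    }
};

template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& lhs, const Expr<R>& rhs) {
    return Add<L, R>(lhs.self(), rhs.self());
}

int main() {
    Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), r(4);
    r = a + b + c;                              // builds Add<Add<Vec,Vec>,Vec>, then one loop
    std::cout << r[0] << '\n';                  // prints 6
}
```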

  10. Scientific applications and numerical algorithms on the midas multiprocessor system

    International Nuclear Information System (INIS)

    Logan, D.; Maples, C.

    1986-01-01

The MIDAS multiprocessor system is a multi-level, hierarchical structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms into a multiprocessor environment. The results of such tests on a variety of applications, such as scientific data analysis, Monte Carlo calculations, and image processing, are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods for solving partial differential equations are also discussed. The performance and projected extensibility of these decompositions on the MIDAS system are reported

  11. Scientific Services on the Cloud

    Science.gov (United States)

    Chapman, David; Joshi, Karuna P.; Yesha, Yelena; Halem, Milt; Yesha, Yaacov; Nguyen, Phuong

Scientific computing was one of the first ever applications for parallel and distributed computation. To this day, scientific applications remain some of the most compute-intensive, and have inspired the creation of petaflop compute infrastructure such as the Oak Ridge Jaguar and Los Alamos RoadRunner. Large dedicated hardware infrastructure has become both a blessing and a curse to the scientific community. Scientists are interested in cloud computing for much the same reasons as businesses and other professionals. The hardware is provided, maintained, and administered by a third party. Software abstraction and virtualization provide reliability and fault tolerance. Graduated fees allow for multi-scale prototyping and execution. Cloud computing resources are only a few clicks away, and are by far the easiest high-performance distributed platform to gain access to. There may still be dedicated infrastructure for ultra-scale science, but the cloud can easily play a major part in the scientific computing initiative.

  12. Comparative Evaluation and Case Studies of Shared-Memory and Data-Parallel Execution Patterns

    Directory of Open Access Journals (Sweden)

    Xiaodong Zhang

    1999-01-01

Full Text Available Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions, and simple and uniform views of network structures. The common features of the two models significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns are significantly different between the two models due to their programming constraints, and due to the different and complex structures of the interconnection networks and systems which support the two models. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required for an implementation transformation between the two models; to study memory access patterns; to address scalability issues; and to investigate the relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that the EM program tends to become computation-intensive in the KSR-1 shared-memory system, and memory-demanding in the CM-5 data-parallel system when the systems and the problems are scaled. The EM program, a highly data-parallel program, performed extremely well, and the linear system solver, a highly control-structured program, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures achieves better performance.

  13. Research initiatives for plug-and-play scientific computing

    International Nuclear Information System (INIS)

    McInnes, Lois Curfman; Dahlgren, Tamara; Nieplocha, Jarek; Bernholdt, David; Allan, Ben; Armstrong, Rob; Chavarria, Daniel; Elwasif, Wael; Gorton, Ian; Kenny, Joe; Krishan, Manoj; Malony, Allen; Norris, Boyana; Ray, Jaideep; Shende, Sameer

    2007-01-01

    This paper introduces three component technology initiatives within the SciDAC Center for Technology for Advanced Scientific Component Software (TASCS) that address ever-increasing productivity challenges in creating, managing, and applying simulation software to scientific discovery. By leveraging the Common Component Architecture (CCA), a new component standard for high-performance scientific computing, these initiatives tackle difficulties at different but related levels in the development of component-based scientific software: (1) deploying applications on massively parallel and heterogeneous architectures, (2) investigating new approaches to the runtime enforcement of behavioral semantics, and (3) developing tools to facilitate dynamic composition, substitution, and reconfiguration of component implementations and parameters, so that application scientists can explore tradeoffs among factors such as accuracy, reliability, and performance

  14. Parallel, distributed and GPU computing technologies in single-particle electron microscopy

    International Nuclear Information System (INIS)

    Schmeisser, Martin; Heisen, Burkhard C.; Luettich, Mario; Busche, Boris; Hauer, Florian; Koske, Tobias; Knauber, Karl-Heinz; Stark, Holger

    2009-01-01

    An introduction to the current paradigm shift towards concurrency in software. Most known methods for the determination of the structure of macromolecular complexes are limited or at least restricted at some point by their computational demands. Recent developments in information technology such as multicore, parallel and GPU processing can be used to overcome these limitations. In particular, graphics processing units (GPUs), which were originally developed for rendering real-time effects in computer games, are now ubiquitous and provide unprecedented computational power for scientific applications. Each parallel-processing paradigm alone can improve overall performance; the increased computational performance obtained by combining all paradigms, unleashing the full power of today’s technology, makes certain applications feasible that were previously virtually impossible. In this article, state-of-the-art paradigms are introduced, the tools and infrastructure needed to apply these paradigms are presented and a state-of-the-art infrastructure and solution strategy for moving scientific applications to the next generation of computer hardware is outlined

  15. Parallel and distributed processing: applications to power systems

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    1994-12-31

    Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state of the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs., 1 tab.

  16. An object-oriented bulk synchronous parallel library for multicore programming

    NARCIS (Netherlands)

    Yzelman, A.N.; Bisseling, R.H.

    2012-01-01

    We show that the bulk synchronous parallel (BSP) model, originally designed for distributed-memory systems, is also applicable to shared-memory multicore systems and, furthermore, that BSP libraries are useful in scientific computing on these systems. A proof-of-concept MulticoreBSP library has

  17. Geospatial Applications on Different Parallel and Distributed Systems in enviroGRIDS Project

    Science.gov (United States)

    Rodila, D.; Bacu, V.; Gorgan, D.

    2012-04-01

    The execution of Earth Science applications and services on parallel and distributed systems has become a necessity especially due to the large amounts of Geospatial data these applications require and the large geographical areas they cover. The parallelization of these applications comes to solve important performance issues and can spread from task parallelism to data parallelism as well. Parallel and distributed architectures such as Grid, Cloud, Multicore, etc. seem to offer the necessary functionalities to solve important problems in the Earth Science domain: storing, distribution, management, processing and security of Geospatial data, execution of complex processing through task and data parallelism, etc. A main goal of the FP7-funded project enviroGRIDS (Black Sea Catchment Observation and Assessment System supporting Sustainable Development) [1] is the development of a Spatial Data Infrastructure targeting this catchment region, as well as the development of standardized and specialized tools for storing, analyzing, processing and visualizing the Geospatial data concerning this area. For achieving these objectives, the enviroGRIDS project deals with the execution of different Earth Science applications, such as hydrological models, Geospatial Web services standardized by the Open Geospatial Consortium (OGC) and others, on parallel and distributed architectures to maximize the obtained performance. This presentation analyses the integration and execution of Geospatial applications on different parallel and distributed architectures and the possibility of choosing among these architectures based on application characteristics and user requirements through a specialized component. Versions of the proposed platform have been used in the enviroGRIDS project on different use cases such as: the execution of Geospatial Web services both on Web and Grid infrastructures [2] and the execution of SWAT hydrological models both on Grid and Multicore architectures [3]. The current

  18. A structured representation for parallel algorithm design on multicomputers

    International Nuclear Information System (INIS)

    Sun, Xian-He; Ni, L.M.

    1991-01-01

    Traditionally, parallel algorithms have been designed by brute force methods and fine-tuned on each architecture to achieve high performance. Rather than studying the design case by case, a systematic approach is proposed. A notation is first developed. Using this notation, most of the frequently used scientific and engineering applications can be represented by simple formulas. The formulas constitute the structured representation of the corresponding applications. The structured representation is simple, adequate and easy to understand. It also contains sufficient information about uneven allocation and communication latency degradations. With the structured representation, applications can be compared, classified and partitioned. Some of the basic building blocks, called computation models, of frequently used applications are identified and studied. Most applications are combinations of some computation models. The structured representation relates general applications to computation models. Studying computation models leads to a guideline for efficient parallel algorithm design for general applications. 6 refs., 7 figs

  19. HPCToolkit: performance tools for scientific computing

    Energy Technology Data Exchange (ETDEWEB)

    Tallent, N; Mellor-Crummey, J; Adhianto, L; Fagan, M; Krentel, M [Department of Computer Science, Rice University, Houston, TX 77005 (United States)

    2008-07-15

    As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on the leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of microscopic structure of nuclei.

  20. HPCToolkit: performance tools for scientific computing

    International Nuclear Information System (INIS)

    Tallent, N; Mellor-Crummey, J; Adhianto, L; Fagan, M; Krentel, M

    2008-01-01

    As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on the leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of microscopic structure of nuclei

  1. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal, we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines
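
    The task-farm pattern that such packages expose to scientists can be sketched in a few lines of Python (a hypothetical illustration of the idea, not Parallel R's actual interface): the same statistical routine, here one bootstrap replicate of a sample mean, is applied to many independent resamples in parallel.

```python
# Minimal sketch of embarrassingly parallel statistical analysis: a bootstrap
# confidence interval computed with a process pool. Illustrative only.
import numpy as np
from multiprocessing import Pool

def bootstrap_mean(args):
    """One bootstrap replicate: resample with replacement and return the mean."""
    data, seed = args
    rng = np.random.default_rng(seed)
    sample = rng.choice(data, size=len(data), replace=True)
    return sample.mean()

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.normal(loc=5.0, scale=2.0, size=10_000)
    n_replicates = 1000
    with Pool() as pool:                       # one worker per core by default
        means = pool.map(bootstrap_mean, [(data, s) for s in range(n_replicates)])
    lo, hi = np.percentile(means, [2.5, 97.5])
    print(f"bootstrap 95% CI for the mean: [{lo:.3f}, {hi:.3f}]")
```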

  2. Compiling Scientific Programs for Scalable Parallel Systems

    National Research Council Canada - National Science Library

    Kennedy, Ken

    2001-01-01

    ...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...

  3. 5th International Conference on High Performance Scientific Computing

    CERN Document Server

    Hoang, Xuan; Rannacher, Rolf; Schlöder, Johannes

    2014-01-01

    This proceedings volume gathers a selection of papers presented at the Fifth International Conference on High Performance Scientific Computing, which took place in Hanoi on March 5-9, 2012. The conference was organized by the Institute of Mathematics of the Vietnam Academy of Science and Technology (VAST), the Interdisciplinary Center for Scientific Computing (IWR) of Heidelberg University, Ho Chi Minh City University of Technology, and the Vietnam Institute for Advanced Study in Mathematics. The contributions cover the broad interdisciplinary spectrum of scientific computing and present recent advances in theory, development of methods, and practical applications. Subjects covered include mathematical modeling; numerical simulation; methods for optimization and control; parallel computing; software development; and applications of scientific computing in physics, mechanics and biomechanics, material science, hydrology, chemistry, biology, biotechnology, medicine, sports, psychology, transport, logistics, com...

  4. 3rd International Conference on High Performance Scientific Computing

    CERN Document Server

    Kostina, Ekaterina; Phu, Hoang; Rannacher, Rolf

    2008-01-01

    This proceedings volume contains a selection of papers presented at the Third International Conference on High Performance Scientific Computing held at the Hanoi Institute of Mathematics, Vietnamese Academy of Science and Technology (VAST), March 6-10, 2006. The conference was organized by the Hanoi Institute of Mathematics, the Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, and its International PhD Program "Complex Processes: Modeling, Simulation and Optimization", and Ho Chi Minh City University of Technology. The contributions cover the broad interdisciplinary spectrum of scientific computing and present recent advances in theory, development of methods, and applications in practice. Subjects covered are mathematical modelling, numerical simulation, methods for optimization and control, parallel computing, software development, applications of scientific computing in physics, chemistry, biology and mechanics, environmental and hydrology problems, transport, logistics and site loca...

  5. 6th International Conference on High Performance Scientific Computing

    CERN Document Server

    Phu, Hoang; Rannacher, Rolf; Schlöder, Johannes

    2017-01-01

    This proceedings volume highlights a selection of papers presented at the Sixth International Conference on High Performance Scientific Computing, which took place in Hanoi, Vietnam on March 16-20, 2015. The conference was jointly organized by the Heidelberg Institute of Theoretical Studies (HITS), the Institute of Mathematics of the Vietnam Academy of Science and Technology (VAST), the Interdisciplinary Center for Scientific Computing (IWR) at Heidelberg University, and the Vietnam Institute for Advanced Study in Mathematics, Ministry of Education. The contributions cover a broad, interdisciplinary spectrum of scientific computing and showcase recent advances in theory, methods, and practical applications. Subjects covered include numerical simulation, methods for optimization and control, parallel computing, and software development, as well as the applications of scientific computing in physics, mechanics, biomechanics and robotics, material science, hydrology, biotechnology, medicine, transport, scheduling, and in...

  6. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel

    2013-10-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.
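
    The panel-broadcast structure that SUMMA (and hence HSUMMA) builds on can be emulated serially: C is accumulated as a sum of block outer products, one per panel step, which in the parallel algorithm correspond to broadcasts along process rows and columns. The sketch below is an illustrative NumPy emulation on a virtual process grid, not the paper's MPI implementation; HSUMMA's contribution is to split each of these broadcasts into an inter-group and an intra-group stage.

```python
# Serial NumPy emulation of SUMMA's panel-broadcast structure (a sketch only).
import numpy as np

def summa_like_multiply(A, B, grid=4, panel=64):
    """Emulate a grid x grid process mesh; each 'process' owns one block of C."""
    n = A.shape[0]
    assert A.shape == B.shape == (n, n) and n % grid == 0 and n % panel == 0
    nb = n // grid                       # block size owned by each virtual process
    C = np.zeros((n, n))
    for k0 in range(0, n, panel):        # one broadcast/accumulate step per panel
        k = slice(k0, k0 + panel)
        for i in range(grid):            # virtual process row
            rows = slice(i * nb, (i + 1) * nb)
            A_panel = A[rows, k]         # would be broadcast along the process row
            for j in range(grid):        # virtual process column
                cols = slice(j * nb, (j + 1) * nb)
                B_panel = B[k, cols]     # would be broadcast along the process column
                C[rows, cols] += A_panel @ B_panel
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B = rng.random((256, 256)), rng.random((256, 256))
    print("max error vs. NumPy:", np.abs(summa_like_multiply(A, B) - A @ B).max())
```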

  7. Hierarchical Parallel Matrix Multiplication on Large-Scale Distributed Memory Platforms

    KAUST Repository

    Quintin, Jean-Noel; Hasanov, Khalid; Lastovetsky, Alexey

    2013-01-01

    Matrix multiplication is a very important computation kernel both in its own right as a building block of many scientific applications and as a popular representative for other scientific applications. Cannon's algorithm which dates back to 1969 was the first efficient algorithm for parallel matrix multiplication providing theoretically optimal communication cost. However this algorithm requires a square number of processors. In the mid-1990s, the SUMMA algorithm was introduced. SUMMA overcomes the shortcomings of Cannon's algorithm as it can be used on a nonsquare number of processors as well. Since then the number of processors in HPC platforms has increased by two orders of magnitude making the contribution of communication in the overall execution time more significant. Therefore, the state of the art parallel matrix multiplication algorithms should be revisited to reduce the communication cost further. This paper introduces a new parallel matrix multiplication algorithm, Hierarchical SUMMA (HSUMMA), which is a redesign of SUMMA. Our algorithm reduces the communication cost of SUMMA by introducing a two-level virtual hierarchy into the two-dimensional arrangement of processors. Experiments on an IBM BlueGene/P demonstrate the reduction of communication cost up to 2.08 times on 2048 cores and up to 5.89 times on 16384 cores. © 2013 IEEE.

  8. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    International Nuclear Information System (INIS)

    El-Gazzar, H.M.; Ayad, N.M.A.

    2002-01-01

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematic modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods for solving the Jacobian algorithm. Parallelism is introduced into the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (the Stanford Arm). Execution-time comparisons between von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results obtained are strongly in favour of the parallel techniques, with improvements of at least fifty percent. Further studies are still needed to determine the most convenient and optimal number of processors

  9. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    Energy Technology Data Exchange (ETDEWEB)

    El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)

    2002-09-15

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematic modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods for solving the Jacobian algorithm. Parallelism is introduced into the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (the Stanford Arm). Execution-time comparisons between von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results obtained are strongly in favour of the parallel techniques, with improvements of at least fifty percent. Further studies are still needed to determine the most convenient and optimal number of processors.

  10. Scientific applications of symbolic computation

    International Nuclear Information System (INIS)

    Hearn, A.C.

    1976-02-01

    The use of symbolic computation systems for problem solving in scientific research is reviewed. The nature of the field is described, and particular examples are considered from celestial mechanics, quantum electrodynamics and general relativity. Symbolic integration and some more recent applications of algebra systems are also discussed [fr

  11. Automatic Parallelization of Scientific Application

    DEFF Research Database (Denmark)

    Blum, Troels

    performance gains. Scientists working with computer simulations should be allowed to focus on their field of research and not spend excessive amounts of time learning exotic programming models and languages. With Bohrium, we have achieved very promising results by starting out with a relatively simple approach...

  12. Application of the opportunities of tool system 'CUDA' for graphic processors programming in scientific and technical calculation tasks

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

    2009-01-01

    The opportunities offered by NVIDIA's CUDA technology (Compute Unified Device Architecture, a unified hardware-software solution for parallel calculations on GPUs) are described. The basic differences between the 'C' programming language for GPUs and the 'usual' 'C' language are outlined. Examples are given of using CUDA to accelerate application development and to implement algorithms for scientific and technical calculations that are carried out by means of general-purpose computing on graphics processors (GPGPU) using eighth-generation GeForce accelerators. Recommendations on the optimization of programs using GPUs are also given.

  13. Dynamic workload balancing of parallel applications with user-level scheduling on the Grid

    CERN Document Server

    Korkhov, Vladimir V; Krzhizhanovskaya, Valeria V

    2009-01-01

    This paper suggests a hybrid resource management approach for efficient parallel distributed computing on the Grid. It operates on both application and system levels, combining user-level job scheduling with a dynamic workload balancing algorithm that automatically adapts a parallel application to the heterogeneous resources, based on the actual resource parameters and estimated requirements of the application. The hybrid environment and the algorithm for automated load balancing are described, the influence of the resource heterogeneity level is measured, and the speedup achieved with this technique is demonstrated for different types of applications and resources.
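
    The core idea of heterogeneity-aware balancing can be sketched simply: work units are (re)distributed in proportion to each resource's measured speed. The node names and speeds below are made up for illustration and do not come from the paper.

```python
# Minimal sketch of proportional workload balancing across heterogeneous nodes.
def balance(total_units, speeds):
    """Assign an integer number of work units to each node, proportional to speed."""
    total_speed = sum(speeds.values())
    shares = {node: total_units * s / total_speed for node, s in speeds.items()}
    alloc = {node: int(share) for node, share in shares.items()}
    # Hand out the units lost to rounding, largest fractional remainder first.
    leftover = total_units - sum(alloc.values())
    for node, _ in sorted(shares.items(), key=lambda kv: kv[1] - int(kv[1]),
                          reverse=True)[:leftover]:
        alloc[node] += 1
    return alloc

if __name__ == "__main__":
    # Speeds could come from benchmarking or from monitoring the previous iteration.
    speeds = {"grid-node-a": 2.4, "grid-node-b": 1.2, "grid-node-c": 0.6}
    print(balance(1000, speeds))
    # e.g. {'grid-node-a': 571, 'grid-node-b': 286, 'grid-node-c': 143}
```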

  14. Application of parallel processing for automatic inspection of printed circuits

    International Nuclear Information System (INIS)

    Lougheed, R.M.

    1986-01-01

    Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given

  15. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    Cordes Ben

    2009-01-01

    Full Text Available High-performance reconfigurable computing (HPRC) is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR) processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  16. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available High-performance reconfigurable computing (HPRC) is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR) processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  17. Development and application of efficient strategies for parallel magnetic resonance imaging

    Energy Technology Data Exchange (ETDEWEB)

    Breuer, F.

    2006-07-01

    Virtually all existing MRI applications require both a high spatial and high temporal resolution for optimum detection and classification of the state of disease. The main strategy to meet the increasing demands of advanced diagnostic imaging applications has been the steady improvement of gradient systems, which provide increased gradient strengths and faster switching times. Rapid imaging techniques and the advances in gradient performance have significantly reduced acquisition times from about an hour to several minutes or seconds. In order to further increase imaging speed, much higher gradient strengths and much faster switching times are required which are technically challenging to provide. In addition to significant hardware costs, peripheral neuro-stimulations and the surpassing of admissible acoustic noise levels may occur. Today's whole body gradient systems already operate just below the allowed safety levels. For these reasons, alternative strategies are needed to bypass these limitations. The greatest progress in further increasing imaging speed has been the development of multi-coil arrays and the advent of partially parallel acquisition (PPA) techniques in the late 1990's. Within the last years, parallel imaging methods have become commercially available, and are therefore ready for broad clinical use. The basic feature of parallel imaging is a scan time reduction, applicable to nearly any available MRI method, while maintaining the contrast behavior without requiring higher gradient system performance. PPA operates by allowing an array of receiver surface coils, positioned around the object under investigation, to partially replace time-consuming spatial encoding which normally is performed by switching magnetic field gradients. Using this strategy, spatial resolution can be improved given a specific imaging time, or scan times can be reduced at a given spatial resolution. Furthermore, in some cases, PPA can even be used to reduce image

  18. Development and application of efficient strategies for parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Breuer, F.

    2006-01-01

    Virtually all existing MRI applications require both a high spatial and high temporal resolution for optimum detection and classification of the state of disease. The main strategy to meet the increasing demands of advanced diagnostic imaging applications has been the steady improvement of gradient systems, which provide increased gradient strengths and faster switching times. Rapid imaging techniques and the advances in gradient performance have significantly reduced acquisition times from about an hour to several minutes or seconds. In order to further increase imaging speed, much higher gradient strengths and much faster switching times are required which are technically challenging to provide. In addition to significant hardware costs, peripheral neuro-stimulations and the surpassing of admissible acoustic noise levels may occur. Today's whole body gradient systems already operate just below the allowed safety levels. For these reasons, alternative strategies are needed to bypass these limitations. The greatest progress in further increasing imaging speed has been the development of multi-coil arrays and the advent of partially parallel acquisition (PPA) techniques in the late 1990's. Within the last years, parallel imaging methods have become commercially available, and are therefore ready for broad clinical use. The basic feature of parallel imaging is a scan time reduction, applicable to nearly any available MRI method, while maintaining the contrast behavior without requiring higher gradient system performance. PPA operates by allowing an array of receiver surface coils, positioned around the object under investigation, to partially replace time-consuming spatial encoding which normally is performed by switching magnetic field gradients. Using this strategy, spatial resolution can be improved given a specific imaging time, or scan times can be reduced at a given spatial resolution. Furthermore, in some cases, PPA can even be used to reduce image artifacts

  19. Parallel PDE-Based Simulations Using the Common Component Architecture

    International Nuclear Information System (INIS)

    McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia

    2006-01-01

    The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications

  20. High-performance scientific computing in the cloud

    Science.gov (United States)

    Jorissen, Kevin; Vila, Fernando; Rehr, John

    2011-03-01

    Cloud computing has the potential to open up high-performance computational science to a much broader class of researchers, owing to its ability to provide on-demand, virtualized computational resources. However, before such approaches can become commonplace, user-friendly tools must be developed that hide the unfamiliar cloud environment and streamline the management of cloud resources for many scientific applications. We have recently shown that high-performance cloud computing is feasible for parallelized x-ray spectroscopy calculations. We now present benchmark results for a wider selection of scientific applications focusing on electronic structure and spectroscopic simulation software in condensed matter physics. These applications are driven by an improved portable interface that can manage virtual clusters and run various applications in the cloud. We also describe a next generation of cluster tools, aimed at improved performance and a more robust cluster deployment. Supported by NSF grant OCI-1048052.

  1. II - Template Metaprogramming for Massively Parallel Scientific Computing - Vectorization with Expression Templates

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Large scale scientific computing raises questions on different levels ranging from the formulation of the problems to the choice of the best algorithms and their implementation for a specific platform. There are similarities in these different topics that can be exploited by modern-style C++ template metaprogramming techniques to produce readable, maintainable and generic code. Traditional low-level code tends to be fast but platform-dependent, and it obfuscates the meaning of the algorithm. On the other hand, the object-oriented approach is nice to read, but may come with an inherent performance penalty. These lectures aim to present the basics of the Expression Template (ET) idiom which allows us to keep the object-oriented approach without sacrificing performance. We will in particular show how to enhance ET to include SIMD vectorization. We will then introduce techniques for abstracting iteration, and introduce thread-level parallelism for use in heavy data-centric loads. We will show how to apply these methods i...

  2. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    Directory of Open Access Journals (Sweden)

    Hari Radhakrishnan

    2015-01-01

    Full Text Available This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
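
    The replacement of the sequential summation with a binary-tree reduction can be sketched as follows (a serial illustration of the tree structure, not the authors' Fortran coarray code): values are combined pairwise in about log2(n) rounds, which is also how the reduction would be organized across images or ranks.

```python
# Minimal sketch of a binary-tree (pairwise) summation.
def tree_sum(values):
    """Sum a list by pairwise combination, as a binary reduction tree would."""
    vals = list(values)
    while len(vals) > 1:
        nxt = []
        for i in range(0, len(vals) - 1, 2):
            nxt.append(vals[i] + vals[i + 1])   # one round of pairwise sums
        if len(vals) % 2:                       # an odd last element passes through
            nxt.append(vals[-1])
        vals = nxt
    return vals[0] if vals else 0

if __name__ == "__main__":
    data = [float(i) for i in range(1, 101)]
    print(tree_sum(data), sum(data))            # both print 5050.0
```

    In a parallel setting each round of this tree can run concurrently, turning an O(n) serial chain into O(log n) combining steps, which is why the scaling bottleneck described above disappears.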

  3. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    Science.gov (United States)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    The capability of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets was developed by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP) and by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  4. Parallel GPU implementation of iterative PCA algorithms.

    Science.gov (United States)

    Andrecut, M

    2009-11-01

    Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
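
    The idea behind GS-PCA can be sketched with a NIPALS-style iteration in which the current loading and score vectors are re-orthogonalized against the previously extracted components at every step. The NumPy sketch below is an illustrative serial version under that assumption; it does not reproduce the paper's CUBLAS/CBLAS implementations, and the variable names are ours.

```python
# NIPALS-style PCA with Gram-Schmidt re-orthogonalization (illustrative sketch).
import numpy as np

def gs_pca(X, n_components, n_iter=500, tol=1e-12):
    X = X - X.mean(axis=0)                      # center the data
    n, m = X.shape
    T = np.zeros((n, n_components))             # scores
    P = np.zeros((m, n_components))             # loadings (orthonormal columns)
    Q = np.zeros((n, n_components))             # unit-norm scores, for projections
    R = X.copy()                                # deflated residual
    for k in range(n_components):
        t = R[:, 0].copy()
        for _ in range(n_iter):
            p = R.T @ t
            p -= P[:, :k] @ (P[:, :k].T @ p)    # re-orthogonalize loading
            p /= np.linalg.norm(p)
            t_new = R @ p
            t_new -= Q[:, :k] @ (Q[:, :k].T @ t_new)  # re-orthogonalize score
            if np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        T[:, k], P[:, k] = t, p
        Q[:, k] = t / np.linalg.norm(t)
        R = R - np.outer(t, p)                  # deflate before the next component
    return T, P

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    X = rng.normal(size=(500, 20)) * np.linspace(5.0, 1.0, 20)  # spread spectrum
    T, P = gs_pca(X, n_components=5)
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    print("first-loading alignment with SVD:", abs(P[:, 0] @ Vt[0]))  # close to 1
```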

  5. A highly scalable massively parallel fast marching method for the Eikonal equation

    Science.gov (United States)

    Yang, Jianming; Stern, Frederick

    2017-03-01

    The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It has long been deemed inherently sequential, and an efficient parallel algorithm applicable to large-scale practical applications has not been available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.
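
    The sequential narrow-band algorithm that the paper parallelizes can be sketched compactly for the unit-speed Eikonal equation |grad T| = 1 on a 2D grid: a heap holds the narrow band, the smallest tentative value is repeatedly accepted, and its neighbors are updated with the first-order upwind quadratic. The code below is such a serial sketch, not the authors' parallel implementation.

```python
# Serial narrow-band fast marching for |grad T| = 1 on a uniform 2D grid.
import heapq
import numpy as np

def fast_marching(shape, sources, h=1.0):
    T = np.full(shape, np.inf)
    known = np.zeros(shape, dtype=bool)
    heap = []                                   # the narrow band: (value, (i, j))
    for src in sources:
        T[src] = 0.0
        heapq.heappush(heap, (0.0, src))

    def update(i, j):
        """Solve the first-order upwind quadratic for cell (i, j)."""
        tx = min(T[i - 1, j] if i > 0 else np.inf,
                 T[i + 1, j] if i < shape[0] - 1 else np.inf)
        ty = min(T[i, j - 1] if j > 0 else np.inf,
                 T[i, j + 1] if j < shape[1] - 1 else np.inf)
        a, b = min(tx, ty), max(tx, ty)
        if b - a >= h:                          # only one upwind direction usable
            return a + h
        return 0.5 * (a + b + np.sqrt(2.0 * h * h - (a - b) ** 2))

    while heap:
        _, (i, j) = heapq.heappop(heap)
        if known[i, j]:
            continue                            # stale narrow-band entry
        known[i, j] = True                      # accept the smallest tentative value
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < shape[0] and 0 <= nj < shape[1] and not known[ni, nj]:
                t_new = update(ni, nj)
                if t_new < T[ni, nj]:
                    T[ni, nj] = t_new
                    heapq.heappush(heap, (t_new, (ni, nj)))
    return T

if __name__ == "__main__":
    T = fast_marching((101, 101), sources=[(50, 50)])
    print("value at (50, 90):", T[50, 90])      # ~40, the distance from the source
```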

  6. Final Report: Migration Mechanisms for Large-scale Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Jason Nieh

    2009-10-30

    Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous

  7. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  8. Parallel translation in warped product spaces: application to the Reissner-Nordstroem spacetime

    International Nuclear Information System (INIS)

    Raposo, A P; Del Riego, L

    2005-01-01

    A formal treatment of the parallel translation transformations in warped product manifolds is presented and related to those parallel translation transformations in each of the factor manifolds. A straightforward application to the Schwarzschild and Reissner-Nordstroem geometries, considered here as particular examples, explains some apparently surprising properties of the holonomy in these manifolds

  9. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality-conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
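
    A toy calculation (with made-up task parameters, not the paper's model or algorithm) shows why mixing task- and data-parallelism can beat either pure strategy: with an Amdahl-style runtime model t(p) = serial + work/p, running two independent tasks concurrently on a suitable processor split yields a shorter makespan than either running them one after another on all processors or giving each a single processor.

```python
# Toy illustration of mixed task- and data-parallel scheduling for two tasks.
def task_time(serial, work, p):
    """Amdahl-style runtime model for one task on p processors."""
    return serial + work / p

def makespan(tasks, allocation):
    """All tasks start together; the makespan is the slowest task."""
    return max(task_time(s, w, p) for (s, w), p in zip(tasks, allocation))

if __name__ == "__main__":
    P = 8
    tasks = [(2.0, 40.0), (1.0, 24.0)]           # (serial part, parallel work)
    # Pure data-parallel: run tasks one after another, each on all P processors.
    data_parallel = sum(task_time(s, w, P) for s, w in tasks)
    # Pure task-parallel: run tasks concurrently, one processor each.
    task_parallel = makespan(tasks, [1, 1])
    # Mixed: run concurrently, splitting the processors between the tasks.
    best = min((makespan(tasks, [p, P - p]), p) for p in range(1, P))
    print(f"data-parallel        : {data_parallel:.2f}")   # 11.00
    print(f"task-parallel        : {task_parallel:.2f}")   # 42.00
    print(f"mixed (split {best[1]}/{P - best[1]})      : {best[0]:.2f}")  # 10.00
```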

  10. A primer on scientific programming with Python

    CERN Document Server

    Langtangen, Hans Petter

    2014-01-01

    The book serves as a first introduction to computer programming of scientific applications, using the high-level Python language. The exposition is example and problem-oriented, where the applications are taken from mathematics, numerical calculus, statistics, physics, biology and finance. The book teaches "Matlab-style" and procedural programming as well as object-oriented programming. High school mathematics is a required background and it is advantageous to study classical and numerical one-variable calculus in parallel with reading this book. Besides learning how to program computers, the reader will also learn how to solve mathematical problems, arising in various branches of science and engineering, with the aid of numerical methods and programming. By blending programming, mathematics and scientific applications, the book lays a solid foundation for practicing computational science. From the reviews: Langtangen … does an excellent job of introducing programming as a set of skills in problem solving. ...

  11. A primer on scientific programming with Python

    CERN Document Server

    Langtangen, Hans Petter

    2016-01-01

    The book serves as a first introduction to computer programming of scientific applications, using the high-level Python language. The exposition is example and problem-oriented, where the applications are taken from mathematics, numerical calculus, statistics, physics, biology and finance. The book teaches "Matlab-style" and procedural programming as well as object-oriented programming. High school mathematics is a required background and it is advantageous to study classical and numerical one-variable calculus in parallel with reading this book. Besides learning how to program computers, the reader will also learn how to solve mathematical problems, arising in various branches of science and engineering, with the aid of numerical methods and programming. By blending programming, mathematics and scientific applications, the book lays a solid foundation for practicing computational science. From the reviews: Langtangen … does an excellent job of introducing programming as a set of skills in problem solving. ...

  12. Parallel visualization on leadership computing resources

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

    2009-07-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  13. Parallel visualization on leadership computing resources

    International Nuclear Information System (INIS)

    Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

    2009-01-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  14. A massively parallel method of characteristic neutral particle transport code for GPUs

    International Nuclear Information System (INIS)

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-01-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code - OpenMOC - has been developed with solvers for both shared memory multi-core platforms as well as GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate 25-35x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)

  15. A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    Science.gov (United States)

    Dubey, A.; Zubair, M.; Grosch, C. E.

    1992-01-01

    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
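
    The decomposition that underlies most distributed-memory FFTs can be illustrated serially: a length-N transform is computed as column FFTs, a twiddle-factor multiplication, and row FFTs on an N1 x N2 view of the data, with the intermediate transpose corresponding to the all-to-all data redistribution discussed above. The NumPy sketch below shows this four-step structure; it is not the iPSC/860 implementation from the paper.

```python
# Serial sketch of the "four-step" FFT decomposition behind distributed FFTs.
import numpy as np

def four_step_fft(x, N1, N2):
    N = N1 * N2
    assert x.size == N
    A = x.reshape(N1, N2)                      # row-major view: x[n1*N2 + n2]
    B = np.fft.fft(A, axis=0)                  # N2 independent FFTs of length N1
    k1 = np.arange(N1).reshape(N1, 1)
    n2 = np.arange(N2).reshape(1, N2)
    C = B * np.exp(-2j * np.pi * k1 * n2 / N)  # twiddle factors
    D = np.fft.fft(C, axis=1)                  # N1 independent FFTs of length N2
    return D.T.reshape(-1)                     # output index k = k2*N1 + k1

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    x = rng.random(1024) + 1j * rng.random(1024)
    print("max error vs. np.fft.fft:",
          np.abs(four_step_fft(x, 32, 32) - np.fft.fft(x)).max())
```

    In a distributed implementation, each processor would own a block of rows or columns of this N1 x N2 view, so supporting different input data distributions amounts to choosing where the transpose (the all-to-all exchange) is placed relative to the local FFT stages.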

  16. Scientific production and technological production: transforming a scientific paper into patent applications.

    Science.gov (United States)

    Dias, Cleber Gustavo; Almeida, Roberto Barbosa de

    2013-01-01

    Brazil has in recent years produced scientific output that is well recognized in the international scenario, in several areas of knowledge, as measured by the impact of its publications at important events and especially in widely circulated indexed journals. On the other hand, the country does not seem to be moving in the same direction with regard to technological production and wealth creation from its established scientific development, and particularly from applied research. The present paper covers this issue and discloses the main similarities and differences between a scientific paper and a patent application, in order to contribute to a better understanding of both types of documents and to help researchers select the results with technological potential, decide what is appropriate for industrial protection, and foster new business opportunities for each technology that has been created.

  17. High performance parallel I/O

    CERN Document Server

    Prabhat

    2014-01-01

    Gain critical insight into the parallel I/O ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har

  18. The Benefits of Adaptive Partitioning for Parallel AMR Applications

    Energy Technology Data Exchange (ETDEWEB)

    Steensland, Johan [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Advanced Software Research and Development

    2008-07-01

    Parallel adaptive mesh refinement methods potentially lead to realistic modeling of complex three-dimensional physical phenomena. However, the dynamics inherent in these methods present significant challenges in data partitioning and load balancing. Significant human resources, including time, effort, experience, and knowledge, are required for determining the optimal partitioning technique for each new simulation. In reality, scientists resort to using the on-board partitioner of the computational framework, or to using the partitioning industry standard, ParMetis. Adaptive partitioning refers to repeatedly selecting, configuring and invoking the optimal partitioning technique at run-time, based on the current state of the computer and application. In theory, adaptive partitioning automatically delivers superior performance and eliminates the need for repeatedly spending valuable human resources for determining the optimal static partitioning technique. In practice, however, enabling frameworks are non-existent due to the inherent significant inter-disciplinary research challenges. This paper presents a study of a simple implementation of adaptive partitioning and discusses implied potential benefits from the perspective of common groups of users within computational science. The study is based on a large set of data derived from experiments including six real-life, multi-time-step adaptive applications from various scientific domains, five complementing and fundamentally different partitioning techniques, a large set of parameters corresponding to a wide spectrum of computing environments, and a flexible cost function that considers the relative impact of multiple partitioning metrics and diverse partitioning objectives. The results show that even a simple implementation of adaptive partitioning can automatically generate results statistically equivalent to the best static partitioning. Thus, it is possible to effectively eliminate the problem of determining the
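
    The selection step at the heart of adaptive partitioning can be sketched as scoring candidate partitioners with a cost function over several metrics and invoking the cheapest one at run time. The partitioners, metrics and weights below are simplified stand-ins for illustration, not the paper's cost function.

```python
# Minimal sketch of run-time selection among candidate partitioners.
def load_imbalance(parts, weights):
    """Ratio of the heaviest partition to the average; 1.0 is perfectly balanced."""
    sums = {}
    for p, w in zip(parts, weights):
        sums[p] = sums.get(p, 0.0) + w
    avg = sum(weights) / len(set(parts))
    return max(sums.values()) / avg

def cost(parts, weights, migrated_fraction, alpha=1.0, beta=0.5):
    """Weighted combination of imbalance and data-migration volume."""
    return alpha * load_imbalance(parts, weights) + beta * migrated_fraction

def choose_partitioner(candidates, weights, old_parts):
    best_name, best_cost, best_parts = None, float("inf"), None
    for name, partitioner in candidates.items():
        parts = partitioner(weights)
        moved = sum(a != b for a, b in zip(parts, old_parts)) / len(parts)
        c = cost(parts, weights, moved)
        if c < best_cost:
            best_name, best_cost, best_parts = name, c, parts
    return best_name, best_parts

if __name__ == "__main__":
    weights = [5, 1, 1, 1, 4, 4, 1, 1]           # per-block workload after refinement
    old = [0, 0, 0, 0, 1, 1, 1, 1]               # current assignment to 2 partitions
    candidates = {
        "keep-old": lambda w: old,
        "round-robin": lambda w: [i % 2 for i in range(len(w))],
    }
    print(choose_partitioner(candidates, weights, old))
```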

  19. Advanced scientific computational methods and their applications to nuclear technologies. (4) Overview of scientific computational methods, introduction of continuum simulation methods and their applications (4)

    International Nuclear Information System (INIS)

    Sekimura, Naoto; Okita, Taira

    2006-01-01

    Scientific computational methods have advanced remarkably with the progress of nuclear development. They have played the role of a weft connecting the various realms of nuclear engineering, and an introductory course on advanced scientific computational methods and their applications to nuclear technologies was therefore prepared in serial form. This is the fourth issue, presenting an overview of scientific computational methods together with an introduction to continuum simulation methods and their applications. Simulation methods for physical radiation effects on materials are reviewed, covering processes such as the binary collision approximation, molecular dynamics, the kinetic Monte Carlo method, the reaction rate method, and dislocation dynamics. (T. Tanaka)

  20. Comparative Study on Paralleled vs. Scaled DC-DC Converters in High Voltage Gain Applications

    DEFF Research Database (Denmark)

    Klimczak, Pawel; Munk-Nielsen, Stig

    2008-01-01

    Today, power converters are present in many commercial, medical and industrial applications. Many of them are high-power, high-current applications. In order to increase power-handling capability, several transistors or diodes are often connected in parallel. However, such paralleling may lead to converter...

  1. A new bipolar RRAM selector based on anti-parallel connected diodes for crossbar applications

    International Nuclear Information System (INIS)

    Li, Yingtao; Gong, Qingchun; Li, Rongrong; Jiang, Xinyu

    2014-01-01

    Crossbar arrays are the most promising application of resistive random access memory (RRAM) devices for achieving high-density memory. However, cross-talk interference in the crossbar array limits the increase in integration density. In this paper, the combination of two anti-parallel connected diodes and a bipolar RRAM cell is proposed to suppress the sneak current in a crossbar array, with the anti-parallel connected diodes acting as the selector for the bipolar RRAM. By using the anti-parallel connected diodes as a selector, the sneak current can be effectively suppressed, and a high-density crossbar array of more than 1 Mb can be realized, as estimated by the 1/2V read-voltage scheme. These results indicate that anti-parallel connected diodes can be used as a bipolar selector and have great potential for high-density bipolar RRAM crossbar array applications. (papers)

  2. Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

    KAUST Repository

    Hasanov, Khalid

    2014-03-04

    © 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy, and hence more parallelism, into the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
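    The central idea of adding hierarchy to a communication scheme can be sketched with a standard MPI pattern. The example below is not the Hierarchical SUMMA implementation; it merely shows how MPI_Comm_split turns one flat collective over all ranks into a group-level step followed by a leader-level step, with the group size chosen arbitrarily for illustration.

```c
/* Sketch of two-level (hierarchical) communication with MPI_Comm_split:
 * a flat reduction over all ranks is replaced by a reduce inside each group,
 * an allreduce among group leaders, and a broadcast back into each group.
 * Illustration of the idea of adding hierarchy to a communication scheme
 * only; this is not the Hierarchical SUMMA algorithm itself. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int group_size = 4;                  /* illustrative group size */
    int color = rank / group_size;             /* which group this rank belongs to */
    MPI_Comm group;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group);

    int grank;
    MPI_Comm_rank(group, &grank);
    MPI_Comm leaders;                          /* communicator of the group leaders */
    MPI_Comm_split(MPI_COMM_WORLD, grank == 0 ? 0 : MPI_UNDEFINED, rank, &leaders);

    double local = (double)rank, partial = 0.0, total = 0.0;
    MPI_Reduce(&local, &partial, 1, MPI_DOUBLE, MPI_SUM, 0, group);        /* level 1 */
    if (grank == 0)
        MPI_Allreduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, leaders);  /* level 2 */
    MPI_Bcast(&total, 1, MPI_DOUBLE, 0, group);                            /* fan out */

    if (rank == 0) printf("ranks=%d, hierarchical sum=%g\n", size, total);
    if (leaders != MPI_COMM_NULL) MPI_Comm_free(&leaders);
    MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}
```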

  3. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    Full Text Available We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, to the dynamic load balancing of parallel applications. The applications considered in this paper are coarse-grain, data-intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. By viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  4. Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI

    International Nuclear Information System (INIS)

    Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

    2000-01-01

    The real-time visualization system PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses an image data compression technique for efficient communication between the server and the client terminal. Therefore, the system realizes high-performance concurrent visualization in an internet computing environment. The experience of applying PATRAS to WSPEEDI (Worldwide version of System for Prediction of Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand the behaviour of radioactive tracers from different release points easily and quickly. (author)

  5. Are Cloud Environments Ready for Scientific Applications?

    Science.gov (United States)

    Mehrotra, P.; Shackleford, K.

    2011-12-01

    Cloud computing environments are becoming widely available both in the commercial and government sectors. They provide flexibility to rapidly provision resources in order to meet dynamic and changing computational needs without the customers incurring capital expenses and/or requiring technical expertise. Clouds also provide reliable access to resources even though the end-user may not have in-house expertise for acquiring or operating such resources. Consolidation and pooling in a cloud environment allow organizations to achieve economies of scale in provisioning or procuring computing resources and services. Because of these and other benefits, many businesses and organizations are migrating their business applications (e.g., websites, social media, and business processes) to cloud environments-evidenced by the commercial success of offerings such as the Amazon EC2. In this paper, we focus on the feasibility of utilizing cloud environments for scientific workloads and workflows particularly of interest to NASA scientists and engineers. There is a wide spectrum of such technical computations. These applications range from small workstation-level computations to mid-range computing requiring small clusters to high-performance simulations requiring supercomputing systems with high bandwidth/low latency interconnects. Data-centric applications manage and manipulate large data sets such as satellite observational data and/or data previously produced by high-fidelity modeling and simulation computations. Most of the applications are run in batch mode with static resource requirements. However, there do exist situations that have dynamic demands, particularly ones with public-facing interfaces providing information to the general public, collaborators and partners, as well as to internal NASA users. In the last few months we have been studying the suitability of cloud environments for NASA's technical and scientific workloads. We have ported several applications to

  6. The Automatic Parallelisation of Scientific Application Codes Using a Computer Aided Parallelisation Toolkit

    Science.gov (United States)

    Ierotheou, C.; Johnson, S.; Leggett, P.; Cross, M.; Evans, E.; Jin, Hao-Qiang; Frumkin, M.; Yan, J.; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The shared-memory programming model is a very effective way to achieve parallelism on shared-memory parallel computers. Historically, the lack of a programming standard for using directives and the rather limited scalability achievable have affected the take-up of this programming model. Significant progress has since been made in hardware and software technologies; as a result, the performance of parallel programs using compiler directives has also improved. The introduction of an industrial standard for shared-memory programming with directives, OpenMP, has also addressed the issue of portability. In this study, we have extended the computer aided parallelization toolkit (developed at the University of Greenwich) to automatically generate OpenMP-based parallel programs with nominal user assistance. We outline the way in which loop types are categorized and how efficient OpenMP directives can be defined and placed using the in-depth interprocedural analysis carried out by the toolkit. We also discuss the application of the toolkit to the NAS Parallel Benchmarks and a number of real-world application codes. This work demonstrates not only the great potential of using the toolkit to quickly parallelize serial programs but also the good performance achievable on up to 300 processors for hybrid message-passing and directive-based parallelizations.
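    As an illustration of the kind of directive such a toolkit places (this is a hand-written example, not actual toolkit output), a reduction loop can be parallelized with a single OpenMP pragma once the analysis has shown the loop to be safe:

```c
/* Illustrative example of a loop parallelized with an OpenMP directive of
 * the kind such a toolkit would insert; this is not actual toolkit output.
 * The reduction clause handles the accumulation that would otherwise be a
 * loop-carried dependence. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    enum { N = 1000000 };
    static double a[N], b[N];
    double dot = 0.0;

    for (int i = 0; i < N; ++i) { a[i] = 1.0; b[i] = 2.0; }

    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < N; ++i)
        dot += a[i] * b[i];

    printf("dot = %.1f (max threads: %d)\n", dot, omp_get_max_threads());
    return 0;
}
```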

  7. On the impact of communication complexity in the design of parallel numerical algorithms

    Science.gov (United States)

    Gannon, D.; Vanrosendale, J.

    1984-01-01

    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
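    For reference, the commonly cited textbook form of Hockney's (r_inf, n_1/2) model, which the first of these models generalizes, is reproduced below for the time to process or transfer n elements; the exact parametrization used by the authors may differ.

```latex
% Textbook form of Hockney's model: time to process or transfer n elements,
% with start-up time t_0 and asymptotic rate r_\infty; n_{1/2} is the length
% at which half of the asymptotic rate is achieved.
T(n) \;=\; t_0 + \frac{n}{r_\infty} \;=\; \frac{n + n_{1/2}}{r_\infty},
\qquad n_{1/2} \;=\; t_0\, r_\infty .
```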

  8. Advanced scientific computational methods and their applications of nuclear technologies. (1) Overview of scientific computational methods, introduction of continuum simulation methods and their applications (1)

    International Nuclear Information System (INIS)

    Oka, Yoshiaki; Okuda, Hiroshi

    2006-01-01

    Scientific computational methods have advanced remarkably with the progress of nuclear development. They have played the role of a weft connecting the various realms of nuclear engineering, and an introductory course on advanced scientific computational methods and their applications to nuclear technologies was therefore prepared in serial form. This is the first issue, presenting an overview and an introduction to continuum simulation methods. The finite element method is also reviewed as one of their applications. (T. Tanaka)

  9. ADL: a graphical design language for real time parallel applications

    NARCIS (Netherlands)

    M.R. van Steen; T. Vogel; A. ten Dam

    1993-01-01

    Designing parallel applications is generally experienced as a tedious and difficult task, especially when hard real-time performance requirements have to be met. This paper discusses on-going work concerning the construction of a Design Entry System which supports the design phase of

  10. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  11. Sight Application Analysis Tool

    Energy Technology Data Exchange (ETDEWEB)

    Bronevetsky, G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2014-09-17

    The scale and complexity of scientific applications makes it very difficult to optimize, debug and extend them to support new capabilities. We have developed a tool that supports developers’ efforts to understand the logical flow of their applications and interactions between application components and hardware in a way that scales with application complexity and parallelism.

  12. AC losses in horizontally parallel HTS tapes for possible wireless power transfer applications

    Science.gov (United States)

    Shen, Boyang; Geng, Jianzhao; Zhang, Xiuchang; Fu, Lin; Li, Chao; Zhang, Heng; Dong, Qihuan; Ma, Jun; Gawith, James; Coombs, T. A.

    2017-12-01

    This paper presents the concept of using horizontally parallel HTS tapes, with an AC loss study and an investigation of possible wireless power transfer (WPT) applications. An example of three parallel HTS tapes was proposed, whose AC loss study was carried out both experimentally, using an electrical method, and by simulation, using a 2D H-formulation on the FEM platform of COMSOL Multiphysics. The electromagnetic induction around the three parallel tapes was monitored using the COMSOL simulation. The electromagnetic induction and AC losses generated by a conventional three-turn coil were simulated as well, and then compared to the case of three parallel tapes carrying the same AC transport current. The analysis demonstrates that HTS parallel tapes could potentially be used in wireless power transfer systems, which could have lower total AC losses than conventional HTS coils.

  13. Biomedical applications on the GRID efficient management of parallel jobs

    CERN Document Server

    Moscicki, Jakub T; Lee Hurng Chun; Lin, S C; Pia, Maria Grazia

    2004-01-01

    Distributed computing based on the Master-Worker and PULL interaction model is applicable to a number of applications in high energy physics, medical physics and bio-informatics. We demonstrate a realistic medical physics use-case of a dosimetric system for brachytherapy using distributed Grid resources. We present efficient techniques for running parallel jobs in the case of BLAST, a gene-sequencing application, as well as for Monte Carlo simulation based on Geant4. We present a strategy for improving the runtime performance and robustness of the jobs, as well as for minimizing the development time needed to migrate the applications to a distributed environment.
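    The Master-Worker PULL interaction model mentioned above can be sketched with a generic MPI skeleton. This is only an illustration of the pattern (idle workers request, or "pull", the next task from the master), not the Grid middleware used in the paper; tags, task counts, and payloads are placeholders.

```c
/* Generic MPI skeleton of the Master-Worker PULL model: idle workers pull
 * the next task from the master instead of having work pushed to them.
 * Illustration of the interaction pattern only; tags and payloads are
 * placeholders, not part of any particular middleware. */
#include <mpi.h>
#include <stdio.h>

enum { TAG_REQUEST = 1, TAG_TASK = 2, TAG_STOP = 3 };

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    const int ntasks = 100;                       /* illustrative task count */

    if (rank == 0) {                              /* master */
        int next = 0, done = 0;
        while (done < size - 1) {
            MPI_Status st;
            int dummy;
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);        /* a worker asks for work */
            if (next < ntasks) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_TASK, MPI_COMM_WORLD);
                ++next;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                ++done;                           /* no work left: retire worker */
            }
        }
    } else {                                      /* worker */
        for (;;) {
            int request = 0, task;
            MPI_Status st;
            MPI_Send(&request, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            /* ... process task here (e.g. one simulation or alignment chunk) ... */
        }
    }
    MPI_Finalize();
    return 0;
}
```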

  14. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
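    The level-synchronous structure referred to above can be shown in a compact serial sketch: vertices are expanded one frontier (level) at a time, and it is exactly this per-level step that the distributed algorithms partition across processes. The code below is an illustrative serial version with a toy graph, not the tuned parallel implementation.

```c
/* Serial sketch of level-synchronous BFS: vertices are processed one
 * frontier (level) at a time. The graph is a small CSR example; this is
 * not the tuned distributed-memory code described in the paper. */
#include <stdio.h>
#include <stdlib.h>

void bfs_level_synchronous(int n, const int *row_ptr, const int *col_idx, int source)
{
    int *level    = malloc(n * sizeof *level);
    int *frontier = malloc(n * sizeof *frontier);
    int *next     = malloc(n * sizeof *next);
    for (int v = 0; v < n; ++v) level[v] = -1;

    int fsize = 0, depth = 0;
    level[source] = 0;
    frontier[fsize++] = source;

    while (fsize > 0) {                 /* one iteration per BFS level */
        int nsize = 0;
        for (int i = 0; i < fsize; ++i) {
            int u = frontier[i];
            for (int e = row_ptr[u]; e < row_ptr[u + 1]; ++e) {
                int v = col_idx[e];
                if (level[v] < 0) {     /* first visit: assign next level */
                    level[v] = depth + 1;
                    next[nsize++] = v;
                }
            }
        }
        /* In a distributed version, ranks would exchange the vertices of
         * `next` that they do not own before starting the next level. */
        int *tmp = frontier; frontier = next; next = tmp;
        fsize = nsize;
        ++depth;
    }
    for (int v = 0; v < n; ++v) printf("vertex %d: level %d\n", v, level[v]);
    free(level); free(frontier); free(next);
}

int main(void)
{
    /* Toy undirected graph in CSR form: edges 0-1, 0-2, 1-3, 2-3, 3-4. */
    int row_ptr[] = { 0, 2, 4, 6, 9, 10 };
    int col_idx[] = { 1, 2, 0, 3, 0, 3, 1, 2, 4, 3 };
    bfs_level_synchronous(5, row_ptr, col_idx, 0);
    return 0;
}
```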

  15. Parallel computing in cluster of GPU applied to a problem of nuclear engineering

    International Nuclear Information System (INIS)

    Moraes, Sergio Ricardo S.; Heimlich, Adino; Resende, Pedro

    2013-01-01

    Cluster computing has been widely used as a low-cost alternative for parallel processing in scientific applications. With the use of the Message-Passing Interface (MPI) protocol, development has become even more accessible and widespread in the scientific community. A more recent trend is the use of the Graphics Processing Unit (GPU), a powerful co-processor able to perform hundreds of instructions in parallel, reaching a processing capacity hundreds of times that of a CPU. However, a standard PC does not, in general, allow more than two GPUs. Hence, this work proposes the development and evaluation of a low-cost hybrid parallel approach to the solution of a typical nuclear engineering problem. The idea is to use cluster parallelism technology (MPI) together with GPU programming techniques (CUDA - Compute Unified Device Architecture) to simulate neutron transport through a slab using the Monte Carlo method. Using a cluster comprising four quad-core computers with two GPUs each, programs were developed using MPI and CUDA technologies. Experiments applying different configurations, from 1 to 8 GPUs, were performed and the results were compared with the sequential (non-parallel) version. A speed-up of about 2,000 times was observed when comparing the 8-GPU configuration with the sequential version. The results presented here are discussed and analyzed with the objective of outlining gains and possible limitations of the proposed approach. (author)
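    The cluster (MPI) level of such a hybrid scheme can be sketched as follows: Monte Carlo histories are divided among ranks and the tallies are combined with MPI_Reduce. This is a generic skeleton only; the per-history physics is a placeholder and the CUDA (per-GPU) level used in the work is omitted.

```c
/* Generic MPI skeleton for the cluster level of such a hybrid scheme:
 * Monte Carlo histories are divided among ranks and local tallies are
 * combined with MPI_Reduce. The per-history "physics" is a placeholder
 * and the CUDA (per-GPU) level used in the paper is omitted. */
#include <mpi.h>
#include <stdio.h>

/* Tiny linear congruential generator, for illustration only. */
static double uniform01(unsigned *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state / 4294967296.0;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long total_histories = 8000000;           /* illustrative workload    */
    long local_histories = total_histories / size;  /* histories for this rank  */
    unsigned seed = 12345u + 777u * (unsigned)rank; /* distinct stream per rank */

    long local_tally = 0;
    for (long i = 0; i < local_histories; ++i)
        if (uniform01(&seed) < 0.3)                 /* placeholder event test   */
            ++local_tally;

    long global_tally = 0;
    MPI_Reduce(&local_tally, &global_tally, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("estimated probability = %.4f\n",
               (double)global_tally / ((double)local_histories * size));
    MPI_Finalize();
    return 0;
}
```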

  16. Parallel Resolved Open Source CFD-DEM: Method, Validation and Application

    Directory of Open Access Journals (Sweden)

    A. Hager

    2014-03-01

    Full Text Available In the following paper the authors present a fully parallelized Open Source method for calculating the interaction of immersed bodies and the surrounding fluid. A combination of computational fluid dynamics (CFD) and a discrete element method (DEM) accounts for the physics of both the fluid and the particles. The objects considered are relatively big compared to the cells of the fluid mesh, i.e. each covers several cells; thus this fictitious domain method (FDM) is called resolved. The implementation is realized within the Open Source framework CFDEMcoupling (www.cfdem.com), which provides an interface between OpenFOAM® based CFD solvers and the DEM software LIGGGHTS (www.liggghts.com). While both LIGGGHTS and OpenFOAM® were already parallelized, only a recent improvement of the algorithm permits the fully parallel computation of resolved problems. Alongside a detailed description of the method, its implementation and recent improvements, a number of application and validation examples are presented in the scope of this paper.

  17. Scientific Data Management Center for Enabling Technologies

    Energy Technology Data Exchange (ETDEWEB)

    Vouk, Mladen A.

    2013-01-15

    Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data being collected. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Fortunately, the data management problems encountered by most scientific domains are common enough to be addressed through shared technology solutions. Based on community input, we have identified three significant requirements. First, more efficient access to storage systems is needed. In particular, parallel file system and I/O system improvements are needed to write and read large volumes of data without slowing a simulation, analysis, or visualization engine. These processes are complicated by the fact that scientific data are structured differently for specific application domains, and are stored in specialized file formats. Second, scientists require technologies to facilitate better understanding of their data, in particular the ability to effectively perform complex data analysis and searches over extremely large data sets. Specialized feature discovery and statistical analysis techniques are needed before the data can be understood or visualized. Furthermore, interactive analysis requires techniques for efficiently selecting subsets of the data. Finally, generating the data, collecting and storing the results, keeping track of data provenance, data post-processing, and analysis of results is a tedious, fragmented process. Tools for automation of this process in a robust, tractable, and recoverable fashion are required to enhance scientific exploration. The SDM center was established under the SciDAC program to address these issues. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of advanced

  18. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla

  19. Designing Scientific Software for Heterogeneous Computing

    DEFF Research Database (Denmark)

    Glimberg, Stefan Lemvig

    , algorithms and data structures must be designed to utilize the underlying parallel architecture. The architectural changes in hardware design within the last decade, from single to multi and many-core architectures, require software developers to identify and properly implement methods that both exploit...... makes parallel software design applicable, but also a challenge for scientific software developers at all levels. We have developed a generic C++ library for fast prototyping of large-scale PDEs solvers based on flexible-order finite difference approximations on structured regular grids. The library...... is designed with a high abstraction interface to improve developer productivity. The library is based on modern template-based design concepts as described in Glimberg, Engsig-Karup, Nielsen & Dammann (2013). The library utilizes heterogeneous CPU/GPU environments in order to maximize computational throughput...

  20. Control of parallel-connected bidirectional AC-DC converters in stationary frame for microgrid application

    DEFF Research Database (Denmark)

    Lu, Xiaonan; Guerrero, Josep M.; Teodorescu, Remus

    2011-01-01

    With the penetration of renewable energy in modern power system, microgrid has become a popular application worldwide. In this paper, parallel-connected bidirectional converters for AC and DC hybrid microgrid application are proposed as an efficient interface. To reach the goal of bidirectional...... power conversion, both rectifier and inverter modes are analyzed. In order to achieve high performance operation, hierarchical control system is accomplished. The control system is designed in stationary frame, with harmonic compensation in parallel and no coupled terms between axes. In this control...

  1. Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives

    International Nuclear Information System (INIS)

    Ullmann, P.

    2007-06-01

    The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission

  2. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  3. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment. Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  4. Application of parallel computing techniques to a large-scale reservoir simulation

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-01-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance

  5. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of their efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  6. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Science.gov (United States)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
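    A minimal sketch of the MPI+OpenMP flavour of hybrid parallelism discussed above is given below: MPI provides the distributed-memory level across nodes and an OpenMP thread team provides the shared-memory level inside each rank. This is an illustration of the paradigm only, not one of the mini-apps, and the MPI+Shared-MPI variant is not shown.

```c
/* Minimal sketch of the MPI+OpenMP hybrid pattern: MPI ranks provide the
 * distributed-memory level and an OpenMP thread team provides the shared-
 * memory level inside each rank. Illustration of the paradigm only; not one
 * of the mini-apps described in the paper. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)    /* shared-memory level */
    for (int i = 0; i < 1000000; ++i)
        local_sum += 1.0 / (1.0 + i + rank);

    double global_sum = 0.0;                            /* distributed-memory level */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d thread support=%d sum=%g\n",
               size, omp_get_max_threads(), provided, global_sum);
    MPI_Finalize();
    return 0;
}
```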

  7. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  8. Test Driven Development of Scientific Models

    Science.gov (United States)

    Clune, Thomas L.

    2012-01-01

    Test-Driven Development (TDD) is a software development process that promises many advantages for developer productivity and has become widely accepted among professional software engineers. As the name suggests, TDD practitioners alternate between writing short automated tests and producing code that passes those tests. Although this overly simplified description will undoubtedly sound prohibitively burdensome to many uninitiated developers, the advent of powerful unit-testing frameworks greatly reduces the effort required to produce and routinely execute suites of tests. By testimony, many developers find TDD to be addicting after only a few days of exposure, and find it unthinkable to return to previous practices. Of course, scientific/technical software differs from other software categories in a number of important respects, but I nonetheless believe that TDD is quite applicable to the development of such software and has the potential to significantly improve programmer productivity and code quality within the scientific community. After a detailed introduction to TDD, I will present the experience within the Software Systems Support Office (SSSO) in applying the technique to various scientific applications. This discussion will emphasize the various direct and indirect benefits as well as some of the difficulties and limitations of the methodology. I will conclude with a brief description of pFUnit, a unit testing framework I co-developed to support test-driven development of parallel Fortran applications.
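    The test-first rhythm can be illustrated generically in C (pFUnit itself is a Fortran framework, so this is only an analogue, not pFUnit usage): the test below would be written first and fails until the routine under test is implemented to satisfy it.

```c
/* Tiny illustration of the test-first rhythm in plain C (pFUnit itself is a
 * Fortran framework, so this is only a generic analogue): the test below is
 * written first and fails until trapezoid() is implemented to satisfy it. */
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Numerically integrate f on [a,b] with n trapezoids. */
static double trapezoid(double (*f)(double), double a, double b, int n)
{
    double h = (b - a) / n, sum = 0.5 * (f(a) + f(b));
    for (int i = 1; i < n; ++i) sum += f(a + i * h);
    return h * sum;
}

static double square(double x) { return x * x; }

static void test_trapezoid_integrates_x_squared(void)
{
    /* Integral of x^2 on [0,1] is 1/3; allow a small discretization error. */
    double result = trapezoid(square, 0.0, 1.0, 1000);
    assert(fabs(result - 1.0 / 3.0) < 1e-5);
}

int main(void)
{
    test_trapezoid_integrates_x_squared();
    printf("all tests passed\n");
    return 0;
}
```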

  9. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  10. FPS scientific and supercomputers computers in chemistry

    International Nuclear Information System (INIS)

    Curington, I.J.

    1987-01-01

    FPS Array Processors, scientific computers, and highly parallel supercomputers are used in nearly all aspects of compute-intensive computational chemistry. A survey is made of work utilizing this equipment, both published and current research. The relationship of the computer architecture to computational chemistry is discussed, with specific reference to Molecular Dynamics, Quantum Monte Carlo simulations, and Molecular Graphics applications. Recent installations of the FPS T-Series are highlighted, and examples of Molecular Graphics programs running on the FPS-5000 are shown

  11. RAPPORT: running scientific high-performance computing applications on the cloud.

    Science.gov (United States)

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  12. Application of Logic Models in a Large Scientific Research Program

    Science.gov (United States)

    O'Keefe, Christine M.; Head, Richard J.

    2011-01-01

    It is the purpose of this article to discuss the development and application of a logic model in the context of a large scientific research program within the Commonwealth Scientific and Industrial Research Organisation (CSIRO). CSIRO is Australia's national science agency and is a publicly funded part of Australia's innovation system. It conducts…

  13. Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives; Parallele Sendetechniken in der Magnetresonanztomographie: experimentelle Realisierung, Anwendungen und Perspektiven

    Energy Technology Data Exchange (ETDEWEB)

    Ullmann, P.

    2007-06-15

    The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission

  14. Applications of artificial intelligence to scientific research

    Science.gov (United States)

    Prince, Mary Ellen

    1986-01-01

    Artificial intelligence (AI) is a growing field which is just beginning to make an impact on disciplines other than computer science. While a number of military and commercial applications were undertaken in recent years, few attempts were made to apply AI techniques to basic scientific research. There is no inherent reason for the discrepancy. The characteristics of the problem, rather than its domain, determine whether or not it is suitable for an AI approach. Expert systems, intelligent tutoring systems, and learning programs are examples of theoretical topics which can be applied to certain areas of scientific research. Further research and experimentation should eventually make it possible for computers to act as intelligent assistants to scientists.

  15. Massive hybrid parallelism for fully implicit multiphysics

    International Nuclear Information System (INIS)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-01-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  16. Massive hybrid parallelism for fully implicit multiphysics

    Energy Technology Data Exchange (ETDEWEB)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W. [Idaho National Laboratory, 2525 N. Fremont Ave., Idaho Falls, ID 83415 (United States)

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  17. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    Energy Technology Data Exchange (ETDEWEB)

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.

  18. A high performance scientific cloud computing environment for materials simulations

    Science.gov (United States)

    Jorissen, K.; Vila, F. D.; Rehr, J. J.

    2012-09-01

    We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.

  19. Implementation of Scientific Computing Applications on the Cell Broadband Engine

    Directory of Open Access Journals (Sweden)

    Guochun Shi

    2009-01-01

    Full Text Available The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring strategies necessary to adapt the applications to the intrinsic properties of the Cell processor and demonstrates performance improvements achieved on the Cell architecture. It concludes with the lessons learned and provides practical recommendations on optimization techniques that are believed to be most appropriate.

  20. Vision systems for scientific and engineering applications

    International Nuclear Information System (INIS)

    Chadda, V.K.

    2009-01-01

    Human performance can degrade due to boredom, distraction and fatigue in vision-related tasks such as measurement, counting etc. Vision-based techniques are increasingly being employed in many scientific and engineering applications. Notable advances in this field are emerging from continuing improvements in sensors and related technologies, and from advances in computer hardware and software. Automation utilizing vision-based systems can perform repetitive tasks faster and more accurately, with greater consistency over time, than humans. The Electronics and Instrumentation Services Division has developed vision-based systems for several applications to perform tasks such as precision alignment, biometric access control, measurement, counting etc. This paper describes four such applications in brief. (author)

  1. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL – to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: we conducted interviews with code teams and selected two codes to port; we learned how to program in the new models and ported the codes; we debugged and tuned the ported applications; we measured results, and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  2. OS and Runtime Support for Efficiently Managing Cores in Parallel Applications

    OpenAIRE

    Klues, Kevin Alan

    2015-01-01

    Parallel applications can benefit from the ability to explicitly control their thread scheduling policies in user-space. However, modern operating systems lack the interfaces necessary to make this type of “user-level” scheduling efficient. The key component missing is the ability for applications to gain direct access to cores and keep control of those cores even when making I/O operations that traditionally block in the kernel. A number of former systems provided limited support for these c...

  3. 8th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2015-01-01

    Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – which is a forum to discuss the latest advancements in the parallel tools.

  4. Application of Text Analytics to Extract and Analyze Material–Application Pairs from a Large Scientific Corpus

    Directory of Open Access Journals (Sweden)

    Nikhil Kalathil

    2018-01-01

    Full Text Available When assessing the importance of materials (or other components) to a given set of applications, machine analysis of a very large corpus of scientific abstracts can provide an analyst a base of insights to develop further. The use of text analytics reduces the time required to conduct an evaluation, while allowing analysts to experiment with a multitude of different hypotheses. Because the scope and quantity of metadata analyzed can, and should, be large, any divergence between what a human analyst determines and what the text analysis shows provides a prompt for the human analyst to reassess any preliminary findings. In this work, we have successfully extracted material–application pairs and ranked them by their importance. This method provides a novel way to map scientific advances in a particular material to the application for which it is used. Approximately 438,000 titles and abstracts of scientific papers published from 1992 to 2011 were used to examine 16 materials. This analysis used coclustering text analysis to associate individual materials with specific clean energy applications, evaluate the importance of materials to specific applications, and assess their importance to clean energy overall. Our analysis reproduced the judgments of experts in assigning material importance to applications. The validated methods were then used to map the replacement of one material with another material in a specific application (batteries).

  5. Using of opportunities of graphic processors for acceleration of scientific and technical calculations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

    2009-01-01

    The new opportunities offered by modern graphics processors (GPUs) for accelerating scientific and technical calculations by dividing a computing task between the central processor and the GPU are described. The use of NVIDIA CUDA technology to harness the parallel computing capabilities of the GPU in several computationally intensive mathematical tasks is presented. Examples comparing performance on these tasks without the GPU and with NVIDIA CUDA on a GeForce 8800 graphics processor are given.

  6. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected by Ethernet. Data exchanges between tasks located on each processing element are realized in two ways. One uses sockets, a standard library on recent UNIX operating systems. The other uses Parallel Virtual Machine (PVM), free network-connection software developed at ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the suitability of parallel computing using networked UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) on a Monte Carlo simulation, which generally exhibits a high parallelization ratio. (author)
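
    As a minimal illustrative sketch (ours, not from the paper), the same kind of networked task exchange can be expressed today with mpi4py instead of PVM or raw sockets; the per-task sample count and the pi estimator are assumptions made for the example:

```python
# Monte Carlo estimate of pi distributed over networked processors, an
# analogue of the PVM/socket task exchange described above.
# Run with, e.g.:  mpiexec -n 4 python mc_pi.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_per_task = 1_000_000                     # assumed per-task sample count
rng = np.random.default_rng(seed=rank)     # independent stream per task
xy = rng.random((n_per_task, 2))
hits = int(np.count_nonzero((xy ** 2).sum(axis=1) <= 1.0))

# Each task contributes its partial count to the master task (rank 0).
total = comm.reduce(hits, op=MPI.SUM, root=0)
if rank == 0:
    print("pi ~", 4.0 * total / (n_per_task * size))
```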

  7. High Performance Fortran for Aerospace Applications

    National Research Council Canada - National Science Library

    Mehrotra, Piyush

    2000-01-01

    .... HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications while delegating to the compiler/runtime system the task...

  8. GPU-based Parallel Application Design for Emerging Mobile Devices

    Science.gov (United States)

    Gupta, Kshitij

    A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as

  9. An ontology model for execution records of Grid scientific applications

    NARCIS (Netherlands)

    Baliś, B.; Bubak, M.

    2008-01-01

    Records of past application executions are particularly important in the case of loosely-coupled, workflow driven scientific applications which are used to conduct in silico experiments, often on top of Grid infrastructures. In this paper, we propose an ontology-based model for storing and querying

  10. 78 FR 52760 - Application(s) for Duty-Free Entry of Scientific Instruments

    Science.gov (United States)

    2013-08-26

    ... invite comments on the question of whether instruments of equivalent scientific value, for the purposes... platforms based on self-assembled DNA nanostructures for studying cell biology. DNA nanostructures will be... 23, 2013. Docket Number: 13-033. Applicant: University of Pittsburgh School of Medicine, 3500 Terrace...

  11. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem that parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  12. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme makes it possible to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus makes it possible to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)

  13. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme makes it possible to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus makes it possible to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).
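
    A toy sketch (ours, not from either record) of the SPMD, data-driven scheme both records describe: every worker runs the same program, claims blocks of the central data structure, and coordinates only through simple locks on parts of that structure; the block size and the per-block work are placeholders:

```python
# Single Program Multiple Data: identical workers update blocks of a shared
# array and coordinate only through per-block locks, as in the scheme above.
from multiprocessing import Array, Lock, Process

N_BLOCKS, BLOCK, N_WORKERS = 8, 1024, 4          # assumed granularity

def worker(worker_id, data, locks):
    # Data-driven: each worker processes the blocks assigned to it.
    for b in range(worker_id, N_BLOCKS, N_WORKERS):
        with locks[b]:                           # lock only the part being touched
            lo = b * BLOCK
            for i in range(lo, lo + BLOCK):
                data[i] += 1.0                   # stand-in for real event processing

if __name__ == "__main__":
    data = Array('d', N_BLOCKS * BLOCK, lock=False)   # central data structure
    locks = [Lock() for _ in range(N_BLOCKS)]         # one lock per block
    procs = [Process(target=worker, args=(w, data, locks)) for w in range(N_WORKERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(data[0], data[N_BLOCKS * BLOCK - 1])        # each element incremented once
```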

  14. Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs

    KAUST Repository

    Abdelfattah, Ahmad

    2015-01-15

    High performance computing (HPC) platforms are evolving to more heterogeneous configurations to support the workloads of various applications. The current hardware landscape is composed of traditional multicore CPUs equipped with hardware accelerators that can handle high levels of parallelism. Graphical Processing Units (GPUs) are popular high performance hardware accelerators in modern supercomputers. GPU programming has a different model than that for CPUs, which means that many numerical kernels have to be redesigned and optimized specifically for this architecture. GPUs usually outperform multicore CPUs in some compute intensive and massively parallel applications that have regular processing patterns. However, most scientific applications rely on crucial memory-bound kernels and may witness bottlenecks due to the overhead of the memory bus latency. They can still take advantage of the GPU compute power capabilities, provided that an efficient architecture-aware design is achieved. This dissertation presents a uniform design strategy for optimizing critical memory-bound kernels on GPUs. Based on hierarchical register blocking, double buffering and latency hiding techniques, this strategy leverages the performance of a wide range of standard numerical kernels found in dense and sparse linear algebra libraries. The work presented here focuses on matrix-vector multiplication kernels (MVM) as representative and most important memory-bound operations in this context. Each kernel inherits the benefits of the proposed strategies. By exposing a proper set of tuning parameters, the strategy is flexible enough to suit different types of matrices, ranging from large dense matrices, to sparse matrices with dense block structures, while high performance is maintained. Furthermore, the tuning parameters are used to maintain the relative performance across different GPU architectures. Multi-GPU acceleration is proposed to scale the performance on several devices. The
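
    As a loose, CPU-side sketch of the blocking idea (ours; NumPy stands in for a GPU kernel and the block size is an arbitrary assumption), the matrix is processed in tiles so that the corresponding pieces of the input and output vectors are reused while they sit in fast memory:

```python
# Blocked matrix-vector multiply: stream A tile by tile so the matching
# slices of x and y are reused while "hot", the core idea behind the
# memory-bound MVM kernels discussed above.
import numpy as np

def blocked_mvm(A, x, block=256):
    m, n = A.shape
    y = np.zeros(m, dtype=A.dtype)
    for i in range(0, m, block):
        for j in range(0, n, block):
            # One tile of work; on a GPU this would be one thread block's share.
            y[i:i + block] += A[i:i + block, j:j + block] @ x[j:j + block]
    return y

A = np.random.rand(1000, 800)
x = np.random.rand(800)
assert np.allclose(blocked_mvm(A, x), A @ x)
```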

  15. Scientific computing and algorithms in industrial simulations projects and products of Fraunhofer SCAI

    CERN Document Server

    Schüller, Anton; Schweitzer, Marc

    2017-01-01

    The contributions gathered here provide an overview of current research projects and selected software products of the Fraunhofer Institute for Algorithms and Scientific Computing SCAI. They show the wide range of challenges that scientific computing currently faces, the solutions it offers, and its important role in developing applications for industry. Given the exciting field of applied collaborative research and development it discusses, the book will appeal to scientists, practitioners, and students alike. The Fraunhofer Institute for Algorithms and Scientific Computing SCAI combines excellent research and application-oriented development to provide added value for our partners. SCAI develops numerical techniques, parallel algorithms and specialized software tools to support and optimize industrial simulations. Moreover, it implements custom software solutions for production and logistics, and offers calculations on high-performance computers. Its services and products are based on state-of-the-art metho...

  16. Applications of field-programmable gate arrays in scientific research

    CERN Document Server

    Sadrozinski, Hartmut F W

    2011-01-01

    Focusing on resource awareness in field-programmable gate array (FPGA) design, Applications of Field-Programmable Gate Arrays in Scientific Research covers the principle of FPGAs and their functionality. It explores a host of applications, ranging from small one-chip laboratory systems to large-scale applications in "big science." The book first describes various FPGA resources, including logic elements, RAM, multipliers, microprocessors, and content-addressable memory. It then presents principles and methods for controlling resources, such as process sequencing, location constraints, and in

  17. Parallel Algorithms for the Exascale Era

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Laboratory

    2016-10-19

    New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
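
    A two-line illustration (ours, not from the report) of why reproducible global sums matter: floating-point addition is not associative, so different reduction orders, as produced by different processor counts, round differently, whereas an exactly rounded summation is order-independent:

```python
# Ordinary summation depends on evaluation order (and hence on how a parallel
# reduction is split), while math.fsum is exactly rounded and reproducible.
import math
import random

random.seed(0)
values = [random.uniform(-1e9, 1e9) for _ in range(1 << 16)]

print(sum(values) == sum(reversed(values)))              # typically False
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```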

  18. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  19. Manuscript Architect: a Web application for scientific writing in virtual interdisciplinary groups

    Directory of Open Access Journals (Sweden)

    Menezes Andreia P

    2005-06-01

    Full Text Available Abstract Background Although scientific writing plays a central role in the communication of clinical research findings and consumes a significant amount of time from clinical researchers, few Web applications have been designed to systematically improve the writing process. This application had as its main objective the separation of the multiple tasks associated with scientific writing into smaller components. It was also aimed at providing a mechanism where sections of the manuscript (text blocks) could be assigned to different specialists. Manuscript Architect was built using Java language in conjunction with the classic lifecycle development method. The interface was designed for simplicity and economy of movements. Manuscripts are divided into multiple text blocks that can be assigned to different co-authors by the first author. Each text block contains notes to guide co-authors regarding the central focus of each text block, previous examples, and an additional field for translation when the initial text is written in a language different from the one used by the target journal. Usability was evaluated using formal usability tests and field observations. Results The application presented excellent usability and integration with the regular writing habits of experienced researchers. Workshops were developed to train novice researchers, presenting an accelerated learning curve. The application has been used in over 20 different scientific articles and grant proposals. Conclusion The current version of Manuscript Architect has proven to be very useful in the writing of multiple scientific texts, suggesting that virtual writing by interdisciplinary groups is an effective manner of scientific writing when interdisciplinary work is required.

  20. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    Science.gov (United States)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  1. Strategies for application of scientific findings in prevention.

    Science.gov (United States)

    Wei, S H

    1995-07-01

    Dental research in the last 50 years has accomplished numerous significant advances in preventive dentistry, particularly in the area of research in fluorides, periodontal diseases, restorative dentistry, and dental materials, as well as craniofacial development and molecular biology. The transfer of scientific knowledge to clinical practitioners requires additional effort. It is the responsibility of the scientific communities to transfer the fruits of their findings to society through publications, conferences, media, and the press. Specific programs that the International Association for Dental Research (IADR) has developed to transmit science to the profession and the public have included science transfer seminars, the Visiting Lecture Program, and hands-on workshops. The IADR Strategic Plan also has a major outreach goal. In addition, the Federation Dentaire Internationale (FDI) and the World Health Organization (WHO) have initiated plans to celebrate World Health Day and the Year of Oral Health in 1994. These are important strategies for the application of scientific findings in prevention.

  2. A parallel additive Schwarz preconditioned Jacobi-Davidson algorithm for polynomial eigenvalue problems in quantum dot simulation

    International Nuclear Information System (INIS)

    Hwang, F-N; Wei, Z-H; Huang, T-M; Wang Weichung

    2010-01-01

    We develop a parallel Jacobi-Davidson approach for finding a partial set of eigenpairs of large sparse polynomial eigenvalue problems with application in quantum dot simulation. A Jacobi-Davidson eigenvalue solver is implemented based on the Portable, Extensible Toolkit for Scientific Computation (PETSc). The eigensolver thus inherits PETSc's efficient parallel operations, wide range of linear solvers and preconditioning schemes, and ease of use. The parallel eigenvalue solver is then used to solve higher degree polynomial eigenvalue problems arising in numerical simulations of three dimensional quantum dots governed by Schroedinger's equations. We find that the parallel restricted additive Schwarz preconditioner in conjunction with a parallel Krylov subspace method (e.g. GMRES) can solve the correction equations, the most costly step in the Jacobi-Davidson algorithm, very efficiently in parallel. Moreover, the overall performance is quite satisfactory. We have observed near-perfect superlinear speedup using up to 320 processors. The parallel eigensolver can find all target interior eigenpairs of a quintic polynomial eigenvalue problem with more than 32 million variables within 12 minutes by using 272 Intel 3.0 GHz processors.
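
    For orientation only, here is a dense, serial sketch (ours) of the Davidson-type subspace iteration that Jacobi-Davidson refines; it uses a simple diagonal preconditioner in place of the paper's additive Schwarz/GMRES correction solve and targets an ordinary symmetric eigenproblem rather than a polynomial one:

```python
# Plain Davidson iteration for the smallest eigenpair of a symmetric matrix.
import numpy as np

def davidson(A, tol=1e-8, max_iter=60):
    n = A.shape[0]
    V = np.linalg.qr(np.random.rand(n, 1))[0]     # initial search subspace
    diag = np.diag(A)
    theta, x = 0.0, V[:, 0]
    for _ in range(max_iter):
        H = V.T @ A @ V                           # small projected problem
        w, S = np.linalg.eigh(H)
        theta, x = w[0], V @ S[:, 0]              # Ritz value and Ritz vector
        r = A @ x - theta * x                     # residual
        if np.linalg.norm(r) < tol:
            break
        t = r / (diag - theta + 1e-12)            # diagonal "correction equation"
        t -= V @ (V.T @ t)                        # keep the subspace orthonormal
        V = np.hstack([V, (t / np.linalg.norm(t))[:, None]])
    return theta, x

rng = np.random.default_rng(0)
A = np.diag(np.arange(1.0, 101.0)) + 1e-3 * rng.random((100, 100))
A = (A + A.T) / 2
print(davidson(A)[0])     # close to the smallest eigenvalue (about 1.0)
```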

  3. Implementation and performance of parallelized elegant

    International Nuclear Information System (INIS)

    Wang, Y.; Borland, M.

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  4. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
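
    A small numerical illustration (ours, in Jones-vector form; the angles are arbitrary) of the serial-versus-parallel contrast in the abstract: the conventional route applies a product of element matrices to one beam, while the parallel scheme sums independently modulated, spatially separated polarization components:

```python
# Serial (product of Jones matrices) versus parallel (weighted sum of fixed
# components) synthesis of the same state of polarization.
import numpy as np

H = np.array([1.0, 0.0], dtype=complex)     # horizontal basis component
V = np.array([0.0, 1.0], dtype=complex)     # vertical basis component

def rotator(theta):
    """Jones matrix of an ideal polarization rotator by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Serial architecture: a chain of elements acting on a single beam.
serial_out = rotator(np.pi / 8) @ rotator(np.pi / 8) @ H

# Parallel architecture: split the beam, modulate the amplitude of each fixed
# component (weights a and b), then recombine the paths by addition.
a, b = np.cos(np.pi / 4), np.sin(np.pi / 4)
parallel_out = a * H + b * V

print(np.allclose(serial_out, parallel_out))   # True: same 45-degree state
```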

  5. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  6. A framework for distributed mixed-language scientific applications

    International Nuclear Information System (INIS)

    Quarrie, D.R.

    1996-01-01

    The Object Management Group has defined an architecture (CORBA) for distributed object applications based on an Object Broker and Interface Definition Language. This project builds upon this architecture to establish a framework for the creation of mixed language scientific applications. A prototype compiler has been written that generates FORTRAN 90 or Eiffel stubs and skeletons and the required C++ glue code from an input IDL file that specifies object interfaces. This generated code can be used directly for non-distributed mixed language applications or in conjunction with the C++ code generated from a commercial IDL compiler for distributed applications. A feasibility study is presently under way to determine whether a fully integrated software development environment for distributed, mixed-language applications can be created by modifying the back-end code generator of a commercial CASE tool to emit IDL. (author)

  7. SDS: A Framework for Scientific Data Services

    Energy Technology Data Exchange (ETDEWEB)

    Dong, Bin; Byna, Surendra; Wu, Kesheng

    2013-10-31

    Large-scale scientific applications typically write their data to parallel file systems with organizations designed to achieve fast write speeds. Analysis tasks frequently read the data in a pattern that is different from the write pattern, and therefore experience poor I/O performance. In this paper, we introduce a prototype framework for bridging the performance gap between write and read stages of data access from parallel file systems. We call this framework Scientific Data Services, or SDS for short. This initial implementation of SDS focuses on reorganizing previously written files into data layouts that benefit read patterns, and transparently directs read calls to the reorganized data. SDS follows a client-server architecture. The SDS Server manages partial or full replicas of reorganized datasets and serves SDS Clients' requests for data. The current version of the SDS client library supports HDF5 programming interface for reading data. The client library intercepts HDF5 calls and transparently redirects them to the reorganized data. The SDS client library also provides a querying interface for reading part of the data based on user-specified selective criteria. We describe the design and implementation of the SDS client-server architecture, and evaluate the response time of the SDS Server and the performance benefits of SDS.
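
    A compact sketch (ours, not the SDS implementation; file and dataset names are made up) of the reorganization idea using h5py: data written with a write-friendly chunk layout is copied into a replica chunked for the analysis read pattern, and reads are then directed at the replica:

```python
# Reorganize a dataset into a chunk layout matching the read pattern, in the
# spirit of the read-side replicas that SDS manages transparently.
import h5py
import numpy as np

data = np.random.rand(4096, 512)

with h5py.File("sim_output.h5", "w") as f:
    # Layout chosen for fast writes: one row per chunk.
    f.create_dataset("field", data=data, chunks=(1, 512))

with h5py.File("sim_output.h5", "a") as f:
    # Replica laid out for the analysis read pattern: one column per chunk.
    f.create_dataset("field_by_column", data=f["field"][...], chunks=(4096, 1))

with h5py.File("sim_output.h5", "r") as f:
    column = f["field_by_column"][:, 42]   # a column read now touches one chunk
    print(column.shape)
```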

  8. Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

    Science.gov (United States)

    West, Jeff; Yang, H. Q.

    2014-01-01

    There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problem sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that detrimental effects on the vehicle are not realized.

  9. SDM center technologies for accelerating scientific discoveries

    International Nuclear Information System (INIS)

    Shoshani, Arie; Altintas, Ilkay; Choudhary, Alok; Critchlow, Terence; Kamath, Chandrika; Ludaescher, Bertram; Nieplocha, Jarek; Parker, Steve; Ross, Rob; Samatova, Nagiza; Vouk, Mladen

    2007-01-01

    With the increasing volume and complexity of data produced by ultra-scale simulations and high-throughput experiments, understanding the science is largely hampered by the lack of comprehensive, end-to-end data management solutions ranging from initial data acquisition to final analysis and visualization. The SciDAC-1 Scientific Data Management (SDM) Center succeeded in bringing an initial set of advanced data management technologies to DOE application scientists in astrophysics, climate, fusion, and biology. Equally important, it established collaborations with these scientists to better understand their science as well as their forthcoming data management and data analytics challenges. Our future focus is on improving the SDM framework to address the needs of ultra-scale science during SciDAC-2. Specifically, we are enhancing and extending our existing tools to allow for more interactivity and fault tolerance when managing scientists' workflows, for better parallelism and feature extraction capabilities in their data analytics operations, and for greater efficiency and functionality in users' interactions with local parallel file systems, active storage, and access to remote storage. These improvements are necessary for the scalability and complexity challenges presented by hardware and applications at ultra scale, and are complemented by continued efforts to work with application scientists in various domains

  10. A framework for integration of scientific applications into the OpenTopography workflow

    Science.gov (United States)

    Nandigam, V.; Crosby, C.; Baru, C.

    2012-12-01

    The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service oriented architecture that is comprised of an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage, compute resources as well as supporting databases. The services tier consists of the set of processing routines each deployed as a Web service. The applications tier provides client interfaces to the system (e.g. Portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines developed and maintained by the community to be integrated into the OpenTopography system so that the wider earth science community can benefit from its availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database. It allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches, this integration approach does not scale efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have the application co-located where its developers can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system. This will be accomplished by

  11. QDP++: Data Parallel Interface for QCD

    Energy Technology Data Exchange (ETDEWEB)

    Robert Edwards

    2003-03-01

    This is a user's guide for the C++ binding for the QDP Data Parallel Applications Programmer Interface developed under the auspices of the US Department of Energy Scientific Discovery through Advanced Computing (SciDAC) program. The QDP Level 2 API has the following features: (1) Provides data parallel operations (logically SIMD) on all sites across the lattice or subsets of these sites. (2) Operates on lattice objects, which have an implementation-dependent data layout that is not visible above this API. (3) Hides details of how the implementation maps onto a given architecture, namely how the logical problem grid (i.e. the lattice) is mapped onto the machine architecture. (4) Allows asynchronous (non-blocking) shifts of lattice level objects over any permutation map of sites onto sites. However, from the user's view these instructions appear blocking and in fact may be so in some implementations. (5) Provides broadcast operations (filling a lattice quantity from a scalar value(s)), global reduction operations, and lattice-wide operations on various data-type primitives, such as matrices, vectors, and tensor products of matrices (propagators). (6) Operator syntax that supports complex expression construction.
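
    As a rough analogue (ours, not QDP++ itself), NumPy whole-array arithmetic plays the role of the site-wise data parallel operations listed above, np.roll plays the role of a nearest-neighbour shift, and a sum over the array plays the role of a global reduction:

```python
# Data-parallel lattice operations in miniature: site-wise arithmetic,
# nearest-neighbour shifts, and a global reduction over all sites.
import numpy as np

L = 16
phi = np.random.rand(L, L, L, L)             # one real value per lattice site

laplacian = sum(
    np.roll(phi, +1, axis=mu) + np.roll(phi, -1, axis=mu) - 2.0 * phi
    for mu in range(4)
)                                             # shifts along the four directions
action = float(np.sum(phi * laplacian))       # global reduction over the lattice
print(action)
```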

  12. Parallel supercomputing: Advanced methods, algorithms, and software for large-scale linear and nonlinear problems

    Energy Technology Data Exchange (ETDEWEB)

    Carey, G.F.; Young, D.M.

    1993-12-31

    The program outlined here is directed to research on methods, algorithms, and software for distributed parallel supercomputers. Of particular interest are finite element methods and finite difference methods together with sparse iterative solution schemes for scientific and engineering computations of very large-scale systems. Both linear and nonlinear problems will be investigated. In the nonlinear case, applications with bifurcation to multiple solutions will be considered using continuation strategies. The parallelizable numerical methods of particular interest are a family of partitioning schemes embracing domain decomposition, element-by-element strategies, and multi-level techniques. The methods will be further developed incorporating parallel iterative solution algorithms with associated preconditioners in parallel computer software. The schemes will be implemented on distributed memory parallel architectures such as the CRAY MPP, Intel Paragon, the NCUBE3, and the Connection Machine. We will also consider other new architectures such as the Kendall-Square (KSQ) and proposed machines such as the TERA. The applications will focus on large-scale three-dimensional nonlinear flow and reservoir problems with strong convective transport contributions. These are legitimate grand challenge class computational fluid dynamics (CFD) problems of significant practical interest to DOE. The methods and algorithms developed will, however, be of wider interest.
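
    A minimal serial sketch (ours) of the domain decomposition flavour of methods mentioned above: a 1D Poisson problem is solved by an alternating (multiplicative) Schwarz iteration over two overlapping subdomains, each local correction being a small direct solve:

```python
# Alternating Schwarz iteration for -u'' = f on a 1D grid with two
# overlapping subdomains (a toy instance of domain decomposition).
import numpy as np

n, overlap = 200, 10
h = 1.0 / (n + 1)
f = np.ones(n)
A = (np.diag(2.0 * np.ones(n))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h**2

# Two overlapping index blocks play the role of subdomains.
domains = [np.arange(0, n // 2 + overlap), np.arange(n // 2 - overlap, n)]

u = np.zeros(n)
for sweep in range(200):
    for d in domains:
        r = f - A @ u                                   # current global residual
        u[d] += np.linalg.solve(A[np.ix_(d, d)], r[d])  # local subdomain solve
    if np.linalg.norm(f - A @ u) < 1e-10 * np.linalg.norm(f):
        break

print(sweep, np.linalg.norm(f - A @ u))    # sweeps used and final residual
```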

  13. Opus: A Coordination Language for Multidisciplinary Applications

    Directory of Open Access Journals (Sweden)

    Barbara Chapman

    1997-01-01

    Full Text Available Data parallel languages, such as High Performance Fortran, can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are multidisciplinary and heterogeneous in nature, and thus do not fit well into the data parallel paradigm. In this paper we present Opus, a language designed to fill this gap. The central concept of Opus is a mechanism called ShareD Abstractions (SDA). An SDA can be used as a computation server, i.e., a locus of computational activity, or as a data repository for sharing data between asynchronous tasks. SDAs can be internally data parallel, providing support for the integration of data and task parallelism as well as nested task parallelism. They can thus be used to express multidisciplinary applications in a natural and efficient way. In this paper we describe the features of the language through a series of examples and give an overview of the runtime support required to implement these concepts in parallel and distributed environments.

  14. Impulse: Memory System Support for Scientific Applications

    Directory of Open Access Journals (Sweden)

    John B. Carter

    1999-01-01

    Full Text Available Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application‐specific optimizations through configurable physical address remapping. By remapping physical addresses, applications control how their data is accessed and cached, improving their cache and bus utilization. Second, Impulse supports prefetching at the memory controller, which can hide much of the latency of DRAM accesses. Because it requires no modification to processor, cache, or bus designs, Impulse can be adopted in conventional systems. In this paper we describe the design of the Impulse architecture, and show how an Impulse memory system can improve the performance of memory‐bound scientific applications. For instance, Impulse decreases the running time of the NAS conjugate gradient benchmark by 67%. We expect that Impulse will also benefit regularly strided, memory‐bound applications of commercial importance, such as database and multimedia programs.

  15. Techniques and tools for measuring energy efficiency of scientific software applications

    CERN Document Server

    Abdurachmanov, David; Eulisse, Giulio; Knight, Robert; Niemi, Tapio; Nurminen, Jukka K.; Nyback, Filip; Pestana, Goncalo; Ou, Zhonghong; Khan, Kashif

    2014-01-01

    The scale of scientific High Performance Computing (HPC) and High Throughput Computing (HTC) has increased significantly in recent years, and is becoming sensitive to total energy use and cost. Energy-efficiency has thus become an important concern in scientific fields such as High Energy Physics (HEP). There has been a growing interest in utilizing alternate architectures, such as low power ARM processors, to replace traditional Intel x86 architectures. Nevertheless, even though such solutions have been successfully used in mobile applications with low I/O and memory demands, it is unclear if they are suitable and more energy-efficient in the scientific computing environment. Furthermore, there is a lack of tools and experience to derive and compare power consumption between the architectures for various workloads, and eventually to support software optimizations for energy efficiency. To that end, we have performed several physical and software-based measurements of workloads from HEP applications running o...

  16. A Programming Framework for Scientific Applications on CPU-GPU Systems

    Energy Technology Data Exchange (ETDEWEB)

    Owens, John

    2013-03-24

    At a high level, my research interests center around designing, programming, and evaluating computer systems that use new approaches to solve interesting problems. The rapid change of technology allows a variety of different architectural approaches to computationally difficult problems, and a constantly shifting set of constraints and trends makes the solutions to these problems both challenging and interesting. One of the most important recent trends in computing has been a move to commodity parallel architectures. This sea change is motivated by the industry’s inability to continue to profitably increase performance on a single processor and instead to move to multiple parallel processors. In the period of review, my most significant work has been leading a research group looking at the use of the graphics processing unit (GPU) as a general-purpose processor. GPUs can potentially deliver superior performance on a broad range of problems than their CPU counterparts, but effectively mapping complex applications to a parallel programming model with an emerging programming environment is a significant and important research problem.

  17. Tile-based parallel coordinates and its application in financial visualization

    Science.gov (United States)

    Alsakran, Jamal; Zhao, Ye; Zhao, Xinlei

    2010-01-01

    Parallel coordinates technique has been widely used in information visualization applications and it has achieved great success in visualizing multivariate data and perceiving their trends. Nevertheless, visual clutter usually weakens or even diminishes its ability when the data size increases. In this paper, we first propose a tile-based parallel coordinates, where the plotting area is divided into rectangular tiles. Each tile stores an intersection density that counts the total number of polylines intersecting with that tile. Consequently, the intersection density is mapped to optical attributes, such as color and opacity, by interactive transfer functions. The method visualizes the polylines efficiently and informatively in accordance with the density distribution, and thus, reduces visual cluttering and promotes knowledge discovery. The interactivity of our method allows the user to instantaneously manipulate the tiles distribution and the transfer functions. Specifically, the classic parallel coordinates rendering is a special case of our method when each tile represents only one pixel. A case study on a real world data set, U.S. stock mutual fund data of year 2006, is presented to show the capability of our method in visually analyzing financial data. The presented visual analysis is conducted by an expert in the domain of finance. Our method gains the support from professionals in the finance field, they embrace it as a potential investment analysis tool for mutual fund managers, financial planners, and investors.
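
    A small sketch (ours, with made-up data) of the tile-density idea for one pair of adjacent axes: each polyline segment is sampled, the samples are binned into rectangular tiles, and the per-tile counts are pushed through a simple transfer function to give opacities:

```python
# Per-tile intersection density for parallel coordinates (one axis pair).
import numpy as np

rng = np.random.default_rng(1)
left, right = rng.random(5000), rng.random(5000)   # normalized axis values

# Sample each straight segment from (x=0, left) to (x=1, right).
t = np.linspace(0.0, 1.0, 32)
xs = np.broadcast_to(t, (left.size, t.size)).ravel()
ys = (left[:, None] * (1.0 - t) + right[:, None] * t).ravel()

# Divide the plotting area into rectangular tiles and count samples per tile
# (an approximation of counting the polylines that intersect each tile).
density, _, _ = np.histogram2d(xs, ys, bins=(64, 64), range=[[0, 1], [0, 1]])

# A simple transfer function mapping density to opacity (log scaling).
opacity = np.log1p(density) / np.log1p(density.max())
print(opacity.shape, float(opacity.max()))
```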

  18. TME (Task Mapping Editor): tool for executing distributed parallel computing. TME user's manual

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Yamagishi, Nobuhiro; Imamura, Toshiyuki

    2000-03-01

    At the Center for Promotion of Computational Science and Engineering, a software environment, PPExe, has been developed to support scientific computing on a parallel computer cluster (distributed parallel scientific computing). TME (Task Mapping Editor) is one of the components of PPExe and provides a visual programming environment for distributed parallel scientific computing. Users can specify data dependence among tasks (programs) visually as a data flow diagram and map these tasks onto computers interactively through the GUI of TME. The specified tasks are processed by other components of PPExe such as the Meta-scheduler, RIM (Resource Information Monitor), and EMS (Execution Management System) according to the execution order of these tasks determined by TME. In this report, we describe the usage of TME. (author)
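
    A tiny stand-in (ours, not TME itself; the task names are hypothetical) for the data-flow idea: dependencies among tasks are declared as a graph and the standard-library graphlib module yields an execution order that respects them:

```python
# Derive an execution order for tasks from the data dependencies a TME user
# would draw as a data flow diagram.
from graphlib import TopologicalSorter

# task -> set of tasks whose output it consumes (hypothetical task names)
dependencies = {
    "mesh":      set(),
    "solve":     {"mesh"},
    "postproc":  {"solve"},
    "visualize": {"solve", "postproc"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)   # e.g. ['mesh', 'solve', 'postproc', 'visualize']
```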

  19. The Centre of High-Performance Scientific Computing, Geoverbund, ABC/J - Geosciences enabled by HPSC

    Science.gov (United States)

    Kollet, Stefan; Görgen, Klaus; Vereecken, Harry; Gasper, Fabian; Hendricks-Franssen, Harrie-Jan; Keune, Jessica; Kulkarni, Ketan; Kurtz, Wolfgang; Sharples, Wendy; Shrestha, Prabhakar; Simmer, Clemens; Sulis, Mauro; Vanderborght, Jan

    2016-04-01

    The Centre of High-Performance Scientific Computing (HPSC TerrSys) was founded in 2011 to establish a centre of competence in high-performance scientific computing in terrestrial systems and the geosciences, enabling fundamental and applied geoscientific research in the Geoverbund ABC/J (the geoscientific research alliance of the Universities of Aachen, Cologne, Bonn and the Research Centre Jülich, Germany). The specific goals of HPSC TerrSys are to achieve relevance at the national and international level in (i) the development and application of HPSC technologies in the geoscientific community; (ii) student education; (iii) HPSC services and support also to the wider geoscientific community; and in (iv) the industry and public sectors via e.g., useful applications and data products. A key feature of HPSC TerrSys is the Simulation Laboratory Terrestrial Systems, which is located at the Jülich Supercomputing Centre (JSC) and provides extensive capabilities with respect to porting, profiling, tuning and performance monitoring of geoscientific software in JSC's supercomputing environment. We will present a summary of success stories of HPSC applications including integrated terrestrial model development, parallel profiling and its application from watersheds to the continent; massively parallel data assimilation using physics-based models and ensemble methods; quasi-operational terrestrial water and energy monitoring; and convection permitting climate simulations over Europe. The success stories stress the need for a formalized education of students in the application of HPSC technologies in the future.

  20. PetClaw: Parallelization and Performance Optimization of a Python-Based Nonlinear Wave Propagation Solver Using PETSc

    KAUST Repository

    Alghamdi, Amal Mohammed

    2012-04-01

    Clawpack, a conservation laws package implemented in Fortran, and its Python-based version, PyClaw, are existing tools providing nonlinear wave propagation solvers that use state of the art finite volume methods. Simulations using those tools can have extensive computational requirements to provide accurate results. Therefore, a number of tools, such as BearClaw and MPIClaw, have been developed based on Clawpack to achieve significant speedup by exploiting parallel architectures. However, none of them has been shown to scale on a large number of cores. Furthermore, these tools, implemented in Fortran, achieve parallelization by inserting parallelization logic and MPI standard routines throughout the serial code in a non modular manner. Our contribution in this thesis research is three-fold. First, we demonstrate an advantageous use case of Python in implementing easy-to-use modular extensible scalable scientific software tools by developing an implementation of a parallelization framework, PetClaw, for PyClaw using the well-known Portable Extensible Toolkit for Scientific Computation, PETSc, through its Python wrapper petsc4py. Second, we demonstrate the possibility of getting acceptable Python code performance when compared to Fortran performance after introducing a number of serial optimizations to the Python code including integrating Clawpack Fortran kernels into PyClaw for low-level computationally intensive parts of the code. As a result of those optimizations, the Python overhead in PetClaw for a shallow water application is only 12 percent when compared to the corresponding Fortran Clawpack application. Third, we provide a demonstration of PetClaw scalability on up to the entirety of Shaheen; a 16-rack Blue Gene/P IBM supercomputer that comprises 65,536 cores and located at King Abdullah University of Science and Technology (KAUST). The PetClaw solver achieved above 0.98 weak scaling efficiency for an Euler application on the whole machine excluding the

  1. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    Full Text Available This paper describes an object-oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework prove to have good performance: approximately linear speedup and low communication cost.
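
    A skeletal sketch (ours, not the framework in the article) of the master-slave, message-passing organization it describes, written with mpi4py; the work items and the per-cycle computation are placeholders, and more work items than slave processes are assumed:

```python
# Master-slave organization: rank 0 hands out work items, the other ranks
# compute and return results.  Run with, e.g.:  mpiexec -n 4 python ms.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_WORK, TAG_STOP = 1, 2
N_ITEMS = 20                                      # placeholder work items

if rank == 0:
    work = list(range(N_ITEMS))
    results = []
    for dest in range(1, size):                   # prime every slave once
        comm.send(work.pop(), dest=dest, tag=TAG_WORK)
    while len(results) < N_ITEMS:
        status = MPI.Status()
        results.append(comm.recv(source=MPI.ANY_SOURCE, status=status))
        src = status.Get_source()
        if work:
            comm.send(work.pop(), dest=src, tag=TAG_WORK)
        else:
            comm.send(None, dest=src, tag=TAG_STOP)
    print(sorted(results))
else:
    while True:
        status = MPI.Status()
        item = comm.recv(source=0, status=status)
        if status.Get_tag() == TAG_STOP:
            break
        comm.send(item * item, dest=0)            # one processing cycle
```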

  2. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
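
    One of the load-balancing ingredients mentioned above, in a minimal form (ours): irregular task costs are assigned greedily, largest first, to whichever processor currently has the least load:

```python
# Greedy longest-task-first assignment of irregular work to processors.
import heapq
import random

random.seed(0)
costs = [random.randint(1, 100) for _ in range(40)]   # irregular task costs
n_procs = 4

loads = [(0, p) for p in range(n_procs)]               # (current load, processor)
heapq.heapify(loads)
assignment = {p: [] for p in range(n_procs)}

for c in sorted(costs, reverse=True):                   # largest tasks first
    load, p = heapq.heappop(loads)                      # least-loaded processor
    assignment[p].append(c)
    heapq.heappush(loads, (load + c, p))

per_proc = [sum(assignment[p]) for p in range(n_procs)]
print(per_proc, "imbalance:", max(per_proc) - min(per_proc))
```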

  3. SPINET: A Parallel Computing Approach to Spine Simulations

    Directory of Open Access Journals (Sweden)

    Peter G. Kropf

    1996-01-01

    Full Text Available Research in scientific programming enables us to realize more and more complex applications, and on the other hand, application-driven demands on computing methods and power are continuously growing. Therefore, interdisciplinary approaches become more widely used. The interdisciplinary SPINET project presented in this article applies modern scientific computing tools to biomechanical simulations: parallel computing and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help us to investigate and better understand the mechanisms of back pain and spinal injury. Two approaches have been used: the first uses the finite element method for high-performance simulations of static biomechanical models, and the second generates a simulation development tool for experimenting with different dynamic models. A finite element program for static analysis has been parallelized for the MUSIC machine. To solve the sparse system of linear equations, a conjugate gradient solver (iterative method) and a frontal solver (direct method) have been implemented. The preprocessor required for the frontal solver is written in the modern functional programming language SML, the solver itself in C, thus exploiting the characteristic advantages of both functional and imperative programming. The speedup analysis of both solvers shows very satisfactory results for this irregular problem. A mixed symbolic-numeric environment for rigid body system simulations is presented. It automatically generates C code from a problem specification expressed by the Lagrange formalism using Maple.
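
    For reference, a compact serial version (ours, not the SPINET code) of the conjugate gradient iteration mentioned above, applied to a small symmetric positive-definite test system:

```python
# Conjugate gradient iteration for a symmetric positive-definite system A x = b.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
M = rng.random((50, 50))
A = M @ M.T + 50 * np.eye(50)          # symmetric positive-definite test matrix
b = rng.random(50)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))       # residual should be tiny
```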

  4. Techniques and tools for measuring energy efficiency of scientific software applications

    International Nuclear Information System (INIS)

    Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Niemi, Tapio; Pestana, Gonçalo; Khan, Kashif; Nurminen, Jukka K; Nyback, Filip; Ou, Zhonghong

    2015-01-01

    The scale of scientific High Performance Computing (HPC) and High Throughput Computing (HTC) has increased significantly in recent years, and is becoming sensitive to total energy use and cost. Energy-efficiency has thus become an important concern in scientific fields such as High Energy Physics (HEP). There has been a growing interest in utilizing alternate architectures, such as low power ARM processors, to replace traditional Intel x86 architectures. Nevertheless, even though such solutions have been successfully used in mobile applications with low I/O and memory demands, it is unclear if they are suitable and more energy-efficient in the scientific computing environment. Furthermore, there is a lack of tools and experience to derive and compare power consumption between the architectures for various workloads, and eventually to support software optimizations for energy efficiency. To that end, we have performed several physical and software-based measurements of workloads from HEP applications running on ARM and Intel architectures, and compare their power consumption and performance. We leverage several profiling tools (both in hardware and software) to extract different characteristics of the power use. We report the results of these measurements and the experience gained in developing a set of measurement techniques and profiling tools to accurately assess the power consumption for scientific workloads. (paper)

  5. Application of software quality assurance to a specific scientific code development task

    International Nuclear Information System (INIS)

    Dronkers, J.J.

    1986-03-01

    This paper describes an application of software quality assurance to a specific scientific code development program. The software quality assurance program consists of three major components: administrative control, configuration management, and user documentation. The program attempts to be consistent with existing local traditions of scientific code development while at the same time providing a controlled process of development.

  6. Neighborhood communication paradigm to increase scalability in large-scale dynamic scientific applications

    KAUST Repository

    Ovcharenko, Aleksandr; Ibanez, Daniel; Delalondre, Fabien; Sahni, Onkar; Jansen, Kenneth E.; Carothers, Christopher D.; Shephard, Mark S.

    2012-01-01

    packing to manage message flow control and reduce the number and time of communication calls. The test application demonstrated is parallel unstructured mesh adaptation. Results on IBM Blue Gene/P and Cray XE6 computers show that the use of neighborhood

  7. A SPECT reconstruction method for extending parallel to non-parallel geometries

    International Nuclear Information System (INIS)

    Wen Junhai; Liang Zhengrong

    2010-01-01

    Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.

  8. Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

    DEFF Research Database (Denmark)

    The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems; power monitoring; energy monitoring; and distributed computing.

  9. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  10. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  11. Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation

    KAUST Repository

    Wu, Yuanqing

    2015-09-08

    In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoirs. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoirs. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called the Darcy-Brinkman-Forchheimer (DBF) framework to simulate wormhole propagation. Furthermore, to process the large number of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition, a new technique called the “experimenting field approach” is introduced to set coefficients in the model equations. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more wormhole-like and more stable than those of the Darcy framework, demonstrating the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed
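
    A minimal sketch of the surrogate idea described above, assuming a uniform one-dimensional lookup table in place of a true sparse grid in parameter space; the names expensive_flash and surrogate_eval are hypothetical and not taken from the thesis:

    /* Hypothetical illustration: tabulate an expensive per-time-step
       calculation once before the simulation, then replace it with cheap
       interpolation inside the time loop.  A uniform 1-D table stands in
       for the sparse grid; all names are invented for this sketch. */
    #include <stdio.h>
    #include <math.h>

    #define N_NODES 33

    static double table[N_NODES];
    static const double p_min = 0.0, p_max = 1.0;

    /* Stand-in for the expensive flash calculation (hypothetical model). */
    static double expensive_flash(double p) { return exp(-p) * sin(3.0 * p); }

    /* Build the surrogate once, before time stepping begins. */
    static void surrogate_build(void) {
        for (int i = 0; i < N_NODES; ++i) {
            double p = p_min + (p_max - p_min) * i / (N_NODES - 1);
            table[i] = expensive_flash(p);
        }
    }

    /* Evaluate the surrogate by linear interpolation during the simulation. */
    static double surrogate_eval(double p) {
        double x = (p - p_min) / (p_max - p_min) * (N_NODES - 1);
        int i = (int)x;
        if (i < 0) i = 0;
        if (i > N_NODES - 2) i = N_NODES - 2;
        double w = x - i;
        return (1.0 - w) * table[i] + w * table[i + 1];
    }

    int main(void) {
        surrogate_build();
        for (int step = 0; step < 5; ++step) {        /* time loop */
            double p = 0.1 + 0.2 * step;              /* pressure-like parameter */
            printf("step %d: surrogate %.6f  exact %.6f\n",
                   step, surrogate_eval(p), expensive_flash(p));
        }
        return 0;
    }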

  12. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  13. Design Issues and Application of Cable-Based Parallel Manipulators for Rehabilitation Therapy

    Directory of Open Access Journals (Sweden)

    E. Ottaviano

    2008-01-01

    Full Text Available In this study, cable-based manipulators are proposed for application in rehabilitation therapies. Cable-based manipulators show good features that are very useful when the system has to interact with humans. In particular, they can be used to aid motion or as monitoring/training systems in rehabilitation therapies. Modelling and simulation of both active and passive cable-based parallel manipulators are presented for an application to help older people, patients or disabled people in the sit-to-stand transfer and as a monitoring/training system. Experimental results are presented by using built prototypes.

  14. Scientific Programming with High Performance Fortran: A Case Study Using the xHPF Compiler

    Directory of Open Access Journals (Sweden)

    Eric De Sturler

    1997-01-01

    Full Text Available Recently, the first commercial High Performance Fortran (HPF) subset compilers have appeared. This article reports on our experiences with the xHPF compiler of Applied Parallel Research, version 1.2, for the Intel Paragon. At this stage, we do not expect very high performance from our HPF programs, even though performance will eventually be of paramount importance for the acceptance of HPF. Instead, our primary objective is to study how to convert large Fortran 77 (F77) programs to HPF such that the compiler generates reasonably efficient parallel code. We report on a case study that identifies several problems when parallelizing code with HPF; most of these problems affect current HPF compiler technology in general, although some are specific to the xHPF compiler. We discuss our solutions from the perspective of the scientific programmer, and present timing results on the Intel Paragon. The case study comprises three programs of different complexity with respect to parallelization. We use the dense matrix-matrix product to show that the distribution of arrays and the order of nested loops significantly influence the performance of the parallel program. We use Gaussian elimination with partial pivoting to study the parallelization strategy of the compiler. There are various ways to structure this algorithm for a particular data distribution. This example shows how much effort may be demanded from the programmer to support the compiler in generating an efficient parallel implementation. Finally, we use a small application to show that the more complicated structure of a larger program may introduce problems for the parallelization, even though all subroutines of the application are easy to parallelize by themselves. The application consists of a finite volume discretization on a structured grid and a nested iterative solver. Our case study shows that it is possible to obtain reasonably efficient parallel programs with xHPF, although the compiler
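
    The abstract's point that array distribution and nested-loop order drive performance can be illustrated, independently of HPF and the Paragon, by the classic loop-order variants of a dense matrix product; this generic C sketch is not taken from the case study:

    /* Generic illustration of how nested-loop order changes memory access
       patterns: with row-major arrays, the i-k-j order walks B and C
       row-wise (cache-friendly), whereas i-j-k strides down columns of B. */
    #include <stdio.h>

    #define N 256

    static double A[N][N], B[N][N], C[N][N];

    static void matmul_ijk(void) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                double s = 0.0;
                for (int k = 0; k < N; ++k)
                    s += A[i][k] * B[k][j];   /* column-wise walk of B */
                C[i][j] = s;
            }
    }

    static void matmul_ikj(void) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) C[i][j] = 0.0;
        for (int i = 0; i < N; ++i)
            for (int k = 0; k < N; ++k) {
                double a = A[i][k];
                for (int j = 0; j < N; ++j)
                    C[i][j] += a * B[k][j];   /* row-wise walk of B and C */
            }
    }

    int main(void) {
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) { A[i][j] = 1.0; B[i][j] = 2.0; }
        matmul_ijk();
        printf("ijk: C[0][0] = %.1f\n", C[0][0]);
        matmul_ikj();
        printf("ikj: C[0][0] = %.1f\n", C[0][0]);
        return 0;
    }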

  15. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from the discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite-dimensional system from the continuous problem, and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating-point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
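
    As a rough sketch of the communication pattern involved (not the paper's code), the following C/MPI fragment shows a conjugate gradient loop in which each rank owns one block of the decomposed domain, the matrix-vector product is purely local (a diagonal matrix stands in for the discretized PDE, so no halo exchange is shown), and the two dot products per iteration become global reductions:

    /* Sketch of distributed conjugate gradient: local mat-vec, global
       reductions for the dot products.  The diagonal "matrix" is a
       placeholder for a real discretized operator with halo exchange. */
    #include <stdio.h>
    #include <mpi.h>

    #define NLOC 1000   /* unknowns owned by each rank */

    static double dot(const double *x, const double *y) {
        double loc = 0.0, glob = 0.0;
        for (int i = 0; i < NLOC; ++i) loc += x[i] * y[i];
        MPI_Allreduce(&loc, &glob, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return glob;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank; MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double a[NLOC], b[NLOC], x[NLOC], r[NLOC], p[NLOC], ap[NLOC];
        for (int i = 0; i < NLOC; ++i) {          /* local block of A, b, x0 */
            a[i] = 2.0 + (rank * NLOC + i) % 3;   /* diagonal entries */
            b[i] = 1.0;
            x[i] = 0.0; r[i] = b[i]; p[i] = r[i];
        }
        double rho = dot(r, r);
        for (int it = 0; it < 50 && rho > 1e-20; ++it) {
            for (int i = 0; i < NLOC; ++i) ap[i] = a[i] * p[i];   /* local A*p */
            double alpha = rho / dot(p, ap);
            for (int i = 0; i < NLOC; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * ap[i]; }
            double rho_new = dot(r, r);
            double beta = rho_new / rho;
            for (int i = 0; i < NLOC; ++i) p[i] = r[i] + beta * p[i];
            rho = rho_new;
        }
        if (rank == 0) printf("final residual norm^2 = %e\n", rho);
        MPI_Finalize();
        return 0;
    }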

  16. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  17. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-01-10

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
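
    A user-level approximation of the mechanism described in these two records can be sketched with a non-blocking barrier: each rank lowers its own power state as soon as it begins the blocking phase and restores it once the barrier completes, i.e. once every rank has begun. The set_node_power function below is hypothetical (the patents target firmware/hardware controls); only the MPI calls are real:

    /* Hypothetical sketch: lower power while waiting for the other ranks to
       reach a blocking operation, restore it once all ranks have begun. */
    #include <stdio.h>
    #include <mpi.h>

    /* Placeholder for a platform-specific power control (hypothetical). */
    static void set_node_power(const char *level) {
        printf("power level -> %s\n", level);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        /* ... application work before the blocking operation ... */

        MPI_Request req;
        MPI_Ibarrier(MPI_COMM_WORLD, &req);   /* signal: this rank has begun */
        set_node_power("reduced");            /* reduce power while waiting */
        MPI_Wait(&req, MPI_STATUS_IGNORE);    /* completes once all ranks have begun */
        set_node_power("nominal");            /* restore power */

        /* ... the actual blocking collective would follow ... */
        MPI_Barrier(MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }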

  18. Advanced scientific computational methods and their applications to nuclear technologies. (3) Introduction of continuum simulation methods and their applications (3)

    International Nuclear Information System (INIS)

    Satake, Shin-ichi; Kunugi, Tomoaki

    2006-01-01

    Scientific computational methods have advanced remarkably with the progress of nuclear development. They have played the role of a weft connecting the various realms of nuclear engineering, and an introductory course on advanced scientific computational methods and their applications to nuclear technologies has therefore been prepared in serial form. This third issue introduces continuum simulation methods and their applications. Spectral methods and multi-interface calculation methods in fluid dynamics are reviewed. (T. Tanaka)

  19. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing for a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  20. Exploring HPCS languages in scientific computing

    International Nuclear Information System (INIS)

    Barrett, R F; Alam, S R; Almeida, V F d; Bernholdt, D E; Elwasif, W R; Kuehn, J A; Poole, S W; Shet, A G

    2008-01-01

    As computers scale up dramatically to tens and hundreds of thousands of cores, develop deeper computational and memory hierarchies, and exhibit increased heterogeneity, developers of scientific software are increasingly challenged to express complex parallel simulations effectively and efficiently. In this paper, we explore the three languages developed under the DARPA High-Productivity Computing Systems (HPCS) program to help address these concerns: Chapel, Fortress, and X10. These languages provide a variety of features not found in currently popular HPC programming environments and make it easier to express powerful computational constructs, leading to new ways of thinking about parallel programming. Though the languages and their implementations are not yet mature enough for a comprehensive evaluation, we discuss some of the important features, and provide examples of how they can be used in scientific computing. We believe that these characteristics will be important to the future of high-performance scientific computing, whether the ultimate language of choice is one of the HPCS languages or something else.

  1. Exploring HPCS languages in scientific computing

    Science.gov (United States)

    Barrett, R. F.; Alam, S. R.; Almeida, V. F. d.; Bernholdt, D. E.; Elwasif, W. R.; Kuehn, J. A.; Poole, S. W.; Shet, A. G.

    2008-07-01

    As computers scale up dramatically to tens and hundreds of thousands of cores, develop deeper computational and memory hierarchies, and exhibit increased heterogeneity, developers of scientific software are increasingly challenged to express complex parallel simulations effectively and efficiently. In this paper, we explore the three languages developed under the DARPA High-Productivity Computing Systems (HPCS) program to help address these concerns: Chapel, Fortress, and X10. These languages provide a variety of features not found in currently popular HPC programming environments and make it easier to express powerful computational constructs, leading to new ways of thinking about parallel programming. Though the languages and their implementations are not yet mature enough for a comprehensive evaluation, we discuss some of the important features, and provide examples of how they can be used in scientific computing. We believe that these characteristics will be important to the future of high-performance scientific computing, whether the ultimate language of choice is one of the HPCS languages or something else.

  2. CCD developed for scientific application by Hamamatsu

    CERN Document Server

    Miyaguchi, K; Dezaki, J; Yamamoto, K

    1999-01-01

    We have developed CCDs for scientific applications that feature a low readout noise of less than 5 e- rms and a low dark current of 10-25 pA/cm² at room temperature. CCDs with these characteristics will prove extremely useful in applications such as spectroscopic measurement and dental radiography. In addition, a large-area CCD of 2kx4k pixels and 15 μm square pixel size has recently been completed for optical use in astronomical observations. Applications to X-ray astronomy require the most challenging device performance in terms of deep depletion, high CTE, and focal plane size, among others. An abuttable X-ray CCD, having 1024x1024 pixels and 24 μm square pixel size, is to be installed on the International Space Station (ISS). We are now striving to achieve the lowest usable cooling temperature by means of a built-in TEC with limited power consumption. Details on the development status are described in this report. We would also like to present our future plans for a large active area and deep depleti...

  3. Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code

    Science.gov (United States)

    Longoni, Gianluca; Anderson, Stanwood L.

    2009-08-01

    The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development of RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based on domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained; Section 3 addresses the parallel performance of the code; and Section 4 concludes this paper with final remarks and future work.

  4. Parallel multigrid methods: implementation on message-passing computers and applications to fluid dynamics. A draft

    International Nuclear Information System (INIS)

    Solchenbach, K.; Thole, C.A.; Trottenberg, U.

    1987-01-01

    For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experience from model to real-life problems, and systematic experimental work. The emphasis of the practical research activity today lies - among others - in the following fields: finding efficient multigrid components for really complex problems; combining the multigrid approach with advanced discretization techniques; and constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic.

  5. Parallel kinematics type, kinematics, and optimal design

    CERN Document Server

    Liu, Xin-Jun

    2014-01-01

    Parallel Kinematics - Type, Kinematics, and Optimal Design presents the results of 15 years' research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms, in the field of machine tools where they are also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others.   This book is intended for researchers, scientists, engineers and postgraduates or above with interes...

  6. Parallel Application Development Using Architecture View Driven Model Transformations

    NARCIS (Netherlands)

    Arkin, E.; Tekinerdogan, B.

    2015-01-01

    To meet the increased need for computing performance, the current trend is towards applying parallel computing, in which the tasks are run in parallel on multiple nodes. In turn, we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity

  7. Workshop on scientific and industrial applications of free electron lasers

    International Nuclear Information System (INIS)

    Difilippo, F.C.; Perez, R.B.

    1990-05-01

    A Workshop on Scientific and Industrial Applications of Free Electron Lasers was organized to address potential uses of a Free Electron Laser in the infrared wavelength region. A total of 13 speakers from national laboratories, universities, and the industry gave seminars to an average audience of 30 persons during June 12 and 13, 1989. The areas covered were: Free Electron Laser Technology, Chemistry and Surface Science, Atomic and Molecular Physics, Condensed Matter, and Biomedical Applications, Optical Damage, and Optoelectronics

  8. Award for Distinguished Scientific Applications of Psychology: Nancy E. Adler

    Science.gov (United States)

    American Psychologist, 2009

    2009-01-01

    Nancy E. Adler, winner of the Award for Distinguished Scientific Applications of Psychology, is cited for her research on reproductive health examining adolescent decision making with regard to contraception, conscious and preconscious motivations for pregnancy, and perception of risk for sexually transmitted diseases, and for her groundbreaking…

  9. Where are the parallel algorithms?

    Science.gov (United States)

    Voigt, R. G.

    1985-01-01

    Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.

  10. Analysis and design of a parallel-connected single active bridge DC-DC converter for high-power wind farm applications

    DEFF Research Database (Denmark)

    Park, Kiwoo; Chen, Zhe

    2013-01-01

    This paper presents a parallel-connected Single Active Bridge (SAB) dc-dc converter for high-power applications. Paralleling lower-power converters can lower the current rating of each modular converter and interleaving the outputs can significantly reduce the magnitudes of input and output curre...

  11. Massively parallel fabrication of repetitive nanostructures: nanolithography for nanoarrays

    International Nuclear Information System (INIS)

    Luttge, Regina

    2009-01-01

    This topical review provides an overview of nanolithographic techniques for nanoarrays. Using patterning techniques such as lithography, we normally aim for a higher-order architecture similar to functional systems in nature. Inspired by the wealth of complexity in nature, these architectures are translated into technical devices, for example those found in integrated circuitry or other systems in which structural elements work as discrete building blocks in microdevices. Ordered artificial nanostructures (arrays of pillars, holes and wires) have shown particular properties and bring about the opportunity to modify and tune the device operation. Moreover, these nanostructures deliver new applications, for example, the nanoscale control of spin direction within a nanomagnet. Subsequently, we can look for applications where this unique property of the smallest manufactured element is repetitively used such as, for example with respect to spin, in nanopatterned magnetic media for data storage. These nanostructures are generally called nanoarrays. Most of these applications require massively parallel produced nanopatterns which can be directly realized by laser interference (areas up to 4 cm² are easily achieved with a Lloyd's mirror set-up). In this topical review we will further highlight the application of laser interference as a tool for nanofabrication, its limitations and ultimate advantages towards a variety of devices including nanostructuring for photonic crystal devices, high-resolution patterned media and surface modifications of medical implants. The unique properties of nanostructured surfaces have also found applications in biomedical nanoarrays used either for diagnostic or functional assays including catalytic reactions on chip. Bio-inspired templated nanoarrays will be presented in perspective with other massively parallel nanolithography techniques currently discussed in the scientific literature. (topical review)

  12. High performance parallel computing of flows in complex geometries: II. Applications

    International Nuclear Information System (INIS)

    Gourdain, N; Gicquel, L; Staffelbach, G; Vermorel, O; Duchaine, F; Boussuge, J-F; Poinsot, T

    2009-01-01

    Present regulations in terms of pollutant emissions, noise and economic constraints require new approaches and designs in the fields of energy supply and transportation. It is now well established that the next breakthrough will come from a better understanding of unsteady flow effects and from considering the entire system and not only isolated components. However, these aspects are still not well taken into account by the numerical approaches, nor well understood, whatever the design stage considered. The main challenge is essentially due to the computational requirements imposed by such complex systems if they are to be simulated by use of supercomputers. This paper shows how new challenges can be addressed by using parallel computing platforms for distinct elements of a more complex system as encountered in aeronautical applications. Based on numerical simulations performed with modern aerodynamic and reactive flow solvers, this work underlines the interest of high-performance computing for solving flow in complex industrial configurations such as aircraft, combustion chambers and turbomachines. Performance indicators related to parallel computing efficiency are presented, showing that establishing fair criteria is a difficult task for complex industrial applications. Examples of numerical simulations performed in industrial systems are also described, with particular interest in the computational time and the potential design improvements obtained with high-fidelity and multi-physics computing methods. These simulations use either unsteady Reynolds-averaged Navier-Stokes methods or large eddy simulation and deal with turbulent unsteady flows, such as coupled flow phenomena (thermo-acoustic instabilities, buffet, etc.). Some examples of the difficulties with grid generation and data analysis are also presented when dealing with these complex industrial applications.

  13. Integrating multiple scientific computing needs via a Private Cloud infrastructure

    International Nuclear Information System (INIS)

    Bagnasco, S; Berzano, D; Brunetti, R; Lusso, S; Vallero, S

    2014-01-01

    In a typical scientific computing centre, diverse applications coexist and share a single physical infrastructure. An underlying Private Cloud facility eases the management and maintenance of heterogeneous use cases such as multipurpose or application-specific batch farms, Grid sites catering to different communities, parallel interactive data analysis facilities and others. It allows resources to be dynamically and efficiently allocated to any application and the virtual machines to be tailored according to the applications' requirements. Furthermore, the maintenance of large deployments of complex and rapidly evolving middleware and application software is eased by the use of virtual images and contextualization techniques; for example, rolling updates can be performed easily and with minimal downtime. In this contribution we describe the Private Cloud infrastructure at the INFN-Torino Computer Centre, which hosts a full-fledged WLCG Tier-2 site, a dynamically expandable PROOF-based Interactive Analysis Facility for the ALICE experiment at the CERN LHC, and several smaller scientific computing applications. The Private Cloud building blocks include the OpenNebula software stack, the GlusterFS filesystem (used in two different configurations for worker- and service-class hypervisors) and the OpenWRT Linux distribution (used for network virtualization). A future integration into a federated higher-level infrastructure is made possible by exposing commonly used APIs like EC2 and by using mainstream contextualization tools like CloudInit.

  14. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  15. The graphics future in scientific applications

    International Nuclear Information System (INIS)

    Enderle, G.

    1982-01-01

    Computer graphics methods and tools are being used to a great extent in scientific research. The future development in this area will be influenced both by new hardware developments and by software advances. On the hardware side, the development of raster technology will lead to the increased use of colour workstations with more local processing power. Colour hardcopy devices for creating plots, slides, or movies will be available at a lower price than today. The first real 3D workstations are appearing on the market. One of the main activities in the software sector is the standardization of computer graphics systems, graphical files, and device interfaces. This will lead to more portable graphical application programs and to a common base for computer graphics education. (orig.)

  16. National Laboratory for Advanced Scientific Visualization at UNAM - Mexico

    Science.gov (United States)

    Manea, Marina; Constantin Manea, Vlad; Varela, Alfredo

    2016-04-01

    In 2015, the National Autonomous University of Mexico (UNAM) joined the family of universities and research centers where advanced visualization and computing plays a key role in promoting and advancing missions in research, education, community outreach, as well as business-oriented consulting. This initiative provides access to a great variety of advanced hardware and software resources and offers a range of consulting services that spans a variety of areas related to scientific visualization, among which are: neuroanatomy, embryonic development, genome-related studies, geosciences, geography, and physics- and mathematics-related disciplines. The National Laboratory for Advanced Scientific Visualization delivers services through three main infrastructure environments: the 3D fully immersive display system Cave, the high-resolution parallel visualization system Powerwall, and the high-resolution spherical display Earth Simulator. The entire visualization infrastructure is interconnected to a high-performance computing cluster (HPCC) called ADA, in honor of Ada Lovelace, considered to be the first computer programmer. The Cave is an extra-large, 3.6 m wide room with images projected on the front, left and right walls as well as the floor. Specialized crystal-eyes LCD-shutter glasses provide a strong stereo depth perception, and a variety of tracking devices allow software to track the position of a user's hand, head and wand. The Powerwall is designed to bring large amounts of complex data together through parallel computing for team interaction and collaboration. This system is composed of 24 (6x4) high-resolution ultra-thin (2 mm) bezel monitors connected to a high-performance GPU cluster. The Earth Simulator is a large (60") high-resolution spherical display used for global-scale data visualization such as geophysical, meteorological, climate and ecology data. The HPCC-ADA is a 1000+ computing-core system, which offers parallel computing resources to applications that require

  17. Educational and Scientific Applications of the Time Navigator

    Science.gov (United States)

    Cole, M.; Snow, J. T.; Slatt, R. M.

    2001-05-01

    Several recent conferences have noted the need to focus on the evolving interface between research and education at all levels of science, mathematics, engineering, and technology education. This interface, which is a distinguishing feature of graduate education in the U.S., is increasingly in demand at the undergraduate and K-12 levels, particularly in the earth sciences. In this talk, we present a new database for earth systems science and will explore applications to K-12 and undergraduate education, as well as the scientific and graduate role. The University of Oklahoma, College of Geosciences is in the process of acquiring the Time Navigator, a multi-disciplinary, multimedia database, which will form the core asset of the Center for Earth Systems Science. The Center, whose mission is to further the understanding of the dynamic Earth within both the academic and the general public communities, will serve as a portal for research, information, and education for scientists and educators. The Time Navigator was developed over a period of some twenty years by the noted British geoscience author, Ron Redfern, in connection with the recently published "Origins, the evolution of continents, oceans and life", the third in a series of books for the educated layperson. Over the years the Time Navigator has evolved into an interactive, multimedia database displaying much of the significant geological, paleontological, climatological, and tectonic events from the latest Proterozoic (750 MYA) through to the present. The focus is mainly on the Western Hemisphere and events associated with the coalescence and breakup of Pangea and the evolution of the earth into its present form. "Origins" will be available as early as Fall 2001 as an interactive electronic book for the general, scientifically-literate public. While electronic books are unlikely to replace traditional print books, the format does allow non-linear exploration of content. We believe that the

  18. ForistomApp a Web application for scientific and technological information management of Forsitom foundation

    Science.gov (United States)

    Saavedra-Duarte, L. A.; Angarita-Jerardino, A.; Ruiz, P. A.; Dulce-Moreno, H. J.; Vera-Rivera, F. H.; V-Niño, E. D.

    2017-12-01

    Information and Communication Technologies (ICT) are essential in the transfer of knowledge, and Web tools, as part of ICT, are important for institutions seeking greater visibility of the products developed by their researchers. For this reason, we implemented an application that allows the information management of the FORISTOM Foundation (Foundation of Researchers in Science and Technology of Materials). The application provides a detailed description not only of all its members but also of all the scientific production they carry out, such as technological developments, research projects, articles, presentations, among others. This application can be implemented by other entities committed to scientific dissemination and the transfer of technology and knowledge.

  19. Porting of Scientific Applications to Grid Computing on GridWay

    Directory of Open Access Journals (Sweden)

    J. Herrera

    2005-01-01

    Full Text Available The expansion and adoption of Grid technologies are prevented by the lack of a standard programming paradigm to port existing applications among different environments. The Distributed Resource Management Application API (DRMAA) has been proposed to aid the rapid development and distribution of these applications across different Distributed Resource Management Systems. In this paper we describe an implementation of the DRMAA standard on a Globus-based testbed, and show its suitability to express typical scientific applications, like High-Throughput and Master-Worker applications. The DRMAA routines are supported by the functionality offered by the GridWay2 framework, which provides the runtime mechanisms needed for transparently executing jobs on a dynamic Grid environment based on Globus. As case studies, we consider the implementation with DRMAA of a bioinformatics application, a genetic algorithm and the NAS Grid Benchmarks.

  20. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant for the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that takes the heterogeneity of concrete into account. Developed in the Fortran programming language, the code uses a master/slave programming mode, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. Speedup tests showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  1. Kelly D. Brownell: Award for Distinguished Scientific Applications of Psychology

    Science.gov (United States)

    American Psychologist, 2012

    2012-01-01

    Presents a short biography of Kelly D. Brownell, winner of the American Psychological Association's Award for Distinguished Scientific Applications of Psychology (2012). He won the award for outstanding contributions to our understanding of the etiology and management of obesity and the crisis it poses for the modern world. A seminal thinker in…

  2. Secure Scientific Applications Scheduling Technique for Cloud Computing Environment Using Global League Championship Algorithm.

    Science.gov (United States)

    Abdulhamid, Shafi'i Muhammad; Abd Latiff, Muhammad Shafie; Abdul-Salaam, Gaddafi; Hussain Madni, Syed Hamid

    2016-01-01

    A cloud computing system is a huge cluster of interconnected servers residing in a datacenter and dynamically provisioned to clients on-demand via a front-end interface. Scientific application scheduling in the cloud computing environment is identified as an NP-hard problem due to the dynamic nature of heterogeneous resources. Recently, a number of metaheuristic optimization schemes have been applied to address the challenges of application scheduling in the cloud system, without much emphasis on the issue of secure global scheduling. In this paper, a scientific application scheduling technique using the Global League Championship Algorithm (GBLCA) is first presented for global task scheduling in the cloud environment. The experiment is carried out using the CloudSim simulator. The experimental results show that the proposed GBLCA technique produced a remarkable improvement in makespan, ranging from 14.44% to 46.41%. It also shows a significant reduction in the time taken to securely schedule applications, as measured in terms of response time. In view of the experimental results, the proposed technique provides a better-quality scheduling solution, more suitable for scientific application task execution in the cloud computing environment than the MinMin, MaxMin, Genetic Algorithm (GA) and Ant Colony Optimization (ACO) scheduling techniques.

  3. Secure Scientific Applications Scheduling Technique for Cloud Computing Environment Using Global League Championship Algorithm

    Science.gov (United States)

    Abdulhamid, Shafi’i Muhammad; Abd Latiff, Muhammad Shafie; Abdul-Salaam, Gaddafi; Hussain Madni, Syed Hamid

    2016-01-01

    A cloud computing system is a huge cluster of interconnected servers residing in a datacenter and dynamically provisioned to clients on-demand via a front-end interface. Scientific application scheduling in the cloud computing environment is identified as an NP-hard problem due to the dynamic nature of heterogeneous resources. Recently, a number of metaheuristic optimization schemes have been applied to address the challenges of application scheduling in the cloud system, without much emphasis on the issue of secure global scheduling. In this paper, a scientific application scheduling technique using the Global League Championship Algorithm (GBLCA) is first presented for global task scheduling in the cloud environment. The experiment is carried out using the CloudSim simulator. The experimental results show that the proposed GBLCA technique produced a remarkable improvement in makespan, ranging from 14.44% to 46.41%. It also shows a significant reduction in the time taken to securely schedule applications, as measured in terms of response time. In view of the experimental results, the proposed technique provides a better-quality scheduling solution, more suitable for scientific application task execution in the cloud computing environment than the MinMin, MaxMin, Genetic Algorithm (GA) and Ant Colony Optimization (ACO) scheduling techniques. PMID:27384239

  4. Parallel computing in genomic research: advances and applications.

    Science.gov (United States)

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of terabytes and petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied to reduce the total processing time and to ease the management, treatment, and analysis of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing units requires expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists to process their genomic experiments using HPC capabilities and parallelism techniques. This article presents a systematic review of the literature surveying the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  5. Architecture of top down, parallel pattern recognition system TOPS and its application to the MR head images

    International Nuclear Information System (INIS)

    Matsunoshita, Jun-ichi; Akamatsu, Shigeo; Yamamoto, Shinji.

    1993-01-01

    This paper describes the system architecture of a new image recognition system, TOPS (top-down parallel pattern recognition system), and its application to the automatic extraction of brain organs (cerebrum, cerebellum, brain stem) from 3D MRI images. The main concepts of TOPS are as follows: (1) TOPS is a top-down recognition system, which allows parallel models at each level of the hierarchy structure. (2) TOPS allows parallel image processing algorithms for one purpose (for example, for the extraction of one specific organ). This results in multiple candidates for one purpose, and the judgment needed to obtain a unique solution is made at an upper level of the hierarchy structure. (author)

  6. Development of imaging and reconstructions algorithms on parallel processing architectures for applications in non-destructive testing

    International Nuclear Information System (INIS)

    Pedron, Antoine

    2013-01-01

    This thesis work lies between the scientific domain of ultrasound non-destructive testing and algorithm-architecture matching. Ultrasound non-destructive testing includes a group of analysis techniques used in science and industry to evaluate the properties of a material, component, or system without causing damage. In order to characterise possible defects, determining their position, size and shape, imaging and reconstruction tools have been developed at CEA-LIST, within the CIVA software platform. The evolution of acquisition sensors implies a continuous growth of datasets, and consequently more and more computing power is needed to maintain interactive reconstructions. General-purpose processors (GPP) evolving towards parallelism and emerging architectures such as GPUs allow large acceleration possibilities that can be applied to these algorithms. The main goal of the thesis is to evaluate the acceleration that can be obtained for two reconstruction algorithms on these architectures. These two algorithms differ in their parallelization scheme. The first one can be properly parallelized on GPP, whereas on GPU an intensive use of atomic instructions is required. Within the second algorithm, parallelism is easier to express, but loop ordering on GPP, as well as thread scheduling and a good use of shared memory on GPU, are necessary in order to obtain efficient results. Different APIs and libraries, such as OpenMP, CUDA and OpenCL, are evaluated through chosen benchmarks. An integration of both algorithms in the CIVA software platform is proposed and different issues related to code maintenance and durability are discussed. (author) [fr]

  7. A Visual Database System for Image Analysis on Parallel Computers and its Application to the EOS Amazon Project

    Science.gov (United States)

    Shapiro, Linda G.; Tanimoto, Steven L.; Ahrens, James P.

    1996-01-01

    The goal of this task was to create a design and prototype implementation of a database environment that is particularly suited for handling the image, vision and scientific data associated with NASA's EOS Amazon project. The focus was on a data model and query facilities that are designed to execute efficiently on parallel computers. A key feature of the environment is an interface which allows a scientist to specify high-level directives about how query execution should occur.

  8. A parallel implementation of particle tracking with space charge effects on an INTEL iPSC/860

    International Nuclear Information System (INIS)

    Chang, L.; Bourianoff, G.; Cole, B.; Machida, S.

    1993-05-01

    Particle-tracking simulation is one of the scientific applications that is well-suited to parallel computations. At the Superconducting Super Collider, it has been theoretically and empirically demonstrated that particle tracking on a designed lattice can achieve very high parallel efficiency on a MIMD Intel iPSC/860 machine. The key to such success is the realization that the particles can be tracked independently without considering their interaction. The perfectly parallel nature of particle tracking is broken if the interaction effects between particles are included. The space charge introduces an electromagnetic force that will affect the motion of tracked particles in 3-D space. For accurate modeling of the beam dynamics with space charge effects, one needs to solve three-dimensional Maxwell field equations, usually by a particle-in-cell (PIC) algorithm. This will require each particle to communicate with its neighbor grids to compute the momentum changes at each time step. It is expected that the 3-D PIC method will degrade parallel efficiency of particle-tracking implementation on any parallel computer. In this paper, we describe an efficient scheme for implementing particle tracking with space charge effects on an INTEL iPSC/860 machine. Experimental results show that a parallel efficiency of 75% can be obtained
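
    The embarrassingly parallel tracking phase described above maps naturally onto MPI; the sketch below (a toy linear one-turn map, not the SSC lattice or the PIC space-charge solver) shows each rank tracking its own particles independently, with a single reduction gathering a global statistic:

    /* Toy sketch of embarrassingly parallel particle tracking: each rank
       advances its own particles through a linear one-turn map, so no
       communication is needed until a final reduction.  A space-charge
       (PIC) step would add per-turn communication with neighbouring grid
       cells, which is what degrades the parallel efficiency discussed. */
    #include <stdio.h>
    #include <math.h>
    #include <mpi.h>

    #define N_PART 10000    /* particles per rank */
    #define N_TURN 1000

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        static double x[N_PART], xp[N_PART];
        const double mu = 0.31 * 2.0 * acos(-1.0);   /* toy phase advance per turn */
        const double c = cos(mu), s = sin(mu);

        for (int i = 0; i < N_PART; ++i) {           /* rank-dependent initial conditions */
            x[i]  = 1e-3 * (i + 1) / N_PART + 1e-6 * rank;
            xp[i] = 0.0;
        }
        for (int t = 0; t < N_TURN; ++t)             /* independent tracking: no communication */
            for (int i = 0; i < N_PART; ++i) {
                double xn  =  c * x[i] + s * xp[i];
                double xpn = -s * x[i] + c * xp[i];
                x[i] = xn; xp[i] = xpn;
            }

        double local = 0.0, global = 0.0;            /* e.g. sum of squared amplitudes */
        for (int i = 0; i < N_PART; ++i) local += x[i] * x[i] + xp[i] * xp[i];
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("mean amplitude^2 over %d particles: %e\n",
                   size * N_PART, global / (size * N_PART));
        MPI_Finalize();
        return 0;
    }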

  9. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chip in a high-performance workstation. In terms of high-performance computation capabilities, GPUs deliver much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  10. Conical Refraction of Elastic Waves by Anisotropic Metamaterials and Application for Parallel Translation of Elastic Waves.

    Science.gov (United States)

    Ahn, Young Kwan; Lee, Hyung Jin; Kim, Yoon Young

    2017-08-30

    Conical refraction, which is quite well-known in electromagnetic waves, has not been explored well in elastic waves due to the lack of proper natural elastic media. Here, we propose and design a unique anisotropic elastic metamaterial slab that realizes conical refraction for horizontally incident longitudinal or transverse waves; the single-mode wave is split into two oblique coupled longitudinal-shear waves. As an interesting application, we carried out an experiment of parallel translation of an incident elastic wave system through the anisotropic metamaterial slab. The parallel translation can be useful for ultrasonic non-destructive testing of a system hidden by obstacles. While the parallel translation resembles light refraction through a parallel plate without angle deviation between entry and exit beams, this wave behavior cannot be achieved without the engineered metamaterial because an elastic wave incident upon a dissimilar medium is always split at different refraction angles into two different modes, longitudinal and shear.

  11. Resonant enhanced parallel-T topology for weak coupling wireless power transfer pickup applications

    Directory of Open Access Journals (Sweden)

    Yao Guo

    2015-07-01

    Full Text Available For wireless power transfer (WPT) systems, the transfer performance and the coupling coefficient are contradictory requirements. In this paper, a novel parallel-T resonant topology, consisting of a traditional parallel circuit and a T-matching network for the secondary side, is proposed. With this method, a boosted voltage can be delivered to the load, since the topology has a resonant enhancement effect, and a high Q value can be obtained at a low resonant frequency and low coil inductance. This feature makes it more suitable for weak-coupling WPT applications. Besides, the proposed topology shows good frequency stability and adaptability to variations of the load. Experimental results show that the output voltage gain improves by 757% compared with the traditional series circuit, and a total efficiency of 85% is reached when the coupling coefficient is 0.046.

  12. Parallel application of plasma equilibrium fitting based on inhomogeneous platforms

    International Nuclear Information System (INIS)

    Liao Min; Zhang Jinhua; Chen Liaoyuan; Li Yongge; Pan Wei; Pan Li

    2008-01-01

    An online analysis and online display platform, EFIT, which is based on equilibrium fitting, is introduced in this paper. This application realizes large-volume data transport between inhomogeneous platforms through a communication mechanism designed around sockets. It takes approximately one minute to complete the equilibrium-fitting reconstruction, using a finite state machine to describe the management node and several node computers of the cluster system that carry out the parallel computation; this satisfies the requirement of online display during the discharge interval. An effective communication model between inhomogeneous platforms is provided, which can transport the computing results from the Linux platform to the Windows platform for online analysis and display. (authors)
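
    As a hedged illustration of the socket-based transport described above (not the authors' implementation; the host address, port and payload layout are invented for this sketch), a C client could push an array of fitted results to a display machine like this:

    /* Hypothetical sketch of pushing binary results over TCP to a display
       host.  Sending raw doubles assumes both ends agree on endianness and
       floating-point layout; a real implementation would define a portable
       wire format for cross-platform transfer. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        double results[4] = {1.0, 2.5, 3.7, 4.2};   /* stand-in for fit output */

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons(5000);                        /* invented port */
        inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr);   /* invented display host */

        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("connect"); close(fd); return 1;
        }
        if (send(fd, results, sizeof results, 0) < 0) perror("send");
        close(fd);
        return 0;
    }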

  13. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  14. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    AUTHOR|(CDS)2083520

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semiconductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  15. Optimal Design of Fixed-Point and Floating-Point Arithmetic Units for Scientific Applications

    OpenAIRE

    Pongyupinpanich, Surapong

    2012-01-01

    The challenge in designing a floating-point arithmetic co-processor/processor for scientific and engineering applications is to improve the performance, efficiency, and computational accuracy of the arithmetic unit. The arithmetic unit should efficiently support several mathematical functions corresponding to scientific and engineering computation demands. Moreover, the computations should be performed as fast as possible with a high degree of accuracy. Thus, this thesis proposes algorithm, d...

  16. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  17. A Parallel Implementation of a Smoothed Particle Hydrodynamics Method on Graphics Hardware Using the Compute Unified Device Architecture

    International Nuclear Information System (INIS)

    Wong Unhong; Wong Honcheng; Tang Zesheng

    2010-01-01

    The smoothed particle hydrodynamics (SPH) method, which belongs to the class of meshfree particle methods (MPMs), has a wide range of applications from micro-scale to macro-scale as well as from discrete systems to continuum systems. Graphics hardware, originally designed for computer graphics, now provides unprecedented computational power for scientific computation. Particle systems need a huge amount of computation in physical simulation. In this paper, an efficient parallel implementation of an SPH method on graphics hardware using the Compute Unified Device Architecture is developed for fluid simulation. Compared to the corresponding CPU implementation, our experimental results show that the new approach allows significant speedups of fluid simulation by handling huge amounts of computation in parallel on graphics hardware.

  18. Parallel Access of Out-Of-Core Dense Extendible Arrays

    Energy Technology Data Exchange (ETDEWEB)

    Otoo, Ekow J; Rotem, Doron

    2007-07-26

    Datasets used in scientific and engineering applications are often modeled as dense multi-dimensional arrays. For very large datasets, the corresponding array models are typically stored out-of-core as array files. The array elements are mapped onto linear consecutive locations that correspond to the linear ordering of the multi-dimensional indices. Two conventional mappings used are the row-major order and the column-major order of multi-dimensional arrays. Such conventional mappings of dense array files highly limit the performance of applications and the extendibility of the dataset. Firstly, an array file that is organized in, say, row-major order causes applications that subsequently access the data in column-major order to have abysmal performance. Secondly, any subsequent expansion of the array file is limited to only one dimension. Expansions of such out-of-core conventional arrays along arbitrary dimensions require storage reorganization that can be very expensive. We present a solution for storing out-of-core dense extendible arrays that resolves the two limitations. The method uses a mapping function F*(), together with information maintained in axial vectors, to compute the linear address of an extendible array element when passed its k-dimensional index. We also give the inverse function, F-1*(), for deriving the k-dimensional index when given the linear address. We show how the mapping function, in combination with MPI-IO and a parallel file system, allows for the growth of the extendible array without reorganization and with no significant performance degradation of applications accessing elements in any desired order. We give methods for reading and writing sub-arrays into and out of parallel applications that run on a cluster of workstations. The axial vectors are replicated and maintained in each node that accesses sub-array elements.
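
    The paper's mapping function F*() and its axial vectors are not reproduced in the record above; as a rough point of reference only, the C++ sketch below (hypothetical helper names) shows the conventional row-major mapping between a k-dimensional index and a linear offset, together with its inverse. This is the baseline scheme that F*() generalizes so the array can grow along any dimension without reorganizing the file.

      // Conventional row-major index <-> linear-offset mapping (illustration only;
      // the extendible-array mapping F*() of the paper is more general).
      #include <cstddef>
      #include <vector>

      // Linear offset of a k-dimensional index for fixed extents (row-major order).
      std::size_t linear_offset(const std::vector<std::size_t>& index,
                                const std::vector<std::size_t>& extents) {
          std::size_t offset = 0;
          for (std::size_t d = 0; d < index.size(); ++d)
              offset = offset * extents[d] + index[d];
          return offset;
      }

      // Inverse mapping: recover the k-dimensional index from a linear offset.
      std::vector<std::size_t> kd_index(std::size_t offset,
                                        const std::vector<std::size_t>& extents) {
          std::vector<std::size_t> index(extents.size());
          for (std::size_t d = extents.size(); d-- > 0;) {
              index[d] = offset % extents[d];
              offset /= extents[d];
          }
          return index;
      }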

  19. .NET 4.5 parallel extensions

    CERN Document Server

    Freeman, Bryan

    2013-01-01

    This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.

  20. Scientific Applications of Optical Instruments to Materials Research

    Science.gov (United States)

    Witherow, William K.

    1997-01-01

    Microgravity is a unique environment for materials and biotechnology processing. Microgravity minimizes or eliminates some of the effects that occur in one g. This can lead to the production of new materials or crystal structures. It is important to understand the processes that create these new materials. Thus, experiments are designed so that optical data collection can take place during the formation of the material. This presentation will discuss scientific application of optical instruments at MSFC. These instruments include a near-field scanning optical microscope, a miniaturized holographic system, and a phase-shifting interferometer.

  1. 77 FR 9896 - Proposed Information Collection; Comment Request; Application and Reports for Scientific Research...

    Science.gov (United States)

    2012-02-21

    ... Collection; Comment Request; Application and Reports for Scientific Research and Enhancement Permits Under... allows permits authorizing the taking of endangered species for research/enhancement purposes. The... sets of information collections: (1) Applications for research/enhancement permits, and (2) reporting...

  2. The Performance of an Object-Oriented, Parallel Operating System

    Directory of Open Access Journals (Sweden)

    David R. Kohr, Jr.

    1994-01-01

    Full Text Available The nascent and rapidly evolving state of parallel systems often leaves parallel application developers at the mercy of inefficient, inflexible operating system software. Given the relatively primitive state of parallel systems software, maximizing the performance of parallel applications not only requires judicious tuning of the application software, but occasionally, the replacement of specific system software modules with others that can more readily respond to the imposed pattern of resource demands. To assess the feasibility of application and performance tuning via malleable system software and to understand the performance penalties for detailed operating system performance data capture, we describe a set of performance instrumentation techniques for parallel, object-oriented operating systems and a set of performance experiments with Choices, an experimental, object-oriented operating system designed for use with parallel systems. These performance experiments show that (a) the performance overhead for operating system data capture is modest, (b) the penalty for malleable, object-oriented operating systems is negligible, but (c) techniques are needed to strictly enforce adherence of implementation to design if operating system modules are to be replaced.

  3. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted at optimizing the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...
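
    The record above describes the fork/copy-on-write mechanism without code; the minimal POSIX sketch below (not AthenaMP itself, all names hypothetical) illustrates the general idea: the parent builds large, mostly read-only state once, then forks workers that share those pages copy-on-write while each processes a disjoint slice of events.

      // Minimal fork/copy-on-write worker sketch (POSIX), for illustration only.
      #include <sys/wait.h>
      #include <unistd.h>
      #include <cstdio>
      #include <vector>

      int main() {
          const int n_workers = 4;
          const int n_events  = 1000;
          // Large state built once before fork(); pages are shared copy-on-write
          // and are duplicated only if a worker writes to them.
          std::vector<double> geometry(10000000, 1.0);

          for (int w = 0; w < n_workers; ++w) {
              pid_t pid = fork();
              if (pid == 0) {                      // child: one worker process
                  int first = w * n_events / n_workers;
                  int last  = (w + 1) * n_events / n_workers;
                  double checksum = 0.0;
                  for (int e = first; e < last; ++e)
                      checksum += geometry[e];     // stand-in for event processing
                  std::printf("worker %d processed events [%d,%d), checksum %.1f\n",
                              w, first, last, checksum);
                  _exit(0);
              }
          }
          for (int w = 0; w < n_workers; ++w)
              wait(nullptr);                       // reap worker processes
          return 0;
      }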

  4. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted at optimizing the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows AthenaMP to run inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...

  5. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface ('PAMI') of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  6. Parallelization Issues and Particle-In-Cell Codes.

    Science.gov (United States)

    Elster, Anne Cathrine

    1994-01-01

    "Everything should be made as simple as possible, but not simpler." Albert Einstein. The field of parallel scientific computing has concentrated on parallelization of individual modules such as matrix solvers and factorizers. However, many applications involve several interacting modules. Our analyses of a particle-in-cell code modeling charged particles in an electric field, show that these accompanying dependencies affect data partitioning and lead to new parallelization strategies concerning processor, memory and cache utilization. Our test-bed, a KSR1, is a distributed memory machine with a globally shared addressing space. However, most of the new methods presented hold generally for hierarchical and/or distributed memory systems. We introduce a novel approach that uses dual pointers on the local particle arrays to keep the particle locations automatically partially sorted. Complexity and performance analyses with accompanying KSR benchmarks, have been included for both this scheme and for the traditional replicated grids approach. The latter approach maintains load-balance with respect to particles. However, our results demonstrate it fails to scale properly for problems with large grids (say, greater than 128-by-128) running on as few as 15 KSR nodes, since the extra storage and computation time associated with adding the grid copies, becomes significant. Our grid partitioning scheme, although harder to implement, does not need to replicate the whole grid. Consequently, it scales well for large problems on highly parallel systems. It may, however, require load balancing schemes for non-uniform particle distributions. Our dual pointer approach may facilitate this through dynamically partitioned grids. We also introduce hierarchical data structures that store neighboring grid-points within the same cache -line by reordering the grid indexing. This alignment produces a 25% savings in cache-hits for a 4-by-4 cache. A consideration of the input data's effect on

  7. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    Science.gov (United States)

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
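
    As a schematic illustration of the parallelization strategy described above (not the SIMIND code itself; SPRNG is replaced here by simple per-rank seeding), the C++/MPI sketch below partitions photon histories equally among processors, gives each rank its own random stream, and combines the tallies with a reduction.

      // Equal partitioning of photon histories over MPI ranks (illustrative sketch).
      #include <mpi.h>
      #include <random>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const long total_photons = 10000000;
          long local_photons = total_photons / size
                             + (rank < total_photons % size ? 1 : 0);

          std::mt19937_64 rng(12345u + 7919u * rank);   // independent stream per rank
          std::uniform_real_distribution<double> u(0.0, 1.0);

          double local_tally = 0.0;                     // stand-in for a detector bin
          for (long i = 0; i < local_photons; ++i)
              if (u(rng) < 0.1)                         // toy "detection" probability
                  local_tally += 1.0;

          double global_tally = 0.0;
          MPI_Reduce(&local_tally, &global_tally, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              std::printf("detected %.0f of %ld simulated photons\n", global_tally, total_photons);
          MPI_Finalize();
          return 0;
      }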

  8. An efficient parallel algorithm for matrix-vector multiplication

    Energy Technology Data Exchange (ETDEWEB)

    Hendrickson, B.; Leland, R.; Plimpton, S.

    1993-03-01

    The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
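
    The paper's hypercube-oriented algorithm achieves O(n/√p + log p) communication; the sketch below is only a simpler 1D row-block decomposition in C++/MPI, included to make the basic structure concrete (gather the vector, multiply the local row block), not to reproduce the published algorithm.

      // Simplified 1D row-block parallel matrix-vector multiply (illustration only).
      #include <mpi.h>
      #include <vector>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int n = 512;                        // assume n is divisible by size
          const int rows = n / size;
          std::vector<double> A(rows * n, 1.0);     // this rank's block of matrix rows
          std::vector<double> x_local(rows, 1.0);   // this rank's slice of the vector
          std::vector<double> x(n), y(rows, 0.0);

          // Every rank needs the full vector: gather all slices.
          MPI_Allgather(x_local.data(), rows, MPI_DOUBLE,
                        x.data(),       rows, MPI_DOUBLE, MPI_COMM_WORLD);

          for (int i = 0; i < rows; ++i)            // local dense product on the row block
              for (int j = 0; j < n; ++j)
                  y[i] += A[i * n + j] * x[j];

          if (rank == 0)
              std::printf("y[0] = %.1f (expected %d)\n", y[0], n);
          MPI_Finalize();
          return 0;
      }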

  9. Neighborhood communication paradigm to increase scalability in large-scale dynamic scientific applications

    KAUST Repository

    Ovcharenko, Aleksandr

    2012-03-01

    This paper introduces a general-purpose communication package built on top of MPI which is aimed at improving inter-processor communications independently of the supercomputer architecture being considered. The package is developed to support parallel applications that rely on computation characterized by a large number of messages of various sizes, often small, that are focused within processor neighborhoods. In some cases, such as solvers having static mesh partitions, the number and size of messages are known a priori. However, in other cases such as mesh adaptation, the messages evolve and vary in number and size and include the dynamic movement of partition objects. The current package provides a utility for dynamic applications based on two key attributes that are: (i) explicit consideration of the neighborhood communication pattern to avoid many-to-many calls and also to reduce the number of collective calls to a minimum, and (ii) use of non-blocking MPI functions along with message packing to manage message flow control and reduce the number and time of communication calls. The test application demonstrated is parallel unstructured mesh adaptation. Results on IBM Blue Gene/P and Cray XE6 computers show that the use of neighborhood-based communication control leads to scalable results when executing generally imbalanced mesh adaptation runs. © 2011 Elsevier B.V. All rights reserved.

  10. Development of a parallel genetic algorithm using MPI and its application in a nuclear reactor core. Design optimization

    International Nuclear Information System (INIS)

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Baptista, Rafael P.

    2005-01-01

    This work presents the development of a distributed parallel genetic algorithm applied to a nuclear reactor core design optimization. In the implementation of the parallelism, the Message Passing Interface (MPI) library, a standard for parallel computation on distributed memory platforms, has been used. Another important characteristic of MPI is its portability across various architectures. The main objectives of this paper are: validation of the results obtained by the application of this algorithm to a nuclear reactor core optimization problem, through comparisons with previous results presented by Pereira et al.; and performance testing of the Brazilian Nuclear Engineering Institute (IEN) cluster on reactor physics optimization problems. The experiments demonstrated that the developed parallel genetic algorithm using the MPI library presented significant gains in the obtained results and an accentuated reduction of the processing time. Such results ratify the use of parallel genetic algorithms for the solution of nuclear reactor core optimization problems. (author)
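
    The record does not give the authors' code; as a hedged illustration of one common MPI parallelization of a genetic algorithm, the C++ sketch below scatters the population from rank 0 each generation, evaluates a placeholder fitness function locally on every rank, and gathers the results back, leaving selection, crossover, and mutation on the master.

      // Master-worker fitness evaluation for a genetic algorithm with MPI (sketch).
      #include <mpi.h>
      #include <algorithm>
      #include <vector>
      #include <cstdio>

      double fitness(const double* genes, int n_genes) {   // placeholder objective
          double s = 0.0;
          for (int i = 0; i < n_genes; ++i) s += genes[i] * genes[i];
          return -s;                                        // maximize => minimize sum of squares
      }

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int n_genes = 8, per_rank = 16;
          const int pop_size = per_rank * size;

          std::vector<double> population;                   // full population lives on rank 0
          if (rank == 0) population.assign(pop_size * n_genes, 0.5);

          std::vector<double> chunk(per_rank * n_genes);
          std::vector<double> local_fit(per_rank), all_fit;
          if (rank == 0) all_fit.resize(pop_size);

          // Scatter chromosomes, evaluate locally, gather fitness values on rank 0.
          MPI_Scatter(population.data(), per_rank * n_genes, MPI_DOUBLE,
                      chunk.data(),      per_rank * n_genes, MPI_DOUBLE, 0, MPI_COMM_WORLD);
          for (int i = 0; i < per_rank; ++i)
              local_fit[i] = fitness(&chunk[i * n_genes], n_genes);
          MPI_Gather(local_fit.data(), per_rank, MPI_DOUBLE,
                     all_fit.data(),   per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

          if (rank == 0)                                    // selection/crossover/mutation would follow
              std::printf("best fitness this generation: %.3f\n",
                          *std::max_element(all_fit.begin(), all_fit.end()));
          MPI_Finalize();
          return 0;
      }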

  11. Evaluation of the Intel iWarp parallel processor for space flight applications

    Science.gov (United States)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  12. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  13. A Fast parallel tridiagonal algorithm for a class of CFD applications

    Science.gov (United States)

    Moitra, Stuti; Sun, Xian-He

    1996-01-01

    The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.

  14. Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH

    DEFF Research Database (Denmark)

    Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

    2010-01-01

    We present a parallel implementation of a large-scale relativistic double-group configuration interaction (CI) program. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...

  15. Parallelism at Cern: real-time and off-line applications in the GP-MIMD2 project

    International Nuclear Information System (INIS)

    Calafiura, P.

    1997-01-01

    A wide range of general-purpose high-energy physics applications, ranging from Monte Carlo simulation to data acquisition, from interactive data analysis to on-line filtering, have been ported, or developed, and run in parallel on CERN's large multi-processor machines, an IBM SP-2 and a Meiko CS-2. The ESPRIT project GP-MIMD2 has been a catalyst for the interest in parallel computing at CERN. The project provided the 128-processor Meiko CS-2 system that is now successfully integrated in the CERN computing environment. The CERN experiment NA48 was involved in the GP-MIMD2 project from the beginning. NA48 physicists run, as part of their day-to-day work, simulation and analysis programs parallelized using the message passing interface MPI. The CS-2 is also a vital component of the experiment's data acquisition system and will be used to calibrate in real time the 13000-channel liquid krypton calorimeter. (orig.)

  16. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code, development or improvement work was done on the parts simulating low-energy physical phenomena such as radiation transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field refers to simulations for nuclear medicine applications such as the development of biological probes, the evaluation and characterization of gamma cameras (collimators, crystal thickness), and methods for dosimetric calculations. In particular, these calculations are suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other work mentioned in the same field refers to simulation of electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for monitoring natural and artificial radioactivity in the environment.

  17. Parallel scan hyperspectral fluorescence imaging system and biomedical application for microarrays

    International Nuclear Information System (INIS)

    Liu Zhiyi; Ma Suihua; Liu Le; Guo Jihua; He Yonghong; Ji Yanhong

    2011-01-01

    Microarray research offers great potential for analysis of gene expression profile and leads to greatly improved experimental throughput. A number of instruments have been reported for microarray detection, such as chemiluminescence, surface plasmon resonance, and fluorescence markers. Fluorescence imaging is popular for the readout of microarrays. In this paper we develop a quasi-confocal, multichannel parallel scan hyperspectral fluorescence imaging system for microarray research. Hyperspectral imaging records the entire emission spectrum for every voxel within the imaged area in contrast to recording only fluorescence intensities of filter-based scanners. Coupled with data analysis, the recorded spectral information allows for quantitative identification of the contributions of multiple, spectrally overlapping fluorescent dyes and elimination of unwanted artifacts. The mechanism of quasi-confocal imaging provides a high signal-to-noise ratio, and parallel scan makes this approach a high throughput technique for microarray analysis. This system is improved with a specifically designed spectrometer which can offer a spectral resolution of 0.2 nm, and operates with spatial resolutions ranging from 2 to 30 μm. Finally, the application of the system is demonstrated by reading out microarrays for identification of bacteria.

  18. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.

  19. Language constructs for modular parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  20. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    International Nuclear Information System (INIS)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-01-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines

  1. Parallel multigrid smoothing: polynomial versus Gauss-Seidel

    Science.gov (United States)

    Adams, Mark; Brezina, Marian; Hu, Jonathan; Tuminaro, Ray

    2003-07-01

    Gauss-Seidel is often the smoother of choice within multigrid applications. In the context of unstructured meshes, however, maintaining good parallel efficiency is difficult with multiplicative iterative methods such as Gauss-Seidel. This leads us to consider alternative smoothers. We discuss the computational advantages of polynomial smoothers within parallel multigrid algorithms for positive definite symmetric systems. Two particular polynomials are considered: Chebyshev and a multilevel specific polynomial. The advantages of polynomial smoothing over traditional smoothers such as Gauss-Seidel are illustrated on several applications: Poisson's equation, thin-body elasticity, and eddy current approximations to Maxwell's equations. While parallelizing the Gauss-Seidel method typically involves a compromise between a scalable convergence rate and maintaining high flop rates, polynomial smoothers achieve parallel scalable multigrid convergence rates without sacrificing flop rates. We show that, although parallel computers are the main motivation, polynomial smoothers are often surprisingly competitive with Gauss-Seidel smoothers on serial machines.
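
    To make the parallelism argument of these two records concrete, the sketch below is not one of the smoothers studied in the paper (the Chebyshev coefficients are replaced by a simple damped-Jacobi polynomial); it only illustrates why polynomial smoothing parallelizes so easily: each sweep is matrix-vector work over independent rows, whereas a Gauss-Seidel sweep updates each row using rows already updated in the same sweep.

      // Polynomial (damped-Jacobi) smoothing built from parallel mat-vec sweeps.
      // Compile with OpenMP enabled (e.g. -fopenmp) for the parallel loops.
      #include <vector>

      struct Csr {                                 // compressed sparse row matrix
          std::vector<int> ptr, col;
          std::vector<double> val, diag;
          int n;
      };

      // Repeat x <- x + omega * D^{-1} (b - A x) `degree` times.
      void polynomial_smooth(const Csr& A, const std::vector<double>& b,
                             std::vector<double>& x, int degree,
                             double omega = 2.0 / 3.0) {
          std::vector<double> r(A.n);
          for (int k = 0; k < degree; ++k) {
              #pragma omp parallel for             // rows are independent: trivially parallel
              for (int i = 0; i < A.n; ++i) {
                  double ax = 0.0;
                  for (int j = A.ptr[i]; j < A.ptr[i + 1]; ++j)
                      ax += A.val[j] * x[A.col[j]];
                  r[i] = (b[i] - ax) / A.diag[i];
              }
              #pragma omp parallel for
              for (int i = 0; i < A.n; ++i)
                  x[i] += omega * r[i];
          }
      }
      // By contrast, a Gauss-Seidel sweep uses the already-updated values
      // x[0..i-1] when updating x[i], creating a sequential dependence that is
      // hard to parallelize on unstructured meshes.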

  2. The present status of scientific applications of nuclear explosions

    International Nuclear Information System (INIS)

    Cowan, G.A.; Diven, B.C.

    1970-01-01

    This is the fourth in a series of symposia which started in 1957 at Livermore with the purpose of examining the peaceful uses of nuclear explosives. Although principal emphasis has been placed on technological applications, the discussions have, from the outset, included the fascinating question of scientific uses. Of the possible scientific applications which were mentioned at the 1957 meeting, the proposals which attracted most attention involved uses of nuclear explosions for research in seismology. It is interesting to note that since then a very large and stimulating body of data in the field of seismology has been collected from nuclear tests. Ideas for scientific applications of nuclear explosions go back considerably further than 1957. During the war days Otto Frisch at Los Alamos suggested that a fission bomb would provide an excellent source of fast neutrons which could be led down a vacuum pipe and used for experiments in a relatively unscattered state. This idea, reinvented, modified, and elaborated upon in the ensuing twenty-five years, provides the basis for much of the research discussed in this morning's program. In 1952 a somewhat different property of nuclear explosions, their ability to produce intense neutron exposures on internal targets and to synthesize large quantities of multiple neutron capture products, was dramatically brought to our attention by analysis of debris from the first large thermonuclear explosion (Mike), in which the elements einsteinium and fermium were observed for the first time. The reports of the next two Plowshare symposia in 1959 and 1964 help record the fascinating development of the scientific uses of neutrons in nuclear explosions. Starting with two 'wheel' experiments in 1958 to measure symmetry of fission in 235-U resonances, the use of external beams of energy-resolved neutrons was expanded in the 'Gnome' experiment in 1961 to include the measurement of neutron capture excitation functions for 238-U, 232-Th

  3. Parallel transport in ideal magnetohydrodynamics and applications to resistive wall modes

    International Nuclear Information System (INIS)

    Finn, J.M.; Gerwin, R.A.

    1996-01-01

    It is shown that in magnetohydrodynamics (MHD) with an ideal Ohm's law, in the presence of parallel heat flux, density gradient, temperature gradient, and parallel compression, but in the absence of perpendicular compressibility, there is an exact cancellation of the parallel transport terms. This cancellation is due to the fact that magnetic flux is advected in the presence of an ideal Ohm's law, and therefore parallel transport of temperature and density gives the same result as perpendicular advection of the same quantities. Discussions are also presented regarding parallel viscosity and parallel velocity shear, and the generalization to toroidal geometry. These results suggest that a correct generalization of the Hammett–Perkins fluid operator [G. W. Hammett and F. W. Perkins, Phys. Rev. Lett. 64, 3019 (1990)] to simulate Landau damping for electromagnetic modes must give an operator that acts on the dynamics parallel to the perturbed magnetic field lines. copyright 1996 American Institute of Physics

  4. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  5. Literature Review on the Hybrid Flow Shop Scheduling Problem with Unrelated Parallel Machines

    Directory of Open Access Journals (Sweden)

    Eliana Marcela Peña Tibaduiza

    2017-01-01

    Full Text Available Context: The hybrid flow shop problem with unrelated parallel machines has been less studied in academia than the hybrid flow shop with identical processors. For this reason, there are few reports on applications of this problem in industry. Method: A literature review of the state of the art on the flow shop scheduling problem was conducted by collecting and analyzing academic papers from several scientific databases. To this end, a search query was constructed using keywords defining the problem and checking that unrelated parallel machines were included in the definition; as a result, 50 papers were finally selected for this study. Results: A classification of the problem according to the characteristics of the production system was performed; solution methods, constraints, and objective functions commonly used are also presented. Conclusions: An increasing trend is observed in studies of flow shops with multiple stages, but few are based on industry case studies.

  6. Physics through the 1990s: Scientific interfaces and technological applications

    International Nuclear Information System (INIS)

    1986-01-01

    Physics traditionally serves mankind through its fundamental discoveries, which enrich our understanding of nature and the cosmos. While the basic driving force for physics research is intellectual curiosity and the search for understanding, the nation's support for physics is also motivated by strategic national goals, by the pride of world scientific leadership, by societal impact through symbiosis with other natural sciences, and through the stimulus of advanced technology provided by applications of physics. This Physics Survey volume looks outward from physics to report its profound impact on society and the economy through interactions at the interfaces with other natural sciences and through applications of physics to technology, medicine, and national defense

  7. Scientific applications of frequency-stabilized laser technology in space

    Science.gov (United States)

    Schumaker, Bonny L.

    1990-01-01

    A synoptic investigation of the uses of frequency-stabilized lasers for scientific applications in space is presented. It begins by summarizing properties of lasers, characterizing their frequency stability, and describing limitations and techniques to achieve certain levels of frequency stability. Limits to precision set by laser frequency stability for various kinds of measurements are investigated and compared with other sources of error. These other sources include photon-counting statistics, scattered laser light, fluctuations in laser power, and intensity distribution across the beam, propagation effects, mechanical and thermal noise, and radiation pressure. Methods are explored to improve the sensitivity of laser-based interferometric and range-rate measurements. Several specific types of science experiments that rely on highly precise measurements made with lasers are analyzed, and anticipated errors and overall performance are discussed. Qualitative descriptions are given of a number of other possible science applications involving frequency-stabilized lasers and related laser technology in space. These applications will warrant more careful analysis as technology develops.

  8. Physics through the 1990s: scientific interfaces and technological applications

    International Nuclear Information System (INIS)

    1986-01-01

    The volume examines the scientific interfaces and technological applications of physics. Twelve areas are dealt with: biological physics--biophysics, the brain, and theoretical biology; the physics-chemistry interface--instrumentation, surfaces, neutron and synchrotron radiation, polymers, organic electronic materials; materials science; geophysics--tectonics, the atmosphere and oceans, planets, drilling and seismic exploration, and remote sensing; computational physics--complex systems and applications in basic research; mathematics--field theory and chaos; microelectronics--integrated circuits, miniaturization, future trends; optical information technologies--fiber optics and photonics; instrumentation; physics applications to energy needs and the environment; national security--devices, weapons, and arms control; medical physics--radiology, ultrasonics, NMR, and photonics. An executive summary and many chapters contain recommendations regarding funding, education, industry participation, small-group university research and large facility programs, government agency programs, and computer database needs

  9. HPC Cloud for Scientific and Business Applications: Taxonomy, Vision, and Research Challenges

    OpenAIRE

    Netto, Marco A. S.; Calheiros, Rodrigo N.; Rodrigues, Eduardo R.; Cunha, Renato L. F.; Buyya, Rajkumar

    2017-01-01

    High Performance Computing (HPC) clouds are becoming an alternative to on-premise clusters for executing scientific applications and business analytics services. Most research efforts in HPC cloud aim to understand the cost-benefit of moving resource-intensive applications from on-premise environments to public cloud platforms. Industry trends show hybrid environments are the natural path to get the best of the on-premise and cloud resources---steady (and sensitive) workloads can run on on-pr...

  10. Software Engineering Support of the Third Round of Scientific Grand Challenge Investigations: Earth System Modeling Software Framework Survey

    Science.gov (United States)

    Talbot, Bryan; Zhou, Shu-Jia; Higgins, Glenn; Zukor, Dorothy (Technical Monitor)

    2002-01-01

    One of the most significant challenges in large-scale climate modeling, as well as in high-performance computing in other scientific fields, is that of effectively integrating many software models from multiple contributors. A software framework facilitates the integration task, both in the development and runtime stages of the simulation. Effective software frameworks reduce the programming burden for the investigators, freeing them to focus more on the science and less on the parallel communication implementation, while maintaining high performance across numerous supercomputer and workstation architectures. This document surveys numerous software frameworks for potential use in Earth science modeling. Several frameworks are evaluated in depth, including Parallel Object-Oriented Methods and Applications (POOMA), Cactus (from the relativistic physics community), Overture, Goddard Earth Modeling System (GEMS), the National Center for Atmospheric Research Flux Coupler, and UCLA/UCB Distributed Data Broker (DDB). Frameworks evaluated in less detail include ROOT, Parallel Application Workspace (PAWS), and Advanced Large-Scale Integrated Computational Environment (ALICE). A host of other frameworks and related tools are referenced in this context. The frameworks are evaluated individually and also compared with each other.

  11. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  12. FPGAs and parallel architectures for aerospace applications soft errors and fault-tolerant design

    CERN Document Server

    Rech, Paolo

    2016-01-01

    This book introduces the concepts of soft errors in FPGAs, as well as the motivation for using commercial, off-the-shelf (COTS) FPGAs in mission-critical and remote applications, such as aerospace. The authors describe the effects of radiation in FPGAs, present a large set of soft-error mitigation techniques that can be applied in these circuits, as well as methods for qualifying these circuits under radiation. Coverage includes radiation effects in FPGAs, fault-tolerant techniques for FPGAs, use of COTS FPGAs in aerospace applications, experimental data of FPGAs under radiation, FPGA embedded processors under radiation, and fault injection in FPGAs. Since dedicated parallel processing architectures such as GPUs have become more desirable in aerospace applications due to high computational power, GPU analysis under radiation is also discussed. The book discusses features and drawbacks of reconfigurability methods for FPGAs, focused on aerospace applications, and explains how radia...

  13. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2013-01-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  14. Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

    KAUST Repository

    Wu, Xingfu

    2013-12-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers: IBM POWER4, POWER5+ and BlueGene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore supercomputers. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore supercomputers. © 2013 Elsevier Inc.

  15. 15 CFR 301.3 - Application for duty-free entry of scientific instruments.

    Science.gov (United States)

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Application for duty-free entry of scientific instruments. 301.3 Section 301.3 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) INTERNATIONAL TRADE ADMINISTRATION, DEPARTMENT OF COMMERCE MISCELLANEOUS...

  16. Adding Data Management Services to Parallel File Systems

    Energy Technology Data Exchange (ETDEWEB)

    Brandt, Scott [Univ. of California, Santa Cruz, CA (United States)

    2015-03-04

    The objective of this project, called DAMASC for “Data Management in Scientific Computing”, is to coalesce data management with parallel file system management to present a declarative interface to scientists for managing, querying, and analyzing extremely large data sets efficiently and predictably. Managing extremely large data sets is a key challenge of exascale computing. The overhead, energy, and cost of moving massive volumes of data demand designs where computation is close to storage. In current architectures, compute/analysis clusters access data in a physically separate parallel file system and largely leave it to the scientist to reduce data movement. Over the past decades the high-end computing community has adopted middleware with multiple layers of abstractions and specialized file formats such as NetCDF-4 and HDF5. These abstractions provide a limited set of high-level data processing functions, but have inherent functionality and performance limitations: middleware that provides access to the highly structured contents of scientific data files stored in the (unstructured) file systems can only optimize to the extent that file system interfaces permit; the highly structured formats of these files often impede native file system performance optimizations. We are developing Damasc, an enhanced high-performance file system with native rich data management services. Damasc will enable efficient queries and updates over files stored in their native byte-stream format while retaining the inherent performance of file system data storage via declarative queries and updates over views of underlying files. Damasc has four key benefits for the development of data-intensive scientific code: (1) applications can use important data-management services, such as declarative queries, views, and provenance tracking, that are currently available only within database systems; (2) the use of these services becomes easier, as they are provided within a familiar file

  17. Parallel Scaling Characteristics of Selected NERSC User ProjectCodes

    Energy Technology Data Exchange (ETDEWEB)

    Skinner, David; Verdier, Francesca; Anand, Harsh; Carter,Jonathan; Durst, Mark; Gerber, Richard

    2005-03-05

    This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.

  18. ERAST: Scientific Applications and Technology Commercialization

    Science.gov (United States)

    Hunley, John D. (Compiler); Kellogg, Yvonne (Compiler)

    2000-01-01

    This is a conference publication for an event designed to inform potential contractors and appropriate personnel in various scientific disciplines that the ERAST (Environmental Research Aircraft and Sensor Technology) vehicles have reached a certain level of maturity and are available to perform a variety of missions ranging from data gathering to telecommunications. There are multiple applications of the technology and a great many potential commercial and governmental markets. As high altitude platforms, the ERAST vehicles can gather data at higher resolution than satellites and can do so continuously, whereas satellites pass over a particular area only once each orbit. Formal addresses are given by Rich Christiansen, (Director of Programs, NASA Aerospace Technology Ent.), Larry Roeder, (Senior Policy Advisor, U.S. Dept. of State), and Dr. Marianne McCarthy, (DFRC Education Dept.). The Commercialization Workshop is chaired by Dale Tietz (President, New Vista International) and the Science Workshop is chaired by Steve Wegener, (Deputy Manager of NASA ERAST, NASA Ames Research Center.

  19. Scientific production on the applicability of phenytoin in wound healing

    Directory of Open Access Journals (Sweden)

    Flávia Firmino

    2014-02-01

    Full Text Available Phenytoin is an anticonvulsant that has been used in wound healing. The objectives of this study were to describe how the scientific production presents the use of phenytoin as a healing agent and to discuss its applicability in wounds. A literature review and hierarchy analysis of evidence-based practices was performed. Eighteen articles were analyzed that tested the intervention in wounds such as leprosy ulcers, leg ulcers, diabetic foot ulcers, pressure ulcers, trophic ulcers, war wounds, burns, preparation of recipient graft areas, radiodermatitis, and post-extraction of melanocytic nevi. Systemic use of phenytoin in the treatment of fistulas and the hypothesis of topical use in the treatment of vitiligo were found. In conclusion, topical use of phenytoin is scientifically evidenced. However, robust research is needed that supports a protocol for the use of phenytoin as another option of a healing agent in clinical practice.

  20. Non-Cartesian parallel imaging reconstruction.

    Science.gov (United States)

    Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

    2014-11-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.

  1. A Parallel Processing Algorithm for Remote Sensing Classification

    Science.gov (United States)

    Gualtieri, J. Anthony

    2005-01-01

    A current trend in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack benchmark), at moderate cost ($370K). However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular, I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
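
    The classification task described above is embarrassingly parallel: once the SVM is trained on the ground-truth spectra, each spatial tile can be classified independently. The following is only an illustrative sketch of that pattern, assuming scikit-learn and NumPy; the tile shapes, class labels, and function names are hypothetical, and this is not the code used in the paper.

        import numpy as np
        from multiprocessing import Pool
        from sklearn.svm import SVC

        def train_svm(train_pixels, train_labels):
            # Train one SVM on labeled (ground-truth) spectra.
            clf = SVC(kernel="rbf", C=10.0, gamma="scale")
            clf.fit(train_pixels, train_labels)
            return clf

        def classify_tile(args):
            # Classify one spatial tile; tiles are independent, so the work
            # maps cleanly onto worker processes or cluster nodes.
            clf, tile = args                      # tile: (n_pixels, n_bands)
            return clf.predict(tile)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            bands, n_train = 50, 200
            X_train = rng.normal(size=(n_train, bands))        # stand-in training spectra
            y_train = rng.integers(0, 3, size=n_train)          # three hypothetical classes
            clf = train_svm(X_train, y_train)

            tiles = [rng.normal(size=(1000, bands)) for _ in range(8)]   # fake image tiles
            with Pool(processes=4) as pool:                     # workers stand in for cluster nodes
                labels = pool.map(classify_tile, [(clf, t) for t in tiles])
            print([lab.shape for lab in labels])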

  2. Design considerations for parallel graphics libraries

    Science.gov (United States)

    Crockett, Thomas W.

    1994-01-01

    Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.

  3. Predicting environmental aspects of CCSR leachates through the application of scientifically valid leaching protocols

    International Nuclear Information System (INIS)

    Hassett, D.J.

    1993-01-01

    The disposal of solid wastes from energy production, particularly solid wastes from coal conversion processes, requires a thorough understanding of the waste material as well as the disposal environment. Many coal conversion solid residues (CCSRs) have chemical, mineralogical, and physical properties advantageous for use as engineering construction materials and in other industrial applications. If disposal is to be the final disposition of CCSRs from any source, the very properties that can make ash useful also contribute to behavior that must be understood for scientifically logical and environmentally responsible disposal. This paper describes the application of scientifically valid leaching and characterization tests designed to predict field phenomena. The key to proper characterization of these unique materials is the recognition of and compensation for the hydration reactions that can occur during long-term leaching. Many of these reactions, such as the formation of the mineral ettringite, can have a profound effect on the concentration of potentially problematic trace elements such as boron, chromium, and selenium. The mobility of these elements, which may be concentrated in CCSRs due to the conversion process, must be properly evaluated for the formation of informed and scientifically sound decisions regarding safe disposal. Groundwater is an extremely important and relatively scarce resource. Contamination of this resource is a threat to life, which is highly dependent on it, so management of materials that can impact groundwater must be carefully planned and executed. The application of scientifically valid leaching protocols and complete testing are critical to proper waste management

  4. Proceeding on the Scientific Meeting and Presentation on Accelerator Technology and Its Applications

    International Nuclear Information System (INIS)

    Susilo Widodo; Darsono; Slamet Santosa; Sudjatmoko; Tjipto Sujitno; Pramudita Anggraita; Wahini Nurhayati

    2015-11-01

    The scientific meeting and presentation on accelerator technology and its applications was held by PSTA BATAN on 30 November 2015. The meeting aims to promote accelerator technology and its applications to accelerator scientists, academics, researchers and technology users, as well as to present accelerator-based research that has been conducted by researchers inside and outside BATAN. This proceeding contains 20 papers on physics and nuclear reactors. (PPIKSN)

  5. Development of parallel algorithms for electrical power management in space applications

    Science.gov (United States)

    Berry, Frederick C.

    1989-01-01

    The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a load flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.
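
    As a reminder of the inner solver named in the abstract, the sketch below shows a plain Newton-Raphson iteration on a small nonlinear system, assuming NumPy. It is only a generic illustration of the method: the residual is a toy system rather than real power-flow mismatch equations, and the decomposition-coordination structure is not shown.

        import numpy as np

        def newton_raphson(residual, jacobian, x0, tol=1e-10, max_iter=50):
            # Solve residual(x) = 0 by repeatedly solving J(x) dx = -f(x).
            x = np.asarray(x0, dtype=float)
            for _ in range(max_iter):
                f = residual(x)
                if np.linalg.norm(f, np.inf) < tol:
                    break
                dx = np.linalg.solve(jacobian(x), -f)
                x = x + dx
            return x

        # Toy 2x2 system standing in for power mismatch equations.
        residual = lambda x: np.array([x[0]**2 + x[1] - 2.0,
                                       x[0] + x[1]**2 - 2.0])
        jacobian = lambda x: np.array([[2.0 * x[0], 1.0],
                                       [1.0, 2.0 * x[1]]])
        print(newton_raphson(residual, jacobian, [1.5, 1.5]))   # converges to [1, 1]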

  6. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve large 3-D radiation transport problems efficiently. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed for up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand interconnect capable of a bandwidth in excess of 20 Gbit/s. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  7. Workshop on scientific applications of short wavelength coherent light sources

    International Nuclear Information System (INIS)

    Spicer, W.; Arthur, J.; Winick, H.

    1993-02-01

    This report contains papers on the following topics: A 2 to 4nm High Power FEL On the SLAC Linac; Atomic Physics with an X-ray Laser; High Resolution, Three Dimensional Soft X-ray Imaging; The Role of X-ray Induced Damage in Biological Micro-imaging; Prospects for X-ray Microscopy in Biology; Femtosecond Optical Pulses?; Research in Chemical Physics Surface Science, and Materials Science, with a Linear Accelerator Coherent Light Source; Application of 10 GeV Electron Driven X-ray Laser in Gamma-ray Laser Research; Non-Linear Optics, Fluorescence, Spectromicroscopy, Stimulated Desorption: We Need LCLS' Brightness and Time Scale; Application of High Intensity X-rays to Materials Synthesis and Processing; LCLS Optics: Selected Technological Issues and Scientific Opportunities; Possible Applications of an FEL for Materials Studies in the 60 eV to 200 eV Spectral Region

  8. Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud

    Directory of Open Access Journals (Sweden)

    Xiaocheng Liu

    2012-01-01

    Full Text Available An increasing number of high performance computing parallel applications leverage the power of the cloud for parallel processing. How to schedule the parallel applications to improve the quality of service is the key to successfully hosting parallel applications in the cloud. The large scale of the cloud makes parallel job scheduling more complicated, as even the simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm produces an up to 41.1% and an average of 23.1% improvement on the average response time; an up to 82.9% and an average of 69.3% improvement on the average slowdown. Our algorithm is also robust in that it tolerates inaccurate CPU usage estimation and high migration cost. Our approach involves trivial modifications to EASY and requires no additional technique; it is practical and effective in the cloud environment.
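
    For readers unfamiliar with the EASY algorithm that MEASY builds on, the sketch below gives a deliberately simplified view of EASY backfilling: the job at the head of the FCFS queue receives a reservation, and later jobs may start early only if they cannot delay that reservation. This is a hypothetical illustration with a simplified job model, not the MEASY implementation, and it omits migration and consolidation.

        from dataclasses import dataclass

        @dataclass
        class Job:
            name: str
            nodes: int            # nodes requested
            est_runtime: float    # user-estimated runtime in seconds

        def easy_backfill(queue, free_nodes, next_release_time, now=0.0):
            # Return the jobs to start now. next_release_time(n) gives the earliest
            # time (from running-job estimates) at which n nodes will be free.
            started = []
            if not queue:
                return started
            if queue[0].nodes <= free_nodes:                  # head job starts immediately
                started.append(queue.pop(0))
                return started
            reservation = next_release_time(queue[0].nodes)   # reserve a start time for the head job
            for job in list(queue[1:]):
                fits_now = job.nodes <= free_nodes
                finishes_in_time = now + job.est_runtime <= reservation
                if fits_now and finishes_in_time:             # backfill without delaying the head job
                    started.append(job)
                    queue.remove(job)
                    free_nodes -= job.nodes
            return started

        # Example: 6 free nodes; the head job needs 8, so a short 2-node job backfills.
        q = [Job("A", 8, 3600.0), Job("B", 2, 600.0), Job("C", 4, 7200.0)]
        print([j.name for j in easy_backfill(q, free_nodes=6,
                                             next_release_time=lambda n: 1800.0)])   # ['B']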

  9. Comparison of Pre-Analytical FFPE Sample Preparation Methods and Their Impact on Massively Parallel Sequencing in Routine Diagnostics

    Science.gov (United States)

    Heydt, Carina; Fassunke, Jana; Künstlinger, Helen; Ihle, Michaela Angelika; König, Katharina; Heukamp, Lukas Carl; Schildhaus, Hans-Ulrich; Odenthal, Margarete; Büttner, Reinhard; Merkelbach-Bruse, Sabine

    2014-01-01

    Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially for those laboratories dealing with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3–24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can be used for

  10. Comparison of pre-analytical FFPE sample preparation methods and their impact on massively parallel sequencing in routine diagnostics.

    Directory of Open Access Journals (Sweden)

    Carina Heydt

    Full Text Available Over the last years, massively parallel sequencing has rapidly evolved and has now transitioned into molecular pathology routine laboratories. It is an attractive platform for analysing multiple genes at the same time with very little input material. Therefore, the need for high quality DNA obtained from automated DNA extraction systems has increased, especially for those laboratories dealing with formalin-fixed paraffin-embedded (FFPE) material and high sample throughput. This study evaluated five automated FFPE DNA extraction systems as well as five DNA quantification systems using the three most common techniques, UV spectrophotometry, fluorescent dye-based quantification and quantitative PCR, on 26 FFPE tissue samples. Additionally, the effects on downstream applications were analysed to find the most suitable pre-analytical methods for massively parallel sequencing in routine diagnostics. The results revealed that the Maxwell 16 from Promega (Mannheim, Germany) seems to be the superior system for DNA extraction from FFPE material. The extracts had a 1.3-24.6-fold higher DNA concentration in comparison to the other extraction systems, a higher quality and were most suitable for downstream applications. The comparison of the five quantification methods showed intermethod variations but all methods could be used to estimate the right amount for PCR amplification and for massively parallel sequencing. Interestingly, the best results in massively parallel sequencing were obtained with a DNA input of 15 ng determined by the NanoDrop 2000c spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). No difference could be detected in mutation analysis based on the results of the quantification methods. These findings emphasise that it is particularly important to choose the most reliable and constant DNA extraction system, especially when using small biopsies and low elution volumes, and that all common DNA quantification techniques can

  11. Parallel algorithms for numerical linear algebra

    CERN Document Server

    van der Vorst, H

    1990-01-01

    This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers.All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices.Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p

  12. Application of BIM technology in green scientific research office building

    Science.gov (United States)

    Ni, Xin; Sun, Jianhua; Wang, Bo

    2017-05-01

    As a form of information technology, BIM has gradually been applied in the domestic building industry along with the advancement of building industrialization. Based on a reasonably constructed BIM model and a BIM technology platform, collaborative design tools can effectively improve design efficiency and design quality. The scientific research office building project of Vanda Northwest Engineering Design and Research Institute Co., Ltd. applied BIM technology in light of the practical conditions of the project: a building energy model (BEM) was derived from the BIM model and its associated information, the application of BIM technology in the construction management stage was explored, and the direct experience and achievements gained in the architectural design stage are summarized.

  13. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  14. The present status of scientific applications of nuclear explosions

    Energy Technology Data Exchange (ETDEWEB)

    Cowan, G A; Diven, B C [Los Alamos Scientific Laboratory, University of California, Los Alamos, NM (United States)

    1970-05-15

    This is the fourth in a series of symposia which started in 1957 at Livermore with the purpose of examining the peaceful uses of nuclear explosives. Although principal emphasis has been placed on technological applications, the discussions have, from the outset, included the fascinating question of scientific uses. Of the possible scientific applications which were mentioned at the 1957 meeting, the proposals which attracted most attention involved uses of nuclear explosions for research in seismology. It is interesting to note that since then a very large and stimulating body of data in the field of seismology has been collected from nuclear tests. Ideas for scientific applications of nuclear explosions go back considerably further than 1957. During the war days Otto Frisch at Los Alamos suggested that a fission bomb would provide an excellent source of fast neutrons which could be led down a vacuum pipe and used for experiments in a relatively unscattered state. This idea, reinvented, modified, and elaborated upon in the ensuing twenty-five years, provides the basis for much of the research discussed in this morning's program. In 1952 a somewhat different property of nuclear explosions, their ability to produce intense neutron exposures on internal targets and to synthesize large quantities of multiple neutron capture products, was dramatically brought to our attention by analysis of debris from the first large thermonuclear explosion (Mike) in which the elements einsteinium and fermium were observed for the first time. The reports of the next two Plowshare symposia in 1959 and 1964 help record the fascinating development of the scientific uses of neutrons in nuclear explosions. Starting with two 'wheel' experiments in 1958 to measure symmetry of fission in 235-U resonances, the use of external beams of energy-resolved neutrons was expanded on the 'Gnome' experiment in 1961 to include the measurement of neutron capture excitation functions for 238-U, 232

  15. Althusserian Theory: From Scientific Truth to Institutional History

    Directory of Open Access Journals (Sweden)

    Philip Goldstein

    1994-01-01

    Full Text Available Scholars have emphasized the scientific and the rationalist features of Althusser's work, but few have noted its post-structuralist aspects, especially its Foucauldian accounts of discourse and power. In the early Pour Marx, Althusser divides ideological practices from objective science and theoretical norms from empirical facts; however, in several later essays Althusser repudiates his earlier faith in theory's normative force as well as his broad distinction between science and ideology. He argues that every discipline establishes its own relationship between its ideological history and its formal, scientific ideals. This argument, together with Althusser's earlier rejection of totalizing approaches, establishes important parallels with Foucault's archaeological studies. The literary theory of Tony Bennett, who develops a Foucauldian critique of traditional and Marxist aesthetics, illuminates the rich implications of these parallels for cultural analyses.

  16. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  17. Scientific and technical guidance for the preparation and presentation of a health claim application (Revision 2)

    DEFF Research Database (Denmark)

    Sjödin, Anders Mikael

    2017-01-01

    EFSA asked the Panel on Dietetic Products, Nutrition and Allergies (NDA) to update the scientific and technical guidance for the preparation and presentation of an application for authorisation of a health claim published in 2011. Since then, the NDA Panel has gained considerable experience...... developments in this area. This guidance document presents a common format for the organisation of information for the preparation of a well-structured application for authorisation of health claims which fall under Articles 13(5), 14 and 19 of Regulation (EC) No 1924/2006. This guidance outlines...... the information and scientific data which must be included in the application, the hierarchy of different types of data and study designs, and the key issues which should be addressed in the application to substantiate the health claim....

  18. 3D, parallel fluid-structure interaction code

    CSIR Research Space (South Africa)

    Oxtoby, Oliver F

    2011-01-01

    Full Text Available The authors describe the development of a 3D parallel Fluid–Structure–Interaction (FSI) solver and its application to benchmark problems. Fluid and solid domains are discretised using an edge-based finite-volume scheme for efficient parallel...

  19. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  20. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  1. Integrated computer network high-speed parallel interface

    International Nuclear Information System (INIS)

    Frank, R.B.

    1979-03-01

    As the number and variety of computers within Los Alamos Scientific Laboratory's Central Computer Facility grows, the need for a standard, high-speed intercomputer interface has become more apparent. This report details the development of a High-Speed Parallel Interface from conceptual through implementation stages to meet current and future needs for large-scale network computing within the Integrated Computer Network. 4 figures

  2. Parallel processing for artificial intelligence 2

    CERN Document Server

    Kumar, V; Suttner, CB

    1994-01-01

    With the increasing availability of parallel machines and the rising interest in large scale and real world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their

  3. A Parallel Numerical Micromagnetic Code Using FEniCS

    Science.gov (United States)

    Nagy, L.; Williams, W.; Mitchell, L.

    2013-12-01

    Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However, the computational cost rises exceedingly quickly with respect to the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently, to better exploit modern computational resources, our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, The Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software; in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/Parmetis, etc.) and exposing these so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users to not only run micromagnetic models in parallel, but also to perform pre/post processing of data.
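
    To give a flavour of the FEniCS style of problem statement referred to above, the sketch below solves a Poisson problem with the legacy dolfin interface; the same script runs in parallel under MPI (e.g. mpirun -n 4 python poisson.py) with the mesh distributed automatically. This is only a toy example, not the authors' micromagnetic formulation.

        from dolfin import (UnitCubeMesh, FunctionSpace, TrialFunction, TestFunction,
                            Function, DirichletBC, Constant, dot, grad, dx, solve)

        mesh = UnitCubeMesh(16, 16, 16)                   # mesh is distributed across MPI ranks
        V = FunctionSpace(mesh, "Lagrange", 1)

        u, v = TrialFunction(V), TestFunction(V)
        a = dot(grad(u), grad(v)) * dx                    # weak form of -div(grad u) = f
        L = Constant(1.0) * v * dx
        bc = DirichletBC(V, Constant(0.0), "on_boundary")

        u_h = Function(V)
        solve(a == L, u_h, bc)                            # assembles and solves in parallel
        print(u_h.vector().norm("l2"))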

  4. Proceeding of the Scientific Meeting and Presentation on Accelerator Technology and its Application

    International Nuclear Information System (INIS)

    Sudjatmoko; Anggraita, P.; Darsono; Sudiyanto; Kusminarto; Karyono

    1999-07-01

    The proceeding contains papers presented at the Scientific Meeting and Presentation on Accelerator Technology and Its Application, held in Yogyakarta, 16 January 1996. This proceeding contains papers on accelerator technology, especially electron beam machines. There are 11 papers indexed individually. (ID)

  5. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...

  6. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  7. Dynamic Analysis and Vibration Attenuation of Cable-Driven Parallel Manipulators for Large Workspace Applications

    Directory of Open Access Journals (Sweden)

    Jingli Du

    2013-01-01

    Full Text Available Cable-driven parallel manipulators are one of the best solutions to achieving a large workspace, since flexible cables can be easily stored on reels. However, due to the negligible flexural stiffness of cables, long cables will unavoidably vibrate during operation in large workspace applications. In this paper a finite element model for cable-driven parallel manipulators is proposed to mimic small-amplitude vibration of cables around their desired position. Output feedback of the cable tension variation at the end of the end-effector is utilized to design the vibration attenuation controller, which aims at attenuating the vibration of cables by slightly varying the cable length, thus decreasing its effect on the end-effector. When cable vibration is attenuated, a motion controller can be designed to implement precise large motions that track given trajectories. A numerical example is presented to demonstrate the dynamic model and the control algorithm.

  8. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
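
    The flavour of such a parallel overhead model can be illustrated in a few lines of Python: wall time is modelled as a serial part, a perfectly parallel part, and an overhead term that grows with the number of processors. The functional form and coefficients below are hypothetical and for illustration only, not the TORT model derived in the paper.

        # Illustrative parallel-overhead model: overhead eventually dominates,
        # so the predicted speedup peaks and then declines.
        def predicted_time(t_serial, t_parallel, n_procs, c_overhead):
            # t_serial: non-parallelizable time; t_parallel: parallelizable time on
            # one processor; c_overhead: per-processor synchronization/communication cost.
            return t_serial + t_parallel / n_procs + c_overhead * n_procs

        def speedup(t_serial, t_parallel, n_procs, c_overhead):
            t1 = t_serial + t_parallel
            return t1 / predicted_time(t_serial, t_parallel, n_procs, c_overhead)

        for p in (1, 4, 16, 64):
            print(p, round(speedup(t_serial=1.0, t_parallel=99.0,
                                   n_procs=p, c_overhead=0.05), 1))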

  9. Multitasking TORT under UNICOS: Parallel performance models and measurements

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared against measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  10. Modern industrial simulation tools: Kernel-level integration of high performance parallel processing, object-oriented numerics, and adaptive finite element analysis. Final report, July 16, 1993--September 30, 1997

    Energy Technology Data Exchange (ETDEWEB)

    Deb, M.K.; Kennon, S.R.

    1998-04-01

    A cooperative R&D effort between industry and the US government, this project, under the HPPP (High Performance Parallel Processing) initiative of the Dept. of Energy, started the investigations into parallel object-oriented (OO) numerics. The basic goal was to research and utilize the emerging technologies to create a physics-independent computational kernel for applications using adaptive finite element method. The industrial team included Computational Mechanics Co., Inc. (COMCO) of Austin, TX (as the primary contractor), Scientific Computing Associates, Inc. (SCA) of New Haven, CT, Texaco and CONVEX. Sandia National Laboratory (Albq., NM) was the technology partner from the government side. COMCO had the responsibility of the main kernel design and development, SCA had the lead in parallel solver technology and guidance on OO technologies was Sandia's main expertise in this venture. CONVEX and Texaco supported the partnership by hardware resource and application knowledge, respectively. As such, a minimum of fifty-percent cost-sharing was provided by the industry partnership during this project. This report describes the R&D activities and provides some details about the prototype kernel and example applications.

  11. Parallel thermal radiation transport in two dimensions

    International Nuclear Information System (INIS)

    Smedley-Stevenson, R.P.; Ball, S.R.

    2003-01-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massively parallel processor) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  12. Parallel thermal radiation transport in two dimensions

    Energy Technology Data Exchange (ETDEWEB)

    Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)

    2003-07-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massively parallel processor) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  13. Parallel Computational Fluid Dynamics 2007 : Implementations and Experiences on Large Scale and Grid Computing

    CERN Document Server

    2009-01-01

    At the 19th Annual Conference on Parallel Computational Fluid Dynamics held in Antalya, Turkey, in May 2007, the most recent developments and implementations of large-scale and grid computing were presented. This book, comprised of the invited and selected papers of this conference, details those advances, which are of particular interest to CFD and CFD-related communities. It also offers the results related to applications of various scientific and engineering problems involving flows and flow-related topics. Intended for CFD researchers and graduate students, this book is a state-of-the-art presentation of the relevant methodology and implementation techniques of large-scale computing.

  14. Parallel fabrication of macroporous scaffolds.

    Science.gov (United States)

    Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal

    2018-07-01

    Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel at higher throughput, and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.

  15. Educational and Scientific Applications of Climate Model Diagnostic Analyzer

    Science.gov (United States)

    Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Kubar, T. L.; Zhang, J.; Bao, Q.

    2016-12-01

    Climate Model Diagnostic Analyzer (CMDA) is a web-based information system designed for the climate modeling and model analysis community to analyze climate data from models and observations. CMDA provides tools to diagnostically analyze climate data for model validation and improvement, and to systematically manage analysis provenance for sharing results with other investigators. CMDA utilizes cloud computing resources, multi-threading computing, machine-learning algorithms, web service technologies, and provenance-supporting technologies to address technical challenges that the Earth science modeling and model analysis community faces in evaluating and diagnosing climate models. As CMDA infrastructure and technology have matured, we have developed the educational and scientific applications of CMDA. Educationally, CMDA supported the summer school of the JPL Center for Climate Sciences for three years since 2014. In the summer school, the students work on group research projects where CMDA provides datasets and analysis tools. Each student is assigned to a virtual machine with CMDA installed in Amazon Web Services. A provenance management system for CMDA is developed to keep track of students' usage of CMDA, and to recommend datasets and analysis tools for their research topic. The provenance system also allows students to revisit their analysis results and share them with their group. Scientifically, we have developed several science use cases of CMDA covering various topics, datasets, and analysis types. Each use case developed is described and listed in terms of a scientific goal, datasets used, the analysis tools used, scientific results discovered from the use case, an analysis result such as output plots and data files, and a link to the exact analysis service call with all the input arguments filled. For example, one science use case is the evaluation of the NCAR CAM5 model with MODIS total cloud fraction. The analysis service used is Difference Plot Service of

  16. Building Input Adaptive Parallel Applications: A Case Study of Sparse Grid Interpolation

    KAUST Repository

    Murarasu, Alin

    2012-12-01

    The well-known power wall resulting in multi-cores requires special techniques for speeding up applications. In this sense, parallelization plays a crucial role. Besides standard serial optimizations, techniques such as input specialization can also bring a substantial contribution to the speedup. By identifying common patterns in the input data, we propose new algorithms for sparse grid interpolation that accelerate the state-of-the-art non-specialized version. Sparse grid interpolation is an inherently hierarchical method of interpolation employed for example in computational steering applications for decompressing high-dimensional simulation data. In this context, improving the speedup is essential for real-time visualization. Using input specialization, we report a speedup of up to 9x over the nonspecialized version. The paper covers the steps we took to reach this speedup by means of input adaptivity. Our algorithms will be integrated in fastsg, a library for fast sparse grid interpolation. © 2012 IEEE.

  17. Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems

    KAUST Repository

    Wu, Xingfu; Taylor, Valerie

    2011-01-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters: IBM POWER4, POWER5+ and Blue Gene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore clusters because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: the Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore clusters. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore clusters. © 2011 IEEE.

  18. Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems

    KAUST Repository

    Wu, Xingfu

    2011-08-01

    In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters: IBM POWER4, POWER5+ and Blue Gene/P, and analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks to provide initial performance analysis and model validation of MPI and OpenMP applications on these multicore clusters because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to using these benchmarks, we also use a weak-scaling hybrid MPI/OpenMP large-scale scientific application: the Gyrokinetic Toroidal Code (GTC) in magnetic fusion to validate our performance model of the hybrid application on these multicore clusters. The validation results for our performance modeling method show less than 7.77% error rate in predicting the performance of hybrid MPI/OpenMP GTC on up to 512 cores on these multicore clusters. © 2011 IEEE.
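
    A toy version of the memory-bandwidth-contention idea is sketched below: measured STREAM-like bandwidths for a single active core and for a fully loaded node yield a stretch factor that is applied to the memory-bound part of the runtime. The functional form and all numbers are illustrative, not the authors' model or coefficients.

        def contention_factor(cores_per_node, bw_one_core, bw_full_node):
            # Slowdown of memory-bound work when all cores are active, from assumed
            # STREAM-like bandwidth measurements (e.g. GB/s).
            per_core_bw = bw_full_node / cores_per_node
            return bw_one_core / per_core_bw          # >= 1 when bandwidth saturates

        def predicted_runtime(t_compute, t_memory, t_comm, cores_per_node,
                              bw_one_core, bw_full_node):
            # Runtime = compute-bound part + stretched memory-bound part + communication.
            stretch = contention_factor(cores_per_node, bw_one_core, bw_full_node)
            return t_compute + t_memory * stretch + t_comm

        # Example with made-up numbers: 8 cores sharing 24 GB/s, 6 GB/s for one core.
        print(predicted_runtime(t_compute=10.0, t_memory=5.0, t_comm=2.0,
                                cores_per_node=8, bw_one_core=6.0, bw_full_node=24.0))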

  19. Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    International Nuclear Information System (INIS)

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-01-01

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and, in the future, unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5.
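
    The access pattern that H5Part wraps can be approximated in Python with h5py built against an MPI-enabled HDF5, as in the sketch below: every rank opens the same file collectively and writes its own slice of a per-time-step particle dataset. This is an illustration of the pattern, not the H5Part API itself, and the group and dataset names are hypothetical; run with, e.g., mpirun -n 4 python write_particles.py.

        import numpy as np
        import h5py
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_local = 100_000                       # particles owned by this rank
        x_local = np.random.rand(n_local)       # one coordinate per particle

        with h5py.File("particles.h5", "w", driver="mpio", comm=comm) as f:
            step = f.create_group("Step#0")                       # one group per time step
            dset = step.create_dataset("x", shape=(n_local * size,), dtype="f8")
            offset = rank * n_local
            dset[offset:offset + n_local] = x_local               # each rank writes its own slice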

  20. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
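
    The behaviour described above is analogous to non-blocking collectives in standard MPI. The sketch below uses mpi4py (not IBM's PAMI API) to start a non-blocking Iallreduce, overlap it with other work, and then wait for completion.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        local = np.full(4, comm.Get_rank(), dtype="d")
        total = np.empty(4, dtype="d")

        req = comm.Iallreduce(local, total, op=MPI.SUM)   # start the collective, do not block

        # ... useful local computation could overlap with the collective here ...

        req.Wait()                                        # complete the collective
        if comm.Get_rank() == 0:
            print(total)    # sum of ranks, e.g. [6. 6. 6. 6.] with 4 processes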

  1. Parallelism in computations in quantum and statistical mechanics

    International Nuclear Information System (INIS)

    Clementi, E.; Corongiu, G.; Detrich, J.H.

    1985-01-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - presently has a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system; a description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)

  2. Back-End of the web application for the scientific journal Studia Kinanthropologica

    OpenAIRE

    ŠIMÁK, Lubomír

    2017-01-01

    The bachelor thesis deals with the creation of the server part of the web application for the peer-reviewed scientific journal Studia Kinanthropologica, which will be used for the review of articles before printing. The thesis describes the work on this system and the solutions to the problems that arose during the work.

  3. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  4. 75 FR 51439 - Proposed Information Collection; Comment Request; Application and Reports for Scientific Research...

    Science.gov (United States)

    2010-08-20

    ... DEPARTMENT OF COMMERCE National Oceanic and Atmospheric Administration Proposed Information Collection; Comment Request; Application and Reports for Scientific Research and Enhancement Permits Under the Endangered Species Act AGENCY: National Oceanic and Atmospheric Administration (NOAA), Commerce...

  5. FISH: A THREE-DIMENSIONAL PARALLEL MAGNETOHYDRODYNAMICS CODE FOR ASTROPHYSICAL APPLICATIONS

    International Nuclear Information System (INIS)

    Kaeppeli, R.; Whitehouse, S. C.; Scheidegger, S.; Liebendoerfer, M.; Pen, U.-L.

    2011-01-01

    FISH is a fast and simple ideal magnetohydrodynamics code that scales to ∼10,000 processes for a Cartesian computational domain of ∼1000³ cells. The simplicity of FISH has been achieved by the rigorous application of the operator splitting technique, while second-order accuracy is maintained by the symmetric ordering of the operators. Between directional sweeps, the three-dimensional data are rotated in memory so that the sweep is always performed in a cache-efficient way along the direction of contiguous memory. Hence, the code only requires a one-dimensional description of the conservation equations to be solved. This approach also enables an elegant novel parallelization of the code that is based on persistent communications with MPI for cubic domain decomposition on machines with distributed memory. This scheme is then combined with an additional OpenMP parallelization of different sweeps that can take advantage of clusters of shared memory. We document the detailed implementation of a second-order total variation diminishing advection scheme based on flux reconstruction. The magnetic fields are evolved by a constrained transport scheme. We show that the subtraction of a simple estimate of the hydrostatic gradient from the total gradients can significantly reduce the dissipation of the advection scheme in simulations of gravitationally bound hydrostatic objects. Through its simplicity and efficiency, FISH is as well suited for hydrodynamics classes as for large-scale astrophysical simulations on high-performance computer clusters. In preparation for the release of a public version, we demonstrate the performance of FISH in a suite of astrophysically orientated test cases.
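
    The dimensional-sweep idea, always updating along the axis of contiguous memory and rotating the data between sweeps, can be illustrated with NumPy as below. The kernel is a first-order upwind toy update, not FISH's TVD/MHD scheme, and the in-memory rotation here is a simple axis move rather than the code's distributed transpose.

        import numpy as np

        def sweep_x(u, c=0.4):
            # 1-D first-order upwind advection update applied along the last
            # (contiguous) axis; stands in for the real sweep kernel.
            return u - c * (u - np.roll(u, 1, axis=-1))

        def step(u):
            for _ in range(3):                       # x, y and z sweeps
                u = sweep_x(u)                       # update along the contiguous axis
                u = np.ascontiguousarray(np.moveaxis(u, 0, -1))  # rotate the data in memory
            return u                                 # axes are back in their original order

        u = np.random.rand(64, 64, 64)
        u = step(u)                                  # one full split-operator time step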

  6. Communications oriented programming of parallel iterative solutions of sparse linear systems

    Science.gov (United States)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
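
    A common concrete form of the overlap discussed above is a non-blocking halo exchange: ghost values are sent and received while the interior of the local subdomain is updated, and only the edge cells wait for communication to finish. The mpi4py sketch below shows this generic pattern for a 1-D Jacobi-style smoothing step; it is not one of the paper's three algorithms.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left, right = rank - 1, rank + 1

        n_local = 1000
        u = np.random.rand(n_local + 2)        # local slab with one ghost cell per side
        u_new = np.empty_like(u)

        reqs = []
        if left >= 0:                          # start the non-blocking halo exchange
            reqs += [comm.Isend(u[1:2], dest=left), comm.Irecv(u[0:1], source=left)]
        if right < size:
            reqs += [comm.Isend(u[-2:-1], dest=right), comm.Irecv(u[-1:], source=right)]

        # Interior update proceeds while ghost values are in flight (overlap).
        u_new[2:-2] = 0.5 * (u[1:-3] + u[3:-1])

        MPI.Request.Waitall(reqs)              # ghosts have arrived; finish the edge cells
        u_new[1] = 0.5 * (u[0] + u[2])
        u_new[-2] = 0.5 * (u[-3] + u[-1])
        u = u_new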

  7. Distributed Geant4 simulation in medical and space science applications using DIANE framework and the GRID

    CERN Document Server

    Moscicki, J T; Mantero, A; Pia, M G

    2003-01-01

    Distributed computing is one of the most important trends in IT which has recently gained significance for large-scale scientific applications. Distributed analysis environment (DIANE) is an R&D study, focusing on semi-interactive parallel and remote data analysis and simulation, which has been conducted at CERN. DIANE provides necessary software infrastructure for parallel scientific applications in the master-worker model. Advanced error recovery policies, automatic book-keeping of distributed jobs and on-line monitoring and control tools are provided. DIANE makes a transparent use of a number of different middleware implementations such as load balancing service (LSF, PBS, GRID Resource Broker, Condor) and security service (GSI, Kerberos, openssh). A number of distributed Geant 4 simulations have been deployed and tested, ranging from interactive radiotherapy treatment planning using dedicated clusters in hospitals, to globally-distributed simulations of astrophysics experiments using the European data g...
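
    The master-worker model that DIANE provides at scale can be illustrated in miniature, without any of DIANE's error-recovery or bookkeeping machinery, with Python's standard multiprocessing module; the worker function below is a placeholder.

```python
# Sketch of the master-worker pattern (stand-alone, not the DIANE API):
# the master hands out independent tasks and collects results as they finish.
from multiprocessing import Pool

def worker(task_id: int) -> tuple[int, str]:
    """A worker processes one independent unit of simulation work."""
    return task_id, f"result-{task_id}"

def master(n_tasks: int) -> dict[int, str]:
    results = {}
    with Pool() as pool:
        # A real framework adds error recovery, bookkeeping and monitoring
        # on top of this basic dispatch/collect loop.
        for task_id, payload in pool.imap_unordered(worker, range(n_tasks)):
            results[task_id] = payload
    return results

if __name__ == "__main__":
    print(len(master(8)), "tasks collected")
```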

  8. Teleconference versus face-to-face scientific peer review of grant application: effects on review outcomes.

    Directory of Open Access Journals (Sweden)

    Stephen A Gallo

    Full Text Available Teleconferencing as a setting for scientific peer review is an attractive option for funding agencies, given the substantial environmental and cost savings. Despite this, there is a paucity of published data validating teleconference-based peer review compared to the face-to-face process. Our aim was to conduct a retrospective analysis of scientific peer review data to investigate whether review setting has an effect on review process and outcome measures. We analyzed reviewer scoring data from a research program that had recently modified the review setting from face-to-face to a teleconference format with minimal changes to the overall review procedures. This analysis included approximately 1600 applications over a 4-year period: two years of face-to-face panel meetings compared to two years of teleconference meetings. The average overall scientific merit scores, score distribution, standard deviations and reviewer inter-rater reliability statistics were measured, as well as reviewer demographics and length of time discussing applications. The data indicate that few differences are evident between face-to-face and teleconference settings with regard to average overall scientific merit score, scoring distribution, standard deviation, reviewer demographics or inter-rater reliability. However, some difference was found in the discussion time. These findings suggest that most review outcome measures are unaffected by review setting, which would support the trend of using teleconference reviews rather than face-to-face meetings. However, further studies are needed to assess any correlations among discussion time, application funding and the productivity of funded research projects.

  9. Teleconference versus Face-to-Face Scientific Peer Review of Grant Application: Effects on Review Outcomes

    Science.gov (United States)

    Gallo, Stephen A.; Carpenter, Afton S.; Glisson, Scott R.

    2013-01-01

    Teleconferencing as a setting for scientific peer review is an attractive option for funding agencies, given the substantial environmental and cost savings. Despite this, there is a paucity of published data validating teleconference-based peer review compared to the face-to-face process. Our aim was to conduct a retrospective analysis of scientific peer review data to investigate whether review setting has an effect on review process and outcome measures. We analyzed reviewer scoring data from a research program that had recently modified the review setting from face-to-face to a teleconference format with minimal changes to the overall review procedures. This analysis included approximately 1600 applications over a 4-year period: two years of face-to-face panel meetings compared to two years of teleconference meetings. The average overall scientific merit scores, score distribution, standard deviations and reviewer inter-rater reliability statistics were measured, as well as reviewer demographics and length of time discussing applications. The data indicate that few differences are evident between face-to-face and teleconference settings with regard to average overall scientific merit score, scoring distribution, standard deviation, reviewer demographics or inter-rater reliability. However, some difference was found in the discussion time. These findings suggest that most review outcome measures are unaffected by review setting, which would support the trend of using teleconference reviews rather than face-to-face meetings. However, further studies are needed to assess any correlations among discussion time, application funding and the productivity of funded research projects. PMID:23951223

  10. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference method representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR with automated determination of group dependent relaxation factors and iteration numbers required to achieve specified inner iteration error tolerance is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the DIFPAR3D code, for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies
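
    The red-black (cyclic) ordering mentioned above is what makes each half-sweep parallel: all cells of one color can be updated independently. The sketch below shows a point-wise red-black SOR on a small 2D Poisson problem purely as an illustration; DIFPAR3D's line SOR with group-dependent relaxation factors is not reproduced here.

```python
# Point-wise red-black SOR for the 5-point discretization of  laplacian(u) = f
# on the unit square; within each color the updates are mutually independent.
import numpy as np

def red_black_sor(f, u, h, omega=1.7, sweeps=100):
    for _ in range(sweeps):
        for color in (0, 1):                      # red cells, then black cells
            for i in range(1, u.shape[0] - 1):
                for j in range(1 + (i + color) % 2, u.shape[1] - 1, 2):
                    gs = 0.25 * (u[i - 1, j] + u[i + 1, j] +
                                 u[i, j - 1] + u[i, j + 1] - h * h * f[i, j])
                    u[i, j] = (1 - omega) * u[i, j] + omega * gs
    return u

n = 33
u = np.zeros((n, n))                              # zero Dirichlet boundary
f = np.ones((n, n))
u = red_black_sor(f, u, h=1.0 / (n - 1))
print(float(u[n // 2, n // 2]))
```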

  11. The sixth Nordic conference on the application of scientific methods in archaeology

    International Nuclear Information System (INIS)

    1993-01-01

    The Sixth Nordic Conference on the Application of Scientific Methods in Archaeology with 73 participants was convened in Esbjerg (Denmark), 19-23 September 1993. Isotope dating of archaeological, paleoecological and geochronological objects, neutron activation and XRF analytical methods, magnetometry, thermoluminescence etc. have been discussed. The program included excursions to archaeological sites and a poster session with 12 posters. (EG)

  12. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoating-point Operations Per Second (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  13. Mechanics of curved surfaces, with application to surface-parallel cracks

    Science.gov (United States)

    Martel, Stephen J.

    2011-10-01

    The surfaces of many bodies are weakened by shallow enigmatic cracks that parallel the surface. A re-formulation of the static equilibrium equations in a curvilinear reference frame shows that a tension perpendicular to a traction-free surface can arise at shallow depths even under the influence of gravity. This condition occurs if σ11k1 + σ22k2 > ρg cosβ, where k1 and k2 are the principal curvatures (negative if convex) at the surface, σ11 and σ22 are tensile (positive) or compressive (negative) stresses parallel to the respective principal curvature arcs, ρ is material density, g is gravitational acceleration, and β is the surface slope. The curvature terms do not appear in equilibrium equations in a Cartesian reference frame. Compression parallel to a convex surface thus can cause subsurface cracks to open. A quantitative test of the relationship above accounts for where sheeting joints (prominent shallow surface-parallel fractures in rock) are abundant and for where they are scarce or absent in the varied topography of Yosemite National Park, resolving key aspects of a classic problem in geology: the formation of sheeting joints. Moreover, since the equilibrium equations are independent of rheology, the relationship above can be applied to delamination or spalling caused by surface-parallel cracks in many materials.
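
    A quick numerical check of the criterion with illustrative values (assumed here, not taken from the paper) shows how modest surface-parallel compression on a convex dome can overcome the gravitational term:

```python
# Worked check of the sheeting-joint criterion  sigma11*k1 + sigma22*k2 > rho*g*cos(beta)
# with illustrative values (not from the paper): a gently convex granite dome
# under surface-parallel compression.
import math

sigma11 = -10e6      # Pa, compression parallel to the surface (negative)
sigma22 = -10e6      # Pa
k1 = k2 = -0.005     # 1/m, convex surface (negative curvature)
rho = 2700.0         # kg/m^3
g = 9.81             # m/s^2
beta = math.radians(10)    # surface slope

lhs = sigma11 * k1 + sigma22 * k2      # curvature term, Pa/m  (= 100 kPa/m here)
rhs = rho * g * math.cos(beta)         # gravitational term, Pa/m (about 26 kPa/m)
print(lhs, rhs, "surface-normal tension expected:", lhs > rhs)
```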

  14. Java parallel secure stream for grid computing

    International Nuclear Information System (INIS)

    Chen, J.; Akers, W.; Chen, Y.; Watson, W.

    2001-01-01

    The emergence of high speed wide area networks makes grid computing a reality. However grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance due to network tuning of TCP window size to improve the bandwidth and to reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning TCP window size. In addition X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed

  15. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    International Nuclear Information System (INIS)

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2015-01-01

    parallel, two additional libraries are currently needed: MPI, for inter-processor message passing, and the Portable, Extensible Toolkit for Scientific Computation (PETSc), which is used to solve the global pressure matrix in parallel. Results presented include a set of testing and verification calculations and performance tests assessing parallel scaling characteristics up to a full-core, pincell-resolved model of a PWR core containing 193 17 × 17 assemblies under hot full-power conditions. This model, representative of Watts Bar Unit 1 and containing about 56,000 pins, was modeled with roughly 59,000 subchannels, leading to about 2.8 million thermal–hydraulic control volumes in total. Results demonstrate that CTF can now perform full-core analysis of a PWR (not previously possible owing to excessively long runtimes and memory requirements) on the order of 20 min. This new capability not only is useful to stand-alone CTF users, but also is being leveraged in support of coupled code multi-physics calculations being done in the CASL program

  16. The application of image processing in the measurement for three-light-axis parallelity of laser ranger

    Science.gov (United States)

    Wang, Yang; Wang, Qianqian

    2008-12-01

    When a laser ranger is transported or used in field operations, the transmitting axis, receiving axis and aiming axis may no longer be parallel. The nonparallelism of the three light axes degrades the range-measuring ability or prevents the laser ranger from being operated accurately, so testing and adjusting the three-light-axis parallelity during production and maintenance is important for reliable use of the instrument. After comparing several common measurement methods for three-light-axis parallelity, the paper proposes a new measurement method based on digital image processing. It uses a large-aperture off-axis paraboloid reflector to acquire the images of the laser spot and the white-light cross line, and then processes the images on the LabVIEW platform. The center of the white-light cross line is obtained by the matching algorithm in a LabVIEW DLL, and the center of the laser spot is obtained by gray-level transformation, binarization and area filtering in turn. The software system can configure the CCD, detect the off-axis paraboloid reflector, measure the parallelity of the transmitting and aiming axes, and control the attenuation device. The hardware system uses the SAA7111A, a programmable video decoding chip, to perform A/D conversion; a FIFO (first-in first-out) is selected as the buffer, and the USB bus is used to transmit data to the PC. The three-light-axis parallelity is then determined from the position bias between the measured centers. A device based on this method is already in use, and its application shows that the method offers high precision, speed and automation.
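
    The laser-spot step (binarize, then locate the center) can be illustrated outside LabVIEW; the following NumPy sketch uses a synthetic image and an intensity-weighted centroid, which is one plausible choice and not necessarily the paper's exact procedure.

```python
# Not the LabVIEW implementation: a small NumPy sketch of the laser-spot step,
# i.e. binarize the image and take the intensity-weighted centroid.
import numpy as np

def spot_center(img: np.ndarray, threshold: float) -> tuple[float, float]:
    mask = img >= threshold                  # binarization
    ys, xs = np.nonzero(mask)
    w = img[ys, xs].astype(float)            # weight by gray level inside the spot
    return float((xs * w).sum() / w.sum()), float((ys * w).sum() / w.sum())

# Synthetic test image with a bright spot centred near x = 40, y = 25.
yy, xx = np.mgrid[0:64, 0:64]
img = 200.0 * np.exp(-((xx - 40) ** 2 + (yy - 25) ** 2) / 20.0)
print(spot_center(img, threshold=50.0))      # approximately (40.0, 25.0)
```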

  17. Scientific Data Services -- A High-Performance I/O System with Array Semantics

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kesheng; Byna, Surendra; Rotem, Doron; Shoshani, Arie

    2011-09-21

    As high-performance computing approaches exascale, the existing I/O system design is having trouble keeping pace in both performance and scalability. We propose to address this challenge by adopting database principles and techniques in parallel I/O systems. First, we propose to adopt an array data model because many scientific applications represent their data in arrays. This strategy follows a cardinal principle from database research, which separates the logical view from the physical layout of data. This high-level data model gives the underlying implementation more freedom to optimize the physical layout and to choose the most effective way of accessing the data. For example, knowing that a set of write operations is working on a single multi-dimensional array makes it possible to keep the subarrays in a log structure during the write operations and reassemble them later into another physical layout as resources permit. While maintaining the high-level view, the storage system could compress the user data to reduce the physical storage requirement, collocate data records that are frequently used together, or replicate data to increase availability and fault-tolerance. Additionally, the system could generate secondary data structures such as database indexes and summary statistics. We expect the proposed Scientific Data Services approach to create a “live” storage system that dynamically adjusts to user demands and evolves with the massively parallel storage hardware.

  18. A Pipelined and Parallel Architecture for Quantum Monte Carlo Simulations on FPGAs

    Directory of Open Access Journals (Sweden)

    Akila Gothandaraman

    2010-01-01

    Full Text Available Recent advances in Field-Programmable Gate Array (FPGA) technology make reconfigurable computing using FPGAs an attractive platform for accelerating scientific applications. We develop a deeply pipelined and parallel architecture for Quantum Monte Carlo simulations using FPGAs. Quantum Monte Carlo simulations enable us to obtain the structural and energetic properties of atomic clusters. We experiment with different pipeline structures for each component of the design and develop a deeply pipelined architecture that provides the best performance in terms of achievable clock rate, while at the same time has a modest use of the FPGA resources. We discuss the details of the pipelined and generic architecture that is used to obtain the potential energy and wave function of a cluster of atoms.

  19. REEXPORT OF SCIENTIFIC COMPETENCIES IN THE LIGHT OF THE RE-CONSTRUCTION OF A NETWORK OF SCIENTIFIC-RESEARCH BODIES

    Directory of Open Access Journals (Sweden)

    О. A. Yeremchenko

    2016-01-01

    Full Text Available One of the primary challenges Russia is currently facing is the need for diversification of the Russian economy and an increase in the share of manufacturing and exported science-driven products. In this light, improving the effectiveness of the scientific-technological complex of the country is becoming increasingly important. The article considers two scalable projects, developed in parallel, for increasing the effectiveness of the scientific-research sector: restructuring of the network of scientific organizations and the project for bringing 15 thousand Russian scientists back home (reverse immigration). A conclusion is made that it is appropriate to refrain from a large-scale change in the personnel of scientists in circumstances where the budget for research and development and the number of scientific-research organizations are being cut. It is proposed to create comfortable conditions for scientific search for all parties involved in the process of new knowledge creation, both for the scientists returning to Russia and those that remain working in the country.

  20. Scientific basis of development and application of nanotechnologies in oil industry

    International Nuclear Information System (INIS)

    Mirzajanzadeh, A.; Maharramov, A.; Abdullayev, R.; Yuzifzadeh, Kh.; Shahbazov, E.; Qurbanov, R.; Akhmadov, S.; Kazimov, E; Ramazanov, M.; Shafiyev, Sh.; Hajizadeh, N.

    2010-01-01

    Development and introduction of nanotechnologies in the oil industry is one of the most pressing issues of the present times. For the first time in world practice, a scientific-methodological basis and application practice of nanotechnologies in the oil industry is developed on the basis of a uniform, scientifically proven approach by taking into account the specificities of the oil and gas industry. The application system of such nanotechnologies was developed in oil and gas production. Mathematical models of nanotechnological processes, i.e. chaos regulation and hyper-accidental process, were offered. Nanomedium and nanoimpact on the well-layer system was studied. Wide application results of nanotechnologies in SOCAR's production fields in oil and gas production are shown. Research results of NANOSAA on the basis of the NANO + NANO effect are described in the development. For the first time in world practice, NANOOIL, NANOBITUMEN, NANOGUDRON and NANOMAY systems on the basis of machine waste oil in the drilling mud were developed for application in oil and gas drilling. An original property, the effect of super small concentrations and nanomemory in NANOOIL and NANOBITUMEN systems, was discovered. By applying NANOOIL, NANOBITUMEN and NANOMAY systems in the drilling process it was discovered: the increase of linear speed, early turbulence, decrease of the hydraulic resistance coefficient and economy in energy consumption. Hyper-accidental evaluation of the mathematical expectation of the general sum of values of the surface strain on the sample data is spelled out for the various experiment conditions. The estimated hyper-accidental value of the mathematical expectation allows us to offer practical recommendations for the development of new nanotechnologies on the basis of rheological parameters of oil.

  1. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    The Discrete Hadamard Transformation (DHT) is widely used in real-time signal processing, but the operation speed of a single DSP is limited. This article therefore investigates the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the multiprocessor platform TMS320C80, two kinds of parallel DHT algorithms are implemented. Several experiments demonstrate the effectiveness of the proposed algorithms.
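
    For reference, a sequential fast Walsh-Hadamard transform is shown below; the butterflies within each stage are mutually independent, which is the property a parallel DHT implementation exploits. This is a generic textbook routine, not the TMS320C80 code of the article.

```python
# Reference (sequential) fast Walsh-Hadamard transform; each stage's
# butterflies are independent and therefore parallelizable.
import numpy as np

def fwht(a: np.ndarray) -> np.ndarray:
    """FWHT of a vector whose length is a power of two."""
    a = a.astype(float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

x = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(fwht(x))
```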

  2. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  3. Scientific Ethics: A New Approach.

    Science.gov (United States)

    Menapace, Marcello

    2018-06-04

    Science is an activity of the human intellect and as such has ethical implications that should be reviewed and taken into account. Although science and ethics have conventionally been considered different, it is herewith proposed that they are essentially similar. The proposal set henceforth is to create a new ethics rooted in science: scientific ethics. Science has firm axiological foundations and searches for truth (as a value, axiology) and knowledge (epistemology). Hence, science cannot be value neutral. Looking at standard scientific principles, it is possible to construct a scientific ethic (that is, an ethical framework based on scientific methods and rules), which can be applied to all sciences. These intellectual standards include the search for truth (honesty and its derivatives), human dignity (and by reflection the dignity of all animals) and respect for life. Through these it is thence achievable to draft a foundation of an ethics based purely on science and applicable beyond the confines of science. A few applications of these will be presented. Scientific ethics can have vast applications in other fields, even in non-scientific ones.

  4. Software engineering and automatic continuous verification of scientific software

    Science.gov (United States)

    Piggott, M. D.; Hill, J.; Farrell, P. E.; Kramer, S. C.; Wilson, C. R.; Ham, D.; Gorman, G. J.; Bond, T.

    2011-12-01

    Software engineering of scientific code is challenging for a number of reasons including pressure to publish and a lack of awareness of the pitfalls of software engineering by scientists. The Applied Modelling and Computation Group at Imperial College is a diverse group of researchers that employ best practice software engineering methods whilst developing open source scientific software. Our main code is Fluidity - a multi-purpose computational fluid dynamics (CFD) code that can be used for a wide range of scientific applications from earth-scale mantle convection, through basin-scale ocean dynamics, to laboratory-scale classic CFD problems, and is coupled to a number of other codes including nuclear radiation and solid modelling. Our software development infrastructure consists of a number of free tools that could be employed by any group that develops scientific code and has been developed over a number of years with many lessons learnt. A single code base is developed by over 30 people for which we use bazaar for revision control, making good use of the strong branching and merging capabilities. Using features of Canonical's Launchpad platform, such as code review, blueprints for designing features and bug reporting, gives the group, partners and other Fluidity users an easy-to-use platform to collaborate and allows the induction of new members of the group into an environment where software development forms a central part of their work. The code repository is coupled to an automated test and verification system which performs over 20,000 tests, including unit tests, short regression tests, code verification and large parallel tests. Included in these tests are build tests on HPC systems, including local and UK National HPC services. The testing of code in this manner leads to a continuous verification process; not a discrete event performed once development has ceased. Much of the code verification is done via the "gold standard" of comparisons to analytical

  5. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and  tuning work required. This process  is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus  making a major contribution to the development of  robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  6. 78 FR 13860 - Application(s) for Duty-Free Entry of Scientific Instruments

    Science.gov (United States)

    2013-03-01

    ... Scientific Instruments Pursuant to Section 6(c) of the Educational, Scientific and Cultural Materials... invite comments on the question of whether instruments of equivalent scientific value, for the purposes... conformational change of assemblies involved in biological processes such as ATP production, signal transduction...

  7. 76 FR 56156 - Application(s) for Duty-Free Entry of Scientific Instruments

    Science.gov (United States)

    2011-09-12

    ... Scientific Instruments Pursuant to Section 6(c) of the Educational, Scientific and Cultural Materials... invite comments on the question of whether instruments of equivalent scientific value, for the purposes... materials for energy production. The experiments will involve structural and chemical analyses of materials...

  8. Parallel operation of voltage-source converters: issues and applications

    Energy Technology Data Exchange (ETDEWEB)

    Almeida, F.C.B.; Silva, D.S. [Federal University of Juiz de Fora (UFJF), MG (Brazil)], Emails: felipe.brum@engenharia.ufjf.br, salomaoime@yahoo.com.br; Ribeiro, P.F. [Calvin College, Grand Rapids, MI (United States); Federal University of Juiz de Fora (UFJF), MG (Brazil)], E-mail: pfribeiro@ieee.org

    2009-07-01

    Technological advancements in power electronics have prompted the development of advanced AC/DC conversion systems with high efficiency and flexible performance. Among these devices, the Voltage-Source Converter (VSC) has become an essential building block. This paper considers the parallel operation of VSCs under different system conditions and how they can assist the operation of highly complex power networks. A multi-terminal VSC-based High Voltage Direct Current (M-VSC-HVDC) system is chosen to be modeled, simulated and then analyzed as an example of VSCs operating in parallel. (author)

  9. Proceedings of the Scientific Meeting on Research and Development of Isotopes Application and Radiation

    International Nuclear Information System (INIS)

    Singgih Sutrisno; Sofyan Yatim; Pattiradjawane, EIsje L.; Ismachin, Moch; Mugiono; Marga Utama; Komaruddin Idris

    2004-02-01

    The Proceedings of the Scientific Meeting on Research and Development of Isotopes Application and Radiation were presented on February 17-18, 2004 in Jakarta. The aim of the meeting was to disseminate the results of research on the application of nuclear techniques in agriculture, animal science, industry, hydrology and the environment. There were 4 invited papers and 38 papers from BATAN participants as well as from outside institutions. The articles are indexed separately. (PPIN)

  10. Social behavioural epistemology and the scientific community.

    Science.gov (United States)

    Watve, Milind

    2017-07-01

    The progress of science is influenced substantially by social behaviour of and social interactions within the scientific community. Similar to innovations in primate groups, the social acceptance of an innovation depends not only upon the relevance of the innovation but also on the social dominance and connectedness of the innovator. There are a number of parallels between many well-known phenomena in behavioural evolution and various behavioural traits observed in the scientific community. It would be useful, therefore, to use principles of behavioural evolution as hypotheses to study the social behaviour of the scientific community. I argue in this paper that a systematic study of social behavioural epistemology is likely to boost the progress of science by addressing several prevalent biases and other problems in scientific communication and by facilitating appropriate acceptance/rejection of novel concepts.

  11. Cloud Data Storage Federation for Scientific Applications

    NARCIS (Netherlands)

    Koulouzis, S.; Vasyunin, D.; Cushing, R.; Belloum, A.; Bubak, M.; an Mey, D.; Alexander, M.; Bientinesi, P.; Cannataro, M.; Clauss, C.; Costan, A.; Kecskemeti, G.; Morin, C.; Ricci, L.; Sahuquillo, J.; Schulz, M.; Scarano, V.; Scott, S.L.; Weidendorfer, J.

    2014-01-01

    Nowadays, data-intensive scientific research needs storage capabilities that enable efficient data sharing. This is of great importance for many scientific domains such as the Virtual Physiological Human. In this paper, we introduce a solution that federates a variety of systems ranging from file

  12. A high performance scientific cloud computing environment for materials simulations

    OpenAIRE

    Jorissen, Kevin; Vila, Fernando D.; Rehr, John J.

    2011-01-01

    We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including...

  13. Massively parallel sparse matrix function calculations with NTPoly

    Science.gov (United States)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large-scale calculations on the K computer.
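
    One example of the polynomial-expansion idea is the Newton-Schulz iteration for the matrix sign function, sketched below with dense NumPy arrays purely for illustration; NTPoly's distributed sparse implementation and API are not reproduced here.

```python
# Dense NumPy illustration of a polynomial iteration for a matrix function
# (Newton-Schulz for the matrix sign); only matrix multiplications are needed,
# which is why such expansions map well onto sparse, parallel kernels.
import numpy as np

def matrix_sign(A: np.ndarray, iterations: int = 30) -> np.ndarray:
    X = A / np.linalg.norm(A, 2)             # scale so the iteration converges
    I = np.eye(A.shape[0])
    for _ in range(iterations):
        X = 0.5 * X @ (3.0 * I - X @ X)      # polynomial update
    return X

A = np.diag([2.0, -1.0, 0.5, -3.0])
print(np.round(matrix_sign(A)))              # expected diag(+1, -1, +1, -1)
```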

  14. ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data.

    Directory of Open Access Journals (Sweden)

    Ramón Díaz-Uriarte

    Full Text Available BACKGROUND: Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need for identification of CNAs has resulted in a wealth of methodological studies. METHODOLOGY/PRINCIPAL FINDINGS: ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detection of CNAs, gains and losses of genomic DNA, including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes, and for sets of genes with altered copy numbers. CONCLUSIONS/SIGNIFICANCE: ADaCGH represents a qualitative increase in the standards of these types of applications: (a) all of the best performing algorithms are included, not just one or two; (b) we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors up to 45x); (c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms); (d) we incorporate redundancy, fault-tolerance and checkpointing, which are unique among web-based, parallelized applications; (e) all of the code is available under open source licenses, allowing others to build upon, copy, and adapt our code for other software projects.

  15. Accelerating the scientific exploration process with scientific workflows

    International Nuclear Information System (INIS)

    Altintas, Ilkay; Barney, Oscar; Cheng, Zhengang; Critchlow, Terence; Ludaescher, Bertram; Parker, Steve; Shoshani, Arie; Vouk, Mladen

    2006-01-01

    Although an increasing amount of middleware has emerged in the last few years to achieve remote data access, distributed job execution, and data management, orchestrating these technologies with minimal overhead still remains a difficult task for scientists. Scientific workflow systems improve this situation by creating interfaces to a variety of technologies and automating the execution and monitoring of the workflows. Workflow systems provide domain-independent customizable interfaces and tools that combine different tools and technologies along with efficient methods for using them. As simulations and experiments move into the petascale regime, the orchestration of long running data and compute intensive tasks is becoming a major requirement for the successful steering and completion of scientific investigations. A scientific workflow is the process of combining data and processes into a configurable, structured set of steps that implement semi-automated computational solutions of a scientific problem. Kepler is a cross-project collaboration, co-founded by the SciDAC Scientific Data Management (SDM) Center, whose purpose is to develop a domain-independent scientific workflow system. It provides a workflow environment in which scientists design and execute scientific workflows by specifying the desired sequence of computational actions and the appropriate data flow, including required data transformations, between these steps. Currently deployed workflows range from local analytical pipelines to distributed, high-performance and high-throughput applications, which can be both data- and compute-intensive. The scientific workflow approach offers a number of advantages over traditional scripting-based approaches, including ease of configuration, improved reusability and maintenance of workflows and components (called actors), automated provenance management, 'smart' re-running of different versions of workflow instances, on-the-fly updateable parameters, monitoring

  16. Application of quality assurance to scientific activities at Westinghouse Hanford Company

    International Nuclear Information System (INIS)

    Delvin, W.L.; Farwick, D.G.

    1988-01-01

    The application of quality assurance to scientific activities has been an ongoing subject of review, discussion, interpretation, and evaluation within the nuclear community for the past several years. This paper provides a discussion on the natures of science and quality assurance and presents suggestions for integrating the two successfully. The paper shows how those actions were used at the Westinghouse Hanford Company to successfully apply quality assurance to experimental studies and materials testing and evaluation activities that supported a major project. An important factor in developing and implementing the quality assurance program was the close working relationship that existed between the assigned quality engineers and the scientists. The quality engineers, who had had working experience in the scientific disciplines involved, were able to bridge across from the scientists to the more traditional quality assurance personnel who had overall responsibility for the project's quality assurance program

  17. A Parallel Priority Queue with Constant Time Operations

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Träff, Jesper Larsson; Zaroliagis, Christos D.

    1998-01-01

    We present a parallel priority queue that supports the following operations in constant time: parallel insertion of a sequence of elements ordered according to key, parallel decrease key for a sequence of elements ordered according to key, deletion of the minimum key element, and deletion of an arbitrary... application is a parallel implementation of Dijkstra's algorithm for the single-source shortest path problem, which runs in O(n) time and O(m log n) work on a CREW PRAM on graphs with n vertices and m edges. This is a logarithmic factor improvement in the running time compared with previous approaches....
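
    For context, the sequential baseline that such a parallel priority queue accelerates is ordinary heap-based Dijkstra, sketched below; the logarithmic queue cost paid here is what the constant-time parallel operations remove. This is a generic routine, not the paper's PRAM algorithm.

```python
# Sequential baseline of Dijkstra's algorithm with a binary heap (heapq);
# every queue operation costs O(log n) here, unlike the parallel queue above.
import heapq

def dijkstra(graph: dict, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                           # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

g = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(dijkstra(g, "a"))                        # {'a': 0, 'b': 2, 'c': 3}
```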

  18. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Christopher H.; Long, Hai; Sides, Scott; Vaidhynathan, Deepthi; Jones, Wesley

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL's Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limit effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.

  19. Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux

    International Nuclear Information System (INIS)

    Guo Zehua; Tang Xianzhu

    2012-01-01

    In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.

  20. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  1. Performance study of a cluster calculation; parallelization and application under geant4

    International Nuclear Information System (INIS)

    Trabelsi, Abir

    2007-01-01

    This work constitutes a final-year engineering project in computer science, carried out within the National Center of Nuclear Sciences and Technology. The project consists of studying the performance of a set of machines in order to determine the best architecture for assembling them into a cluster, as well as studying parallelism and the parallel implementation of GEANT4 as a simulation tool. The realization of this project consists of: 1) programming the two benchmarks PMV and PMM in C++ and executing them on each station; 2) interpreting these results in order to identify the best cluster architecture; 3) parallelizing the two benchmarks with TOP-C; 4) executing the two TOP-C versions on the cluster; 5) generalizing these results; 6) parallelizing and executing the parallel version of GEANT4. (Author). 14 refs

  2. MINUIT package parallelization and applications using the RooFit package

    International Nuclear Information System (INIS)

    Lazzaro, Alfio; Moneta, Lorenzo

    2010-01-01

    The fitting procedures are based on numerical minimization of functions. The MINUIT package is the most common package used for such procedures in High Energy Physics community. The main algorithm in this package, MIGRAD, searches the minimum of a function using the gradient information. For each minimization iteration, MIGRAD requires the calculation of the derivative for each free parameter of the function to be minimized. Minimization is required for data analysis problems based on the maximum likelihood technique. The calculation of complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. Then, the minimization process requires the calculation of the likelihood functions several times for each minimization iteration. In this paper we will show how MIGRAD algorithm and the likelihood function calculation can be easily parallelized using Message Passing Interface techniques. We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package.
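
    The parallelization opportunity described above comes from the fact that the per-parameter derivatives MIGRAD needs in each iteration are mutually independent. The sketch below distributes central-difference derivatives over worker processes; it is an illustration with a toy objective, not MINUIT or RooFit code.

```python
# Hedged sketch (not MINUIT/RooFit): the per-parameter derivatives required by
# a gradient-based minimizer are independent, so they can be farmed out to
# worker processes using a finite-difference formula.
from multiprocessing import Pool
import numpy as np

def nll(params: np.ndarray) -> float:
    """Illustrative stand-in for an expensive negative log-likelihood."""
    return float(np.sum((params - np.arange(len(params))) ** 2))

def partial_derivative(args):
    params, i, eps = args
    up, down = params.copy(), params.copy()
    up[i] += eps
    down[i] -= eps
    return (nll(up) - nll(down)) / (2.0 * eps)

def parallel_gradient(params: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    tasks = [(params, i, eps) for i in range(len(params))]
    with Pool() as pool:
        return np.array(pool.map(partial_derivative, tasks))

if __name__ == "__main__":
    p = np.zeros(4)
    print(parallel_gradient(p))      # approximately [0, -2, -4, -6]
```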

  3. Leveraging Transcultural Enrollments to Enhance Application of the Scientific Method

    Science.gov (United States)

    Loudin, M.

    2013-12-01

    Continued growth of transcultural academic programs presents an opportunity for all of the students involved to improve utilization of the scientific method. Our own business success depends on how effectively we apply the scientific method, and so it is unsurprising that our hiring programs focus on three broad areas of capability among applicants which are strongly related to the scientific method. These are 1) ability to continually learn up-to-date earth science concepts, 2) ability to effectively and succinctly communicate in the English language, both oral and written, and 3) ability to employ behaviors that are advantageous with respect to the various phases of the scientific method. This third area is often the most difficult to develop, because neither so-called Western nor Eastern cultures encourage a suite of behaviors that are ideally suited. Generally, the acceptance of candidates into academic programs, together with subsequent high performance evidenced by grades, is a highly valid measure of continuous learning capability. Certainly, students for whom English is not a native language face additional challenges, but succinct and effective communication is an art which requires practice and development, regardless of native language. The ability to communicate in English is crucial, since it is today's lingua franca for both science and commerce globally. Therefore, we strongly support the use of frequent English written assignments and oral presentations as an integral part of all scientific academic programs. There is no question but that this poses additional work for faculty; nevertheless it is a key ingredient to the optimal development of students. No one culture has a monopoly with respect to behaviors that promote effective leveraging of the scientific method. For instance, the growing complexity of experimental protocols argues for a high degree of interdependent effort, which is more often associated with so-called Eastern than Western

  4. Applications of scientific imaging in environmental toxicology

    Science.gov (United States)

    El-Demerdash, Aref M.

    The national goals of clean air, clean water, and healthy ecosystems are a few of the primary forces that drive the need for better environmental monitoring. As we approach the end of the 1990s, the environmental questions at regional to global scales are being redefined and refined in the light of developments in environmental understanding and technological capability. Research in the use of scientific imaging data for the study of the environment is urgently needed in order to explore the possibilities of utilizing emerging new technologies. The objective of this research proposal is to demonstrate the usability of a wealth of new technology made available in the last decade in providing a better understanding of environmental problems. Research is focused on two imaging techniques: macro and micro imaging. Several examples of applications of scientific imaging in research in the field of environmental toxicology were presented. This was achieved on two scales, micro and macro imaging. On the micro level four specific examples were covered. First, the effect of utilizing scanning electron microscopy as an imaging tool in enhancing taxa identification when studying diatoms was presented. Second, scanning electron microscopy combined with an energy dispersive X-ray analyzer was demonstrated as a valuable and effective tool for identifying and analyzing household dust samples. Third, electronic autoradiography combined with FT-IR microscopy was used to study the distribution pattern of [14C]-Malathion in rats as a result of dermal exposure. The results of the autoradiography made on skin sections of the application site revealed the presence of [14C]-activity in the first region of the skin. These results were evidenced by FT-IR microscopy. The obtained results suggest that the penetration of Malathion into the skin and other tissues is vehicle and dose dependent. The results also suggest the use of FT-IR microscopy imaging for monitoring the disposition of

  5. 14th ACIS/IEEE International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

    CERN Document Server

    Studies in Computational Intelligence : Volume 492

    2013-01-01

    This edited book presents scientific results of the 14th ACIS/IEEE International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2013), held in Honolulu, Hawaii, USA on July 1-3, 2013. The aim of this conference was to bring together scientists, engineers, computer users, and students to share their experiences and exchange new ideas, research results about all aspects (theory, applications and tools) of computer and information science, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them. The conference organizers selected the 17 outstanding papers from those papers accepted for presentation at the conference.  

  6. 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

    CERN Document Server

    2015-01-01

    This edited book presents scientific results of 15th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2014) held on June 30 – July 2, 2014 in Las Vegas Nevada, USA. The aim of this conference was to bring together scientists, engineers, computer users, and students to share their experiences and exchange new ideas, research results about all aspects (theory, applications and tools) of computer and information science, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them. The conference organizers selected the 13 outstanding papers from those papers accepted for presentation at the conference.

  7. Methods for Specifying Scientific Data Standards and Modeling Relationships with Applications to Neuroscience

    Science.gov (United States)

    Rübel, Oliver; Dougherty, Max; Prabhat; Denes, Peter; Conant, David; Chang, Edward F.; Bouchard, Kristofer

    2016-01-01

    Neuroscience continues to experience a tremendous growth in data; in terms of the volume and variety of data, the velocity at which data is acquired, and in turn the veracity of data. These challenges are a serious impediment to sharing of data, analyses, and tools within and across labs. Here, we introduce BRAINformat, a novel data standardization framework for the design and management of scientific data formats. The BRAINformat library defines application-independent design concepts and modules that together create a general framework for standardization of scientific data. We describe the formal specification of scientific data standards, which facilitates sharing and verification of data and formats. We introduce the concept of Managed Objects, enabling semantic components of data formats to be specified as self-contained units, supporting modular and reusable design of data format components and file storage. We also introduce the novel concept of Relationship Attributes for modeling and use of semantic relationships between data objects. Based on these concepts we demonstrate the application of our framework to design and implement a standard format for electrophysiology data and show how data standardization and relationship-modeling facilitate data analysis and sharing. The format uses HDF5, enabling portable, scalable, and self-describing data storage and integration with modern high-performance computing for data-driven discovery. The BRAINformat library is open source, easy-to-use, and provides detailed user and developer documentation and is freely available at: https://bitbucket.org/oruebel/brainformat. PMID:27867355
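
    The flavor of self-describing storage with explicit relationships can be sketched with plain h5py, as below; the group names, attributes and the "derived_from" relationship are illustrative stand-ins, not the BRAINformat specification.

```python
# Hedged h5py sketch of the general idea only (self-describing HDF5 groups plus
# an attribute recording a relationship to another object); this is not the
# BRAINformat API, whose managed objects carry a richer formal specification.
import numpy as np
import h5py

with h5py.File("example_recording.h5", "w") as f:
    raw = f.create_group("/data/raw_ephys")
    raw.attrs["object_type"] = "ElectrophysiologyData"       # self-description
    raw.create_dataset("voltage", data=np.random.randn(4, 1000))
    raw.create_dataset("channel_ids", data=np.arange(4))

    processed = f.create_group("/data/filtered_ephys")
    processed.attrs["object_type"] = "ProcessedData"
    # A simple "relationship attribute": record which object this one derives from.
    processed.attrs["derived_from"] = raw.name
    processed.create_dataset("voltage", data=np.zeros((4, 1000)))

with h5py.File("example_recording.h5", "r") as f:
    print(f["/data/filtered_ephys"].attrs["derived_from"])    # /data/raw_ephys
```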

  8. Parallel experience: how art and art theory can inform ethics in human research.

    Science.gov (United States)

    Schwartz, L

    2003-12-01

    Trends in ethical research involving humans emphasise the importance of collaboration, of involving research subjects, alongside the researchers in the construction and implementation of research. This paper will explore parallels derived from another tradition of investigation of the human: art and art theory. An artist's inquiry into the problems of human research will be described, followed by the application of arguments from art theory to research practice. Recently artist Christine Borland has provided examples in which the lack of collaboration in research has caused injustice. Borland's work reflects these ethical dilemmas and questions the procedures and assumptions involved. In most cases the value of subject anonymity is called into question because it reduces the subjects' control over themselves. The application of art theory, which has already considered these problems, helps question and explore the ways in which the subject turned object of artistic or scientific interpretation can maintain some control and dignity.

  9. Applications of industrial computed tomography at Los Alamos Scientific Laboratory

    International Nuclear Information System (INIS)

    Kruger, R.P.; Morris, R.A.; Wecksung, G.W.

    1980-01-01

    A research and development program was begun three years ago at the Los Alamos Scientific Laboratory (LASL) to study nonmedical applications of computed tomography. This program had several goals. The first goal was to develop the necessary reconstruction algorithms to accurately reconstruct cross sections of nonmedical industrial objects. The second goal was to be able to perform extensive tomographic simulations to determine the efficacy of tomographic reconstruction with a variety of hardware configurations. The final goal was to construct an inexpensive industrial prototype scanner with a high degree of design flexibility. The implementation of these program goals is described

  10. Experimental analysis of a capillary pumped loop for terrestrial applications with several evaporators in parallel

    International Nuclear Information System (INIS)

    Blet, Nicolas; Bertin, Yves; Ayel, Vincent; Romestant, Cyril; Platel, Vincent

    2016-01-01

    Highlights: • This paper introduces experimental studies of a CPLTA with 3 evaporators in parallel. • Operating principles of the mono-evaporator CPLTA are recalled. • A reference test with the new bench with only one evaporator is introduced. • Global behavior of the multi-evaporator loop is presented and discussed. • Some additional thermohydraulic couplings are revealed. - Abstract: In the context of high-dissipation electronics cooling for ground transportation, a new design of two-phase loop has been improved in recent years: the capillary pumped loop for terrestrial application (CPLTA). This hybrid system, between the two standard devices of the capillary pumped loop (CPL) and the loop heat pipe (LHP), has been widely investigated with a single evaporator, and thus a single dissipative area, to establish its main operating principles and the thermohydraulic couplings between the components. To extend its scope of applications, a new experimental CPLTA with three evaporators in parallel is studied in this paper, with methanol as the working fluid. Even though the dynamics of the loop in multi-evaporator mode appears broadly similar to that with a single operating evaporator, additional couplings are highlighted between the several evaporators. In particular, a decoupling between the vapor generation flow rate and the pressure drop in each evaporator is revealed. The impact of this phenomenon on the conductance at the evaporator is analyzed.

  11. Parallel log structured file system collective buffering to achieve a compact representation of scientific and/or dimensional data

    Science.gov (United States)

    Grider, Gary A.; Poole, Stephen W.

    2015-09-01

    Collective buffering and data pattern solutions are provided for storage, retrieval, and/or analysis of data in a collective parallel processing environment. For example, a method can be provided for data storage in a collective parallel processing environment. The method comprises receiving data to be written for a plurality of collective processes within a collective parallel processing environment, extracting a data pattern for the data to be written for the plurality of collective processes, generating a representation describing the data pattern, and saving the data and the representation.
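
    The central idea of the patent, replacing a long list of per-process write offsets with a compact description of their pattern, can be sketched as below. The (start, stride, count) encoding is a simplified stand-in chosen for illustration, not the patented representation.

```python
# Sketch: detect a regular stride in per-process write offsets and store a
# compact (start, stride, count) descriptor instead of every offset.
# This is a simplified illustration, not the patented representation.
def compact_pattern(offsets):
    if len(offsets) < 2:
        return {"offsets": list(offsets)}
    stride = offsets[1] - offsets[0]
    regular = all(b - a == stride for a, b in zip(offsets, offsets[1:]))
    if regular:
        return {"start": offsets[0], "stride": stride, "count": len(offsets)}
    return {"offsets": list(offsets)}          # fall back to the explicit list

# Example: 64 processes each writing a 1 MiB block into a shared file.
offsets = [rank * 1024 * 1024 for rank in range(64)]
print(compact_pattern(offsets))   # {'start': 0, 'stride': 1048576, 'count': 64}
```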

  12. IPython: components for interactive and parallel computing across disciplines. (Invited)

    Science.gov (United States)

    Perez, F.; Bussonnier, M.; Frederic, J. D.; Froehle, B. M.; Granger, B. E.; Ivanov, P.; Kluyver, T.; Patterson, E.; Ragan-Kelley, B.; Sailer, Z.

    2013-12-01

    Scientific computing is an inherently exploratory activity that requires constantly cycling between code, data and results, each time adjusting the computations as new insights and questions arise. To support such a workflow, good interactive environments are critical. The IPython project (http://ipython.org) provides a rich architecture for interactive computing with: 1. Terminal-based and graphical interactive consoles. 2. A web-based Notebook system with support for code, text, mathematical expressions, inline plots and other rich media. 3. Easy to use, high performance tools for parallel computing. Despite its roots in Python, the IPython architecture is designed in a language-agnostic way to facilitate interactive computing in any language. This allows users to mix Python with Julia, R, Octave, Ruby, Perl, Bash and more, as well as to develop native clients in other languages that reuse the IPython clients. In this talk, I will show how IPython supports all stages in the lifecycle of a scientific idea: 1. Individual exploration. 2. Collaborative development. 3. Production runs with parallel resources. 4. Publication. 5. Education. In particular, the IPython Notebook provides an environment for "literate computing" with a tight integration of narrative and computation (including parallel computing). These Notebooks are stored in a JSON-based document format that provides an "executable paper": notebooks can be version controlled, exported to HTML or PDF for publication, and used for teaching.
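
    Assuming an IPython/ipyparallel cluster has already been started (for example with `ipcluster start`), the parallel-computing tools mentioned in the abstract can be driven from a few lines of Python. The `simulate` function below is a placeholder workload, not part of IPython itself.

```python
# Sketch of IPython's parallel computing tools via the ipyparallel client.
# Assumes a running cluster, e.g. started with: ipcluster start -n 4
import ipyparallel as ipp

rc = ipp.Client()          # connect to the running engines
view = rc[:]               # a DirectView over all engines

def simulate(seed):
    # placeholder workload executed remotely on each engine
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(100000))

results = view.map_sync(simulate, range(8))
print(results)
```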

  13. Comparison of Resource Platform Selection Approaches for Scientific Workflows

    Energy Technology Data Exchange (ETDEWEB)

    Simmhan, Yogesh; Ramakrishnan, Lavanya

    2010-03-05

    Cloud computing is increasingly considered as an additional computational resource platform for scientific workflows. The cloud offers opportunity to scale out applications from desktops and local cluster resources. At the same time, it can eliminate the challenges of restricted software environments and queue delays in shared high performance computing environments. Choosing from these diverse resource platforms for a workflow execution poses a challenge for many scientists. Scientists are often faced with deciding resource platform selection trade-offs with limited information on the actual workflows. While many workflow planning methods have explored task scheduling onto different resources, these methods often require fine-scale characterization of the workflow that is onerous for a scientist. In this position paper, we describe our early exploratory work into using blackbox characteristics to do a cost-benefit analysis of using cloud platforms. We use only very limited high-level information on the workflow length, width, and data sizes. The length and width are indicative of the workflow duration and parallelism. The data size characterizes the IO requirements. We compare the effectiveness of this approach to other resource selection models using two exemplar scientific workflows scheduled on desktops, local clusters, HPC centers, and clouds. Early results suggest that the blackbox model often makes the same resource selections as a more fine-grained whitebox model. We believe the simplicity of the blackbox model can help inform a scientist on the applicability of cloud computing resources even before porting an existing workflow.

  14. Massively parallel evolutionary computation on GPGPUs

    CERN Document Server

    Tsutsui, Shigeyoshi

    2013-01-01

    Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u

  15. GRID Prototype for imagery processing in scientific applications

    International Nuclear Information System (INIS)

    Stan, Ionel; Zgura, Ion Sorin; Haiduc, Maria; Valeanu, Vlad; Giurgiu, Liviu

    2004-01-01

    The paper presents the results of our study, which is part of the InGRID project supported by ROSA (the ROmanian Space Agency). We show the possibility of acquiring images from an optical microscope through a web camera. The images are then stored on a PC running the Linux operating system and distributed to other clusters through GRID technology (using http, php, MySQL, Globus or AliEn systems). The images are obtained from nuclear emulsions in the frame of the Becquerel Collaboration. The main goal of the InGRID project is to advance the development and deployment of GRID technology for imaging techniques applied to space data, other application fields and telemedicine. It will also create links with similar international projects that use advanced Grid technology and scalable storage solutions. The main topics proposed to be solved in the frame of the InGRID project are: - implementation of two GRID clusters, minimum level Tier 3; - adapting and updating the common storage and processing computing facility; - testing the middleware packages developed in the frame of this project; - testbed production of the prototype; - building up and advertising the InGRID prototype in the scientific community through ongoing dissemination. The InGRID prototype developed in the frame of this project will be used by partner institutes as a deployment environment for imaging applications whose dynamic features will be defined by contract conditions. Subsequent applications will be deployed by the partners of this project with governmental, nongovernmental and private institutions. (authors)

  16. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    International Nuclear Information System (INIS)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah

    2014-01-01

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results, but unfortunately not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain of each computing device. The heterogeneous platform used in this analysis is a low-end notebook with only a 1 GHz processor. The results are striking: speedups of almost a factor of 10 are obtained. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially arithmetic-intensive calculation. By running a Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations lend themselves well to parallelization.

  17. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results, but unfortunately not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain of each computing device. The heterogeneous platform used in this analysis is a low-end notebook with only a 1 GHz processor. The results are striking: speedups of almost a factor of 10 are obtained. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially arithmetic-intensive calculation. By running a Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 for one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations lend themselves well to parallelization.
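
    The embarrassingly parallel Monte Carlo kernel described above can be sketched in a few lines. In the sketch below the NumPy-vectorized version plays the role of the data-parallel (GPU-like) path and the explicit loop is the serial baseline; this is an illustration of the idea only, not the authors' neutron-transport code.

```python
# Sketch: an embarrassingly parallel Monte Carlo estimate of pi.
# The vectorized version mimics the data-parallel execution a GPU provides;
# illustration only, not the authors' neutron-transport simulation.
import numpy as np
import random, time

def pi_serial(n):
    hits = 0
    for _ in range(n):
        x, y = random.random(), random.random()
        hits += (x * x + y * y) <= 1.0
    return 4.0 * hits / n

def pi_vectorized(n):
    x = np.random.random(n)
    y = np.random.random(n)
    return 4.0 * np.count_nonzero(x * x + y * y <= 1.0) / n

n = 1_000_000                      # one million samples, as in the paper's test case
for fn in (pi_serial, pi_vectorized):
    t0 = time.perf_counter()
    est = fn(n)
    print(f"{fn.__name__}: pi ~ {est:.4f} in {time.perf_counter() - t0:.2f} s")
```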

  18. Design and performance characterization of electronic structure calculations on massively parallel supercomputers

    DEFF Research Database (Denmark)

    Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth

    2013-01-01

    Density functional theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community...

  19. Parallelization of an existing high energy physics event reconstruction software package

    International Nuclear Information System (INIS)

    Schiefer, R.; Francis, D.

    1996-01-01

    Software parallelization allows an efficient use of available computing power to increase the performance of applications. In a case study the authors have investigated the parallelization of high energy physics event reconstruction software in terms of costs (effort, computing resource requirements), benefits (performance increase) and the feasibility of a systematic parallelization approach. Guidelines facilitating a parallel implementation are proposed for future software development

  20. Complementarity beyond physics Niels Bohr's parallels

    CERN Document Server

    Bala, Arun

    2017-01-01

    In this study Arun Bala examines the implications that Niels Bohr’s principle of complementarity holds for fields beyond physics. Bohr, one of the founding figures of modern quantum physics, argued that the principle of complementarity he proposed for understanding atomic processes has parallels in psychology, biology, and social science, as well as in Buddhist and Taoist thought. But Bohr failed to offer any explanation for why complementarity might extend beyond physics, and his claims have been widely rejected by scientists as empty speculation. Scientific scepticism has only been reinforced by the naïve enthusiasm of postmodern relativists and New Age intuitionists, who seize upon Bohr’s ideas to justify anti-realist and mystical positions. Arun Bala offers a detailed defence of Bohr’s claim that complementarity has far-reaching implications for the biological and social sciences, as well as for comparative philosophies of science, by explaining Bohr’s parallels as responses to the omnipresence...

  1. Parallel fuzzy connected image segmentation on GPU

    OpenAIRE

    Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

    2011-01-01

    Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...

  2. Ab initio quantum chemistry in parallel-portable tools and applications

    International Nuclear Information System (INIS)

    Harrison, R.J.; Shepard, R.; Kendall, R.A.

    1991-01-01

    In common with many of the computational sciences, ab initio chemistry faces computational constraints to which a partial solution is offered by the prospect of highly parallel computers. Ab initio codes are large and complex (O(10^5) lines of FORTRAN), representing a significant investment of communal effort. The often conflicting requirements of portability and efficiency have been successfully resolved on vector computers by reliance on matrix-oriented kernels. This proves inadequate even upon closely-coupled shared-memory parallel machines. We examine the algorithms employed during a typical sequence of calculations. Then we investigate how efficient portable parallel implementations may be derived, including the complex multi-reference singles and doubles configuration interaction algorithm. A portable toolkit, modeled after the Intel iPSC and the ANL-ACRF PARMACS, is developed, using shared memory and TCP/IP sockets. The toolkit is used as an initial platform for programs portable between LANs, Crays and true distributed-memory MIMD machines. Timings are presented. 53 refs., 4 tabs

  3. Parallel heat transport in integrable and chaotic magnetic fields

    Energy Technology Data Exchange (ETDEWEB)

    Castillo-Negrete, D. del; Chacon, L. [Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-8071 (United States)

    2012-05-15

    The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) the extreme anisotropy between the parallel (i.e., along the magnetic field), χ_∥, and the perpendicular, χ_⊥, conductivities (χ_∥/χ_⊥ may exceed 10^10 in fusion plasmas); (ii) nonlocal parallel transport in the limit of small collisionality; and (iii) magnetic field line chaos, which in general complicates (and may preclude) the construction of magnetic field line coordinates. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island), weakly chaotic (Devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local parallel closures, is non-diffusive, thus casting doubts on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.

  4. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
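
    The wait/awaken pattern described in this patent abstract is essentially a condition-variable handshake. The standard-library sketch below shows the shape of it; the names and event strings are chosen for illustration only and this is not the PAMI implementation.

```python
# Sketch of the wait-state / awaken pattern using a condition variable.
# Names are illustrative; this is not the PAMI implementation.
import threading, collections

pending = collections.deque()          # data communications events for one context
cond = threading.Condition()

def advance():                         # the "advance function" of one context
    while True:
        with cond:
            while not pending:         # no actionable events: enter wait state
                cond.wait()
            event = pending.popleft()  # awakened: process the event now pending
        if event == "shutdown":
            return
        print("processed", event)

def post(event):                       # another thread posts an event and awakens
    with cond:
        pending.append(event)
        cond.notify()

t = threading.Thread(target=advance)
t.start()
post("receive-complete")
post("shutdown")
t.join()
```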

  5. Software Engineering for Scientific Computer Simulations

    Science.gov (United States)

    Post, Douglass E.; Henderson, Dale B.; Kendall, Richard P.; Whitney, Earl M.

    2004-11-01

    Computer simulation is becoming a very powerful tool for analyzing and predicting the performance of fusion experiments. Simulation efforts are evolving from including only a few effects to many effects, from small teams with a few people to large teams, and from workstations and small processor count parallel computers to massively parallel platforms. Successfully making this transition requires attention to software engineering issues. We report on the conclusions drawn from a number of case studies of large scale scientific computing projects within DOE, academia and the DoD. The major lessons learned include attention to sound project management including setting reasonable and achievable requirements, building a good code team, enforcing customer focus, carrying out verification and validation and selecting the optimum computational mathematics approaches.

  6. Scientific Applications Performance Evaluation on Burst Buffer

    KAUST Repository

    Markomanolis, George S.; Hadri, Bilel; Khurram, Rooh Ul Amin; Feki, Saber

    2017-01-01

    Parallel I/O is an integral component of modern high performance computing, especially in storing and processing very large datasets, such as the case of seismic imaging, CFD, combustion and weather modeling. The storage hierarchy includes nowadays

  7. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented
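
    The domain-decomposition pattern described above, with each processor owning a slab of the grid extended by ghost cells shared with its neighbours, can be sketched with mpi4py. The array size, field, and file name are illustrative assumptions, not UTCHEM code.

```python
# Sketch of 1D domain decomposition with ghost-cell exchange using mpi4py.
# Run with: mpiexec -n 4 python halo.py  (sizes and field are illustrative)
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local_n = 100                                   # interior cells owned by this rank
field = np.full(local_n + 2, float(rank))       # plus one ghost cell at each end

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# exchange boundary values so neighbours can evaluate their stencils
comm.Sendrecv(sendbuf=field[1:2], dest=left, recvbuf=field[-1:], source=right)
comm.Sendrecv(sendbuf=field[-2:-1], dest=right, recvbuf=field[:1], source=left)

print(f"rank {rank}: ghost cells = {field[0]}, {field[-1]}")
```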

  8. A parallel algorithm for 3D dislocation dynamics

    International Nuclear Information System (INIS)

    Wang Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard

    2006-01-01

    Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals

  9. Parallel field line and stream line tracing algorithms for space physics applications

    Science.gov (United States)

    Toth, G.; de Zeeuw, D.; Monostori, G.

    2004-05-01

    Field line and stream line tracing is required in various space physics applications, such as the coupling of the global magnetosphere and inner magnetosphere models, the coupling of the solar energetic particle and heliosphere models, or the modeling of comets, where the multispecies chemical equations are solved along stream lines of a steady state solution obtained with a single fluid MHD model. Tracing a vector field is an inherently serial process, which is difficult to parallelize. This is especially true when the data corresponding to the vector field are distributed over a large number of processors. We designed algorithms for the various applications, which scale well to a large number of processors. In the first algorithm the computational domain is divided into blocks. Each block is on a single processor. The algorithm follows the vector field inside the blocks, and calculates a mapping of the block surfaces. The blocks communicate the values at the coinciding surfaces, and the results are interpolated. Finally all block surfaces are defined and values inside the blocks are obtained. In the second algorithm all processors start integrating along the vector field inside the accessible volume. When the field line leaves the local subdomain, the position and other information is stored in a buffer. Periodically the processors exchange the buffers, and continue integration of the field lines until they reach a boundary. At that point the results are sent back to the originating processor. Efficiency is achieved by a careful phasing of computation and communication. In the third algorithm the results of a steady state simulation are stored on a hard drive. The vector field is contained in blocks. All processors read in all the grid and vector field data and the stream lines are integrated in parallel. If a stream line enters a block, which has already been integrated, the results can be interpolated. By a clever ordering of the blocks the execution speed can be
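
    The serial building block that each of these parallel strategies wraps, integrating a line through a vector field one step at a time, can be sketched as a fixed-step RK4 tracer. The analytic rotational field below stands in for the block-distributed data and is purely illustrative.

```python
# Sketch of the serial kernel: fixed-step RK4 tracing of a field/stream line.
# The analytic 2D field stands in for block-distributed data; illustrative only.
import numpy as np

def field(p):                             # toy rotational field, unit magnitude
    x, y = p
    return np.array([-y, x]) / max(np.hypot(x, y), 1e-12)

def trace(p0, h=0.01, steps=1000):
    p = np.array(p0, dtype=float)
    path = [p.copy()]
    for _ in range(steps):
        k1 = field(p)
        k2 = field(p + 0.5 * h * k1)
        k3 = field(p + 0.5 * h * k2)
        k4 = field(p + h * k3)
        p = p + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        path.append(p.copy())
    return np.array(path)

line = trace([1.0, 0.0])
print(line[0], line[-1])                  # start and end points of the traced line
```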

  10. Parallel Algorithms and Software for Nuclear, Energy, and Environmental Applications Part I: Multiphysics Algorithms

    International Nuclear Information System (INIS)

    Gaston, Derek; Guo, Luanjing; Hansen, Glen; Huang, Hai; Johnson, Richard; Park, HyeongKae; Podgorney, Robert; Tonks, Michael; Williamson, Richard

    2012-01-01

    There is a growing trend within energy and environmental simulation to consider tightly coupled solutions to multiphysics problems. This can be seen in nuclear reactor analysis where analysts are interested in coupled flow, heat transfer and neutronics, and in nuclear fuel performance simulation where analysts are interested in thermomechanics with contact coupled to species transport and chemistry. In energy and environmental applications, energy extraction involves geomechanics, flow through porous media and fractured formations, adding heat transport for enhanced oil recovery and geothermal applications, and adding reactive transport in the case of applications modeling the underground flow of contaminants. These more ambitious simulations usually motivate some level of parallel computing. Many of the physics coupling efforts to date utilize simple code coupling or first-order operator splitting, often referred to as loose coupling. While these approaches can produce answers, they usually leave questions of accuracy and stability unanswered. Additionally, the different physics often reside on distinct meshes and data are coupled via simple interpolation, again leaving open questions of stability and accuracy.

  11. Parallel transposition of sparse data structures

    DEFF Research Database (Denmark)

    Wang, Hao; Liu, Weifeng; Hou, Kaixi

    2016-01-01

    Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
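
    The serial form of the operation being parallelized here, transposing a CSR matrix by histogramming column indices and prefix-summing, can be sketched as follows. This is the textbook algorithm, not the ScanTrans/MergeTrans kernels evaluated in the paper.

```python
# Sketch: serial CSR -> CSC transposition via column histogram + prefix sum.
# This is the textbook algorithm the parallel kernels accelerate; illustrative only.
import numpy as np

def csr_transpose(n_rows, n_cols, indptr, indices, data):
    counts = np.bincount(indices, minlength=n_cols)      # nnz per column
    t_indptr = np.concatenate(([0], np.cumsum(counts)))  # prefix sum -> new row ptr
    t_indices = np.empty_like(indices)
    t_data = np.empty_like(data)
    cursor = t_indptr[:-1].copy()
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            dst = cursor[col]
            t_indices[dst] = row
            t_data[dst] = data[k]
            cursor[col] += 1
    return t_indptr, t_indices, t_data

# 2x3 example: [[1, 0, 2], [0, 3, 0]]
print(csr_transpose(2, 3, np.array([0, 2, 3]), np.array([0, 2, 1]), np.array([1, 2, 3])))
```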

  12. High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

    Science.gov (United States)

    Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

    2002-01-01

    As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance has been shown to scale to approximately 100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

  13. Application of Massively Parallel Sequencing in the Clinical Diagnostic Testing of Inherited Cardiac Conditions

    Directory of Open Access Journals (Sweden)

    Ivone U. S. Leong

    2014-06-01

    Sudden cardiac death in people between the ages of 1–40 years is a devastating event and is frequently caused by several heritable cardiac disorders. These disorders include cardiac ion channelopathies, such as long QT syndrome, catecholaminergic polymorphic ventricular tachycardia and Brugada syndrome, and cardiomyopathies, such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy. Through careful molecular genetic evaluation of DNA from sudden death victims, the causative gene mutation can be uncovered, and the rest of the family can be screened and preventative measures implemented in at-risk individuals. The current screening approach in most diagnostic laboratories uses Sanger-based sequencing; however, this method is time consuming and labour intensive. The development of massively parallel sequencing has made it possible to produce millions of sequence reads simultaneously and is potentially an ideal approach to screen for mutations in genes that are associated with sudden cardiac death. This approach offers mutation screening at reduced cost and turnaround time. Here, we will review the current commercially available enrichment kits, massively parallel sequencing (MPS) platforms, downstream data analysis and its application to sudden cardiac death in a diagnostic environment.

  14. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Directory of Open Access Journals (Sweden)

    Hamed Kargaran

    2016-04-01

    The implementation of Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) has been proposed for use in high performance computing systems. According to the type of GPU memory usage, the GPU scheme is divided into two work modes, GLOBAL_MODE and SHARED_MODE. To generate parallel random numbers based on the independent sequence method, the combination of the middle-square method and a chaotic map, along with the Xorshift PRNG, have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of a PRNG on a single CPU core) for GLOBAL_MODE and SHARED_MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as those of MATLAB and FORTRAN and the Miller-Park algorithm, through specific standard tests. The results of this comparison showed that the GPPRNG developed in this study can be used as a fast and accurate tool for computational science applications.

  15. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Energy Technology Data Exchange (ETDEWEB)

    Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad [Department of nuclear engineering, Shahid Behesti University, Tehran, 1983969411 (Iran, Islamic Republic of)

    2016-04-15

    The implementation of Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) has been proposed for use in high performance computing systems. According to the type of GPU memory usage, the GPU scheme is divided into two work modes, GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, the combination of the middle-square method and a chaotic map, along with the Xorshift PRNG, have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of a PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as those of MATLAB and FORTRAN and the Miller-Park algorithm, through specific standard tests. The results of this comparison showed that the GPPRNG developed in this study can be used as a fast and accurate tool for computational science applications.
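
    The Xorshift recurrence that the GPPRNG builds on is only a few bit operations per draw. A plain-Python version of the 64-bit variant with Marsaglia's standard shift constants is sketched below, purely to illustrate the recurrence; it is not the authors' CUDA Fortran kernel.

```python
# Sketch of a 64-bit Xorshift recurrence (Marsaglia's 13/7/17 shift constants).
# Illustrates the per-thread generator idea only; not the authors' CUDA Fortran code.
MASK64 = (1 << 64) - 1

def xorshift64(state):
    state ^= (state << 13) & MASK64
    state ^= state >> 7
    state ^= (state << 17) & MASK64
    return state & MASK64

def uniform_stream(seed, n):
    s = seed or 1                       # state must be nonzero
    for _ in range(n):
        s = xorshift64(s)
        yield s / 2**64                 # map to [0, 1)

# In an independent-sequence scheme each GPU thread would get its own seed/state.
print(list(uniform_stream(seed=0x9E3779B97F4A7C15, n=3)))
```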

  16. Scientific and technical guidance for the preparation and presentation of an application for authorisation of a health claim (revision 1)

    DEFF Research Database (Denmark)

    Tetens, Inge

    2011-01-01

    The scientific and technical guidance of the EFSA Panel on Dietetic Products, Nutrition and Allergies for the preparation and presentation of an application for authorisation of a health claim presents a common format for the organisation of information for the preparation of a well......-structured application for authorisation of health claims which fall under Article 14 (referring to children’s development and health, and to disease risk reduction claims), or 13(5) (which are based on newly developed scientific evidence and/or which include a request for the protection of proprietary data......), or for the modification of an existing authorisation in accordance with Article 19 of Regulation (EC) No 1924/2006 on nutrition and health claims made on foods. This guidance outlines: the information and scientific data which must be included in the application, the hierarchy of different types of data and study designs...

  17. A Generic Mesh Data Structure with Parallel Applications

    Science.gov (United States)

    Cochran, William Kenneth, Jr.

    2009-01-01

    High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…

  18. 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

    CERN Document Server

    2016-01-01

    This edited book presents scientific results of the 16th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2015) which was held on June 1 – 3, 2015 in Takamatsu, Japan. The aim of this conference was to bring together researchers and scientists, businessmen and entrepreneurs, teachers, engineers, computer users, and students to discuss the numerous fields of computer science, to share their experiences and exchange new ideas and information in a meaningful way, to present research results about all aspects (theory, applications and tools) of computer and information science, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them.

  19. 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

    CERN Document Server

    SNPD 2016

    2016-01-01

    This edited book presents scientific results of the 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD 2016) which was held on May 30 - June 1, 2016 in Shanghai, China. The aim of this conference was to bring together researchers and scientists, businessmen and entrepreneurs, teachers, engineers, computer users, and students to discuss the numerous fields of computer science, to share their experiences and exchange new ideas and information in a meaningful way, to present research results about all aspects (theory, applications and tools) of computer and information science, and to discuss the practical challenges encountered along the way and the solutions adopted to solve them.

  20. Analysis for Parallel Execution without Performing Hardware/Software Co-simulation

    OpenAIRE

    Muhammad Rashid

    2014-01-01

    Hardware/software co-simulation improves the performance of embedded applications by executing the applications on a virtual platform before the actual hardware is available in silicon. However, the virtual platform of the target architecture is often not available during early stages of the embedded design flow. Consequently, analysis for parallel execution without performing hardware/software co-simulation is required. This article presents an analysis methodology for parallel execution of ...

  1. The Organizational-Legal Peculiarities of Application of the Remote Labor Mode and Flexible Working Hours of Scientific Workers at Higher Education Institution

    Directory of Open Access Journals (Sweden)

    Lytovchenko Iryna V.

    2018-01-01

    The article is aimed at defining the main organizational-legal peculiarities of applying the remote labor mode and of establishing and accounting for the flexible working hours of scientific workers at higher educational institutions and scientific institutes. In the course of the research, the organizational-legal peculiarities of applying the remote labor mode and flexible working hours of scientific workers at higher education institutions were analyzed. The article suggests their integration into the activities of higher education institutions with the purpose of efficient distribution of working time, provided that the tasks set are fully executed in a timely manner. As the basic means of controlling and measuring the results of scientific activity, it is suggested to use acts of executed works and other absolute indicators (the number of scientific sources processed, the number of pages of scientific papers written, etc.). A promising direction for further research is the development of practical recommendations on the use of special reports and indicators, with an assessment of their impact on the results of the activities of scientific workers at higher education institutions.

  2. Flexibility and Performance of Parallel File Systems

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  3. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

    Science.gov (United States)

    D'Angelo, Gianni; Rampone, Salvatore

    2014-01-01

    The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), which form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However, the memory and the execution time required are of order O(n^3) and O(n^5), respectively, and so the algorithm is unaffordable for huge data sets. We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of

  4. Multilevel Parallelization of AutoDock 4.2

    Directory of Open Access Journals (Sweden)

    Norgan Andrew P

    2011-04-01

    Background: Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Results: Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Conclusions: Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  5. Multilevel Parallelization of AutoDock 4.2.

    Science.gov (United States)

    Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

    2011-04-28

    Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarckian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.

  6. OpenMP parallelization of a gridded SWAT (SWATG)

    Science.gov (United States)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG), which integrates a grid modeling scheme with different spatial representations, also presents such problems. The computational cost limits applications of very high resolution, large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application programming interface is integrated with SWATG (called SWATGP) to accelerate grid modeling at the HRU level. Such a parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling a roughly 2000 km2 watershed with a 1 CPU, 15-thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computation of environmental models is beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management, in addition to offering data fusion and model coupling ability.
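
    The structure being exploited here, independent per-HRU computations within a time step that can be fanned out across workers, can be sketched with the Python standard library. The `route_hru` function and the HRU count are placeholders, not SWATG/SWATGP code, and Python processes stand in for the OpenMP threads used in the paper.

```python
# Sketch of HRU-level parallelism: independent per-unit work fanned out over workers.
# route_hru and the HRU count are placeholders, not SWATG/SWATGP code.
from concurrent.futures import ProcessPoolExecutor
import math

def route_hru(hru_id):
    # stand-in for the per-HRU hydrologic computation of one time step
    return sum(math.sqrt(hru_id + i) for i in range(10000))

if __name__ == "__main__":
    hru_ids = range(1000)                               # e.g. a gridded watershed
    with ProcessPoolExecutor(max_workers=15) as pool:   # cf. the 15-worker setup
        results = list(pool.map(route_hru, hru_ids, chunksize=64))
    print(len(results), "HRUs processed")
```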

  7. On а Recursive-Parallel Algorithm for Solving the Knapsack Problem

    Directory of Open Access Journals (Sweden)

    Vladimir V. Vasilchikov

    2018-01-01

    In this paper, we offer an efficient parallel algorithm for solving the NP-complete Knapsack Problem in its basic, so-called 0-1 variant. To find its exact solution, algorithms belonging to the category of "branch and bound" methods have long been used. Various options for parallelizing the computations are also used to speed up the solution, with varying degrees of efficiency. We propose here an algorithm for solving the problem based on the paradigm of recursive-parallel computations. We consider it well suited to problems of this kind, where it is difficult to break the computations up front into a sufficient number of subtasks of comparable complexity, since the subtasks appear dynamically at run time. We used the RPM ParLib library, developed by the author, as the main tool to program the algorithm. This library allows us to develop effective applications for parallel computing on a local network in the .NET Framework. Such applications have the ability to generate parallel branches of computation directly during program execution and dynamically redistribute work between computing modules. Any language with support for the .NET Framework can be used as a programming language in conjunction with this library. For our experiments, we developed some C# applications using this library. The main purpose of these experiments was to study the acceleration achieved by recursive-parallel computing. A detailed description of the algorithm and its testing, as well as the results obtained, are also given in the paper.
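
    A serial branch-and-bound solver for the 0-1 knapsack problem, the kind of computation whose recursive branches a recursive-parallel framework would distribute dynamically, can be sketched as follows. The fractional-relaxation bound is the standard textbook one; nothing here is the author's C# code.

```python
# Sketch: serial branch-and-bound for 0-1 knapsack with a fractional-relaxation bound.
# Each recursive branch is the kind of dynamically created subtask that a
# recursive-parallel framework distributes; this is not the author's C# code.
def knapsack(values, weights, capacity):
    items = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)
    best = 0

    def bound(idx, value, room):
        # optimistic value: fill remaining room greedily, allowing a fractional item
        for i in items[idx:]:
            if weights[i] <= room:
                room -= weights[i]
                value += values[i]
            else:
                return value + values[i] * room / weights[i]
        return value

    def branch(idx, value, room):
        nonlocal best
        best = max(best, value)
        if idx == len(items) or bound(idx, value, room) <= best:
            return                                    # prune this subtree
        i = items[idx]
        if weights[i] <= room:                        # branch: take item i
            branch(idx + 1, value + values[i], room - weights[i])
        branch(idx + 1, value, room)                  # branch: skip item i

    branch(0, 0, capacity)
    return best

print(knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50))  # 220
```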

  8. A case study in application I/O on Linux clusters

    International Nuclear Information System (INIS)

    Ross, R.; Nurmi, D.; Cheng, A.; Zingale, M.

    2001-01-01

    A critical but often ignored component of system performance is the I/O system. Today's applications expect a great deal from underlying storage systems and software, and both high performance distributed storage and high level interfaces have been developed to fill these needs. In this paper they discuss the I/O performance of a parallel scientific application on a Linux cluster, the FLASH astrophysics code. This application relies on three I/O software components to provide high performance parallel I/O on Linux clusters: the Parallel Virtual File System (PVFS), the ROMIO MPI-IO implementation, and the Hierarchical Data Format (HDF5) library. First they discuss the roles played by each of these components in providing an I/O solution. Next they discuss the FLASH I/O benchmark and point out its relevance. Following this they examine the performance of the benchmark, and through instrumentation of both the application and underlying system software code they discover the location of major software bottlenecks. They work around the most inhibiting of these bottlenecks, showing substantial performance improvement. Finally they point out similarities between the inefficiencies found here and those found in message passing systems, indicating that research in the message passing field could be leveraged to solve similar problems in high-level I/O interfaces
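
    The software stack described above (MPI-IO underneath a parallel HDF5 layer) is most easily illustrated today with h5py's MPI driver. The file name and dataset shape below are placeholders; this sketch is not the FLASH I/O benchmark itself and requires an MPI-enabled h5py build.

```python
# Sketch of collective parallel HDF5 output over MPI-IO (the ROMIO/HDF5 layering).
# Requires an MPI-enabled h5py build; run with: mpiexec -n 4 python write_checkpoint.py
# File name and dataset shape are placeholders; this is not the FLASH benchmark.
import numpy as np
from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
blocks_per_rank = 80                                  # local portion of the domain

with h5py.File("checkpoint.h5", "w", driver="mpio", comm=comm) as f:
    dset = f.create_dataset("density", shape=(size * blocks_per_rank, 16, 16, 16),
                            dtype="f8")
    start = rank * blocks_per_rank                    # each rank writes its own slab
    dset[start:start + blocks_per_rank] = np.random.rand(blocks_per_rank, 16, 16, 16)

if rank == 0:
    print("checkpoint written by", size, "ranks")
```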

  9. Proceeding on the scientific meeting and presentation on accelerator technology and its applications: physics, nuclear reactor

    International Nuclear Information System (INIS)

    Pramudita Anggraita; Sudjatmoko; Darsono; Tri Marji Atmono; Tjipto Sujitno; Wahini Nurhayati

    2012-01-01

    The scientific meeting and presentation on accelerator technology and its applications was held by PTAPB BATAN on 13 December 2011. The meeting aims to promote accelerator technology and its applications to accelerator scientists, academics, researchers and technology users, as well as the accelerator-based research that has been conducted by researchers inside and outside BATAN. This proceedings contains 23 papers on physics and nuclear reactors. (PPIKSN)

  10. Results of data base management system parameterized performance testing related to GSFC scientific applications

    Science.gov (United States)

    Carchedi, C. H.; Gough, T. L.; Huston, H. A.

    1983-01-01

    The results of a variety of tests designed to demonstrate and evaluate the performance of several commercially available data base management system (DBMS) products compatible with the Digital Equipment Corporation VAX 11/780 computer system are summarized. The tests were performed on the INGRES, ORACLE, and SEED DBMS products, employing applications that were similar to scientific applications under development by NASA. The objectives of this testing included determining the strengths and weaknesses of the candidate systems, the performance trade-offs of various design alternatives, and the impact of some installation and environmental (computer related) influences.

  11. Data parallel sorting for particle simulation

    Science.gov (United States)

    Dagum, Leonardo

    1992-01-01

    Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
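
    The integer-keyed, O(N) sequential sort the abstract refers to amounts to a counting sort over particle cell indices. A plain sketch is below, with the particle data reduced to a key array for illustration.

```python
# Sketch: O(N) counting sort of particles by integer cell index.
# This is the sequential baseline the data-parallel merge algorithm competes with.
import numpy as np

def sort_particles_by_cell(cell_ids, n_cells):
    counts = np.bincount(cell_ids, minlength=n_cells)      # histogram of keys
    starts = np.concatenate(([0], np.cumsum(counts[:-1]))) # first slot of each cell
    order = np.empty(len(cell_ids), dtype=np.int64)
    cursor = starts.copy()
    for p, c in enumerate(cell_ids):                       # single O(N) pass
        order[cursor[c]] = p
        cursor[c] += 1
    return order                                           # particle permutation

cells = np.array([3, 0, 2, 0, 3, 1])
print(sort_particles_by_cell(cells, n_cells=4))            # [1 3 5 2 0 4]
```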

  12. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  13. Parallel Execution of Multi Set Constraint Rewrite Rules

    DEFF Research Database (Denmark)

    Sulzmann, Martin; Lam, Edmund Soon Lee

    2008-01-01

    Multi-set constraint rewriting allows for a highly parallel computational model and has been used in a multitude of application domains such as constraint solving, agent specification etc. Rewriting steps can be applied simultaneously as long as they do not interfere with each other. We wish...... that the underlying constraint rewrite implementation executes rewrite steps in parallel on the increasingly popular multi-core architectures. We design and implement efficient algorithms which allow for the parallel execution of multi-set constraint rewrite rules. Our experiments show that we obtain some......

  14. Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

    Science.gov (United States)

    Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

    2017-01-01

    Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit the potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed, such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize these computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated, but only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated, but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
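
    The embarrassingly parallel structure the paper exploits is that each column of a finite-difference Jacobian needs one independent residual evaluation. A Python stand-in for the MATLAB spmd approach is sketched below; the residual function is a toy example, not from the paper.

```python
# Sketch: each column of a forward-difference Jacobian is an independent residual
# evaluation, so columns can be computed in parallel. Python stand-in for the
# MATLAB spmd approach; the residual F is a toy example.
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def F(x):                                        # toy nonlinear residual
    return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**3 - 5.0])

def jac_column(args):
    x, f0, j, h = args
    xp = x.copy()
    xp[j] += h
    return (F(xp) - f0) / h                      # one forward difference

def jacobian(x, h=1e-7):
    f0 = F(x)
    cols_args = [(x, f0, j, h) for j in range(len(x))]
    with ProcessPoolExecutor() as pool:
        cols = list(pool.map(jac_column, cols_args))
    return np.column_stack(cols)

if __name__ == "__main__":
    print(jacobian(np.array([1.0, 2.0])))        # approx [[2, 1], [1, 12]]
```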

  15. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo codes (MC codes) for photon and neutron transport problems by making use of the independence of the calculation for each particle. The applicability of an existing MC code to parallel processing is discussed. As parallel computers, we have used both a vector-parallel processor and a scalar-parallel processor in the performance evaluation. We have carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on a Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on these two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work, as part of the work developing the STA (Seamless Thinking Aid) Basic Software. (author)
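
    The particle-level independence the authors exploit can be illustrated with a toy transport problem: each worker transports its own batch of particles with an independent random stream, and the partial tallies are summed at the end. This is a hedged Python sketch of the pattern, not MCNP code.

```python
import numpy as np
from multiprocessing import Pool

def transport_batch(args):
    """Transport a batch of particles through a 1-D slab (toy model).

    Returns the number of particles transmitted through a slab of given
    thickness, using exponentially distributed free paths and absorbing
    on the first collision.  Each batch gets its own seed, so batches
    are statistically independent and can run in parallel.
    """
    n_particles, thickness, seed = args
    rng = np.random.default_rng(seed)
    free_paths = rng.exponential(scale=1.0, size=n_particles)
    return int(np.sum(free_paths > thickness))

if __name__ == "__main__":
    n_workers, n_per_batch, thickness = 8, 250_000, 2.0
    batches = [(n_per_batch, thickness, seed) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        transmitted = sum(pool.map(transport_batch, batches))
    total = n_workers * n_per_batch
    print(f"transmission probability ~ {transmitted / total:.4f}")  # ~ exp(-2)
```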

  16. MEDUSA - An overset grid flow solver for network-based parallel computer systems

    Science.gov (United States)

    Smith, Merritt H.; Pallis, Jani M.

    1993-01-01

    Continuing improvement in processing speed has made it feasible to solve the Reynolds-Averaged Navier-Stokes equations for simple three-dimensional flows on advanced workstations. Combining multiple workstations into a network-based heterogeneous parallel computer allows the application of programming principles learned on MIMD (Multiple Instruction Multiple Data) distributed memory parallel computers to the solution of larger problems. An overset-grid flow solution code has been developed which uses a cluster of workstations as a network-based parallel computer. Inter-process communication is provided by the Parallel Virtual Machine (PVM) software. Solution speed equivalent to one-third of a Cray-YMP processor has been achieved from a cluster of nine commonly used engineering workstation processors. Load imbalance and communication overhead are the principal impediments to parallel efficiency in this application.

  17. Scientific evaluation at the CEA

    International Nuclear Information System (INIS)

    1999-01-01

    This report presents a statement of the scientific and technical activity of the French atomic energy commission (CEA) for the year 1998. This evaluation is made by external and independent experts and requires some specific dispositions for the nuclear protection and safety institute (IPSN) and for the direction of military applications (DAM). The report is divided into 5 parts dealing successively with: part 1 - the CEA, a public research organization (civil nuclear research, technology research and transfers, defence activities); the scientific and technical evaluation at the CEA (general framework, evaluation of the IPSN and DAM); part 2 - the scientific and technical councils (directions of fuel cycle, of nuclear reactors, and of advanced technologies); part 3 - the scientific councils (directions of matter and of life sciences); the nuclear protection and safety institute; the direction of military applications; part 4 - the corresponding members of the evaluation; part 5 - the list of scientific and technical councils and members. (J.S.)

  18. Parallel computing techniques for rotorcraft aerodynamics

    Science.gov (United States)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, modifications to the implicit operator, the Lower-Upper Symmetric Gauss-Seidel (LU-SGS) scheme originally used in TURNS, are performed. Second, an inexact Newton method coupled with a Krylov subspace iterative method (a Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient application of Newton-Krylov methods to the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  19. Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

    Science.gov (United States)

    Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

    2018-03-01

    Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.

  20. Parallel In Situ Indexing for Data-intensive Computing

    Energy Technology Data Exchange (ETDEWEB)

    Kim, Jinoh; Abbasi, Hasan; Chacon, Luis; Docan, Ciprian; Klasky, Scott; Liu, Qing; Podhorszki, Norbert; Shoshani, Arie; Wu, Kesheng

    2011-09-09

    As computing power increases exponentially, vast amounts of data are created by many scientific research activities. However, the bandwidth for storing the data to disks and reading the data from disks has been improving at a much slower pace. These two trends produce an ever-widening data access gap. Our work brings together two distinct technologies to address this data access issue: indexing and in situ processing. From decades of database research literature, we know that indexing is an effective way to address the data access issue, particularly for accessing a relatively small fraction of data records. As data sets increase in size, more and more analysts need to use selective data access, which makes indexing even more important for improving data access. The challenge is that most implementations of indexing technology are embedded in large database management systems (DBMS), but most scientific datasets are not managed by any DBMS. In this work, we choose to include indexes with the scientific data instead of requiring the data to be loaded into a DBMS. We use compressed bitmap indexes from the FastBit software, which are known to be highly effective for query-intensive workloads common to scientific data analysis. To use the indexes, we need to build them first. The index building procedure needs to access the whole data set and may also require a significant amount of compute time. In this work, we adapt the in situ processing technology to generate the indexes, thus removing the need to read data from disks and allowing the indexes to be built in parallel. The in situ data processing system used is ADIOS, a middleware for high-performance I/O. Our experimental results show that the indexes can improve the data access time by up to 200 times depending on the fraction of data selected, and that using an in situ data processing system can effectively reduce the time needed to create the indexes, by up to 10 times with our in situ technique when using identical parallel settings.
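
    The querying idea behind bitmap indexes can be sketched in a few lines. This is a plain, uncompressed illustration only; FastBit's compressed (WAH) bitmaps and the ADIOS in situ integration are considerably more involved.

```python
import numpy as np

def build_bitmap_index(values, bins):
    """One boolean bit-vector per distinct value (bin): an uncompressed bitmap index."""
    values = np.asarray(values)
    return {b: values == b for b in bins}

def query_any(index, wanted):
    """Row ids matching any of the wanted bins, via bitwise OR of the bit-vectors."""
    hits = np.zeros(len(next(iter(index.values()))), dtype=bool)
    for b in wanted:
        hits |= index[b]
    return np.flatnonzero(hits)

if __name__ == "__main__":
    energies = np.array([1, 3, 3, 7, 1, 9, 7, 3])
    idx = build_bitmap_index(energies, bins=[1, 3, 7, 9])
    print(query_any(idx, wanted=[7, 9]))   # -> [3 5 6]
```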

  1. 78 FR 27186 - Application(s) for Duty-Free Entry of Scientific Instruments

    Science.gov (United States)

    2013-05-09

    ... Scientific Instruments Pursuant to Section 6(c) of the Educational, Scientific and Cultural Materials...: New Mexico Institute of Mining and Technology, 801 Leroy Place, Socorro, NM 87801. Instrument: Delay... dimensions. The experiments depend on this fast 3D scanning to capture sufficient data from the dendrites of...

  2. Parallel computation of nondeterministic algorithms in VLSI

    Energy Technology Data Exchange (ETDEWEB)

    Hortensius, P D

    1987-01-01

    This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown that these particular cellular automata can form the basis of efficient VLSI architectures for the computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
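
    The generator concept, one fresh pseudorandom bit per cell per clock cycle from an elementary cellular automaton, can be sketched as follows. This is a generic rule 30 illustration in Python, not the specific generators or VLSI circuits studied in the thesis.

```python
import numpy as np

def ca_random_bits(n_cells=64, n_steps=32, rule=30, seed_cell=None):
    """Generate pseudorandom bits with an elementary cellular automaton.

    All cells update simultaneously from their 3-cell neighbourhood, so on
    parallel hardware one fresh bit per cell is available every clock cycle.
    Rule 30 is the classic choice for pseudorandom sequences.
    """
    rule_table = [(rule >> i) & 1 for i in range(8)]      # neighbourhood -> new bit
    state = np.zeros(n_cells, dtype=np.uint8)
    state[seed_cell if seed_cell is not None else n_cells // 2] = 1
    rows = []
    for _ in range(n_steps):
        left = np.roll(state, 1)                          # periodic boundary
        right = np.roll(state, -1)
        idx = (left << 2) | (state << 1) | right          # value 0..7 per cell
        state = np.array([rule_table[i] for i in idx], dtype=np.uint8)
        rows.append(state.copy())
    return np.array(rows)                                 # n_steps x n_cells bits

if __name__ == "__main__":
    bits = ca_random_bits()
    print(bits[:, bits.shape[1] // 2])   # the classic rule-30 centre-column stream
```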

  3. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  4. Parallel Genetic Algorithms for calibrating Cellular Automata models: Application to lava flows

    International Nuclear Information System (INIS)

    D'Ambrosio, D.; Spataro, W.; Di Gregorio, S.; Calabria Univ., Cosenza; Crisci, G.M.; Rongo, R.; Calabria Univ., Cosenza

    2005-01-01

    Cellular Automata are highly nonlinear dynamical systems which are suitable for simulating natural phenomena whose behaviour may be specified in terms of local interactions. The Cellular Automata model SCIARA, developed for the simulation of lava flows, demonstrated the ability to reproduce the behaviour of Etnean events. However, in order to apply the model for the prediction of future scenarios, a thorough calibration phase is required. This work presents the application of Genetic Algorithms, general-purpose search algorithms inspired by natural selection and genetics, for the parameter optimisation of the model SCIARA. Difficulties due to the elevated computational time suggested the adoption of a Master-Slave Parallel Genetic Algorithm for the calibration of the model with respect to the 2001 Mt. Etna eruption. Results demonstrated the usefulness of the approach, both in terms of computing time and quality of the performed simulations.
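
    The master-slave pattern keeps a single sequential genetic loop on the master and farms out only the expensive fitness evaluations, which for SCIARA are full lava-flow simulations. The sketch below substitutes a toy fitness function and hypothetical parameter ranges; it illustrates the pattern, not the project's calibration code.

```python
import numpy as np
from multiprocessing import Pool

def fitness(params):
    """Stand-in for an expensive simulation run scored against observations."""
    return -np.sum((np.asarray(params) - 0.3) ** 2)        # best at 0.3, ..., 0.3

def master_slave_ga(n_params=4, pop_size=32, generations=50, workers=8, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.random((pop_size, n_params))
    with Pool(workers) as pool:
        for _ in range(generations):
            # Slaves: evaluate every individual in parallel.
            scores = np.array(pool.map(fitness, list(pop)))
            # Master: selection, uniform crossover, Gaussian mutation.
            parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
            children = []
            while len(children) < pop_size:
                a, b = parents[rng.integers(len(parents), size=2)]
                mask = rng.random(n_params) < 0.5
                child = np.where(mask, a, b) + rng.normal(0.0, 0.05, n_params)
                children.append(np.clip(child, 0.0, 1.0))
            pop = np.array(children)
        best = pop[np.argmax(pool.map(fitness, list(pop)))]
    return best

if __name__ == "__main__":
    print(master_slave_ga())     # converges near [0.3, 0.3, 0.3, 0.3]
```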

  5. Parallel adaptive simulations on unstructured meshes

    International Nuclear Information System (INIS)

    Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A

    2007-01-01

    This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC applications in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPS' developing ability to solve adaptive unstructured mesh problems on massively parallel computers.

  6. Persona and the Performance of Identity : Parallel Developments in the Biographical Historiography of Science and Gender, and the Related Uses of Self Narrative

    NARCIS (Netherlands)

    Bosch, Mineke

    2013-01-01

    In this article Bosch explores the parallel development in scientific and gender biography to shed light on the relation between the individual and the collective, the self and society. In the history of science the relational/collective scientific self and the concept of the scientific persona (or

  7. Parallel Connection of Silicon Carbide MOSFETs for Multichip Power Modules

    DEFF Research Database (Denmark)

    Li, Helong

    challenges from the manufacturing and application points of view. The less mature manufacturing process limits the yield and the single-die size of SiC MOSFETs, which results in a smaller current capability for a single SiC MOSFET die. Consequently, in high-current applications, paralleled connections of Si...... connections for the paralleled dies are presented and the source of the transient current imbalance is identified. To mitigate the transient current imbalance in the traditional DBC layout, a novel DBC layout with split output is proposed. First, the working mechanism of the split output topology is studied...... the current sharing performance among the paralleled SiC MOSFET dies in the power module. The proposed DBC layout is not limited to SiC MOSFETs, but also applies to Si IGBTs and other voltage-controlled devices. ...of the circuit mismatch on the paralleled connection of SiC MOSFETs. It reveals the circuit...

  8. Teaching RLC Parallel Circuits in High-School Physics Class

    Science.gov (United States)

    Simon, Alpár

    2015-01-01

    This paper will try to give an alternative treatment of the subject "parallel RLC circuits" and "resonance in parallel RLC circuits" from the Physics curricula for the XIth grade from Romanian high-schools, with an emphasis on practical type circuits and their possible applications, and intends to be an aid for both Physics…
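
    For reference, the standard textbook relations behind resonance in an ideal parallel RLC circuit (not taken from the paper): the admittance, the resonance frequency at which its imaginary part vanishes, and the quality factor.

```latex
Y(\omega) = \frac{1}{R} + j\!\left(\omega C - \frac{1}{\omega L}\right),
\qquad
\omega_0 = \frac{1}{\sqrt{LC}},
\qquad
f_0 = \frac{1}{2\pi\sqrt{LC}},
\qquad
Q = R\sqrt{\frac{C}{L}}
```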

  9. Parallel Algorithms for Graph Optimization using Tree Decompositions

    Energy Technology Data Exchange (ETDEWEB)

    Sullivan, Blair D [ORNL; Weerapurage, Dinesh P [ORNL; Groer, Christopher S [ORNL

    2012-06-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.
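
    The dynamic programme underlying the tree-decomposition approach is easiest to see for an ordinary tree (tree-width 1): each vertex keeps the best subtree weight with the vertex included or excluded. The Python sketch below shows this special case only; the paper's algorithms generalize the tables to decomposition bags and fill them in parallel.

```python
def max_weight_independent_set(adj, weight, root=0):
    """Maximum weighted independent set on a tree via dynamic programming.

    adj    : adjacency structure of a tree (dict or list of neighbour lists)
    weight : weight[v] >= 0 for each vertex v
    Returns the best total weight using two table entries per vertex:
    best subtree weight with v included, and with v excluded.
    """
    include, exclude = {}, {}
    # Iterative post-order traversal, so deep trees avoid recursion limits.
    stack, order, parent = [root], [], {root: None}
    while stack:
        v = stack.pop()
        order.append(v)
        for u in adj[v]:
            if u != parent[v]:
                parent[u] = v
                stack.append(u)
    for v in reversed(order):                    # children before parents
        children = [u for u in adj[v] if u != parent[v]]
        include[v] = weight[v] + sum(exclude[u] for u in children)
        exclude[v] = sum(max(include[u], exclude[u]) for u in children)
    return max(include[root], exclude[root])

if __name__ == "__main__":
    # A path 0-1-2-3-4 with unit weights: the optimum picks vertices 0, 2, 4.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(max_weight_independent_set(adj, weight=[1, 1, 1, 1, 1]))   # -> 3
```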

  10. Scientific and Technological Report 2011

    International Nuclear Information System (INIS)

    Lopez Milla, Alcides; Prado Cuba, Antonio; Agapito Panta, Juan; Montoya Rossi, Eduardo

    2013-01-01

    This annual scientific and technological report provides an overview of research and development activities at the Peruvian Institute of Nuclear Energy (IPEN) during the period from 1 January to 31 December 2011. The report includes 30 papers divided into 8 subject areas: physics and chemistry, materials science, nuclear engineering, mining, industrial and environmental applications, medical and biological applications, radiation protection and nuclear safety, scientific instrumentation, and management aspects. It also includes annexes. (APC)

  11. Scientific and Technological Report 2010

    International Nuclear Information System (INIS)

    Prado Cuba, Antonio; Santiago Contreras, Julio; Solis Veliz, Jose; Lopez Moreno, Edith

    2011-10-01

    This annual scientific and technological report provides an overview of research and development activities at the Peruvian Institute of Nuclear Energy (IPEN) during the period from 1 January to 31 December 2010. The report includes 41 papers divided into 8 subject areas: physics and chemistry, materials science, nuclear engineering, mining, industrial and environmental applications, medical and biological applications, radiation protection and nuclear safety, scientific instrumentation, and management aspects. It also includes annexes. (APC)

  12. Hardware-Oblivious Parallelism for In-Memory Column-Stores

    NARCIS (Netherlands)

    M. Heimel; M. Saecker; H. Pirk (Holger); S. Manegold (Stefan); V. Markl

    2013-01-01

    The multi-core architectures of today's computer systems make parallelism a necessity for performance critical applications. Writing such applications in a generic, hardware-oblivious manner is a challenging problem: Current database systems thus rely on labor-intensive and error-prone

  13. Open archives, the expectations of the scientific communities

    CERN Multimedia

    CERN. Geneva

    2007-01-01

    Open archives (OA) started in physics more than 15 years ago with ArXiv, and have since played a more and more important role in the activity of the discipline; actually, in many fields of physics, ArXiv has now become the major vector of scientific communication. We now have two communication channels in parallel, traditional scientific journals with peer review, and open archives, both with different functionalities and both indispensable. It is therefore interesting to try and transpose to other disciplines the scheme that has worked so well for physicists, which means that the reasons for the success of ArXiv should be analyzed. Scientists do not care about the technicalities, and whether or not the OA is centralized, or distributed with a high level of interoperability. What they wish is to have one single interface where all the scientific information in their domain is available, with the same scientific classifications, etc. In case of collaborations between different institutions, they do not wish to ...

  14. Python for Scientific Computing Education: Modeling of Queueing Systems

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2014-01-01

    In this paper, we present the methodology for the introduction to scientific computing based on model-centered learning. We propose multiphase queueing systems as a basis for learning objects. We use Python and parallel programming for implementing the models and present the computer code and results of stochastic simulations.
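
    As a flavour of the model-centred approach, here is a minimal sequential simulation of a two-phase (tandem) queueing system in Python; the parameters are illustrative only, and independent replications of such a run are what one would farm out in parallel.

```python
import numpy as np

def simulate_tandem_queue(n_customers=100_000, lam=0.8, mu1=1.0, mu2=1.2, seed=1):
    """Monte Carlo simulation of a two-phase (tandem) queueing system.

    Customers arrive in a Poisson stream (rate lam) and pass through two
    exponential single-server stations in series (rates mu1, mu2).
    Returns the mean time spent in the system (sojourn time).
    """
    rng = np.random.default_rng(seed)
    arrivals = np.cumsum(rng.exponential(1.0 / lam, n_customers))
    s1 = rng.exponential(1.0 / mu1, n_customers)
    s2 = rng.exponential(1.0 / mu2, n_customers)
    done1 = np.empty(n_customers)
    done2 = np.empty(n_customers)
    free1 = free2 = 0.0
    for i in range(n_customers):
        start1 = max(arrivals[i], free1)       # wait if phase-1 server is busy
        free1 = done1[i] = start1 + s1[i]
        start2 = max(done1[i], free2)          # then queue for phase 2
        free2 = done2[i] = start2 + s2[i]
    return float(np.mean(done2 - arrivals))

if __name__ == "__main__":
    print(f"mean sojourn time ~ {simulate_tandem_queue():.3f}")
    # Tandem M/M/1 theory: 1/(mu1-lam) + 1/(mu2-lam) = 5.0 + 2.5 = 7.5
```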

  15. Unified dataflow model for the analysis of data and pipeline parallelism, and buffer sizing

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.; Geuns, S.J.; Wiggers, M.H.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Real-time stream processing applications such as software defined radios are usually executed concurrently on multiprocessor systems. Exploiting coarse-grained data parallelism by duplicating tasks is often required, besides pipeline parallelism, to meet the temporal constraints of the applications.

  16. Towards measurement of the Casimir force between parallel plates separated at sub-mircon distance

    NARCIS (Netherlands)

    Syed Nawazuddin, M.B.; Lammerink, Theodorus S.J.; Wiegerink, Remco J.; Berenschot, Johan W.; de Boer, Meint J.; Elwenspoek, Michael Curt

    2011-01-01

    Ever since its prediction, experimental investigation of the Casimir force has been of great scientific interest. Many research groups have successfully attempted quantifying the force with different device geometries; however measurement of the Casimir force between parallel plates with sub-micron
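
    For orientation, the idealized quantity at stake: the Casimir pressure between two perfectly conducting parallel plates separated by a distance d (the standard zero-temperature result, not the analysis of the paper).

```latex
\frac{F}{A} = -\,\frac{\pi^{2}\hbar c}{240\,d^{4}}
```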

  17. A COMPUTER CLUSTER SYSTEM FOR PSEUDO-PARALLEL EXECUTION OF GEANT4 SERIAL APPLICATION

    Directory of Open Access Journals (Sweden)

    Memmo Federici

    2013-12-01

    Simulation of the interactions between particles and matter in studies for developing X-ray detectors generally requires very long calculation times (up to several days or weeks). These times are often a serious limitation for the success of the simulations and for the accuracy of the simulated models. One of the tools used by the scientific community to perform these simulations is Geant4 (Geometry And Tracking) [2, 3]. Building on the experience gained in the design of the AVES cluster computing system (Federici et al. [1]), the IAPS (Istituto di Astrofisica e Planetologia Spaziali, INAF) laboratories were able to develop a cluster computer system dedicated to Geant4. The Cluster is easy to use and easily expandable, and thanks to the design criteria adopted it achieves an excellent compromise between performance and cost. The management software developed for the Cluster splits a single simulation instance across the available cores, allowing software written for serial computation to reach a computing speed similar to that obtainable from native parallel software. The simulations carried out on the Cluster showed a speed-up in execution by a factor of 20 to 60 compared to the times obtained with a single PC of medium quality.

  18. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the Force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the Force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  19. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  20. Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

    OpenAIRE

    Vanderbauwhede, Wim; Davidson, Gavin

    2017-01-01

    Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source to source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into Ope...

  1. Method of Parallel-Hierarchical Network Self-Training and its Application for Pattern Classification and Recognition

    Directory of Open Access Journals (Sweden)

    TIMCHENKO, L.

    2012-11-01

    Propositions necessary for the development of parallel-hierarchical (PH) network training methods are discussed in this article. Unlike already known structures of artificial neural networks, where non-normalized (absolute) similarity criteria are used for comparison, the suggested structure uses a normalized criterion. Based on the analysis of training rules, a conclusion is made that the application of two supervised training methods is optimal for PH network training: error correction-based training and memory-based training. Mathematical models of training and a combined method of PH network training for recognition of static and dynamic patterns are developed.

  2. BCYCLIC: A parallel block tridiagonal matrix cyclic solver

    Science.gov (United States)

    Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

    2010-09-01

    A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-power-of-2) block row and processor numbers. Comparison with a state-of-the-art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
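
    The cyclic reduction idea is easiest to state for 1x1 blocks: eliminate the odd-numbered unknowns (all eliminations are mutually independent, hence parallel), recurse on the half-sized even-numbered system, and back-substitute. The Python sketch below is this scalar special case only; BCYCLIC itself works on matrix blocks, where the divisions become multiplications by block inverses.

```python
import numpy as np

def cyclic_reduction_solve(a, b, c, d):
    """Solve a tridiagonal system by cyclic reduction (scalar illustration).

    Equations: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i], 0 <= i < n,
    with a[0] and c[n-1] treated as zero.  Both the eliminations and the
    back-substitutions below are independent across i, hence parallelizable.
    """
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    a[0] = 0.0
    c[-1] = 0.0
    n = len(b)
    if n == 1:
        return d / b

    # Step 1: eliminate the odd-indexed unknowns (mutually independent).
    ra, rb = np.zeros(n), b.copy()
    rc, rd = np.zeros(n), d.copy()
    for i in range(0, n, 2):
        if i > 0:
            alpha = -a[i] / b[i - 1]
            ra[i] = alpha * a[i - 1]
            rb[i] += alpha * c[i - 1]
            rd[i] += alpha * d[i - 1]
        if i < n - 1:
            gamma = -c[i] / b[i + 1]
            rc[i] = gamma * c[i + 1]
            rb[i] += gamma * a[i + 1]
            rd[i] += gamma * d[i + 1]

    # Step 2: recurse on the (roughly) half-sized even-indexed system.
    even = np.arange(0, n, 2)
    x = np.zeros(n)
    x[even] = cyclic_reduction_solve(ra[even], rb[even], rc[even], rd[even])

    # Step 3: back-substitute the odd-indexed unknowns (again independent).
    for i in range(1, n, 2):
        rhs = d[i] - a[i] * x[i - 1]
        if i + 1 < n:
            rhs -= c[i] * x[i + 1]
        x[i] = rhs / b[i]
    return x

if __name__ == "__main__":
    # Check a small diagonally dominant system against a dense solve.
    rng = np.random.default_rng(0)
    n = 9
    a, c, d = rng.random(n), rng.random(n), rng.random(n)
    b = 4.0 + rng.random(n)
    A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
    print(np.allclose(cyclic_reduction_solve(a, b, c, d), np.linalg.solve(A, d)))
```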

  3. Why not make a PC cluster of your own? 5. AppleSeed: A Parallel Macintosh Cluster for Scientific Computing

    Science.gov (United States)

    Decyk, Viktor K.; Dauger, Dean E.

    We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.

  4. The StratusLab cloud distribution: Use-cases and support for scientific applications

    Science.gov (United States)

    Floros, E.

    2012-04-01

    The StratusLab project is integrating an open cloud software distribution that enables organizations to set up and provide their own private or public IaaS (Infrastructure as a Service) computing clouds. The StratusLab distribution capitalizes on popular infrastructure virtualization solutions like KVM, the OpenNebula virtual machine manager, the Claudia service manager and the SlipStream deployment platform, which are further enhanced and expanded with additional components developed within the project. The StratusLab distribution covers the core aspects of a cloud IaaS architecture, namely Computing (life-cycle management of virtual machines), Storage, Appliance management and Networking. The resulting software stack provides a packaged turn-key solution for deploying cloud computing services. The cloud computing infrastructures deployed using StratusLab can support a wide range of scientific and business use cases. Grid computing has been the primary use case pursued by the project, and for this reason the initial priority has been support for the deployment and operation of fully virtualized production-level grid sites; a goal that has already been achieved by operating such a site as part of EGI's (European Grid Initiative) pan-European grid infrastructure. In this area the project is currently working to provide non-trivial capabilities like elastic and autonomic management of grid site resources. Although grid computing has been the motivating paradigm, StratusLab's cloud distribution can support a wider range of use cases. Towards this direction, we have developed and currently provide support for setting up general purpose computing solutions like Hadoop, MPI and Torque clusters. For what concerns scientific applications, the project is collaborating closely with the Bioinformatics community in order to prepare VM appliances and deploy optimized services for bioinformatics applications. In a similar manner additional scientific disciplines like Earth Science can take

  5. Using the Eclipse Parallel Tools Platform to Assist Earth Science Model Development and Optimization on High Performance Computers

    Science.gov (United States)

    Alameda, J. C.

    2011-12-01

    Development and optimization of computational science models, particularly on high performance computers and, with the advent of ubiquitous multicore processors, on practically every system, has been accomplished with basic software tools: typically command-line compilers, debuggers, and performance tools that have not changed substantially since the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as OpenMP and MPI) to take full advantage of high performance computers with an increasing core count per shared memory node, have made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC), seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high performance computing systems. Our WHPC project takes an application-centric view to improving Eclipse PTP. We are using a set of scientific applications, each with a variety of challenges, using PTP to drive further improvements to the scientific applications themselves, and using shortcomings in Eclipse PTP observed from an application developer perspective to drive our list of improvements to the platform. We are also partnering with performance tool providers to drive higher quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, as well as to develop educational materials to incorporate into

  6. A Comparative Taxonomy of Parallel Algorithms for RNA Secondary Structure Prediction

    Science.gov (United States)

    Al-Khatib, Ra’ed M.; Abdullah, Rosni; Rashid, Nur’Aini Abdul

    2010-01-01

    RNA molecules have been discovered playing crucial roles in numerous biological and medical procedures and processes. RNA structure determination has become a major problem in biology. Recently, computer scientists have empowered biologists with RNA secondary structure prediction methods that ease the understanding of RNA functions and roles. Detecting RNA secondary structure is an NP-hard problem, especially for pseudoknotted RNA structures. The detection process is also time-consuming; as a result, an alternative approach such as using parallel architectures is a desirable option. The main goal of this paper is an intensive investigation of the parallel methods used in the literature to solve the demanding issues related to RNA secondary structure prediction. We then introduce a new taxonomy for the parallel RNA folding methods. Based on this proposed taxonomy, a systematic and scientific comparison is performed among the existing methods. PMID:20458364

  7. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a low response time.

  8. A novel numerical approach for workspace determination of parallel mechanisms

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Yiqun; Niu, Junchuan; Liu, Zhihui; Zhang, Fuliang [Shandong University, Shandong (China)

    2017-06-15

    In this paper, a novel numerical approach is proposed for workspace determination of parallel mechanisms. Compared with the classical numerical approaches, the presented approach discretizes both location and orientation of the mechanism simultaneously, not only one of the two. This technique makes the presented numerical approach applicable in determining almost all types of workspaces, while traditional numerical approaches are only applicable in determining the constant orientation workspace and orientation workspace. The presented approach and its steps to determine the inclusive orientation workspace and total orientation workspace are described in detail. A lower-mobility parallel mechanism and a six-degrees-of-freedom Stewart platform are set as examples; the workspaces of these mechanisms are estimated and visualized by the proposed numerical approach. Furthermore, the efficiency of the presented approach is discussed. The examples show that the presented approach is applicable in determining the inclusive orientation workspace and total orientation workspace of parallel mechanisms with high efficiency.

  9. Medical Image Retrieval Based On the Parallelization of the Cluster Sampling Algorithm

    OpenAIRE

    Ali, Hesham Arafat; Attiya, Salah; El-henawy, Ibrahim

    2017-01-01

    In this paper we develop parallel cluster sampling algorithms and show that a multi-chain version is embarrassingly parallel and can be used efficiently for medical image retrieval among other applications.

  10. Parallelization of ITOUGH2 using PVM

    International Nuclear Information System (INIS)

    Finsterle, Stefan

    1998-01-01

    ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an ITOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM.

  11. Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications

    KAUST Repository

    Siddiqui, Shahzeb

    2015-04-17

    The performance optimization of scientific applications usually requires an in-depth knowledge of the hardware and software. A performance tuning mechanism is suggested to automatically tune OpenACC parameters to adapt to the execution environment on a given system. A historic learning based methodology is suggested to prune the parameter search space for a more efficient auto-tuning process. This approach is applied to tune the OpenACC gang and vector clauses for a better mapping of the compute kernels onto the underlying architecture. Our experiments show a significant performance improvement against the default compiler parameters and drastic reduction in tuning time compared to a brute force search-based approach.

  12. Application of parallelized software architecture to an autonomous ground vehicle

    Science.gov (United States)

    Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam

    2011-01-01

    This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were made to the power system, reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities including sensor interfacing, data processing, path planning, navigation algorithms and motor control. This inefficient approach led to poor software performance and made the code difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks (motor control, navigation, sensor data collection, etc.) into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used last year, two frames can be acquired and processed in 70 ms. With all these improvements, Q placed 2nd in the autonomous challenge.

  13. A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

    Science.gov (United States)

    Povitsky, Alex; Morris, Philip J.

    1999-01-01

    In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in the computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm, processors are idle due to its forward and backward recurrences. To utilize processors during this time, we propose to use them for either non-local data-independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice that of the standard pipelined algorithm and close to that of the explicit DRP algorithm.

  14. First scientific application of the membrane cryostat technology

    Energy Technology Data Exchange (ETDEWEB)

    Montanari, David; Adamowski, Mark; Baller, Bruce R.; Barger, Robert K.; Chi, Edward C.; Davis, Ronald P.; Johnson, Bryan D.; Kubinski, Bob M.; Najdzion, John J.; Rucinski, Russel A.; Schmitt, Rich L.; Tope, Terry E. [Particle Physics Division, Fermilab, P.O. Box 500, Batavia, IL 60510 (United States); Mahoney, Ryan; Norris, Barry L.; Watkins, Daniel J. [Technical Division, Fermilab, P.O. Box 500, Batavia, IL 60510 (United States); McCluskey, Elaine G. [LBNE Project, Fermilab, P.O. Box 500, Batavia, IL 60510 (United States); Stewart, James [Physics Department, Brookhaven National Laboratory, P.O. Box 5000, Uptown, NY 11973 (United States)

    2014-01-29

    We report on the design, fabrication, performance and commissioning of the first membrane cryostat to be used for scientific application. The Long Baseline Neutrino Experiment (LBNE) has designed and fabricated a membrane cryostat prototype in collaboration with IHI Corporation (IHI). Original goals of the prototype are: to demonstrate the membrane cryostat technology in terms of thermal performance, feasibility for liquid argon, and leak tightness; to demonstrate that we can remove all the impurities from the vessel and achieve the purity requirements in a membrane cryostat without evacuation and using only a controlled gaseous argon purge; to demonstrate that we can achieve and maintain the purity requirements of the liquid argon during filling, purification, and maintenance mode using mole sieve and copper filters from the Liquid Argon Purity Demonstrator (LAPD) R and D project. The purity requirements of a large liquid argon detector such as LBNE are contaminants below 200 parts per trillion oxygen equivalent. This paper gives the requirements, design, construction, and performance of the LBNE membrane cryostat prototype, with experience and results important to the development of the LBNE detector.

  15. Exploiting variability for energy optimization of parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Lavrijsen, Wim [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chen, Xin [Georgia Inst. of Technology, Atlanta, GA (United States); Schwan, Karsten [Georgia Inst. of Technology, Atlanta, GA (United States)

    2016-04-18

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  16. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    International Nuclear Information System (INIS)

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated

  17. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-05-01

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  18. The development of a scalable parallel 3-D CFD algorithm for turbomachinery. M.S. Thesis Final Report

    Science.gov (United States)

    Luke, Edward Allen

    1993-01-01

    Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.

  19. Scientific Technological Report 2002

    International Nuclear Information System (INIS)

    Gayoso C, C.; Cuya G, T.; Robles N, A.; Prado C, A.

    2003-07-01

    This annual scientific-technological report provides an overview of research and development activities at the Peruvian Institute of Nuclear Energy (IPEN) during the period from 1 January to 31 December 2002. The report includes 58 papers divided into 10 subject areas: physics and nuclear chemistry, nuclear engineering, materials, industrial applications, biological applications, medical applications, environmental applications, protection and radiological safety, nuclear safety, and management aspects.

  20. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.
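
    The flavour of the load-balancing step can be conveyed with a simple greedy heuristic: assign chains with differing expected costs, largest first, to the currently least-loaded processor. The paper formulates the step exactly as a binary integer programme; the Python sketch below is only the classic longest-processing-time approximation of that idea, with hypothetical names.

```python
import heapq

def greedy_load_balance(task_costs, n_procs):
    """Assign tasks to processors, largest first, onto the least-loaded processor.

    task_costs : estimated cost of each task (e.g. chain length x cost per sample)
    Returns (assignment list, makespan).  This is the classic LPT heuristic,
    a cheap stand-in for an exact binary-integer-programming formulation.
    """
    heap = [(0.0, p) for p in range(n_procs)]          # (current load, processor id)
    heapq.heapify(heap)
    assignment = [None] * len(task_costs)
    for task in sorted(range(len(task_costs)), key=lambda t: -task_costs[t]):
        load, proc = heapq.heappop(heap)               # least-loaded processor
        assignment[task] = proc
        heapq.heappush(heap, (load + task_costs[task], proc))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

if __name__ == "__main__":
    costs = [7.0, 5.0, 4.0, 3.0, 3.0, 2.0]
    print(greedy_load_balance(costs, n_procs=2))       # both processors end at 12.0
```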