WorldWideScience

Sample records for highly parallel applications

  1. Data driven parallelism in experimental high energy physics applications

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10 8 instructions all the way down to a few times 10 4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordiate themselves asynchroneously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus allows to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)

  2. Data driven parallelism in experimental high energy physics applications

    Science.gov (United States)

    Pohl, Martin

    1987-08-01

    I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10 8 instructions all the way down to a few times 10 4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this - yet preserving the necessary portability of the code - I propose a computational model with purely data driven concurrency in Single Program Multiple Data (SPMD) mode. The Task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchroneously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme allows to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus allows to optimize the application with respect to synchronization as well as data transport overheads. I present a coarse top level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).

  3. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  4. Comparative Study on  Paralleled vs. Scaled Dc-dc Converters  in High Voltage Gain Applications

    DEFF Research Database (Denmark)

    Klimczak, Pawel; Munk-Nielsen, Stig

    2008-01-01

    Today power converters are present in many commercial, medical and industrial applications. A lot of them are high power and high current applications. In order to increase power handling capability several transistors or diodes are paralleled often. However such paralleling may lead to converter...

  5. High performance parallel computing of flows in complex geometries: II. Applications

    International Nuclear Information System (INIS)

    Gourdain, N; Gicquel, L; Staffelbach, G; Vermorel, O; Duchaine, F; Boussuge, J-F; Poinsot, T

    2009-01-01

    Present regulations in terms of pollutant emissions, noise and economical constraints, require new approaches and designs in the fields of energy supply and transportation. It is now well established that the next breakthrough will come from a better understanding of unsteady flow effects and by considering the entire system and not only isolated components. However, these aspects are still not well taken into account by the numerical approaches or understood whatever the design stage considered. The main challenge is essentially due to the computational requirements inferred by such complex systems if it is to be simulated by use of supercomputers. This paper shows how new challenges can be addressed by using parallel computing platforms for distinct elements of a more complex systems as encountered in aeronautical applications. Based on numerical simulations performed with modern aerodynamic and reactive flow solvers, this work underlines the interest of high-performance computing for solving flow in complex industrial configurations such as aircrafts, combustion chambers and turbomachines. Performance indicators related to parallel computing efficiency are presented, showing that establishing fair criterions is a difficult task for complex industrial applications. Examples of numerical simulations performed in industrial systems are also described with a particular interest for the computational time and the potential design improvements obtained with high-fidelity and multi-physics computing methods. These simulations use either unsteady Reynolds-averaged Navier-Stokes methods or large eddy simulation and deal with turbulent unsteady flows, such as coupled flow phenomena (thermo-acoustic instabilities, buffet, etc). Some examples of the difficulties with grid generation and data analysis are also presented when dealing with these complex industrial applications.

  6. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of research is the tools to support the development of parallel programs in C/C ++. The methods and software which automates the process of designing parallel applications are proposed.

  7. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity, comes a new problem - the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines

  8. Application of parallel connected power-MOSFET elements to high current d.c. power supply

    International Nuclear Information System (INIS)

    Matsukawa, Tatsuya; Shioyama, Masanori; Shimada, Katsuhiro; Takaku, Taku; Neumeyer, Charles; Tsuji-Iio, Shunji; Shimada, Ryuichi

    2001-01-01

    The low aspect ratio spherical torus (ST), which has single turn toroidal field coil, requires the extremely high d.c. current like as 20 MA to energize the coil. Considering the ratings of such extremely high current and low voltage, power-MOSFET element is employed as the switching device for the a.c./d.c. converter of power supply. One of the advantages of power-MOSFET element is low on-state resistance, which is to meet the high current and low voltage operation. Recently, the capacity of power-MOSFET element has been increased and its on-state resistance has been decreased, so that the possibility of construction of high current and low voltage a.c./d.c. converter with parallel connected power-MOSFET elements has been growing. With the aim of developing the high current d.c. power supply using power-MOSFET, the basic characteristics of parallel operation with power-MOSFET elements are experimentally investigated. And, the synchronous rectifier type and the bi-directional self commutated type a.c./d.c. converters using parallel connected power-MOSFET elements are proposed

  9. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip

  10. Applications of Emerging Parallel Optical Link Technology to High Energy Physics Experiments

    International Nuclear Information System (INIS)

    Chramowicz, J.; Kwan, S.; Prosser, A.; Winchell, M.

    2011-01-01

    Modern particle detectors depend upon optical fiber links to deliver event data to upstream trigger and data processing systems. Future detector systems can benefit from the development of dense arrangements of high speed optical links emerging from the telecommunications and storage area network market segments. These links support data transfers in each direction at rates up to 120 Gbps in packages that minimize or even eliminate edge connector requirements. Emerging products include a class of devices known as optical engines which permit assembly of the optical transceivers in close proximity to the electrical interfaces of ASICs and FPGAs which handle the data in parallel electrical format. Such assemblies will reduce required printed circuit board area and minimize electromagnetic interference and susceptibility. We will present test results of some of these parallel components and report on the development of pluggable FPGA Mezzanine Cards equipped with optical engines to provide to collaborators on the Versatile Link Common Project for the HI-LHC at CERN.

  11. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several example are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices

  12. Parallel point-multiplication architecture using combined group operations for high-speed cryptographic applications.

    Directory of Open Access Journals (Sweden)

    Md Selim Hossain

    Full Text Available In this paper, we propose a novel parallel architecture for fast hardware implementation of elliptic curve point multiplication (ECPM, which is the key operation of an elliptic curve cryptography processor. The point multiplication over binary fields is synthesized on both FPGA and ASIC technology by designing fast elliptic curve group operations in Jacobian projective coordinates. A novel combined point doubling and point addition (PDPA architecture is proposed for group operations to achieve high speed and low hardware requirements for ECPM. It has been implemented over the binary field which is recommended by the National Institute of Standards and Technology (NIST. The proposed ECPM supports two Koblitz and random curves for the key sizes 233 and 163 bits. For group operations, a finite-field arithmetic operation, e.g. multiplication, is designed on a polynomial basis. The delay of a 233-bit point multiplication is only 3.05 and 3.56 μs, in a Xilinx Virtex-7 FPGA, for Koblitz and random curves, respectively, and 0.81 μs in an ASIC 65-nm technology, which are the fastest hardware implementation results reported in the literature to date. In addition, a 163-bit point multiplication is also implemented in FPGA and ASIC for fair comparison which takes around 0.33 and 0.46 μs, respectively. The area-time product of the proposed point multiplication is very low compared to similar designs. The performance ([Formula: see text] and Area × Time × Energy (ATE product of the proposed design are far better than the most significant studies found in the literature.

  13. A high performance data parallel tensor contraction framework: Application to coupled electro-mechanics

    Science.gov (United States)

    Poya, Roman; Gil, Antonio J.; Ortigosa, Rogelio

    2017-07-01

    The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first or breadth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electro-mechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.

  14. Analysis and design of a parallel-connected single active bridge DC-DC converter for high-power wind farm applications

    DEFF Research Database (Denmark)

    Park, Kiwoo; Chen, Zhe

    2013-01-01

    This paper presents a parallel-connected Single Active Bridge (SAB) dc-dc converter for high-power applications. Paralleling lower-power converters can lower the current rating of each modular converter and interleaving the outputs can significantly reduce the magnitudes of input and output curre...

  15. High throughput, parallel scanning probe microscope for nanometrology and nanopatterning applications

    NARCIS (Netherlands)

    Sadeghian Marnani, H.; Paul, P.C.; Herfst, R.W.; Dekker, A.; Winters, J.; Maturova, K.

    2017-01-01

    Scanning Probe microscope (SPM) is an important nanoinstrument for several applications such as bioresearch, metrology, inspection and nanopatterning. Single SPM is associated with relatively slow rate of scanning and low throughput measurement, thus not being suitable for scanning large samples

  16. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  17. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  18. High-speed parallel counter

    International Nuclear Information System (INIS)

    Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.

    1985-01-01

    This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec

  19. Dynamic perception: Some theorems about the possibility of parallel pattern recognition with an application to high energy physics

    International Nuclear Information System (INIS)

    Perrone, A.; Basti, G.

    1994-01-01

    In the context of M. Minsky's and S. Papert's theorems on the impossibility of evaluating simple linear predicates by parallel architectures the authors want to show how these limitations can be avoided by introducing a generalized input-dependent preprocessing technique that does not suppose any a-priori knowledge of input like in classical input filtering procedures. This technique can be formalized in a very general way and can be also deduced by metamathematical arguments. A further development of the same technique can be applied at level of learning procedure to introduce in such a way the complete notion of open-quotes dynamic perceptronclose quotes. From the experimental standpoint, they show two applications of the open-quotes dynamic perceptronclose quotes in particle track recognition in high-energy accelerators. Firstly, they show the amazing improvement of performances that can be obtained in a perceptron architecture with classical learning by adding their open-quotes dynamicclose quotes pre-processing technique, already introduced last year in another paper presented at this Conference. Secondly, they show the results of this technique extended also at the level of learning procedure always applied to the problem of particle track recognition. This work is a part of open-quotes Feniceclose quotes international collaboration supported by INFN (National Institute for Nuclear Physics) devoted to the study of the time-like electromagnetic form factor of neutrons obtained by electron-positron high energy collisions in ADONE (Frascati, Rome) storage ring

  20. Optimization and parallelization of the thermal–hydraulic subchannel code CTF for high-fidelity multi-physics applications

    International Nuclear Information System (INIS)

    Salko, Robert K.; Schmidt, Rodney C.; Avramova, Maria N.

    2015-01-01

    Highlights: • COBRA-TF was adopted by the Consortium for Advanced Simulation of LWRs. • We have improved code performance to support running large-scale LWR simulations. • Code optimization has led to reductions in execution time and memory usage. • An MPI parallelization has reduced full-core simulation time from days to minutes. - Abstract: This paper describes major improvements to the computational infrastructure of the CTF subchannel code so that full-core, pincell-resolved (i.e., one computational subchannel per real bundle flow channel) simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy Consortium for Advanced Simulation of Light Water Reactors (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations—including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices—are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a “single program multiple data” parallelization strategy targeting distributed memory “multiple instruction multiple data” platforms utilizing domain decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard Message-Passing Interface (MPI) calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pressurized water reactor (PWR) pre-processor utility that uses a greatly simplified set of user input compared with the traditional CTF input. To run CTF in

  1. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2018-05-15

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.

  2. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallelprocessing available, including the structure of parallelprocessors and parallel algorithms. The emphasis is on mappingalgorithms to highly parallel computers, with extensive coverage ofarray and multiprocessor architectures. Early chapters provideinsightful coverage on the analysis of parallel algorithms andprogram transformations, effectively integrating a variety ofmaterial previously scattered throughout the literature. Theory andpractice are well balanced across diverse topics in this concisepresentation. For exceptional cla

  3. 8051 microcontroller to FPGA and ADC interface design for high speed parallel processing systems – Application in ultrasound scanners

    Directory of Open Access Journals (Sweden)

    J. Jean Rossario Raj

    2016-09-01

    Full Text Available Microcontrollers perform the hardware control in many instruments. Instruments requiring huge data throughput and parallel computing use FPGA’s for data processing. The microcontroller in turn configures the application hardware devices such as FPGA’s, ADC’s and Ethernet chips etc. The interfacing of these devices uses address/data bus interface, serial interface or serial peripheral interface. The choice of the interface depends upon the input/output pins available with different devices, programming ease and proprietary interfaces supported by devices such as ADC’s. The novelty of this paper is to describe the programming logic used for various types of interface scenarios from microcontroller to different programmable devices. The study presented describes the methods and logic flowcharts for different interfaces. The implementation of the interface logics were in prototype hardware for ultrasound scanner. The internal devices were controlled from the graphical user interface in a laptop and the scan results are taken. It is seen that the optimum solution of the hardware design can be achieved by using a common serial interface towards all the devices.

  4. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  5. Parallel processing in nuclear applications

    International Nuclear Information System (INIS)

    Muniz, Francisco Junqueira

    1995-01-01

    This paper summarizes some investigations on effective and scalable dynamic load-balancing mechanisms suitable for distributed-memory (loosely-coupled) MIMD systems. The selected implementation environment is composed of T800 transputers programed in the occam and C languages and an automatic routing package communication software mechanism (the virtual channel router). Tasks were generated, at execution time, using a multiple-spawning mechanism based on a set of remote procedure calls primitives. The objective is to improve maximum resource utilization. In particular, the investigation described here facilitate portability of the user application, since it concentrates on system-level load balancing mechanisms. The load-balancing mechanisms studies are also suitable for systems that can vary in size, concentrating on methods with potential for scalability. Two possible application examples, chosen from the nuclear area, where distributed-memory MIMD machines can be utilized, are mentioned. (author). 24 refs., 1 fig

  6. High performance parallel I/O

    CERN Document Server

    Prabhat

    2014-01-01

    Gain Critical Insight into the Parallel I/O EcosystemParallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem.The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har

  7. Teaching RLC Parallel Circuits in High-School Physics Class

    Science.gov (United States)

    Simon, Alpár

    2015-01-01

    This paper will try to give an alternative treatment of the subject "parallel RLC circuits" and "resonance in parallel RLC circuits" from the Physics curricula for the XIth grade from Romanian high-schools, with an emphasis on practical type circuits and their possible applications, and intends to be an aid for both Physics…

  8. Applications of Parallel Processing in Mobile Banking

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available The future of mobile banking will be represented by such applications that support mobile, Internet banking and EFT (Electronic Funds Transfer transactions in a single user interface. In such a way, the mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Excluding some limitations, the grid processing offers huge opportunities to exploit the parallelism. For this reason, a lot of applications of waiting queues in grid processing were developed in the last years. Grid networks represent a distinctive and very modern field of the parallel and distributed processing.

  9. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  10. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.

  11. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  12. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  13. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used...

  14. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used...

  15. Collectively loading an application in a parallel computer

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.; Mundy, Michael B.

    2016-01-05

    Collectively loading an application in a parallel computer, the parallel computer comprising a plurality of compute nodes, including: identifying, by a parallel computer control system, a subset of compute nodes in the parallel computer to execute a job; selecting, by the parallel computer control system, one of the subset of compute nodes in the parallel computer as a job leader compute node; retrieving, by the job leader compute node from computer memory, an application for executing the job; and broadcasting, by the job leader to the subset of compute nodes in the parallel computer, the application for executing the job.

  16. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and

  17. Parallelization of an existing high energy physics event reconstruction software package

    International Nuclear Information System (INIS)

    Schiefer, R.; Francis, D.

    1996-01-01

    Software parallelization allows an efficient use of available computing power to increase the performance of applications. In a case study the authors have investigated the parallelization of high energy physics event reconstruction software in terms of costs (effort, computing resource requirements), benefits (performance increase) and the feasibility of a systematic parallelization approach. Guidelines facilitating a parallel implementation are proposed for future software development

  18. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  19. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages from existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques are presented and discussed.

  20. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages from existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques are presented and discussed.

  1. Biomedical applications on the GRID efficient management of parallel jobs

    CERN Document Server

    Moscicki, Jakub T; Lee Hurng Chun; Lin, S C; Pia, Maria Grazia

    2004-01-01

    Distributed computing based on the Master-Worker and PULL interaction model is applicable to a number of applications in high energy physics, medical physics and bio-informatics. We demonstrate a realistic medical physics use-case of a dosimetric system for brachytherapy using distributed Grid resources. We present the efficient techniques for running parallel jobs in a case of the BLAST, a gene sequencing application, as well as for the Monte Carlo simulation based on Geant4. We present a strategy for improving the runtime performance and robustness of the jobs as well as for the minimization of the development time needed to migrate the applications to a distributed environment.

  2. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    and the Microsoft .NET iv framework. Normally, one would not directly think of the .NET framework when talking scientific applications, but Microsoft has in the last couple of versions of .NET introduce a number of tools for writing parallel and high performance code. The first section examines how programmers can...

  3. Parallel Computing:. Some Activities in High Energy Physics

    Science.gov (United States)

    Willers, Ian

    This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.

  4. Application of parallel processing for automatic inspection of printed circuits

    International Nuclear Information System (INIS)

    Lougheed, R.M.

    1986-01-01

    Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given

  5. High performance parallel backprojection on FPGA

    Energy Technology Data Exchange (ETDEWEB)

    Pfanner, Florian; Knaup, Michael; Kachelriess, Marc [Erlangen-Nuernberg Univ., Erlangen (Germany). Inst. of Medical Physics (IMP)

    2011-07-01

    Reconstruction of tomographic images, i.e., images from a Computed Tomography scanner, is a very time consuming issue. The most calculation power is needed for the backprojection step. A closer inspection shows that the algorithm for backprojection is easy to parallelize. FPGAs are able to execute many operations in the same time, so a highly parallel algorithm is a requirement for a powerful acceleration. For data flow rate maximization, we realized the backprojection in a pipelined structure with data throughput of one clock cycle. Due the hardware limitations of the FPGA, it is not possible to reconstruct the image as a whole. So it is necessary to split up the image and reconstruct these parts separately. Despite that, a reconstruction of 512 projections into a 5122 image is calculated within 13 ms on a Virtex 5 FPGA. To save hardware resources we use fixed point arithmetic with an accuracy of 23 bit for calculation. A comparison of the result image and an image, calculated with floating point arithmetic on CPU, shows that there are no differences between these images. (orig.)

  6. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected in Ethernet. Data exchanges between tasks located in each processing element are realized by two ways. One is socket which is standard library on recent UNIX operating systems. Another is a network connecting software, named as Parallel Virtual Machine (PVM) which is a free software developed by ORNL, to use many workstations connected to network as a parallel computer. This paper discusses the availability of parallel computing using network and UNIX workstations and comparison between specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation which generally shows high parallelization ratio. (author)

  7. Parallel science and engineering applications the Charm++ approach

    CERN Document Server

    Kale, Laxmikant V

    2016-01-01

    Developed in the context of science and engineering applications, with each abstraction motivated by and further honed by specific application needs, Charm++ is a production-quality system that runs on almost all parallel computers available. Parallel Science and Engineering Applications: The Charm++ Approach surveys a diverse and scalable collection of science and engineering applications, most of which are used regularly on supercomputers by scientists to further their research. After a brief introduction to Charm++, the book presents several parallel CSE codes written in the Charm++ model, along with their underlying scientific and numerical formulations, explaining their parallelization strategies and parallel performance. These chapters demonstrate the versatility of Charm++ and its utility for a wide variety of applications, including molecular dynamics, cosmology, quantum chemistry, fracture simulations, agent-based simulations, and weather modeling. The book is intended for a wide audience of people i...

  8. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  9. New high voltage parallel plate analyzer

    International Nuclear Information System (INIS)

    Hamada, Y.; Kawasumi, Y.; Masai, K.; Iguchi, H.; Fujisawa, A.; Abe, Y.

    1992-01-01

    A new modification on the parallel plate analyzer for 500 keV heavy ions to eliminate the effect of the intense UV and visible radiations, is successfully conducted. Its principle and results are discussed. (author)

  10. Parallel Microcracks-based Ultrasensitive and Highly Stretchable Strain Sensors.

    Science.gov (United States)

    Amjadi, Morteza; Turan, Mehmet; Clementson, Cameron P; Sitti, Metin

    2016-03-02

    There is an increasing demand for flexible, skin-attachable, and wearable strain sensors due to their various potential applications. However, achieving strain sensors with both high sensitivity and high stretchability is still a grand challenge. Here, we propose highly sensitive and stretchable strain sensors based on the reversible microcrack formation in composite thin films. Controllable parallel microcracks are generated in graphite thin films coated on elastomer films. Sensors made of graphite thin films with short microcracks possess high gauge factors (maximum value of 522.6) and stretchability (ε ≥ 50%), whereas sensors with long microcracks show ultrahigh sensitivity (maximum value of 11,344) with limited stretchability (ε ≤ 50%). We demonstrate the high performance strain sensing of our sensors in both small and large strain sensing applications such as human physiological activity recognition, human body large motion capturing, vibration detection, pressure sensing, and soft robotics.

  11. Running parallel applications with topology-aware grid middleware

    NARCIS (Netherlands)

    Bar, P.; Coti, C.; Groen, D.; Herault, T.; Kravtsov, V.; Schuster, A; Swain, M.

    2009-01-01

    The concept of topology-aware grid applications is derived from parallelized computational models of complex systems that are executed on heterogeneous resources, either because they require specialized hardware for certain calculations, or because their parallelization is flexible enough to exploit

  12. Emerging Nanophotonic Applications Explored with Advanced Scientific Parallel Computing

    Science.gov (United States)

    Meng, Xiang

    The domain of nanoscale optical science and technology is a combination of the classical world of electromagnetics and the quantum mechanical regime of atoms and molecules. Recent advancements in fabrication technology allows the optical structures to be scaled down to nanoscale size or even to the atomic level, which are far smaller than the wavelength they are designed for. These nanostructures can have unique, controllable, and tunable optical properties and their interactions with quantum materials can have important near-field and far-field optical response. Undoubtedly, these optical properties can have many important applications, ranging from the efficient and tunable light sources, detectors, filters, modulators, high-speed all-optical switches; to the next-generation classical and quantum computation, and biophotonic medical sensors. This emerging research of nanoscience, known as nanophotonics, is a highly interdisciplinary field requiring expertise in materials science, physics, electrical engineering, and scientific computing, modeling and simulation. It has also become an important research field for investigating the science and engineering of light-matter interactions that take place on wavelength and subwavelength scales where the nature of the nanostructured matter controls the interactions. In addition, the fast advancements in the computing capabilities, such as parallel computing, also become as a critical element for investigating advanced nanophotonic devices. This role has taken on even greater urgency with the scale-down of device dimensions, and the design for these devices require extensive memory and extremely long core hours. Thus distributed computing platforms associated with parallel computing are required for faster designs processes. Scientific parallel computing constructs mathematical models and quantitative analysis techniques, and uses the computing machines to analyze and solve otherwise intractable scientific challenges. In

  13. Application of parallel preprocessors in data acquisition

    International Nuclear Information System (INIS)

    Butler, H.S.; Cooper, M.D.; Williams, R.A.; Hughes, E.B.; Rolfe, J.R.; Wilson, S.L.; Zeman, H.D.

    1981-01-01

    A data-acquisition system is being developed for a large-scale experiment at LAMPF. It will make use of four microprocessors running in parallel to acquire and preprocess data from 432 photomultiplier tubes (PMT) attached to 396 NaI crystals. The microprocessors are LSI-11/23s operating through CAMAC Auxiliary Crate Controllers (ACC). Data acquired by the microprocessors will be collected through a programmable Branch Driver (MBD) which also will read data from 52 scintillators (88 PMTs) and 728 wires comprising a drift chamber. The MBD will transfer data from each event into a PDP-11/44 for further processing and taping. The microprocessors will perform the secondary function of monitoring the calibration of the NaI PMTs. A special trigger circuit allows the system to stack data from a second event while the first is still being processed. Major components of the system were tested in April 1981. Timing measurements from this test are reported

  14. GPU-based Parallel Application Design for Emerging Mobile Devices

    Science.gov (United States)

    Gupta, Kshitij

    A revolution is underway in the computing world that is causing a fundamental paradigm shift in device capabilities and form-factor, with a move from well-established legacy desktop/laptop computers to mobile devices in varying sizes and shapes. Amongst all the tasks these devices must support, graphics has emerged as the 'killer app' for providing a fluid user interface and high-fidelity game rendering, effectively making the graphics processor (GPU) one of the key components in (present and future) mobile systems. By utilizing the GPU as a general-purpose parallel processor, this dissertation explores the GPU computing design space from an applications standpoint, in the mobile context, by focusing on key challenges presented by these devices---limited compute, memory bandwidth, and stringent power consumption requirements---while improving the overall application efficiency of the increasingly important speech recognition workload for mobile user interaction. We broadly partition trends in GPU computing into four major categories. We analyze hardware and programming model limitations in current-generation GPUs and detail an alternate programming style called Persistent Threads, identify four use case patterns, and propose minimal modifications that would be required for extending native support. We show how by manually extracting data locality and altering the speech recognition pipeline, we are able to achieve significant savings in memory bandwidth while simultaneously reducing the compute burden on GPU-like parallel processors. As we foresee GPU computing to evolve from its current 'co-processor' model into an independent 'applications processor' that is capable of executing complex work independently, we create an alternate application framework that enables the GPU to handle all control-flow dependencies autonomously at run-time while minimizing host involvement to just issuing commands, that facilitates an efficient application implementation. Finally, as

  15. Scalability of Parallel Scientific Applications on the Cloud

    Directory of Open Access Journals (Sweden)

    Satish Narayana Srirama

    2011-01-01

    Full Text Available Cloud computing, with its promise of virtually infinite resources, seems to suit well in solving resource greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications like matrix–vector operations and NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonable on the cloud. We could also observe the limitations of the cloud and its comparison with cluster in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits well for embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.

  16. A development framework for parallel CFD applications: TRIOU project

    International Nuclear Information System (INIS)

    Calvin, Ch.

    2003-01-01

    We present in this paper the parallel structure of a thermal-hydraulic framework: Trio-U. This development platform has been designed in order to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes over standard I/O streams of C++ have been defined, which allows the developer an easy use of the different modules of the application without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines, guarantees the portability of the application and allows an efficient tuning of basic communication methods in order to achieve the best performances of the target architecture. The speed-up of parallel applications designed using the Trio U framework are very good since we obtained, for instance, on complex turbulent flow Large Eddy Simulation (LES) simulations an efficiency of up to 90% on 20 processors. The efficiencies obtained on direct numerical simulations of two phase flow fluids are similar since the speed-up is nearly equals to 7.5 for a 3-dimensional simulation using a one million element mesh on 8 processors. The purpose of this paper is to focus on the main concepts and their implementation that were the guidelines of the design of the parallel architecture of the code. (author)

  17. High Performance Parallel Multigrid Algorithms for Unstructured Grids

    Science.gov (United States)

    Frederickson, Paul O.

    1996-01-01

    We describe a high performance parallel multigrid algorithm for a rather general class of unstructured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse developed by Baumgardner and Frederickson.

  18. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  19. Parallel and distributed processing: applications to power systems

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    1994-12-31

    Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state-of-the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs.,1 tab.

  20. Highly parallel line-based image coding for many cores.

    Science.gov (United States)

    Peng, Xiulian; Xu, Jizheng; Zhou, You; Wu, Feng

    2012-01-01

    Computers are developing along with a new trend from the dual-core and quad-core processors to ones with tens or even hundreds of cores. Multimedia, as one of the most important applications in computers, has an urgent need to design parallel coding algorithms for compression. Taking intraframe/image coding as a start point, this paper proposes a pure line-by-line coding scheme (LBLC) to meet the need. In LBLC, an input image is processed line by line sequentially, and each line is divided into small fixed-length segments. The compression of all segments from prediction to entropy coding is completely independent and concurrent at many cores. Results on a general-purpose computer show that our scheme can get a 13.9 times speedup with 15 cores at the encoder and a 10.3 times speedup at the decoder. Ideally, such near-linear speeding relation with the number of cores can be kept for more than 100 cores. In addition to the high parallelism, the proposed scheme can perform comparatively or even better than the H.264 high profile above middle bit rates. At near-lossless coding, it outperforms H.264 more than 10 dB. At lossless coding, up to 14% bit-rate reduction is observed compared with H.264 lossless coding at the high 4:4:4 profile.

  1. Parallelization of applications for networks with homogeneous and heterogeneous processors

    International Nuclear Information System (INIS)

    Colombet, L.

    1994-01-01

    The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs

  2. A highly parallel algorithm for track finding

    International Nuclear Information System (INIS)

    Dell'Orso, M.

    1990-01-01

    We describe a very fast algorithm for track finding, which is applicable to a whole class of detectors like drift chambers, silicon microstrip detectors, etc. The algorithm uses a pattern bank stored in a large memory and organized into a tree structure. (orig.)

  3. ADL: a graphical design language for real time parallel applications

    NARCIS (Netherlands)

    M.R. van Steen; T. Vogel; A. ten Dam

    1993-01-01

    textabstractDesigning parallel applications is generally experienced as a tedious and difficult task, especially when hard real-time performance requirements have to be met. This paper discusses on-going work concerning the construction of a Design Entry System which supports the design phase of

  4. A new bipolar RRAM selector based on anti-parallel connected diodes for crossbar applications

    International Nuclear Information System (INIS)

    Li, Yingtao; Gong, Qingchun; Li, Rongrong; Jiang, Xinyu

    2014-01-01

    Crossbar arrays are the most promising application of a resistive random access memory (RRAM) device for achieving high density memory. However, cross-talk interference in the crossbar array limits the increase in the integration density. In this paper, the combination of two anti-parallel connected diodes and a bipolar RRAM cell is proposed to suppress the sneak current in a crossbar array with anti-parallel connected diodes as the selector for the bipolar RRAM. By using the anti-parallel connected diodes as a selector, the sneak current can be effectively suppressed and the high density crossbar array of more than 1 Mb can be realized as estimated by the 1/2V read voltage scheme. These results indicate that anti-parallel connected diodes can be used as a bipolar selector and have great potential for high density bipolar RRAM crossbar array applications. (papers)

  5. Overview of Parallel Platforms for Common High Performance Computing

    Directory of Open Access Journals (Sweden)

    T. Fryza

    2012-04-01

    Full Text Available The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well.

  6. Development of Industrial High-Speed Transfer Parallel Robot

    International Nuclear Information System (INIS)

    Kim, Byung In; Kyung, Jin Ho; Do, Hyun Min; Jo, Sang Hyun

    2013-01-01

    Parallel robots used in industry require high stiffness or high speed because of their structural characteristics. Nowadays, the importance of rapid transportation has increased in the distribution industry. In this light, an industrial parallel robot has been developed for high-speed transfer. The developed parallel robot can handle a maximum payload of 3 kg. For a payload of 0.1 kg, the trajectory cycle time is 0.3 s (come and go), and the maximum velocity is 4.5 m/s (pick amp, place work, adept cycle). In this motion, its maximum acceleration is very high and reaches approximately 13g. In this paper, the design, analysis, and performance test results of the developed parallel robot system are introduced

  7. Implementation of a high performance parallel finite element micromagnetics package

    International Nuclear Information System (INIS)

    Scholz, W.; Suess, D.; Dittrich, R.; Schrefl, T.; Tsiantos, V.; Forster, H.; Fidler, J.

    2004-01-01

    A new high performance scalable parallel finite element micromagnetics package has been implemented. It includes solvers for static energy minimization, time integration of the Landau-Lifshitz-Gilbert equation, and the nudged elastic band method

  8. Parallel operation of voltage-source converters: issues and applications

    Energy Technology Data Exchange (ETDEWEB)

    Almeida, F.C.B.; Silva, D.S. [Federal University of Juiz de Fora (UFJF), MG (Brazil)], Emails: felipe.brum@engenharia.ufjf.br, salomaoime@yahoo.com.br; Ribeiro, P.F. [Calvin College, Grand Rapids, MI (United States); Federal University of Juiz de Fora (UFJF), MG (Brazil)], E-mail: pfribeiro@ieee.org

    2009-07-01

    Technological advancements in power electronics have prompted the development of advanced AC/DC conversion systems with high efficiency and flexible performance. Among these devices, the Voltage-Source Converter (VSC) has become an essential building block. This paper considers the parallel operation of VSCs under different system conditions and how they can assist the operation of highly complex power networks. A multi-terminal VSC-based High Voltage Direct Current (M-VSC-HVDC) system is chosen to be modeled, simulated and then analyzed as an example of VSCs operating in parallel. (author)

  9. Designing a High Performance Parallel Personal Cluster

    OpenAIRE

    Kapanova, K. G.; Sellier, J. M.

    2016-01-01

    Today, many scientific and engineering areas require high performance computing to perform computationally intensive experiments. For example, many advances in transport phenomena, thermodynamics, material properties, computational chemistry and physics are possible only because of the availability of such large scale computing infrastructures. Yet many challenges are still open. The cost of energy consumption, cooling, competition for resources have been some of the reasons why the scientifi...

  10. High-energy physics software parallelization using database techniques

    International Nuclear Information System (INIS)

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  11. Evaluation of technologies of parallel computers. Communication networks for a real-time triggering application for a high-energy physics experiment at CERN

    International Nuclear Information System (INIS)

    Hoertnagl, Ch.

    1997-12-01

    Experiments at the future Large Hadron Collider (LHC) at CERN will be faced with an extraordinary challenge of event selection in real time. The primary event rate, equal to the bunch crossing frequency of 40 MHz, will have to be reduced by a factor of almost one-in-a-million in order to reveal traces of rare physics processes from an abundant background. This work presents various contributions to ongoing feasibility studies concerning the possible use of commercial technologies from the proximities of parallel computers and their communication networks for the second trigger stage, which faces an average data input rate of 100 kHz. Studies in this thesis apply a combination of methodologies, namely the build-up of lab-scale prototype implementations (including their exposition to test beam runs), algorithm development, technology tracking and benchmarking, as well as discrete event simulation. The main contribution consists of several technology case studies, which are based on the exploration of a set of standard benchmark programs for revealing simple parameters for characterizing delays during communication. Studied technologies include the communication sub-system of the Meiko CS-2, Asynchronous Transfer Mode (ATM), MEMORY CHANNEL, and Scalable Coherent Interface (SCI); all could be considered typical for candidate technologies. The discussion sheds light on the relative benefits and costs associated with different parallel programming models, in general, and with the use of message-passing libraries, such as Message Passing Interface (MPI), in particular. Best observed end-user-to-end-user latencies were ∼ 10 μs, best asymptotic bandwidths were ∼ 70 MByte/s. Typical sub-patterns of communication that have to be applied in the second trigger stage were sustained at ∼ 13 kHz, using today's technologies in realistic embeddings. (author)

  12. A highly scalable massively parallel fast marching method for the Eikonal equation

    Science.gov (United States)

    Yang, Jianming; Stern, Frederick

    2017-03-01

    The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It is long deemed inherently sequential and an efficient parallel algorithm applicable to large-scale practical applications is not available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.

  13. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  14. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  15. Parallel computing in genomic research: advances and applications

    Directory of Open Access Journals (Sweden)

    Ocaña K

    2015-11-01

    Full Text Available Kary Ocaña,1 Daniel de Oliveira2 1National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro, 2Institute of Computing, Fluminense Federal University, Niterói, Brazil Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing

  16. A parallel solution for high resolution histological image analysis.

    Science.gov (United States)

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several Gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massive parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The parallel framework proposed is flexible, high performance solution and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  17. Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI

    International Nuclear Information System (INIS)

    Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun

    2000-01-01

    The real-time visualization system, PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses image data compression technique for efficient communication between the server and the client terminal. Therefore, the system realizes high performance concurrent visualization in an internet computing environment. The experience in applying PATRAS to WSPEEDI (Worldwide version of System for Prediction Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand behaviours of radioactive tracers from different release points easily and quickly. (author)

  18. A Generic Mesh Data Structure with Parallel Applications

    Science.gov (United States)

    Cochran, William Kenneth, Jr.

    2009-01-01

    High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…

  19. High Efficiency EBCOT with Parallel Coding Architecture for JPEG2000

    Directory of Open Access Journals (Sweden)

    Chiang Jen-Shiun

    2006-01-01

    Full Text Available This work presents a parallel context-modeling coding architecture and a matching arithmetic coder (MQ-coder for the embedded block coding (EBCOT unit of the JPEG2000 encoder. Tier-1 of the EBCOT consumes most of the computation time in a JPEG2000 encoding system. The proposed parallel architecture can increase the throughput rate of the context modeling. To match the high throughput rate of the parallel context-modeling architecture, an efficient pipelined architecture for context-based adaptive arithmetic encoder is proposed. This encoder of JPEG2000 can work at 180 MHz to encode one symbol each cycle. Compared with the previous context-modeling architectures, our parallel architectures can improve the throughput rate up to 25%.

  20. High temporal resolution functional MRI using parallel echo volumar imaging

    International Nuclear Information System (INIS)

    Rabrait, C.; Ciuciu, P.; Ribes, A.; Poupon, C.; Dehaine-Lambertz, G.; LeBihan, D.; Lethimonnier, F.; Le Roux, P.; Dehaine-Lambertz, G.

    2008-01-01

    Purpose: To combine parallel imaging with 3D single-shot acquisition (echo volumar imaging, EVI) in order to acquire high temporal resolution volumar functional MRI (fMRI) data. Materials and Methods: An improved EVI sequence was associated with parallel acquisition and field of view reduction in order to acquire a large brain volume in 200 msec. Temporal stability and functional sensitivity were increased through optimization of all imaging parameters and Tikhonov regularization of parallel reconstruction. Two human volunteers were scanned with parallel EVI in a 1.5 T whole-body MR system, while submitted to a slow event-related auditory paradigm. Results: Thanks to parallel acquisition, the EVI volumes display a low level of geometric distortions and signal losses. After removal of low-frequency drifts and physiological artifacts,activations were detected in the temporal lobes of both volunteers and voxel-wise hemodynamic response functions (HRF) could be computed. On these HRF different habituation behaviors in response to sentence repetition could be identified. Conclusion: This work demonstrates the feasibility of high temporal resolution 3D fMRI with parallel EVI. Combined with advanced estimation tools,this acquisition method should prove useful to measure neural activity timing differences or study the nonlinearities and non-stationarities of the BOLD response. (authors)

  1. Final Report: Migration Mechanisms for Large-scale Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Jason Nieh

    2009-10-30

    Process migration is the ability to transfer a process from one machine to another. It is a useful facility in distributed computing environments, especially as computing devices become more pervasive and Internet access becomes more ubiquitous. The potential benefits of process migration, among others, are fault resilience by migrating processes off of faulty hosts, data access locality by migrating processes closer to the data, better system response time by migrating processes closer to users, dynamic load balancing by migrating processes to less loaded hosts, and improved service availability and administration by migrating processes before host maintenance so that applications can continue to run with minimal downtime. Although process migration provides substantial potential benefits and many approaches have been considered, achieving transparent process migration functionality has been difficult in practice. To address this problem, our work has designed, implemented, and evaluated new and powerful transparent process checkpoint-restart and migration mechanisms for desktop, server, and parallel applications that operate across heterogeneous cluster and mobile computing environments. A key aspect of this work has been to introduce lightweight operating system virtualization to provide processes with private, virtual namespaces that decouple and isolate processes from dependencies on the host operating system instance. This decoupling enables processes to be transparently checkpointed and migrated without modifying, recompiling, or relinking applications or the operating system. Building on this lightweight operating system virtualization approach, we have developed novel technologies that enable (1) coordinated, consistent checkpoint-restart and migration of multiple processes, (2) fast checkpointing of process and file system state to enable restart of multiple parallel execution environments and time travel, (3) process migration across heterogeneous

  2. High spatial resolution CT image reconstruction using parallel computing

    International Nuclear Information System (INIS)

    Yin Yin; Liu Li; Sun Gongxing

    2003-01-01

    Using the PC cluster system with 16 dual CPU nodes, we accelerate the FBP and OR-OSEM reconstruction of high spatial resolution image (2048 x 2048). Based on the number of projections, we rewrite the reconstruction algorithms into parallel format and dispatch the tasks to each CPU. By parallel computing, the speedup factor is roughly equal to the number of CPUs, which can be up to about 25 times when 25 CPUs used. This technique is very suitable for real-time high spatial resolution CT image reconstruction. (authors)

  3. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  4. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  5. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  6. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map control process to a certain processor was applied. The efficiency of parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer to the conventional computers in the field of shielding analysis by Monte Carlo method. (orig.)

  7. Parallel computing for event reconstruction in high-energy physics

    International Nuclear Information System (INIS)

    Wolbers, S.

    1993-01-01

    Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review of the parallel programming package CPS (Cooperative Processes Software) developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment is given. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for management, control and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given

  8. Parallel computing in genomic research: advances and applications.

    Science.gov (United States)

    Ocaña, Kary; de Oliveira, Daniel

    2015-01-01

    Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC) environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities.

  9. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  10. Control of parallel-connected bidirectional AC-DC converters in stationary frame for microgrid application

    DEFF Research Database (Denmark)

    Lu, Xiaonan; Guerrero, Josep M.; Teodorescu, Remus

    2011-01-01

    With the penetration of renewable energy in modern power system, microgrid has become a popular application worldwide. In this paper, parallel-connected bidirectional converters for AC and DC hybrid microgrid application are proposed as an efficient interface. To reach the goal of bidirectional...... power conversion, both rectifier and inverter modes are analyzed. In order to achieve high performance operation, hierarchical control system is accomplished. The control system is designed in stationary frame, with harmonic compensation in parallel and no coupled terms between axes. In this control...

  11. Analysis and Design of High-Order Parallel Resonant Converters

    Science.gov (United States)

    Batarseh, Issa Eid

    1990-01-01

    In this thesis, a special state variable transformation technique has been derived for the analysis of high order dc-to-dc resonant converters. Converters comprised of high order resonant tanks have the advantage of utilizing the parasitic elements by making them part of the resonant tank. A new set of state variables is defined in order to make use of two-dimensional state-plane diagrams in the analysis of high order converters. Such a method has been successfully used for the analysis of the conventional Parallel Resonant Converters (PRC). Consequently, two -dimensional state-plane diagrams are used to analyze the steady state response for third and fourth order PRC's when these converters are operated in the continuous conduction mode. Based on this analysis, a set of control characteristic curves for the LCC-, LLC- and LLCC-type PRC are presented from which various converter design parameters are obtained. Various design curves for component value selections and device ratings are given. This analysis of high order resonant converters shows that the addition of the reactive components to the resonant tank results in converters with better performance characteristics when compared with the conventional second order PRC. Complete design procedure along with design examples for 2nd, 3rd and 4th order converters are presented. Practical power supply units, normally used for computer applications, were built and tested by using the LCC-, LLC- and LLCC-type commutation schemes. In addition, computer simulation results are presented for these converters in order to verify the theoretical results.

  12. Language interoperability for high-performance parallel scientific components

    International Nuclear Information System (INIS)

    Elliot, N; Kohn, S; Smolinski, B

    1999-01-01

    With the increasing complexity and interdisciplinary nature of scientific applications, code reuse is becoming increasingly important in scientific computing. One method for facilitating code reuse is the use of components technologies, which have been used widely in industry. However, components have only recently worked their way into scientific computing. Language interoperability is an important underlying technology for these component architectures. In this paper, we present an approach to language interoperability for a high-performance parallel, component architecture being developed by the Common Component Architecture (CCA) group. Our approach is based on Interface Definition Language (IDL) techniques. We have developed a Scientific Interface Definition Language (SIDL), as well as bindings to C and Fortran. We have also developed a SIDL compiler and run-time library support for reference counting, reflection, object management, and exception handling (Babel). Results from using Babel to call a standard numerical solver library (written in C) from C and Fortran show that the cost of using Babel is minimal, where as the savings in development time and the benefits of object-oriented development support for C and Fortran far outweigh the costs

  13. Remote parallel rendering for high-resolution tiled display walls

    KAUST Repository

    Nachbaur, Daniel

    2014-11-01

    © 2014 IEEE. We present a complete, robust and simple to use hardware and software stack delivering remote parallel rendering of complex geometrical and volumetric models to high resolution tiled display walls in a production environment. We describe the setup and configuration, present preliminary benchmarks showing interactive framerates, and describe our contributions for a seamless integration of all the software components.

  14. Remote parallel rendering for high-resolution tiled display walls

    KAUST Repository

    Nachbaur, Daniel; Dumusc, Raphael; Bilgili, Ahmet; Hernando, Juan; Eilemann, Stefan

    2014-01-01

    © 2014 IEEE. We present a complete, robust and simple to use hardware and software stack delivering remote parallel rendering of complex geometrical and volumetric models to high resolution tiled display walls in a production environment. We describe the setup and configuration, present preliminary benchmarks showing interactive framerates, and describe our contributions for a seamless integration of all the software components.

  15. High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures

    Directory of Open Access Journals (Sweden)

    H. Y. Su

    2012-04-01

    Full Text Available This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and be implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on massively parallel architecture GPU. Both of them exploited rich data-level parallelism. Experiments results show that compared with the CPU version, more than 70 times of speedup can be obtained for STORM and over 50 times for GPU. The implementation of encoder on STORM can make a real-time processing for 1080p @30fps and GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms.

  16. Parallel application of plasma equilibrium fitting based on inhomogeneous platforms

    International Nuclear Information System (INIS)

    Liao Min; Zhang Jinhua; Chen Liaoyuan; Li Yongge; Pan Wei; Pan Li

    2008-01-01

    An online analysis and online display platform EFIT, which is based on the equilibrium-fitting mode, is inducted in this paper. This application can realize large data transportation between inhomogeneous platforms by designing a communication mechanism using sockets. It spends approximately one minute to complete the equilibrium fitting reconstruction by using a finite state machine to describe the management node and several node computers of cluster system to fulfill the parallel computation, this satisfies the online display during the discharge interval. An effective communication model between inhomogeneous platforms is provided, which could transport the computing results from Linux platform to Windows platform for online analysis and display. (authors)

  17. Integrated computer network high-speed parallel interface

    International Nuclear Information System (INIS)

    Frank, R.B.

    1979-03-01

    As the number and variety of computers within Los Alamos Scientific Laboratory's Central Computer Facility grows, the need for a standard, high-speed intercomputer interface has become more apparent. This report details the development of a High-Speed Parallel Interface from conceptual through implementation stages to meet current and future needs for large-scle network computing within the Integrated Computer Network. 4 figures

  18. Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

    Directory of Open Access Journals (Sweden)

    Johan Parent

    2004-01-01

    Full Text Available We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.

  19. GROMACS 4.5: A high-throughput and highly parallel open source molecular simulation toolkit

    Energy Technology Data Exchange (ETDEWEB)

    Pronk, Sander [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Pall, Szilard [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Schulz, Roland [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Larsson, Per [Univ. of Virginia, Charlottesville, VA (United States); Bjelkmar, Par [Science for Life Lab., Stockholm (Sweden); Stockholm Univ., Stockholm (Sweden); Apostolov, Rossen [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Shirts, Michael R. [Univ. of Virginia, Charlottesville, VA (United States); Smith, Jeremy C. [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Kasson, Peter M. [Univ. of Virginia, Charlottesville, VA (United States); van der Spoel, David [Science for Life Lab., Stockholm (Sweden); Uppsala Univ., Uppsala (Sweden); Hess, Berk [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Lindahl, Erik [Science for Life Lab., Stockholm (Sweden); KTH Royal Institute of Technology, Stockholm (Sweden); Stockholm Univ., Stockholm (Sweden)

    2013-02-13

    In this study, molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. As a result, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.

  20. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit.

    Science.gov (United States)

    Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R; Smith, Jeremy C; Kasson, Peter M; van der Spoel, David; Hess, Berk; Lindahl, Erik

    2013-04-01

    Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. GROMACS is an open source and free software available from http://www.gromacs.org. Supplementary data are available at Bioinformatics online.

  1. 7th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Nagel, Wolfgang; Resch, Michael

    2014-01-01

    Current advances in High Performance Computing (HPC) increasingly impact efficient software development workflows. Programmers for HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 7th International Parallel Tools Workshop, held in Dresden, Germany, September 3-4, 2013.  

  2. Devices Based on Parallel-Plate Waveguides for Terahertz Applications

    Science.gov (United States)

    Reichel, Kimberly S.

    The promise of terahertz (THz) frequencies for technological applications is wide, spanning from wireless communications for faster downloads to non-destructive imaging for security screening. Although the potential is high, there is a lack of the basic devices necessary to make these prospects a reality. One essential component for any electromagnetic wave technology is a waveguide, which as the name implies can guide light waves, like a hose would direct water from the source to the desired target location. Several waveguide types have been introduced for THz frequencies, one of the most promising of which is the parallel-plate waveguide (PPWG). The PPWG is attractive based on its superior waveguiding performance of efficient input coupling and low losses, but additionally it serves as an excellent platform for other purposes. The projects presented in this dissertation highlight a few new functionalities incorporated into, and enabled by, a PPWG for sensing, filtering, and splitting. First, we characterize a high quality factor resonator integrated into a PPWG used for microfluidic sensing. Typically, the characterization of the frequency-dependent electric field profile inside a narrowband resonator is challenging, either due to limited optical access or to the perturbative effects of invasive probes. In our situation however, the geometry of the PPWG allows for direct access to the resonant cavity via the open sides of the waveguide and a novel implementation of the air-biased coherent detection (ABCD) method permits non-invasive probing. Through both experiment and simulation, we see the narrowband frequencies trapped in the resonator and also discover an unexpected broadband asymmetric field distribution due to the resonator inside the waveguide, yielding new information that is not available in the far field. Second, we investigate a narrowband tunable filter based on extraordinary optical transmission (EOT) through a 1D array of subwavelength holes inside

  3. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.

  4. Vortex structure behind highly heated two cylinders in parallel arrangements

    International Nuclear Information System (INIS)

    Kurita, Eiichirou; Yahagi, Yuji

    2008-01-01

    Vortex structures behind twin, highly heated cylinders in parallel arrangements have been investigated experimentally. The experiments were conducted under the following conditions: cylinder diameter, D=4 mm; mean flow velocity, U ∞ =1.0 m/s; Reynolds number, Re=250; cylinder clearance, S/D=0.5 - 1.4; and cylinder heat flux, q=0 - 72.6 kW/m 2 . For S/D > 1.2, the Karman vortex street is formed alternately behind each cylinder divided on the slit flow. The slit flow velocity increases with a decrease in S/D and decreases with increasing heat flux. For S/D 2 ). As a result, the increased local kinematic viscosity and S/D play a key role for the vortex structure and formation behind arrangements of two parallel cylinders. (author)

  5. Particle simulation on a distributed memory highly parallel processor

    International Nuclear Information System (INIS)

    Sato, Hiroyuki; Ikesaka, Morio

    1990-01-01

    This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II which is under development is estimated to be about 45 times faster than CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)

  6. Parallel Application Development Using Architecture View Driven Model Transformations

    NARCIS (Netherlands)

    Arkin, E.; Tekinerdogan, B.

    2015-01-01

    o realize the increased need for computing performance the current trend is towards applying parallel computing in which the tasks are run in parallel on multiple nodes. On its turn we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity

  7. Direct drive digital servo press with high parallel control

    Science.gov (United States)

    Murata, Chikara; Yabe, Jun; Endou, Junichi; Hasegawa, Kiyoshi

    2013-12-01

    Direct drive digital servo press has been developed as the university-industry joint research and development since 1998. On the basis of this result, 4-axes direct drive digital servo press has been developed and in the market on April of 2002. This servo press is composed of 1 slide supported by 4 ball screws and each axis has linearscale measuring the position of each axis with high accuracy less than μm order level. Each axis is controlled independently by servo motor and feedback system. This system can keep high level parallelism and high accuracy even with high eccentric load. Furthermore the 'full stroke full power' is obtained by using ball screws. Using these features, new various types of press forming and stamping have been obtained by development and production. The new stamping and forming methods are introduced and 'manufacturing' need strategy of press forming with high added value and also the future direction of press forming are also introduced.

  8. High Performance Fortran for Aerospace Applications

    National Research Council Canada - National Science Library

    Mehrotra, Piyush

    2000-01-01

    .... HPF is a set of Fortran extensions designed to provide users with a high-level interface for programming data parallel scientific applications while delegating to the compiler/runtime system the task...

  9. 9th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Hilbrich, Tobias; Niethammer, Christoph; Gracia, José; Nagel, Wolfgang; Resch, Michael

    2016-01-01

    High Performance Computing (HPC) remains a driver that offers huge potentials and benefits for science and society. However, a profound understanding of the computational matters and specialized software is needed to arrive at effective and efficient simulations. Dedicated software tools are important parts of the HPC software landscape, and support application developers. Even though a tool is by definition not a part of an application, but rather a supplemental piece of software, it can make a fundamental difference during the development of an application. Such tools aid application developers in the context of debugging, performance analysis, and code optimization, and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 9th International Parallel Tools Workshop held in Dresden, Germany, September 2-3, 2015, which offered an established forum for discussing the latest advances in paral...

  10. 8th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2015-01-01

    Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – which is a forum to discuss the latest advancements in the parallel tools.

  11. Aggregating job exit statuses of a plurality of compute nodes executing a parallel application

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Mundy, Michael B.

    2015-07-21

    Aggregating job exit statuses of a plurality of compute nodes executing a parallel application, including: identifying a subset of compute nodes in the parallel computer to execute the parallel application; selecting one compute node in the subset of compute nodes in the parallel computer as a job leader compute node; initiating execution of the parallel application on the subset of compute nodes; receiving an exit status from each compute node in the subset of compute nodes, where the exit status for each compute node includes information describing execution of some portion of the parallel application by the compute node; aggregating each exit status from each compute node in the subset of compute nodes; and sending an aggregated exit status for the subset of compute nodes in the parallel computer.

  12. The Benefits of Adaptive Partitioning for Parallel AMR Applications

    Energy Technology Data Exchange (ETDEWEB)

    Steensland, Johan [Sandia National Lab. (SNL-CA), Livermore, CA (United States). Advanced Software Research and Development

    2008-07-01

    Parallel adaptive mesh refinement methods potentially lead to realistic modeling of complex three-dimensional physical phenomena. However, the dynamics inherent in these methods present significant challenges in data partitioning and load balancing. Significant human resources, including time, effort, experience, and knowledge, are required for determining the optimal partitioning technique for each new simulation. In reality, scientists resort to using the on-board partitioner of the computational framework, or to using the partitioning industry standard, ParMetis. Adaptive partitioning refers to repeatedly selecting, configuring and invoking the optimal partitioning technique at run-time, based on the current state of the computer and application. In theory, adaptive partitioning automatically delivers superior performance and eliminates the need for repeatedly spending valuable human resources for determining the optimal static partitioning technique. In practice, however, enabling frameworks are non-existent due to the inherent significant inter-disciplinary research challenges. This paper presents a study of a simple implementation of adaptive partitioning and discusses implied potential benefits from the perspective of common groups of users within computational science. The study is based on a large set of data derived from experiments including six real-life, multi-time-step adaptive applications from various scientific domains, five complementing and fundamentally different partitioning techniques, a large set of parameters corresponding to a wide spectrum of computing environments, and a flexible cost function that considers the relative impact of multiple partitioning metrics and diverse partitioning objectives. The results show that even a simple implementation of adaptive partitioning can automatically generate results statistically equivalent to the best static partitioning. Thus, it is possible to effectively eliminate the problem of determining the

  13. Design, analysis and control of cable-suspended parallel robots and its applications

    CERN Document Server

    Zi, Bin

    2017-01-01

    This book provides an essential overview of the authors’ work in the field of cable-suspended parallel robots, focusing on innovative design, mechanics, control, development and applications. It presents and analyzes several typical mechanical architectures of cable-suspended parallel robots in practical applications, including the feed cable-suspended structure for super antennae, hybrid-driven-based cable-suspended parallel robots, and cooperative cable parallel manipulators for multiple mobile cranes. It also addresses the fundamental mechanics of cable-suspended parallel robots on the basis of their typical applications, including the kinematics, dynamics and trajectory tracking control of the feed cable-suspended structure for super antennae. In addition it proposes a novel hybrid-driven-based cable-suspended parallel robot that uses integrated mechanism design methods to improve the performance of traditional cable-suspended parallel robots. A comparative study on error and performance indices of hybr...

  14. Highly uniform parallel microfabrication using a large numerical aperture system

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Zi-Yu; Su, Ya-Hui, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [School of Electrical Engineering and Automation, Anhui University, Hefei 230601 (China); Zhang, Chen-Chu; Hu, Yan-Lei; Wang, Chao-Wei; Li, Jia-Wen; Chu, Jia-Ru; Wu, Dong, E-mail: ustcsyh@ahu.edu.cn, E-mail: dongwu@ustc.edu.cn [CAS Key Laboratory of Mechanical Behavior and Design of Materials, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China, Hefei 230026 (China)

    2016-07-11

    In this letter, we report an improved algorithm to produce accurate phase patterns for generating highly uniform diffraction-limited multifocal arrays in a large numerical aperture objective system. It is shown that based on the original diffraction integral, the uniformity of the diffraction-limited focal arrays can be improved from ∼75% to >97%, owing to the critical consideration of the aperture function and apodization effect associated with a large numerical aperture objective. The experimental results, e.g., 3 × 3 arrays of square and triangle, seven microlens arrays with high uniformity, further verify the advantage of the improved algorithm. This algorithm enables the laser parallel processing technology to realize uniform microstructures and functional devices in the microfabrication system with a large numerical aperture objective.

  15. Development and application of efficient strategies for parallel magnetic resonance imaging

    Energy Technology Data Exchange (ETDEWEB)

    Breuer, F.

    2006-07-01

    Virtually all existing MRI applications require both a high spatial and high temporal resolution for optimum detection and classification of the state of disease. The main strategy to meet the increasing demands of advanced diagnostic imaging applications has been the steady improvement of gradient systems, which provide increased gradient strengths and faster switching times. Rapid imaging techniques and the advances in gradient performance have significantly reduced acquisition times from about an hour to several minutes or seconds. In order to further increase imaging speed, much higher gradient strengths and much faster switching times are required which are technically challenging to provide. In addition to significant hardware costs, peripheral neuro-stimulations and the surpassing of admissable acoustic noise levels may occur. Today's whole body gradient systems already operate just below the allowed safety levels. For these reasons, alternative strategies are needed to bypass these limitations. The greatest progress in further increasing imaging speed has been the development of multi-coil arrays and the advent of partially parallel acquisition (PPA) techniques in the late 1990's. Within the last years, parallel imaging methods have become commercially available,and are therefore ready for broad clinical use. The basic feature of parallel imaging is a scan time reduction, applicable to nearly any available MRI method, while maintaining the contrast behavior without requiring higher gradient system performance. PPA operates by allowing an array of receiver surface coils, positioned around the object under investigation, to partially replace time-consuming spatial encoding which normally is performed by switching magnetic field gradients. Using this strategy, spatial resolution can be improved given a specific imaging time, or scan times can be reduced at a given spatial resolution. Furthermore, in some cases, PPA can even be used to reduce image

  16. Development and application of efficient strategies for parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Breuer, F.

    2006-01-01

    Virtually all existing MRI applications require both a high spatial and high temporal resolution for optimum detection and classification of the state of disease. The main strategy to meet the increasing demands of advanced diagnostic imaging applications has been the steady improvement of gradient systems, which provide increased gradient strengths and faster switching times. Rapid imaging techniques and the advances in gradient performance have significantly reduced acquisition times from about an hour to several minutes or seconds. In order to further increase imaging speed, much higher gradient strengths and much faster switching times are required which are technically challenging to provide. In addition to significant hardware costs, peripheral neuro-stimulations and the surpassing of admissable acoustic noise levels may occur. Today's whole body gradient systems already operate just below the allowed safety levels. For these reasons, alternative strategies are needed to bypass these limitations. The greatest progress in further increasing imaging speed has been the development of multi-coil arrays and the advent of partially parallel acquisition (PPA) techniques in the late 1990's. Within the last years, parallel imaging methods have become commercially available,and are therefore ready for broad clinical use. The basic feature of parallel imaging is a scan time reduction, applicable to nearly any available MRI method, while maintaining the contrast behavior without requiring higher gradient system performance. PPA operates by allowing an array of receiver surface coils, positioned around the object under investigation, to partially replace time-consuming spatial encoding which normally is performed by switching magnetic field gradients. Using this strategy, spatial resolution can be improved given a specific imaging time, or scan times can be reduced at a given spatial resolution. Furthermore, in some cases, PPA can even be used to reduce image artifacts

  17. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  18. Intelligent trigger by massively parallel processors for high energy physics experiments

    International Nuclear Information System (INIS)

    Rohrbach, F.; Vesztergombi, G.

    1992-01-01

    The CERN-MPPC collaboration concentrates its effort on the development of machines based on massive parallelism with thousands of integrated processing elements, arranged in a string. Seven applications are under detailed studies within the collaboration: three for LHC, one for SSC, two for fixed target high energy physics at CERN and one for HDTV. Preliminary results are presented. They show that the objectives should be reached with the use of the ASP architecture. (author)

  19. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and calculating speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and measured of its efficiency in the highly parallel distributed memory type computer, AP1000. The code has been analyzed statically and dynamically, then the suitable algorithm for parallelization has been determined for the shielding analysis functions of MCNP 4. This includes a strategy where a new history is assigned to the idling processor element dynamically during the execution. Furthermore, to avoid the congestion of communicative processing, the batch concept, processing multi-histories by a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories by the AP1000 with 512 processor elements, the 82 % of parallelization efficiency is achieved, and the calculational speed has been estimated to be around 50 times as fast as that of FACOM M-780. (author)

  20. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    Cordes Ben

    2009-01-01

    Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  1. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Directory of Open Access Journals (Sweden)

    2009-03-01

    Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  2. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of damage process of high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams utilizing the damage model with inheterogeneity of concrete considered. Developed with programming language Fortran, the code uses a master/slave mode for programming, domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from AZTEC library for solution of large-scale equations. Speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of a being-built arch dam on a 4-node PC Cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of shaking table test, indicating that the proposed procedure and parallel code PDPAD has a good potential in simulating seismic damage mode of arch dams. With the rapidly growing need for massive computation emerged from engineering problems, parallel computing will find more and more applications in pertinent areas.

  3. Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

    Directory of Open Access Journals (Sweden)

    T. Kamachi

    1997-01-01

    Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

  4. State of the art of parallel scientific visualization applications on PC clusters; Etat de l'art des applications de visualisation scientifique paralleles sur grappes de PC

    Energy Technology Data Exchange (ETDEWEB)

    Juliachs, M

    2004-07-01

    In this state of the art on parallel scientific visualization applications on PC clusters, we deal with both surface and volume rendering approaches. We first analyze available PC cluster configurations and existing parallel rendering software components for parallel graphics rendering. CEA/DIF has been studying cluster visualization since 2001. This report is part of a study to set up a new visualization research platform. This platform consisting of an eight-node PC cluster under Linux and a tiled display was installed in collaboration with Versailles-Saint-Quentin University in August 2003. (author)

  5. State of the art of parallel scientific visualization applications on PC clusters; Etat de l'art des applications de visualisation scientifique paralleles sur grappes de PC

    Energy Technology Data Exchange (ETDEWEB)

    Juliachs, M

    2004-07-01

    In this state of the art on parallel scientific visualization applications on PC clusters, we deal with both surface and volume rendering approaches. We first analyze available PC cluster configurations and existing parallel rendering software components for parallel graphics rendering. CEA/DIF has been studying cluster visualization since 2001. This report is part of a study to set up a new visualization research platform. This platform consisting of an eight-node PC cluster under Linux and a tiled display was installed in collaboration with Versailles-Saint-Quentin University in August 2003. (author)

  6. Resonant enhanced parallel-T topology for weak coupling wireless power transfer pickup applications

    Directory of Open Access Journals (Sweden)

    Yao Guo

    2015-07-01

    Full Text Available For the wireless power transfer (WPT system, the transfer performance and the coupling coefficient are contradictory. In this paper, a novel parallel-T resonant topology consists of a traditional parallel circuit and a T-matching network for secondary side is proposed. With this method, a boosted voltage can be output to the load, since this topology has a resonant enhancement effect, and high Q value can be obtained at a low resonant frequency and low coil inductance. This feature makes it more suitable for weak coupling WPT applications. Besides, the proposed topology shows good frequency stability and adaptability to variations of load. Experimental results show that the output voltage gain improves by 757% compared with traditional series circuit, and reaches 85% total efficiency when the coupling coefficient is 0.046.

  7. The Protein Maker: an automated system for high-throughput parallel purification

    International Nuclear Information System (INIS)

    Smith, Eric R.; Begley, Darren W.; Anderson, Vanessa; Raymond, Amy C.; Haffner, Taryn E.; Robinson, John I.; Edwards, Thomas E.; Duncan, Natalie; Gerdts, Cory J.; Mixon, Mark B.; Nollert, Peter; Staker, Bart L.; Stewart, Lance J.

    2011-01-01

    The Protein Maker instrument addresses a critical bottleneck in structural genomics by allowing automated purification and buffer testing of multiple protein targets in parallel with a single instrument. Here, the use of this instrument to (i) purify multiple influenza-virus proteins in parallel for crystallization trials and (ii) identify optimal lysis-buffer conditions prior to large-scale protein purification is described. The Protein Maker is an automated purification system developed by Emerald BioSystems for high-throughput parallel purification of proteins and antibodies. This instrument allows multiple load, wash and elution buffers to be used in parallel along independent lines for up to 24 individual samples. To demonstrate its utility, its use in the purification of five recombinant PB2 C-terminal domains from various subtypes of the influenza A virus is described. Three of these constructs crystallized and one diffracted X-rays to sufficient resolution for structure determination and deposition in the Protein Data Bank. Methods for screening lysis buffers for a cytochrome P450 from a pathogenic fungus prior to upscaling expression and purification are also described. The Protein Maker has become a valuable asset within the Seattle Structural Genomics Center for Infectious Disease (SSGCID) and hence is a potentially valuable tool for a variety of high-throughput protein-purification applications

  8. Simple, parallel, high-performance virtual machines for extreme computations

    International Nuclear Information System (INIS)

    Chokoufe Nejad, Bijan; Ohl, Thorsten; Reuter, Jurgen

    2014-11-01

    We introduce a high-performance virtual machine (VM) written in a numerically fast language like Fortran or C to evaluate very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present specifically a VM that is able to compute tree-level cross sections for any number of external legs, given the corresponding byte code from the optimal matrix element generator, O'Mega. Furthermore, this approach allows to formulate the parallel computation of a single phase space point in a simple and obvious way. We analyze hereby the scaling behaviour with multiple threads as well as the benefits and drawbacks that are introduced with this method. Our implementation of a VM can run faster than the corresponding native, compiled code for certain processes and compilers, especially for very high multiplicities, and has in general runtimes in the same order of magnitude. By avoiding the tedious compile and link steps, which may fail for source code files of gigabyte sizes, new processes or complex higher order corrections that are currently out of reach could be evaluated with a VM given enough computing power.

  9. Highly parallel algorithm for high pT physics at FAIR-CBM

    International Nuclear Information System (INIS)

    Fueloep, A; Vesztergombi, G

    2010-01-01

    The limitations of presently available data on p T range are discussed and planned future upgrades are outlined. Special attention is given to the FAIR-CBM experiment as a unique high luminosity facility for future continuation of the measurements at very high p T with emphasis on the so-called mosaic trigger system to use the highly parallel online algorithm.

  10. High accurate volume holographic correlator with 4000 parallel correlation channels

    Science.gov (United States)

    Ni, Kai; Qu, Zongyao; Cao, Liangcai; Su, Ping; He, Qingsheng; Jin, Guofan

    2008-03-01

    Volume holographic correlator allows simultaneously calculate the two-dimensional inner product between the input image and each stored image. We have recently experimentally implemented in VHC 4000 parallel correlation channels with better than 98% output accuracy in a single location in a crystal. The speckle modulation is used to suppress the sidelobes of the correlation patterns, allowing more correlation spots to be contained in the output plane. A modified exposure schedule is designed to ensure the hologram in each channel with unity diffraction efficiency. In this schedule, a restricted coefficient was introduced into the original exposure schedule to solve the problem that the sensitivity and time constant of the crystal will change as a time function when in high-capacity storage. An interleaving method is proposed to improve the output accuracy. By unifying the distribution of the input and stored image patterns without changing the inner products between them, this method could eliminate the impact of correlation pattern variety on calculated inner product values. Moreover, by using this method, the maximum correlation spot size is reduced, which decreases the required minimum safe clearance between neighboring spots in the output plane, allowing more spots to be parallely detected without crosstalk. The experimental results are given and analyzed.

  11. A novel highly parallel algorithm for linearly unmixing hyperspectral images

    Science.gov (United States)

    Guerra, Raúl; López, Sebastián.; Callico, Gustavo M.; López, Jose F.; Sarmiento, Roberto

    2014-10-01

    Endmember extraction and abundances calculation represent critical steps within the process of linearly unmixing a given hyperspectral image because of two main reasons. The first one is due to the need of computing a set of accurate endmembers in order to further obtain confident abundance maps. The second one refers to the huge amount of operations involved in these time-consuming processes. This work proposes an algorithm to estimate the endmembers of a hyperspectral image under analysis and its abundances at the same time. The main advantage of this algorithm is its high parallelization degree and the mathematical simplicity of the operations implemented. This algorithm estimates the endmembers as virtual pixels. In particular, the proposed algorithm performs the descent gradient method to iteratively refine the endmembers and the abundances, reducing the mean square error, according with the linear unmixing model. Some mathematical restrictions must be added so the method converges in a unique and realistic solution. According with the algorithm nature, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the well behavior of the algorithm proposed. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.

  12. Highly parallel translation of DNA sequences into small molecules.

    Directory of Open Access Journals (Sweden)

    Rebecca M Weisinger

    Full Text Available A large body of in vitro evolution work establishes the utility of biopolymer libraries comprising 10(10 to 10(15 distinct molecules for the discovery of nanomolar-affinity ligands to proteins. Small-molecule libraries of comparable complexity will likely provide nanomolar-affinity small-molecule ligands. Unlike biopolymers, small molecules can offer the advantages of cell permeability, low immunogenicity, metabolic stability, rapid diffusion and inexpensive mass production. It is thought that such desirable in vivo behavior is correlated with the physical properties of small molecules, specifically a limited number of hydrogen bond donors and acceptors, a defined range of hydrophobicity, and most importantly, molecular weights less than 500 Daltons. Creating a collection of 10(10 to 10(15 small molecules that meet these criteria requires the use of hundreds to thousands of diversity elements per step in a combinatorial synthesis of three to five steps. With this goal in mind, we have reported a set of mesofluidic devices that enable DNA-programmed combinatorial chemistry in a highly parallel 384-well plate format. Here, we demonstrate that these devices can translate DNA genes encoding 384 diversity elements per coding position into corresponding small-molecule gene products. This robust and efficient procedure yields small molecule-DNA conjugates suitable for in vitro evolution experiments.

  13. Highly Parallelized Pattern Matching Execution for the ATLAS Experiment

    CERN Document Server

    Citraro, Saverio; The ATLAS collaboration

    2015-01-01

    The trigger system of the ATLAS experiment at LHC will extend its rejection capabilities during operations in 2015-2018 by introducing the Fast TracKer system (FTK). FTK is a hardware based system capable of finding charged particle tracks by analyzing hits in silicon detectors at the rate of 105 events per second. The core of track reconstruction is performed into two pipelined steps. At first step the candidate tracks are found by matching combination of low resolution hits to predefined patterns; then they are used in the second step to seed a more precise track fitting algorithm. The key FTK component is an Associative Memory (AM) system that is used to perform pattern matching with high degree of parallelism. The AM system implementation, the AM Serial Link Processor, is based on an extremely powerful network of 2 Gb/s serial links to sustain a huge traffic of data. We report on the design of the Serial Link Processor consisting of two types of boards, the Little Associative Memory Board (LAMB), a mezzan...

  14. Highly Parallelized Pattern Matching Execution for the ATLAS Experiment

    CERN Document Server

    Citraro, Saverio; The ATLAS collaboration

    2015-01-01

    Abstract– The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using as input the data from the silicon tracker in the ATLAS experiment. The AM is the primary component of the FTK system and is designed using ASIC technology (the AM chip) to execute pattern matching with a high degree of parallelism. The FTK system finds track candidates at low resolution that are seeds for a full resolution track fitting. The AM system implementation is named “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links to sustain a huge traffic of data. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Little Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME motherboard which hosts four LAMB daughterboards. We also report on the performance of the prototypes (both hardware and firmware) produced and ...

  15. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Science.gov (United States)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  16. Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications

    OpenAIRE

    Roloff, Eduardo; Diener, Matthias; Diaz Carreño, Emmanuell; Gaspary, Luciano Paschoal; Navaux, Philippe O.A.

    2017-01-01

    Public cloud providers offer a wide range of instance types, with different processing and interconnection speeds, as well as varying prices. Furthermore, the tasks of many parallel applications show different computational demands due to load imbalance. These differences can be exploited for improving the cost efficiency of parallel applications in many cloud environments by matching application requirements to instance types. In this paper, we introduce the concept of heterogeneous cloud sy...

  17. Application of parallel computing techniques to a large-scale reservoir simulation

    International Nuclear Information System (INIS)

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-01-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance

  18. Application of parallelized software architecture to an autonomous ground vehicle

    Science.gov (United States)

    Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam

    2011-01-01

    This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were made to the power system reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities including sensor interfacing, data processing, path planning, navigation algorithms and motor control. This inefficient approach led to poor software performance and made it difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks - motor control, navigation, sensor data collection, etc. into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used last year, two frames can be acquired and processed in 70ms. With all these improvements, Q placed 2nd in the autonomous challenge.

  19. Evaluation of the Intel iWarp parallel processor for space flight applications

    Science.gov (United States)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  20. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  1. Parallel translation in warped product spaces: application to the Reissner-Nordstroem spacetime

    International Nuclear Information System (INIS)

    Raposo, A P; Del Riego, L

    2005-01-01

    A formal treatment of the parallel translation transformations in warped product manifolds is presented and related to those parallel translation transformations in each of the factor manifolds. A straightforward application to the Schwarzschild and Reissner-Nordstroem geometries, considered here as particular examples, explains some apparently surprising properties of the holonomy in these manifolds

  2. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    Science.gov (United States)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed system present the worst case implications for today s dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  3. Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox

    Science.gov (United States)

    Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas

    In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.

  4. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    Science.gov (United States)

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…

  5. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    Computing requirement of large scale scientific computing has always been ahead of what state of the art hardware could supply in the form of supercomputers of the day. And for any single processor system the limit to increase in the computing power was realized a few years back itself. Now with the advent of parallel computing systems the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future large scale scientific computing in the penultimate decade of the present century. The author summarized trends in parallel computers and emphasize the need for a better programming environment and software tools for optimal performance. The author concludes this paper with critique on parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  6. A high performance parallel approach to medical imaging

    International Nuclear Information System (INIS)

    Frieder, G.; Frieder, O.; Stytz, M.R.

    1988-01-01

    Research into medical imaging using general purpose parallel processing architectures is described and a review of the performance of previous medical imaging machines is provided. Results demonstrating that general purpose parallel architectures can achieve performance comparable to other, specialized, medical imaging machine architectures is presented. A new back-to-front hidden-surface removal algorithm is described. Results demonstrating the computational savings obtained by using the modified back-to-front hidden-surface removal algorithm are presented. Performance figures for forming a full-scale medical image on a mesh interconnected multiprocessor are presented

  7. Design of high-performance parallelized gene predictors in MATLAB.

    Science.gov (United States)

    Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien

    2012-04-10

    This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.

  8. Design of a highly parallel board-level-interconnection with 320 Gbps capacity

    Science.gov (United States)

    Lohmann, U.; Jahns, J.; Limmer, S.; Fey, D.; Bauer, H.

    2012-01-01

    A parallel board-level interconnection design is presented consisting of 32 channels, each operating at 10 Gbps. The hardware uses available optoelectronic components (VCSEL, TIA, pin-diodes) and a combination of planarintegrated free-space optics, fiber-bundles and available MEMS-components, like the DMD™ from Texas Instruments. As a specific feature, we present a new modular inter-board interconnect, realized by 3D fiber-matrix connectors. The performance of the interconnect is evaluated with regard to optical properties and power consumption. Finally, we discuss the application of the interconnect for strongly distributed system architectures, as, for example, in high performance embedded computing systems and data centers.

  9. Multi-thread Parallel Speech Recognition for Mobile Applications

    Directory of Open Access Journals (Sweden)

    LOJKA Martin

    2014-05-01

    Full Text Available In this paper, the server based solution of the multi-thread large vocabulary automatic speech recognition engine is described along with the Android OS and HTML5 practical application examples. The basic idea was to bring speech recognition available for full variety of applications for computers and especially for mobile devices. The speech recognition engine should be independent of commercial products and services (where the dictionary could not be modified. Using of third-party services could be also a security and privacy problem in specific applications, when the unsecured audio data could not be sent to uncontrolled environments (voice data transferred to servers around the globe. Using our experience with speech recognition applications, we have been able to construct a multi-thread speech recognition serverbased solution designed for simple applications interface (API to speech recognition engine modified to specific needs of particular application.

  10. Adaptive Distributed Data Structure Management for Parallel CFD Applications

    KAUST Repository

    Frisch, Jerome

    2013-09-01

    Computational fluid dynamics (CFD) simulations require a lot of computing resources in terms of CPU time and memory in order to compute with a reasonable physical accuracy. If only uniformly refined domains are applied, the amount of computing cells is growing rather fast if a certain small resolution is physically required. This can be remedied by applying adaptively refined grids. Unfortunately, due to the adaptive refinement procedures, errors are introduced which have to be taken into account. This paper is focussing on implementation details of the applied adaptive data structure management and a qualitative analysis of the introduced errors by analysing a Poisson problem on the given data structure, which has to be solved in every time step of a CFD analysis. Furthermore an adaptive CFD benchmark example is computed, showing the benefits of an adaptive refinement as well as measurements of parallel data distribution and performance. © 2013 IEEE.

  11. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    GRANT NUMBER(s) Jerome A. Feldman N00014-82-K-0193 9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENt. PROJECT, TASK Computer Science...Connectionist Models 2.1 Background and Overviev % Computer science is just beginning to look seriously at parallel computation : it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These

  12. Dynamic workload balancing of parallel applications with user-level scheduling on the Grid

    CERN Document Server

    Korkhov, Vladimir V; Krzhizhanovskaya, Valeria V

    2009-01-01

    This paper suggests a hybrid resource management approach for efficient parallel distributed computing on the Grid. It operates on both application and system levels, combining user-level job scheduling with dynamic workload balancing algorithm that automatically adapts a parallel application to the heterogeneous resources, based on the actual resource parameters and estimated requirements of the application. The hybrid environment and the algorithm for automated load balancing are described, the influence of resource heterogeneity level is measured, and the speedup achieved with this technique is demonstrated for different types of applications and resources.

  13. Experimental analysis of a capillary pumped loop for terrestrial applications with several evaporators in parallel

    International Nuclear Information System (INIS)

    Blet, Nicolas; Bertin, Yves; Ayel, Vincent; Romestant, Cyril; Platel, Vincent

    2016-01-01

    Highlights: • This paper introduces experimental studies of a CPLTA with 3 evaporators in parallel. • Operating principles of mono-evaporator CPLTA are reminded. • A reference test with the new bench with only one evaporator is introduced. • Global behavior of the multi-evaporators loop is presented and discussed. • Some additional thermohydraulic couplings are revealed. - Abstract: In the context of high-dissipation electronics cooling for ground transportation, a new design of two-phase loop has been improved in recent years: the capillary pumped loop for terrestrial application (CPLTA). This hybrid system, between the two standard capillary pumped loop (CPL) and loop heat pipe (LHP), has been widely investigated with a single evaporator, and so a single dissipative area, to know its mean operating principles and thermohydraulic couplings between the components. To aim to extend its scope of applications, a new experimental CPLTA with three evaporators in parallel is studied in this paper with methanol as working fluid. Even if the dynamics of the loop in multi-evaporators mode appears on the whole similar to that with a single operating evaporator, additional couplings are highlighted between the several evaporators. A decoupling between vapor generation flow rate and pressure drop in each evaporator is especially revealed. The impact of this phenomenon on the conductance at evaporator is analyzed.

  14. Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    International Nuclear Information System (INIS)

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-01-01

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5

  15. A parallel stereo reconstruction algorithm with applications in entomology (APSRA)

    Science.gov (United States)

    Bhasin, Rajesh; Jang, Won Jun; Hart, John C.

    2012-03-01

    We propose a fast parallel algorithm for the reconstruction of 3-Dimensional point clouds of insects from binocular stereo image pairs using a hierarchical approach for disparity estimation. Entomologists study various features of insects to classify them, build their distribution maps, and discover genetic links between specimens among various other essential tasks. This information is important to the pesticide and the pharmaceutical industries among others. When considering the large collections of insects entomologists analyze, it becomes difficult to physically handle the entire collection and share the data with researchers across the world. With the method presented in our work, Entomologists can create an image database for their collections and use the 3D models for studying the shape and structure of the insects thus making it easier to maintain and share. Initial feedback shows that the reconstructed 3D models preserve the shape and size of the specimen. We further optimize our results to incorporate multiview stereo which produces better overall structure of the insects. Our main contribution is applying stereoscopic vision techniques to entomology to solve the problems faced by entomologists.

  16. AC losses in horizontally parallel HTS tapes for possible wireless power transfer applications

    Science.gov (United States)

    Shen, Boyang; Geng, Jianzhao; Zhang, Xiuchang; Fu, Lin; Li, Chao; Zhang, Heng; Dong, Qihuan; Ma, Jun; Gawith, James; Coombs, T. A.

    2017-12-01

    This paper presents the concept of using horizontally parallel HTS tapes with AC loss study, and the investigation on possible wireless power transfer (WPT) applications. An example of three parallel HTS tapes was proposed, whose AC loss study was carried out both from experiment using electrical method; and simulation using 2D H-formulation on the FEM platform of COMSOL Multiphysics. The electromagnetic induction around the three parallel tapes was monitored using COMSOL simulation. The electromagnetic induction and AC losses generated by a conventional three turn coil was simulated as well, and then compared to the case of three parallel tapes with the same AC transport current. The analysis demonstrates that HTS parallel tapes could be potentially used into wireless power transfer systems, which could have lower total AC losses than conventional HTS coils.

  17. Automatic Migration from PARMACS to MPI in Parallel Fortran Applications

    Directory of Open Access Journals (Sweden)

    Rolf Hempel

    1999-01-01

    Full Text Available The PARMACS message passing interface has been in widespread use by application projects, especially in Europe. With the new MPI standard for message passing, many projects face the problem of replacing PARMACS with MPI. An automatic translation tool has been developed which replaces all PARMACS 6.0 calls in an application program with their corresponding MPI calls. In this paper we describe the mapping of the PARMACS programming model onto MPI. We then present some implementation details of the converter tool.

  18. Successful design and application of SNCR parallel to combustion modification

    Energy Technology Data Exchange (ETDEWEB)

    Zhao, Dongxian; Tang, Leping; Shao, Xiaozhen; Meng, Derun; Li, Hongjian [Tongfang Environment CO., LTD., Beijing (China); Zhou, Wei; Xu, Guang [GE Energy, Anaheim, CA (United States)

    2013-07-01

    Various De-NOx methods have been recently adopted in China to control NOx emissions including Selective Non-Catalytic Reaction (SNCR) technology. Usually, the design of SNCR system is carried out after the combustion modification technologies, such as low NOx burner (LNB) and over fire air (OFA), have already been installed and in operation. This article discusses how to design the SNCR system parallel to the combustion modification. The SNCR process design consists of three steps: (1) boiler baseline test, (2) computational fluid dynamics simulation (CFD) facilitated design and (3) SNCR system performance predictions and optimizations. The first step is to conduct boiler baseline test to characterize the boiler operating conditions at a load range. The test data can also be used to calibrate the CFD model. The second step is to develop a three-dimensional boiler coal combustion CFD model to simulate the operation of the boilers at both baseline and post combustion modification conditions. The simulation reveals velocity, temperature and combustible distributions in the furnace. The last step is to determine the position and numbers of the injectors for SNCR reagent. The final field tests upon the project completion have shown that the average SNCR De-NOx efficiency has reached 35.1% with the maximum removal efficiency of 45% on full load. The project also couples the SNCR and SCR (Selective Catalytic Reduction) technologies. The combined removal efficiency of combustion modifications, SNCR and SCR is higher than 82%. This paper shows a successful example for retrofitting aged power-generating units with limited space.

  19. Parallel scan hyperspectral fluorescence imaging system and biomedical application for microarrays

    International Nuclear Information System (INIS)

    Liu Zhiyi; Ma Suihua; Liu Le; Guo Jihua; He Yonghong; Ji Yanhong

    2011-01-01

    Microarray research offers great potential for analysis of gene expression profile and leads to greatly improved experimental throughput. A number of instruments have been reported for microarray detection, such as chemiluminescence, surface plasmon resonance, and fluorescence markers. Fluorescence imaging is popular for the readout of microarrays. In this paper we develop a quasi-confocal, multichannel parallel scan hyperspectral fluorescence imaging system for microarray research. Hyperspectral imaging records the entire emission spectrum for every voxel within the imaged area in contrast to recording only fluorescence intensities of filter-based scanners. Coupled with data analysis, the recorded spectral information allows for quantitative identification of the contributions of multiple, spectrally overlapping fluorescent dyes and elimination of unwanted artifacts. The mechanism of quasi-confocal imaging provides a high signal-to-noise ratio, and parallel scan makes this approach a high throughput technique for microarray analysis. This system is improved with a specifically designed spectrometer which can offer a spectral resolution of 0.2 nm, and operates with spatial resolutions ranging from 2 to 30 μm . Finally, the application of the system is demonstrated by reading out microarrays for identification of bacteria.

  20. Radiation-hard/high-speed parallel optical links

    Energy Technology Data Exchange (ETDEWEB)

    Gan, K.K., E-mail: gan@mps.ohio-state.edu [Department of Physics, The Ohio State University, Columbus, OH 43210 (United States); Buchholz, P.; Heidbrink, S. [Fachbereich Physik, Universität Siegen, Siegen (Germany); Kagan, H.P.; Kass, R.D.; Moore, J.; Smith, D.S. [Department of Physics, The Ohio State University, Columbus, OH 43210 (United States); Vogt, M.; Ziolkowski, M. [Fachbereich Physik, Universität Siegen, Siegen (Germany)

    2016-09-21

    We have designed and fabricated a compact parallel optical engine for transmitting data at 5 Gb/s. The device consists of a 4-channel ASIC driving a VCSEL (Vertical Cavity Surface Emitting Laser) array in an optical package. The ASIC is designed using only core transistors in a 65 nm CMOS process to enhance the radiation-hardness. The ASIC contains an 8-bit DAC to control the bias and modulation currents of the individual channels in the VCSEL array. The performance of the optical engine up at 5 Gb/s is satisfactory.

  1. Automatic SIMD parallelization of embedded applications based on pattern recognition

    NARCIS (Netherlands)

    Manniesing, R.; Karkowski, I.P.; Corporaal, H.

    2000-01-01

    This paper investigates the potential for automatic mapping of typical embedded applications to architectures with multimedia instruction set extensions. For this purpose a (pattern matching based) code transformation engine is used, which involves a three-step process of matching, condition

  2. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    AUTHOR|(CDS)2083520

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has lead all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  3. State of the art of parallel scientific visualization applications on PC clusters

    International Nuclear Information System (INIS)

    Juliachs, M.

    2004-01-01

    In this state of the art on parallel scientific visualization applications on PC clusters, we deal with both surface and volume rendering approaches. We first analyze available PC cluster configurations and existing parallel rendering software components for parallel graphics rendering. CEA/DIF has been studying cluster visualization since 2001. This report is part of a study to set up a new visualization research platform. This platform consisting of an eight-node PC cluster under Linux and a tiled display was installed in collaboration with Versailles-Saint-Quentin University in August 2003. (author)

  4. Geospatial Applications on Different Parallel and Distributed Systems in enviroGRIDS Project

    Science.gov (United States)

    Rodila, D.; Bacu, V.; Gorgan, D.

    2012-04-01

    The execution of Earth Science applications and services on parallel and distributed systems has become a necessity especially due to the large amounts of Geospatial data these applications require and the large geographical areas they cover. The parallelization of these applications comes to solve important performance issues and can spread from task parallelism to data parallelism as well. Parallel and distributed architectures such as Grid, Cloud, Multicore, etc. seem to offer the necessary functionalities to solve important problems in the Earth Science domain: storing, distribution, management, processing and security of Geospatial data, execution of complex processing through task and data parallelism, etc. A main goal of the FP7-funded project enviroGRIDS (Black Sea Catchment Observation and Assessment System supporting Sustainable Development) [1] is the development of a Spatial Data Infrastructure targeting this catchment region but also the development of standardized and specialized tools for storing, analyzing, processing and visualizing the Geospatial data concerning this area. For achieving these objectives, the enviroGRIDS deals with the execution of different Earth Science applications, such as hydrological models, Geospatial Web services standardized by the Open Geospatial Consortium (OGC) and others, on parallel and distributed architecture to maximize the obtained performance. This presentation analysis the integration and execution of Geospatial applications on different parallel and distributed architectures and the possibility of choosing among these architectures based on application characteristics and user requirements through a specialized component. Versions of the proposed platform have been used in enviroGRIDS project on different use cases such as: the execution of Geospatial Web services both on Web and Grid infrastructures [2] and the execution of SWAT hydrological models both on Grid and Multicore architectures [3]. The current

  5. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixedparallelism), has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the taskgraph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure taskand data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.

  6. Advanced FDTD methods parallelization, acceleration, and engineering applications

    CERN Document Server

    Yu, Wenhua

    2011-01-01

    The finite-difference time-domain (FDTD) method has revolutionized antenna design and electromagnetics engineering. Here's a cutting-edge book that focuses on the performance optimization and engineering applications of FDTD simulation systems. Covering the latest developments in this area, this unique resource offer you expert advice on the FDTD method, hardware platforms, and network systems. Moreover the book offers guidance in distinguishing between the many different electromagnetics software packages on the market today. You also find a complete chapter dedicated to large multi-scale pro

  7. FPGAs and parallel architectures for aerospace applications soft errors and fault-tolerant design

    CERN Document Server

    Rech, Paolo

    2016-01-01

    This book introduces the concepts of soft errors in FPGAs, as well as the motivation for using commercial, off-the-shelf (COTS) FPGAs in mission-critical and remote applications, such as aerospace.  The authors describe the effects of radiation in FPGAs, present a large set of soft-error mitigation techniques that can be applied in these circuits, as well as methods for qualifying these circuits under radiation.  Coverage includes radiation effects in FPGAs, fault-tolerant techniques for FPGAs, use of COTS FPGAs in aerospace applications, experimental data of FPGAs under radiation, FPGA embedded processors under radiation, and fault injection in FPGAs. Since dedicated parallel processing architectures such as GPUs have become more desirable in aerospace applications due to high computational power, GPU analysis under radiation is also discussed. ·         Discusses features and drawbacks of reconfigurability methods for FPGAs, focused on aerospace applications; ·         Explains how radia...

  8. Parallel Genetic Algorithms for calibrating Cellular Automata models: Application to lava flows

    International Nuclear Information System (INIS)

    D'Ambrosio, D.; Spataro, W.; Di Gregorio, S.; Calabria Univ., Cosenza; Crisci, G.M.; Rongo, R.; Calabria Univ., Cosenza

    2005-01-01

    Cellular Automata are highly nonlinear dynamical systems which are suitable far simulating natural phenomena whose behaviour may be specified in terms of local interactions. The Cellular Automata model SCIARA, developed far the simulation of lava flows, demonstrated to be able to reproduce the behaviour of Etnean events. However, in order to apply the model far the prediction of future scenarios, a thorough calibrating phase is required. This work presents the application of Genetic Algorithms, general-purpose search algorithms inspired to natural selection and genetics, far the parameters optimisation of the model SCIARA. Difficulties due to the elevated computational time suggested the adoption a Master-Slave Parallel Genetic Algorithm far the calibration of the model with respect to the 2001 Mt. Etna eruption. Results demonstrated the usefulness of the approach, both in terms of computing time and quality of performed simulations

  9. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Directory of Open Access Journals (Sweden)

    Hamed Kargaran

    2016-04-01

    Full Text Available The implementation of Monte Carlo simulation on the CUDA Fortran requires a fast random number generation with good statistical properties on GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG have been proposed to use in high performance computing systems. According to the type of GPU memory usage, GPU scheme is divided into two work modes including GLOBAL_MODE and SHARED_MODE. To generate parallel random numbers based on the independent sequence method, the combination of middle-square method and chaotic map along with the Xorshift PRNG have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core for GLOBAL_MODE and SHARED_MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.

  10. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    Energy Technology Data Exchange (ETDEWEB)

    Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad [Department of nuclear engineering, Shahid Behesti University, Tehran, 1983969411 (Iran, Islamic Republic of)

    2016-04-15

    The implementation of Monte Carlo simulation on the CUDA Fortran requires a fast random number generation with good statistical properties on GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) have been proposed to use in high performance computing systems. According to the type of GPU memory usage, GPU scheme is divided into two work modes including GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, the combination of middle-square method and chaotic map along with the Xorshift PRNG have been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.

  11. The Permanent Magnet Operating Mechanism of Double Coil Parallel Driven at a High Speed

    Directory of Open Access Journals (Sweden)

    WEI Xau-Lao

    2017-02-01

    Full Text Available Abstract:Operating mechanism is the main part of breaker,and the quality of breaker will directly influence the safe operation of power system. Because of the continuous improvement requirements of switch,in order to mak this actuator faster and more powerful closing,this paper proposes a double coil parallel driven permanent magnet actuator at a high speed. This paper expounds the working principle of single and double coil parallel driven permanent magnet actuator. It uses Ansoft building model and contrasts test results. In prance we designed and produced the single and double coil parallel driven permanent magnet actuator for experimental study. The simulation and experiment results show that double coil parallel driven permanent magnet actuator,compared with single coil parallel driven permanent magnet actuator,has a better and faster action performance. Thus,the double coil parallel driven permanent magnet actuator achieves a kind of optimization.

  12. The application of parallel wells to support the use of groundwater for sustainable irrigation

    Science.gov (United States)

    Suhardi

    2018-05-01

    The use of groundwater as a source of irrigation is one alternative in meeting water needs of plants. Using groundwater for irrigation requires a high cost because of the discharge that can be taken is limited. In addition, the use of large groundwater can cause environmental damage and social conflict. To minimize costs, maintain quality of the environment and to prevent social conflicts, it is necessary to innovate in the groundwater taking system. The study was conducted with an innovation of using parallel wells. Performance is measured by comparing parallel wells with a single well. The results showed that the use of parallel wells to meet the water needs of rice plants and increase the pump discharge up to 100%. In addition, parallel wells can reduce the influence radius of taking of groundwater compared to single well so as to prevent social conflict. Thus, the use of parallel wells can support the achievement of the use of groundwater for sustainable irrigation.

  13. Parallelism at Cern: real-time and off-line applications in the GP-MIMD2 project

    International Nuclear Information System (INIS)

    Calafiura, P.

    1997-01-01

    A wide range of general purpose high-energy physics applications, ranging from Monte Carlo simulation to data acquisition, from interactive data analysis to on-line filtering, have been ported, or developed, and run in parallel on IBM SP-2 and Meiko CS-2 CERN large multi-processor machines. The ESPRIT project GP-MIMD2 has been a catalyst for the interest in parallel computing at CERN. The project provided the 128 processor Meiko CS-2 system that is now succesfully integrated in the CERN computing environment. The CERN experiment NA48 was involved in the GP-MIMD2 project since the beginning. NA48 physicists run, as part of their day-to-day work, simulation and analysis programs parallelized using the message passing interface MPI. The CS-2 is also a vital component of the experiment data acquisition system and will be used to calibrate in real-time the 13000 channels liquid krypton calorimeter. (orig.)

  14. Development of three-dimensional neoclassical transport simulation code with high performance Fortran on a vector-parallel computer

    International Nuclear Information System (INIS)

    Satake, Shinsuke; Okamoto, Masao; Nakajima, Noriyoshi; Takamaru, Hisanori

    2005-11-01

    A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. (author)

  15. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented, without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was even the dominant time consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)

  16. Radiation-hard/high-speed parallel optical links

    International Nuclear Information System (INIS)

    Gan, K.K.; Buchholz, P.; Kagan, H.P.; Kass, R.D.; Moore, J.; Smith, D.S.; Wiese, A.; Ziolkowski, M.

    2014-01-01

    We have designed an ASIC for use in a parallel optical engine for a new layer of the ATLAS pixel detector in the initial phase of the LHC luminosity upgrade. The ASIC is a 12-channel VCSEL (Vertical Cavity Surface Emitting Laser) array driver capable of operating up to 5 Gb/s per channel. The ASIC is designed using a 130 nm CMOS process to enhance the radiation-hardness. A scheme for redundancy has also been implemented to allow bypassing of a broken VCSEL. The ASIC also contains a power-on reset circuit that sets the ASIC to a default configuration with no signal steering. In addition, the bias and modulation currents of the individual channels are programmable. The performance of the first prototype ASIC up to 5 Gb/s is satisfactory. Furthermore, we are able to program the bias and modulation currents and to bypass a broken VCSEL channel. We are currently upgrading our design to allow operation at 10 Gb/s per channel yielding an aggregated bandwidth of 120 Gb/s. Some preliminary results of the design will be presented

  17. Conical Refraction of Elastic Waves by Anisotropic Metamaterials and Application for Parallel Translation of Elastic Waves.

    Science.gov (United States)

    Ahn, Young Kwan; Lee, Hyung Jin; Kim, Yoon Young

    2017-08-30

    Conical refraction, which is quite well-known in electromagnetic waves, has not been explored well in elastic waves due to the lack of proper natural elastic media. Here, we propose and design a unique anisotropic elastic metamaterial slab that realizes conical refraction for horizontally incident longitudinal or transverse waves; the single-mode wave is split into two oblique coupled longitudinal-shear waves. As an interesting application, we carried out an experiment of parallel translation of an incident elastic wave system through the anisotropic metamaterial slab. The parallel translation can be useful for ultrasonic non-destructive testing of a system hidden by obstacles. While the parallel translation resembles light refraction through a parallel plate without angle deviation between entry and exit beams, this wave behavior cannot be achieved without the engineered metamaterial because an elastic wave incident upon a dissimilar medium is always split at different refraction angles into two different modes, longitudinal and shear.

  18. High accuracy microwave frequency measurement based on single-drive dual-parallel Mach-Zehnder modulator

    DEFF Research Database (Denmark)

    Zhao, Ying; Pang, Xiaodan; Deng, Lei

    2011-01-01

    A novel approach for broadband microwave frequency measurement by employing a single-drive dual-parallel Mach-Zehnder modulator is proposed and experimentally demonstrated. Based on bias manipulations of the modulator, conventional frequency-to-power mapping technique is developed by performing a...... 10−3 relative error. This high accuracy frequency measurement technique is a promising candidate for high-speed electronic warfare and defense applications....

  19. A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set

    Directory of Open Access Journals (Sweden)

    Dau-Chyrh Chang

    2012-01-01

    Full Text Available We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD method using the SSE (streaming (single instruction multiple data SIMD extensions instruction set. The implementation of SSE instruction set to parallel FDTD method has achieved the significant improvement on the simulation performance. The benchmarks of the SSE acceleration on both the multi-CPU workstation and computer cluster have demonstrated the advantages of (vector arithmetic logic unit VALU acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of parallel FDTD method enhanced by SSE instruction set.

  20. High-speed parallel implementation of a modified PBR algorithm on DSP-based EH topology

    Science.gov (United States)

    Rajan, K.; Patnaik, L. M.; Ramakrishna, J.

    1997-08-01

    Algebraic Reconstruction Technique (ART) is an age-old method used for solving the problem of three-dimensional (3-D) reconstruction from projections in electron microscopy and radiology. In medical applications, direct 3-D reconstruction is at the forefront of investigation. The simultaneous iterative reconstruction technique (SIRT) is an ART-type algorithm with the potential of generating in a few iterations tomographic images of a quality comparable to that of convolution backprojection (CBP) methods. Pixel-based reconstruction (PBR) is similar to SIRT reconstruction, and it has been shown that PBR algorithms give better quality pictures compared to those produced by SIRT algorithms. In this work, we propose a few modifications to the PBR algorithms. The modified algorithms are shown to give better quality pictures compared to PBR algorithms. The PBR algorithm and the modified PBR algorithms are highly compute intensive, Not many attempts have been made to reconstruct objects in the true 3-D sense because of the high computational overhead. In this study, we have developed parallel two-dimensional (2-D) and 3-D reconstruction algorithms based on modified PBR. We attempt to solve the two problems encountered by the PBR and modified PBR algorithms, i.e., the long computational time and the large memory requirements, by parallelizing the algorithm on a multiprocessor system. We investigate the possible task and data partitioning schemes by exploiting the potential parallelism in the PBR algorithm subject to minimizing the memory requirement. We have implemented an extended hypercube (EH) architecture for the high-speed execution of the 3-D reconstruction algorithm using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PEs) and dual-port random access memories (DPR) as channels between the PEs. We discuss and compare the performances of the PBR algorithm on an IBM 6000 RISC workstation, on a Silicon

  1. High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

    Science.gov (United States)

    Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

    2002-01-01

    As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance scales have been shown to .apprxeq.100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

  2. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for $\\WW$ and $\\HH$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $\\WW$ and $\\HH$ within the alternating iterations.

  3. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    Science.gov (United States)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  4. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    Science.gov (United States)

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.

  5. Parallel and series FED microstrip array with high efficiency and low cross polarization

    Science.gov (United States)

    Huang, John (Inventor)

    1995-01-01

    A microstrip array antenna for vertically polarized fan beam (approximately 2 deg x 50 deg) for C-band SAR applications with a physical area of 1.7 m by 0.17 m comprises two rows of patch elements and employs a parallel feed to left- and right-half sections of the rows. Each section is divided into two segments that are fed in parallel with the elements in each segment fed in series through matched transmission lines for high efficiency. The inboard section has half the number of patch elements of the outboard section, and the outboard sections, which have tapered distribution with identical transmission line sections, terminated with half wavelength long open-circuit stubs so that the remaining energy is reflected and radiated in phase. The elements of the two inboard segments of the two left- and right-half sections are provided with tapered transmission lines from element to element for uniform power distribution over the central third of the entire array antenna. The two rows of array elements are excited at opposite patch feed locations with opposite (180 deg difference) phases for reduced cross-polarization.

  6. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  7. High-speed parallel forward error correction for optical transport networks

    DEFF Research Database (Denmark)

    Rasmussen, Anders; Ruepp, Sarah Renée; Berger, Michael Stübert

    2010-01-01

    This paper presents a highly parallelized hardware implementation of the standard OTN Reed-Solomon Forward Error Correction algorithm. The proposed circuit is designed to meet the immense throughput required by OTN4, using commercially available FPGA technology....

  8. Artificial intelligence - applications in high energy and nuclear physics

    Energy Technology Data Exchange (ETDEWEB)

    Mueller, U. E-mail: mueller@whep.uni-wuppertal.de

    2003-04-21

    In the parallel sessions at ACAT2002 different artificial intelligence applications in high energy and nuclear physics were presented. I will briefly summarize these presentations. Further details can be found in the relevant section of these proceedings.

  9. Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives

    International Nuclear Information System (INIS)

    Ullmann, P.

    2007-06-01

    The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission

  10. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    Directory of Open Access Journals (Sweden)

    Hari Radhakrishnan

    2015-01-01

    Full Text Available This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.

  11. Towards a HPC-oriented parallel implementation of a learning algorithm for bioinformatics applications.

    Science.gov (United States)

    D'Angelo, Gianni; Rampone, Salvatore

    2014-01-01

    The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n(3)) and of O(n(5)) order, respectively, and so, the algorithm is unaffordable for huge data sets. We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to reduce the costs of

  12. OS and Runtime Support for Efficiently Managing Cores in Parallel Applications

    OpenAIRE

    Klues, Kevin Alan

    2015-01-01

    Parallel applications can benefit from the ability to explicitly control their thread scheduling policies in user-space. However, modern operating systems lack the interfaces necessary to make this type of “user-level” scheduling efficient. The key component missing is the ability for applications to gain direct access to cores and keep control of those cores even when making I/O operations that traditionally block in the kernel. A number of former systems provided limited support for these c...

  13. Parallel multigrid methods: implementation on message-passing computers and applications to fluid dynamics. A draft

    International Nuclear Information System (INIS)

    Solchenbach, K.; Thole, C.A.; Trottenberg, U.

    1987-01-01

    For a wide class of problems in scientific computing, in particular for partial differential equations, the multigrid principle has proved to yield highly efficient numerical methods. However, the principle has to be applied carefully: if the multigrid components are not chosen adequately with respect to the given problem, the efficiency may be much smaller than possible. This has been demonstrated for many practical problems. Unfortunately, the general theories on multigrid convergence do not give much help in constructing really efficient multigrid algorithms. Although some progress has been made in bridging the gap between theory and practice during the last few years, there are still several theoretical approaches which are misleading rather than helpful with respect to the objective of real efficiency. The research in finding highly efficient algorithms for non-model applications therefore is still a sophisticated mixture of theoretical considerations, a transfer of experiences from model to real life problems and systematical experimental work. The emphasis of the practical research activity today lies - among others - in the following fields: - finding efficient multigrid components for really complex problems, - combining the multigrid approach with advanced discretizative techniques: - constructing highly parallel multigrid algorithms. In this paper, we want to deal mainly with the last topic

  14. Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms

    Energy Technology Data Exchange (ETDEWEB)

    Chang, Christopher H.; Long, Hai; Sides, Scott; Vaidhynathan, Deepthi; Jones, Wesley

    2015-10-15

    Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL" tm s Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.

  15. FISH: A THREE-DIMENSIONAL PARALLEL MAGNETOHYDRODYNAMICS CODE FOR ASTROPHYSICAL APPLICATIONS

    International Nuclear Information System (INIS)

    Kaeppeli, R.; Whitehouse, S. C.; Scheidegger, S.; Liebendoerfer, M.; Pen, U.-L.

    2011-01-01

    FISH is a fast and simple ideal magnetohydrodynamics code that scales to ∼10,000 processes for a Cartesian computational domain of ∼1000 3 cells. The simplicity of FISH has been achieved by the rigorous application of the operator splitting technique, while second-order accuracy is maintained by the symmetric ordering of the operators. Between directional sweeps, the three-dimensional data are rotated in memory so that the sweep is always performed in a cache-efficient way along the direction of contiguous memory. Hence, the code only requires a one-dimensional description of the conservation equations to be solved. This approach also enables an elegant novel parallelization of the code that is based on persistent communications with MPI for cubic domain decomposition on machines with distributed memory. This scheme is then combined with an additional OpenMP parallelization of different sweeps that can take advantage of clusters of shared memory. We document the detailed implementation of a second-order total variation diminishing advection scheme based on flux reconstruction. The magnetic fields are evolved by a constrained transport scheme. We show that the subtraction of a simple estimate of the hydrostatic gradient from the total gradients can significantly reduce the dissipation of the advection scheme in simulations of gravitationally bound hydrostatic objects. Through its simplicity and efficiency, FISH is as well suited for hydrodynamics classes as for large-scale astrophysical simulations on high-performance computer clusters. In preparation for the release of a public version, we demonstrate the performance of FISH in a suite of astrophysically orientated test cases.

  16. Unified Lambert Tool for Massively Parallel Applications in Space Situational Awareness

    Science.gov (United States)

    Woollands, Robyn M.; Read, Julie; Hernandez, Kevin; Probe, Austin; Junkins, John L.

    2018-03-01

    Awareness computer cluster at the LASR Lab, Texas A&M University. We demonstrate the power of our tool by solving a highly parallel example problem, that is the generation of extremal field maps for optimal spacecraft rendezvous (and eventual orbit debris removal). In addition we demonstrate the need for including perturbative effects in simulations for satellite tracking or data association. The unified Lambert tool is ideal for but not limited to space situational awareness applications.

  17. 10th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2017-01-01

    This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as collection of small helper script has now matured to production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.

  18. A wavelet-based regularized reconstruction algorithm for SENSE parallel MRI with applications to neuroimaging

    International Nuclear Information System (INIS)

    Chaari, L.; Pesquet, J.Ch.; Chaari, L.; Ciuciu, Ph.; Benazza-Benyahia, A.

    2011-01-01

    To reduce scanning time and/or improve spatial/temporal resolution in some Magnetic Resonance Imaging (MRI) applications, parallel MRI acquisition techniques with multiple coils acquisition have emerged since the early 1990's as powerful imaging methods that allow a faster acquisition process. In these techniques, the full FOV image has to be reconstructed from the resulting acquired under sampled k-space data. To this end, several reconstruction techniques have been proposed such as the widely-used Sensitivity Encoding (SENSE) method. However, the reconstructed image generally presents artifacts when perturbations occur in both the measured data and the estimated coil sensitivity profiles. In this paper, we aim at achieving accurate image reconstruction under degraded experimental conditions (low magnetic field and high reduction factor), in which neither the SENSE method nor the Tikhonov regularization in the image domain give convincing results. To this end, we present a novel method for SENSE-based reconstruction which proceeds with regularization in the complex wavelet domain by promoting sparsity. The proposed approach relies on a fast algorithm that enables the minimization of regularized non-differentiable criteria including more general penalties than a classical l 1 term. To further enhance the reconstructed image quality, local convex constraints are added to the regularization process. In vivo human brain experiments carried out on Gradient-Echo (GRE) anatomical and Echo Planar Imaging (EPI) functional MRI data at 1.5 T indicate that our algorithm provides reconstructed images with reduced artifacts for high reduction factors. (authors)

  19. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    Full Text Available After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  20. Optimization and application of parallel solid-phase extraction coupled with ultra-high performance liquid chromatography-tandem mass spectrometry for the determination of 11 aminoglycoside residues in honey and royal jelly.

    Science.gov (United States)

    Wang, Xinran; Yang, Shupeng; Li, Yi; Zhang, Jinzhen; Jin, Yue; Zhao, Wen; Zhang, Yongxin; Huang, Jingping; Wang, Peng; Wu, Cuiling; Zhou, Jinhui

    2018-03-23

    A robust and sensitive method of solid-phase extraction followed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) was established and performed for the simultaneous determination of eleven aminoglycosides (AGs) in royal jelly and honey. After sample extraction by a phosphate buffer containing trichloroacetic acid (TCA) and ethylenediaminetetracetic acid disodium salt (Na 2 EDTA), the extraction solution was subjected to a parallel solid-phase extraction for clean-up prior to the LC-MS/MS analysis. The same method was applied to analyze two completely different matrices, honey and royal jelly. Good sensitivity, repeatability, and recovery were obtained by using the mobile phase without an ion-pairing reagent such as heptafluorobutyric acid (HFBA) or sodium heptanesulfonate. The calibration curves of the honey and royal jelly samples exhibited a good linear response (R 2  > 0.99) at six concentrations in the range of 10-1000 μg/mL. The limit of quantification (LOQ) of the AGs ranged from 10 to 25 μg/kg in the honey and from 12.5 to 25 μg/kg in the royal jelly. The recoveries of the AGs for the honey and royal jelly samples were in the range of 79.48% to 108.95% and 74.61% to 113.70% respectively and the relative standard deviations (RSDs) were between 1.23% and 9.59%, and between 1.51% and 9.98%, respectively. The proposed approach has been allowed in China as a reference method for the simultaneous determination of eleven AGs in honey and royal jelly. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. High-Bandwidth, High-Efficiency Envelope Tracking Power Supply for 40W RF Power Amplifier Using Paralleled Bandpass Current Sources

    DEFF Research Database (Denmark)

    Høyerby, Mikkel Christian Wendelboe; Andersen, Michael Andreas E.

    2005-01-01

    This paper presents a high-performance power conversion scheme for power supply applications that require very high output voltage slew rates (dV/dt). The concept is to parallel 2 switching bandpass current sources, each optimized for its passband frequency space and the expected load current....... The principle is demonstrated with a power supply, designed for supplying a 40 W linear RF power amplifier for efficient amplification of a 16-QAM modulated data stream...

  2. High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    Directory of Open Access Journals (Sweden)

    Dieter Hendricks

    2016-02-01

    Full Text Available We implement a master-slave parallel genetic algorithm with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs to implement a parallel genetic algorithm and visualise the results using disjoint minimal spanning trees. We demonstrate that our GPU parallel genetic algorithm, implemented on a commercially available general purpose GPU, is able to recover stock clusters in sub-second speed, based on a subset of stocks in the South African market. This approach represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable because of compiler differences. Combined with fast online intraday correlation matrix estimation from high frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.

  3. Modular high-temperature gas-cooled reactor simulation using parallel processors

    International Nuclear Information System (INIS)

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulations routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (operator) interaction. In this paper the benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses

  4. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-04-17

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  5. Reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application

    Science.gov (United States)

    Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN

    2012-01-10

    Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.

  6. Development of a parallel zoomed EVI sequence for high temporal resolution analysis of the BOLD response

    International Nuclear Information System (INIS)

    Rabrait, C.

    2006-01-01

    acquisition of one line out of two and one plane out new acquisition method was experimented with success both with block-designed and event related visual paradigms. An 80 x 80 x 120 mm 3 volume, containing the visual cortex, was acquired with a repetition time of 200 ms. In both cases, a robust activation was detected in the primary visual cortex, with a high statistical significance threshold (7). Actually, high scanning rates mostly benefit for hemodynamic response estimation. To estimate hemodynamic response functions from our event-related data, a non parametric unsupervised method (8) is applied to the activated clusters selected using the SPM2 software. In every tested clusters, the estimated response functions are close to the usually assumed shape, although no prior assumption about the shape of the response is made. Furthermore, a significant early negative response is sometimes seen, which i s generally difficult to observe at 1.5 T. Current studies now aim at obtaining good quality single-voxel hemodynamic response functions, in order to map the spatio-temporal features of the BOLD response. Activation maps based on the early negative response could also be a valuable application of our method, since this response is thought to be more co-localized with the true activated areas than the delayed positive response (9). Another goal of our future work is the optimization of new paradigms to better exploit zoomed parallel Echo Volumar Imaging properties. (author)

  7. MINUIT package parallelization and applications using the RooFit package

    International Nuclear Information System (INIS)

    Lazzaro, Alfio; Moneta, Lorenzo

    2010-01-01

    The fitting procedures are based on numerical minimization of functions. The MINUIT package is the most common package used for such procedures in High Energy Physics community. The main algorithm in this package, MIGRAD, searches the minimum of a function using the gradient information. For each minimization iteration, MIGRAD requires the calculation of the derivative for each free parameter of the function to be minimized. Minimization is required for data analysis problems based on the maximum likelihood technique. The calculation of complex likelihood functions, with several free parameters, many independent variables and large data samples, can be very CPU-time consuming. Then, the minimization process requires the calculation of the likelihood functions several times for each minimization iteration. In this paper we will show how MIGRAD algorithm and the likelihood function calculation can be easily parallelized using Message Passing Interface techniques. We will present the speed-up improvements obtained in typical physics applications such as complex maximum likelihood fits using the RooFit package.

  8. Parallel transmission techniques in magnetic resonance imaging: experimental realization, applications and perspectives; Parallele Sendetechniken in der Magnetresonanztomographie: experimentelle Realisierung, Anwendungen und Perspektiven

    Energy Technology Data Exchange (ETDEWEB)

    Ullmann, P.

    2007-06-15

    The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission

  9. DVS-SOFTWARE: An Effective Tool for Applying Highly Parallelized Hardware To Computational Geophysics

    Science.gov (United States)

    Herrera, I.; Herrera, G. S.

    2015-12-01

    Most geophysical systems are macroscopic physical systems. The behavior prediction of such systems is carried out by means of computational models whose basic models are partial differential equations (PDEs) [1]. Due to the enormous size of the discretized version of such PDEs it is necessary to apply highly parallelized super-computers. For them, at present, the most efficient software is based on non-overlapping domain decomposition methods (DDM). However, a limiting feature of the present state-of-the-art techniques is due to the kind of discretizations used in them. Recently, I. Herrera and co-workers using 'non-overlapping discretizations' have produced the DVS-Software which overcomes this limitation [2]. The DVS-software can be applied to a great variety of geophysical problems and achieves very high parallel efficiencies (90%, or so [3]). It is therefore very suitable for effectively applying the most advanced parallel supercomputers available at present. In a parallel talk, in this AGU Fall Meeting, Graciela Herrera Z. will present how this software is being applied to advance MOD-FLOW. Key Words: Parallel Software for Geophysics, High Performance Computing, HPC, Parallel Computing, Domain Decomposition Methods (DDM)REFERENCES [1]. Herrera Ismael and George F. Pinder, Mathematical Modelling in Science and Engineering: An axiomatic approach", John Wiley, 243p., 2012. [2]. Herrera, I., de la Cruz L.M. and Rosas-Medina A. "Non Overlapping Discretization Methods for Partial, Differential Equations". NUMER METH PART D E, 30: 1427-1454, 2014, DOI 10.1002/num 21852. (Open source) [3]. Herrera, I., & Contreras Iván "An Innovative Tool for Effectively Applying Highly Parallelized Software To Problems of Elasticity". Geofísica Internacional, 2015 (In press)

  10. Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

    Science.gov (United States)

    West, Jeff; Yang, H. Q.

    2014-01-01

    There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: Propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problems sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that vehicle determinantal effects are not realized.

  11. Applications of parallel computer architectures to the real-time simulation of nuclear power systems

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1988-01-01

    In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and current research efforts toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers improved control and protection capabilities. Current research efforts are currently under way in this area

  12. Design of the Trap Filter for the High Power Converters with Parallel Interleaved VSCs

    DEFF Research Database (Denmark)

    Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand; Teodorescu, Remus

    2014-01-01

    The power handling capability of the state-of-the-art semiconductor devices is limited. Therefore, the Voltage Source Converters (VSCs) are often connected in parallel to realize high power converter. The switching frequency semiconductor devices, used in the high power VSCs, is also limited...

  13. How to use MPI communication in highly parallel climate simulations more easily and more efficiently.

    Science.gov (United States)

    Behrens, Jörg; Hanke, Moritz; Jahns, Thomas

    2014-05-01

    In this talk we present a way to facilitate efficient use of MPI communication for developers of climate models. Exploitation of the performance potential of today's highly parallel supercomputers with real world simulations is a complex task. This is partly caused by the low level nature of the MPI communication library which is the dominant communication tool at least for inter-node communication. In order to manage the complexity of the task, climate simulations with non-trivial communication patterns often use an internal abstraction layer above MPI without exploiting the benefits of communication aggregation or MPI-datatypes. The solution for the complexity and performance problem we propose is the communication library YAXT. This library is built on top of MPI and takes high level descriptions of arbitrary domain decompositions and automatically derives an efficient collective data exchange. Several exchanges can be aggregated in order to reduce latency costs. Examples are given which demonstrate the simplicity and the performance gains for selected climate applications.

  14. Discussion paper for a highly parallel array processor-based machine

    International Nuclear Information System (INIS)

    Hagstrom, R.; Bolotin, G.; Dawson, J.

    1984-01-01

    The architectural plant for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable to Lattice Gauge theoretical computations of fundamental physics interest and may be applicable to a range of other problems which deal with numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating-systems and support software that is suitable to run on the commercially-produced processors. A tiny prototype instrument, designed along this architecture has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors would propose to build would be based upon a highly competent array-processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors

  15. Design Issues and Application of Cable-Based Parallel Manipulators for Rehabilitation Therapy

    Directory of Open Access Journals (Sweden)

    E. Ottaviano

    2008-01-01

    Full Text Available In this study, cable-based manipulators are proposed for application in rehabilitation therapies. Cable-based manipulators show good features that are very useful when the system has to interact with humans. In particular, they can be used to aid motion or as monitoring/training systems in rehabilitation therapies. Modelling and simulation of both active and passive cable-based parallel manipulators are presented for an application to help older people, patients or disabled people in the sit-to-stand transfer and as a monitoring/training system. Experimental results are presented by using built prototypes.

  16. The application of image processing in the measurement for three-light-axis parallelity of laser ranger

    Science.gov (United States)

    Wang, Yang; Wang, Qianqian

    2008-12-01

    When laser ranger is transported or used in field operations, the transmitting axis, receiving axis and aiming axis may be not parallel. The nonparallelism of the three-light-axis will affect the range-measuring ability or make laser ranger not be operated exactly. So testing and adjusting the three-light-axis parallelity in the production and maintenance of laser ranger is important to ensure using laser ranger reliably. The paper proposes a new measurement method using digital image processing based on the comparison of some common measurement methods for the three-light-axis parallelity. It uses large aperture off-axis paraboloid reflector to get the images of laser spot and white light cross line, and then process the images on LabVIEW platform. The center of white light cross line can be achieved by the matching arithmetic in LABVIEW DLL. And the center of laser spot can be achieved by gradation transformation, binarization and area filter in turn. The software system can set CCD, detect the off-axis paraboloid reflector, measure the parallelity of transmitting axis and aiming axis and control the attenuation device. The hardware system selects SAA7111A, a programmable vedio decoding chip, to perform A/D conversion. FIFO (first-in first-out) is selected as buffer.USB bus is used to transmit data to PC. The three-light-axis parallelity can be achieved according to the position bias between them. The device based on this method has been already used. The application proves this method has high precision, speediness and automatization.

  17. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations

  18. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  19. Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

    International Nuclear Information System (INIS)

    Longoni, G.

    2010-01-01

    The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve efficiently large 3-D radiation transport problems. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is in the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand R interconnects capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)

  20. Dynamic Analysis and Vibration Attenuation of Cable-Driven Parallel Manipulators for Large Workspace Applications

    Directory of Open Access Journals (Sweden)

    Jingli Du

    2013-01-01

    Full Text Available Cable-driven parallel manipulators are one of the best solutions to achieving large workspace since flexible cables can be easily stored on reels. However, due to the negligible flexural stiffness of cables, long cables will unavoidably vibrate during operation for large workspace applications. In this paper a finite element model for cable-driven parallel manipulators is proposed to mimic small amplitude vibration of cables around their desired position. Output feedback of the cable tension variation at the end of the end-effector is utilized to design the vibration attenuation controller which aims at attenuating the vibration of cables by slightly varying the cable length, thus decreasing its effect on the end-effector. When cable vibration is attenuated, motion controller could be designed for implementing precise large motion to track given trajectories. A numerical example is presented to demonstrate the dynamic model and the control algorithm.

  1. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  2. Parallel Adaptive Mesh Refinement for High-Order Finite-Volume Schemes in Computational Fluid Dynamics

    Science.gov (United States)

    Schwing, Alan Michael

    For computational fluid dynamics, the governing equations are solved on a discretized domain of nodes, faces, and cells. The quality of the grid or mesh can be a driving source for error in the results. While refinement studies can help guide the creation of a mesh, grid quality is largely determined by user expertise and understanding of the flow physics. Adaptive mesh refinement is a technique for enriching the mesh during a simulation based on metrics for error, impact on important parameters, or location of important flow features. This can offload from the user some of the difficult and ambiguous decisions necessary when discretizing the domain. This work explores the implementation of adaptive mesh refinement in an implicit, unstructured, finite-volume solver. Consideration is made for applying modern computational techniques in the presence of hanging nodes and refined cells. The approach is developed to be independent of the flow solver in order to provide a path for augmenting existing codes. It is designed to be applicable for unsteady simulations and refinement and coarsening of the grid does not impact the conservatism of the underlying numerics. The effect on high-order numerical fluxes of fourth- and sixth-order are explored. Provided the criteria for refinement is appropriately selected, solutions obtained using adapted meshes have no additional error when compared to results obtained on traditional, unadapted meshes. In order to leverage large-scale computational resources common today, the methods are parallelized using MPI. Parallel performance is considered for several test problems in order to assess scalability of both adapted and unadapted grids. Dynamic repartitioning of the mesh during refinement is crucial for load balancing an evolving grid. Development of the methods outlined here depend on a dual-memory approach that is described in detail. Validation of the solver developed here against a number of motivating problems shows favorable

  3. Parallel Reservoir Simulations with Sparse Grid Techniques and Applications to Wormhole Propagation

    KAUST Repository

    Wu, Yuanqing

    2015-09-08

    In this work, two topics of reservoir simulations are discussed. The first topic is the two-phase compositional flow simulation in hydrocarbon reservoir. The major obstacle that impedes the applicability of the simulation code is the long run time of the simulation procedure, and thus speeding up the simulation code is necessary. Two means are demonstrated to address the problem: parallelism in physical space and the application of sparse grids in parameter space. The parallel code can gain satisfactory scalability, and the sparse grids can remove the bottleneck of flash calculations. Instead of carrying out the flash calculation in each time step of the simulation, a sparse grid approximation of all possible results of the flash calculation is generated before the simulation. Then the constructed surrogate model is evaluated to approximate the flash calculation results during the simulation. The second topic is the wormhole propagation simulation in carbonate reservoir. In this work, different from the traditional simulation technique relying on the Darcy framework, we propose a new framework called Darcy-Brinkman-Forchheimer framework to simulate wormhole propagation. Furthermore, to process the large quantity of cells in the simulation grid and shorten the long simulation time of the traditional serial code, standard domain-based parallelism is employed, using the Hypre multigrid library. In addition to that, a new technique called “experimenting field approach” to set coefficients in the model equations is introduced. In the 2D dissolution experiments, different configurations of wormholes and a series of properties simulated by both frameworks are compared. We conclude that the numerical results of the DBF framework are more like wormholes and more stable than the Darcy framework, which is a demonstration of the advantages of the DBF framework. The scalability of the parallel code is also evaluated, and good scalability can be achieved. Finally, a mixed

  4. libstable: Fast, Parallel, and High-Precision Computation of α-Stable Distributions in R, C/C++, and MATLAB

    Directory of Open Access Journals (Sweden)

    Javier Royuela-del-Val

    2017-06-01

    Full Text Available α-stable distributions are a family of well-known probability distributions. However, the lack of closed analytical expressions hinders their application. Currently, several tools have been developed to numerically evaluate their density and distribution functions or to estimate their parameters, but available solutions either do not reach sufficient precision on their evaluations or are excessively slow for practical purposes. Moreover, they do not take full advantage of the parallel processing capabilities of current multi-core machines. Other solutions work only on a subset of the α-stable parameter space. In this paper we present an R package and a C/C++ library with a MATLAB front-end that permit parallelized, fast and high precision evaluation of density, distribution and quantile functions, as well as random variable generation and parameter estimation of α-stable distributions in their whole parameter space. The described library can be easily integrated into third party developments.

  5. Line filter design of parallel interleaved VSCs for high power wind energy conversion systems

    DEFF Research Database (Denmark)

    Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand; Teodorescu, Remus

    2015-01-01

    The Voltage Source Converters (VSCs) are often connected in parallel in a Wind Energy Conversion System (WECS) to match the high power rating of the modern wind turbines. The effect of the interleaved carriers on the harmonic performance of the parallel connected VSCs is analyzed in this paper...... limit. In order to achieve the desired filter performance with optimal values of the filter parameters, the use of a LC trap branch with the conventional LCL filter is proposed. The expressions for the resonant frequencies of the proposed line filter are derived and used in the design to selectively...

  6. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t

  7. Load balancing in highly parallel processing of Monte Carlo code for particle transport

    International Nuclear Information System (INIS)

    Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

    1998-01-01

    In parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors making use of independency of the calculation for each particle. Although we can easily parallelize main part of a MC code by this method, it is necessary and practically difficult to optimize the code concerning load balancing in order to attain high speedup ratio in highly parallel processing. In fact, the speedup ratio in the case of 128 processors remains in nearly one hundred times when using the test bed for the performance evaluation. Through the parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance by static load balancing in especially neutron transport problems, and a load balancing method, which dynamically changes the number of assigned particles minimizing the sum of the computational and communication costs, overcomes the difficulty, resulting in nearly fifteen percentage of reduction for execution time. (author)

  8. Tile-based parallel coordinates and its application in financial visualization

    Science.gov (United States)

    Alsakran, Jamal; Zhao, Ye; Zhao, Xinlei

    2010-01-01

    Parallel coordinates technique has been widely used in information visualization applications and it has achieved great success in visualizing multivariate data and perceiving their trends. Nevertheless, visual clutter usually weakens or even diminishes its ability when the data size increases. In this paper, we first propose a tile-based parallel coordinates, where the plotting area is divided into rectangular tiles. Each tile stores an intersection density that counts the total number of polylines intersecting with that tile. Consequently, the intersection density is mapped to optical attributes, such as color and opacity, by interactive transfer functions. The method visualizes the polylines efficiently and informatively in accordance with the density distribution, and thus, reduces visual cluttering and promotes knowledge discovery. The interactivity of our method allows the user to instantaneously manipulate the tiles distribution and the transfer functions. Specifically, the classic parallel coordinates rendering is a special case of our method when each tile represents only one pixel. A case study on a real world data set, U.S. stock mutual fund data of year 2006, is presented to show the capability of our method in visually analyzing financial data. The presented visual analysis is conducted by an expert in the domain of finance. Our method gains the support from professionals in the finance field, they embrace it as a potential investment analysis tool for mutual fund managers, financial planners, and investors.

  9. Parallel Resolved Open Source CFD-DEM: Method, Validation and Application

    Directory of Open Access Journals (Sweden)

    A. Hager

    2014-03-01

    Full Text Available In the following paper the authors present a fully parallelized Open Source method for calculating the interaction of immersed bodies and surrounding fluid. A combination of computational fluid dynamics (CFD and a discrete element method (DEM accounts for the physics of both the fluid and the particles. The objects considered are relatively big compared to the cells of the fluid mesh, i.e. they cover several cells each. Thus this fictitious domain method (FDM is called resolved. The implementation is realized within the Open Source framework CFDEMcOupling (www.cfdem.com, which provides an interface between OpenFOAM® based CFD-solvers and the DEM software LIGGGHTS (www.liggghts.com. While both LIGGGHTS and OpenFOAM® were already parallelized, only a recent improvement of the algorithm permits the fully parallel computation of resolved problems. Alongside with a detailed description of the method, its implementation and recent improvements, a number of application and validation examples is presented in the scope of this paper.

  10. A Fast parallel tridiagonal algorithm for a class of CFD applications

    Science.gov (United States)

    Moitra, Stuti; Sun, Xian-He

    1996-01-01

    The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.

  11. Highly accelerated cardiac cine parallel MRI using low-rank matrix completion and partial separability model

    Science.gov (United States)

    Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie

    2016-05-01

    This paper presents a new approach to highly accelerated dynamic parallel MRI using low rank matrix completion, partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center kspace navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29ms, when the conventional PS method fails.

  12. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  13. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    [ABSTRACT]Graphics processing units (GP Us) originally designed for computer video cards have emerged as the most powerful chip in a high-performance workstation. In the high performance computation capabilities, graphic processing units (GPU) lead to much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  14. Accelerated Electron-Beam Formation with a High Capture Coefficient in a Parallel Coupled Accelerating Structure

    Science.gov (United States)

    Chernousov, Yu. D.; Shebolaev, I. V.; Ikryanov, I. M.

    2018-01-01

    An electron beam with a high (close to 100%) coefficient of electron capture into the regime of acceleration has been obtained in a linear electron accelerator based on a parallel coupled slow-wave structure, electron gun with microwave-controlled injection current, and permanent-magnet beam-focusing system. The high capture coefficient was due to the properties of the accelerating structure, beam-focusing system, and electron-injection system. Main characteristics of the proposed systems are presented.

  15. High temporal resolution magnetic resonance imaging: development of a parallel three dimensional acquisition method for functional neuroimaging

    International Nuclear Information System (INIS)

    Rabrait, C.

    2007-11-01

    Echo Planar Imaging is widely used to perform data acquisition in functional neuroimaging. This sequence allows the acquisition of a set of about 30 slices, covering the whole brain, at a spatial resolution ranging from 2 to 4 mm, and a temporal resolution ranging from 1 to 2 s. It is thus well adapted to the mapping of activated brain areas but does not allow precise study of the brain dynamics. Moreover, temporal interpolation is needed in order to correct for inter-slices delays and 2-dimensional acquisition is subject to vascular in flow artifacts. To improve the estimation of the hemodynamic response functions associated with activation, this thesis aimed at developing a 3-dimensional high temporal resolution acquisition method. To do so, Echo Volume Imaging was combined with reduced field-of-view acquisition and parallel imaging. Indeed, E.V.I. allows the acquisition of a whole volume in Fourier space following a single excitation, but it requires very long echo trains. Parallel imaging and field-of-view reduction are used to reduce the echo train durations by a factor of 4, which allows the acquisition of a 3-dimensional brain volume with limited susceptibility-induced distortions and signal losses, in 200 ms. All imaging parameters have been optimized in order to reduce echo train durations and to maximize S.N.R., so that cerebral activation can be detected with a high level of confidence. Robust detection of brain activation was demonstrated with both visual and auditory paradigms. High temporal resolution hemodynamic response functions could be estimated through selective averaging of the response to the different trials of the stimulation. To further improve S.N.R., the matrix inversions required in parallel reconstruction were regularized, and the impact of the level of regularization on activation detection was investigated. Eventually, potential applications of parallel E.V.I. such as the study of non-stationary effects in the B.O.L.D. response

  16. Development and application of a massively parallel KKR Green function method for large scale systems

    Energy Technology Data Exchange (ETDEWEB)

    Thiess, Alexander R.

    2011-12-19

    In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium

  17. Development and application of a massively parallel KKR Green function method for large scale systems

    International Nuclear Information System (INIS)

    Thiess, Alexander R.

    2011-01-01

    In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium

  18. Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

    KAUST Repository

    Mudigere, Dheevatsa

    2015-05-01

    In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank sub domain, and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® XeonTM1 E5 2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over baseline implementation as we scale up to 256 nodes.

  19. Application of the DMRG in two dimensions: a parallel tempering algorithm

    Science.gov (United States)

    Hu, Shijie; Zhao, Jize; Zhang, Xuefeng; Eggert, Sebastian

    The Density Matrix Renormalization Group (DMRG) is known to be a powerful algorithm for treating one-dimensional systems. When the DMRG is applied in two dimensions, however, the convergence becomes much less reliable and typically ''metastable states'' may appear, which are unfortunately quite robust even when keeping a very high number of DMRG states. To overcome this problem we have now successfully developed a parallel tempering DMRG algorithm. Similar to parallel tempering in quantum Monte Carlo, this algorithm allows the systematic switching of DMRG states between different model parameters, which is very efficient for solving convergence problems. Using this method we have figured out the phase diagram of the xxz model on the anisotropic triangular lattice which can be realized by hardcore bosons in optical lattices. SFB Transregio 49 of the Deutsche Forschungsgemeinschaft (DFG) and the Allianz fur Hochleistungsrechnen Rheinland-Pfalz (AHRP).

  20. Screening of a Brassica napus bacterial artificial chromosome library using highly parallel single nucleotide polymorphism assays

    Science.gov (United States)

    2013-01-01

    Background Efficient screening of bacterial artificial chromosome (BAC) libraries with polymerase chain reaction (PCR)-based markers is feasible provided that a multidimensional pooling strategy is implemented. Single nucleotide polymorphisms (SNPs) can be screened in multiplexed format, therefore this marker type lends itself particularly well for medium- to high-throughput applications. Combining the power of multiplex-PCR assays with a multidimensional pooling system may prove to be especially challenging in a polyploid genome. In polyploid genomes two classes of SNPs need to be distinguished, polymorphisms between accessions (intragenomic SNPs) and those differentiating between homoeologous genomes (intergenomic SNPs). We have assessed whether the highly parallel Illumina GoldenGate® Genotyping Assay is suitable for the screening of a BAC library of the polyploid Brassica napus genome. Results A multidimensional screening platform was developed for a Brassica napus BAC library which is composed of almost 83,000 clones. Intragenomic and intergenomic SNPs were included in Illumina’s GoldenGate® Genotyping Assay and both SNP classes were used successfully for screening of the multidimensional BAC pools of the Brassica napus library. An optimized scoring method is proposed which is especially valuable for SNP calling of intergenomic SNPs. Validation of the genotyping results by independent methods revealed a success of approximately 80% for the multiplex PCR-based screening regardless of whether intra- or intergenomic SNPs were evaluated. Conclusions Illumina’s GoldenGate® Genotyping Assay can be efficiently used for screening of multidimensional Brassica napus BAC pools. SNP calling was specifically tailored for the evaluation of BAC pool screening data. The developed scoring method can be implemented independently of plant reference samples. It is demonstrated that intergenomic SNPs represent a powerful tool for BAC library screening of a polyploid genome

  1. Building Input Adaptive Parallel Applications: A Case Study of Sparse Grid Interpolation

    KAUST Repository

    Murarasu, Alin

    2012-12-01

    The well-known power wall resulting in multi-cores requires special techniques for speeding up applications. In this sense, parallelization plays a crucial role. Besides standard serial optimizations, techniques such as input specialization can also bring a substantial contribution to the speedup. By identifying common patterns in the input data, we propose new algorithms for sparse grid interpolation that accelerate the state-of-the-art non-specialized version. Sparse grid interpolation is an inherently hierarchical method of interpolation employed for example in computational steering applications for decompressing highdimensional simulation data. In this context, improving the speedup is essential for real-time visualization. Using input specialization, we report a speedup of up to 9x over the nonspecialized version. The paper covers the steps we took to reach this speedup by means of input adaptivity. Our algorithms will be integrated in fastsg, a library for fast sparse grid interpolation. © 2012 IEEE.

  2. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    Science.gov (United States)

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.

  3. Full-field parallel interferometry coherence probe microscope for high-speed optical metrology.

    Science.gov (United States)

    Safrani, A; Abdulhalim, I

    2015-06-01

    Parallel detection of several achromatic phase-shifted images is used to obtain a high-speed, high-resolution, full-field, optical coherence probe tomography system based on polarization interferometry. The high enface imaging speed, short coherence gate, and high lateral resolution provided by the system are exploited to determine microbump height uniformity in an integrated semiconductor chip at 50 frames per second. The technique is demonstrated using the Linnik microscope, although it can be implemented on any polarization-based interference microscopy system.

  4. Application of a parallel 3-dimensional hydrogeochemistry HPF code to a proposed waste disposal site at the Oak Ridge National Laboratory

    International Nuclear Information System (INIS)

    Gwo, Jin-Ping; Yeh, Gour-Tsyh

    1997-01-01

    The objectives of this study are (1) to parallelize a 3-dimensional hydrogeochemistry code and (2) to apply the parallel code to a proposed waste disposal site at the Oak Ridge National Laboratory (ORNL). The 2-dimensional hydrogeochemistry code HYDROGEOCHEM, developed at the Pennsylvania State University for coupled subsurface solute transport and chemical equilibrium processes, was first modified to accommodate 3-dimensional problem domains. A bi-conjugate gradient stabilized linear matrix solver was then incorporated to solve the matrix equation. We chose to parallelize the 3-dimensional code on the Intel Paragons at ORNL by using an HPF (high performance FORTRAN) compiler developed at PGI. The data- and task-parallel algorithms available in the HPF compiler proved to be highly efficient for the geochemistry calculation. This calculation can be easily implemented in HPF formats and is perfectly parallel because the chemical speciation on one finite-element node is virtually independent of those on the others. The parallel code was applied to a subwatershed of the Melton Branch at ORNL. Chemical heterogeneity, in addition to physical heterogeneities of the geological formations, has been identified as one of the major factors that affect the fate and transport of contaminants at ORNL. This study demonstrated an application of the 3-dimensional hydrogeochemistry code on the Melton Branch site. A uranium tailing problem that involved in aqueous complexation and precipitation-dissolution was tested. Performance statistics was collected on the Intel Paragons at ORNL. Implications of these results on the further optimization of the code were discussed

  5. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    Energy Technology Data Exchange (ETDEWEB)

    Zang, Tianwu; Yu, Linglin; Zhang, Chong [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Ma, Jianpeng, E-mail: jpma@bcm.tmc.edu [Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005 (United States); Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030 (United States)

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method in our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopts the spirit of parallel tempering (PT), or replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in PCST method, the size of the system does not dramatically affect the number of copy needed because the exchange rate is independent of total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested in two-dimensional Ising model, Lennard-Jones liquid and all-atom folding simulation of a small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and it is particularly effective in simulating systems with long relaxation time or correlation time. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transition and dynamics of macromolecules in explicit solvent.

  6. A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

    Science.gov (United States)

    Povitsky, Alex; Morris, Philip J.

    1999-01-01

    In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.

  7. Ab initio quantum chemistry in parallel-portable tools and applications

    International Nuclear Information System (INIS)

    Harrison, R.J.; Shepard, R.; Kendall, R.A.

    1991-01-01

    In common with many of the computational sciences, ab initio chemistry faces computational constraints to which a partial solution is offered by the prospect of highly parallel computers. Ab initio codes are large and complex (O(10 5 ) lines of FORTRAN), representing a significant investment of communal effort. The often conflicting requirements of portability and efficiency have been successfully resolved on vector computers by reliance on matrix oriented kernels. This proves inadequate even upon closely-coupled shared-memory parallel machines. We examine the algorithms employed during a typical sequence of calculations. Then we investigate how efficient portable parallel implementations may be derived, including the complex multi-reference singles and doubles configuration interaction algorithm. A portable toolkit, modeled after the Intel iPSC and the ANL-ACRF PARMACS, is developed, using shared memory and TCP/IP sockets. The toolkit is used as an initial platform for programs portable between LANS, Crays and true distributed-memory MIMD machines. Timings are presented. 53 refs., 4 tabs

  8. Parallel Gene Expression Differences between Low and High Latitude Populations of Drosophila melanogaster and D. simulans.

    Science.gov (United States)

    Zhao, Li; Wit, Janneke; Svetec, Nicolas; Begun, David J

    2015-05-01

    Gene expression variation within species is relatively common, however, the role of natural selection in the maintenance of this variation is poorly understood. Here we investigate low and high latitude populations of Drosophila melanogaster and its sister species, D. simulans, to determine whether the two species show similar patterns of population differentiation, consistent with a role for spatially varying selection in maintaining gene expression variation. We compared at two temperatures the whole male transcriptome of D. melanogaster and D. simulans sampled from Panama City (Panama) and Maine (USA). We observed a significant excess of genes exhibiting differential expression in both species, consistent with parallel adaptation to heterogeneous environments. Moreover, the majority of genes showing parallel expression differentiation showed the same direction of differential expression in the two species and the magnitudes of expression differences between high and low latitude populations were correlated across species, further bolstering the conclusion that parallelism for expression phenotypes results from spatially varying selection. However, the species also exhibited important differences in expression phenotypes. For example, the genomic extent of genotype × environment interaction was much more common in D. melanogaster. Highly differentiated SNPs between low and high latitudes were enriched in the 3' UTRs and CDS of the geographically differently expressed genes in both species, consistent with an important role for cis-acting variants in driving local adaptation for expression-related phenotypes.

  9. Evaluation of the power consumption of a high-speed parallel robot

    Science.gov (United States)

    Han, Gang; Xie, Fugui; Liu, Xin-Jun

    2018-06-01

    An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogenous and possess obvious physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Furtherly, a low-power-consumption workspace is selected for the robot.

  10. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  11. Decision Optimization for Power Grid Operating Conditions with High- and Low-Voltage Parallel Loops

    Directory of Open Access Journals (Sweden)

    Dong Yang

    2017-05-01

    Full Text Available With the development of higher-voltage power grids, the high- and low-voltage parallel loops are emerging, which lead to energy losses and even threaten the security and stability of power systems. The multi-infeed high-voltage direct current (HVDC configurations widely appearing in AC/DC interconnected power systems make this situation even worse. Aimed at energy saving and system security, a decision optimization method for power grid operating conditions with high- and low-voltage parallel loops is proposed in this paper. Firstly, considering hub substation distribution and power grid structure, parallel loop opening schemes are generated with GN (Girvan-Newman algorithms. Then, candidate opening schemes are preliminarily selected from all these generated schemes based on a filtering index. Finally, with the influence on power system security, stability and operation economy in consideration, an evaluation model for candidate opening schemes is founded based on analytic hierarchy process (AHP. And a fuzzy evaluation algorithm is used to find the optimal scheme. Simulation results of a New England 39-bus system and an actual power system validate the effectiveness and superiority of this proposed method.

  12. Implementation of highly parallel and large scale GW calculations within the OpenAtom software

    Science.gov (United States)

    Ismail-Beigi, Sohrab

    The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initiodensity functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm + + parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will be also be pointed out.

  13. Development of a parallel genetic algorithm using MPI and its application in a nuclear reactor core. Design optimization

    International Nuclear Information System (INIS)

    Waintraub, Marcel; Pereira, Claudio M.N.A.; Baptista, Rafael P.

    2005-01-01

    This work presents the development of a distributed parallel genetic algorithm applied to a nuclear reactor core design optimization. In the implementation of the parallelism, a 'Message Passing Interface' (MPI) library, standard for parallel computation in distributed memory platforms, has been used. Another important characteristic of MPI is its portability for various architectures. The main objectives of this paper are: validation of the results obtained by the application of this algorithm in a nuclear reactor core optimization problem, through comparisons with previous results presented by Pereira et al.; and performance test of the Brazilian Nuclear Engineering Institute (IEN) cluster in reactors physics optimization problems. The experiments demonstrated that the developed parallel genetic algorithm using the MPI library presented significant gains in the obtained results and an accentuated reduction of the processing time. Such results ratify the use of the parallel genetic algorithms for the solution of nuclear reactor core optimization problems. (author)

  14. Acceleration of cardiovascular MRI using parallel imaging: basic principles, practical considerations, clinical applications and future directions

    International Nuclear Information System (INIS)

    Niendorf, T.; Sodickson, D.

    2006-01-01

    Cardiovascular Magnetic Resonance (CVMR) imaging has proven to be of clinical value for non-invasive diagnostic imaging of cardiovascular diseases. CVMR requires rapid imaging; however, the speed of conventional MRI is fundamentally limited due to its sequential approach to image acquisition, in which data points are collected one after the other in the presence of sequentially-applied magnetic field gradients and radiofrequency coils to acquire multiple data points simultaneously, and thereby to increase imaging speed and efficiency beyond the limits of purely gradient-based approaches. The resulting improvements in imaging speed can be used in various ways, including shortening long examinations, improving spatial resolution and anatomic coverage, improving temporal resolution, enhancing image quality, overcoming physiological constraints, detecting and correcting for physiologic motion, and streamlining work flow. Examples of these strategies will be provided in this review, after some of the fundamentals of parallel imaging methods now in use for cardiovascular MRI are outlined. The emphasis will rest upon basic principles and clinical state-of-the art cardiovascular MRI applications. In addition, practical aspects such as signal-to-noise ratio considerations, tailored parallel imaging protocols and potential artifacts will be discussed, and current trends and future directions will be explored. (orig.)

  15. Development of parallel algorithms for electrical power management in space applications

    Science.gov (United States)

    Berry, Frederick C.

    1989-01-01

    The application of parallel techniques for electrical power system analysis is discussed. The Newton-Raphson method of load flow analysis was used along with the decomposition-coordination technique to perform load flow analysis. The decomposition-coordination technique enables tasks to be performed in parallel by partitioning the electrical power system into independent local problems. Each independent local problem represents a portion of the total electrical power system on which a loan flow analysis can be performed. The load flow analysis is performed on these partitioned elements by using the Newton-Raphson load flow method. These independent local problems will produce results for voltage and power which can then be passed to the coordinator portion of the solution procedure. The coordinator problem uses the results of the local problems to determine if any correction is needed on the local problems. The coordinator problem is also solved by an iterative method much like the local problem. The iterative method for the coordination problem will also be the Newton-Raphson method. Therefore, each iteration at the coordination level will result in new values for the local problems. The local problems will have to be solved again along with the coordinator problem until some convergence conditions are met.

  16. Application of Massively Parallel Sequencing in the Clinical Diagnostic Testing of Inherited Cardiac Conditions

    Directory of Open Access Journals (Sweden)

    Ivone U. S. Leong

    2014-06-01

    Full Text Available Sudden cardiac death in people between the ages of 1–40 years is a devastating event and is frequently caused by several heritable cardiac disorders. These disorders include cardiac ion channelopathies, such as long QT syndrome, catecholaminergic polymorphic ventricular tachycardia and Brugada syndrome and cardiomyopathies, such as hypertrophic cardiomyopathy and arrhythmogenic right ventricular cardiomyopathy. Through careful molecular genetic evaluation of DNA from sudden death victims, the causative gene mutation can be uncovered, and the rest of the family can be screened and preventative measures implemented in at-risk individuals. The current screening approach in most diagnostic laboratories uses Sanger-based sequencing; however, this method is time consuming and labour intensive. The development of massively parallel sequencing has made it possible to produce millions of sequence reads simultaneously and is potentially an ideal approach to screen for mutations in genes that are associated with sudden cardiac death. This approach offers mutation screening at reduced cost and turnaround time. Here, we will review the current commercially available enrichment kits, massively parallel sequencing (MPS platforms, downstream data analysis and its application to sudden cardiac death in a diagnostic environment.

  17. Parallel optical control of spatiotemporal neuronal spike activity using high-frequency digital light processingtechnology

    Directory of Open Access Journals (Sweden)

    Jason eJerome

    2011-08-01

    Full Text Available Neurons in the mammalian neocortex receive inputs from and communicate back to thousands of other neurons, creating complex spatiotemporal activity patterns. The experimental investigation of these parallel dynamic interactions has been limited due to the technical challenges of monitoring or manipulating neuronal activity at that level of complexity. Here we describe a new massively parallel photostimulation system that can be used to control action potential firing in in vitro brain slices with high spatial and temporal resolution while performing extracellular or intracellular electrophysiological measurements. The system uses Digital-Light-Processing (DLP technology to generate 2-dimensional (2D stimulus patterns with >780,000 independently controlled photostimulation sites that operate at high spatial (5.4 µm and temporal (>13kHz resolution. Light is projected through the quartz-glass bottom of the perfusion chamber providing access to a large area (2.76 x 2.07 mm2 of the slice preparation. This system has the unique capability to induce temporally precise action potential firing in large groups of neurons distributed over a wide area covering several cortical columns. Parallel photostimulation opens up new opportunities for the in vitro experimental investigation of spatiotemporal neuronal interactions at a broad range of anatomical scales.

  18. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John; Lee, Jon; Margulies, Susan

    2010-01-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  19. Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization

    KAUST Repository

    Gunnels, John

    2010-06-01

    We provide a first demonstration of the idea that matrix-based algorithms for nonlinear combinatorial optimization problems can be efficiently implemented. Such algorithms were mainly conceived by theoretical computer scientists for proving efficiency. We are able to demonstrate the practicality of our approach by developing an implementation on a massively parallel architecture, and exploiting scalable and efficient parallel implementations of algorithms for ultra high-precision linear algebra. Additionally, we have delineated and implemented the necessary algorithmic and coding changes required in order to address problems several orders of magnitude larger, dealing with the limits of scalability from memory footprint, computational efficiency, reliability, and interconnect perspectives. © Springer and Mathematical Programming Society 2010.

  20. Architecture of top down, parallel pattern recognition system TOPS and its application to the MR head images

    International Nuclear Information System (INIS)

    Matsunoshita, Jun-ichi; Akamatsu, Shigeo; Yamamoto, Shinji.

    1993-01-01

    This paper describes about the system architecture of a new image recognition system TOPS (top-down parallel pattern recognition system), and its application to the automatic extraction of brain organs (cerebrum, cerebellum, brain stem) from 3D-MRI images. Main concepts of TOPS are as follows: (1) TOPS is the top-down type recognition system, which allows parallel models in each level of hierarchy structure. (2) TOPS allows parallel image processing algorithms for one purpose (for example, for extraction of one special organ). This results in multiple candidates for one purpose, and judgment to get unique solution for it will be made at upper level of hierarchy structure. (author)

  1. High performance parallel computing of flows in complex geometries: I. Methods

    International Nuclear Information System (INIS)

    Gourdain, N; Gicquel, L; Montagnac, M; Vermorel, O; Staffelbach, G; Garcia, M; Boussuge, J-F; Gazaix, M; Poinsot, T

    2009-01-01

    Efficient numerical tools coupled with high-performance computers, have become a key element of the design process in the fields of energy supply and transportation. However flow phenomena that occur in complex systems such as gas turbines and aircrafts are still not understood mainly because of the models that are needed. In fact, most computational fluid dynamics (CFD) predictions as found today in industry focus on a reduced or simplified version of the real system (such as a periodic sector) and are usually solved with a steady-state assumption. This paper shows how to overcome such barriers and how such a new challenge can be addressed by developing flow solvers running on high-end computing platforms, using thousands of computing cores. Parallel strategies used by modern flow solvers are discussed with particular emphases on mesh-partitioning, load balancing and communication. Two examples are used to illustrate these concepts: a multi-block structured code and an unstructured code. Parallel computing strategies used with both flow solvers are detailed and compared. This comparison indicates that mesh-partitioning and load balancing are more straightforward with unstructured grids than with multi-block structured meshes. However, the mesh-partitioning stage can be challenging for unstructured grids, mainly due to memory limitations of the newly developed massively parallel architectures. Finally, detailed investigations show that the impact of mesh-partitioning on the numerical CFD solutions, due to rounding errors and block splitting, may be of importance and should be accurately addressed before qualifying massively parallel CFD tools for a routine industrial use.

  2. Applications of high power microwaves

    International Nuclear Information System (INIS)

    Benford, J.; Swegle, J.

    1993-01-01

    The authors address a number of applications for HPM technology. There is a strong symbiotic relationship between a developing technology and its emerging applications. New technologies can generate new applications. Conversely, applications can demand development of new technological capability. High-power microwave generating systems come with size and weight penalties and problems associated with the x-radiation and collection of the electron beam. Acceptance of these difficulties requires the identification of a set of applications for which high-power operation is either demanded or results in significant improvements in peRFormance. The authors identify the following applications, and discuss their requirements and operational issues: (1) High-energy RF acceleration; (2) Atmospheric modification (both to produce artificial ionospheric mirrors for radio waves and to save the ozone layer); (3) Radar; (4) Electronic warfare; and (5) Laser pumping. In addition, they discuss several applications requiring high average power than border on HPM, power beaming and plasma heating

  3. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    Energy Technology Data Exchange (ETDEWEB)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov [Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352 (United States); Weare, Jonathan Q., E-mail: weare@uchicago.edu [Department of Mathematics, University of Chicago, Chicago, Illinois 60637 (United States); Weare, John H., E-mail: jweare@ucsd.edu [Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093 (United States)

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t{sub i} (trajectory positions and velocities x{sub i} = (r{sub i}, v{sub i})) to time t{sub i+1} (x{sub i+1}) by x{sub i+1} = f{sub i}(x{sub i}), the dynamics problem spanning an interval from t{sub 0}…t{sub M} can be transformed into a root finding problem, F(X) = [x{sub i} − f(x{sub (i−1})]{sub i} {sub =1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H{sub 2}O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time) ) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up

  4. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    International Nuclear Information System (INIS)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-01-01

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time t i (trajectory positions and velocities x i = (r i , v i )) to time t i+1 (x i+1 ) by x i+1 = f i (x i ), the dynamics problem spanning an interval from t 0 …t M can be transformed into a root finding problem, F(X) = [x i − f(x (i−1 )] i =1,M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H 2 O AIMD simulation at the MP2 level. The maximum speedup ((serial execution time)/(parallel execution time) ) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a

  5. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

    International Nuclear Information System (INIS)

    Hoisie, A.; Lubeck, O.; Wasserman, H.

    1998-01-01

    The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor

  6. Development of parallel implementation of adaptive numerical methods with industrial applications in fluid mechanics

    International Nuclear Information System (INIS)

    Laucoin, E.

    2008-10-01

    Numerical resolution of partial differential equations can be made reliable and efficient through the use of adaptive numerical methods.We present here the work we have done for the design, the implementation and the validation of such a method within an industrial software platform with applications in thermohydraulics. From the geometric point of view, this method can deal both with mesh refinement and mesh coarsening, while ensuring the quality of the mesh cells. Numerically, we use the mortar elements formalism in order to extend the Finite Volumes-Elements method implemented in the Trio-U platform and to deal with the non-conforming meshes arising from the adaptation procedure. Finally, we present an implementation of this method using concepts from domain decomposition methods for ensuring its efficiency while running in a parallel execution context. (author)

  7. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  8. Absorbed dose calibration factors for parallel-plate chambers in high energy photon beams

    International Nuclear Information System (INIS)

    McEwen, M.R.; Duane, S.; Thomas, R.A.S.

    2002-01-01

    An investigation was carried out into the performance of parallel-plate chambers in 60 Co and MV photon beams. The aim was to derive calibration factors, investigate chamber-to-chamber variability and provide much-needed information on the use of parallel-plate chambers in high-energy X-ray beams. A set of NE2561/NE2611 reference chambers, calibrated against the primary standard graphite calorimeter is used for the dissemination of absorbed dose to water. The parallel-plate chambers were calibrated by comparison with the NPL reference chambers in a water phantom. Two types of parallel-plate chamber were investigated - the NACP -02 and Roos and measurements were made at 60 C0 and 6 linac photon energies (6-19 MV). Calibration factors were derived together with polarity corrections. The standard uncertainty in the calibration of a chamber in terms of absorbed dose to water is estimated to be ±0.75%. The results of the polarity measurements were somewhat confusing. One would expect the correction to be small and previous measurements in electron beams have indicated that there is little variation between chambers of these types. However, some chambers gave unexpectedly large polarity corrections, up to 0.8%. By contrast the measured polarity correction for a NE2611 chamber was less than 0.13% at all energies. The reason for these large polarity corrections is not clear, but experimental error and linac variations have been ruled out. By combining the calibration data for the different chambers it was possible to obtain experimental k Q factors for the two chamber types. It would appear from the data that the variations between chambers of the same type are random and one can therefore define a generic curve for each chamber type. These are presented in Figure 1, together with equivalent data for two cylindrical chamber types - NE2561/NE2611 and NE2571. As can be seen, there is a clear difference between the curves for the cylindrical chambers and those for the

  9. Parallel Algorithms and Software for Nuclear, Energy, and Environmental Applications Part I: Multiphysics Algorithms

    International Nuclear Information System (INIS)

    Gaston, Derek; Guo, Luanjing; Hansen, Glen; Huang, Hai; Johnson, Richard; Park, HyeongKae; Podgorney, Robert; Tonks, Michael; Williamson, Richard

    2012-01-01

    There is a growing trend within energy and environmental simulation to consider tightly coupled solutions to multiphysics problems. This can be seen in nuclear reactor analysis where analysts are interested in coupled flow, heat transfer and neutronics, and in nuclear fuel performance simulation where analysts are interested in thermomechanics with contact coupled to species transport and chemistry. In energy and environmental applications, energy extraction involves geomechanics, flow through porous media and fractured formations, adding heat transport for enhanced oil recovery and geothermal applications, and adding reactive transport in the case of applications modeling the underground flow of contaminants. These more ambitious simulations usually motivate some level of parallel computing. Many of the physics coupling efforts to date utilize simple code coupling or first-order operator splitting, often referred to as loose coupling. While these approaches can produce answers, they usually leave questions of accuracy and stability unanswered. Additionally, the different physics often reside on distinct meshes and data are coupled via simple interpolation, again leaving open questions of stability and accuracy.

  10. Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

    Science.gov (United States)

    Shen, Yanfeng; Cesnik, Carlos E. S.

    2016-04-01

    This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables the highly parallelized supercomputing on powerful graphic cards. Both the explicit contact formulation and the parallel feature facilitates LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method is introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.

  11. A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC

    Directory of Open Access Journals (Sweden)

    Yun-gang Xue

    2017-01-01

    Full Text Available We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME for short, by combining the advantages of local full search and downsampling. By subsampling a video frame, a large amount of computation is saved. While using the local full-search method, it can exploit massive parallelism and make full use of the powerful modern many-core accelerators, such as GPU and Intel Xeon Phi. We implanted the proposed MLRME into HM12.0, and the experimental results showed that the encoding quality of the MLRME method is close to that of the fast motion estimation in HEVC, which declines by less than 1.5%. We also implemented the MLRME with CUDA, which obtained 30–60x speed-up compared to the serial algorithm on single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU can meet the real-time coding requirement with about 25 fps for the 2560×1600 video format, while, for 832×480, the performance is more than 100 fps.

  12. Fluid/Structure Interaction Studies of Aircraft Using High Fidelity Equations on Parallel Computers

    Science.gov (United States)

    Guruswamy, Guru; VanDalsem, William (Technical Monitor)

    1994-01-01

    Abstract Aeroelasticity which involves strong coupling of fluids, structures and controls is an important element in designing an aircraft. Computational aeroelasticity using low fidelity methods such as the linear aerodynamic flow equations coupled with the modal structural equations are well advanced. Though these low fidelity approaches are computationally less intensive, they are not adequate for the analysis of modern aircraft such as High Speed Civil Transport (HSCT) and Advanced Subsonic Transport (AST) which can experience complex flow/structure interactions. HSCT can experience vortex induced aeroelastic oscillations whereas AST can experience transonic buffet associated structural oscillations. Both aircraft may experience a dip in the flutter speed at the transonic regime. For accurate aeroelastic computations at these complex fluid/structure interaction situations, high fidelity equations such as the Navier-Stokes for fluids and the finite-elements for structures are needed. Computations using these high fidelity equations require large computational resources both in memory and speed. Current conventional super computers have reached their limitations both in memory and speed. As a result, parallel computers have evolved to overcome the limitations of conventional computers. This paper will address the transition that is taking place in computational aeroelasticity from conventional computers to parallel computers. The paper will address special techniques needed to take advantage of the architecture of new parallel computers. Results will be illustrated from computations made on iPSC/860 and IBM SP2 computer by using ENSAERO code that directly couples the Euler/Navier-Stokes flow equations with high resolution finite-element structural equations.

  13. Design, fabrication and characterization of a micro-fluxgate intended for parallel robot application

    Science.gov (United States)

    Kirchhoff, M. R.; Bogdanski, G.; Büttgenbach, S.

    2009-05-01

    This paper presents a micro-magnetometer based on the fluxgate principle. Fluxgates detect the magnitude and direction of DC and low-frequency AC magnetic fields. The detectable flux density typically ranges from several 10 nT to about 1 mT. The introduced fluxgate sensor is fabricated using MEMS-technologies, basically UV depth lithography and electroplating for manufacturing high aspect ratio structures. It consists of helical copper coils around a soft magnetic nickel-iron (NiFe) core. The core is designed in so-called racetrack geometry, whereby the directional sensitivity of the sensor is considerably higher compared to common ring-core fluxgates. The electrical operation is based on analyzing the 2nd harmonic of the AC output signal. Configuration, manufacturing and selected characteristics of the fluxgate magnetometer are discussed in this work. The fluxgate builds the basis of an innovative angular sensor system for a parallel robot with HEXA-structure. Integrated into the passive joints of the parallel robot, the fluxgates are combined with permanent magnets rotating on the joint shafts. The magnet transmits the angular information via its magnetic orientation. In this way, the angles between the kinematic elements are measured, which allows self-calibration of the robot and the fast analytical solution of direct kinematics for an advanced workspace monitoring.

  14. Drainage network extraction from a high-resolution DEM using parallel programming in the .NET Framework

    Science.gov (United States)

    Du, Chao; Ye, Aizhong; Gan, Yanjun; You, Jinjun; Duan, Qinyun; Ma, Feng; Hou, Jingwen

    2017-12-01

    High-resolution Digital Elevation Models (DEMs) can be used to extract high-accuracy prerequisite drainage networks. A higher resolution represents a larger number of grids. With an increase in the number of grids, the flow direction determination will require substantial computer resources and computing time. Parallel computing is a feasible method with which to resolve this problem. In this paper, we proposed a parallel programming method within the .NET Framework with a C# Compiler in a Windows environment. The basin is divided into sub-basins, and subsequently the different sub-basins operate on multiple threads concurrently to calculate flow directions. The method was applied to calculate the flow direction of the Yellow River basin from 3 arc-second resolution SRTM DEM. Drainage networks were extracted and compared with HydroSHEDS river network to assess their accuracy. The results demonstrate that this method can calculate the flow direction from high-resolution DEMs efficiently and extract high-precision continuous drainage networks.

  15. Parallel transport in ideal magnetohydrodynamics and applications to resistive wall modes

    International Nuclear Information System (INIS)

    Finn, J.M.; Gerwin, R.A.

    1996-01-01

    It is shown that in magnetohydrodynamics (MHD) with an ideal Ohm close-quote s law, in the presence of parallel heat flux, density gradient, temperature gradient, and parallel compression, but in the absence of perpendicular compressibility, there is an exact cancellation of the parallel transport terms. This cancellation is due to the fact that magnetic flux is advected in the presence of an ideal Ohm close-quote s law, and therefore parallel transport of temperature and density gives the same result as perpendicular advection of the same quantities. Discussions are also presented regarding parallel viscosity and parallel velocity shear, and the generalization to toroidal geometry. These results suggest that a correct generalization of the Hammett endash Perkins fluid operator [G. W. Hammett and F. W. Perkins, Phys. Rev. Lett. 64, 3019 (1990)] to simulate Landau damping for electromagnetic modes must give an operator that acts on the dynamics parallel to the perturbed magnetic field lines. copyright 1996 American Institute of Physics

  16. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    Science.gov (United States)

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., Verlet algorithm), is available to propagate the system from time ti (trajectory positions and velocities xi = (ri, vi)) to time ti + 1 (xi + 1) by xi + 1 = fi(xi), the dynamics problem spanning an interval from t0[ellipsis (horizontal)]tM can be transformed into a root finding problem, F(X) = [xi - f(x(i - 1)]i = 1, M = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution/timeparallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a

  17. Development of high-resolution x-ray CT system using parallel beam geometry

    Energy Technology Data Exchange (ETDEWEB)

    Yoneyama, Akio, E-mail: akio.yoneyama.bu@hitachi.com; Baba, Rika [Central Research Laboratory, Hitachi Ltd., Hatoyama, Saitama (Japan); Hyodo, Kazuyuki [Institute of Materials Science, High Energy Accelerator Research Organization, Tsukuba, Ibaraki (Japan); Takeda, Tohoru [School of Allied Health Sciences, Kitasato University, Sagamihara, Kanagawa (Japan); Nakano, Haruhisa; Maki, Koutaro [Department of Orthodontics, School of Dentistry Showa University, Ota-ku, Tokyo (Japan); Sumitani, Kazushi; Hirai, Yasuharu [Kyushu Synchrotron Light Research Center, Tosu, Saga (Japan)

    2016-01-28

    For fine three-dimensional observations of large biomedical and organic material samples, we developed a high-resolution X-ray CT system. The system consists of a sample positioner, a 5-μm scintillator, microscopy lenses, and a water-cooled sCMOS detector. Parallel beam geometry was adopted to attain a field of view of a few mm square. A fine three-dimensional image of birch branch was obtained using a 9-keV X-ray at BL16XU of SPring-8 in Japan. The spatial resolution estimated from the line profile of a sectional image was about 3 μm.

  18. Investigation of density-wave oscillation in parallel boiling channels under high pressure

    International Nuclear Information System (INIS)

    Ming Xiao; Xuejun Chen; Mingyuan Zhang

    1992-01-01

    This paper presents experimental results on density-wave instability in parallel boiling channels. Experiments have been done in a high pressure steam-water loop. Different types of two-phase flow instabilities have been observed, including density-wave oscillation, pressure-drop type oscillation, thermal oscillation and secondary density-wave oscillation. The secondary density-wave oscillation appears at very low exit steam quality (less than 0.1) and at the positive portion of Δ P-G curves with both channels' flow rate oscillating in phase. Density-wave oscillation can appear at pressure up to 192 bar and disappear over 207 bar. (6 figures) (Author)

  19. A Novel Technique for Design of Ultra High Tunable Electrostatic Parallel Plate RF MEMS Variable Capacitor

    Science.gov (United States)

    Baghelani, Masoud; Ghavifekr, Habib Badri

    2017-12-01

    This paper introduces a novel method for designing of low actuation voltage, high tuning ratio electrostatic parallel plate RF MEMS variable capacitors. It is feasible to achieve ultra-high tuning ratios way beyond 1.5:1 barrier, imposed by pull-in effect, by the proposed method. The proposed method is based on spring strengthening of the structure just before the unstable region. Spring strengthening could be realized by embedding some dimples on the spring arms with the precise height. These dimples shorten the spring length when achieved to the substrate. By the proposed method, as high tuning ratios as 7.5:1 is attainable by only considering four dimple sets. The required actuation voltage for this high tuning ratio is 14.33 V which is simply achievable on-chip by charge pump circuits. Brownian noise effect is also discussed and mechanical natural frequency of the structure is calculated.

  20. Automatic Energy Schemes for High Performance Applications

    Energy Technology Data Exchange (ETDEWEB)

    Sundriyal, Vaibhav [Iowa State Univ., Ames, IA (United States)

    2013-01-01

    Although high-performance computing traditionally focuses on the efficient execution of large-scale applications, both energy and power have become critical concerns when approaching exascale. Drastic increases in the power consumption of supercomputers affect significantly their operating costs and failure rates. In modern microprocessor architectures, equipped with dynamic voltage and frequency scaling (DVFS) and CPU clock modulation (throttling), the power consumption may be controlled in software. Additionally, network interconnect, such as Infiniband, may be exploited to maximize energy savings while the application performance loss and frequency switching overheads must be carefully balanced. This work first studies two important collective communication operations, all-to-all and allgather and proposes energy saving strategies on the per-call basis. Next, it targets point-to-point communications to group them into phases and apply frequency scaling to them to save energy by exploiting the architectural and communication stalls. Finally, it proposes an automatic runtime system which combines both collective and point-to-point communications into phases, and applies throttling to them apart from DVFS to maximize energy savings. The experimental results are presented for NAS parallel benchmark problems as well as for the realistic parallel electronic structure calculations performed by the widely used quantum chemistry package GAMESS. Close to the maximum energy savings were obtained with a substantially low performance loss on the given platform.

  1. A Parallel Strategy for High-speed Interpolation of CNC Using Data Space Constraint Method

    Directory of Open Access Journals (Sweden)

    Shuan-qiang Yang

    2013-12-01

    Full Text Available A high-speed interpolation scheme using parallel computing is proposed in this paper. The interpolation method is divided into two tasks, namely, the rough task executing in PC and the fine task in the I/O card. During the interpolation procedure, the double buffers are constructed to exchange the interpolation data between the two tasks. Then, the data space constraint method is adapted to ensure the reliable and continuous data communication between the two buffers. Therefore, the proposed scheme can be realized in the common distribution of the operation systems without real-time performance. The high-speed and high-precision motion control can be achieved as well. Finally, an experiment is conducted on the self-developed CNC platform, the test results are shown to verify the proposed method.

  2. A 10-bit column-parallel cyclic ADC for high-speed CMOS image sensors

    International Nuclear Information System (INIS)

    Han Ye; Li Quanliang; Shi Cong; Wu Nanjian

    2013-01-01

    This paper presents a high-speed column-parallel cyclic analog-to-digital converter (ADC) for a CMOS image sensor. A correlated double sampling (CDS) circuit is integrated in the ADC, which avoids a stand-alone CDS circuit block. An offset cancellation technique is also introduced, which reduces the column fixed-pattern noise (FPN) effectively. One single channel ADC with an area less than 0.02 mm 2 was implemented in a 0.13 μm CMOS image sensor process. The resolution of the proposed ADC is 10-bit, and the conversion rate is 1.6 MS/s. The measured differential nonlinearity and integral nonlinearity are 0.89 LSB and 6.2 LSB together with CDS, respectively. The power consumption from 3.3 V supply is only 0.66 mW. An array of 48 10-bit column-parallel cyclic ADCs was integrated into an array of CMOS image sensor pixels. The measured results indicated that the ADC circuit is suitable for high-speed CMOS image sensors. (semiconductor integrated circuits)

  3. Encoding methods for B1+ mapping in parallel transmit systems at ultra high field

    Science.gov (United States)

    Tse, Desmond H. Y.; Poole, Michael S.; Magill, Arthur W.; Felder, Jörg; Brenner, Daniel; Jon Shah, N.

    2014-08-01

    Parallel radiofrequency (RF) transmission, either in the form of RF shimming or pulse design, has been proposed as a solution to the B1+ inhomogeneity problem in ultra high field magnetic resonance imaging. As a prerequisite, accurate B1+ maps from each of the available transmit channels are required. In this work, four different encoding methods for B1+ mapping, namely 1-channel-on, all-channels-on-except-1, all-channels-on-1-inverted and Fourier phase encoding, were evaluated using dual refocusing acquisition mode (DREAM) at 9.4 T. Fourier phase encoding was demonstrated in both phantom and in vivo to be the least susceptible to artefacts caused by destructive RF interference at 9.4 T. Unlike the other two interferometric encoding schemes, Fourier phase encoding showed negligible dependency on the initial RF phase setting and therefore no prior B1+ knowledge is required. Fourier phase encoding also provides a flexible way to increase the number of measurements to increase SNR, and to allow further reduction of artefacts by weighted decoding. These advantages of Fourier phase encoding suggest that it is a good choice for B1+ mapping in parallel transmit systems at ultra high field.

  4. Parallel field line and stream line tracing algorithms for space physics applications

    Science.gov (United States)

    Toth, G.; de Zeeuw, D.; Monostori, G.

    2004-05-01

    Field line and stream line tracing is required in various space physics applications, such as the coupling of the global magnetosphere and inner magnetosphere models, the coupling of the solar energetic particle and heliosphere models, or the modeling of comets, where the multispecies chemical equations are solved along stream lines of a steady state solution obtained with single fluid MHD model. Tracing a vector field is an inherently serial process, which is difficult to parallelize. This is especially true when the data corresponding to the vector field is distributed over a large number of processors. We designed algorithms for the various applications, which scale well to a large number of processors. In the first algorithm the computational domain is divided into blocks. Each block is on a single processor. The algorithm folows the vector field inside the blocks, and calculates a mapping of the block surfaces. The blocks communicate the values at the coinciding surfaces, and the results are interpolated. Finally all block surfaces are defined and values inside the blocks are obtained. In the second algorithm all processors start integrating along the vector field inside the accessible volume. When the field line leaves the local subdomain, the position and other information is stored in a buffer. Periodically the processors exchange the buffers, and continue integration of the field lines until they reach a boundary. At that point the results are sent back to the originating processor. Efficiency is achieved by a careful phasing of computation and communication. In the third algorithm the results of a steady state simulation are stored on a hard drive. The vector field is contained in blocks. All processors read in all the grid and vector field data and the stream lines are integrated in parallel. If a stream line enters a block, which has already been integrated, the results can be interpolated. By a clever ordering of the blocks the execution speed can be

  5. High-Resolution Electronics: Spontaneous Patterning of High-Resolution Electronics via Parallel Vacuum Ultraviolet (Adv. Mater. 31/2016).

    Science.gov (United States)

    Liu, Xuying; Kanehara, Masayuki; Liu, Chuan; Sakamoto, Kenji; Yasuda, Takeshi; Takeya, Jun; Minari, Takeo

    2016-08-01

    On page 6568, T. Minari and co-workers describe spontaneous patterning based on the parallel vacuum ultraviolet (PVUV) technique, enabling the homogeneous integration of complex, high-resolution electronic circuits, even on large-scale, flexible, transparent substrates. Irradiation of PVUV to the hydrophobic polymer surface precisely renders the selected surface into highly wettable regions with sharply defined boundaries, which spontaneously guides a metal nanoparticle ink into a series of circuit lines and gaps with the widths down to a resolution of 1 μm. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the

  7. Transparent Nanopore Cavity Arrays Enable Highly Parallelized Optical Studies of Single Membrane Proteins on Chip.

    Science.gov (United States)

    Diederichs, Tim; Nguyen, Quoc Hung; Urban, Michael; Tampé, Robert; Tornow, Marc

    2018-06-13

    Membrane proteins involved in transport processes are key targets for pharmaceutical research and industry. Despite continuous improvements and new developments in the field of electrical readouts for the analysis of transport kinetics, a well-suited methodology for high-throughput characterization of single transporters with nonionic substrates and slow turnover rates is still lacking. Here, we report on a novel architecture of silicon chips with embedded nanopore microcavities, based on a silicon-on-insulator technology for high-throughput optical readouts. Arrays containing more than 14 000 inverted-pyramidal cavities of 50 femtoliter volumes and 80 nm circular pore openings were constructed via high-resolution electron-beam lithography in combination with reactive ion etching and anisotropic wet etching. These cavities feature both, an optically transparent bottom and top cap. Atomic force microscopy analysis reveals an overall extremely smooth chip surface, particularly in the vicinity of the nanopores, which exhibits well-defined edges. Our unprecedented transparent chip design provides parallel and independent fluorescent readout of both cavities and buffer reservoir for unbiased single-transporter recordings. Spreading of large unilamellar vesicles with efficiencies up to 96% created nanopore-supported lipid bilayers, which are stable for more than 1 day. A high lipid mobility in the supported membrane was determined by fluorescent recovery after photobleaching. Flux kinetics of α-hemolysin were characterized at single-pore resolution with a rate constant of 0.96 ± 0.06 × 10 -3 s -1 . Here, we deliver an ideal chip platform for pharmaceutical research, which features high parallelism and throughput, synergistically combined with single-transporter resolution.

  8. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  9. De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Nakano, A; Kalia, R K; Nomura, K; Sharma, A; Vashishta, P; Shimojo, F; van Duin, A; Goddard, III, W A; Biswas, R; Srivastava, D; Yang, L H

    2006-09-04

    We present a de novo hierarchical simulation framework for first-principles based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto Petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide an excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with the parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM

  10. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

    Science.gov (United States)

    Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

    2002-02-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.

  11. High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores

    International Nuclear Information System (INIS)

    Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi

    2012-01-01

    We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harvest the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. The timing to process one XRD image using one pipeline is about ten times faster than the best C program at present. With the support of InfoSphere Streams platform, our software is able to be scaled up to operate on clusters of multi-cores for processing multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time data analysis of image data from synchrotron experiments.

  12. Fermionic Casimir effect for parallel plates in the presence of compact dimensions with applications to nanotubes

    International Nuclear Information System (INIS)

    Bellucci, S.; Saharian, A. A.

    2009-01-01

    We evaluate the Casimir energy and force for a massive fermionic field in the geometry of two parallel plates on background of Minkowski spacetime with an arbitrary number of toroidally compactified spatial dimensions. The bag boundary conditions are imposed on the plates and periodicity conditions with arbitrary phases are considered along the compact dimensions. The Casimir energy is decomposed into purely topological, single plate and interaction parts. With independence of the lengths of the compact dimensions and the phases in the periodicity conditions, the interaction part of the Casimir energy is always negative. In order to obtain the resulting force, the contributions from both sides of the plates must be taken into account. Then, the forces coming from the topological parts of the vacuum energy cancel out and only the interaction term contributes to the Casimir force. Applications of the general formulae to Kaluza-Klein-type models and carbon nanotubes are given. In particular, we show that for finite-length metallic nanotubes, the Casimir forces acting on the tube edges are always attractive, whereas for semiconducting-type ones, they are attractive for small lengths of the nanotube and repulsive for large lengths.

  13. Development of a new dynamic turbulent model, applications to two-dimensional and plane parallel flows

    International Nuclear Information System (INIS)

    Laval, Jean Philippe

    1999-01-01

    We developed a turbulent model based on asymptotic development of the Navier-Stokes equations within the hypothesis of non-local interactions at small scales. This model provides expressions of the turbulent Reynolds sub-grid stresses via estimates of the sub-grid velocities rather than velocities correlations as is usually done. The model involves the coupling of two dynamical equations: one for the resolved scales of motions, which depends upon the Reynolds stresses generated by the sub-grid motions, and one for the sub-grid scales of motions, which can be used to compute the sub-grid Reynolds stresses. The non-locality of interaction at sub-grid scales allows to model their evolution with a linear inhomogeneous equation where the forcing occurs via the energy cascade from resolved to sub-grid scales. This model was solved using a decomposition of sub-grid scales on Gabor's modes and implemented numerically in 2D with periodic boundary conditions. A particles method (PIC) was used to compute the sub-grid scales. The results were compared with results of direct simulations for several typical flows. The model was also applied to plane parallel flows. An analytical study of the equations allows a description of mean velocity profiles in agreement with experimental results and theoretical results based on the symmetries of the Navier-Stokes equation. Possible applications and improvements of the model are discussed in the conclusion. (author) [fr

  14. Short term scheduling of multiple grid-parallel PEM fuel cells for microgrid applications

    Energy Technology Data Exchange (ETDEWEB)

    El-Sharkh, M.Y.; Rahman, A.; Alam, M.S. [Dept. of Electrical and Computer Engineering, University of South Alabama, Mobile, AL 36688 (United States)

    2010-10-15

    This paper presents a short term scheduling scheme for multiple grid-parallel PEM fuel cell power plants (FCPPs) connected to supply electrical and thermal energy to a microgrid community. As in the case of regular power plants, short term scheduling of FCPP is also a cost-based optimization problem that includes the cost of operation, thermal power recovery, and the power trade with the local utility grid. Due to the ability of the microgrid community to trade power with the local grid, the power balance constraint is not applicable, other constraints like the real power operating limits of the FCPP, and minimum up and down time are therefore used. To solve the short term scheduling problem of the FCPPs, a hybrid technique based on evolutionary programming (EP) and hill climbing technique (HC) is used. The EP is used to estimate the optimal schedule and the output power from each FCPP. The HC technique is used to monitor the feasibility of the solution during the search process. The short term scheduling problem is used to estimate the schedule and the electrical and thermal power output of five FCPPs supplying a maximum power of 300 kW. (author)

  15. A 9-Bit 50 MSPS Quadrature Parallel Pipeline ADC for Communication Receiver Application

    Science.gov (United States)

    Roy, Sounak; Banerjee, Swapna

    2018-03-01

    This paper presents the design and implementation of a pipeline Analog-to-Digital Converter (ADC) for superheterodyne receiver application. Several enhancement techniques have been applied in implementing the ADC, in order to relax the target specifications of its building blocks. The concepts of time interleaving and double sampling have been used simultaneously to enhance the sampling speed and to reduce the number of amplifiers used in the ADC. Removal of a front end sample-and-hold amplifier is possible by employing dynamic comparators with switched capacitor based comparison of input signal and reference voltage. Each module of the ADC comprises two 2.5-bit stages followed by two 1.5-bit stages and a 3-bit flash stage. Four such pipeline ADC modules are time interleaved using two pairs of non-overlapping clock signals. These two pairs of clock signals are in phase quadrature with each other. Hence the term quadrature parallel pipeline ADC has been used. These configurations ensure that the entire ADC contains only eight operational-trans-conductance amplifiers. The ADC is implemented in a 0.18-μm CMOS process and supply voltage of 1.8 V. The proto-type is tested at sampling frequencies of 50 and 75 MSPS producing an Effective Number of Bits (ENOB) of 6.86- and 6.11-bits respectively. At peak sampling speed, the core ADC consumes only 65 mW of power.

  16. High Performance Computation of a Jet in Crossflow by Lattice Boltzmann Based Parallel Direct Numerical Simulation

    Directory of Open Access Journals (Sweden)

    Jiang Lei

    2015-01-01

    Full Text Available Direct numerical simulation (DNS of a round jet in crossflow based on lattice Boltzmann method (LBM is carried out on multi-GPU cluster. Data parallel SIMT (single instruction multiple thread characteristic of GPU matches the parallelism of LBM well, which leads to the high efficiency of GPU on the LBM solver. With present GPU settings (6 Nvidia Tesla K20M, the present DNS simulation can be completed in several hours. A grid system of 1.5 × 108 is adopted and largest jet Reynolds number reaches 3000. The jet-to-free-stream velocity ratio is set as 3.3. The jet is orthogonal to the mainstream flow direction. The validated code shows good agreement with experiments. Vortical structures of CRVP, shear-layer vortices and horseshoe vortices, are presented and analyzed based on velocity fields and vorticity distributions. Turbulent statistical quantities of Reynolds stress are also displayed. Coherent structures are revealed in a very fine resolution based on the second invariant of the velocity gradients.

  17. α spectrometer of parallel plate grid ionization chamber of high energy resolution

    International Nuclear Information System (INIS)

    Tong Boting; Wang Jianqing; Dong Mingli; Tang Peijia; Wang Xiaorong; Lin Cansheng

    2000-01-01

    Parallel plate grid ionization chamber with cathode area of 300 cm 2 was developed and applied to detect minimum α-emitters. It consist of a vacuum system, a gas cycle system of the parallel plate grid ionization chamber, electronics (a high voltage supply, a pre-amplifier and a main amplifier) and a computer-multichannel analyzer. The energy resolution is 23 keV FWHM for the 244 Cm electrostatic precipitated source. The integral background is typically 10 counts/h between 4 and 6 MeV. The detector efficiency is 50%. The minimum detecting activity is 3 x 10 -4 Bq (3σ, 30 hours). This spectrometer is suitable for detecting various samples, such as samples of the soil, water, air, bion, food, structural material, geology, archaeology, α-emitters of after processing and measuring α activity of accounting for and control of nuclear material and monitoring the artificial radioactivity nuclides of environment samples around nuclear facilities. The spectrometer is equipped with apparatus for preparing large area α source by using vacuum deposition or ultrasonic pulverization. The operating program of preparing source is simple. The source thickness can be kept in 40-60 μm/cm 2

  18. Performance study of a cluster calculation; parallelization and application under geant4

    International Nuclear Information System (INIS)

    Trabelsi, Abir

    2007-01-01

    This work concretizes the final studies project for engineering computer sciences, it is archived within the national center of nuclear sciences and technology. The project consists in studying the performance of a set of machines in order to determine the best architecture to assemble them in a cluster. As well as the parallelism and the parallel implementation of GEANT4, as a tool of simulation. The realisation of this project consists on : 1) programming with C++ and executing the two benchmarks P MV and PMM on each station; 2) Interpreting this result in order to show the best architecture of the cluster; 3) parallelism with TOP-C the two benchmarks; 4) Executing the two Top-C versions on the cluster; 5) Generalizing this results; 6)parallelism et executing the parallel version of GEANT4. (Author). 14 refs

  19. Toward General Software Level Silent Data Corruption Detection for Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Berrocal, Eduardo; Bautista-Gomez, Leonardo; Di, Sheng; Lan, Zhiling; Cappello, Franck

    2017-12-01

    Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems. Mechanisms have been proposed that are able to detect SDC in HPC applications by using the peculiarities of the data (more specifically, its “smoothness” in time and space) to make predictions. However, these data-analytic solutions are still far from fully protecting applications to a level comparable with more expensive solutions such as full replication. In this work, we propose partial replication to overcome this limitation. More specifically, we have observed that not all processes of an MPI application experience the same level of data variability at exactly the same time. Thus, we can smartly choose and replicate only those processes for which the lightweight data-analytic detectors would perform poorly. In addition, we propose a new evaluation method based on the probability that a corruption will pass unnoticed by a particular detector (instead of just reporting overall single-bit precision and recall). In our experiments, we use four applications dealing with different explosions. Our results indicate that our new approach can protect the MPI applications analyzed with 7–70% less overhead (depending on the application) than that of full duplication with similar detection recall.

  20. Research on Shaft Subsynchronous Oscillation Characteristics of Parallel Generators and SSDC Application in Mitigating SSO of Multi-Generators

    Directory of Open Access Journals (Sweden)

    Shen Wang

    2015-02-01

    Full Text Available Subsynchronous oscillation (SSO of generators caused by high voltage direct current (HVDC systems can be solved by applying supplemental subsynchronous damping controller (SSDC. SSDC application in mitigating SSO of single-generator systems has been studied intensively. This paper focuses on SSDC application in mitigating SSO of multi-generator systems. The phase relationship of the speed signals of the generators under their common mechanical natural frequencies is a key consideration in SSDC design. The paper studies in detail the phase relationship of the speed signals of two generators in parallel under their shared mechanical natural frequency, revealing regardless of whether the two generators are identical or not, there always exists a common-mode and an anti-mode under their common natural frequency, and the phase relationship of the speed signals of the generators depends on the extent to which the anti-mode is stimulated. The paper further demonstrates that to guarantee the effectiveness of SSDC, the anti-phase mode component of its input signal should be eliminated. Based on the above analysis, the paper introduces the design process of SSDC for multi-generator systems and verifies its effectiveness through simulation in Power Systems Computer Aided Design/Electromagnetic Transients including Direct Current (PSCAD/EMTDC.

  1. Parallel computation of fluid-structural interactions using high resolution upwind schemes

    Science.gov (United States)

    Hu, Zongjun

    An efficient and accurate solver is developed to simulate the non-linear fluid-structural interactions in turbomachinery flutter flows. A new low diffusion E-CUSP scheme, Zha CUSP scheme, is developed to improve the efficiency and accuracy of the inviscid flux computation. The 3D unsteady Navier-Stokes equations with the Baldwin-Lomax turbulence model are solved using the finite volume method with the dual-time stepping scheme. The linearized equations are solved with Gauss-Seidel line iterations. The parallel computation is implemented using MPI protocol. The solver is validated with 2D cases for its turbulence modeling, parallel computation and unsteady calculation. The Zha CUSP scheme is validated with 2D cases, including a supersonic flat plate boundary layer, a transonic converging-diverging nozzle and a transonic inlet diffuser. The Zha CUSP2 scheme is tested with 3D cases, including a circular-to-rectangular nozzle, a subsonic compressor cascade and a transonic channel. The Zha CUSP schemes are proved to be accurate, robust and efficient in these tests. The steady and unsteady separation flows in a 3D stationary cascade under high incidence and three inlet Mach numbers are calculated to study the steady state separation flow patterns and their unsteady oscillation characteristics. The leading edge vortex shedding is the mechanism behind the unsteady characteristics of the high incidence separated flows. The separation flow characteristics is affected by the inlet Mach number. The blade aeroelasticity of a linear cascade with forced oscillating blades is studied using parallel computation. A simplified two-passage cascade with periodic boundary condition is first calculated under a medium frequency and a low incidence. The full scale cascade with 9 blades and two end walls is then studied more extensively under three oscillation frequencies and two incidence angles. The end wall influence and the blade stability are studied and compared under different

  2. FOCEX: A fiber-optic extender for a high speed parallel RS485 data cable

    International Nuclear Information System (INIS)

    Meadows, J.T.; Anderson, J.T.; Cooper, P.S.; Engelfried, J.; Franzen, J.W.; Forster, B.G.; Levinson, F.; Rawls, J.; Haber, S.

    1995-05-01

    For longer-distant, high speed data links, optical fibre becomes most cost-effective than copper or other hard wire cable systems. Fermilab supplied to Finisar Corp. of Menlo Park, CA., a set of specifications for card functions, sizes and interconnector pin assignments. Finisar designed and assembled a set of fiber optical P.C. cards using 100 megabyte/sec commercial optoelectronics and a serialization and deserialization HOT-ROD chipset designed by GAZELLE Microcircuits, Inc. (A Tri Quint Semiconductors company). The cooperative effort between Fermilab and Finisar has allowed Fermilab to created a reliable 50 Megabytes/sec (40 bit parallel RS485 DART data bus) cable to cable extender using a virtually invisible Fiber Channel point-to-point(FC-0) fiber optical single-simplex system. The system is easily capable of sustaining a 50 megabytes/sec of data, control and status line throughput at distances of 1625 feet (500 meters) using standard multi-mode fiber

  3. Nonlinear Elastodynamic Behaviour Analysis of High-Speed Spatial Parallel Coordinate Measuring Machines

    Directory of Open Access Journals (Sweden)

    Xiulong Chen

    2012-10-01

    Full Text Available In order to study the elastodynamic behaviour of 4- universal joints- prismatic pairs- spherical joints / universal joints- prismatic pairs- universal joints 4-UPS-UPU high-speed spatial PCMMs(parallel coordinate measuring machines, the nonlinear time-varying dynamics model, which comprehensively considers geometric nonlinearity and the rigid-flexible coupling effect, is derived by using Lagrange equations and finite element methods. Based on the Newmark method, the kinematics output response of 4-UPS-UPU PCMMs is illustrated through numerical simulation. The results of the simulation show that the flexibility of the links is demonstrated to have a significant impact on the system dynamics response. This research can provide the important theoretical base of the optimization design and vibration control for 4-UPS-UPU PCMMs.

  4. Evaluation of emerging parallel optical link technology for high energy physics

    International Nuclear Information System (INIS)

    Chramowicz, J; Kwan, S; Prosser, A; Winchell, M

    2012-01-01

    Modern particle detectors utilize optical fiber links to deliver event data to upstream trigger and data processing systems. Future detector systems can benefit from the development of dense arrangements of high speed optical links emerging from industry advancements in transceiver technology. Supporting data transfers of up to 120 Gbps in each direction, optical engines permit assembly of the optical transceivers in close proximity to ASICs and FPGAs. Test results of some of these parallel components will be presented including the development of pluggable FPGA Mezzanine Cards equipped with optical engines to provide to collaborators on the Versatile Link Common Project for the HI-LHC at CERN. This work was supported by the U.S. Department of Energy, operated by Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359 with the United States Department of Energy.

  5. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Kirk, B.L. [Oak Ridge National Lab., TN (United States); Sartori, E. [OCDE/OECD NEA Data Bank, Issy-les-Moulineaux (France); Viedma, L.G. de [Consejo de Seguridad Nuclear, Madrid (Spain)

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee`s Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community`s computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  6. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    International Nuclear Information System (INIS)

    Kirk, B.L.; Sartori, E.; Viedma, L.G. de

    1997-01-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management

  7. Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

    KAUST Repository

    Mudigere, Dheevatsa; Sridharan, Srinivas; Deshpande, Anand; Park, Jongsoo; Heinecke, Alexander; Smelyanskiy, Mikhail; Kaul, Bharat; Dubey, Pradeep; Kaushik, Dinesh; Keyes, David E.

    2015-01-01

    -grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain

  8. Massively Parallel Computing at Sandia and Its Application to National Defense

    National Research Council Canada - National Science Library

    Dosanjh, Sudip

    1991-01-01

    Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor...

  9. Domain Specific Language for Geant4 Parallelization for Space-based Applications, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — A major limiting factor in HPC growth is the requirement to parallelize codes to leverage emerging architectures, especially as single core performance has plateaued...

  10. Parallelization of learning problems by artificial neural networks. Application in external radiotherapy

    International Nuclear Information System (INIS)

    Sauget, M.

    2007-12-01

    This research is about the application of neural networks used in the external radiotherapy domain. The goal is to elaborate a new evaluating system for the radiation dose distributions in heterogeneous environments. The al objective of this work is to build a complete tool kit to evaluate the optimal treatment planning. My st research point is about the conception of an incremental learning algorithm. The interest of my work is to combine different optimizations specialized in the function interpolation and to propose a new algorithm allowing to change the neural network architecture during the learning phase. This algorithm allows to minimise the al size of the neural network while keeping a good accuracy. The second part of my research is to parallelize the previous incremental learning algorithm. The goal of that work is to increase the speed of the learning step as well as the size of the learned dataset needed in a clinical case. For that, our incremental learning algorithm presents an original data decomposition with overlapping, together with a fault tolerance mechanism. My last research point is about a fast and accurate algorithm computing the radiation dose deposit in any heterogeneous environment. At the present time, the existing solutions used are not optimal. The fast solution are not accurate and do not give an optimal treatment planning. On the other hand, the accurate solutions are far too slow to be used in a clinical context. Our algorithm answers to this problem by bringing rapidity and accuracy. The concept is to use a neural network adequately learned together with a mechanism taking into account the environment changes. The advantages of this algorithm is to avoid the use of a complex physical code while keeping a good accuracy and reasonable computation times. (author)

  11. High-speed technique based on a parallel projection correlation procedure for digital image correlation

    Science.gov (United States)

    Zaripov, D. I.; Renfu, Li

    2018-05-01

    The implementation of high-efficiency digital image correlation methods based on a zero-normalized cross-correlation (ZNCC) procedure for high-speed, time-resolved measurements using a high-resolution digital camera is associated with big data processing and is often time consuming. In order to speed-up ZNCC computation, a high-speed technique based on a parallel projection correlation procedure is proposed. The proposed technique involves the use of interrogation window projections instead of its two-dimensional field of luminous intensity. This simplification allows acceleration of ZNCC computation up to 28.8 times compared to ZNCC calculated directly, depending on the size of interrogation window and region of interest. The results of three synthetic test cases, such as a one-dimensional uniform flow, a linear shear flow and a turbulent boundary-layer flow, are discussed in terms of accuracy. In the latter case, the proposed technique is implemented together with an iterative window-deformation technique. On the basis of the results of the present work, the proposed technique is recommended to be used for initial velocity field calculation, with further correction using more accurate techniques.

  12. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Science.gov (United States)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  13. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    International Nuclear Information System (INIS)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1x10 8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8x10 8 histories. For a smaller number of histories (1x10 8 ) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1x10 8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy

  14. High-voltage isolation transformer for sub-nanosecond rise time pulses constructed with annular parallel-strip transmission lines.

    Science.gov (United States)

    Homma, Akira

    2011-07-01

    A novel annular parallel-strip transmission line was devised to construct high-voltage high-speed pulse isolation transformers. The transmission lines can easily realize stable high-voltage operation and good impedance matching between primary and secondary circuits. The time constant for the step response of the transformer was calculated by introducing a simple low-frequency equivalent circuit model. Results show that the relation between the time constant and low-cut-off frequency of the transformer conforms to the theory of the general first-order linear time-invariant system. Results also show that the test transformer composed of the new transmission lines can transmit about 600 ps rise time pulses across the dc potential difference of more than 150 kV with insertion loss of -2.5 dB. The measured effective time constant of 12 ns agreed exactly with the theoretically predicted value. For practical applications involving the delivery of synchronized trigger signals to a dc high-voltage electron gun station, the transformer described in this paper exhibited advantages over methods using fiber optic cables for the signal transfer system. This transformer has no jitter or breakdown problems that invariably occur in active circuit components.

  15. Using the Eclipse Parallel Tools Platform to Assist Earth Science Model Development and Optimization on High Performance Computers

    Science.gov (United States)

    Alameda, J. C.

    2011-12-01

    Development and optimization of computational science models, particularly on high performance computers, and with the advent of ubiquitous multicore processor systems, practically on every system, has been accomplished with basic software tools, typically, command-line based compilers, debuggers, performance tools that have not changed substantially from the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as openMP and MPI) to be able to take full advantage of high performance computers with an increasing core count per shared memory node, has made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC) seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high performance computing systems. Our WHPC project to improve Eclipse PTP takes an application-centric view to improve PTP. We are using a set of scientific applications, each with a variety of challenges, and using PTP to drive further improvements to both the scientific application, as well as to understand shortcomings in Eclipse PTP from an application developer perspective, to drive our list of improvements we seek to make. We are also partnering with performance tool providers, to drive higher quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, as well as to develop educational materials to incorporate into

  16. Operational mesoscale atmospheric dispersion prediction using high performance parallel computing cluster for emergency response

    International Nuclear Information System (INIS)

    Srinivas, C.V.; Venkatesan, R.; Muralidharan, N.V.; Das, Someshwar; Dass, Hari; Eswara Kumar, P.

    2005-08-01

    An operational atmospheric dispersion prediction system is implemented on a cluster super computer for 'Online Emergency Response' for Kalpakkam nuclear site. The numerical system constitutes a parallel version of a nested grid meso-scale meteorological model MM5 coupled to a random walk particle dispersion model FLEXPART. The system provides 48 hour forecast of the local weather and radioactive plume dispersion due to hypothetical air borne releases in a range of 100 km around the site. The parallel code was implemented on different cluster configurations like distributed and shared memory systems. Results of MM5 run time performance for 1-day prediction are reported on all the machines available for testing. A reduction of 5 times in runtime is achieved using 9 dual Xeon nodes (18 physical/36 logical processors) compared to a single node sequential run. Based on the above run time results a cluster computer facility with 9-node Dual Xeon is commissioned at IGCAR for model operation. The run time of a triple nested domain MM5 is about 4 h for 24 h forecast. The system has been operated continuously for a few months and results were ported on the IMSc home page. Initial and periodic boundary condition data for MM5 are provided by NCMRWF, New Delhi. An alternative source is found to be NCEP, USA. These two sources provide the input data to the operational models at different spatial and temporal resolutions and using different assimilation methods. A comparative study on the results of forecast is presented using these two data sources for present operational use. Slight improvement is noticed in rainfall, winds, geopotential heights and the vertical atmospheric structure while using NCEP data probably because of its high spatial and temporal resolution. (author)

  17. Piecewise - Parabolic Methods for Parallel Computation with Applications to Unstable Fluid Flow in 2 and 3 Dimensions

    Energy Technology Data Exchange (ETDEWEB)

    Woodward, P. R.

    2003-03-26

    This report summarizes the results of the project entitled, ''Piecewise-Parabolic Methods for Parallel Computation with Applications to Unstable Fluid Flow in 2 and 3 Dimensions'' This project covers a span of many years, beginning in early 1987. It has provided over that considerable period the core funding to my research activities in scientific computation at the University of Minnesota. It has supported numerical algorithm development, application of those algorithms to fundamental fluid dynamics problems in order to demonstrate their effectiveness, and the development of scientific visualization software and systems to extract scientific understanding from those applications.

  18. Cpl6: The New Extensible, High-Performance Parallel Coupler forthe Community Climate System Model

    Energy Technology Data Exchange (ETDEWEB)

    Craig, Anthony P.; Jacob, Robert L.; Kauffman, Brain; Bettge,Tom; Larson, Jay; Ong, Everest; Ding, Chris; He, Yun

    2005-03-24

    Coupled climate models are large, multiphysics applications designed to simulate the Earth's climate and predict the response of the climate to any changes in the forcing or boundary conditions. The Community Climate System Model (CCSM) is a widely used state-of-art climate model that has released several versions to the climate community over the past ten years. Like many climate models, CCSM employs a coupler, a functional unit that coordinates the exchange of data between parts of climate system such as the atmosphere and ocean. This paper describes the new coupler, cpl6, contained in the latest version of CCSM,CCSM3. Cpl6 introduces distributed-memory parallelism to the coupler, a class library for important coupler functions, and a standardized interface for component models. Cpl6 is implemented entirely in Fortran90 and uses Model Coupling Toolkit as the base for most of its classes. Cpl6 gives improved performance over previous versions and scales well on multiple platforms.

  19. Feasibility studies for a high energy physics MC program on massive parallel platforms

    International Nuclear Information System (INIS)

    Bertolotto, L.M.; Peach, K.J.; Apostolakis, J.; Bruschini, C.E.; Calafiura, P.; Gagliardi, F.; Metcalf, M.; Norton, A.; Panzer-Steindel, B.

    1994-01-01

    The parallelization of a Monte Carlo program for the NA48 experiment is presented. As a first step, a task farming structure was realized. Based on this, a further step, making use of a distributed database for showers in the electro-magnetic calorimeter, was implemented. Further possibilities for using parallel processing for a quasi-real time calibration of the calorimeter are described

  20. Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d`applications pour des reseaux de processeurs homogenes ou heterogenes

    Energy Technology Data Exchange (ETDEWEB)

    Colombet, L

    1994-10-07

    The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.

  1. A modified parallel artificial membrane permeability assay for evaluating the bioconcentration of highly hydrophobic chemicals in fish.

    Science.gov (United States)

    Kwon, Jung-Hwan; Escher, Beate I

    2008-03-01

    Low cost in vitro tools are needed at the screening stage of assessment of bioaccumulation potential of new and existing chemicals because the number of chemical substances that needs to be tested highly exceeds the capacity of in vivo bioconcentration tests. Thus, the parallel artificial membrane permeability assay (PAMPA) system was modified to predict passive uptake/ elimination rate in fish. To overcome the difficulties associated with low aqueous solubility and high membrane affinity of highly hydrophobic chemicals, we measured the rate of permeation from the donor poly(dimethylsiloxane)(PDMS) disk to the acceptor PDMS disk through aqueous and PDMS membrane boundary layers and term the modified PAMPA system "PDMS-PAMPA". Twenty chemicals were selected for validation of PDMS-PAMPA. The measured permeability is proportional to the passive elimination rate constant in fish and was used to predict the "minimum" in vivo elimination rate constant. The in vivo data were very close to predicted values except for a few polar chemicals and metabolically active chemicals, such as pyrene and benzo[a]pyrene. Thus, PDMS-PAMPA can be an appropriate in vitro system for nonmetabolizable chemicals. Combination with metabolic clearance rates using a battery of metabolic degradation assays would enhance the applicability for metabolizable chemicals.

  2. Mechanics of curved surfaces, with application to surface-parallel cracks

    Science.gov (United States)

    Martel, Stephen J.

    2011-10-01

    The surfaces of many bodies are weakened by shallow enigmatic cracks that parallel the surface. A re-formulation of the static equilibrium equations in a curvilinear reference frame shows that a tension perpendicular to a traction-free surface can arise at shallow depths even under the influence of gravity. This condition occurs if σ11k1 + σ22k2 > ρg cosβ, where k1 and k2 are the principal curvatures (negative if convex) at the surface, σ11 and σ22 are tensile (positive) or compressive (negative) stresses parallel to the respective principal curvature arcs, ρ is material density, g is gravitational acceleration, and β is the surface slope. The curvature terms do not appear in equilibrium equations in a Cartesian reference frame. Compression parallel to a convex surface thus can cause subsurface cracks to open. A quantitative test of the relationship above accounts for where sheeting joints (prominent shallow surface-parallel fractures in rock) are abundant and for where they are scarce or absent in the varied topography of Yosemite National Park, resolving key aspects of a classic problem in geology: the formation of sheeting joints. Moreover, since the equilibrium equations are independent of rheology, the relationship above can be applied to delamination or spalling caused by surface-parallel cracks in many materials.

  3. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  4. Structural Directed Growth of Ultrathin Parallel Birnessite on β-MnO2 for High-Performance Asymmetric Supercapacitors.

    Science.gov (United States)

    Zhu, Shijin; Li, Li; Liu, Jiabin; Wang, Hongtao; Wang, Tian; Zhang, Yuxin; Zhang, Lili; Ruoff, Rodney S; Dong, Fan

    2018-02-27

    Two-dimensional birnessite has attracted attention for electrochemical energy storage because of the presence of redox active Mn 4+ /Mn 3+ ions and spacious interlayer channels available for ions diffusion. However, current strategies are largely limited to enhancing the electrical conductivity of birnessite. One key limitation affecting the electrochemical properties of birnessite is the poor utilization of the MnO 6 unit. Here, we assemble β-MnO 2 /birnessite core-shell structure that exploits the exposed crystal face of β-MnO 2 as the core and ultrathin birnessite sheets that have the structure advantage to enhance the utilization efficiency of the Mn from the bulk. Our birnessite that has sheets parallel to each other is found to have unusual crystal structure with interlayer spacing, Mn(III)/Mn(IV) ratio and the content of the balancing cations differing from that of the common birnessite. The substrate directed growth mechanism is carefully investigated. The as-prepared core-shell nanostructures enhance the exposed surface area of birnessite and achieve high electrochemical performances (for example, 657 F g -1 in 1 M Na 2 SO 4 electrolyte based on the weight of parallel birnessite) and excellent rate capability over a potential window of up to 1.2 V. This strategy opens avenues for fundamental studies of birnessite and its properties and suggests the possibility of its use in energy storage and other applications. The potential window of an asymmetric supercapacitor that was assembled with this material can be enlarged to 2.2 V (in aqueous electrolyte) with a good cycling ability.

  5. Generic method for deriving the general shaking force balance conditions of parallel manipulators with application to a redundant planar 4-RRR parallel manipulator

    NARCIS (Netherlands)

    van der Wijk, V.; Krut, S.; Pierrot, F.; Herder, Justus Laurens

    2011-01-01

    This paper proposes a generic method for deriving the general shaking force balance conditions of parallel manipulators. Instead of considering the balancing of a parallel manipulator link-by-link or leg-by-leg, the architecture is considered altogether. The first step is to write the linear

  6. An optimization approach to parallel generation solar PV investments in the U.S.: Two applications illustrate the case for tariff reform

    International Nuclear Information System (INIS)

    Kildegaard, Arne; Wente, Jordan

    2015-01-01

    We construct a model to optimize the economics of distributed generation photovoltaics (DGPV) for a parallel generation (behind-the-meter) application. Applying the model to the short-interval load and insolation data for two similar dairy operations in the U.S. Upper Midwest region, we find that highly site-specific differences in parameters lead to strikingly divergent results. Operating behind-the-meter strongly rewards real-time concurrence between on-site generation and on-site load. Compared to operating under a value of solar tariff (VOST) or net energy metering (NEM), we argue that parallel generation tariffs amplify the existing, irreducible uncertainties of project economics, and discourage DGPV investment. - Highlights: • Presents a novel approach to optimizing the size of behind-the-meter PV. • Demonstrates interaction of tax and financial parameters with load and insolation data. • Identifies how behind-the-meter operation raises risk to project economics.

  7. Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

    Energy Technology Data Exchange (ETDEWEB)

    Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2014-08-01

    This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each t into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.

  8. Development of novel series and parallel sensing system based on nanostructured surface enhanced Raman scattering substrate for biomedical application

    Science.gov (United States)

    Chang, Te-Wei

    With the advance of nanofabrication, the capability of nanoscale metallic structure fabrication opens a whole new study in nanoplasmonics, which is defined as the investigation of photon-electron interaction in the vicinity of nanoscale metallic structures. The strong oscillation of free electrons at the interface between metal and surrounding dielectric material caused by propagating surface plasmon resonance (SPR) or localized surface plasmon resonance (LSPR) enables a variety of new applications in different areas, especially biological sensing techniques. One of the promising biological sensing applications by surface resonance polariton is surface enhanced Raman spectroscopy (SERS), which significantly reinforces the feeble signal of traditional Raman scattering by at least 104 times. It enables highly sensitive and precise molecule identification with the assistance of a SERS substrate. Until now, the design of new SERS substrate fabrication process is still thriving since no dominant design has emerged yet. The ideal process should be able to achieve both a high sensitivity and low cost device in a simple and reliable way. In this thesis two promising approaches for fabricating nanostructured SERS substrate are proposed: thermal dewetting technique and nanoimprint replica technique. These two techniques are demonstrated to show the capability of fabricating high performance SERS substrate in a reliable and cost efficient fashion. In addition, these two techniques have their own unique characteristics and can be integrated with other sensing techniques to build a serial or parallel sensing system. The breakthrough of a combination system with different sensing techniques overcomes the inherent limitations of SERS detection and leverages it to a whole new level of systematic sensing. The development of a sensing platform based on thermal dewetting technique is covered as the first half of this thesis. The process optimization, selection of substrate material

  9. Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code

    Science.gov (United States)

    Longoni, Gianluca; Anderson, Stanwood L.

    2009-08-01

    The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.

  10. High Performance Molecular Visualization: In-Situ and Parallel Rendering with EGL

    Science.gov (United States)

    Stone, John E.; Messmer, Peter; Sisneros, Robert; Schulten, Klaus

    2016-01-01

    Large scale molecular dynamics simulations produce terabytes of data that is impractical to transfer to remote facilities. It is therefore necessary to perform visualization tasks in-situ as the data are generated, or by running interactive remote visualization sessions and batch analyses co-located with direct access to high performance storage systems. A significant challenge for deploying visualization software within clouds, clusters, and supercomputers involves the operating system software required to initialize and manage graphics acceleration hardware. Recently, it has become possible for applications to use the Embedded-system Graphics Library (EGL) to eliminate the requirement for windowing system software on compute nodes, thereby eliminating a significant obstacle to broader use of high performance visualization applications. We outline the potential benefits of this approach in the context of visualization applications used in the cloud, on commodity clusters, and supercomputers. We discuss the implementation of EGL support in VMD, a widely used molecular visualization application, and we outline benefits of the approach for molecular visualization tasks on petascale computers, clouds, and remote visualization servers. We then provide a brief evaluation of the use of EGL in VMD, with tests using developmental graphics drivers on conventional workstations and on Amazon EC2 G2 GPU-accelerated cloud instance types. We expect that the techniques described here will be of broad benefit to many other visualization applications. PMID:27747137

  11. High-throughput fabrication of micrometer-sized compound parabolic mirror arrays by using parallel laser direct-write processing

    International Nuclear Information System (INIS)

    Yan, Wensheng; Gu, Min; Cumming, Benjamin P

    2015-01-01

    Micrometer-sized parabolic mirror arrays have significant applications in both light emitting diodes and solar cells. However, low fabrication throughput has been identified as major obstacle for the mirror arrays towards large-scale applications due to the serial nature of the conventional method. Here, the mirror arrays are fabricated by using a parallel laser direct-write processing, which addresses this barrier. In addition, it is demonstrated that the parallel writing is able to fabricate complex arrays besides simple arrays and thus offers wider applications. Optical measurements show that each single mirror confines the full-width at half-maximum value to as small as 17.8 μm at the height of 150 μm whilst providing a transmittance of up to 68.3% at a wavelength of 633 nm in good agreement with the calculation values. (paper)

  12. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment

  13. Research and development of a gaseous detector PIM (parallel ionization multiplier) dedicated to particle tracking under high hadron rates

    International Nuclear Information System (INIS)

    Beucher, J.

    2007-10-01

    PIM (Parallel Ionization Multiplier) is a multi-stage micro-pattern gaseous detector using micro-meshes technology. This new device, based on Micromegas (micro-mesh gaseous structure) detector principle of operation, offers good characteristics for minimum ionizing particles track detection. However, this kind of detectors placed in hadron environment suffers discharges which degrade sensibly the detection efficiency and account for hazard to the front-end electronics. In order to minimize these strong events, it is convenient to perform charges multiplication by several successive steps. Within the framework of a European hadron physics project we have investigated the multi-stage PIM detector for high hadrons flux application. For this part of research and development, a systematic study for many geometrical configurations of a two amplification stages separated with a transfer space operated with the gaseous mixture Ne + 10% CO 2 has been performed. Beam tests realised with high energy hadrons at CERN facility have given that discharges probability could be strongly reduced with a suitable PIM device. A discharges rate lower to 10 9 by incident hadron and a spatial resolution of 51 μm have been measured at the beginning efficiency plateau (>96 %) operating point. (author)

  14. High fidelity thermal-hydraulic analysis using CFD and massively parallel computers

    International Nuclear Information System (INIS)

    Weber, D.P.; Wei, T.Y.C.; Brewster, R.A.; Rock, Daniel T.; Rizwan-uddin

    2000-01-01

    Thermal-hydraulic analyses play an important role in design and reload analysis of nuclear power plants. These analyses have historically relied on early generation computational fluid dynamics capabilities, originally developed in the 1960s and 1970s. Over the last twenty years, however, dramatic improvements in both computational fluid dynamics codes in the commercial sector and in computing power have taken place. These developments offer the possibility of performing large scale, high fidelity, core thermal hydraulics analysis. Such analyses will allow a determination of the conservatism employed in traditional design approaches and possibly justify the operation of nuclear power systems at higher powers without compromising safety margins. The objective of this work is to demonstrate such a large scale analysis approach using a state of the art CFD code, STAR-CD, and the computing power of massively parallel computers, provided by IBM. A high fidelity representation of a current generation PWR was analyzed with the STAR-CD CFD code and the results were compared to traditional analyses based on the VIPRE code. Current design methodology typically involves a simplified representation of the assemblies, where a single average pin is used in each assembly to determine the hot assembly from a whole core analysis. After determining this assembly, increased refinement is used in the hot assembly, and possibly some of its neighbors, to refine the analysis for purposes of calculating DNBR. This latter calculation is performed with sub-channel codes such as VIPRE. The modeling simplifications that are used involve the approximate treatment of surrounding assemblies and coarse representation of the hot assembly, where the subchannel is the lowest level of discretization. In the high fidelity analysis performed in this study, both restrictions have been removed. Within the hot assembly, several hundred thousand to several million computational zones have been used, to

  15. The parallel implementation of a backpropagation neural network and its applicability to SPECT image reconstruction

    Energy Technology Data Exchange (ETDEWEB)

    Kerr, John Patrick [Iowa State Univ., Ames, IA (United States)

    1992-01-01

    The objective of this study was to determine the feasibility of using an Artificial Neural Network (ANN), in particular a backpropagation ANN, to improve the speed and quality of the reconstruction of three-dimensional SPECT (single photon emission computed tomography) images. In addition, since the processing elements (PE)s in each layer of an ANN are independent of each other, the speed and efficiency of the neural network architecture could be better optimized by implementing the ANN on a massively parallel computer. The specific goals of this research were: to implement a fully interconnected backpropagation neural network on a serial computer and a SIMD parallel computer, to identify any reduction in the time required to train these networks on the parallel machine versus the serial machine, to determine if these neural networks can learn to recognize SPECT data by training them on a section of an actual SPECT image, and to determine from the knowledge obtained in this research if full SPECT image reconstruction by an ANN implemented on a parallel computer is feasible both in time required to train the network, and in quality of the images reconstructed.

  16. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    Science.gov (United States)

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  17. Application of Parallel Hierarchical Matrices and Low-Rank Tensors in Spatial Statistics and Parameter Identification

    KAUST Repository

    Litvinenko, Alexander

    2018-03-12

    Part 1: Parallel H-matrices in spatial statistics 1. Motivation: improve statistical model 2. Tools: Hierarchical matrices 3. Matern covariance function and joint Gaussian likelihood 4. Identification of unknown parameters via maximizing Gaussian log-likelihood 5. Implementation with HLIBPro. Part 2: Low-rank Tucker tensor methods in spatial statistics

  18. Application of Parallel Hierarchical Matrices in Spatial Statistics and Parameter Identification

    KAUST Repository

    Litvinenko, Alexander

    2018-04-20

    Parallel H-matrices in spatial statistics 1. Motivation: improve statistical model 2. Tools: Hierarchical matrices [Hackbusch 1999] 3. Matern covariance function and joint Gaussian likelihood 4. Identification of unknown parameters via maximizing Gaussian log-likelihood 5. Implementation with HLIBPro

  19. Mixing subattolitre volumes in a quantitative and highly parallel manner with soft matter nanofluidics

    DEFF Research Database (Denmark)

    Christensen, Sune M.; Bolinger, Pierre-Yves; Hatzakis, Nikos

    2012-01-01

    Handling and mixing ultrasmall volumes of reactants in parallel can increase the throughput and complexity of screening assays while simultaneously reducing reagent consumption. Microfabricated silicon and plastic can provide reliable fluidic devices, but cannot typically handle total volumes sma...

  20. Optimized Parallel Discrete Event Simulation (PDES) for High Performance Computing (HPC) Clusters

    National Research Council Canada - National Science Library

    Abu-Ghazaleh, Nael

    2005-01-01

    The aim of this project was to study the communication subsystem performance of state of the art optimistic simulator Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES...

  1. High-resolution whole-brain diffusion MRI at 7T using radiofrequency parallel transmission.

    Science.gov (United States)

    Wu, Xiaoping; Auerbach, Edward J; Vu, An T; Moeller, Steen; Lenglet, Christophe; Schmitter, Sebastian; Van de Moortele, Pierre-François; Yacoub, Essa; Uğurbil, Kâmil

    2018-03-30

    Investigating the utility of RF parallel transmission (pTx) for Human Connectome Project (HCP)-style whole-brain diffusion MRI (dMRI) data at 7 Tesla (7T). Healthy subjects were scanned in pTx and single-transmit (1Tx) modes. Multiband (MB), single-spoke pTx pulses were designed to image sagittal slices. HCP-style dMRI data (i.e., 1.05-mm resolutions, MB2, b-values = 1000/2000 s/mm 2 , 286 images and 40-min scan) and data with higher accelerations (MB3 and MB4) were acquired with pTx. pTx significantly improved flip-angle detected signal uniformity across the brain, yielding ∼19% increase in temporal SNR (tSNR) averaged over the brain relative to 1Tx. This allowed significantly enhanced estimation of multiple fiber orientations (with ∼21% decrease in dispersion) in HCP-style 7T dMRI datasets. Additionally, pTx pulses achieved substantially lower power deposition, permitting higher accelerations, enabling collection of the same data in 2/3 and 1/2 the scan time or of more data in the same scan time. pTx provides a solution to two major limitations for slice-accelerated high-resolution whole-brain dMRI at 7T; it improves flip-angle uniformity, and enables higher slice acceleration relative to current state-of-the-art. As such, pTx provides significant advantages for rapid acquisition of high-quality, high-resolution truly whole-brain dMRI data. © 2018 International Society for Magnetic Resonance in Medicine.

  2. High spatial resolution whole-body MR angiography featuring parallel imaging: initial experience

    International Nuclear Information System (INIS)

    Quick, H.H.; Vogt, F.M.; Madewald, S.; Herborn, C.U.; Bosk, S.; Goehde, S.; Debatin, J.F.; Ladd, M.E.

    2004-01-01

    Materials and methods: whole-body multi-station MRA was performed with a rolling table platform (AngioSURF) on 5 volunteers in two imaging series: 1) standard imaging protocol, 2) modified high-resolution protocol employing PAT using the generalized autocalibrating partially parallel acquisitions (GRAPPA) algorithm with an acceleration factor of 3. For an intra-individual comparison of the two MR examinations, the arterial vasculature was divided into 30 segments. Signal-to-noise ratios (SNR) and contrast-to-noise ratios (CNR) were calculated for all 30 arterial segments of each subject. Vessel segment depiction was qualitatively assessed applying a 5-point scale to each of the segments. Image reconstruction times were recorded for the standard as well as the PAT protocol. Results: compared to the standard protocol, PAT allowed for increased spatial resolution through a 3-fold reduction in mean voxel size for each of the 5 stations. Mean SNR and CNR values over all specified vessel segments decreased by a factor of 1.58 and 1.56, respectively. Despite the reduced SNR and CNR, the depiction of all specified vessel segments increased in PAT images, reflecting the increased spatial resolution. Qualitative comparison of standard and PAT images showed an increase in vessel segment conspicuity with more detailed depiction of intramuscular arterial branches in all volunteers. The time for image data reconstruction of all 5 stations was significantly increased from about 10 minutes to 40 minutes when using the PAT acquisition. (orig.) [de

  3. Mathematical and numerical models to achieve high speed with special-purpose parallel processors

    International Nuclear Information System (INIS)

    Cheng, H.S.; Wulff, W.; Mallen, A.N.

    1986-01-01

    Historically, safety analyses and plant dynamic simulations have been and still are being carried out be means of detailed FORTRAN codes on expensive mainframe computers in time-consuming batch processing mode. These codes have grown to be so expensive to execute that their utilization depends increasingly on the availability of very expensive supercomputers. Thus, advanced technology for high-speed, low-cost, and accurate plant dynamic simulations is very much needed. Ideally, a low-cost facility based on a modern minicomputer can be dedicated to the staff of a power plant, which is easy and convenient to use, and which can simulate realistically plant transients at faster than real-time speeds. Such a simulation capability can enhance safety and plant utilization. One such simulation facility that has been developed is the Brookhaven National Laboratory (BNL) Plant Analyzer, currently set up for boiling water reactor plant simulations at up to seven times faster than real-time process speeds. The principal hardware components of the BNL Plant Analyzer are two units of special-purpose parallel processors, the AD10 of Applied Dynamics International and a PDP-11/34 host computer

  4. A vibrating wire parallel to a high temperature superconducting slab. Vol. 2

    Energy Technology Data Exchange (ETDEWEB)

    Saif, A G; El-sabagh, M A [Department of Mathematic and Theoretical physics, Nuclear Research Center, Atomic Energy Authority, Cairo (Egypt)

    1996-03-01

    The power losses problem for an idealized high temperature type II superconducting system of a simple geometry is studied. This system is composed of a vibrating normal conducting wire (two wires) carrying a direct current parallel to an uniaxial anisotropic type II superconducting slab (moving slab). First, the electromagnetic equation governing the dynamics of this system, and its solutions are obtained. Secondly, a modified anisotropic london equation is developed to study these systems in the case of the slab moving. Thirdly, it is found that, the power losses is dependent on the frequency, london penetration depth, permeability, conductivity, velocity, and the distance between the normal conductors and the surfaces of the superconducting slab. Moreover, the power losses decreases as the distance between the normal conductors and the surface of the superconducting slab decreases; and increases as the frequency, the london penetration depth, permeability, conductivity, and velocity are increased. These losses along the versor of the anisotropy axis is increased as {lambda}{sub |}| increases. Moreover, it is greater than the power losses along the crystal symmetry direction. In the isotropic case as well as the slab thickness tends to infinity, agreement with previous results are obtained. 2 figs.

  5. Parallel sort with a ranged, partitioned key-value store in a high perfomance computing environment

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron; Poole, Stephen W.

    2016-01-26

    Improved sorting techniques are provided that perform a parallel sort using a ranged, partitioned key-value store in a high performance computing (HPC) environment. A plurality of input data files comprising unsorted key-value data in a partitioned key-value store are sorted. The partitioned key-value store comprises a range server for each of a plurality of ranges. Each input data file has an associated reader thread. Each reader thread reads the unsorted key-value data in the corresponding input data file and performs a local sort of the unsorted key-value data to generate sorted key-value data. A plurality of sorted, ranged subsets of each of the sorted key-value data are generated based on the plurality of ranges. Each sorted, ranged subset corresponds to a given one of the ranges and is provided to one of the range servers corresponding to the range of the sorted, ranged subset. Each range server sorts the received sorted, ranged subsets and provides a sorted range. A plurality of the sorted ranges are concatenated to obtain a globally sorted result.

  6. Achieving high performance in numerical computations on RISC workstations and parallel systems

    Energy Technology Data Exchange (ETDEWEB)

    Goedecker, S. [Max-Planck Inst. for Solid State Research, Stuttgart (Germany); Hoisie, A. [Los Alamos National Lab., NM (United States)

    1997-08-20

    The nominal peak speeds of both serial and parallel computers is raising rapidly. At the same time however it is becoming increasingly difficult to get out a significant fraction of this high peak speed from modern computer architectures. In this tutorial the authors give the scientists and engineers involved in numerically demanding calculations and simulations the necessary basic knowledge to write reasonably efficient programs. The basic principles are rather simple and the possible rewards large. Writing a program by taking into account optimization techniques related to the computer architecture can significantly speedup your program, often by factors of 10--100. As such, optimizing a program can for instance be a much better solution than buying a faster computer. If a few basic optimization principles are applied during program development, the additional time needed for obtaining an efficient program is practically negligible. In-depth optimization is usually only needed for a few subroutines or kernels and the effort involved is therefore also acceptable.

  7. Determination of accurate 1H positions of an alanine tripeptide with anti-parallel and parallel β-sheet structures by high resolution 1H solid state NMR and GIPAW chemical shift calculation.

    Science.gov (United States)

    Yazawa, Koji; Suzuki, Furitsu; Nishiyama, Yusuke; Ohata, Takuya; Aoki, Akihiro; Nishimura, Katsuyuki; Kaji, Hironori; Shimizu, Tadashi; Asakura, Tetsuo

    2012-11-25

    The accurate (1)H positions of alanine tripeptide, A(3), with anti-parallel and parallel β-sheet structures could be determined by highly resolved (1)H DQMAS solid-state NMR spectra and (1)H chemical shift calculation with gauge-including projector augmented wave calculations.

  8. Parallel interconnect for a novel system approach to short distance high information transfer data links

    Science.gov (United States)

    Raskin, Glenn; Lebby, Michael S.; Carney, F.; Kazakia, M.; Schwartz, Daniel B.; Gaw, Craig A.

    1997-04-01

    The OPTOBUSTM family of products provides for high performance parallel interconnection utilizing optical links in a 10-bit wide bi-directional configuration. The link is architected to be 'transparent' in that it is totally asynchronous and dc coupled so that it can be treated as a perfect cable with extremely low skew and no losses. An optical link consists of two identical transceiver modules and a pair of connectorized 62.5 micrometer multi mode fiber ribbon cables. The OPTOBUSTM I link provides bi- directional functionality at 4 Gbps (400 Mbps per channel), while the OPTOBUSTM II link will offer the same capability at 8 Gbps (800 Mbps per channel). The transparent structure of the OPTOBUSTM links allow for an arbitrary data stream regardless of its structure. Both the OPTOBUSTM I and OPTOBUSTM II transceiver modules are packaged as partially populated 14 by 14 pin grid arrays (PGA) with optical receptacles on one side of the module. The modules themselves are composed of several elements; including passives, integrated circuits optoelectronic devices and optical interface units (OIUs) (which consist of polymer waveguides and a specially designed lead frame). The initial offering of the modules electrical interface utilizes differential CML. The CML line driver sinks 5 mA of current into one of two pins. When terminated with 50 ohm pull-up resistors tied to a voltage between VCC and VCC-2, the result is a differential swing of plus or minus 250 mV, capable of driving standard PECL I/Os. Future offerings of the OPTOBUSTM links will incorporate LVDS and PECL interfaces as well as CML. The integrated circuits are silicon based. For OPTOBUSTM I links, a 1.5 micrometer drawn emitter NPN bipolar process is used for the receiver and an enhanced 0.8 micrometer CMOS process for the laser driver. For OPTOBUSTM II links, a 0.8 micrometer drawn emitter NPN bipolar process is used for the receiver and the driver IC utilizes 0.8 micrometer BiCMOS technology. The OPTOBUSTM

  9. Z-buffer image assembly processing in high parallel visualization processing

    International Nuclear Information System (INIS)

    Kaneko, Isamu; Muramatsu, Kazuhiro

    2000-03-01

    On the platform of the parallel computer with many processors, the domain decomposition method is used as a popular means of parallel processing. In these days when the simulation scale becomes much larger and takes a lot of time, the simultaneous visualization processing with the actual computation is much more needed, and especially in case of a real-time visualization, the domain decomposition technique is indispensable. In case of parallel rendering processing, the rendered results must be gathered to one processor to compose the integrated picture in the last stage. This integration is usually conducted by the method using Z-buffer values. This process, however, induces the crucial problems of much lower speed processing and local memory shortage in case of parallel processing exceeding more than several tens of processors. In this report, the two new solutions are proposed. The one is the adoption of a special operator (Reduce operator) in the parallelization process, and the other is a buffer compression by deleting the background informations. This report includes the performance results of these new techniques to investigate their effect with use of the parallel computer Paragon. (author)

  10. Signal enhancement due to high-Z nanofilm electrodes in parallel plate ionization chambers with variable microgaps.

    Science.gov (United States)

    Brivio, Davide; Sajo, Erno; Zygmanski, Piotr

    2017-12-01

    regarded as stopped by the radiation transport code but which can move and form electron current in small gaps (charge are moderately decreasing with the air gap. When gap sizes are smaller than ~20 μm, the contribution to signal from dose approaches zero while contributions from high-energy current and deposited charges give rise to an offset signal. The measured signal enhancement ratio (SER) was 40.0 ± 5.0 for the 3 μm gap and rapidly decreasing with gap size down to 9.9 ± 1.2 for the 21 μm gap and to 6.6 ± 0.3 for the 100 μm gap. The uncertainties in SER were mostly due to uncertainties in gap size and data acquisition system. We developed an experimental method to determine the signal enhancement due to high-Z nanolayers in parallel plate ionization chambers with micrometer spatial resolution. As the water-equivalent thicknesses of these air gaps are 3 nm to 10 μm, the method may also be applicable for nanoscopic spatial resolution of other gap materials. The method may be extended to solid insulator materials with low Z. © 2017 American Association of Physicists in Medicine.

  11. High-Tc superconductor applications

    International Nuclear Information System (INIS)

    Anon.

    1988-01-01

    There has been much speculation about new products and business opportunities which high-Tc superconductors might make possible. However, with the exception of one Japanese survey, there have not been any recognized forecasts suggesting a timeframe and relative economic impact for proposed high-Tc products. The purpose of this survey is to provide definitive projections of the timetable for high-Tc product development, based on the combined forecasts of the leading U.S. superconductivity experts. The FTS panel of experts on high-Tc superconductor applications, representing both business and research, forecast the commercialization and economic impact for 28 classes of electronic, magnetic, communications, instrumentation, transportation, industrial, and power generation products. In most cases, forecasts predict the occurrence of developments within a 90% statistical confidence limit of 2-to-3 years. The report provides background information on the 28 application areas, as well as other information useful for strategic planners. The panel also forecast high-Tc research spending, markets, and international competitiveness, and provide insight into how the industry will evolve

  12. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop

  13. Parallel fuzzy connected image segmentation on GPU

    OpenAIRE

    Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

    2011-01-01

    Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...

  14. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  15. Optimum design of 6-DOF parallel manipulator with translational/rotational workspaces for haptic device application

    Energy Technology Data Exchange (ETDEWEB)

    Yoon, Jung Won; Hwang, Yoon Kwon [Gyeongsang National University, Jinju (Korea, Republic of); Ryu, Je Ha [Gwangju Institute of Science and Technology, Gwangju (Korea, Republic of)

    2010-05-15

    This paper proposes an optimum design method that satisfies the desired orientation workspace at the boundary of the translation workspace while maximizing the mechanism isotropy for parallel manipulators. A simple genetic algorithm is used to obtain the optimal linkage parameters of a six-degree-of-freedom parallel manipulator that can be used as a haptic device. The objective function is composed of a desired spherical shape translation workspace and a desired orientation workspace located on the boundaries of the desired translation workspace, along with a global conditioning index based on a homogeneous Jacobian matrix. The objective function was optimized to satisfy the desired orientation workspace at the boundary positions as translated from a neutral position of the increased entropy mechanism. An optimization result with desired translation and orientation workspaces for a haptic device was obtained to show the effectiveness of the suggested scheme, and the kinematic performances of the proposed model were compared with those of a preexisting base model

  16. Optimum design of 6-DOF parallel manipulator with translational/rotational workspaces for haptic device application

    International Nuclear Information System (INIS)

    Yoon, Jung Won; Hwang, Yoon Kwon; Ryu, Je Ha

    2010-01-01

    This paper proposes an optimum design method that satisfies the desired orientation workspace at the boundary of the translation workspace while maximizing the mechanism isotropy for parallel manipulators. A simple genetic algorithm is used to obtain the optimal linkage parameters of a six-degree-of-freedom parallel manipulator that can be used as a haptic device. The objective function is composed of a desired spherical shape translation workspace and a desired orientation workspace located on the boundaries of the desired translation workspace, along with a global conditioning index based on a homogeneous Jacobian matrix. The objective function was optimized to satisfy the desired orientation workspace at the boundary positions as translated from a neutral position of the increased entropy mechanism. An optimization result with desired translation and orientation workspaces for a haptic device was obtained to show the effectiveness of the suggested scheme, and the kinematic performances of the proposed model were compared with those of a preexisting base model

  17. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  18. A GPU Parallelization of the Absolute Nodal Coordinate Formulation for Applications in Flexible Multibody Dynamics

    Science.gov (United States)

    2012-02-17

    to be solved. Disclaimer: Reference herein to any specific commercial company , product, process, or service by trade name, trademark...data processing rather than data caching and control flow. To make use of this computational power, NVIDIA introduced a general purpose parallel...GPU implementations were run on an Intel Nehalem Xeon E5520 2.26GHz processor with an NVIDIA Tesla C2070 graphics card for varying numbers of

  19. Development of parallellized higher-order generalized depletion perturbation theory for application in equilibrium cycle optimization

    Energy Technology Data Exchange (ETDEWEB)

    Geemert, R. van E-mail: rene.vangeemert@psi.ch; Hoogenboom, J.E. E-mail: j.e.hoogenboom@iri.tudelft.nl

    2001-09-01

    As nuclear fuel economy is basically a multi-cycle issue, a fair way of evaluating reload patterns is to consider their performance in the case of an equilibrium cycle. The equilibrium cycle associated with a reload pattern is defined as the limit fuel cycle that eventually emerges after multiple successive periodic refueling, each time implementing the same reload scheme. Since the equilibrium cycle is the solution of a reload operation invariance equation, it can in principle be found with sufficient accuracy only by applying an iterative procedure, simulating the emergence of the limit cycle. For a design purpose such as the optimization of reload patterns, in which many different equilibrium cycle perturbations (resulting from many different limited changes in the reload operator) must be evaluated, this requires far too much computational effort. However, for very fast calculation of these many different equilibrium cycle perturbations it is also possible to set up a generalized variational approach. This approach results in an iterative scheme that yields the exact perturbation in the equilibrium cycle solution as well, in an accelerated way. Furthermore, both the solution of the adjoint equations occurring in the perturbation theory formalism and the implementation of the optimization algorithm have been parallellized and executed on a massively parallel machine. The combination of parallellism and generalized perturbation theory offers the opportunity to perform very exhaustive, fast and accurate sampling of the solution space for the equilibrium cycle reload pattern optimization problem.

  20. Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

    Science.gov (United States)

    Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

    2013-12-01

    Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.

  1. Parallel Evolutionary Optimization of Multibody Systems with Application to Railway Dynamics

    Energy Technology Data Exchange (ETDEWEB)

    Eberhard, Peter [University of Erlangen-Nuremberg, Institute of Applied Mechanics (Germany)], E-mail: eberhard@ltm.uni-erlangen.de; Dignath, Florian [University of Stuttgart, Institute B of Mechanics (Germany)], E-mail: fd@mechb.uni-stuttgart.de; Kuebler, Lars [University of Erlangen-Nuremberg, Institute of Applied Mechanics (Germany)], E-mail: kuebler@ltm.uni-erlangen.de

    2003-03-15

    The optimization of multibody systems usually requires many costly criteria computations since the equations of motion must be evaluated by numerical time integration for each considered design. For actively controlled or flexible multibody systems additional difficulties arise as the criteria may contain non-differentiable points or many local minima. Therefore, in this paper a stochastic evolution strategy is used in combination with parallel computing in order to reduce the computation times whilst keeping the inherent robustness. For the parallelization a master-slave approach is used in a heterogeneous workstation/PC cluster. The pool-of-tasks concept is applied in order to deal with the frequently changing workloads of different machines in the cluster. In order to analyze the performance of the parallel optimization method, the suspension of an ICE passenger coach, modeled as an elastic multibody system, is optimized simultaneously with regard to several criteria including vibration damping and a criterion related to safety against derailment. The iterative and interactive nature of a typical optimization process for technical systems is emphasized.

  2. Parallel Evolutionary Optimization of Multibody Systems with Application to Railway Dynamics

    International Nuclear Information System (INIS)

    Eberhard, Peter; Dignath, Florian; Kuebler, Lars

    2003-01-01

    The optimization of multibody systems usually requires many costly criteria computations since the equations of motion must be evaluated by numerical time integration for each considered design. For actively controlled or flexible multibody systems additional difficulties arise as the criteria may contain non-differentiable points or many local minima. Therefore, in this paper a stochastic evolution strategy is used in combination with parallel computing in order to reduce the computation times whilst keeping the inherent robustness. For the parallelization a master-slave approach is used in a heterogeneous workstation/PC cluster. The pool-of-tasks concept is applied in order to deal with the frequently changing workloads of different machines in the cluster. In order to analyze the performance of the parallel optimization method, the suspension of an ICE passenger coach, modeled as an elastic multibody system, is optimized simultaneously with regard to several criteria including vibration damping and a criterion related to safety against derailment. The iterative and interactive nature of a typical optimization process for technical systems is emphasized

  3. COMPARISON OF PARALLEL AND SERIES HYBRID POWERTRAINS FOR TRANSIT BUS APPLICATION

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Zhiming [ORNL; Daw, C Stuart [ORNL; Smith, David E [ORNL; Jones, Perry T [ORNL; LaClair, Tim J [ORNL; Parks, II, James E [ORNL

    2016-01-01

    The fuel economy and emissions of both conventional and hybrid buses equipped with emissions aftertreatment were evaluated via computational simulation for six representative city bus drive cycles. Both series and parallel configurations for the hybrid case were studied. The simulation results indicate that series hybrid buses have the greatest overall advantage in fuel economy. The series and parallel hybrid buses were predicted to produce similar CO and HC tailpipe emissions but were also predicted to have reduced NOx tailpipe emissions compared to the conventional bus in higher speed cycles. For the New York bus cycle (NYBC), which has the lowest average speed among the cycles evaluated, the series bus tailpipe emissions were somewhat higher than they were for the conventional bus, while the parallel hybrid bus had significantly lower tailpipe emissions. All three bus powertrains were found to require periodic active DPF regeneration to maintain PM control. Plug-in operation of series hybrid buses appears to offer significant fuel economy benefits and is easily employed due to the relatively large battery capacity that is typical of the series hybrid configuration.

  4. A parallel calibration utility for WRF-Hydro on high performance computers

    Science.gov (United States)

    Wang, J.; Wang, C.; Kotamarthi, V. R.

    2017-12-01

    A successful modeling of complex hydrological processes comprises establishing an integrated hydrological model which simulates the hydrological processes in each water regime, calibrates and validates the model performance based on observation data, and estimates the uncertainties from different sources especially those associated with parameters. Such a model system requires large computing resources and often have to be run on High Performance Computers (HPC). The recently developed WRF-Hydro modeling system provides a significant advancement in the capability to simulate regional water cycles more completely. The WRF-Hydro model has a large range of parameters such as those in the input table files — GENPARM.TBL, SOILPARM.TBL and CHANPARM.TBL — and several distributed scaling factors such as OVROUGHRTFAC. These parameters affect the behavior and outputs of the model and thus may need to be calibrated against the observations in order to obtain a good modeling performance. Having a parameter calibration tool specifically for automate calibration and uncertainty estimates of WRF-Hydro model can provide significant convenience for the modeling community. In this study, we developed a customized tool using the parallel version of the model-independent parameter estimation and uncertainty analysis tool, PEST, to enabled it to run on HPC with PBS and SLURM workload manager and job scheduler. We also developed a series of PEST input file templates that are specifically for WRF-Hydro model calibration and uncertainty analysis. Here we will present a flood case study occurred in April 2013 over Midwest. The sensitivity and uncertainties are analyzed using the customized PEST tool we developed.

  5. A Fast, High Quality, and Reproducible Parallel Lagged-Fibonacci Pseudorandom Number Generator

    Science.gov (United States)

    Mascagni, Michael; Cuccaro, Steven A.; Pryor, Daniel V.; Robinson, M. L.

    1995-07-01

    We study the suitability of the additive lagged-Fibonacci pseudo-random number generator for parallel computation. This generator has relatively short period with respect to the size of its seed. However, the short period is more than made up for with the huge number of full-period cycles it contains. These different full period cycles are called equivalence classes. We show how to enumerate the equivalence classes and how to compute seeds to select a given equivalence class, In addition, we present some theoretical measures of quality for this generator when used in parallel. Next, we conjecture on the size of these measures of quality for this generator. Extensive empirical evidence supports this conjecture. In addition, a probabilistic interpretation of these measures leads to another conjecture similarly supported by empirical evidence. Finally we give an explicit parallelization suitable for a fully reproducible asynchronous MIMD implementation.

  6. Large-scale parallel configuration interaction. II. Two- and four-component double-group general active space implementation with application to BiH

    DEFF Research Database (Denmark)

    Knecht, Stefan; Jensen, Hans Jørgen Aagaard; Fleig, Timo

    2010-01-01

    We present a parallel implementation of a large-scale relativistic double-group configuration interaction CIprogram. It is applicable with a large variety of two- and four-component Hamiltonians. The parallel algorithm is based on a distributed data model in combination with a static load balanci...

  7. Numerical investigation of power requirements for ultra-high-speed serial-to-parallel conversion

    DEFF Research Database (Denmark)

    Lillieholm, Mads; Mulvad, Hans Christian Hansen; Palushani, Evarist

    2012-01-01

    We present a numerical bit-error rate investigation of 160-640 Gbit/s serial-to-parallel conversion by four-wave mixing based time-domain optical Fourier transformation, showing an inverse scaling of the required pump energy per bit with the bit rate.......We present a numerical bit-error rate investigation of 160-640 Gbit/s serial-to-parallel conversion by four-wave mixing based time-domain optical Fourier transformation, showing an inverse scaling of the required pump energy per bit with the bit rate....

  8. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vesztergombi, G.

    1989-01-01

    TRAX-I, a cost-effective parallel microcomputer, applying associative string processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this ''ionic'' representation are presented. (orig.)

  9. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vestergombi, G.

    1989-11-01

    TRAX-I, a cost-effective parallel microcomputer, applying Associative String Processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)

  10. Convergent Evolution of Hemoglobin Function in High-Altitude Andean Waterfowl Involves Limited Parallelism at the Molecular Sequence Level.

    Directory of Open Access Journals (Sweden)

    Chandrasekhar Natarajan

    2015-12-01

    Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages, and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization. In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.

  11. Lightweight High Efficiency Electric Motors for Space Applications

    Science.gov (United States)

    Robertson, Glen A.; Tyler, Tony R.; Piper, P. J.

    2011-01-01

    Lightweight high efficiency electric motors are needed across a wide range of space applications from - thrust vector actuator control for launch and flight applications to - general vehicle, base camp habitat and experiment control for various mechanisms to - robotics for various stationary and mobile space exploration missions. QM Power?s Parallel Path Magnetic Technology Motors have slowly proven themselves to be a leading motor technology in this area; winning a NASA Phase II for "Lightweight High Efficiency Electric Motors and Actuators for Low Temperature Mobility and Robotics Applications" a US Army Phase II SBIR for "Improved Robot Actuator Motors for Medical Applications", an NSF Phase II SBIR for "Novel Low-Cost Electric Motors for Variable Speed Applications" and a DOE SBIR Phase I for "High Efficiency Commercial Refrigeration Motors" Parallel Path Magnetic Technology obtains the benefits of using permanent magnets while minimizing the historical trade-offs/limitations found in conventional permanent magnet designs. The resulting devices are smaller, lower weight, lower cost and have higher efficiency than competitive permanent magnet and non-permanent magnet designs. QM Power?s motors have been extensively tested and successfully validated by multiple commercial and aerospace customers and partners as Boeing Research and Technology. Prototypes have been made between 0.1 and 10 HP. They are also in the process of scaling motors to over 100kW with their development partners. In this paper, Parallel Path Magnetic Technology Motors will be discussed; specifically addressing their higher efficiency, higher power density, lighter weight, smaller physical size, higher low end torque, wider power zone, cooler temperatures, and greater reliability with lower cost and significant environment benefit for the same peak output power compared to typically motors. A further discussion on the inherent redundancy of these motors for space applications will be provided.

  12. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    Directory of Open Access Journals (Sweden)

    Jonas Binladen

    2007-02-01

    Full Text Available The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources.We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences. Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis.We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%. Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial

  13. High performance computing of density matrix renormalization group method for 2-dimensional model. Parallelization strategy toward peta computing

    International Nuclear Information System (INIS)

    Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki

    2010-01-01

    We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders i.e., quasi two-dimension cases. Such an extension is regarded to bring about several breakthroughs in e.g., quantum-physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes which are unsuitable for multi-core systems, which is a mainstream of current parallel computers. Therefore, we optimize the all-to-all communications by the following two steps. The first one is the elimination of the communications between all processes by only rearranging data distribution with the communication data amount kept. The second one is the avoidance of the communication conflict by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-steps tuning is quite effective. (author)

  14. A Visual Database System for Image Analysis on Parallel Computers and its Application to the EOS Amazon Project

    Science.gov (United States)

    Shapiro, Linda G.; Tanimoto, Steven L.; Ahrens, James P.

    1996-01-01

    The goal of this task was to create a design and prototype implementation of a database environment that is particular suited for handling the image, vision and scientific data associated with the NASA's EOC Amazon project. The focus was on a data model and query facilities that are designed to execute efficiently on parallel computers. A key feature of the environment is an interface which allows a scientist to specify high-level directives about how query execution should occur.

  15. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    Science.gov (United States)

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.

  16. Methodes spectrales paralleles et applications aux simulations de couches de melange compressibles

    OpenAIRE

    Male , Jean-Michel; Fezoui , Loula ,

    1993-01-01

    La resolution des equations de Navier-Stokes en methodes spectrales pour des ecoulements compressibles peut etre assez gourmande en temps de calcul. On etudie donc ici la parallelisation d'un tel algorithme et son implantation sur une machine massivement parallele, la connection-machine CM-2. La methode spectrale s'adapte bien aux exigences du parallelisme massif, mais l'un des outils de base de cette methode, la transformee de Fourier rapide (lorsqu'elle doit etre appliquee sur les deux dime...

  17. Design and Control System of a Modular Parallel Robot for Medical Applications

    Directory of Open Access Journals (Sweden)

    Florin Covaciu

    2015-06-01

    Full Text Available Brachytherapy (BT, a cancer treatment method, is a type of internal radiation therapy which implies that radiation doses (seeds are placed inside the tumor, aiming to destroy only the cancerous cells, without affecting the surrounding healthy tissue. For a successful brachytherapy procedure, the accurate radiation seeds placement is an important issue, which is why a robotic system has been built for this task. The paper presents the design of a parallel robotic system for brachytherapy procedures and the control system architecture and its implementation.

  18. Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

    Science.gov (United States)

    Dembo, Richard; Sullivan, Christopher

    2009-01-01

    We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…

  19. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Energy Technology Data Exchange (ETDEWEB)

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADA`s plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  20. ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data.

    Directory of Open Access Journals (Sweden)

    Ramón Díaz-Uriarte

    Full Text Available BACKGROUND: Copy number alterations (CNAs in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH. The availability of aCGH platforms and the need for identification of CNAs has resulted in a wealth of methodological studies. METHODOLOGY/PRINCIPAL FINDINGS: ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detection of CNAs, gains and losses of genomic DNA, including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM. For improved speed, we use parallel computing (via MPI. Additional information (GO terms, PubMed citations, KEGG and Reactome pathways is available for individual genes, and for sets of genes with altered copy numbers. CONCLUSIONS/SIGNIFICANCE: ADACGH represents a qualitative increase in the standards of these types of applications: a all of the best performing algorithms are included, not just one or two; b we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors up to 45x; c we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in wavelet-based and CGHseg algorithms; d we incorporate redundancy, fault-tolerance and checkpointing, which are unique among web-based, parallelized applications; e all of the code is available under open source licenses, allowing to build upon, copy, and adapt our code for other software projects.

  1. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference method representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CUP/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR with automated determination of group dependent relaxation factors and iteration numbers required to achieve specified inner iteration error tolerance is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the DIFPAR3D code, for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies

  2. Method of Parallel-Hierarchical Network Self-Training and its Application for Pattern Classification and Recognition

    Directory of Open Access Journals (Sweden)

    TIMCHENKO, L.

    2012-11-01

    Full Text Available Propositions necessary for development of parallel-hierarchical (PH network training methods are discussed in this article. Unlike already known structures of the artificial neural network, where non-normalized (absolute similarity criteria are used for comparison, the suggested structure uses a normalized criterion. Based on the analysis of training rules, a conclusion is made that application of two training methods with a teacher is optimal for PH network training: error correction-based training and memory-based training. Mathematical models of training and a combined method of PH network training for recognition of static and dynamic patterns are developed.

  3. Application of the parallel processing computer to a nuclear disaster prevention support system

    Energy Technology Data Exchange (ETDEWEB)

    Shigehiro, Nukatsuka; Osami, Watanabe [Mitsubishi Heavy Industries, LTD (Japan)

    2003-07-01

    At the time of nuclear emergency, it is important to identify the type and the cause of the accident. Besides with these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity for the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed which provides not only the current state of the nuclear power plant and the influence of the radioactivity for the environment, but also the future prediction of the accident development. In order to provide the accurate results of these analyses quickly, MEASURES utilizes various techniques, such as multiple nesting method which narrows down the calculation area gradually, and parallel processing computer for three dimensional analyses, such as air current distribution analysis. In this paper, the outline and the feature of MEASURES are presented, especially focused on the usage of parallel processing computer for the three dimensional air current distribution analysis. (authors)

  4. Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

    Science.gov (United States)

    Newman, Gregory A.; Commer, Michael

    2009-07-01

    Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.

  5. Circularly Polarized Low-Profile Antenna for Radiating Parallel to Ground Plane for RFID Reader Applications

    Directory of Open Access Journals (Sweden)

    Kittima Lertsakwimarn

    2013-01-01

    Full Text Available This paper presents a low-profile printed antenna with double U-shaped arms radiating circular polarization for the UHF RFID readers. The proposed antenna consists of double U-shaped strip structures and a capacitive feeding line to generate circular polarization. A part of the U-shaped arms is bent by 90° to direct the main beam parallel to the ground plane. From the results, -10 dB |S11| and 3 dB axial ratio of the antenna cover a typical UHF RFID band from 920 MHz to 925 MHz. The bidirectional beam is obtained with the maximum gain of 1.8 dBic in the parallel direction to the ground plane at the 925 MHz. The overall size of the proposed antenna including ground plane is 107 mm × 57 mm × 12.8 mm (0.33λ0 × 0.17λ0 × 0.04λ0.

  6. Generalized Philosophy of Alerting with Applications for Parallel Approach Collision Prevention

    Science.gov (United States)

    Winder, Lee F.; Kuchar, James K.

    2000-01-01

    The goal of the research was to develop formal guidelines for the design of hazard avoidance systems. An alerting system is automation designed to reduce the likelihood of undesirable outcomes that are due to rare failures in a human-controlled system. It accomplishes this by monitoring the system, and issuing warning messages to the human operators when thought necessary to head off a problem. On examination of existing and recently proposed logics for alerting it appears that few commonly accepted principles guide the design process. Different logics intended to address the same hazards may take disparate forms and emphasize different aspects of performance, because each reflects the intuitive priorities of a different designer. Because performance must be satisfactory to all users of an alerting system (implying a universal meaning of acceptable performance) and not just one designer, a proposed logic often undergoes significant piecemeal modification before gamma general acceptance. This report is an initial attempt to clarify the common performance goals by which an alerting system is ultimately judged. A better understanding of these goals will hopefully allow designers to reach the final logic in a quicker, more direct and repeatable manner. As a case study, this report compares three alerting logics for collision prevention during independent approaches to parallel runways, and outlines a fourth alternative incorporating elements of the first three, but satisfying stated requirements. Three existing logics for parallel approach alerting are described. Each follows from different intuitive principles. The logics are presented as examples of three "philosophies" of alerting system design.

  7. Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

    International Nuclear Information System (INIS)

    Newman, Gregory A; Commer, Michael

    2009-01-01

    Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.

  8. Application of the parallel processing computer to a nuclear disaster prevention support system

    International Nuclear Information System (INIS)

    Shigehiro, Nukatsuka; Osami, Watanabe

    2003-01-01

    At the time of nuclear emergency, it is important to identify the type and the cause of the accident. Besides with these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity for the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed which provides not only the current state of the nuclear power plant and the influence of the radioactivity for the environment, but also the future prediction of the accident development. In order to provide the accurate results of these analyses quickly, MEASURES utilizes various techniques, such as multiple nesting method which narrows down the calculation area gradually, and parallel processing computer for three dimensional analyses, such as air current distribution analysis. In this paper, the outline and the feature of MEASURES are presented, especially focused on the usage of parallel processing computer for the three dimensional air current distribution analysis. (authors)

  9. A study on stimulation of DC high voltage power of LCC series parallel resonant in projectile velocity measurement system

    Science.gov (United States)

    Lu, Dong-dong; Gu, Jin-liang; Luo, Hong-e.; Xia, Yan

    2017-10-01

    According to specific requirements of the X-ray machine system for measuring velocity of outfield projectile, a DC high voltage power supply system is designed for the high voltage or the smaller current. The system comprises: a series resonant circuit is selected as a full-bridge inverter circuit; a high-frequency zero-current soft switching of a high-voltage power supply is realized by PWM output by STM32; a nanocrystalline alloy transformer is chosen as a high-frequency booster transformer; and the related parameters of an LCC series-parallel resonant are determined according to the preset parameters of the transformer. The concrete method includes: a LCC series parallel resonant circuit and a voltage doubling circuit are stimulated by using MULTISM and MATLAB; selecting an optimal solution and an optimal parameter of all parts after stimulation analysis; and finally verifying the correctness of the parameter by stimulation of the whole system. Through stimulation analysis, the output voltage of the series-parallel resonant circuit gets to 10KV in 28s: then passing through the voltage doubling circuit, the output voltage gets to 120KV in one hour. According to the system, the wave range of the output voltage is so small as to provide the stable X-ray supply for the X-ray machine for measuring velocity of outfield projectile. It is fast in charging and high in efficiency.

  10. Quantitative and Selective Analysis of Feline Growth Related Proteins Using Parallel Reaction Monitoring High Resolution Mass Spectrometry.

    Directory of Open Access Journals (Sweden)

    Mårten Sundberg

    Full Text Available Today immunoassays are widely used in veterinary medicine, but lack of species specific assays often necessitates the use of assays developed for human applications. Mass spectrometry (MS is an attractive alternative due to high specificity and versatility, allowing for species-independent analysis. Targeted MS-based quantification methods are valuable complements to large scale shotgun analysis. A method referred to as parallel reaction monitoring (PRM, implemented on Orbitrap MS, has lately been presented as an excellent alternative to more traditional selected reaction monitoring/multiple reaction monitoring (SRM/MRM methods. The insulin-like growth factor (IGF-system is not well described in the cat but there are indications of important differences between cats and humans. In feline medicine IGF-I is mainly analyzed for diagnosis of growth hormone disorders but also for research, while the other proteins in the IGF-system are not routinely analyzed within clinical practice. Here, a PRM method for quantification of IGF-I, IGF-II, IGF binding protein (BP -3 and IGFBP-5 in feline serum is presented. Selective quantification was supported by the use of a newly launched internal standard named QPrEST™. Homology searches demonstrated the possibility to use this standard of human origin for quantification of the targeted feline proteins. Excellent quantitative sensitivity at the attomol/μL (pM level and selectivity were obtained. As the presented approach is very generic we show that high resolution mass spectrometry in combination with PRM and QPrEST™ internal standards is a versatile tool for protein quantitation across multispecies.

  11. Parallel high-performance grid computing: capabilities and opportunities of a novel demanding service and business class allowing highest resource efficiency

    NARCIS (Netherlands)

    F.N. Kepper (Nick); R. Ettig (Ramona); F. Dickmann (Frank); R. Stehr (Rene); F.G. Grosveld (Frank); G. Wedemann (Gero); T.A. Knoch (Tobias)

    2010-01-01

    textabstractThe hardware and software requirements for parallel applications depend on the problem size, type and the number particles / parameters, the degree of parallelization possible, the load balancing over different processors / memory, the calculation type and the input / output and

  12. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    Directory of Open Access Journals (Sweden)

    Mark James Abraham

    2015-09-01

    Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  13. Highly efficient parallel direct solver for solving dense complex matrix equations from method of moments

    Directory of Open Access Journals (Sweden)

    Yan Chen

    2017-03-01

    Full Text Available Based on the vectorised and cache optimised kernel, a parallel lower upper decomposition with a novel communication avoiding pivoting scheme is developed to solve dense complex matrix equations generated by the method of moments. The fine-grain data rearrangement and assembler instructions are adopted to reduce memory accessing times and improve CPU cache utilisation, which also facilitate vectorisation of the code. Through grouping processes in a binary tree, a parallel pivoting scheme is designed to optimise the communication pattern and thus reduces the solving time of the proposed solver. Two large electromagnetic radiation problems are solved on two supercomputers, respectively, and the numerical results demonstrate that the proposed method outperforms those in open source and commercial libraries.

  14. Temporal Precedence Checking for Switched Models and its Application to a Parallel Landing Protocol

    Science.gov (United States)

    Duggirala, Parasara Sridhar; Wang, Le; Mitra, Sayan; Viswanathan, Mahesh; Munoz, Cesar A.

    2014-01-01

    This paper presents an algorithm for checking temporal precedence properties of nonlinear switched systems. This class of properties subsume bounded safety and capture requirements about visiting a sequence of predicates within given time intervals. The algorithm handles nonlinear predicates that arise from dynamics-based predictions used in alerting protocols for state-of-the-art transportation systems. It is sound and complete for nonlinear switch systems that robustly satisfy the given property. The algorithm is implemented in the Compare Execute Check Engine (C2E2) using validated simulations. As a case study, a simplified model of an alerting system for closely spaced parallel runways is considered. The proposed approach is applied to this model to check safety properties of the alerting logic for different operating conditions such as initial velocities, bank angles, aircraft longitudinal separation, and runway separation.

  15. Parameter allocation of parallel array bistable stochastic resonance and its application in communication systems

    International Nuclear Information System (INIS)

    Liu Jian; Zhai Qi-Qing; Wang You-Guo; Liu Jin

    2016-01-01

    In this paper, we propose a parameter allocation scheme in a parallel array bistable stochastic resonance-based communication system (P-BSR-CS) to improve the performance of weak binary pulse amplitude modulated (BPAM) signal transmissions. The optimal parameter allocation policy of the P-BSR-CS is provided to minimize the bit error rate (BER) and maximize the channel capacity (CC) under the adiabatic approximation condition. On this basis, we further derive the best parameter selection theorem in realistic communication scenarios via variable transformation. Specifically, the P-BSR structure design not only brings the robustness of parameter selection optimization, where the optimal parameter pair is not fixed but variable in quite a wide range, but also produces outstanding system performance. Theoretical analysis and simulation results indicate that in the P-BSR-CS the proposed parameter allocation scheme yields considerable performance improvement, particularly in very low signal-to-noise ratio (SNR) environments. (paper)

  16. Tunable multiple plasmon induced transparencies in parallel graphene sheets and its applications

    Science.gov (United States)

    khazaee, Sara; Granpayeh, Nosrat

    2018-01-01

    Tunable plasmon induced transparency is achieved by using only two parallel graphene sheets beyond silicon diffractive grating in mid-infrared region. Excitation of the guided-wave resonance (GWR) in this structure is illustrated on the normal incident transmission spectra and plays the bright resonance mode role. Weak hybridization between two bright modes, creates plasmon induced transparency (PIT) optical response. The resonance frequency of transparency window can be tuned by different geometrical parameters. Also, variation of graphene Fermi energy can be used to achieve tunability of the resonance frequency of transparency window without reconstruction and re-fabrication of the structure. We demonstrate the existence of multiple PIT spectral responses resulting from a series of self-assembled GWRs to be used as the wavelength demultiplexer. This study can be used for design of the optical ultra-compact devices and photonic integrated circuits.

  17. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    Science.gov (United States)

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.

  18. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  19. Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

    OpenAIRE

    Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

    2013-01-01

    Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. ...

  20. A New English?Arabic Parallel Text Corpus for Lexicographic Applications

    Directory of Open Access Journals (Sweden)

    Hashan Al-Ajmi

    2011-10-01

    Full Text Available

    Abstract: Bilingual lexicographers, translation specialists and English teachers in the Arabworld do not have access to computerized corpora of parallel texts for the English–Arabic languagepair. This project has been carried out to meet this requirement by establishing the first generalparallel corpus of English texts and their Arabic translations. The first phase of the project involvedthe selection of general source texts having appropriate lexical and stylistic features. The chosensource texts deal with a variety of topics such as the environment, globalization, psychology, history,politics, drama, etc. Their Arabic translations were taken from The World of Knowledge seriespublished by the National Council for Culture, Arts and Letters (NCCAL in Kuwait.

    Keywords: PARALLEL CORPUS, LEXICOGRAPHY, TRANSLATION, BILINGUAL DICTIONARY,COLLOCATIONS, ALIGNMENT, SYNONYMS, DERIVATIVES, ANTONYMS, GLOSSARY,FREQUENCY

    Opsomming: 'n Nuwe Engels–Arabiese parallelletekskorpus vir leksikografiesetoepassings Tweetalige leksikograwe, vertaalkundiges en Engelsonderwysers in dieArabiese wêreld het nie toegang tot gerekenariseerde korpusse van parallelle tekste vir die Engels–Arabiese taalpaar nie. Hierdie projek is onderneem om in dié behoefte te voorsien deur die eerstealgemene parallelle korpus van Engelse tekste en hul Arabiese vertalings tot stand te bring. Dieeerste fase van die projek het die keuse van algemene brontekste behels wat geskikte leksikale enstilistiese eienskappe besit. Die gekose brontekste handel oor 'n verskeidenheid onderwerpe soosdie omgewing, globalisering, psigologie, geskiedenis, politiek, drama, ens. Hul Arabiese vertalingsis geneem uit The World of Knowledge-reeks gepubliseer deur die National Council for Culture, Artsand Letters (NCCAL in Koeweit.

    Sleutelwoorde: PARALLELLE KORPUS, LEKSIKOGRAFIE, VERTALING, TWEETALIGEWOORDEBOEK, KOLLOKASIES, OOREENSTEMMING, SINONIEME, AFLEIDINGS, ANTONIEME

  1. Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection

    International Nuclear Information System (INIS)

    Camenzuli, Michelle; Terry, Jessica M.; Shalliker, R. Andrew; Conlan, Xavier A.; Barnett, Neil W.; Francis, Paul S.

    2013-01-01

    Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered

  2. Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection

    Energy Technology Data Exchange (ETDEWEB)

    Camenzuli, Michelle [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Terry, Jessica M. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Shalliker, R. Andrew, E-mail: r.shalliker@uws.edu.au [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Conlan, Xavier A.; Barnett, Neil W. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Francis, Paul S., E-mail: paul.francis@deakin.edu.au [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia)

    2013-11-25

    Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered.

  3. NbN nanowire optical detectors for high speed applications

    International Nuclear Information System (INIS)

    Quaranta, O; Pagano, S; Ejrnaes, M; Nappi, C; Pessina, E; Fontana, F

    2008-01-01

    We have developed a novel geometry for single photon optical detectors (SSPD) based on NbN nanowires. Traditionally the SSPD are realized in a meander structure in order to realize a reasonable (few square microns) collecting area. This has the disadvantage of generating a large detector inductance, mostly of kinetic origin, that strongly limits the detector operation in high speed applications, such as telecommunication. Moreover the extreme aspect ratio of the detector (a nanowire a fraction of mm long and 100 nm wide) puts strong requirements on the nanofabrication processes, with negative effects on the production yield. Our novel proposed geometry is based on a parallel stripes configuration designed in such a way that the light induced switching of a single stripe generates the switching of all the other through a cascade mechanism. The net result is an SSPD device that has a much lower intrinsic inductance, and consequently a much wider bandwidth (up to 10 GHz range). Moreover the signal amplitude generated is much larger than that of traditional SSPD, due to the contribution of all the parallel stripe. We present here the design and results of numerical simulation of the response of this novel type of SSPD. In particular we discuss of the design solutions that allow the cascade operation of the detector, by realizing a very fast and synchronous switching of all the parallel lines. Key issues, such as the optimal number of parallel lines, with respect to fabrication and operation constraints of the detectors are also discussed

  4. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  5. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain confugurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...

  6. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...

  7. High-resolution brain SPECT imaging by combination of parallel and tilted detector heads.

    Science.gov (United States)

    Suzuki, Atsuro; Takeuchi, Wataru; Ishitsu, Takafumi; Morimoto, Yuichi; Kobashi, Keiji; Ueno, Yuichiro

    2015-10-01

    To improve the spatial resolution of brain single-photon emission computed tomography (SPECT), we propose a new brain SPECT system in which the detector heads are tilted towards the rotation axis so that they are closer to the brain. In addition, parallel detector heads are used to obtain the complete projection data set. We evaluated this parallel and tilted detector head system (PT-SPECT) in simulations. In the simulation study, the tilt angle of the detector heads relative to the axis was 45°. The distance from the collimator surface of the parallel detector heads to the axis was 130 mm. The distance from the collimator surface of the tilted detector heads to the origin on the axis was 110 mm. A CdTe semiconductor panel with a 1.4 mm detector pitch and a parallel-hole collimator were employed in both types of detector head. A line source phantom, cold-rod brain-shaped phantom, and cerebral blood flow phantom were evaluated. The projection data were generated by forward-projection of the phantom images using physics models, and Poisson noise at clinical levels was applied to the projection data. The ordered-subsets expectation maximization algorithm with physics models was used. We also evaluated conventional SPECT using four parallel detector heads for the sake of comparison. The evaluation of the line source phantom showed that the transaxial FWHM in the central slice for conventional SPECT ranged from 6.1 to 8.5 mm, while that for PT-SPECT ranged from 5.3 to 6.9 mm. The cold-rod brain-shaped phantom image showed that conventional SPECT could visualize up to 8-mm-diameter rods. By contrast, PT-SPECT could visualize up to 6-mm-diameter rods in upper slices of a cerebrum. The cerebral blood flow phantom image showed that the PT-SPECT system provided higher resolution at the thalamus and caudate nucleus as well as at the longitudinal fissure of the cerebrum compared with conventional SPECT. PT-SPECT provides improved image resolution at not only upper but also at

  8. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  9. Parallel microscope-based fluorescence, absorbance and time-of-flight mass spectrometry detection for high performance liquid chromatography and determination of glucosamine in urine.

    Science.gov (United States)

    Xiong, Bo; Wang, Ling-Ling; Li, Qiong; Nie, Yu-Ting; Cheng, Shuang-Shuang; Zhang, Hui; Sun, Ren-Qiang; Wang, Yu-Jiao; Zhou, Hong-Bin

    2015-11-01

    A parallel microscope-based laser-induced fluorescence (LIF), ultraviolet-visible absorbance (UV) and time-of-flight mass spectrometry (TOF-MS) detection for high performance liquid chromatography (HPLC) was achieved and used to determine glucosamine in urines. First, a reliable and convenient LIF detection was developed based on an inverted microscope and corresponding modulations. Parallel HPLC-LIF/UV/TOF-MS detection was developed by the combination of preceding Microscope-based LIF detection and HPLC coupled with UV and TOF-MS. The proposed setup, due to its parallel scheme, was free of the influence from photo bleaching in LIF detection. Rhodamine B, glutamic acid and glucosamine have been determined to evaluate its performance. Moreover, the proposed strategy was used to determine the glucosamine in urines, and subsequent results suggested that glucosamine, which was widely used in the prevention of the bone arthritis, was metabolized to urines within 4h. Furthermore, its concentration in urines decreased to 5.4mM at 12h. Efficient glucosamine detection was achieved based on a sensitive quantification (LIF), a universal detection (UV) and structural characterizations (TOF-MS). This application indicated that the proposed strategy was sensitive, universal and versatile, and it was capable of improved analysis, especially for analytes with low concentrations in complex samples, compared with conventional HPLC-UV/TOF-MS. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Performance of the Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.

  11. Simulation of synchrotron white-beam topographs. An algorithm for parallel processing: application to the study of piezoelectric devices

    International Nuclear Information System (INIS)

    Epelboin, Y.

    1996-01-01

    This paper presents a new algorithm for the integration of Takagi-Taupin equations taking into account the fact that X-ray diffraction is a parallel phenomenon. The diffraction equations show that the propagation of the waves is independent in each incidence plane. It is thus possible to compute in parallel the propagation of the waves in different planes. Two algorithms are presented: the first one for multiprocessor machines where the processors share a common memory, the second one for massively parallel computers. The program is written to achieve a high vectorization ratio and to make it as efficient as possible with modern superscalar and array processors. The simulation of the image of a defect has been divided into two independent parts. In the first one, one computes the derivatives of the deformation inside the crystal; in the second one, these results are used to simulate the image. This allows one rapidly to change the model for a defect, something that was not feasible in all previously written simulation programs since the computation of the deformation was part of the simulation. The study of stroboscopic images of the propagation of acoustic waves in piezoelectric devices is given as an example of the possibilities of this new program. (orig.)

  12. A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations

    International Nuclear Information System (INIS)

    Lin Lin; Chao Yang; Jiangfeng Lu; Lexing Ying; Weinan, E.

    2009-01-01

    We present an efficient parallel algorithm and its implementation for computing the diagonal of H -1 where H is a 2D Kohn-Sham Hamiltonian discretized on a rectangular domain using a standard second order finite difference scheme. This type of calculation can be used to obtain an accurate approximation to the diagonal of a Fermi-Dirac function of H through a recently developed pole-expansion technique LinLuYingE2009. The diagonal elements are needed in electronic structure calculations for quantum mechanical systems HohenbergKohn1964, KohnSham 1965,DreizlerGross1990. We show how elimination tree is used to organize the parallel computation and how synchronization overhead is reduced by passing data level by level along this tree using the technique of local buffers and relative indices. We analyze the performance of our implementation by examining its load balance and communication overhead. We show that our implementation exhibits an excellent weak scaling on a large-scale high performance distributed parallel machine. When compared with standard approach for evaluating the diagonal a Fermi-Dirac function of a Kohn-Sham Hamiltonian associated a 2D electron quantum dot, the new pole-expansion technique that uses our algorithm to compute the diagonal of (H-z i I) -1 for a small number of poles z i is much faster, especially when the quantum dot contains many electrons.

  13. A Fast Parallel Algorithm for Selected Inversion of Structured Sparse Matrices with Application to 2D Electronic Structure Calculations

    Energy Technology Data Exchange (ETDEWEB)

    Lin, Lin; Yang, Chao; Lu, Jiangfeng; Ying, Lexing; E, Weinan

    2009-09-25

    We present an efficient parallel algorithm and its implementation for computing the diagonal of $H^-1$ where $H$ is a 2D Kohn-Sham Hamiltonian discretized on a rectangular domain using a standard second order finite difference scheme. This type of calculation can be used to obtain an accurate approximation to the diagonal of a Fermi-Dirac function of $H$ through a recently developed pole-expansion technique \\cite{LinLuYingE2009}. The diagonal elements are needed in electronic structure calculations for quantum mechanical systems \\citeHohenbergKohn1964, KohnSham 1965,DreizlerGross1990. We show how elimination tree is used to organize the parallel computation and how synchronization overhead is reduced by passing data level by level along this tree using the technique of local buffers and relative indices. We analyze the performance of our implementation by examining its load balance and communication overhead. We show that our implementation exhibits an excellent weak scaling on a large-scale high performance distributed parallel machine. When compared with standard approach for evaluating the diagonal a Fermi-Dirac function of a Kohn-Sham Hamiltonian associated a 2D electron quantum dot, the new pole-expansion technique that uses our algorithm to compute the diagonal of $(H-z_i I)^-1$ for a small number of poles $z_i$ is much faster, especially when the quantum dot contains many electrons.

  14. Synthesis and Evaluation of A High Precision 3D-Printed Ti6Al4V Compliant Parallel Manipulator

    Science.gov (United States)

    Pham, Minh Tuan; Teo, Tat Joo; Huat Yeo, Song; Wang, Pan; Nai, Mui Ling Sharon

    2017-12-01

    A novel 3D printed compliant parallel manipulator (CPM) with θX - θX - Z motions is presented in this paper. This CPM is synthesized using the beam-based method, a new structural optimization approach, to achieve optimized stiffness properties with targeted dynamic behavior. The CPM performs high non-actuating stiffness based on the predicted stiffness ratios of about 3600 for translations and 570 for rotations, while the dynamic response is fast with the targeted first resonant mode of 100Hz. A prototype of the synthesized CPM is fabricated using the electron beam melting (EBM) technology with Ti6Al4V material. Driven by three voice-coil (VC) motors, the CPM demonstrated a positioning resolution of 50nm along the Z axis and an angular resolution of ~0.3 “about the X and Y axes, the positioning accuracy is also good with the measured values of ±25.2nm and ±0.17” for the translation and rotations respectively. Experimental investigation also shows that this large workspace CPM has a first resonant mode of 98Hz and the stiffness behavior matches the prediction with the highest deviation of 11.2%. Most importantly, the full workspace of 10° × 10° × 7mm of the proposed CPM can be achieved, that demonstrates 3D printed compliant mechanisms can perform large elastic deformation. The obtained results show that CPMs printed by EBM technology have predictable mechanical characteristics and are applicable in precise positioning systems.

  15. Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

    Science.gov (United States)

    Fijany, Amir; Toomarian, Benny N.

    2000-01-01

    -based architectures for highly parallel and systolic computation of signal/image processing applications, such as FFT and Wavelet and Wlash-Hadamard Transforms.

  16. A high-quality narrow passband filter for elastic SV waves via aligned parallel separated thin polymethylmethacrylate plates

    OpenAIRE

    Jun Zhang; Yaolu Liu; Wensheng Yan; Ning Hu

    2017-01-01

    We designed a high-quality filter that consists of aligned parallel polymethylmethacrylate (PMMA) thin plates with small gaps for elastic SV waves propagate in metals. Both the theoretical model and the full numerical simulation show the transmission spectrum of the elastic SV waves through such a filter has several sharp peaks with flawless transmission within the investigated frequencies. These peaks can be readily tuned by manipulating the geometry parameters of the PMMA plates. Our invest...

  17. Approaches for introducing high molecular diversity in scaffolds: fast parallel synthesis of highly substituted 1H-quinolin-4-one libraries.

    Science.gov (United States)

    Kuznetsov, Vladimir; Gorohovsky, Sofia; Levy, Amalia; Meir, Simcha; Shkoulev, Vladimir; Menashe, Naim; Greenwald, Moshe; Aizikovich, Alexander; Ofer, Dror; Byk, Gerardo; Gellerman, Garry

    2004-01-01

    We have developed a two steps strategy for the parallel synthesis of highly diversified quinolin-ones. In the first step we have combined and improved different synthetic methods for generating quinolin-4-ones bearing four different substitutions at specific positions using round bottomed flasks. The synthesis was assessed for a large number of substituted quinolin-4-ones. In the second step, the improved method was adapted to a parallel array synthesis using a 12 positions carrousel as demonstrated for the synthesis of 42-variable quinolin-4-ones. The first combinatorial library set 14(a-x) was obtained with a chemical purity of more than 95% without purification, the second library set 15(a-r), which included two synthetic steps, needed combinatorial purification using an innovative parallel purifier. The proposed approach contributes to a more extensive diversification of molecular scaffolds in general and provides access to highly substituted quinolinones in particular.

  18. Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications

    Science.gov (United States)

    Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

    2016-01-01

    How can we correlate the neural activity in the human brain as it responds to typed words, with properties of these terms (like ’edible’, ’fits in hand’)? In short, we want to find latent variables, that jointly explain both the brain activity, as well as the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, produces sparse and interpretable solutions, and parallelizes any CMTF algorithm, producing sparse and interpretable solutions (up to 65 fold). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT, by applying it on a Facebook dataset (users, ’friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406

  19. HLIBCov: Parallel Hierarchical Matrix Approximation of Large Covariance Matrices and Likelihoods with Applications in Parameter Identification

    KAUST Repository

    Litvinenko, Alexander

    2017-09-26

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\\\\times 2M$ can be computed on a modern multi-core desktop in few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Mat\\\\\\'ern covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetics due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\\\\H$-) matrix format with computational cost $\\\\mathcal{O}(k^2n \\\\log^2 n/p)$ and storage $\\\\mathcal{O}(kn \\\\log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ the number of cores and $n$ the number of locations on a fairly general mesh. We demonstrate a synthetic example, where the true values of known parameters are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.

  20. The Application of Paired Parallel Filters for Ultra-Wideband Signal Processing

    Directory of Open Access Journals (Sweden)

    S. L. Chernyshev

    2015-01-01

    Full Text Available The paper considers a unit in which the parallel filters on regular lines are pair-attached. This connection allows to reduce a side line impedance at the point of connection. At the same time these lines become narrow, and the possibility to excite higher modes in the joint reduces.Consider the scattering matrix of four identical lines connection. Then find the scattering matrix of connection in which two side lines are connected with filters. Particular cases of the reflection coefficients of different filters are considered. It is shown that only in the case of identical filters there remained a linear relationship between the input filter coefficients of reflection and transmission coefficient of the unit. It facilitates the solution of the problem of synthesis. Restrictions on the transfer coefficient are found. In transition to the time domain impulse response of connection under consideration and the expression for the synthesis were defined. The paper considers an example of implementation of the matched filtering in this connection. In this case, the output signal is a half-sum of the input signal and their autocorrelation function.

  1. Application of Sigmoidal Gompertz Curves in Reverse Parallel Parking for Autonomous Vehicles

    Directory of Open Access Journals (Sweden)

    Aneesh Chand

    2015-09-01

    Full Text Available A new method for the planning and autonomous execution of a single-trajectory, velocity-independent, parallel parking manoeuvre for autonomous vehicles is presented. The procedure commences with the identification and pre-selection of a smooth sigmoidal trajectory known as the Gompertz curve in parametric format. Trajectory parameters are determined in real-time during the path-planning phase using an optimization scheme in order to generate a candidate path. The optimization scheme takes into account the maximum steering angles that can be physically realized and checks the generated candidate trajectory for collisions. Thereafter, the trajectory is reparametrized to arc-length format using the cubic interpolation method and the vehicle orientation at every point of the trajectory is deduced. Following that, values of the steering angle(s are determined. In the final step, the vehicle uses dead-reckoning to follows the arc-length parametrized path in reverse in order to park itself in a single-manoeuvre. The proposed method is substantiated through both extensive simulations and real sensor data.

  2. A COMPUTER CLUSTER SYSTEM FOR PSEUDO-PARALLEL EXECUTION OF GEANT4 SERIAL APPLICATION

    Directory of Open Access Journals (Sweden)

    Memmo Federici

    2013-12-01

    Full Text Available Simulation of the interactions between particles and matter in studies for developing X-rays detectors generally requires very long calculation times (up to several days or weeks. These times are often a serious limitation for the success of the simulations and for the accuracy of the simulated models. One of the tools used by the scientific community to perform these simulations is Geant4 (Geometry And Tracking [2, 3]. On the best of experience in the design of the AVES cluster computing system, Federici et al. [1], the IAPS (Istituto di Astrofisica e Planetologia Spaziali INAF laboratories were able to develop a cluster computer system dedicated to Geant 4. The Cluster is easy to use and easily expandable, and thanks to the design criteria adopted it achieves an excellent compromise between performance and cost. The management software developed for the Cluster splits the single instance of simulation on the cores available, allowing the use of software written for serial computation to reach a computing speed similar to that obtainable from a native parallel software. The simulations carried out on the Cluster showed an increase in execution time by a factor of 20 to 60 compared to the times obtained with the use of a single PC of medium quality.

  3. HLIBCov: Parallel Hierarchical Matrix Approximation of Large Covariance Matrices and Likelihoods with Applications in Parameter Identification

    KAUST Repository

    Litvinenko, Alexander

    2017-09-24

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M\\\\times 2M$ can be computed on a modern multi-core desktop in few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Mat\\\\\\'ern covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetics due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\\\\mathcal{H}$-) matrix format with computational cost $\\\\mathcal{O}(k^2n \\\\log^2 n/p)$ and storage $\\\\mathcal{O}(kn \\\\log n)$, where the rank $k$ is a small integer (typically $k<25$), $p$ the number of cores and $n$ the number of locations on a fairly general mesh. We demonstrate a synthetic example, where the true values of known parameters are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.

  4. Convergence of highly parallel stray field calculation using the fast multipole method on irregular meshes

    Science.gov (United States)

    Palmesi, P.; Abert, C.; Bruckner, F.; Suess, D.

    2018-05-01

    Fast stray field calculation is commonly considered of great importance for micromagnetic simulations, since it is the most time consuming part of the simulation. The Fast Multipole Method (FMM) has displayed linear O(N) parallelization behavior on many cores. This article investigates the error of a recent FMM approach approximating sources using linear—instead of constant—finite elements in the singular integral for calculating the stray field and the corresponding potential. After measuring performance in an earlier manuscript, this manuscript investigates the convergence of the relative L2 error for several FMM simulation parameters. Various scenarios either calculating the stray field directly or via potential are discussed.

  5. Design Analysis and Dynamic Modeling of a High-Speed 3T1R Pick-and-Place Parallel Robot

    DEFF Research Database (Denmark)

    Wu, Guanglei; Bai, Shaoping; Hjørnet, Preben

    2015-01-01

    This paper introduces a four degree-of-freedom parallel robot producing three translation and one rotation (Schönflies motion). This robot can generate a rectangular workspace that is close to the applicable work envelope and suitable for pick-and-place operations. The kinematics of the robot...... is studied to analyze the workspace and the isocontours of the local dexterity over the representative regular workspace are visualized. The simplified dynamics is modeled and compared with Adams model to show its effectiveness....

  6. Experience with highly-parallel software for the storage system of the ATLAS Experiment at CERN

    CERN Document Server

    Colombo, T; The ATLAS collaboration

    2012-01-01

    The ATLAS experiment is observing proton-proton collisions delivered by the LHC accelerator. The ATLAS Trigger and Data Acquisition (TDAQ) system selects interesting events on-line in a three-level trigger system in order to store them at a budgeted rate of several hundred Hz. This paper focuses on the TDAQ data-logging system and in particular on the implementation and performance of a novel parallel software design. In this respect, the main challenge presented by the data-logging workload is the conflict between the largely parallel nature of the event processing, especially the recently introduced event compression, and the constraint of sequential file writing and checksum evaluation. This is further complicated by the necessity of operating in a fully data-driven mode, to cope with continuously evolving trigger and detector configurations. In this paper we report on the design of the new ATLAS on-line storage software. In particular we will discuss our development experience using recent concurrency-ori...

  7. H5Part A Portable High Performance Parallel Data Interface for Particle Simulations

    CERN Document Server

    Adelmann, Andreas; Shalf, John M; Siegerist, Cristina

    2005-01-01

    Largest parallel particle simulations, in six dimensional phase space generate wast amont of data. It is also desirable to share data and data analysis tools such as ParViT (Particle Visualization Toolkit) among other groups who are working on particle-based accelerator simulations. We define a very simple file schema built on top of HDF5 (Hierarchical Data Format version 5) as well as an API that simplifies the reading/writing of the data to the HDF5 file format. HDF5 offers a self-describing machine-independent binary file format that supports scalable parallel I/O performance for MPI codes on a variety of supercomputing systems and works equally well on laptop computers. The API is available for C, C++, and Fortran codes. The file format will enable disparate research groups with very different simulation implementations to share data transparently and share data analysis tools. For instance, the common file format will enable groups that depend on completely different simulation implementations to share c...

  8. Application of massively parallel sequencing to genetic diagnosis in multiplex families with idiopathic sensorineural hearing impairment.

    Directory of Open Access Journals (Sweden)

    Chen-Chi Wu

    Full Text Available Despite the clinical utility of genetic diagnosis to address idiopathic sensorineural hearing impairment (SNHI, the current strategy for screening mutations via Sanger sequencing suffers from the limitation that only a limited number of DNA fragments associated with common deafness mutations can be genotyped. Consequently, a definitive genetic diagnosis cannot be achieved in many families with discernible family history. To investigate the diagnostic utility of massively parallel sequencing (MPS, we applied the MPS technique to 12 multiplex families with idiopathic SNHI in which common deafness mutations had previously been ruled out. NimbleGen sequence capture array was designed to target all protein coding sequences (CDSs and 100 bp of the flanking sequence of 80 common deafness genes. We performed MPS on the Illumina HiSeq2000, and applied BWA, SAMtools, Picard, GATK, Variant Tools, ANNOVAR, and IGV for bioinformatics analyses. Initial data filtering with allele frequencies (0.95 prioritized 5 indels (insertions/deletions and 36 missense variants in the 12 multiplex families. After further validation by Sanger sequencing, segregation pattern, and evolutionary conservation of amino acid residues, we identified 4 variants in 4 different genes, which might lead to SNHI in 4 families compatible with autosomal dominant inheritance. These included GJB2 p.R75Q, MYO7A p.T381M, KCNQ4 p.S680F, and MYH9 p.E1256K. Among them, KCNQ4 p.S680F and MYH9 p.E1256K were novel. In conclusion, MPS allows genetic diagnosis in multiplex families with idiopathic SNHI by detecting mutations in relatively uncommon deafness genes.

  9. A comparison of high-order explicit Runge–Kutta, extrapolation, and deferred correction methods in serial and parallel

    KAUST Repository

    Ketcheson, David I.

    2014-06-13

    We compare the three main types of high-order one-step initial value solvers: extrapolation, spectral deferred correction, and embedded Runge–Kutta pairs. We consider orders four through twelve, including both serial and parallel implementations. We cast extrapolation and deferred correction methods as fixed-order Runge–Kutta methods, providing a natural framework for the comparison. The stability and accuracy properties of the methods are analyzed by theoretical measures, and these are compared with the results of numerical tests. In serial, the eighth-order pair of Prince and Dormand (DOP8) is most efficient. But other high-order methods can be more efficient than DOP8 when implemented in parallel. This is demonstrated by comparing a parallelized version of the wellknown ODEX code with the (serial) DOP853 code. For an N-body problem with N = 400, the experimental extrapolation code is as fast as the tuned Runge–Kutta pair at loose tolerances, and is up to two times as fast at tight tolerances.

  10. Conceptual design and kinematic analysis of a novel parallel robot for high-speed pick-and-place operations

    Science.gov (United States)

    Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun

    2018-06-01

    This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four degrees-of-freedom (DOFs) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions to identify the correct solution are also provided. On the basis of the proposed concept, the workspace of the robot is identified using a set of presupposed parameters by taking input and output transmission index as the performance evaluation criteria.

  11. High performance shallow water kernels for parallel overland flow simulations based on FullSWOF2D

    KAUST Repository

    Wittmann, Roland

    2017-01-25

    We describe code optimization and parallelization procedures applied to the sequential overland flow solver FullSWOF2D. Major difficulties when simulating overland flows comprise dealing with high resolution datasets of large scale areas which either cannot be computed on a single node either due to limited amount of memory or due to too many (time step) iterations resulting from the CFL condition. We address these issues in terms of two major contributions. First, we demonstrate a generic step-by-step transformation of the second order finite volume scheme in FullSWOF2D towards MPI parallelization. Second, the computational kernels are optimized by the use of templates and a portable vectorization approach. We discuss the load imbalance of the flux computation due to dry and wet cells and propose a solution using an efficient cell counting approach. Finally, scalability results are shown for different test scenarios along with a flood simulation benchmark using the Shaheen II supercomputer.

  12. Massively parallel digital high resolution melt for rapid and absolutely quantitative sequence profiling

    Science.gov (United States)

    Velez, Daniel Ortiz; Mack, Hannah; Jupe, Julietta; Hawker, Sinead; Kulkarni, Ninad; Hedayatnia, Behnam; Zhang, Yang; Lawrence, Shelley; Fraley, Stephanie I.

    2017-02-01

    In clinical diagnostics and pathogen detection, profiling of complex samples for low-level genotypes represents a significant challenge. Advances in speed, sensitivity, and extent of multiplexing of molecular pathogen detection assays are needed to improve patient care. We report the development of an integrated platform enabling the identification of bacterial pathogen DNA sequences in complex samples in less than four hours. The system incorporates a microfluidic chip and instrumentation to accomplish universal PCR amplification, High Resolution Melting (HRM), and machine learning within 20,000 picoliter scale reactions, simultaneously. Clinically relevant concentrations of bacterial DNA molecules are separated by digitization across 20,000 reactions and amplified with universal primers targeting the bacterial 16S gene. Amplification is followed by HRM sequence fingerprinting in all reactions, simultaneously. The resulting bacteria-specific melt curves are identified by Support Vector Machine learning, and individual pathogen loads are quantified. The platform reduces reaction volumes by 99.995% and achieves a greater than 200-fold increase in dynamic range of detection compared to traditional PCR HRM approaches. Type I and II error rates are reduced by 99% and 100% respectively, compared to intercalating dye-based digital PCR (dPCR) methods. This technology could impact a number of quantitative profiling applications, especially infectious disease diagnostics.

  13. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes the matrix algebra libraries based on the modern technologies of parallel programming for the Spectrum software, which can use a spectral method (in the spectral form of mathematical description to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies for the GPUs: OmniThreadLibrary, OpenMP, Intel Threading Building Blocks, Intel Cilk Plus for CPUs nVidia CUDA, OpenCL, and Microsoft Accelerated Massive Parallelism.The developed libraries support matrices with real elements (single and double precision. The matrix dimensions are limited by 32-bit or 64-bit memory model and computer configuration. These libraries are general-purpose and can be used not only for the Spectrum software. They can also find application in the other projects where there is a need to perform operations with large matrices.The article provides a comparative analysis of the libraries developed for various matrix operations (addition, subtraction, scalar multiplication, multiplication, powers of matrices, tensor multiplication, transpose, inverse matrix, finding a solution of the system of linear equations through the numerical experiments using different CPU and GPU. The article contains sample programs and performance test results for matrix multiplication, which requires most of all computational resources in regard to the other operations.

  14. Parallel processing method for high-speed real time digital pulse processing for gamma-ray spectroscopy

    International Nuclear Information System (INIS)

    Fernandes, A.M.; Pereira, R.C.; Sousa, J.; Neto, A.; Carvalho, P.; Batista, A.J.N.; Carvalho, B.B.; Varandas, C.A.F.; Tardocchi, M.; Gorini, G.

    2010-01-01

    A new data acquisition (DAQ) system was developed to fulfil the requirements of the gamma-ray spectrometer (GRS) JET-EP2 (joint European Torus enhancement project 2), providing high-resolution spectroscopy at very high-count rate (up to few MHz). The system is based on the Advanced Telecommunications Computing Architecture TM (ATCA TM ) and includes a transient record (TR) module with 8 channels of 14 bits resolution at 400 MSamples/s (MSPS) sampling rate, 4 GB of local memory, and 2 field programmable gate array (FPGA) able to perform real time algorithms for data reduction and digital pulse processing. Although at 400 MSPS only fast programmable devices such as FPGAs can be used either for data processing and data transfer, FPGA resources also present speed limitation at some specific tasks, leading to an unavoidable data lost when demanding algorithms are applied. To overcome this problem and foreseeing an increase of the algorithm complexity, a new digital parallel filter was developed, aiming to perform real time pulse processing in the FPGAs of the TR module at the presented sampling rate. The filter is based on the conventional digital time-invariant trapezoidal shaper operating with parallelized data while performing pulse height analysis (PHA) and pile up rejection (PUR). The incoming sampled data is successively parallelized and fed into the processing algorithm block at one fourth of the sampling rate. The following data processing and data transfer is also performed at one fourth of the sampling rate. The algorithm based on data parallelization technique was implemented and tested at JET facilities, where a spectrum was obtained. Attending to the observed results, the PHA algorithm will be improved by implementing the pulse pile up discrimination.

  15. Quartic scaling MP2 for solids: A highly parallelized algorithm in the plane wave basis

    Science.gov (United States)

    Schäfer, Tobias; Ramberger, Benjamin; Kresse, Georg

    2017-03-01

    We present a low-complexity algorithm to calculate the correlation energy of periodic systems in second-order Møller-Plesset (MP2) perturbation theory. In contrast to previous approximation-free MP2 codes, our implementation possesses a quartic scaling, O ( N 4 ) , with respect to the system size N and offers an almost ideal parallelization efficiency. The general issue that the correlation energy converges slowly with the number of basis functions is eased by an internal basis set extrapolation. The key concept to reduce the scaling is to eliminate all summations over virtual orbitals which can be elegantly achieved in the Laplace transformed MP2 formulation using plane wave basis sets and fast Fourier transforms. Analogously, this approach could allow us to calculate second order screened exchange as well as particle-hole ladder diagrams with a similar low complexity. Hence, the presented method can be considered as a step towards systematically improved correlation energies.

  16. Stochastic parallel gradient descent based adaptive optics used for a high contrast imaging coronagraph

    International Nuclear Information System (INIS)

    Dong Bing; Ren Deqing; Zhang Xi

    2011-01-01

    An adaptive optics (AO) system based on a stochastic parallel gradient descent (SPGD) algorithm is proposed to reduce the speckle noises in the optical system of a stellar coronagraph in order to further improve the contrast. The principle of the SPGD algorithm is described briefly and a metric suitable for point source imaging optimization is given. The feasibility and good performance of the SPGD algorithm is demonstrated by an experimental system featured with a 140-actuator deformable mirror and a Hartmann-Shark wavefront sensor. Then the SPGD based AO is applied to a liquid crystal array (LCA) based coronagraph to improve the contrast. The LCA can modulate the incoming light to generate a pupil apodization mask of any pattern. A circular stepped pattern is used in our preliminary experiment and the image contrast shows improvement from 10 -3 to 10 -4.5 at an angular distance of 2λ/D after being corrected by SPGD based AO.

  17. Research on the verification of highly parallel and concurrent embedded software

    OpenAIRE

    青木, 利晃

    2012-01-01

    本研究では,スケジューリングを伴う並行・並列ソフトウェアと,スケジューリングを提供するリアルタイムオペレーティングシステム(RTOS)を対象とした.成果としては,前者に関しては,実時間を含む振る舞いを検証するためのアルゴリズムおよびツールを提案し,後者に関しては,RTOS の設計と実装を検証する手法およびツールを提案し,実際に使われているRTOS の検証も行った.これにより,現実的なセッティングで,モデル検査に基づいた手法の提案に成功し,実際に,現実問題に適用できることがわかった. : We focus on parallel/concurrent software which is controlled by real-time operating system(RTOS) and RTOS itself. We have proposed an algorithm and tool to verify the behavior of parallel/concurrent software which contains scheduling by RTOS and real-...

  18. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  19. INVESTIGATION OF FLIP-FLOP PERFORMANCE ON DIFFERENT TYPE AND ARCHITECTURE IN SHIFT REGISTER WITH PARALLEL LOAD APPLICATIONS

    Directory of Open Access Journals (Sweden)

    Dwi Purnomo

    2015-08-01

    Full Text Available Register is one of the computer components that have a key role in computer organisation. Every computer contains millions of registers that are manifested by flip-flop. This research focuses on the investigation of flip-flop performance based on its type (D, T, S-R, and J-K and architecture (structural, behavioural, and hybrid. Each type of flip-flop on each architecture would be tested in different bit of shift register with parallel load applications. The experiment criteria that will be assessed are power consumption, resources required, memory required, latency, and efficiency. Based on the experiment, it could be shown that D flip-flop and hybrid architecture showed the best performance in required memory, latency, power consumption, and efficiency. In addition, the experiment results showed that the greater the register number, the less efficient the system would be.

  20. Unique parallel radiations of high-mountainous species of the genus Sedum (Crassulaceae) on the continental island of Taiwan.

    Science.gov (United States)

    Ito, Takuro; Yu, Chih-Chieh; Nakamura, Koh; Chung, Kuo-Fang; Yang, Qin-Er; Fu, Cheng-Xin; Qi, Zhe-Chen; Kokubugata, Goro

    2017-08-01

    We explored the temporal and spatial diversification of the plant genus Sedum L. (Crassulaceae) in Taiwan based on molecular analysis of nrITS and cpDNA sequences from East Asian Sedum members. Our phylogenetic and ancestral area reconstruction analysis showed that Taiwanese Sedum comprised two lineages that independently migrated from Japan and Eastern China. Furthermore, the genetic distances among species in these two clades were smaller than those of other East Asian Sedum clades, and the Taiwanese members of each clade occupy extremely varied habitats with similar niches in high-mountain regions. These data indicate that species diversification occurred in parallel in the two Taiwanese Sedum lineages, and that these parallel radiations could have occurred within the small continental island of Taiwan. Moreover, the estimated time of divergence for Taiwanese Sedum indicates that the two radiations might have been correlated to the formation of mountains in Taiwan during the early Pleistocene. We suggest that these parallel radiations may be attributable to the geographical dynamics of Taiwan and specific biological features of Sedum that allow them to adapt to new ecological niches. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL –to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: We conducted interviews with code teams and selected two codes to port; We learned how to program in the new models and ported the codes; We debugged and tuned the ported applications; We measured results, and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  2. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  3. Design of a novel parallel reconfigurable machine tool

    CSIR Research Space (South Africa)

    Modungwa, D

    2008-06-01

    Full Text Available of meeting the demands for high mechanical dexterity adaptation as well as high stiffness necessary for mould and die re-conditioning. This paper presents, the design of parallel reconfigurable machine tool (PRMT) based on both application...

  4. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep Learning algorithm is widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of its best-in-class recognition accuracy compared to hand-crafted algorithm and shallow learning based algorithms. Long learning time caused by its complex structure, however, limits its usage only in high-cost servers or many-core GPU platforms so far. On the other hand, the demand on customized pattern recognition within personal devices will grow gradually as more deep learning applications will be developed. This paper presents a SoC implementation to enable deep learning applications to run with low cost platforms such as mobile or portable devices. Different from conventional works which have adopted massively-parallel architecture, this work adopts task-flexible architecture and exploits multiple parallelism to cover complex functions of convolutional deep belief network which is one of popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable system. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  5. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  6. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  7. High voltage pulse generator. [Patent application

    Science.gov (United States)

    Fasching, G.E.

    1975-06-12

    An improved high-voltage pulse generator is described which is especially useful in ultrasonic testing of rock core samples. An N number of capacitors are charged in parallel to V volts and at the proper instance are coupled in series to produce a high-voltage pulse of N times V volts. Rapid switching of the capacitors from the paralleled charging configuration to the series discharging configuration is accomplished by using silicon-controlled rectifiers which are chain self-triggered following the initial triggering of the first rectifier connected between the first and second capacitors. A timing and triggering circuit is provided to properly synchronize triggering pulses to the first SCR at a time when the charging voltage is not being applied to the parallel-connected charging capacitors. The output voltage can be readily increased by adding additional charging networks. The circuit allows the peak level of the output to be easily varied over a wide range by using a variable autotransformer in the charging circuit.

  8. Survey and research on applications of parallel compiler; Heiretsu compiler tekiyorei ni kansuru chosa kenkyu hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-31

    An urgent proposition is made that an advanced computing software program development and maintenance system be set up, and activities are conducted in search of strategies and guidelines for the establishment of such a system. Out of recognition that it is important to develop software programs such as operation systems for supercomputers, a survey is conducted of software technology development strategies especially involving application software programs. It is proposed that efforts be positively exerted to develop strategic software developing programs for advanced computing for concentratedly enhancing the development now under way of strategic software programs. In concrete terms, named as strategic software programs to be developed are a next-generation semiconductor TCAD (technology computer-aided design) system, protein structure/function analysis system, fatigue simulation system, next-generation fluid analysis system, chemical reaction simulator, grid computing, and a nanodevice surface analysis system. (NEDO)

  9. Three-dimensional all-speed CFD code for safety analysis of nuclear reactor containment: Status of GASFLOW parallelization, model development, validation and application

    Energy Technology Data Exchange (ETDEWEB)

    Xiao, Jianjun, E-mail: jianjun.xiao@kit.edu [Institute of Nuclear and Energy Technologies, Karlsruhe Institute of Technology, P.O. Box 3640, 76021 Karlsruhe (Germany); Travis, John R., E-mail: jack_travis@comcast.com [Engineering and Scientific Software Inc., 3010 Old Pecos Trail, Santa Fe, NM 87505 (United States); Royl, Peter, E-mail: peter.royl@partner.kit.edu [Institute of Nuclear and Energy Technologies, Karlsruhe Institute of Technology, P.O. Box 3640, 76021 Karlsruhe (Germany); Necker, Gottfried, E-mail: gottfried.necker@partner.kit.edu [Institute of Nuclear and Energy Technologies, Karlsruhe Institute of Technology, P.O. Box 3640, 76021 Karlsruhe (Germany); Svishchev, Anatoly, E-mail: anatoly.svishchev@kit.edu [Institute of Nuclear and Energy Technologies, Karlsruhe Institute of Technology, P.O. Box 3640, 76021 Karlsruhe (Germany); Jordan, Thomas, E-mail: thomas.jordan@kit.edu [Institute of Nuclear and Energy Technologies, Karlsruhe Institute of Technology, P.O. Box 3640, 76021 Karlsruhe (Germany)

    2016-05-15

    Highlights: • 3-D scalable semi-implicit pressure-based CFD code for containment safety analysis. • Robust solution algorithm valid for all-speed flows. • Well validated and widely used CFD code for hydrogen safety analysis. • Code applied in various types of nuclear reactor containments. • Parallelization enables high-fidelity models in large scale containment simulations. - Abstract: GASFLOW is a three dimensional semi-implicit all-speed CFD code which can be used to predict fluid dynamics, chemical kinetics, heat and mass transfer, aerosol transportation and other related phenomena involved in postulated accidents in nuclear reactor containments. The main purpose of the paper is to give a brief review on recent GASFLOW code development, validations and applications in the field of nuclear safety. GASFLOW code has been well validated by international experimental benchmarks, and has been widely applied to hydrogen safety analysis in various types of nuclear power plants in European and Asian countries, which have been summarized in this paper. Furthermore, four benchmark tests of a lid-driven cavity flow, low Mach number jet flow, 1-D shock tube and supersonic flow over a forward-facing step are presented in order to demonstrate the accuracy and wide-ranging capability of ICE’d ALE solution algorithm for all-speed flows. GASFLOW has been successfully parallelized using the paradigms of Message Passing Interface (MPI) and domain decomposition. The parallel version, GASFLOW-MPI, adds great value to large scale containment simulations by enabling high-fidelity models, including more geometric details and more complex physics. It will be helpful for the nuclear safety engineers to better understand the hydrogen safety related physical phenomena during the severe accident, to optimize the design of the hydrogen risk mitigation systems and to fulfill the licensing requirements by the nuclear regulatory authorities. GASFLOW-MPI is targeting a high

  10. Parallel Adjective High-Order CFD Simulations Characterizing SOFIA Cavity Acoustics

    Science.gov (United States)

    Barad, Michael F.; Brehm, Christoph; Kiris, Cetin C.; Biswas, Rupak

    2016-01-01

    This paper presents large-scale MPI-parallel computational uid dynamics simulations for the Stratospheric Observatory for Infrared Astronomy (SOFIA). SOFIA is an airborne, 2.5-meter infrared telescope mounted in an open cavity in the aft fuselage of a Boeing 747SP. These simulations focus on how the unsteady ow eld inside and over the cavity interferes with the optical path and mounting structure of the telescope. A temporally fourth-order accurate Runge-Kutta, and spatially fth-order accurate WENO- 5Z scheme was used to perform implicit large eddy simulations. An immersed boundary method provides automated gridding for complex geometries and natural coupling to a block-structured Cartesian adaptive mesh re nement framework. Strong scaling studies using NASA's Pleiades supercomputer with up to 32k CPU cores and 4 billion compu- tational cells shows excellent scaling. Dynamic load balancing based on execution time on individual AMR blocks addresses irregular numerical cost associated with blocks con- taining boundaries. Limits to scaling beyond 32k cores are identi ed, and targeted code optimizations are discussed.

  11. Scattering by a plane-parallel layer with high concentration of optically soft particles

    International Nuclear Information System (INIS)

    Loiko, Valery A.; Berdnik, Vladimir V.

    2009-01-01

    A method describing light propagation in a plane-parallel light-scattering layer with large concentration of homogeneous particles is developed. It is based on the radiative transfer equation and the doubling method. The interference approximation is used to take into account collective scattering effects. Spectral dependence of transmitted light for a layer of nonabsorbing optically soft particles with subwavelength-sized particles is investigated. At small volume concentration of the particles the weak spectral dependences of wave exponents for coherently transmitted and diffuse light are observed. It is shown that in a layer with large volume concentration of the subwavelength-sized particles the wave exponent can exceed considerably the value of four, which takes place for the Rayleigh particles. The dependence of wave exponents for coherently transmitted and diffuse light on the refractive index and concentration of particles is investigated in detail. Multiple scattering of light results in the reduction of the exponent. The quantitative results are presented and discussed. It is shown that there is a range of wavelengths where the negative values of the wave exponent at the regime of multiple scattering are implemented.

  12. Experience with highly-parallel software for the storage system of the ATLAS Experiment at CERN

    CERN Document Server

    Colombo, T; The ATLAS collaboration

    2012-01-01

    The ATLAS experiment is observing proton-proton collisions delivered by the LHC accelerator at a centre of mass energy of 7 TeV. The ATLAS Trigger and Data Acquisition (TDAQ) system selects interesting events on-line in a three-level trigger system in order to store them at a budgeted rate of several hundred Hz, for an average event size of ~1.2 MB. This paper focuses on the TDAQ data-logging system and in particular on the implementation and performance of a novel SW design, reporting on the effort of exploiting the full power of recently installed multi-core hardware. In this respect, the main challenge presented by the data-logging workload is the conflict between the largely parallel nature of the event processing, especially the recently introduced on-line event-compression, and the constraint of sequential file writing and checksum evaluation. This is furtherly complicated by the necessity of operating in a fully data-driven mode, to cope with continuously evolving trigger and detector configurations. T...

  13. A high-quality narrow passband filter for elastic SV waves via aligned parallel separated thin polymethylmethacrylate plates

    Directory of Open Access Journals (Sweden)

    Jun Zhang

    2017-08-01

    Full Text Available We designed a high-quality filter that consists of aligned parallel polymethylmethacrylate (PMMA thin plates with small gaps for elastic SV waves propagate in metals. Both the theoretical model and the full numerical simulation show the transmission spectrum of the elastic SV waves through such a filter has several sharp peaks with flawless transmission within the investigated frequencies. These peaks can be readily tuned by manipulating the geometry parameters of the PMMA plates. Our investigation finds that the same filter performs well for different metals where the elastic SV waves propagated.

  14. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing

    DEFF Research Database (Denmark)

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P

    2007-01-01

    BACKGROUND: The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine...... primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution...

  15. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

    Science.gov (United States)

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-09

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.

  16. High-performance parallel computing in the classroom using the public goods game as an example

    Science.gov (United States)

    Perc, Matjaž

    2017-07-01

    The use of computers in statistical physics is common because the sheer number of equations that describe the behaviour of an entire system particle by particle often makes it impossible to solve them exactly. Monte Carlo methods form a particularly important class of numerical methods for solving problems in statistical physics. Although these methods are simple in principle, their proper use requires a good command of statistical mechanics, as well as considerable computational resources. The aim of this paper is to demonstrate how the usage of widely accessible graphics cards on personal computers can elevate the computing power in Monte Carlo simulations by orders of magnitude, thus allowing live classroom demonstration of phenomena that would otherwise be out of reach. As an example, we use the public goods game on a square lattice where two strategies compete for common resources in a social dilemma situation. We show that the second-order phase transition to an absorbing phase in the system belongs to the directed percolation universality class, and we compare the time needed to arrive at this result by means of the main processor and by means of a suitable graphics card. Parallel computing on graphics processing units has been developed actively during the last decade, to the point where today the learning curve for entry is anything but steep for those familiar with programming. The subject is thus ripe for inclusion in graduate and advanced undergraduate curricula, and we hope that this paper will facilitate this process in the realm of physics education. To that end, we provide a documented source code for an easy reproduction of presented results and for further development of Monte Carlo simulations of similar systems.

  17. A massively parallel sequencing approach uncovers ancient origins and high genetic variability of endangered Przewalski's horses.

    Science.gov (United States)

    Goto, Hiroki; Ryder, Oliver A; Fisher, Allison R; Schultz, Bryant; Kosakovsky Pond, Sergei L; Nekrutenko, Anton; Makova, Kateryna D

    2011-01-01

    The endangered Przewalski's horse is the closest relative of the domestic horse and is the only true wild horse species surviving today. The question of whether Przewalski's horse is the direct progenitor of domestic horse has been hotly debated. Studies of DNA diversity within Przewalski's horses have been sparse but are urgently needed to ensure their successful reintroduction to the wild. In an attempt to resolve the controversy surrounding the phylogenetic position and genetic diversity of Przewalski's horses, we used massively parallel sequencing technology to decipher the complete mitochondrial and partial nuclear genomes for all four surviving maternal lineages of Przewalski's horses. Unlike single-nucleotide polymorphism (SNP) typing usually affected by ascertainment bias, the present method is expected to be largely unbiased. Three mitochondrial haplotypes were discovered-two similar ones, haplotypes I/II, and one substantially divergent from the other two, haplotype III. Haplotypes I/II versus III did not cluster together on a phylogenetic tree, rejecting the monophyly of Przewalski's horse maternal lineages, and were estimated to split 0.117-0.186 Ma, significantly preceding horse domestication. In the phylogeny based on autosomal sequences, Przewalski's horses formed a monophyletic clade, separate from the Thoroughbred domestic horse lineage. Our results suggest that Przewalski's horses have ancient origins and are not the direct progenitors of domestic horses. The analysis of the vast amount of sequence data presented here suggests that Przewalski's and domestic horse lineages diverged at least 0.117 Ma but since then have retained ancestral genetic polymorphism and/or experienced gene flow.

  18. A high efficient integrated planar transformer for primary-parallel isolated boost converters

    DEFF Research Database (Denmark)

    Sen, Gökhan; Ouyang, Ziwei; Thomsen, Ole Cornelius

    2010-01-01

    for a fuel cell fed battery charger application with 50–110 V input and 65–105 V output. Input inductors are coupled for current sharing, eliminating the use of current sharing transformers. An efficiency of 94% is achieved during nominal operating condition where the input is 70-V and the output is 84-V....

  19. Evaluation of RANS and LES models for Natural Convection in High-Aspect-Ratio Parallel Plate Channels

    Science.gov (United States)

    Fradeneck, Austen; Kimber, Mark

    2017-11-01

    The present study evaluates the effectiveness of current RANS and LES models in simulating natural convection in high-aspect ratio parallel plate channels. The geometry under consideration is based on a simplification of the coolant and bypass channels in the very high-temperature gas reactor (VHTR). Two thermal conditions are considered, asymmetric and symmetric wall heating with an applied heat flux to match Rayleigh numbers experienced in the VHTR during a loss of flow accident (LOFA). RANS models are compared to analogous high-fidelity LES simulations. Preliminary results demonstrate the efficacy of the low-Reynolds number k- ɛ formulations and their enhancement to the standard form and Reynolds stress transport model in terms of calculating the turbulence production due to buoyancy and overall mean flow variables.

  20. Massively parallel evolutionary computation on GPGPUs

    CERN Document Server

    Tsutsui, Shigeyoshi

    2013-01-01

    Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u

  1. Processor architecture exploration and synthesis of massively parallel multi-processor accelerators in application to LDPC decoding

    NARCIS (Netherlands)

    Jan, Y.; Jóźwiak, Lech

    Numerous modern applications in various fields, such as communication and networking, multimedia, encryption, etc., impose extremely high demands regarding performance while at the same time requiring low energy consumption, low cost, and short design time. Often these very high demands cannot be

  2. SQDFT: Spectral Quadrature method for large-scale parallel O(N) Kohn-Sham calculations at high temperature

    Science.gov (United States)

    Suryanarayana, Phanish; Pratapa, Phanisri P.; Sharma, Abhiraj; Pask, John E.

    2018-03-01

    We present SQDFT: a large-scale parallel implementation of the Spectral Quadrature (SQ) method for O(N) Kohn-Sham Density Functional Theory (DFT) calculations at high temperature. Specifically, we develop an efficient and scalable finite-difference implementation of the infinite-cell Clenshaw-Curtis SQ approach, in which results for the infinite crystal are obtained by expressing quantities of interest as bilinear forms or sums of bilinear forms, that are then approximated by spatially localized Clenshaw-Curtis quadrature rules. We demonstrate the accuracy of SQDFT by showing systematic convergence of energies and atomic forces with respect to SQ parameters to reference diagonalization results, and convergence with discretization to established planewave results, for both metallic and insulating systems. We further demonstrate that SQDFT achieves excellent strong and weak parallel scaling on computer systems consisting of tens of thousands of processors, with near perfect O(N) scaling with system size and wall times as low as a few seconds per self-consistent field iteration. Finally, we verify the accuracy of SQDFT in large-scale quantum molecular dynamics simulations of aluminum at high temperature.

  3. A novel detection platform for parallel monitoring of DNA hybridization with high sensitivity and specificity

    DEFF Research Database (Denmark)

    Yi, Sun; Perch-Nielsen, Ivan R.; Wang, Zhenyu

    We developed a high-sensitive platform to monior multiple hybridization events in real time. By creating a microoptical array in a polymeric chip, the system combine the excellent discriminative power of supercritical angle fluorescence (SAF) microscopy with high-throughput capabilities of microa......We developed a high-sensitive platform to monior multiple hybridization events in real time. By creating a microoptical array in a polymeric chip, the system combine the excellent discriminative power of supercritical angle fluorescence (SAF) microscopy with high-throughput capabilities...

  4. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    Science.gov (United States)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.

  5. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics.

    Science.gov (United States)

    Kelly, Benjamin J; Fitch, James R; Hu, Yangqiu; Corsmeier, Donald J; Zhong, Huachun; Wetzel, Amy N; Nordquist, Russell D; Newsom, David L; White, Peter

    2015-01-20

    While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of these data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.

  6. High technology for radiation application

    International Nuclear Information System (INIS)

    Iida, Toshiyuki

    2005-03-01

    Fundamentals of radiations, radioactivity, and their applications in recent industrial, medical, agricultural and various research fields are reviewed. The book begins with historical description regarding to discovery of radiation at the end of 19th century and the exploration into the inside of an atom utilizing the radiation discovered, discovery of the neutron which finally leaded to nuclear energy liberation. Developments of radiation sources, including nuclear reactors, and charged-particle accelerators follow with simultaneous description on radiation measurement or detection technology. In medical fields, X-ray diagnosis, interventional radiology (IVR), nuclear medicine (PET and others), and radiation therapy are introduced. In pharmaceutical field, synthesis of labeled compounds and tracer techniques are explained. In industrial application, radiation-reinforced wires and heat-resistant cables whose economic effect can be estimated to amount to more than 10 12 yen, radiation mutation, food irradiation, and applied accelerators such as polymer modifications, decomposition of environmentally harmful substances, and ion-implantations important in semiconductor device fabrication. Finally, problems relating to general public such as radiation education and safety concept are also discussed. (S. Ohno)

  7. FY1995 next generation highly parallel database / dataminig server using 100 PC's and ATM switch; 1995 nendo tasudai no pasokon wo ATM ketsugoshita jisedai choheiretsu database mining server no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The objective of the research is first to build a highly parallel processing system using 100 personal computers and an ATM switch. The former is a commodity for computer, while the latter can be regarded as a commodity for future communication systems. Second is to implement parallel relational database management system and parallel data mining system over the 100-PC cluster system. Third is to run decision-support queries typicalto data warehouses, to run association rule mining, and to prove the effectiveness of the proposed architecture as a next generation parallel database/datamining server. Performance/cost ratio of PC is significantly improved compared with workstations and proprietry systems due to its mass production. The cost of ATM switch is also considerably decreasing since ATM is being widely accepted as a communication-on infrastructure. By combining 100 PCs as computing commodities and ATM switch as a communication commodity, we built large sca-le parallel processing system inexpensively. Each mode employs the Pentium Pro CPU and the communication badwidth between PC's is more than 120Mbits/sec. A new parallel relational DBMS is design-ed and implemented. TPC-D, which is a standard benchmark for decision support applicants (100GBytes) is executed. Our system attained much higher performance than current commercial systems which are also much more expensive than ours. In addition, we developed a novel parallel data mining algorithm to extract associate rules. We implemented it in our system and succeeded toattain high performance. Thus it is verified that ATM connected PC cluster is very promising as a next generation platform for large scale database/dataminig server. (NEDO)

  8. FY1995 next generation highly parallel database / dataminig server using 100 PC's and ATM switch; 1995 nendo tasudai no pasokon wo ATM ketsugoshita jisedai choheiretsu database mining server no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The objective of the research is first to build a highly parallel processing system using 100 personal computers and an ATM switch. The former is a commodity for computer, while the latter can be regarded as a commodity for future communication systems. Second is to implement parallel relational database management system and parallel data mining system over the 100-PC cluster system. Third is to run decision-support queries typicalto data warehouses, to run association rule mining, and to prove the effectiveness of the proposed architecture as a next generation parallel database/datamining server. Performance/cost ratio of PC is significantly improved compared with workstations and proprietry systems due to its mass production. The cost of ATM switch is also considerably decreasing since ATM is being widely accepted as a communication-on infrastructure. By combining 100 PCs as computing commodities and ATM switch as a communication commodity, we built large sca-le parallel processing system inexpensively. Each mode employs the Pentium Pro CPU and the communication badwidth between PC's is more than 120Mbits/sec. A new parallel relational DBMS is design-ed and implemented. TPC-D, which is a standard benchmark for decision support applicants (100GBytes) is executed. Our system attained much higher performance than current commercial systems which are also much more expensive than ours. In addition, we developed a novel parallel data mining algorithm to extract associate rules. We implemented it in our system and succeeded toattain high performance. Thus it is verified that ATM connected PC cluster is very promising as a next generation platform for large scale database/dataminig server. (NEDO)

  9. High-efficiency one-dimensional atom localization via two parallel standing-wave fields

    International Nuclear Information System (INIS)

    Wang, Zhiping; Wu, Xuqiang; Lu, Liang; Yu, Benli

    2014-01-01

    We present a new scheme of high-efficiency one-dimensional (1D) atom localization via measurement of upper state population or the probe absorption in a four-level N-type atomic system. By applying two classical standing-wave fields, the localization peak position and number, as well as the conditional position probability, can be easily controlled by the system parameters, and the sub-half-wavelength atom localization is also observed. More importantly, there is 100% detecting probability of the atom in the subwavelength domain when the corresponding conditions are satisfied. The proposed scheme may open up a promising way to achieve high-precision and high-efficiency 1D atom localization. (paper)

  10. Ceramics for high temperature applications

    International Nuclear Information System (INIS)

    Mocellin, A.

    1977-01-01

    Problems related to materials, their fabrication, properties, handling, improvements are examined. Silicium nitride and silicium carbide are obtained by vacuum hot-pressing, reaction sintering and chemical vapour deposition. Micrographs are shown. Mechanical properties i.e. room and high temperature strength, creep resistance fracture mechanics and fatigue resistance. Recent developments of pressureless sintered Si C and the Si-Al-O-N quaternary system are mentioned

  11. A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

    Science.gov (United States)

    Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

    2014-06-10

    We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.

  12. [Application of excitation-emission matrix spectrum combined with parallel factor analysis in dissolved organic matter in East China Sea].

    Science.gov (United States)

    Lü, Li-Sha; Zhao, Wei-Hong; Miao, Hui

    2013-03-01

    Using excitation-emission matrix spectrum(EEMs) combined with parallel factor analysis (PARAFAC) examine the fluorescent components feature of dissolved organic matter (DOM) sampled from East China Sea in the summer and autumn was examined. The type, distribution and origin of the fluorescence dissolved organic matter were also discussed. Three fluorescent components were identified by PARAFAC, including protein-like component C1 (235, 280/330), terrestrial or marine humic-like component C2 (255, 330/400) and terrestrial humic-like component C3 (275, 360/480). The good linearity of the two humic-like components showed the same source or some relationship between the chemical constitutions. As a whole, the level of the fluorescence intensity in coastal ocean was higher than that of the open ocean in different water layers in two seasons. The relationship of three components with chlorophyll-a and salinity showed the DOM in the study area is almost not influenced by the living algal matter, but the fresh water outflow of the Yangtze River might be the source of them in the Yangtze River estuary in Summer. From what has been discussed above, we can draw the conclusion that the application of EEM-PARAFAC modeling will exert a profound influence upon the research of the dissolved organic matter.

  13. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  14. SOLVING BY PARALLEL COMPUTATION THE POISSON PROBLEM FOR HIGH INTENSITY BEAMS IN CIRCULAR ACCELERATORS

    International Nuclear Information System (INIS)

    LUCCIO, A.U.; DIMPERIO, N.L.; SAMULYAK, R.; BEEB-WANG, J.

    2001-01-01

    Simulation of high intensity accelerators leads to the solution of the Poisson Equation, to calculate space charge forces in the presence of acceleration chamber walls. We reduced the problem to ''two-and-a-half'' dimensions for long particle bunches, characteristic of large circular accelerators, and applied the results to the tracking code Orbit

  15. High-throughput liquid-liquid extraction in 96-well format: Parallel artificial liquid membrane extraction

    DEFF Research Database (Denmark)

    Gjelstad, Astrid; Andresen, Alf Terje; Dahlgren, Anders

    2017-01-01

    , highly efficient sample cleanup, and direct compatibility with liquid chromatography–mass spectrometry (LC–MS). The consumption of hazardous organic solvents is also almost eliminated using PALME as the sample preparation technique. This article summarizes current experiences with PALME, based on work...

  16. Parallel Störmer-Cowell methods for high-precision orbit computations

    NARCIS (Netherlands)

    P.J. van der Houwen; E. Messina; J.J.B. de Swart (Jacques)

    1998-01-01

    textabstractMany orbit problems in celestial mechanics are described by (nonstiff) initial-value problems (IVPs) for second-order ordinary differential equations of the form $y' = {bf f (y)$. The most successful integration methods are based on high-order Runge-Kutta-Nyström formulas. However, these

  17. RAID Disk Arrays for High Bandwidth Applications

    Science.gov (United States)

    Moren, Bill

    1996-01-01

    High bandwidth applications require large amounts of data transferred to/from storage devices at extremely high data rates. Further, these applications often are 'real time' in which access to the storage device must take place on the schedule of the data source, not the storage. A good example is a satellite downlink - the volume of data is quite large and the data rates quite high (dozens of MB/sec). Further, a telemetry downlink must take place while the satellite is overhead. A storage technology which is ideally suited to these types of applications is redundant arrays of independent discs (RAID). Raid storage technology, while offering differing methodologies for a variety of applications, supports the performance and redundancy required in real-time applications. Of the various RAID levels, RAID-3 is the only one which provides high data transfer rates under all operating conditions, including after a drive failure.

  18. Java parallel secure stream for grid computing

    International Nuclear Information System (INIS)

    Chen, J.; Akers, W.; Chen, Y.; Watson, W.

    2001-01-01

    The emergence of high speed wide area networks makes grid computing a reality. However grid applications that need reliable data transfer still have difficulties to achieve optimal TCP performance due to network tuning of TCP window size to improve the bandwidth and to reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a gird environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel stream is more effective than tuning TCP window size. In addition X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed

  19. High sensitivity and high Q-factor nanoslotted parallel quadrabeam photonic crystal cavity for real-time and label-free sensing

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Daquan [Rowland Institute at Harvard University, Cambridge, Massachusetts 02142 (United States); State Key Laboratory of Information Photonics and Optical Communications, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876 (China); School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138 (United States); Kita, Shota; Wang, Cheng; Lončar, Marko [School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138 (United States); Liang, Feng; Quan, Qimin [Rowland Institute at Harvard University, Cambridge, Massachusetts 02142 (United States); Tian, Huiping; Ji, Yuefeng [State Key Laboratory of Information Photonics and Optical Communications, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876 (China)

    2014-08-11

    We experimentally demonstrate a label-free sensor based on nanoslotted parallel quadrabeam photonic crystal cavity (NPQC). The NPQC possesses both high sensitivity and high Q-factor. We achieved sensitivity (S) of 451 nm/refractive index unit and Q-factor >7000 in water at telecom wavelength range, featuring a sensor figure of merit >2000, an order of magnitude improvement over the previous photonic crystal sensors. In addition, we measured the streptavidin-biotin binding affinity and detected 10 ag/mL concentrated streptavidin in the phosphate buffered saline solution.

  20. Parallel analysis of film and TLD application in personal dosimetry of medical staff during application of invasive radiological procedures

    International Nuclear Information System (INIS)

    Misovic, M.; Boskovic, Z.; Spasic-Jokic, V.

    1997-01-01

    Although both types of dosimeters showed similar results for mentioned category of health care workers we wished to emphasize some advantages in use of TLD and film dosemeters in personal dosimetry. The main advantageous of film for dosimetric purposes are that it can provide visual representation of the radiation field and they are cheap, but there are lot of disadvantages. Advantages of TLD are based on: possibility for re-use, practically for whole users working life, small dimensions suitable for results, high precision and specially wide dose range. They are sensitive on low dose, practically for ten times more than film is. Disadvantages of TLD are based on their previous thermal and radiation history and on the fact that information about dose disappears after reading procedure. Considering advantages and disadvantages of both types of dosemeters we decided to propose TLD for routine hospital practice in personal dosimetry. (author)

  1. High-Level Application Framework for LCLS

    Energy Technology Data Exchange (ETDEWEB)

    Chu, P; Chevtsov, S.; Fairley, D.; Larrieu, C.; Rock, J.; Rogind, D.; White, G.; Zalazny, M.; /SLAC

    2008-04-22

    A framework for high level accelerator application software is being developed for the Linac Coherent Light Source (LCLS). The framework is based on plug-in technology developed by an open source project, Eclipse. Many existing functionalities provided by Eclipse are available to high-level applications written within this framework. The framework also contains static data storage configuration and dynamic data connectivity. Because the framework is Eclipse-based, it is highly compatible with any other Eclipse plug-ins. The entire infrastructure of the software framework will be presented. Planned applications and plug-ins based on the framework are also presented.

  2. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  3. Applications of high-temperature superconductivity

    International Nuclear Information System (INIS)

    Malozemoff, A.P.; Gallagher, W.J.; Schwall, R.E.

    1987-01-01

    The new high temperature superconductors open up possibilities for applications in magnets, power transmission, computer interconnections, Josephson devices and instrumentation, among many others. The success of these applications hinges on many interlocking factors, including critical current density, critical fields, allowable processing temperatures, mechanical properties and chemical stability. An analysis of some of these factors suggests which applications may be the easiest to realize and which may have the greatest potential

  4. A real-time hybrid neuron network for highly parallel cognitive systems.

    Science.gov (United States)

    Christiaanse, Gerrit Jan; Zjajo, Amir; Galuzzi, Carlo; van Leuken, Rene

    2016-08-01

    For comprehensive understanding of how neurons communicate with each other, new tools need to be developed that can accurately mimic the behaviour of such neurons and neuron networks under `real-time' constraints. In this paper, we propose an easily customisable, highly pipelined, neuron network design, which executes optimally scheduled floating-point operations for maximal amount of biophysically plausible neurons per FPGA family type. To reduce the required amount of resources without adverse effect on the calculation latency, a single exponent instance is used for multiple neuron calculation operations. Experimental results indicate that the proposed network design allows the simulation of up to 1188 neurons on Virtex7 (XC7VX550T) device in brain real-time yielding a speed-up of x12.4 compared to the state-of-the art.

  5. High Efficiency Power Converter for Low Voltage High Power Applications

    DEFF Research Database (Denmark)

    Nymand, Morten

    The topic of this thesis is the design of high efficiency power electronic dc-to-dc converters for high-power, low-input-voltage to high-output-voltage applications. These converters are increasingly required for emerging sustainable energy systems such as fuel cell, battery or photo voltaic based...

  6. Potential applications of high temperature helium

    International Nuclear Information System (INIS)

    Schleicher, R.W. Jr.; Kennedy, A.J.

    1992-09-01

    This paper discusses the DOE MHTGR-SC program's recent activity to improve the economics of the MHTGR without sacrificing safety performance and two potential applications of high temperature helium, the MHTGR gas turbine plant and a process heat application for methanol production from coal

  7. Choosing processor array configuration by performance modeling for a highly parallel linear algebra algorithm

    International Nuclear Information System (INIS)

    Littlefield, R.J.; Maschhoff, K.J.

    1991-04-01

    Many linear algebra algorithms utilize an array of processors across which matrices are distributed. Given a particular matrix size and a maximum number of processors, what configuration of processors, i.e., what size and shape array, will execute the fastest? The answer to this question depends on tradeoffs between load balancing, communication startup and transfer costs, and computational overhead. In this paper we analyze in detail one algorithm: the blocked factored Jacobi method for solving dense eigensystems. A performance model is developed to predict execution time as a function of the processor array and matrix sizes, plus the basic computation and communication speeds of the underlying computer system. In experiments on a large hypercube (up to 512 processors), this model has been found to be highly accurate (mean error ∼ 2%) over a wide range of matrix sizes (10 x 10 through 200 x 200) and processor counts (1 to 512). The model reveals, and direct experiment confirms, that the tradeoffs mentioned above can be surprisingly complex and counterintuitive. We propose decision procedures based directly on the performance model to choose configurations for fastest execution. The model-based decision procedures are compared to a heuristic strategy and shown to be significantly better. 7 refs., 8 figs., 1 tab

  8. The One-node CMFD Computational Framework for Highly Parallel Reactor Analysis

    International Nuclear Information System (INIS)

    Kim, Yong Hee

    2016-01-01

    This paper presents one such possible approach named the One-Node and Two-Node Hybrid CMFD method; One-Node CMFD is used to solve the global problem, while Two-Node NEM (Nodal Expansion Method) CMFD replaces FMFD as the local problem. Rapid advancement in computing capabilities has enabled the pursuit for high-fidelity tools in analyzing the reactor configurations. One such tool that gains a lot of recent attention is the pin-by-pin core calculation via local-global iteration with nonlinear acceleration scheme. One-Node CMFD method uses two correction terms to preserve the interface current and flux, in contrast to one correction term employed by conventional nonlinear iterative method proposed. With two correction terms, One-Node CMFD can reduce computing time of local FMFD (Fine-Mesh Finite Difference) as it nonlinearly couples the CMFD and FMFD methods in global-local iterations. Nonetheless, retrieving the pinlevel information from the local FMFD is very time consuming. As such, a more flexible approach is needed to address this challenge.

  9. WATERLOPP V2/64: A highly parallel machine for numerical computation

    Science.gov (United States)

    Ostlund, Neil S.

    1985-07-01

    Current technological trends suggest that the high performance scientific machines of the future are very likely to consist of a large number (greater than 1024) of processors connected and communicating with each other in some as yet undetermined manner. Such an assembly of processors should behave as a single machine in obtaining numerical solutions to scientific problems. However, the appropriate way of organizing both the hardware and software of such an assembly of processors is an unsolved and active area of research. It is particularly important to minimize the organizational overhead of interprocessor comunication, global synchronization, and contention for shared resources if the performance of a large number ( n) of processors is to be anything like the desirable n times the performance of a single processor. In many situations, adding a processor actually decreases the performance of the overall system since the extra organizational overhead is larger than the extra processing power added. The systolic loop architecture is a new multiple processor architecture which attemps at a solution to the problem of how to organize a large number of asynchronous processors into an effective computational system while minimizing the organizational overhead. This paper gives a brief overview of the basic systolic loop architecture, systolic loop algorithms for numerical computation, and a 64-processor implementation of the architecture, WATERLOOP V2/64, that is being used as a testbed for exploring the hardware, software, and algorithmic aspects of the architecture.

  10. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  11. Application of High Temperature Superconductors to Accelerators

    CERN Document Server

    Ballarino, A

    2000-01-01

    Since the discovery of high temperature superconductivity, a large effort has been made by the scientific community to investigate this field towards a possible application of the new oxide superconductors to different devices like SMES, magnetic bearings, flywheels energy storage, magnetic shielding, transmission cables, fault current limiters, etc. However, all present day large scale applications using superconductivity in accelerator technology are based on conventional materials operating at liquid helium temperatures. Poor mechanical properties, low critical current density and sensitivity to the magnetic field at high temperature are the key parameters whose improvement is essential for a large scale application of high temperature superconductors to such devices. Current leads, used for transferring currents from the power converters, working at room temperature, into the liquid helium environment, where the magnets are operating, represent an immediate application of the emerging technology of high t...

  12. Development and application of a 6.5 million feature affymetrix genechip® for massively parallel discovery of single position polymorphisms in lettuce (Lactuca spp.)

    OpenAIRE

    Stoffel, Kevin; van Leeuwen, Hans; Kozik, Alexander; Caldwell, David; Ashrafi, Hamid; Cui, Xinping; Tan, Xiaoping; Hill, Theresa; Reyes-Chin-Wo, Sebastian; Truco, Maria-Jose; Michelmore, Richard W; Van Deynze, Allen

    2012-01-01

    Abstract Background High-resolution genetic maps are needed in many crops to help characterize the genetic diversity that determines agriculturally important traits. Hybridization to microarrays to detect single feature polymorphisms is a powerful technique for marker discovery and genotyping because of its highly parallel nature. However, microarrays designed for gene expression analysis rarely provide sufficient gene coverage for optimal detection o...

  13. Parallel Fortran-MPI software for numerical inversion of the Laplace transform and its application to oscillatory water levels in groundwater environments

    Science.gov (United States)

    Zhan, X.

    2005-01-01

    A parallel Fortran-MPI (Message Passing Interface) software for numerical inversion of the Laplace transform based on a Fourier series method is developed to meet the need of solving intensive computational problems involving oscillatory water level's response to hydraulic tests in a groundwater environment. The software is a parallel version of ACM (The Association for Computing Machinery) Transactions on Mathematical Software (TOMS) Algorithm 796. Running 38 test examples indicated that implementation of MPI techniques with distributed memory architecture speedups the processing and improves the efficiency. Applications to oscillatory water levels in a well during aquifer tests are presented to illustrate how this package can be applied to solve complicated environmental problems involved in differential and integral equations. The package is free and is easy to use for people with little or no previous experience in using MPI but who wish to get off to a quick start in parallel computing. ?? 2004 Elsevier Ltd. All rights reserved.

  14. Parallel Algorithm for GPU Processing; for use in High Speed Machine Vision Sensing of Cotton Lint Trash

    Directory of Open Access Journals (Sweden)

    Mathew G. Pelletier

    2008-02-01

    Full Text Available One of the main hurdles standing in the way of optimal cleaning of cotton lint isthe lack of sensing systems that can react fast enough to provide the control system withreal-time information as to the level of trash contamination of the cotton lint. This researchexamines the use of programmable graphic processing units (GPU as an alternative to thePC’s traditional use of the central processing unit (CPU. The use of the GPU, as analternative computation platform, allowed for the machine vision system to gain asignificant improvement in processing time. By improving the processing time, thisresearch seeks to address the lack of availability of rapid trash sensing systems and thusalleviate a situation in which the current systems view the cotton lint either well before, orafter, the cotton is cleaned. This extended lag/lead time that is currently imposed on thecotton trash cleaning control systems, is what is responsible for system operators utilizing avery large dead-band safety buffer in order to ensure that the cotton lint is not undercleaned.Unfortunately, the utilization of a large dead-band buffer results in the majority ofthe cotton lint being over-cleaned which in turn causes lint fiber-damage as well assignificant losses of the valuable lint due to the excessive use of cleaning machinery. Thisresearch estimates that upwards of a 30% reduction in lint loss could be gained through theuse of a tightly coupled trash sensor to the cleaning machinery control systems. Thisresearch seeks to improve processing times through the development of a new algorithm forcotton trash sensing that allows for implementation on a highly parallel architecture.Additionally, by moving the new parallel algorithm onto an alternative computing platform,the graphic processing unit “GPU”, for processing of the cotton trash images, a speed up ofover 6.5 times, over optimized code running on the PC’s central processing

  15. Service Oriented Architecture for High Level Applications

    International Nuclear Information System (INIS)

    Chu, P.

    2012-01-01

    Standalone high level applications often suffer from poor performance and reliability due to lengthy initialization, heavy computation and rapid graphical update. Service-oriented architecture (SOA) is trying to separate the initialization and computation from applications and to distribute such work to various service providers. Heavy computation such as beam tracking will be done periodically on a dedicated server and data will be available to client applications at all time. Industrial standard service architecture can help to improve the performance, reliability and maintainability of the service. Robustness will also be improved by reducing the complexity of individual client applications.

  16. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  17. X-ray computed tomography comparison of individual and parallel assembled commercial lithium iron phosphate batteries at end of life after high rate cycling

    Science.gov (United States)

    Carter, Rachel; Huhman, Brett; Love, Corey T.; Zenyuk, Iryna V.

    2018-03-01

    X-ray computed tomography (X-ray CT) across multiple length scales is utilized for the first time to investigate the physical abuse of high C-rate pulsed discharge on cells wired individually and in parallel.. Manufactured lithium iron phosphate cells boasting high rate capability were pulse power tested in both wiring conditions with high discharge currents of 10C for a high number of cycles (up to 1200) until end of life (health (SOH) monitoring methods, is diagnosed using CT by rendering the interior current collector without harm or alteration to the active materials. Correlation of CT observations to the electrochemical pulse data from the parallel-wired cells reveals the risk of parallel wiring during high C-rate pulse discharge.

  18. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.

  19. Portable and Transparent Message Compression in MPI Libraries to Improve the Performance and Scalability of Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Albonesi, David; Burtscher, Martin

    2009-04-17

    The goal of this project has been to develop a lossless compression algorithm for message-passing libraries that can accelerate HPC systems by reducing the communication time. Because both compression and decompression have to be performed in software in real time, the algorithm has to be extremely fast while still delivering a good compression ratio. During the first half of this project, they designed a new compression algorithm called FPC for scientific double-precision data, made the source code available on the web, and published two papers describing its operation, the first in the proceedings of the Data Compression Conference and the second in the IEEE Transactions on Computers. At comparable average compression ratios, this algorithm compresses and decompresses 10 to 100 times faster than BZIP2, DFCM, FSD, GZIP, and PLMI on the three architectures tested. With prediction tables that fit into the CPU's L1 data acache, FPC delivers a guaranteed throughput of six gigabits per second on a 1.6 GHz Itanium 2 system. The C source code and documentation of FPC are posted on-line and have already been downloaded hundreds of times. To evaluate FPC, they gathered 13 real-world scientific datasets from around the globe, including satellite data, crash-simulation data, and messages from HPC systems. Based on the large number of requests they received, they also made these datasets available to the community (with permission of the original sources). While FPC represents a great step forward, it soon became clear that its throughput was too slow for the emerging 10 gigabits per second networks. Hence, no speedup can be gained by including this algorithm in an MPI library. They therefore changed the aim of the second half of the project. Instead of implementing FPC in an MPI library, they refocused their efforts to develop a parallel compression algorithm to further boost the throughput. After all, all modern high-end microprocessors contain multiple CPUs on a

  20. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.