Streaming for Functional Data-Parallel Languages
DEFF Research Database (Denmark)
Madsen, Frederik Meisner
In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...
Density functional theory and parallel processing
International Nuclear Information System (INIS)
Ward, R.C.; Geist, G.A.; Butler, W.H.
1987-01-01
The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer
Bessel functions: parallel display and processing.
Lohmann, A W; Ojeda-Castañeda, J; Serrano-Heredia, A
1994-01-01
We present an optical setup that converts planar binary curves into two-dimensional amplitude distributions, which are proportional, along one axis, to the Bessel function of order n, whereas along the other axis the order n increases. This Bessel displayer can be used for parallel Bessel transformation of a signal. Experimental verifications are included.
Parallel transaction processing in functional languages, towards practical functional databases
Wevers, L.; Huisman, Marieke; de Keijzer, Ander
2013-01-01
This paper shows how functional languages can be adapted for transaction processing, and discusses the implementation of a parallel runtime system for such functional transaction processing languages. We extend functional languages with current state variables and result state variables to allow the
Massively parallel sparse matrix function calculations with NTPoly
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
When do evolutionary algorithms optimize separable functions in parallel?
DEFF Research Database (Denmark)
Doerr, Benjamin; Sudholt, Dirk; Witt, Carsten
2013-01-01
is that evolutionary algorithms make progress on all subfunctions in parallel, so that optimizing a separable function does not take not much longer than optimizing the hardest subfunction-subfunctions are optimized "in parallel." We show that this is only partially true, already for the simple (1+1) evolutionary...... algorithm ((1+1) EA). For separable functions composed of k Boolean functions indeed the optimization time is the maximum optimization time of these functions times a small O(log k) overhead. More generally, for sums of weighted subfunctions that each attain non-negative integer values less than r = o(log1...
High temporal resolution functional MRI using parallel echo volumar imaging
International Nuclear Information System (INIS)
Rabrait, C.; Ciuciu, P.; Ribes, A.; Poupon, C.; Dehaine-Lambertz, G.; LeBihan, D.; Lethimonnier, F.; Le Roux, P.; Dehaine-Lambertz, G.
2008-01-01
Purpose: To combine parallel imaging with 3D single-shot acquisition (echo volumar imaging, EVI) in order to acquire high temporal resolution volumar functional MRI (fMRI) data. Materials and Methods: An improved EVI sequence was associated with parallel acquisition and field of view reduction in order to acquire a large brain volume in 200 msec. Temporal stability and functional sensitivity were increased through optimization of all imaging parameters and Tikhonov regularization of parallel reconstruction. Two human volunteers were scanned with parallel EVI in a 1.5 T whole-body MR system, while submitted to a slow event-related auditory paradigm. Results: Thanks to parallel acquisition, the EVI volumes display a low level of geometric distortions and signal losses. After removal of low-frequency drifts and physiological artifacts,activations were detected in the temporal lobes of both volunteers and voxel-wise hemodynamic response functions (HRF) could be computed. On these HRF different habituation behaviors in response to sentence repetition could be identified. Conclusion: This work demonstrates the feasibility of high temporal resolution 3D fMRI with parallel EVI. Combined with advanced estimation tools,this acquisition method should prove useful to measure neural activity timing differences or study the nonlinearities and non-stationarities of the BOLD response. (authors)
Rocket measurement of auroral partial parallel distribution functions
Lin, C.-A.
1980-01-01
The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectra will be changed by superposing another energy spectra on it.
Parallel keyed hash function construction based on chaotic maps
International Nuclear Information System (INIS)
Xiao Di; Liao Xiaofeng; Deng Shaojiang
2008-01-01
Recently, a variety of chaos-based hash functions have been proposed. Nevertheless, none of them works efficiently in parallel computing environment. In this Letter, an algorithm for parallel keyed hash function construction is proposed, whose structure can ensure the uniform sensitivity of hash value to the message. By means of the mechanism of both changeable-parameter and self-synchronization, the keystream establishes a close relation with the algorithm key, the content and the order of each message block. The entire message is modulated into the chaotic iteration orbit, and the coarse-graining trajectory is extracted as the hash value. Theoretical analysis and computer simulation indicate that the proposed algorithm can satisfy the performance requirements of hash function. It is simple, efficient, practicable, and reliable. These properties make it a good choice for hash on parallel computing platform
A hybrid method for the parallel computation of Green's functions
DEFF Research Database (Denmark)
Petersen, Dan Erik; Li, Song; Stokbro, Kurt
2009-01-01
of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds...... of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only...... require computing a small number of entries of the inverse matrix. Then. we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size....
Administering truncated receive functions in a parallel messaging interface
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2014-12-09
Administering truncated receive functions in a parallel messaging interface (`PMI`) of a parallel computer comprising a plurality of compute nodes coupled for data communications through the PMI and through a data communications network, including: sending, through the PMI on a source compute node, a quantity of data from the source compute node to a destination compute node; specifying, by an application on the destination compute node, a portion of the quantity of data to be received by the application on the destination compute node and a portion of the quantity of data to be discarded; receiving, by the PMI on the destination compute node, all of the quantity of data; providing, by the PMI on the destination compute node to the application on the destination compute node, only the portion of the quantity of data to be received by the application; and discarding, by the PMI on the destination compute node, the portion of the quantity of data to be discarded.
Massively Parallel Interrogation of Aptamer Sequence, Structure and Function
Energy Technology Data Exchange (ETDEWEB)
Fischer, N O; Tok, J B; Tarasow, T M
2008-02-08
Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
Massively parallel interrogation of aptamer sequence, structure and function.
Directory of Open Access Journals (Sweden)
Nicholas O Fischer
Full Text Available BACKGROUND: Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. METHODOLOGY/PRINCIPAL FINDINGS: High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and inter-chip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. CONCLUSION AND SIGNIFICANCE: The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
A hybrid method for the parallel computation of Green's functions
International Nuclear Information System (INIS)
Petersen, Dan Erik; Li Song; Stokbro, Kurt; Sorensen, Hans Henrik B.; Hansen, Per Christian; Skelboe, Stig; Darve, Eric
2009-01-01
Quantum transport models for nanodevices using the non-equilibrium Green's function method require the repeated calculation of the block tridiagonal part of the Green's and lesser Green's function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of the large number of times this calculation needs to be performed, this is computationally very expensive even on supercomputers. The classical approach is based on recurrence formulas which cannot be efficiently parallelized. This practically prevents the solution of large problems with hundreds of thousands of atoms. We propose new recurrences for a general class of sparse matrices to calculate Green's and lesser Green's function matrices which extend formulas derived by Takahashi and others. We show that these recurrences may lead to a dramatically reduced computational cost because they only require computing a small number of entries of the inverse matrix. Then, we propose a parallelization strategy for block tridiagonal matrices which involves a combination of Schur complement calculations and cyclic reduction. It achieves good scalability even on problems of modest size.
Zhong, Jidan; Rifkin-Graboi, Anne; Ta, Anh Tuan; Yap, Kar Lai; Chuang, Kai-Hsiang; Meaney, Michael J; Qiu, Anqi
2014-07-01
Children begin performing similarly to adults on tasks requiring executive functions in late childhood, a transition that is probably due to neuroanatomical fine-tuning processes, including myelination and synaptic pruning. In parallel to such structural changes in neuroanatomical organization, development of functional organization may also be associated with cognitive behaviors in children. We examined 6- to 10-year-old children's cortical thickness, functional organization, and cognitive performance. We used structural magnetic resonance imaging (MRI) to identify areas with cortical thinning, resting-state fMRI to identify functional organization in parallel to cortical development, and working memory/response inhibition tasks to assess executive functioning. We found that neuroanatomical changes in the form of cortical thinning spread over bilateral frontal, parietal, and occipital regions. These regions were engaged in 3 functional networks: sensorimotor and auditory, executive control, and default mode network. Furthermore, we found that working memory and response inhibition only associated with regional functional connectivity, but not topological organization (i.e., local and global efficiency of information transfer) of these functional networks. Interestingly, functional connections associated with "bottom-up" as opposed to "top-down" processing were more clearly related to children's performance on working memory and response inhibition, implying an important role for brain systems involved in late childhood. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A two-level parallel direct search implementation for arbitrarily sized objective functions
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others
1994-12-31
In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.
Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E
2018-03-01
Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.
Improving the security of a parallel keyed hash function based on chaotic maps
Energy Technology Data Exchange (ETDEWEB)
Xiao Di, E-mail: xiaodi_cqu@hotmail.co [College of Computer Science and Engineering, Chongqing University, Chongqing 400044 (China); Liao Xiaofeng [College of Computer Science and Engineering, Chongqing University, Chongqing 400044 (China); Wang Yong [College of Computer Science and Engineering, Chongqing University, Chongqing 400044 (China)] [College of Economy and Management, Chongqing University of Posts and Telecommunications, Chongqing 400065 (China)
2009-11-23
In this Letter, we analyze the cause of vulnerability of the original parallel keyed hash function based on chaotic maps in detail, and then propose the corresponding enhancement measures. Theoretical analysis and computer simulation indicate that the modified hash function is more secure than the original one. At the same time, it can keep the parallel merit and satisfy the other performance requirements of hash function.
Improving the security of a parallel keyed hash function based on chaotic maps
International Nuclear Information System (INIS)
Xiao Di; Liao Xiaofeng; Wang Yong
2009-01-01
In this Letter, we analyze the cause of vulnerability of the original parallel keyed hash function based on chaotic maps in detail, and then propose the corresponding enhancement measures. Theoretical analysis and computer simulation indicate that the modified hash function is more secure than the original one. At the same time, it can keep the parallel merit and satisfy the other performance requirements of hash function.
Simplifying the parallelization of scientific codes by a function-centric approach in Python
International Nuclear Information System (INIS)
Nilsen, Jon K; Cai Xing; Langtangen, Hans Petter; Hoeyland, Bjoern
2010-01-01
The purpose of this paper is to show how existing scientific software can be parallelized using a separate thin layer of Python code where all parallelization-specific tasks are implemented. We provide specific examples of such a Python code layer, which can act as templates for parallelizing a wide set of serial scientific codes. The use of Python for parallelization is motivated by the fact that the language is well suited for reusing existing serial codes programmed in other languages. The extreme flexibility of Python with regard to handling functions makes it very easy to wrap up decomposed computational tasks of a serial scientific application as Python functions. Many parallelization-specific components can be implemented as generic Python functions, which may take as input those wrapped functions that perform concrete computational tasks. The overall programming effort needed by this parallelization approach is limited, and the resulting parallel Python scripts have a compact and clean structure. The usefulness of the parallelization approach is exemplified by three different classes of application in natural and social sciences.
Approximate inference for spatial functional data on massively parallel processors
DEFF Research Database (Denmark)
Raket, Lars Lau; Markussen, Bo
2014-01-01
With continually increasing data sizes, the relevance of the big n problem of classical likelihood approaches is greater than ever. The functional mixed-effects model is a well established class of models for analyzing functional data. Spatial functional data in a mixed-effects setting...... in linear time. An extremely efficient GPU implementation is presented, and the proposed methods are illustrated by conducting a classical statistical analysis of 2D chromatography data consisting of more than 140 million spatially correlated observation points....
Unpacking the cognitive map: the parallel map theory of hippocampal function.
Jacobs, Lucia F; Schenk, Françoise
2003-04-01
In the parallel map theory, the hippocampus encodes space with 2 mapping systems. The bearing map is constructed primarily in the dentate gyrus from directional cues such as stimulus gradients. The sketch map is constructed within the hippocampus proper from positional cues. The integrated map emerges when data from the bearing and sketch maps are combined. Because the component maps work in parallel, the impairment of one can reveal residual learning by the other. Such parallel function may explain paradoxes of spatial learning, such as learning after partial hippocampal lesions, taxonomic and sex differences in spatial learning, and the function of hippocampal neurogenesis. By integrating evidence from physiology to phylogeny, the parallel map theory offers a unified explanation for hippocampal function.
The Effects of Stress and Executive Functions on Decision Making in an Executive Parallel Task
McGuigan, Brian
2016-01-01
The aim of this study was to investigate the effects of acute stress on parallel task performance with the Game of Dice Task (GDT) to measure decision making and the Stroop test. Two previous studies have found that the combination of stress and a parallel task with the GDT and an executive functions task preserved performance on the GDT for a stress group compared to a control group. The purpose of this study was to create and use a new parallel task with the GDT and the stroop test to elu...
Cryptanalysis on a parallel keyed hash function based on chaotic maps
International Nuclear Information System (INIS)
Guo Wei; Wang Xiaoming; He Dake; Cao Yang
2009-01-01
This Letter analyzes the security of a novel parallel keyed hash function based on chaotic maps, proposed by Xiao et al. to improve the efficiency in parallel computing environment. We show how to devise forgery attacks on Xiao's scheme with differential cryptanalysis and give the experiment results of two kinds of forgery attacks firstly. Furthermore, we discuss the problem of weak keys in the scheme and demonstrate how to utilize weak keys to construct collision.
Liu, Zhen; Qi, Fei-Yan; Zhou, Xin; Ren, Hai-Qing; Shi, Peng
2014-09-01
Echolocation is a sensory system whereby certain mammals navigate and forage using sound waves, usually in environments where visibility is limited. Curiously, echolocation has evolved independently in bats and whales, which occupy entirely different environments. Based on this phenotypic convergence, recent studies identified several echolocation-related genes with parallel sites at the protein sequence level among different echolocating mammals, and among these, prestin seems the most promising. Although previous studies analyzed the evolutionary mechanism of prestin, the functional roles of the parallel sites in the evolution of mammalian echolocation are not clear. By functional assays, we show that a key parameter of prestin function, 1/α, is increased in all echolocating mammals and that the N7T parallel substitution accounted for this functional convergence. Moreover, another parameter, V1/2, was shifted toward the depolarization direction in a toothed whale, the bottlenose dolphin (Tursiops truncatus) and a constant-frequency (CF) bat, the Stoliczka's trident bat (Aselliscus stoliczkanus). The parallel site of I384T between toothed whales and CF bats was responsible for this functional convergence. Furthermore, the two parameters (1/α and V1/2) were correlated with mammalian high-frequency hearing, suggesting that the convergent changes of the prestin function in echolocating mammals may play important roles in mammalian echolocation. To our knowledge, these findings present the functional patterns of echolocation-related genes in echolocating mammals for the first time and rigorously demonstrate adaptive parallel evolution at the protein sequence level, paving the way to insights into the molecular mechanism underlying mammalian echolocation. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Topology in SU(2) lattice gauge theory and parallelization of functional magnetic resonance imaging
Energy Technology Data Exchange (ETDEWEB)
Solbrig, Stefan
2008-07-01
In this thesis, I discuss topological properties of quenched SU(2) lattice gauge fields. In particular, clusters of topological charge density exhibit a power-law. The exponent of that power-law can be used to validate models for lattice gauge fields. Instead of working with fixed cutoffs of the topological charge density, using the notion of a ''watermark'' is more convenient. Furthermore, I discuss how a parallel computer, originally designed for lattice gauge field simulations, can be used for functional magnetic resonance imaging. Multi parameter fits can be parallelized to achieve almost real-time evaluation of fMRI data. (orig.)
Topology in SU(2) lattice gauge theory and parallelization of functional magnetic resonance imaging
International Nuclear Information System (INIS)
Solbrig, Stefan
2008-01-01
In this thesis, I discuss topological properties of quenched SU(2) lattice gauge fields. In particular, clusters of topological charge density exhibit a power-law. The exponent of that power-law can be used to validate models for lattice gauge fields. Instead of working with fixed cutoffs of the topological charge density, using the notion of a ''watermark'' is more convenient. Furthermore, I discuss how a parallel computer, originally designed for lattice gauge field simulations, can be used for functional magnetic resonance imaging. Multi parameter fits can be parallelized to achieve almost real-time evaluation of fMRI data. (orig.)
Topology in SU(2) lattice gauge theory and parallelization of functional magnetic resonance imaging
Energy Technology Data Exchange (ETDEWEB)
Solbrig, Stefan
2008-07-01
In this thesis, I discuss topological properties of quenched SU(2) lattice gauge fields. In particular, clusters of topological charge density exhibit a power-law. The exponent of that power-law can be used to validate models for lattice gauge fields. Instead of working with fixed cutoffs of the topological charge density, using the notion of a ''watermark'' is more convenient. Furthermore, I discuss how a parallel computer, originally designed for lattice gauge field simulations, can be used for functional magnetic resonance imaging. Multi parameter fits can be parallelized to achieve almost real-time evaluation of fMRI data. (orig.)
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
Energy Technology Data Exchange (ETDEWEB)
Hu, Wei, E-mail: whu@lbl.gov; Yang, Chao, E-mail: cyang@lbl.gov [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Lin, Lin, E-mail: linlin@math.berkeley.edu [Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720 (United States); Department of Mathematics, University of California, Berkeley, California 94720 (United States)
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
DGDFT: A massively parallel method for large scale density functional theory calculations
International Nuclear Information System (INIS)
Hu, Wei; Yang, Chao; Lin, Lin
2015-01-01
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10 −4 Hartree/atom in terms of the error of energy and 6.2 × 10 −4 Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail
Storing files in a parallel computing system based on user-specified parser function
Faibish, Sorin; Bent, John M; Tzelnic, Percy; Grider, Gary; Manzanares, Adam; Torres, Aaron
2014-10-21
Techniques are provided for storing files in a parallel computing system based on a user-specified parser function. A plurality of files generated by a distributed application in a parallel computing system are stored by obtaining a parser from the distributed application for processing the plurality of files prior to storage; and storing one or more of the plurality of files in one or more storage nodes of the parallel computing system based on the processing by the parser. The plurality of files comprise one or more of a plurality of complete files and a plurality of sub-files. The parser can optionally store only those files that satisfy one or more semantic requirements of the parser. The parser can also extract metadata from one or more of the files and the extracted metadata can be stored with one or more of the plurality of files and used for searching for files.
A massively-parallel electronic-structure calculations based on real-space density functional theory
International Nuclear Information System (INIS)
Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro
2010-01-01
Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N 3 ) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic-structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
International Nuclear Information System (INIS)
Andrade, Xavier; Aspuru-Guzik, Alán; Alberdi-Rodriguez, Joseba; Rubio, Angel; Strubbe, David A; Louie, Steven G; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Marques, Miguel A L
2012-01-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures. (topical review)
Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments
Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke
Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode,
Slepoy, A; Peters, M D; Thompson, A P
2007-11-30
Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.
DEFF Research Database (Denmark)
Dung, Phan Anh; Hansen, Michael Reichhardt
2015-01-01
In this paper we investigate multicore parallelism in the context of functional programming by means of two quantifier-elimination procedures for Presburger Arithmetic: one is based on Cooper’s algorithm and the other is based on the Omega Test. We first develop correct-by-construction prototype...... platform executing on an 8-core machine. A speedup of approximately 4 was obtained for Cooper’s algorithm and a speedup of approximately 6 was obtained for the exact-shadow part of the Omega Test. The considered procedures are complex, memory-intense algorithms on huge formula trees and the case study...... reveals more general applicable techniques and guideline for deriving parallel algorithms from sequential ones in the context of data-intensive tree algorithms. The obtained insights should apply for any strict and impure functional programming language. Furthermore, the results obtained for the exact...
Green's function for electrons in a narrow quantum well in a parallel magnetic field
International Nuclear Information System (INIS)
Horing, Norman J. Morgenstern; Glasser, M. Lawrence; Dong Bing
2005-01-01
Electron dynamics in a narrow quantum well in a parallel magnetic field of arbitrary strength are examined here. We derive an explicit analytical closed-form solution for the Green's function of Landau-quantized electrons in skipping states of motion between the narrow well walls coupled with in-plane translational motion and hybridized with the zero-field lowest subband energy eigenstate. Such Landau-quantized modes are not uniformly spaced
A study of objective functions for organs with parallel and serial architecture
International Nuclear Information System (INIS)
Stavrev, P.V.; Stavreva, N.A.; Round, W.H.
1997-01-01
An objective function analysis when target volumes are deliberately enlarged to account for tumour mobility and consecutive uncertainty in the tumour position in external beam radiotherapy has been carried out. The dose distribution inside the tumour is assumed to have logarithmic dependence on the tumour cell density which assures an iso-local tumour control probability. The normal tissue immediately surrounding the tumour is irradiated homogeneously at a dose level equal to the dose D(R)) delivered at the edge of the tumour The normal tissue in the high dose field is modelled as being organized in identical functional subunits (FSUs) composed of a relatively large number of cells. Two types of organs - having serial and parallel architecture are considered. Implicit averaging over intrapatient normal tissue radiosensitivity variations is done. A function describing the normal tissue survival probability S 0 is constructed. The objective function is given as a product of the total tumour control probability (TCP) and the normal tissue survival probability S 0 . The values of the dose D(R)) which result in a maximum of the objective function are obtained for different combinations of tumour and normal tissue parameters, such as tumour and normal tissue radiosensitivities, number of cells constituting a normal tissue functional unit, total number of normal cells under high dose (D(R)) exposure and functional reserve for organs having parallel architecture. The corresponding TCP and S 0 values are computed and discussed. (authors)
Zhang, Rushao; Hui, Mingqi; Long, Zhiying; Zhao, Xiaojie; Yao, Li
2012-01-01
Background Neural substrates underlying motor learning have been widely investigated with neuroimaging technologies. Investigations have illustrated the critical regions of motor learning and further revealed parallel alterations of functional activation during imagination and execution after learning. However, little is known about the functional connectivity associated with motor learning, especially motor imagery learning, although benefits from functional connectivity analysis attract more attention to the related explorations. We explored whether motor imagery (MI) and motor execution (ME) shared parallel alterations of functional connectivity after MI learning. Methodology/Principal Findings Graph theory analysis, which is widely used in functional connectivity exploration, was performed on the functional magnetic resonance imaging (fMRI) data of MI and ME tasks before and after 14 days of consecutive MI learning. The control group had no learning. Two measures, connectivity degree and interregional connectivity, were calculated and further assessed at a statistical level. Two interesting results were obtained: (1) The connectivity degree of the right posterior parietal lobe decreased in both MI and ME tasks after MI learning in the experimental group; (2) The parallel alterations of interregional connectivity related to the right posterior parietal lobe occurred in the supplementary motor area for both tasks. Conclusions/Significance These computational results may provide the following insights: (1) The establishment of motor schema through MI learning may induce the significant decrease of connectivity degree in the posterior parietal lobe; (2) The decreased interregional connectivity between the supplementary motor area and the right posterior parietal lobe in post-test implicates the dissociation between motor learning and task performing. These findings and explanations further revealed the neural substrates underpinning MI learning and supported that
University Utrecht
1992-01-01
A simple density functional theory for the various liquid-crystalline phases of parallel hard spherocylinders is formulated on the basis of Pynn's ansatz for the direct correlation function of the spherocylinders. Fair agreement with the computer simulations is found.
Parallel Fixed Point Implementation of a Radial Basis Function Network in an FPGA
Directory of Open Access Journals (Sweden)
Alisson C. D. de Souza
2014-09-01
Full Text Available This paper proposes a parallel fixed point radial basis function (RBF artificial neural network (ANN, implemented in a field programmable gate array (FPGA trained online with a least mean square (LMS algorithm. The processing time and occupied area were analyzed for various fixed point formats. The problems of precision of the ANN response for nonlinear classification using the XOR gate and interpolation using the sine function were also analyzed in a hardware implementation. The entire project was developed using the System Generator platform (Xilinx, with a Virtex-6 xc6vcx240t-1ff1156 as the target FPGA.
Vector Green's function algorithm for radiative transfer in plane-parallel atmosphere
Energy Technology Data Exchange (ETDEWEB)
Qin Yi [School of Physics, University of New South Wales (Australia)]. E-mail: yi.qin@csiro.au; Box, Michael A. [School of Physics, University of New South Wales (Australia)
2006-01-15
Green's function is a widely used approach for boundary value problems. In problems related to radiative transfer, Green's function has been found to be useful in land, ocean and atmosphere remote sensing. It is also a key element in higher order perturbation theory. This paper presents an explicit expression of the Green's function, in terms of the source and radiation field variables, for a plane-parallel atmosphere with either vacuum boundaries or a reflecting (BRDF) surface. Full polarization state is considered but the algorithm has been developed in such way that it can be easily reduced to solve scalar radiative transfer problems, which makes it possible to implement a single set of code for computing both the scalar and the vector Green's function.
Vector Green's function algorithm for radiative transfer in plane-parallel atmosphere
International Nuclear Information System (INIS)
Qin Yi; Box, Michael A.
2006-01-01
Green's function is a widely used approach for boundary value problems. In problems related to radiative transfer, Green's function has been found to be useful in land, ocean and atmosphere remote sensing. It is also a key element in higher order perturbation theory. This paper presents an explicit expression of the Green's function, in terms of the source and radiation field variables, for a plane-parallel atmosphere with either vacuum boundaries or a reflecting (BRDF) surface. Full polarization state is considered but the algorithm has been developed in such way that it can be easily reduced to solve scalar radiative transfer problems, which makes it possible to implement a single set of code for computing both the scalar and the vector Green's function
Grelck, C.; Herhut, S.; Jesshope, C.; Joslin, C.; Lankamp, M.; Scholz, S.-B.; Shafarenko, A.
2009-01-01
We present preliminary results from compiling the high-level, functional and data-parallel programming language SaC into a novel multi-core design: Microgrids of Self-Adaptive Virtual Processors (SVPs). The side-effect free nature of SaC in conjunction with its data-parallel foundation make it an
Large-area parallel near-field optical nanopatterning of functional materials using microsphere mask
Energy Technology Data Exchange (ETDEWEB)
Chen, G.X. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Hong, M.H. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)], E-mail: Hong_Minghui@dsi.a-star.edu.sg; Lin, Y. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Wang, Z.B. [Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Ng, D.K.T. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Xie, Q. [Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore); Tan, L.S. [NUS Nanoscience and Nanotechnology Initiative, National University of Singapore, 2 Engineering Drive 3, Singapore 117576 (Singapore); Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Chong, T.C. [Department of Electrical and Computer Engineering, National University of Singapore, 4 Engineering Drive 3, Singapore 117576 (Singapore); Data Storage Institute, ASTAR, DSI Building, 5 Engineering Drive 1, Singapore 117608 (Singapore)
2008-01-31
Large-area parallel near-field optical nanopatterning on functional material surfaces was investigated with KrF excimer laser irradiation. A monolayer of silicon dioxide microspheres was self-assembled on the sample surfaces as the processing mask. Nanoholes and nanospots were obtained on silicon surfaces and thin silver films, respectively. The nanopatterning results were affected by the refractive indices of the surrounding media. Near-field optical enhancement beneath the microspheres is the physical origin of nanostructure formation. Theoretical calculation was performed to study the intensity of optical field distributions under the microspheres according to the light scattering model of a sphere on the substrate.
Olsen, Aaron M; Westneat, Mark W
2016-12-01
Many musculoskeletal systems, including the skulls of birds, fishes, and some lizards consist of interconnected chains of mobile skeletal elements, analogous to linkage mechanisms used in engineering. Biomechanical studies have applied linkage models to a diversity of musculoskeletal systems, with previous applications primarily focusing on two-dimensional linkage geometries, bilaterally symmetrical pairs of planar linkages, or single four-bar linkages. Here, we present new, three-dimensional (3D), parallel linkage models of the skulls of birds and fishes and use these models (available as free kinematic simulation software), to investigate structure-function relationships in these systems. This new computational framework provides an accessible and integrated workflow for exploring the evolution of structure and function in complex musculoskeletal systems. Linkage simulations show that kinematic transmission, although a suitable functional metric for linkages with single rotating input and output links, can give misleading results when applied to linkages with substantial translational components or multiple output links. To take into account both linear and rotational displacement we define force mechanical advantage for a linkage (analogous to lever mechanical advantage) and apply this metric to measure transmission efficiency in the bird cranial mechanism. For linkages with multiple, expanding output points we propose a new functional metric, expansion advantage, to measure expansion amplification and apply this metric to the buccal expansion mechanism in fishes. Using the bird cranial linkage model, we quantify the inaccuracies that result from simplifying a 3D geometry into two dimensions. We also show that by combining single-chain linkages into parallel linkages, more links can be simulated while decreasing or maintaining the same number of input parameters. This generalized framework for linkage simulation and analysis can accommodate linkages of differing
Parallel Execution of Functional Mock-up Units in Buildings Modeling
Energy Technology Data Exchange (ETDEWEB)
Ozmen, Ozgur [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Nutaro, James J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); New, Joshua Ryan [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)
2016-06-30
A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
DEFF Research Database (Denmark)
Gunabalan, R.; Sanjeevikumar, P.; Blaabjerg, Frede
2015-01-01
This paper presents the transfer function modeling and stability analysis of two induction motors of same ratings and parameters connected in parallel. The induction motors are controlled by a single inverter and the entire drive system is modeled using transfer function in LabView. Further...
International Nuclear Information System (INIS)
Frink, L.J.D.; Salinger, A.G.
2000-01-01
Fluids adsorbed near surfaces, near macromolecules, and in porous materials are inhomogeneous, exhibiting spatially varying density distributions. This inhomogeneity in the fluid plays an important role in controlling a wide variety of complex physical phenomena including wetting, self-assembly, corrosion, and molecular recognition. One of the key methods for studying the properties of inhomogeneous fluids in simple geometries has been density functional theory (DFT). However, there has been a conspicuous lack of calculations in complex two- and three-dimensional geometries. The computational difficulty arises from the need to perform nested integrals that are due to nonlocal terms in the free energy functional. These integral equations are expensive both in evaluation time and in memory requirements; however, the expense can be mitigated by intelligent algorithms and the use of parallel computers. This paper details the efforts to develop efficient numerical algorithms so that nonlocal DFT calculations in complex geometries that require two or three dimensions can be performed. The success of this implementation will enable the study of solvation effects at heterogeneous surfaces, in zeolites, in solvated (bio)polymers, and in colloidal suspensions
Sharma, Mrinal; Prabha, C Ratna
2011-12-01
Ubiquitin, a small eukaryotic protein serving as a post-translational modification on many important proteins, plays central role in cellular homeostasis and cell cycle regulation. Ubiquitin features two beta-bulges, the second beta-bulge, located at the C-terminal region of the protein along with type II turn, holds 3 residues Glu64(1), Ser65(2) and Gln2(X). Percent frequency of occurrence of such a sequence in parallel beta-bulge is very low. However, the sequence and structure have been conserved in ubiquitin through out the evolution. Present study involves replacement of residues in unusual beta-bulge of ubiquitin by introducing mutations in combination through site directed mutagenesis, generating double and triple mutants and their functional characterization. Mutant ubiquitins cloned in yeast expression vector YEp96 tested for growth profile, viability assay and heat stress complementation study have revealed significant decrease in growth rate, loss of viability and non-complementation of heat sensitive phenotype with UbE64G-S65D and UbQ2N-E64G-S65D mutations. However, UbQ2N-S65D did not show any negative effects in the above assays. Present results show that, replacement of residues in beta-bulge of ubiquitin exerts severe effects on growth and viability in Saccharomyces cerevisiae due to functional failure of the mutant ubiquitins UbE64G-S65D and UbQ2N-E64G-S65D.
From functional programming to multicore parallelism: A case study based on Presburger Arithmetic
DEFF Research Database (Denmark)
Dung, Phan Anh; Hansen, Michael Reichhardt
2011-01-01
, we are interested in using PA in connection with the Duration Calculus Model Checker (DCMC) [5]. There are effective decision procedures for PA including Cooper’s algorithm and the Omega Test; however, their complexity is extremely high with doubly exponential lower bound and triply exponential upper...... bound [7]. We investigate these decision procedures in the context of multicore parallelism with the hope of exploiting multicore powers. Unfortunately, we are not aware of any prior parallelism research related to decision procedures for PA. The closest work is the preliminary results on parallelism...
Directory of Open Access Journals (Sweden)
Chandrasekhar Natarajan
2015-12-01
Full Text Available A fundamental question in evolutionary genetics concerns the extent to which adaptive phenotypic convergence is attributable to convergent or parallel changes at the molecular sequence level. Here we report a comparative analysis of hemoglobin (Hb function in eight phylogenetically replicated pairs of high- and low-altitude waterfowl taxa to test for convergence in the oxygenation properties of Hb, and to assess the extent to which convergence in biochemical phenotype is attributable to repeated amino acid replacements. Functional experiments on native Hb variants and protein engineering experiments based on site-directed mutagenesis revealed the phenotypic effects of specific amino acid replacements that were responsible for convergent increases in Hb-O2 affinity in multiple high-altitude taxa. In six of the eight taxon pairs, high-altitude taxa evolved derived increases in Hb-O2 affinity that were caused by a combination of unique replacements, parallel replacements (involving identical-by-state variants with independent mutational origins in different lineages, and collateral replacements (involving shared, identical-by-descent variants derived via introgressive hybridization. In genome scans of nucleotide differentiation involving high- and low-altitude populations of three separate species, function-altering amino acid polymorphisms in the globin genes emerged as highly significant outliers, providing independent evidence for adaptive divergence in Hb function. The experimental results demonstrate that convergent changes in protein function can occur through multiple historical paths, and can involve multiple possible mutations. Most cases of convergence in Hb function did not involve parallel substitutions and most parallel substitutions did not affect Hb-O2 affinity, indicating that the repeatability of phenotypic evolution does not require parallelism at the molecular level.
Khorshidi, Abdollah; Ashoor, Mansour
2014-05-01
This study investigates modulation transfer function (MTF) in parallel beam (PB) and fan beam (FB) collimators using the Monte Carlo method with full width at half maximum (FWHM), square and circular-shaped holes, and scatter and penetration (S + P) components. A regulation similar to the lead-to-air ratio was used for both collimators to estimate output data. The hole pattern was designed to compare FB by PB parameters. The radioactive source in air and in a water phantom placed in front of the collimators was simulated using MCNP5 code. The test results indicated that the square holes in PB (PBs) had better FWHM than did the cylindrical (PBc) holes. In contrast, the cylindrical holes in the FB (FBc) had better FWHM than the square holes. In general, the resolution of FBc was better than that of the PBc in air and scatter mediums. The S + P decreased for all collimators as the distance from the source to the collimator surface (z) increased. The FBc had a lower S + P than FBs, but PBc had a higher S + P than PBs. Of the FB and PB collimators with the identical hole shapes, PBs had a smaller S + P than FBs, and FBc had a smaller S + P than PBc. The MTF value for the FB was greater than for the PB and had increased spatial frequency; the FBc had higher MTF than the FBs and PB collimators. Estimating the FB using PB parameters and diverse hole shapes may be useful in collimator design to improve the resolution and efficiency of SPECT images.
Energy Technology Data Exchange (ETDEWEB)
Guttenberg, Philipp; Lin, Mengyan [Romax Technology, Nottingham (United Kingdom)
2009-07-01
The following paper presents a comparative efficiency analysis of the Toyota Prius versus the Honda Insight using advanced Energy Flow Analysis methods. The sample study shows that even very different hybrid concepts like a split- and a parallel-hybrid can be compared in a high level of detail and demonstrates the benefit showing exemplary results. (orig.)
Harper, Richard
1989-01-01
In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.
Energy Technology Data Exchange (ETDEWEB)
Thiess, Alexander R.
2011-12-19
In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium
International Nuclear Information System (INIS)
Thiess, Alexander R.
2011-01-01
In this thesis we present the development of the self-consistent, full-potential Korringa-Kohn-Rostoker (KKR) Green function method KKRnano for calculating the electronic properties, magnetic interactions, and total energy including all electrons on the basis of the density functional theory (DFT) on high-end massively parallelized high-performance computers for supercells containing thousands of atoms without sacrifice of accuracy. KKRnano was used for the following two applications. The first application is centered in the field of dilute magnetic semiconductors. In this field a new promising material combination was identified: gadolinium doped gallium nitride which shows ferromagnetic ordering of colossal magnetic moments above room temperature. It quickly turned out that additional extrinsic defects are inducing the striking properties. However, the question which kind of extrinsic defects are present in experimental samples is still unresolved. In order to shed light on this open question, we perform extensive studies of the most promising candidates: interstitial nitrogen and oxygen, as well as gallium vacancies. By analyzing the pairwise magnetic coupling among defects it is shown that nitrogen and oxygen interstitials cannot support thermally stable ferromagnetic order. Gallium vacancies, on the other hand, facilitate an important coupling mechanism. The vacancies are found to induce large magnetic moments on all surrounding nitrogen sites, which then couple ferromagnetically both among themselves and with the gadolinium dopants. Based on a statistical evaluation it can be concluded that already small concentrations of gallium vacancies can lead to a distinct long-range ferromagnetic ordering. Beyond this important finding we present further indications, from which we infer that gallium vacancies likely cause the striking ferromagnetic coupling of colossal magnetic moments in GaN:Gd. The second application deals with the phase-change material germanium
International Nuclear Information System (INIS)
Rabrait, C.
2007-11-01
Echo Planar Imaging is widely used to perform data acquisition in functional neuroimaging. This sequence allows the acquisition of a set of about 30 slices, covering the whole brain, at a spatial resolution ranging from 2 to 4 mm, and a temporal resolution ranging from 1 to 2 s. It is thus well adapted to the mapping of activated brain areas but does not allow precise study of the brain dynamics. Moreover, temporal interpolation is needed in order to correct for inter-slices delays and 2-dimensional acquisition is subject to vascular in flow artifacts. To improve the estimation of the hemodynamic response functions associated with activation, this thesis aimed at developing a 3-dimensional high temporal resolution acquisition method. To do so, Echo Volume Imaging was combined with reduced field-of-view acquisition and parallel imaging. Indeed, E.V.I. allows the acquisition of a whole volume in Fourier space following a single excitation, but it requires very long echo trains. Parallel imaging and field-of-view reduction are used to reduce the echo train durations by a factor of 4, which allows the acquisition of a 3-dimensional brain volume with limited susceptibility-induced distortions and signal losses, in 200 ms. All imaging parameters have been optimized in order to reduce echo train durations and to maximize S.N.R., so that cerebral activation can be detected with a high level of confidence. Robust detection of brain activation was demonstrated with both visual and auditory paradigms. High temporal resolution hemodynamic response functions could be estimated through selective averaging of the response to the different trials of the stimulation. To further improve S.N.R., the matrix inversions required in parallel reconstruction were regularized, and the impact of the level of regularization on activation detection was investigated. Eventually, potential applications of parallel E.V.I. such as the study of non-stationary effects in the B.O.L.D. response
Directory of Open Access Journals (Sweden)
T. Stefański
2014-12-01
Full Text Available Parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid enabling the perfect hybridization of FDTD with the use of time-domain integral equation methods. The developed implementation can be applied to simulations of antenna characteristics. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of parallel DGF-FDTD. The efficiency of parallel computations was investigated as a function of the number of current elements in the FDTD grid. Although the developed method does not apply the fast Fourier transform for convolution computations, advantages stemming from the application of DGF-FDTD instead of FDTD can be demonstrated for one-dimensional wire antennas when simulation results are post-processed by the near-to-far-field transformation.
Energy Technology Data Exchange (ETDEWEB)
Tanabe, Hirotaka; Okuda, Junichiro; Nishikawa, Takashi; Nishimura, Tsuyoshi (Osaka Univ. (Japan). Faculty of Medicine); Shiraishi, Junzo
1982-06-01
In order to identify the functional brain areas, such as Broca's area, on computed tomography slices parallel to the orbito-meatal line, the numbers of Brodmann's cortical mapping were shown on a diagram of representative brain sections parallel to the orbito-meatal line. Also, we described a method, using cerebral sulci as anatomical landmarks, for projecting lesions shown by CT scan onto the lateral brain diagram. The procedures were as follows. The distribution of lesions on CT slices was determined by the identification of major cerebral sulci and fissures, such as the Sylvian fissure, the central sulcus, and the superior frontal sulcus. Those lesions were then projected onto the lateral diagram by comparing each CT slice with the horizontal diagrams of brain sections. The method was demonstrated in three cases developing neuropsychological symptoms.
Wheatley, Thomas E.; Michaloski, John L.; Lumia, Ronald
1989-01-01
Analysis of a robot control system leads to a broad range of processing requirements. One fundamental requirement of a robot control system is the necessity of a microcomputer system in order to provide sufficient processing capability.The use of multiple processors in a parallel architecture is beneficial for a number of reasons, including better cost performance, modular growth, increased reliability through replication, and flexibility for testing alternate control strategies via different partitioning. A survey of the progression from low level control synchronizing primitives to higher level communication tools is presented. The system communication and control mechanisms of existing robot control systems are compared to the hierarchical control model. The impact of this design methodology on the current robot control systems is explored.
Chen, Kewei; Zhan, Hongbin
2018-06-01
The reactive solute transport in a single fracture bounded by upper and lower matrixes is a classical problem that captures the dominant factors affecting transport behavior beyond pore scale. A parallel fracture-matrix system which considers the interaction among multiple paralleled fractures is an extension to a single fracture-matrix system. The existing analytical or semi-analytical solution for solute transport in a parallel fracture-matrix simplifies the problem to various degrees, such as neglecting the transverse dispersion in the fracture and/or the longitudinal diffusion in the matrix. The difficulty of solving the full two-dimensional (2-D) problem lies in the calculation of the mass exchange between the fracture and matrix. In this study, we propose an innovative Green's function approach to address the 2-D reactive solute transport in a parallel fracture-matrix system. The flux at the interface is calculated numerically. It is found that the transverse dispersion in the fracture can be safely neglected due to the small scale of fracture aperture. However, neglecting the longitudinal matrix diffusion would overestimate the concentration profile near the solute entrance face and underestimate the concentration profile at the far side. The error caused by neglecting the longitudinal matrix diffusion decreases with increasing Peclet number. The longitudinal matrix diffusion does not have obvious influence on the concentration profile in long-term. The developed model is applied to a non-aqueous-phase-liquid (DNAPL) contamination field case in New Haven Arkose of Connecticut in USA to estimate the Trichloroethylene (TCE) behavior over 40 years. The ratio of TCE mass stored in the matrix and the injected TCE mass increases above 90% in less than 10 years.
A hybrid method for the parallel computation of Green’s functions
DEFF Research Database (Denmark)
Petersen, Dan Erik; Li, Song; Stokbro, Kurt
2009-01-01
Quantum transport models for nanodevices using the non-equilibrium Green’s function method require the repeated calculation of the block tridiagonal part of the Green’s and lesser Green’s function matrices. This problem is related to the calculation of the inverse of a sparse matrix. Because of t...
Corral framework: Trustworthy and fully functional data intensive parallel astronomical pipelines
Cabral, J. B.; Sánchez, B.; Beroiz, M.; Domínguez, M.; Lares, M.; Gurovich, S.; Granitto, P.
2017-07-01
Data processing pipelines represent an important slice of the astronomical software library that include chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL Relational Database capable of handling: custom data models; processing stages; and communication alerts, and also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller provides concept separation between the user logic and the data models, delivering at the same time multi-processing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in astronomysince the design pattern eases the programmer from dealing with processing flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.
Introduction to parallel programming
Brawer, Steven
1989-01-01
Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
Directory of Open Access Journals (Sweden)
Anneke van der Walt
Full Text Available Visual evoked potential (VEP latency prolongation and optic nerve lesion length after acute optic neuritis (ON corresponds to the degree of demyelination, while subsequent recovery of latency may represent optic nerve remyelination. We aimed to investigate the relationship between multifocal VEP (mfVEP latency and optic nerve lesion length after acute ON.Thirty acute ON patients were studied at 1, 3, 6 and 12 months using mfVEP and at 1 and 12 months with optic nerve MRI. LogMAR and low contrast visual acuity were documented. By one month, the mfVEP amplitude had recovered sufficiently for latency to be measured in 23 (76.7% patients with seven patients having no recordable mfVEP in more than 66% of segments in at least one test. Only data from these 23 patients was analysed further.Both latency and lesion length showed significant recovery during the follow-up period. Lesion length and mfVEP latency were highly correlated at 1 (r = 0.94, p = <0.0001 and 12 months (r = 0.75, p < 0.001. Both measures demonstrated a similar trend of recovery. Speed of latency recovery was faster in the early follow-up period while lesion length shortening remained relatively constant. At 1 month, latency delay was worse by 1.76 ms for additional 1mm of lesion length while at 12 months, 1mm of lesion length accounted for 1.94 ms of latency delay.A strong association between two putative measures of demyelination in early and chronic ON was found. Parallel recovery of both measures could reflect optic nerve remyelination.
International Nuclear Information System (INIS)
Mueller, D.W.; Crosbie, A.L.
2005-01-01
The topic of this work is the generalized X- and Y-functions of multidimensional radiative transfer. The physical problem considered is spatially varying, collimated radiation incident on the upper boundary of an isotropically scattering, plane-parallel medium. An integral transform is used to reduce the three-dimensional transport equation to a one-dimensional form, and a modified Ambarzumian's method is used to derive coupled, integro-differential equations for the source functions at the boundaries of the medium. The resulting equations are said to be in double-integral form because the integration is over both angular variables. Numerical results are presented to illustrate the computational characteristics of the formulation
DEFF Research Database (Denmark)
Bendtsen, Claus; Nielsen, Ole Holm; Hansen, Lars Bruno
2001-01-01
The quantum mechanical ground state of electrons is described by Density Functional Theory, which leads to large minimization problems. An efficient minimization method uses a self-consistent field (SCF) solution of large eigenvalue problems. The iterative Davidson algorithm is often used, and we...
Directory of Open Access Journals (Sweden)
Kennedy Deborah M
2004-06-01
Full Text Available Abstract Background Although the Western Ontario and McMaster University Osteoarthritis Index (WOMAC is considered the leading outcome measure for patients with osteoarthritis of the lower extremity, recent work has challenged its factorial validity and the physical function subscale's ability to detect valid change when pain and function display different profiles of change. This study examined the etiology of the WOMAC's physical function subscale's limited ability to detect change in the presence of discordant changes for pain and function. We hypothesized that the duplication of some items on the WOMAC's pain and function subscales contributed to this shortcoming. Methods Two eight-item physical function scales were abstracted from the WOMAC's 17-item physical function subscale: one contained activities and themes that were duplicated on the pain subscale (SIMILAR-8; the other version avoided overlapping activities (DISSIMILAR-8. Factorial validity of the shortened measures was assessed on 310 patients awaiting hip or knee arthroplasty. The shortened measures' abilities to detect change were examined on a sample of 104 patients following primary hip or knee arthroplasty. The WOMAC and three performance measures that included activity specific pain assessments – 40 m walk test, stair test, and timed-up-and-go test – were administered preoperatively, within 16 days of hip or knee arthroplasty, and at an interval of greater than 20 days following the first post-surgical assessment. Standardized response means were used to quantify change. Results The SIMILAR-8 did not demonstrate factorial validity; however, the factorial structure of the DISSIMILAR-8 was supported. The time to complete the performance measures more than doubled between the preoperative and first postoperative assessments supporting the theory that lower extremity functional status diminished over this interval. The DISSIMILAR-8 detected this deterioration in functional
The functional significance of cortical reorganization and the parallel development of CI therapy
Directory of Open Access Journals (Sweden)
Edward eTaub
2014-06-01
Full Text Available For the 19th and the better part of the 20th centuries two correlative beliefs were strongly held by almost all neuroscientists and practitioners in the field of neurorehabilitation. The first was that after maturity the adult CNS was hardwired and fixed, and second that in the chronic phase after CNS injury no substantial recovery of function could take place no matter what intervention was employed. However, in the last part of the 20th century evidence began to accumulate that neither belief was correct. First, in the 1960s and 1970s, in research with primates given a surgical abolition of somatic sensation from a single forelimb, which rendered the extremity useless, it was found that behavioral techniques could convert the limb into an extremity that could be used extensively. Beginning in the late 1980s, the techniques employed with deafferented monkeys were translated into a rehabilitation treatment, termed Constraint Induced Movement therapy or CI therapy, for substantially improving the motor deficit of the upper and lower extremities in the chronic phase after stroke. CI therapy has been applied successfully to other types of damage to the CNS such as traumatic brain injury, cerebral palsy, multiple sclerosis, and spinal cord injury, and it has also been used to improve function in focal hand dystonia and for aphasia after stroke. As this work was proceeding, it was being shown during the 1980s and 1990s that sustained modulation of afferent input could alter the structure of the CNS and that this topographic reorganization could have relevance to the function of the individual. The alteration in these once fundamental beliefs has given rise to important recent developments in neuroscience and neurorehabilitation and holds promise for further increasing our understanding of CNS function and extending the boundaries of what is possible in neurorehabilitation.
International Nuclear Information System (INIS)
Yan Haojing; Yan Lin; Zamojski, Michel A.; Windhorst, Rogier A.; McCarthy, Patrick J.; Fan Xiaohui; Dave, Romeel; Roettgering, Huub J. A.; Koekemoer, Anton M.; Robertson, Brant E.; Cai Zheng
2011-01-01
We report the first results from the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, which utilizes the pure parallel orbits of the Hubble Space Telescope to do deep imaging along a large number of random sightlines. To date, our analysis includes 26 widely separated fields observed by the Wide Field Camera 3, which amounts to 122.8 arcmin 2 in total area. We have found three bright Y 098 -dropouts, which are candidate galaxies at z ∼> 7.4. One of these objects shows an indication of peculiar variability and its nature is uncertain. The other two objects are among the brightest candidate galaxies at these redshifts known to date (L>2L*). Such very luminous objects could be the progenitors of the high-mass Lyman break galaxies observed at lower redshifts (up to z ∼ 5). While our sample is still limited in size, it is much less subject to the uncertainty caused by 'cosmic variance' than other samples because it is derived using fields along many random sightlines. We find that the existence of the brightest candidate at z ∼ 7.4 is not well explained by the current luminosity function (LF) estimates at z ∼ 8. However, its inferred surface density could be explained by the prediction from the LFs at z ∼ 7 if it belongs to the high-redshift tail of the galaxy population at z ∼ 7.
Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah
2012-01-01
Pineapple (Ananas comosus var. comosus), is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as, flavor, texture, appearance and fruit sweetness. Although, the pineapple is an important fruit, there is insufficient transcriptomic or genomic information that is available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 millions Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. The unique transcripts derived from this work have rapidly increased of the number of the pineapple fruit mRNA transcripts as it is now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple.
Alaverdashvili, Mariam; Hackett, Mark J; Caine, Sally; Paterson, Phyllis G
2017-04-01
While protein-energy malnutrition in the adult has been reported to induce motor abnormalities and exaggerate motor deficits caused by stroke, it is not known if alterations in mature cortical neurons contribute to the functional deficits. Therefore, we explored if PEM in adult rats provoked changes in the biochemical profile of neurons in the forelimb and hindlimb regions of the motor cortex. Fourier transform infrared spectroscopic imaging using a synchrotron generated light source revealed for the first time altered lipid composition in neurons and subcellular domains (cytosol and nuclei) in a cortical layer and region-specific manner. This change measured by the area under the curve of the δ(CH 2 ) band may indicate modifications in membrane fluidity. These PEM-induced biochemical changes were associated with the development of abnormalities in forelimb use and posture. The findings of this study provide a mechanism by which PEM, if not treated, could exacerbate the course of various neurological disorders and diminish treatment efficacy. Copyright © 2017 Elsevier Inc. All rights reserved.
Yan, Haojing; Yan, Lin; Zamojski, Michel A.; Windhorst, Rogier A.; McCarthy, Patrick J.; Fan, Xiaohui; Röttgering, Huub J. A.; Koekemoer, Anton M.; Robertson, Brant E.; Davé, Romeel; Cai, Zheng
2011-02-01
We report the first results from the Hubble Infrared Pure Parallel Imaging Extragalactic Survey, which utilizes the pure parallel orbits of the Hubble Space Telescope to do deep imaging along a large number of random sightlines. To date, our analysis includes 26 widely separated fields observed by the Wide Field Camera 3, which amounts to 122.8 arcmin2 in total area. We have found three bright Y 098-dropouts, which are candidate galaxies at z >~ 7.4. One of these objects shows an indication of peculiar variability and its nature is uncertain. The other two objects are among the brightest candidate galaxies at these redshifts known to date (L>2L*). Such very luminous objects could be the progenitors of the high-mass Lyman break galaxies observed at lower redshifts (up to z ~ 5). While our sample is still limited in size, it is much less subject to the uncertainty caused by "cosmic variance" than other samples because it is derived using fields along many random sightlines. We find that the existence of the brightest candidate at z ≈ 7.4 is not well explained by the current luminosity function (LF) estimates at z ≈ 8. However, its inferred surface density could be explained by the prediction from the LFs at z ≈ 7 if it belongs to the high-redshift tail of the galaxy population at z ≈ 7. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with programs 11700 and 11702.
Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias
2014-06-10
We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
Okamoto, Ryuichi; Onuki, Akira
2012-03-21
We investigate the critical behavior of a near-critical fluid confined between two parallel plates in contact with a reservoir by calculating the order parameter profile and the Casimir amplitudes (for the force density and for the grand potential). Our results are applicable to one-component fluids and binary mixtures. We assume that the walls absorb one of the fluid components selectively for binary mixtures. We propose a renormalized local functional theory accounting for the fluctuation effects. Analysis is performed in the plane of the temperature T and the order parameter in the reservoir ψ(∞). Our theory is universal if the physical quantities are scaled appropriately. If the component favored by the walls is slightly poor in the reservoir, there appears a line of first-order phase transition of capillary condensation outside the bulk coexistence curve. The excess adsorption changes discontinuously between condensed and noncondensed states at the transition. With increasing T, the transition line ends at a capillary critical point T=T(c) (ca) slightly lower than the bulk critical temperature T(c) for the upper critical solution temperature. The Casimir amplitudes are larger than their critical point values by 10-100 times at off-critical compositions near the capillary condensation line. © 2012 American Institute of Physics
Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.
2010-12-01
Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summaryProgram Title: PROFESS Catalogue identifier: AEBN_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 68 721 No. of bytes in distributed program, including test data, etc.: 1 708 547 Distribution format: tar.gz Programming language: Fortran 90 Computer
Hey2 functions in parallel with Hes1 and Hes5 for mammalian auditory sensory organ development
Directory of Open Access Journals (Sweden)
Chin Michael T
2008-02-01
Full Text Available Abstract Background During mouse development, the precursor cells that give rise to the auditory sensory organ, the organ of Corti, are specified prior to embryonic day 14.5 (E14.5. Subsequently, the sensory domain is patterned precisely into one row of inner and three rows of outer sensory hair cells interdigitated with supporting cells. Both the restriction of the sensory domain and the patterning of the sensory mosaic of the organ of Corti involve Notch-mediated lateral inhibition and cellular rearrangement characteristic of convergent extension. This study explores the expression and function of a putative Notch target gene. Results We report that a putative Notch target gene, hairy-related basic helix-loop-helix (bHLH transcriptional factor Hey2, is expressed in the cochlear epithelium prior to terminal differentiation. Its expression is subsequently restricted to supporting cells, overlapping with the expression domains of two known Notch target genes, Hairy and enhancer of split homolog genes Hes1 and Hes5. In combination with the loss of Hes1 or Hes5, genetic inactivation of Hey2 leads to increased numbers of mis-patterned inner or outer hair cells, respectively. Surprisingly, the ectopic hair cells in Hey2 mutants are accompanied by ectopic supporting cells. Furthermore, Hey2-/-;Hes1-/- and Hey2-/-;Hes1+/- mutants show a complete penetrance of early embryonic lethality. Conclusion Our results indicate that Hey2 functions in parallel with Hes1 and Hes5 in patterning the organ of Corti, and interacts genetically with Hes1 for early embryonic development and survival. Our data implicates expansion of the progenitor pool and/or the boundaries of the developing sensory organ to account for patterning defects observed in Hey2 mutants.
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Parallelism in matrix computations
Gallopoulos, Efstratios; Sameh, Ahmed H
2016-01-01
This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded ,Vandermonde ,Toeplitz ,and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...
Boer, A. de; Mouthaan, K.
2000-01-01
The design and measured performance of a GaAs multi-function X-band MMIC for spacebased synthetic aperture radar (SAR) applications with 7-bit phase and amplitude control and integrated serial to parallel converter (including level conversion) is presented. The main application for the
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
Directory of Open Access Journals (Sweden)
G. Ahirwar
2006-08-01
Full Text Available The effect of parallel electric field on the growth rate, parallel and perpendicular resonant energy and marginal stability of the electromagnetic ion-cyclotron (EMIC wave with general loss-cone distribution function in a low β homogeneous plasma is investigated by particle aspect approach. The effect of the steepness of the loss-cone distribution is investigated on the electromagnetic ion-cyclotron wave. The whole plasma is considered to consist of resonant and non-resonant particles. It is assumed that resonant particles participate in the energy exchange with the wave, whereas non-resonant particles support the oscillatory motion of the wave. The wave is assumed to propagate parallel to the static magnetic field. The effect of the parallel electric field with the general distribution function is to control the growth rate of the EMIC waves, whereas the effect of steep loss-cone distribution is to enhance the growth rate and perpendicular heating of the ions. This study is relevant to the analysis of ion conics in the presence of an EMIC wave in the auroral acceleration region of the Earth's magnetoplasma.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
International Nuclear Information System (INIS)
Tanabe, Hirotaka; Okuda, Junichiro; Nishikawa, Takashi; Nishimura, Tsuyoshi; Shiraishi, Junzo.
1982-01-01
In order to identify the functional brain areas, such as Broca's area, on computed tomography slices parallel to the orbito-meatal line, the numbers of Brodmann's cortical mapping were shown on a diagram of representative brain sections parallel to the orbito-meatal line. Also, we described a method, using cerebral sulci as anatomical landmarks, for projecting lesions shown by CT scan onto the lateral brain diagram. The procedures were as follows. The distribution of lesions on CT slices was determined by the identification of major cerebral sulci and fissures, such as the Sylvian fissure, the central sulcus, and the superior frontal sulcus. Those lesions were then projected onto the lateral diagram by comparing each CT slice with the horizontal diagrams of brain sections. The method was demonstrated in three cases developing neuropsychological symptoms. (author)
International Nuclear Information System (INIS)
Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.
1997-01-01
The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment
Ngounou, Bertrand; Aliyev, Elchin H; Guschin, Dmitrii A; Sultanov, Yusif M; Efendiev, Ayaz A; Schuhmann, Wolfgang
2007-09-01
The integration of flexible anchoring groups bearing imidazolyl or pyridyl substituents into the structure of electrodeposition paints (EDP) is the basis for the parallel synthesis of a library containing 107 members of different cathodic and anodic EDPs with a high variation in polymer properties. The obtained EDPs were used as immobilization matrix for biosensor fabrication using glucose oxidase as a model enzyme. Amperometric glucose sensors based on the different EDPs showed a wide variation in their sensor characteristics with respect to the apparent Michaelis-Menten constant (KM(app)) representing the linear measuring range and the maximum current (Imax(app)). Based on these results first assumptions concerning the impact of different side chains in the EDP on the expected biosensor properties could be obtained allowing for an improved rational optimization of EDPs used as immobilization matrix in amperometric biosensors.
McCallum, Ethan
2011-01-01
It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.
Chafi, Hatim; Elias, Saba N; Nguyen, Huyen T; Friel, Harry T; Knopp, Michael V; Guo, BeiBei; Heymsfield, Steven B; Jia, Guang
2016-01-01
To evaluate whether parallel radiofrequency transmission (mTX) can improve the symmetry of the left and right femoral arteries in dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) of prostate and bladder cancer. Eighteen prostate and 24 bladder cancer patients underwent 3.0 Tesla DCE-MRI scan with a single transmission channel coil. Subsequently, 21 prostate and 21 bladder cancer patients were scanned using the dual channel mTX upgrade. The precontrast signal ( S0) and the maximum enhancement ratio (MER) were measured in both the left and the right femoral arteries. Within the patient cohort, the ratio of S0 and MER in the left artery to that in the right artery ( S0_LR, MER_LR) was calculated with and without the use of mTX. Left to right asymmetry indices for S0 ( S0_LRasym) and MER ( MER_LRasym) were defined as the absolute values of the difference between S0_LR and 1, and the difference between MER_LR and 1, respectively. S0_LRasym, and MER_LRasym were 0.21 and 0.19 for prostate cancer patients with mTX, and 0.43 and 0.45 for the ones imaged without it (P enhancement. © 2015 Wiley Periodicals, Inc.
Caspar, Thibault; Schultz, Anthony; Schaeffer, Mickaël; Labani, Aïssam; Jeung, Mi-Young; Jurgens, Paul Thomas; El Ghannudi, Soraya; Roy, Catherine; Ohana, Mickaël
To compare cine MR b-TFE sequences acquired before and after gadolinium injection, on a 3T scanner with a parallel RF transmission technique in order to potentially improve scanning time efficiency when evaluating LV function. 25 consecutive patients scheduled for a cardiac MRI were prospectively included and had their b-TFE cine sequences acquired before and right after gadobutrol injection. Images were assessed qualitatively (overall image quality, LV edge sharpness, artifacts and LV wall motion) and quantitatively with measurement of LVEF, LV mass, and telediastolic volume and contrast-to-noise ratio (CNR) between the myocardium and the cardiac chamber. Statistical analysis was conducted using a Bayesian paradigm. No difference was found before or after injection for the LVEF, LV mass and telediastolic volume evaluations. Overall image quality and CNR were significantly lower after injection (estimated coefficient cine after > cine before gadolinium: -1.75 CI = [-3.78;-0.0305], prob(coef>0) = 0% and -0.23 CI = [-0.49;0.04], prob(coef>0) = 4%) respectively), but this decrease did not affect the visual assessment of LV wall motion (cine after > cine before gadolinium: -1.46 CI = [-4.72;1.13], prob(coef>0) = 15%). In 3T cardiac MRI acquired with parallel RF transmission technique, qualitative and quantitative assessment of LV function can reliably be performed with cine sequences acquired after gadolinium injection, despite a significant decrease in the CNR and the overall image quality.
Florica, Roxana Oriana; Hipolito, Victoria; Bautista, Stephen; Anvari, Homa; Rapp, Chloe; El-Rass, Suzan; Asgharian, Alimohammad; Antonescu, Costin N; Killeen, Marie T
2017-10-01
The axons of the DA and DB classes of motor neurons fail to reach the dorsal cord in the absence of the guidance cue UNC-6/Netrin or its receptor UNC-5 in C. elegans. However, the axonal processes usually exit their cell bodies in the ventral cord in the absence of both molecules. Strains lacking functional versions of UNC-6 or UNC-5 have a low level of DA and DB motor neuron axon outgrowth defects. We found that mutations in the genes for all six of the ENU-3 proteins function to enhance the outgrowth defects of the DA and DB axons in strains lacking either UNC-6 or UNC-5. A mutation in the gene for the MIG-14/Wntless protein also enhances defects in a strain lacking either UNC-5 or UNC-6, suggesting that the ENU-3 and Wnt pathways function parallel to the Netrin pathway in directing motor neuron axon outgrowth. Our evidence suggests that the ENU-3 proteins are novel members of the Wnt pathway in nematodes. Five of the six members of the ENU-3 family are predicted to be single-pass trans-membrane proteins. The expression pattern of ENU-3.1 was consistent with plasma membrane localization. One family member, ENU-3.6, lacks the predicted signal peptide and the membrane-spanning domain. In HeLa cells ENU-3.6 had a cytoplasmic localization and caused actin dependent processes to appear. We conclude that the ENU-3 family proteins function in a pathway parallel to the UNC-6/Netrin pathway for motor neuron axon outgrowth, most likely in the Wnt pathway. Copyright © 2017 Elsevier Inc. All rights reserved.
Directory of Open Access Journals (Sweden)
James G. Worner
2017-05-01
Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.
International Nuclear Information System (INIS)
Zhang Shixun; Yamagia, Shinichi; Yunoki, Seiji
2013-01-01
Models of fermions interacting with classical degrees of freedom are applied to a large variety of systems in condensed matter physics. For this class of models, Weiße [Phys. Rev. Lett. 102, 150604 (2009)] has recently proposed a very efficient numerical method, called O(N) Green-Function-Based Monte Carlo (GFMC) method, where a kernel polynomial expansion technique is used to avoid the full numerical diagonalization of the fermion Hamiltonian matrix of size N, which usually costs O(N 3 ) computational complexity. Motivated by this background, in this paper we apply the GFMC method to the double exchange model in three spatial dimensions. We mainly focus on the implementation of GFMC method using both MPI on a CPU-based cluster and Nvidia's Compute Unified Device Architecture (CUDA) programming techniques on a GPU-based (Graphics Processing Unit based) cluster. The time complexity of the algorithm and the parallel implementation details on the clusters are discussed. We also show the performance scaling for increasing Hamiltonian matrix size and increasing number of nodes, respectively. The performance evaluation indicates that for a 32 3 Hamiltonian a single GPU shows higher performance equivalent to more than 30 CPU cores parallelized using MPI
International Nuclear Information System (INIS)
Wang, Guan M; Sandberg, William C; Kenny, Steven D
2006-01-01
The mechanical and dynamical properties of a model Au(111)/thiol surface system were investigated by using localized atomic-type orbital density functional theory in the local density approximation. Relaxing the system gives a configuration where the sulfur atom forms covalent bonds to two adjacent gold atoms as the lowest energy structure. Investigations based on ab initio molecular dynamics simulations at 300, 350 and 370 K show that this tethering system is stable. The rupture behaviour between the thiol and the surface was studied by displacing the free end of the thiol. Calculated energy profiles show a process of multiple successive ruptures that account for experimental observations. The process features successive ruptures of the two Au-S bonds followed by the extraction of one S-bonded Au atom from the surface. The force required to rupture the thiol from the surface was found to be dependent on the direction in which the thiol was displaced, with values comparable with AFM measurements. These results aid the understanding of failure dynamics of Au(111)-thiol-tethered biosurfaces in microfluidic devices where fluidic shear and normal forces are of concern
Directory of Open Access Journals (Sweden)
Dörries Carola
2010-07-01
Full Text Available Abstract Background Self-gated dynamic cardiovascular magnetic resonance (CMR enables non-invasive visualization of the heart and accurate assessment of cardiac function in mouse models of human disease. However, self-gated CMR requires the acquisition of large datasets to ensure accurate and artifact-free reconstruction of cardiac cines and is therefore hampered by long acquisition times putting high demands on the physiological stability of the animal. For this reason, we evaluated the feasibility of accelerating the data collection using the parallel imaging technique SENSE with respect to both anatomical definition and cardiac function quantification. Results Findings obtained from accelerated data sets were compared to fully sampled reference data. Our results revealed only minor differences in image quality of short- and long-axis cardiac cines: small anatomical structures (papillary muscles and the aortic valve and left-ventricular (LV remodeling after myocardial infarction (MI were accurately detected even for 3-fold accelerated data acquisition using a four-element phased array coil. Quantitative analysis of LV cardiac function (end-diastolic volume (EDV, end-systolic volume (ESV, stroke volume (SV, ejection fraction (EF and LV mass in healthy and infarcted animals revealed no substantial deviations from reference (fully sampled data for all investigated acceleration factors with deviations ranging from 2% to 6% in healthy animals and from 2% to 8% in infarcted mice for the highest acceleration factor of 3.0. CNR calculations performed between LV myocardial wall and LV cavity revealed a maximum CNR decrease of 50% for the 3-fold accelerated data acquisition when compared to the fully-sampled acquisition. Conclusions We have demonstrated the feasibility of accelerated self-gated retrospective CMR in mice using the parallel imaging technique SENSE. The proposed method led to considerably reduced acquisition times, while preserving high
Directory of Open Access Journals (Sweden)
Alex N Nguyen Ba
2017-04-01
Full Text Available Regulatory networks often increase in complexity during evolution through gene duplication and divergence of component proteins. Two models that explain this increase in complexity are: 1 adaptive changes after gene duplication, such as resolution of adaptive conflicts, and 2 non-adaptive processes such as duplication, degeneration and complementation. Both of these models predict complementary changes in the retained duplicates, but they can be distinguished by direct fitness measurements in organisms with short generation times. Previously, it has been observed that repeated duplication of an essential protein in the spindle checkpoint pathway has occurred multiple times over the eukaryotic tree of life, leading to convergent protein domain organization in its duplicates. Here, we replace the paralog pair in S. cerevisiae with a single-copy protein from a species that did not undergo gene duplication. Surprisingly, using quantitative fitness measurements in laboratory conditions stressful for the spindle-checkpoint pathway, we find no evidence that reorganization of protein function after gene duplication is beneficial. We then reconstruct several evolutionary intermediates from the inferred ancestral network to the extant one, and find that, at the resolution of our assay, there exist stepwise mutational paths from the single protein to the divergent pair of extant proteins with no apparent fitness defects. Parallel evolution has been taken as strong evidence for natural selection, but our results suggest that even in these cases, reorganization of protein function after gene duplication may be explained by neutral processes.
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Energy Technology Data Exchange (ETDEWEB)
Habermann, C.R.; Cramer, M.C.; Aldefeld, D.; Weiss, F.; Kaul, M.G.; Adam, G. [Radiologisches Zentrum, Klinik und Poliklinik fuer Diagnostische und Interventionelle Radiologie, Universitaetsklinikum Hamburg-Eppendorf (Germany); Graessner, J. [Siemens Medical Systems, Hamburg (Germany); Reitmeier, F.; Jaehne, M. [Kopf- und Hautzentrum, Klinik und Poliklinik fuer Hals-, Nasen- und Ohrenheilkunde, Universitaetsklinikum Hamburg-Eppendorf (Germany); Petersen, K.U. [Zentrum fuer Psychosoziale Medizin, Klinik und Poliklinik fuer Psychiatrie und Psychotherapie, Universitaetsklinikum Hamburg-Eppendorf (Germany)
2005-04-01
Purpose: To optimise a fast sequence for MR-sialography and to compare a parallel and non-parallel acquisition technique. Additionally, the effect of oral stimulation regarding the image quality was evaluated. Material and Methods: All examinations were performed by using a 1.5-T superconducting system. After developing a sufficient sequence for MR-sialography, a single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation in 27 healthy volunteers. All images were performed with and without parallel imaging technique. The assessment of the ductal system of the submandibular and parotid gland was performed using a 1 to 5 visual scale for each side separately. Images were evaluated by four independent experienced radiologists. For statistical evaluation, an ANOVA with post-hoc comparisons was used with an overall two-tailed significance level of P=.05. For evaluation of interobserver variability, an intraclass correlation was computed and correlation >.08 was determined to indicate a high correlation. Results: All parts of salivary excretal ducts could be visualised in all volunteers, with an overall rating for all ducts of 2.26 (SD{+-}1.09). Between the four observers a high correlation could be obtained with an intraclass correlation of 0.9475. A significant influence regarding the slice angulations could not be obtained (p=0.74). In all healthy volunteers the visibility of excretory ducts improved significantly after oral application of a Sialogogum (p<0.001; {eta}{sup 2}=0.049). The use of a parallel imaging technique did not lead to an improvement of visualisation, showing a significant loss of image quality compared to an acquistion technique without parallel imaging (p<0.001; {eta}{sup 2}=0.013). Conclusion: The optimised ss-TSE MR-sialography seems to be a fast and sufficient technique for visualisation of excretory ducts of the main salivary glands, with no elaborate post
Lee, Y. C.; Thompson, H. M.; Gaskell, P. H.
2009-12-01
FILMPAR is a highly efficient and portable parallel multigrid algorithm for solving a discretised form of the lubrication approximation to three-dimensional, gravity-driven, continuous thin film free-surface flow over substrates containing micro-scale topography. While generally applicable to problems involving heterogeneous and distributed features, for illustrative purposes the algorithm is benchmarked on a distributed memory IBM BlueGene/P computing platform for the case of flow over a single trench topography, enabling direct comparison with complementary experimental data and existing serial multigrid solutions. Parallel performance is assessed as a function of the number of processors employed and shown to lead to super-linear behaviour for the production of mesh-independent solutions. In addition, the approach is used to solve for the case of flow over a complex inter-connected topographical feature and a description provided of how FILMPAR could be adapted relatively simply to solve for a wider class of related thin film flow problems. Program summaryProgram title: FILMPAR Catalogue identifier: AEEL_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEEL_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 530 421 No. of bytes in distributed program, including test data, etc.: 1 960 313 Distribution format: tar.gz Programming language: C++ and MPI Computer: Desktop, server Operating system: Unix/Linux Mac OS X Has the code been vectorised or parallelised?: Yes. Tested with up to 128 processors RAM: 512 MBytes Classification: 12 External routines: GNU C/C++, MPI Nature of problem: Thin film flows over functional substrates containing well-defined single and complex topographical features are of enormous significance, having a wide variety of engineering
Matpar: Parallel Extensions for MATLAB
Springer, P. L.
1998-01-01
Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell , Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Overview of the Force Scientific Parallel Language
Directory of Open Access Journals (Sweden)
Gita Alaghband
1994-01-01
Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force has resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
Directory of Open Access Journals (Sweden)
P. Janhunen
2005-07-01
Full Text Available We study the wave-related (AC and static (DC parallel Poynting vector (Poynting energy flux as a function of altitude in auroral field lines using Polar EFI and MFE data. The study is statistical and contains 5 years of data in the altitude range 5000–30000 km. We verify the low altitude part of the results by comparison with earlier Astrid-2 EMMA Poynting vector statistics at 1000 km altitude. The EMMA data are also used to statistically compensate the Polar results for the missing zonal electric field component. We compare the Poynting vector with previous statistical DMSP satellite data concerning the electron precipitation power. We find that the AC Poynting vector (Alfvén-wave related Poynting vector is statistically not sufficient to power auroral electron precipitation, although it may, for Kp>2, power 25–50% of it. The statistical AC Poynting vector also has a stepwise transition at R=4 RE, so that its amplitude increases with increasing altitude. We suggest that this corresponds to Alfvén waves being in Landau resonance with electrons, so that wave-induced electron acceleration takes place at this altitude range, which was earlier named the Alfvén Resonosphere (ARS. The DC Poynting vector is ~3 times larger than electron precipitation and corresponds mainly to ionospheric Joule heating. In the morning sector (02:00–06:00 MLT we find that the DC Poynting vector has a nontrivial altitude profile such that it decreases by a factor of ~2 when moving upward from 3 to 4 RE radial distance. In other nightside MLT sectors the altitude profile is more uniform. The morning sector nontrivial altitude profile may be due to divergence of the perpendicular Poynting vector field at R=3–4 RE. Keywords. Magnetospheric physics (Auroral phenomena; Magnetosphere-ionosphere interactions – Space plasma physics (Wave-particle interactions
A Parallel Butterfly Algorithm
Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing
2014-01-01
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
A Parallel Butterfly Algorithm
Poulson, Jack
2014-02-04
The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.
Morse, H Stephen
1994-01-01
Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Akl, Selim G
1985-01-01
Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
Energy Technology Data Exchange (ETDEWEB)
Rabrait, C
2007-11-15
Echo Planar Imaging is widely used to perform data acquisition in functional neuroimaging. This sequence allows the acquisition of a set of about 30 slices, covering the whole brain, at a spatial resolution ranging from 2 to 4 mm, and a temporal resolution ranging from 1 to 2 s. It is thus well adapted to the mapping of activated brain areas but does not allow precise study of the brain dynamics. Moreover, temporal interpolation is needed in order to correct for inter-slices delays and 2-dimensional acquisition is subject to vascular in flow artifacts. To improve the estimation of the hemodynamic response functions associated with activation, this thesis aimed at developing a 3-dimensional high temporal resolution acquisition method. To do so, Echo Volume Imaging was combined with reduced field-of-view acquisition and parallel imaging. Indeed, E.V.I. allows the acquisition of a whole volume in Fourier space following a single excitation, but it requires very long echo trains. Parallel imaging and field-of-view reduction are used to reduce the echo train durations by a factor of 4, which allows the acquisition of a 3-dimensional brain volume with limited susceptibility-induced distortions and signal losses, in 200 ms. All imaging parameters have been optimized in order to reduce echo train durations and to maximize S.N.R., so that cerebral activation can be detected with a high level of confidence. Robust detection of brain activation was demonstrated with both visual and auditory paradigms. High temporal resolution hemodynamic response functions could be estimated through selective averaging of the response to the different trials of the stimulation. To further improve S.N.R., the matrix inversions required in parallel reconstruction were regularized, and the impact of the level of regularization on activation detection was investigated. Eventually, potential applications of parallel E.V.I. such as the study of non-stationary effects in the B.O.L.D. response
Directory of Open Access Journals (Sweden)
P. Janhunen
2005-07-01
Full Text Available We study the wave-related (AC and static (DC parallel Poynting vector (Poynting energy flux as a function of altitude in auroral field lines using Polar EFI and MFE data. The study is statistical and contains 5 years of data in the altitude range 5000–30000 km. We verify the low altitude part of the results by comparison with earlier Astrid-2 EMMA Poynting vector statistics at 1000 km altitude. The EMMA data are also used to statistically compensate the Polar results for the missing zonal electric field component. We compare the Poynting vector with previous statistical DMSP satellite data concerning the electron precipitation power. We find that the AC Poynting vector (Alfvén-wave related Poynting vector is statistically not sufficient to power auroral electron precipitation, although it may, for K_{p}>2, power 25–50% of it. The statistical AC Poynting vector also has a stepwise transition at R=4 R_{E}, so that its amplitude increases with increasing altitude. We suggest that this corresponds to Alfvén waves being in Landau resonance with electrons, so that wave-induced electron acceleration takes place at this altitude range, which was earlier named the Alfvén Resonosphere (ARS. The DC Poynting vector is ~3 times larger than electron precipitation and corresponds mainly to ionospheric Joule heating. In the morning sector (02:00–06:00 MLT we find that the DC Poynting vector has a nontrivial altitude profile such that it decreases by a factor of ~2 when moving upward from 3 to 4 R_{E} radial distance. In other nightside MLT sectors the altitude profile is more uniform. The morning sector nontrivial altitude profile may be due to divergence of the perpendicular Poynting vector field at R=3–4 R_{E}.
Keywords. Magnetospheric physics (Auroral phenomena; Magnetosphere-ionosphere interactions – Space plasma physics (Wave-particle interactions
Bouwens, R. J.; Illingworth, G. D.; Blakeslee, J. P.; Franx, M.
2006-12-01
We have detected 506 i-dropouts (z~6 galaxies) in deep, wide-area HST ACS fields: HUDF, enhanced GOODS, and HUDF parallel ACS fields (HUDF-Ps). The contamination levels are ~92% are at z~6). With these samples, we present the most comprehensive, quantitative analyses of z~6 galaxies yet and provide optimal measures of the UV luminosity function (LF) and luminosity density at z~6, and their evolution to z~3. We redetermine the size and color evolution from z~6 to z~3. Field-to-field variations (cosmic variance), completeness, flux, and contamination corrections are modeled systematically and quantitatively. After corrections, we derive a rest-frame continuum UV (~1350 Å) LF at z~6 that extends to M1350,AB~-17.5 (0.04L*z=3). There is strong evidence for evolution of the LF between z~6 and z~3, most likely through a brightening (0.6+/-0.2 mag) of M* (at 99.7% confidence), although the degree depends on the faint-end slope. As expected from hierarchical models, the most luminous galaxies are deficient at z~6. Density evolution (φ*) is ruled out at >99.99% confidence. Despite large changes in the LF, the luminosity density at z~6 is similar to (0.82+/-0.21 times) that at z~3. Changes in the mean UV color of galaxies from z~6 to z~3 suggest an evolution in dust content, indicating that the true evolution is substantially larger: at z~6 the star formation rate density is just ~30% of the z~3 value. Our UV LF is consistent with z~6 galaxies providing the necessary UV flux to reionize the universe. Based on observations made with the NASA/ESA Hubble Space Telescope, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with program 9803. Observations have been carried out using the Very Large Telescope at the European Southern Observatory (ESO) Paranal Observatory, under program ID LP168.A-0485.
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop
Parallel Atomistic Simulations
Energy Technology Data Exchange (ETDEWEB)
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel Implicit Algorithms for CFD
Keyes, David E.
1998-01-01
The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD.) "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Massively Parallel Integration (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.
International Nuclear Information System (INIS)
Habermann, C.R.; Cramer, M.C.; Aldefeld, D.; Weiss, F.; Kaul, M.G.; Adam, G.; Graessner, J.; Reitmeier, F.; Jaehne, M.; Petersen, K.U.
2005-01-01
Purpose: To optimise a fast sequence for MR-sialography and to compare a parallel and non-parallel acquisition technique. Additionally, the effect of oral stimulation regarding the image quality was evaluated. Material and Methods: All examinations were performed by using a 1.5-T superconducting system. After developing a sufficient sequence for MR-sialography, a single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation in 27 healthy volunteers. All images were performed with and without parallel imaging technique. The assessment of the ductal system of the submandibular and parotid gland was performed using a 1 to 5 visual scale for each side separately. Images were evaluated by four independent experienced radiologists. For statistical evaluation, an ANOVA with post-hoc comparisons was used with an overall two-tailed significance level of P=.05. For evaluation of interobserver variability, an intraclass correlation was computed and correlation >.08 was determined to indicate a high correlation. Results: All parts of salivary excretal ducts could be visualised in all volunteers, with an overall rating for all ducts of 2.26 (SD±1.09). Between the four observers a high correlation could be obtained with an intraclass correlation of 0.9475. A significant influence regarding the slice angulations could not be obtained (p=0.74). In all healthy volunteers the visibility of excretory ducts improved significantly after oral application of a Sialogogum (p 2 =0.049). The use of a parallel imaging technique did not lead to an improvement of visualisation, showing a significant loss of image quality compared to an acquistion technique without parallel imaging (p 2 =0.013). Conclusion: The optimised ss-TSE MR-sialography seems to be a fast and sufficient technique for visualisation of excretory ducts of the main salivary glands, with no elaborate post-processing needed. To improve results of MR
Parallel Boltzmann machines : a mathematical model
Zwietering, P.J.; Aarts, E.H.L.
1991-01-01
A mathematical model is presented for the description of parallel Boltzmann machines. The framework is based on the theory of Markov chains and combines a number of previously known results into one generic model. It is argued that parallel Boltzmann machines maximize a function consisting of a
The convergence of parallel Boltzmann machines
Zwietering, P.J.; Aarts, E.H.L.; Eckmiller, R.; Hartmann, G.; Hauske, G.
1990-01-01
We discuss the main results obtained in a study of a mathematical model of synchronously parallel Boltzmann machines. We present supporting evidence for the conjecture that a synchronously parallel Boltzmann machine maximizes a consensus function that consists of a weighted sum of the regular
CERN. Geneva
2016-01-01
The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
International Nuclear Information System (INIS)
Petridis, C.; Ries, T.; Cramer, M.C.; Graessner, J.; Petersen, K.U.; Reitmeier, F.; Jaehne, M.; Weiss, F.; Adam, G.; Habermann, C.R.
2007-01-01
Purpose: To evaluate an ultra-fast sequence for MR sialography requiring no post-processing and to compare the acquisition technique regarding the effect of oral stimulation with a parallel acquisition technique in patients with salivary gland diseases. Materials and Methods: 128 patients with salivary gland disease were prospectively examined using a 1.5-T superconducting system with a 30 mT/m maximum gradient capability and a maximum slew rate of 125 mT/m/sec. A single-shot turbo-spin-echo sequence (ss-TSE) with an acquisition time of 2.8 sec was used in transverse and oblique sagittal orientation. All images were obtained with and without a parallel imaging technique. The evaluation of the ductal system of the parotid and submandibular gland was performed using a visual scale of 1-5 for each side. The images were assessed by two independent experienced radiologists. An ANOVA with posthoc comparisons and an overall two tailed significance level of p=0.05 was used for the statistical evaluation. An intraclass correlation was computed to evaluate interobserver variability and a correlation of >0.8 was determined, thereby indicating a high correlation. Results: Depending on the diagnosed diseases and the absence of abruption of the ducts, all parts of excretory ducts were able to be visualized in all patients using the developed technique with an overall rating for all ducts of 2.70 (SD±0.89). A high correlation was achieved between the two observers with an intraclass correlation of 0.73. Oral application of a sialogogum improved the visibility of excretory ducts significantly (p<0.001). In contrast, the use of a parallel imaging technique led to a significant decrease in image quality (p=0,011). (orig.)
Compiler Technology for Parallel Scientific Computation
Directory of Open Access Journals (Sweden)
Can Özturan
1994-01-01
Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements
International Nuclear Information System (INIS)
Azmy, Y.Y.; Barnett, D.A.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-COS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
Multitasking TORT under UNICOS: Parallel performance models and measurements
International Nuclear Information System (INIS)
Barnett, A.; Azmy, Y.Y.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
DEFF Research Database (Denmark)
Sitchinava, Nodar; Zeh, Norbert
2012-01-01
We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal OhOf(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole
2012-07-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
Parallel Algorithms and Patterns
Energy Technology Data Exchange (ETDEWEB)
Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)
2016-06-16
This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
Application Portable Parallel Library
Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott
1995-01-01
Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallel reservoir simulator computations
International Nuclear Information System (INIS)
Hemanth-Kumar, K.; Young, L.C.
1995-01-01
The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90
Totally parallel multilevel algorithms
Frederickson, Paul O.
1988-01-01
Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.
Energy Technology Data Exchange (ETDEWEB)
1991-10-23
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
Massively parallel mathematical sieves
Energy Technology Data Exchange (ETDEWEB)
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
Fellner, C; Doenitz, C; Finkenzeller, T; Jung, E M; Rennert, J; Schlaier, J
2009-01-01
Geometric distortions and low spatial resolution are current limitations in functional magnetic resonance imaging (fMRI). The aim of this study was to evaluate if application of parallel imaging or significant reduction of voxel size in combination with a new 32-channel head array coil can reduce those drawbacks at 1.5 T for a simple hand motor task. Therefore, maximum t-values (tmax) in different regions of activation, time-dependent signal-to-noise ratios (SNR(t)) as well as distortions within the precentral gyrus were evaluated. Comparing fMRI with and without parallel imaging in 17 healthy subjects revealed significantly reduced geometric distortions in anterior-posterior direction. Using parallel imaging, tmax only showed a mild reduction (7-11%) although SNR(t) was significantly diminished (25%). In 7 healthy subjects high-resolution (2 x 2 x 2 mm3) fMRI was compared with standard fMRI (3 x 3 x 3 mm3) in a 32-channel coil and with high-resolution fMRI in a 12-channel coil. The new coil yielded a clear improvement for tmax (21-32%) and SNR(t) (51%) in comparison with the 12-channel coil. Geometric distortions were smaller due to the smaller voxel size. Therefore, the reduction in tmax (8-16%) and SNR(t) (52%) in the high-resolution experiment seems to be tolerable with this coil. In conclusion, parallel imaging is an alternative to reduce geometric distortions in fMRI at 1.5 T. Using a 32-channel coil, reduction of the voxel size might be the preferable way to improve spatial accuracy.
Parallel imaging with phase scrambling.
Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel
2015-04-01
Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x(2) - y(2) shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing
Directory of Open Access Journals (Sweden)
Mustafa Basthikodi
2016-04-01
Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units empowered to enhance parallelism broadly. Couples of compilers are updated to developing challenges forsynchronization and threading issues. Appropriate program and algorithm classifications will have advantage to a great extent to the group of software engineers to get opportunities for effective parallelization. In present work we investigated current species for classification of algorithms, in that related work on classification is discussed along with the comparison of issues that challenges the classification. The set of algorithms are chosen which matches the structure with different issues and perform given task. We have tested these algorithms utilizing existing automatic species extraction toolsalong with Bones compiler. We have added functionalities to existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by original speciesof algorithms. We executed new theories into the device, empowering automatic characterization of program code.
Algorithms for parallel computers
International Nuclear Information System (INIS)
Churchhouse, R.F.
1985-01-01
Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)
Parallelism and array processing
International Nuclear Information System (INIS)
Zacharov, V.
1983-01-01
Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)
Advanced parallel processing with supercomputer architectures
International Nuclear Information System (INIS)
Hwang, K.
1987-01-01
This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers
Meex, R.C.R.; Schrauwen-Hinderling, V.B.; Moonen-Kornips, E.; Schaart, G.; Mensink, M.R.; Phielix, E.; Weijer, van de T.; Sels, J.P.; Schrauwen, P.; Hesselink, M.K.C.
2010-01-01
OBJECTIVE-Mitochondrial dysfunction and fat accumulation in skeletal muscle (increased intramyocellular lipid [IMCL]) have been linked to development of type 2 diabetes. We examined whether exercise training could restore mitochondrial function and insulin sensitivity in patients with type 2
Parallel magnetic resonance imaging
International Nuclear Information System (INIS)
Larkman, David J; Nunes, Rita G
2007-01-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
Shared Variable Oriented Parallel Precompiler for SPMD Model
Institute of Scientific and Technical Information of China (English)
无
1995-01-01
For the moment,commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compliers,which are just traditional sequential FORTRAN or C compilers expanded with communication statements.Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for SPMD(Single Program Multiple Data) computation model and greatly ease the parallel programming with high communication efficiency.The core function of parallel C precompiler has been successfully verified on a transputer-based parallel computer.Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique.
Directory of Open Access Journals (Sweden)
Kunjumon I Vadakkan
2011-07-01
Full Text Available The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of virtual internal sensation - namely, semblion. Neuronal networks that undergo continuous oscillatory activity at certain levels of their organization induce semblions enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI. Suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system are explained.
The STAPL Parallel Graph Library
Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable
Vadakkan, Kunjumon I.
2011-01-01
The internal sensation of memory, which is available only to the owner of an individual nervous system, is difficult to analyze for its basic elements of operation. We hypothesize that associative learning induces the formation of functional LINK between the postsynapses. During memory retrieval, the activation of either postsynapse re-activates the functional LINK evoking a semblance of sensory activity arriving at its opposite postsynapse, nature of which defines the basic unit of internal sensation – namely, the semblion. In neuronal networks that undergo continuous oscillatory activity at certain levels of their organization re-activation of functional LINKs is expected to induce semblions, enabling the system to continuously learn, self-organize, and demonstrate instantiation, features that can be utilized for developing artificial intelligence (AI). This paper also explains suitability of the inter-postsynaptic functional LINKs to meet the expectations of Minsky’s K-lines, basic elements of a memory theory generated to develop AI and methods to replicate semblances outside the nervous system. PMID:21845180
Giesinger, J.M.; Aaronson, N.K.; Arraras, J.I.; Efficace, F.; Groenvold, M.; Kieffer, J.M.; Loth, F.L.; Petersen, M.A.; Ramage, J.; Tomaszewski, K.A.; Young, T.; Holzner, B.
2018-01-01
Objective: In this study, we investigated what makes a symptom or functional impairment clinically important, that is, relevant for a patient to discuss with a health care professional (HCP). This is the first part of a European Organisation for Research and Treatment of Cancer (EORTC) Quality of
Meex, Ruth C R; Schrauwen-Hinderling, Vera B; Moonen-Kornips, Esther; Schaart, Gert; Mensink, Marco; Phielix, Esther; van de Weijer, Tineke; Sels, Jean-Pierre; Schrauwen, Patrick; Hesselink, Matthijs K C
2010-03-01
Mitochondrial dysfunction and fat accumulation in skeletal muscle (increased intramyocellular lipid [IMCL]) have been linked to development of type 2 diabetes. We examined whether exercise training could restore mitochondrial function and insulin sensitivity in patients with type 2 diabetes. Eighteen male type 2 diabetic and 20 healthy male control subjects of comparable body weight, BMI, age, and VO2max participated in a 12-week combined progressive training program (three times per week and 45 min per session). In vivo mitochondrial function (assessed via magnetic resonance spectroscopy), insulin sensitivity (clamp), metabolic flexibility (indirect calorimetry), and IMCL content (histochemically) were measured before and after training. Mitochondrial function was lower in type 2 diabetic compared with control subjects (P = 0.03), improved by training in control subjects (28% increase; P = 0.02), and restored to control values in type 2 diabetic subjects (48% increase; P type 2 diabetic subjects (delta Rd 63% increase; P type 2 diabetic subjects was restored (delta respiratory exchange ratio 63% increase; P = 0.01) but was unchanged in control subjects (delta respiratory exchange ratio 7% increase; P = 0.22). Starting with comparable pretraining IMCL levels, training tended to increase IMCL content in type 2 diabetic subjects (27% increase; P = 0.10), especially in type 2 muscle fibers. Exercise training restored in vivo mitochondrial function in type 2 diabetic subjects. Insulin-mediated glucose disposal and metabolic flexibility improved in type 2 diabetic subjects in the face of near-significantly increased IMCL content. This indicates that increased capacity to store IMCL and restoration of improved mitochondrial function contribute to improved muscle insulin sensitivity.
Massively parallel multicanonical simulations
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
SPINning parallel systems software
International Nuclear Information System (INIS)
Matlin, O.S.; Lusk, E.; McCune, W.
2002-01-01
We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin
Parallel programming with Python
Palach, Jan
2014-01-01
A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.
Carlotti, François; Pagano, Marc; Guilloux, Loïc; Donoso, Katty; Valdés, Valentina; Hunt, Brian P. V.
2018-03-01
increasing in contribution from west to east while, in parallel, gelatinous plankton decreased (dominated by appendicularians) and other holoplankton. Detritus in the net tow samples represented 20-50 % of the biomass, the lowest and the highest values being obtained in the subtropical Pacific gyre and in the Coral Sea, respectively, linked to the local primary production and the biomass and growth rates of zooplanktonic populations. Taxonomic compositions showed a high degree of similarity across the whole region, however, with a moderate difference in subtropical Pacific gyre. Several copepod taxa, known to have trophic links with Trichodesmium, presented positive relationships with Trichodesmium abundance, such as the Harpacticoids Macrosetella, Microsetella and Miracia, and the Poecilostomatoids Corycaeus and Oncaea. At the LD stations, the populations initially responded to local spring blooms with a large production of larval forms, reflected in increasing abundances but with limited (station LD-A) or no (station LD-A) biomass changes. Diazotrophs contributed up to 67 and 75 % to zooplankton biomass in the western and central Melanesian Archipelago regions respectively, but strongly decreased to an average of 22 % in the subtropical Pacific gyre (GY) and down to 7 % occurring in the most eastern station (SD-15). Using allometric relationships, specific zooplankton ingestion rates were estimated between 0.55 and 0.64 d-1 with the highest mean value at the bloom station (LD-B) and the lowest in GY, whereas estimated weight specific excretion rates ranged between 0.1 and 0.15 d-1 for NH4 and between 0.09 and 9.12 d-1 for PO4. Daily grazing pressure on phytoplankton stocks and daily regeneration by zooplankton were as well estimated for the different regions showing contrasted impacts between MA and GY regions. For the 3 LD stations, it was not possible to find any relationship between the abundance and biomass in the water column and swimmers found in sediment
Expressing Parallelism with ROOT
Energy Technology Data Exchange (ETDEWEB)
Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab
2017-11-22
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Parallel Fast Legendre Transform
Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.
1998-01-01
We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform We give an introduction to the DriscollHealy algorithm using polynomial arithmetic and present experimental results on the eciency and accuracy of our implementation The algorithms were
Practical parallel programming
Bauer, Barr E
2014-01-01
This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.
Parallel hierarchical radiosity rendering
Energy Technology Data Exchange (ETDEWEB)
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Parallel universes beguile science
2007-01-01
A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- as least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.
Energy Technology Data Exchange (ETDEWEB)
2017-04-04
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
International Nuclear Information System (INIS)
Gardes, D.; Volkov, P.
1981-01-01
A 5x3cm 2 (timing only) and a 15x5cm 2 (timing and position) parallel plate avalanche counters (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr
Parallel hierarchical global illumination
Energy Technology Data Exchange (ETDEWEB)
Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)
1997-10-08
Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Giesinger, Johannes M; Aaronson, Neil K; Arraras, Juan I; Efficace, Fabio; Groenvold, Mogens; Kieffer, Jacobien M; Loth, Fanny L; Petersen, Morten Aa; Ramage, John; Tomaszewski, Krzysztof A; Young, Teresa; Holzner, Bernhard
2018-02-01
In this study, we investigated what makes a symptom or functional impairment clinically important, that is, relevant for a patient to discuss with a health care professional (HCP). This is the first part of a European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group project focusing on the development of thresholds for clinical importance for the EORTC QLQ-C30 questionnaire and its corresponding computer-adaptive version. We conducted interviews with cancer patients and HCPs in 6 European countries. Participants were asked to name aspects of a symptom or problem that make it clinically important and to provide importance ratings for a predefined set of aspects (eg, need for help and limitations of daily functioning). We conducted interviews with 83 cancer patients (mean age, 60.3 y; 50.6% men) and 67 HCPs. Participants related clinical importance to limitations of everyday life (patients, 65.1%; HCPs, 77.6%), the emotional impact of a symptom/problem (patients, 53.0%; HCPs, 64.2%), and duration/frequency (patients, 51.8%; HCPs, 49.3%). In the patient sample, importance ratings were highest for worries by partner or family, limitations in everyday life, and need for help from the medical staff. Health care professionals rated limitations in everyday life and need for help from the medical staff to be most important. Limitations in everyday life, need for (medical) help, and emotional impact on the patient or family/partner were found to be relevant aspects of clinical importance. Based on these findings, we will define anchor items for the development of thresholds for clinical importance for the EORTC measures in a Europe-wide field study. Copyright © 2017 John Wiley & Sons, Ltd.
Rouhani, Mohammad Hossein; Mortazavi Najafabadi, Mojgan; Surkan, Pamela J; Esmaillzadeh, Ahmad; Feizi, Awat; Azadbakht, Leila
2018-02-01
Animal studies report that oat (Avena sativa L) intake has favorable effects on kidney function. However, the effects of oat consumption have not been assessed in humans. The aim of this study was to examine the impact of oat intake on biomarkers of renal function in patients with chronic kidney disease (CKD). Fifty-two patients with CKD were randomly assigned to a control group (recommended to reduce intake of dietary protein, phosphorus, sodium and potassium) or an oat consumption group (given nutritional recommendations for controls +50 g/day oats). Blood urea nitrogen (BUN), serum creatinine (SCr), urine creatinine, serum albumin, serum potassium, parathyroid hormone (PTH), serum klotho and urine protein concentration were measured at baseline and after an eight-week intervention. Creatinine clearance was calculated using urine creatinine concentration. Within group analysis showed a significant increase in BUN (P = 0.02) and serum potassium (P = 0.01) and a marginally significant increment in SCr (P = 0.08) among controls. However, changes in the oat group were not significant. In a multivariate adjusted model, we observed a significant difference in change of serum potassium (-0.03 mEq/L for oat group and 0.13 mEq/L for control group; P = 0.01) and a marginally significant difference in change of serum albumin (0.01 g/dl for oat group and -0.08 for control group; P = 0.08) between the two groups. There was no change in PTH concentration. Intake of oats may have a beneficial effect on serum albumin and serum potassium in patients with CKD. Present study registered under IRCT.ir identifier no. IRCT2015050414551N2. Copyright © 2016 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.
Directory of Open Access Journals (Sweden)
Deepti Jain
Full Text Available Salt-sensitive yeast mutants were deployed to characterize a gene encoding a C2H2 zinc finger protein (CaZF that is differentially expressed in a drought-tolerant variety of chickpea (Cicer arietinum and provides salinity-tolerance in transgenic tobacco. In Saccharomyces cerevisiae most of the cellular responses to hyper-osmotic stress is regulated by two interconnected pathways involving high osmolarity glycerol mitogen-activated protein kinase (Hog1p and Calcineurin (CAN, a Ca(2+/calmodulin-regulated protein phosphatase 2B. In this study, we report that heterologous expression of CaZF provides osmotolerance in S. cerevisiae through Hog1p and Calcineurin dependent as well as independent pathways. CaZF partially suppresses salt-hypersensitive phenotypes of hog1, can and hog1can mutants and in conjunction, stimulates HOG and CAN pathway genes with subsequent accumulation of glycerol in absence of Hog1p and CAN. CaZF directly binds to stress response element (STRE to activate STRE-containing promoter in yeast. Transactivation and salt tolerance assays of CaZF deletion mutants showed that other than the transactivation domain a C-terminal domain composed of acidic and basic amino acids is also required for its function. Altogether, results from this study suggests that CaZF is a potential plant salt-tolerance determinant and also provide evidence that in budding yeast expression of HOG and CAN pathway genes can be stimulated in absence of their regulatory enzymes to provide osmotolerance.
Data communications in a parallel active messaging interface of a parallel computer
Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E
2013-11-12
Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call cite statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.
Wald, Ingo; Ize, Santiago
2015-07-28
Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
DEFF Research Database (Denmark)
Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert
of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language......Abstract [en] More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point......(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...
PARALLEL MOVING MECHANICAL SYSTEMS
Directory of Open Access Journals (Sweden)
Florian Ion Tiberius Petrescu
2014-09-01
Full Text Available Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Moving mechanical systems parallel structures are solid, fast, and accurate. Between parallel systems it is to be noticed Stewart platforms, as the oldest systems, fast, solid and precise. The work outlines a few main elements of Stewart platforms. Begin with the geometry platform, kinematic elements of it, and presented then and a few items of dynamics. Dynamic primary element on it means the determination mechanism kinetic energy of the entire Stewart platforms. It is then in a record tail cinematic mobile by a method dot matrix of rotation. If a structural mottoelement consists of two moving elements which translates relative, drive train and especially dynamic it is more convenient to represent the mottoelement as a single moving components. We have thus seven moving parts (the six motoelements or feet to which is added mobile platform 7 and one fixed.
Xyce parallel electronic simulator.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.
Betchov, R
2012-01-01
Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation
Algorithmically specialized parallel computers
Snyder, Lawrence; Gannon, Dennis B
1985-01-01
Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
International Nuclear Information System (INIS)
Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka
1993-01-01
The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map control process to a certain processor was applied. The efficiency of parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer to the conventional computers in the field of shielding analysis by Monte Carlo method. (orig.)
Vitale, Salvatore Giovanni; Caruso, Salvatore; Rapisarda, Agnese Maria Chiara; Cianci, Stefano; Cianci, Antonio
2018-03-01
Menopause results in metabolic changes that contribute to increase risk of cardiovascular diseases: increase in low density lipoprotein (LDL) and triglycerides and decrease in high density lipoprotein (HDL), weight gain are associated with a correspondent increase in incidence of hypertension and diabetes. The aim of this study was to evaluate the effect of a preparation of isoflavones, calcium vitamin D and inulin in menopausal women. We performed a prospective, randomized, placebo-controlled, parallel-group study. A total of 50 patients were randomized to receive either oral preparations of isoflavones (40 mg), calcium (500 mg) vitamin D (300 UI) and inulin (3 g) or placebo (control group). Pre- and post-treatment assessment of quality of life and sexual function were performed through Menopause-Specific Quality of Life Questionnaire (MENQOL) and Female Sexual Function Index (FSFI); evaluations of anthropometric indicators, body composition through bioelectrical impedance analyser, lumbar spine and proximal femur T-score and lipid profile were performed. After 12 months, a significant reduction in MENQOL vasomotor, physical and sexual domain scores ( p 0.05) were found in the same group. According to our data analysis, isoflavones, calcium, vitamin D and inulin may exert favourable effects on menopausal symptoms and signs.
Solà, Rosa; Valls, Rosa-Maria; Martorell, Isabel; Giralt, Montserrat; Pedret, Anna; Taltavull, Núria; Romeu, Marta; Rodríguez, Àurea; Moriña, David; Lopez de Frutos, Victor; Montero, Manuel; Casajuana, Maria-Carmen; Pérez, Laura; Faba, Jenny; Bernal, Gloria; Astilleros, Anna; González, Roser; Puiggrós, Francesc; Arola, Lluís; Chetrit, Carlos; Martinez-Puig, Daniel
2015-11-01
Preliminary results suggested that oral-administration of rooster comb extract (RCE) rich in hyaluronic acid (HA) was associated with improved muscle strength. Following these promising results, the objective of the present study was to evaluate the effect of low-fat yoghurt supplemented with RCE rich in HA on muscle function in adults with mild knee pain; a symptom of early osteoarthritis. Participants (n = 40) received low-fat yoghurt (125 mL d(-1)) supplemented with 80 mg d(-1) of RCE and the placebo group (n = 40) consumed the same yoghurt without the RCE, in a randomized, controlled, double-blind, parallel trial over 12 weeks. Using an isokinetic dynamometer (Biodex System 4), RCE consumption, compared to control, increased the affected knee peak torque, total work and mean power at 180° s(-1), at least 11% in men (p < 0.05) with no differences in women. No dietary differences were noted. These results suggest that long-term consumption of low-fat yoghurt supplemented with RCE could be a dietary tool to improve muscle strength in men, associated with possible clinical significance. However, further studies are needed to elucidate reasons for these sex difference responses observed, and may provide further insight into muscle function.
Resistor Combinations for Parallel Circuits.
McTernan, James P.
1978-01-01
To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS
Directory of Open Access Journals (Sweden)
M. K. Bouza
2017-01-01
Full Text Available The object of research is the tools to support the development of parallel programs in C/C ++. The methods and software which automates the process of designing parallel applications are proposed.
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors
Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.
1990-01-01
Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
Parallel External Memory Graph Algorithms
DEFF Research Database (Denmark)
Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari
2010-01-01
In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Â¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
Energy Technology Data Exchange (ETDEWEB)
Lober, R.R.; Tautges, T.J.; Vaughan, C.T.
1997-03-01
Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
An interactive parallel processor for data analysis
International Nuclear Information System (INIS)
Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.
1984-01-01
A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors
Experience with a clustered parallel reduction machine
Beemster, M.; Hartel, Pieter H.; Hertzberger, L.O.; Hofman, R.F.H.; Langendoen, K.G.; Li, L.L.; Milikowski, R.; Vree, W.G.; Barendregt, H.P.; Mulder, J.C.
A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms resulting in a
Parallel inter channel interaction mechanisms
International Nuclear Information System (INIS)
Jovic, V.; Afgan, N.; Jovic, L.
1995-01-01
Parallel channels interactions are examined. For experimental researches of nonstationary regimes flow in three parallel vertical channels results of phenomenon analysis and mechanisms of parallel channel interaction for adiabatic condition of one-phase fluid and two-phase mixture flow are shown. (author)
International Nuclear Information System (INIS)
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-01-01
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results
Fast parallel event reconstruction
CERN. Geneva
2010-01-01
On-line processing of large data volumes produced in modern HEP experiments requires using maximum capabilities of modern and future many-core CPU and GPU architectures.One of such powerful feature is a SIMD instruction set, which allows packing several data items in one register and to operate on all of them, thus achievingmore operations per clock cycle. Motivated by the idea of using the SIMD unit ofmodern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased in 120000 times with 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA frameworkprovi...
Parallelization of Reversible Ripple-carry Adders
DEFF Research Database (Denmark)
Thomsen, Michael Kirkedal; Axelsen, Holger Bock
2009-01-01
The design of fast arithmetic logic circuits is an important research topic for reversible and quantum computing. A special challenge in this setting is the computation of standard arithmetical functions without the generation of \\emph{garbage}. Here, we present a novel parallelization scheme...... wherein $m$ parallel $k$-bit reversible ripple-carry adders are combined to form a reversible $mk$-bit \\emph{ripple-block carry adder} with logic depth $\\mathcal{O}(m+k)$ for a \\emph{minimal} logic depth $\\mathcal{O}(\\sqrt{mk})$, thus improving on the $mk$-bit ripple-carry adder logic depth $\\mathcal...
PALNS - A software framework for parallel large neighborhood search
DEFF Research Database (Denmark)
Røpke, Stefan
2009-01-01
This paper propose a simple, parallel, portable software framework for the metaheuristic named large neighborhood search (LNS). The aim is to provide a framework where the user has to set up a few data structures and implement a few functions and then the framework provides a metaheuristic where ...... parallelization "comes for free". We apply the parallel LNS heuristic to two different problems: the traveling salesman problem with pickup and delivery (TSPPD) and the capacitated vehicle routing problem (CVRP)....
Programming parallel architectures - The BLAZE family of languages
Mehrotra, Piyush
1989-01-01
This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.
International Nuclear Information System (INIS)
DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.
2010-01-01
The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement
Parallel Polarization State Generation.
She, Alan; Capasso, Federico
2016-05-17
The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.
Parallel imaging microfluidic cytometer.
Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching
2011-01-01
By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times for the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.
Parallelization of ITOUGH2 using PVM
International Nuclear Information System (INIS)
Finsterle, Stefan
1998-01-01
ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an 1.TOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM
Parallel computational in nuclear group constant calculation
International Nuclear Information System (INIS)
Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal
2002-01-01
In this paper parallel computational method in nuclear group constant calculation using collision probability method will be discuss. The main focus is on the calculation of collision matrix which need large amount of computational time. The geometry treated here is concentric cylinder. The calculation of collision probability matrix is carried out using semi analytic method using Beckley Naylor Function. To accelerate computation speed some computer parallel used to solve the problem. We used LINUX based parallelization using PVM software with C or fortran language. While in windows based we used socket programming using DELPHI or C builder. The calculation results shows the important of optimal weight for each processor in case there area many type of processor speed
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2016-03-15
Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Directory of Open Access Journals (Sweden)
Loredana MOCEAN
2009-01-01
Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
Neural nets for massively parallel optimization
Dixon, Laurence C. W.; Mills, David
1992-07-01
To apply massively parallel processing systems to the solution of large scale optimization problems it is desirable to be able to evaluate any function f(z), z (epsilon) Rn in a parallel manner. The theorem of Cybenko, Hecht Nielsen, Hornik, Stinchcombe and White, and Funahasi shows that this can be achieved by a neural network with one hidden layer. In this paper we address the problem of the number of nodes required in the layer to achieve a given accuracy in the function and gradient values at all points within a given n dimensional interval. The type of activation function needed to obtain nonsingular Hessian matrices is described and a strategy for obtaining accurate minimal networks presented.
Parallel and vector implementation of APROS simulator code
International Nuclear Information System (INIS)
Niemi, J.; Tommiska, J.
1990-01-01
In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to the loop level parallel processing, the functional decomposition and the domain decomposition have been considered. Results represented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that the loop level parallelism and the domain decomposition are the most promising alternative to employ the parallel processing. (author)
Parallel Framework for Cooperative Processes
Directory of Open Access Journals (Sweden)
Mitică Craus
2005-01-01
Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.
Parallel Monte Carlo reactor neutronics
International Nuclear Information System (INIS)
Blomquist, R.N.; Brown, F.B.
1994-01-01
The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved
The BLAZE language - A parallel language for scientific programming
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
The BLAZE language: A parallel language for scientific programming
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
DEFF Research Database (Denmark)
Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.
2015-01-01
about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in case of nucleobases Y and Z a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one...... of the phosphate groups in Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of Watson-Crick base pair which......The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...
Parallel consensual neural networks.
Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H
1997-01-01
A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.
A Parallel Particle Swarm Optimizer
National Research Council Canada - National Science Library
Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D
2003-01-01
.... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...
Patterns for Parallel Software Design
Ortega-Arjona, Jorge Luis
2010-01-01
Essential reading to understand patterns for parallel programming Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin
DEFF Research Database (Denmark)
Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo
2013-01-01
a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...
Spierings, Egilius L H; McAllister, Peter J; Bilchik, Tanya R
2015-04-01
A review on headache and insomnia revealed that insomnia is a risk factor for increased headache frequency and headache intensity in migraineurs. The authors designed a randomized, double blind, placebo-controlled, parallel-group, pilot study in which migraineurs who also had insomnia were enrolled, to test this observation. In the study, the authors treated 79 subjects with IHS-II migraine with and/or without aura and with DSM-IV primary insomnia for 6 weeks with 3 mg eszopiclone (Lunesta(®)) or placebo at bedtime. The treatment was preceded by a 2-week baseline period and followed by a 2-week run-out period. Of the 79 subjects treated, 75 were evaluable, 35 in the eszopiclone group, and 40 in the placebo group. At baseline, the groups were comparable except for sleep latency. Of the three remaining sleep variables, total sleep time, nighttime awakenings, and sleep quality, the number of nighttime awakenings during the 6-week treatment period was significantly lower in the eszopiclone group than in the placebo group (P = 0.03). Of the three daytime variables, alertness, fatigue, and functioning, this was also the case for fatigue (P = 005). The headache variables, frequency, duration, and intensity, did not show a difference from placebo during the 6-week treatment period. The study did not meet primary endpoint, that is, the difference in total sleep time during the 6-week treatment period between eszopiclone and placebo was less than 40 minutes. Therefore, it failed to answer the question as to whether insomnia is, indeed, a risk factor for increased headache frequency and headache intensity in migraineurs.
Parallelization and automatic data distribution for nuclear reactor simulations
Energy Technology Data Exchange (ETDEWEB)
Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallelization and automatic data distribution for nuclear reactor simulations
International Nuclear Information System (INIS)
Liebrock, L.M.
1997-01-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed
Adler, Lenard A; Dirks, Bryan; Deas, Patrick; Raychaudhuri, Aparna; Dauphin, Matthew; Saylor, Keith; Weisler, Richard
2013-10-09
This study examined the effects of lisdexamfetamine dimesylate (LDX) on quality of life (QOL) in adults with attention-deficit/hyperactivity disorder (ADHD) and clinically significant executive function deficits (EFD). This report highlights QOL findings from a 10-week randomized placebo-controlled trial of LDX (30-70 mg/d) in adults (18-55 years) with ADHD and EFD (Behavior Rating Inventory of EF-Adult, Global Executive Composite [BRIEF-A GEC] ≥65). The primary efficacy measure was the self-reported BRIEF-A; a key secondary measure was self-reported QOL on the Adult ADHD Impact Module (AIM-A). The clinician-completed ADHD Rating Scale version IV (ADHD-RS-IV) with adult prompts and Clinical Global Impressions-Severity (CGI-S) were also employed. The Adult ADHD QoL (AAQoL) was added while the study was in progress. A post hoc analysis examined the subgroup having evaluable results from both AIM-A and AAQoL. Of 161 randomized (placebo, 81; LDX, 80), 159 were included in the safety population. LDX improved AIM-A multi-item domain scores versus placebo; LS mean difference for Performance and Daily Functioning was 21.6 (ES, 0.93, PPsychological Health was 12.1; Life Outlook was 12.5; and Relationships was 7.3. In a post hoc analysis of participants with both AIM-A and AAQoL scores, AIM-A multi-item subgroup analysis scores numerically improved with LDX, with smaller difference for Impact of Symptoms: Daily Interference. The safety profile of LDX was consistent with amphetamine use in previous studies. Overall, adults with ADHD/EFD exhibited self-reported improvement on QOL, using the AIM-A and AAQoL scales in line with medium/large ES; these improvements were paralleled by improvements in EF and ADHD symptoms. The safety profile of LDX was similar to previous studies. ClinicalTrials.gov, NCT01101022.
Parallelization of a blind deconvolution algorithm
Matson, Charles L.; Borelli, Kathy J.
2006-09-01
Often it is of interest to deblur imagery in order to obtain higher-resolution images. Deblurring requires knowledge of the blurring function - information that is often not available separately from the blurred imagery. Blind deconvolution algorithms overcome this problem by jointly estimating both the high-resolution image and the blurring function from the blurred imagery. Because blind deconvolution algorithms are iterative in nature, they can take minutes to days to deblur an image depending how many frames of data are used for the deblurring and the platforms on which the algorithms are executed. Here we present our progress in parallelizing a blind deconvolution algorithm to increase its execution speed. This progress includes sub-frame parallelization and a code structure that is not specialized to a specific computer hardware architecture.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen-Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2018-05-15
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaflop-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC). The ASIC nodes are interconnected by a five dimensional torus network that optimally maximize the throughput of packet communications between nodes and minimize latency. The network implements collective network and a global asynchronous network that provides global barrier and notification functions. Integrated in the node design include a list-based prefetcher. The memory system implements transaction memory, thread level speculation, and multiversioning cache that improves soft error rate at the same time and supports DMA functionality allowing for parallel processing message-passing.
PARALLEL IMPORT: REALITY FOR RUSSIA
Directory of Open Access Journals (Sweden)
Т. А. Сухопарова
2014-01-01
Full Text Available Problem of parallel import is urgent question at now. Parallel import legalization in Russia is expedient. Such statement based on opposite experts opinion analysis. At the same time it’s necessary to negative consequences consider of this decision and to apply remedies to its minimization.Purchase on Elibrary.ru > Buy now
LOOP-3, Hydraulic Stability in Heated Parallel Channels
Energy Technology Data Exchange (ETDEWEB)
Davies, A L [AEEW, Dorset (United Kingdom)
1968-02-01
1 - Nature of physical problem solved: Hydraulic stability in parallel channels. 2 - Method of solution: Calculation of transfer functions developed in reference (10 below). 3 - Restrictions on the complexity of the problem: Only due to assumptions in analysis (see ref.)
The Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Parallelization of the FLAPW method
International Nuclear Information System (INIS)
Canning, A.; Mannstadt, W.; Freeman, A.J.
1999-01-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer
Parallelization of the FLAPW method
Canning, A.; Mannstadt, W.; Freeman, A. J.
2000-08-01
The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
Parallel Monte Carlo simulation of aerosol dynamics
Zhou, K.
2014-01-01
A highly efficient Monte Carlo (MC) algorithm is developed for the numerical simulation of aerosol dynamics, that is, nucleation, surface growth, and coagulation. Nucleation and surface growth are handled with deterministic means, while coagulation is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI). The parallel computing efficiency is investigated through numerical examples. Near 60% parallel efficiency is achieved for the maximum testing case with 3.7 million MC particles running on 93 parallel computing nodes. The algorithm is verified through simulating various testing cases and comparing the simulation results with available analytical and/or other numerical solutions. Generally, it is found that only small number (hundreds or thousands) of MC particles is necessary to accurately predict the aerosol particle number density, volume fraction, and so forth, that is, low order moments of the Particle Size Distribution (PSD) function. Accurately predicting the high order moments of the PSD needs to dramatically increase the number of MC particles. 2014 Kun Zhou et al.
Kalman Filter Tracking on Parallel Architectures
International Nuclear Information System (INIS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2016-01-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment
Comparison of multihardware parallel implementations for a phase unwrapping algorithm
Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo
2018-04-01
Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
Sparse BLIP: BLind Iterative Parallel imaging reconstruction using compressed sensing.
She, Huajun; Chen, Rong-Rong; Liang, Dong; DiBella, Edward V R; Ying, Leslie
2014-02-01
To develop a sensitivity-based parallel imaging reconstruction method to reconstruct iteratively both the coil sensitivities and MR image simultaneously based on their prior information. Parallel magnetic resonance imaging reconstruction problem can be formulated as a multichannel sampling problem where solutions are sought analytically. However, the channel functions given by the coil sensitivities in parallel imaging are not known exactly and the estimation error usually leads to artifacts. In this study, we propose a new reconstruction algorithm, termed Sparse BLind Iterative Parallel, for blind iterative parallel imaging reconstruction using compressed sensing. The proposed algorithm reconstructs both the sensitivity functions and the image simultaneously from undersampled data. It enforces the sparseness constraint in the image as done in compressed sensing, but is different from compressed sensing in that the sensing matrix is unknown and additional constraint is enforced on the sensitivities as well. Both phantom and in vivo imaging experiments were carried out with retrospective undersampling to evaluate the performance of the proposed method. Experiments show improvement in Sparse BLind Iterative Parallel reconstruction when compared with Sparse SENSE, JSENSE, IRGN-TV, and L1-SPIRiT reconstructions with the same number of measurements. The proposed Sparse BLind Iterative Parallel algorithm reduces the reconstruction errors when compared to the state-of-the-art parallel imaging methods. Copyright © 2013 Wiley Periodicals, Inc.
Is Monte Carlo embarrassingly parallel?
Energy Technology Data Exchange (ETDEWEB)
Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)
2012-07-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Is Monte Carlo embarrassingly parallel?
International Nuclear Information System (INIS)
Hoogenboom, J. E.
2012-01-01
Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
Parallel integer sorting with medium and fine-scale parallelism
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
Parallel education: what is it?
Amos, Michelle Peta
2017-01-01
In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...
Balanced, parallel operation of flashlamps
International Nuclear Information System (INIS)
Carder, B.M.; Merritt, B.T.
1979-01-01
A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to +- 20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests
International Nuclear Information System (INIS)
Baron, E.; Hauschildt, Peter H.
1998-01-01
We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
Parallelizing the spectral transform method: A comparison of alternative parallel algorithms
International Nuclear Information System (INIS)
Foster, I.; Worley, P.H.
1993-01-01
The spectral transform method is a standard numerical technique for solving partial differential equations on the sphere and is widely used in global climate modeling. In this paper, we outline different approaches to parallelizing the method and describe experiments that we are conducting to evaluate the efficiency of these approaches on parallel computers. The experiments are conducted using a testbed code that solves the nonlinear shallow water equations on a sphere, but are designed to permit evaluation in the context of a global model. They allow us to evaluate the relative merits of the approaches as a function of problem size and number of processors. The results of this study are guiding ongoing work on PCCM2, a parallel implementation of the Community Climate Model developed at the National Center for Atmospheric Research
Energy Technology Data Exchange (ETDEWEB)
Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawasaki, Nobuo; Kawai, Wataru [Fujitsu Ltd., Tokyo (Japan); Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)
2001-02-01
Several computer codes in the nuclear field have been vectorized, parallelized and trans-ported on the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in 3 parts, i.e., the vectorization and the parallelization part on vector processors, the parallelization part on scalar processors and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of Relativistic Molecular Orbital Calculation code RSCAT, a microscopic transport code for high energy nuclear collisions code JAM, three-dimensional non-steady thermal-fluid analysis code STREAM, Relativistic Density Functional Theory code RDFT and High Speed Three-Dimensional Nodal Diffusion code MOSRA-Light on the VPP500 system and the SX-4 system are described. (author)
Workspace Analysis for Parallel Robot
Directory of Open Access Journals (Sweden)
Ying Sun
2013-05-01
Full Text Available As a completely new-type of robot, the parallel robot possesses a lot of advantages that the serial robot does not, such as high rigidity, great load-carrying capacity, small error, high precision, small self-weight/load ratio, good dynamic behavior and easy control, hence its range is extended in using domain. In order to find workspace of parallel mechanism, the numerical boundary-searching algorithm based on the reverse solution of kinematics and limitation of link length has been introduced. This paper analyses position workspace, orientation workspace of parallel robot of the six degrees of freedom. The result shows: It is a main means to increase and decrease its workspace to change the length of branch of parallel mechanism; The radius of the movement platform has no effect on the size of workspace, but will change position of workspace.
"Feeling" Series and Parallel Resistances.
Morse, Robert A.
1993-01-01
Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)
Parallel encoders for pixel detectors
International Nuclear Information System (INIS)
Nikityuk, N.M.
1991-01-01
A new method of fast encoding and determining the multiplicity and coordinates of fired pixels is described. A specific example construction of parallel encodes and MCC for n=49 and t=2 is given. 16 refs.; 6 figs.; 2 tabs
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Event monitoring of parallel computations
Directory of Open Access Journals (Sweden)
Gruzlikov Alexander M.
2015-06-01
Full Text Available The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences
Massively Parallel Finite Element Programming
Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
The STAPL Parallel Graph Library
Harshvardhan,
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
Parallel ray tracing for one-dimensional discrete ordinate computations
International Nuclear Information System (INIS)
Jarvis, R.D.; Nelson, P.
1996-01-01
The ray-tracing sweep in discrete-ordinates, spatially discrete numerical approximation methods applied to the linear, steady-state, plane-parallel, mono-energetic, azimuthally symmetric, neutral-particle transport equation can be reduced to a parallel prefix computation. In so doing, the often severe penalty in convergence rate of the source iteration, suffered by most current parallel algorithms using spatial domain decomposition, can be avoided while attaining parallelism in the spatial domain to whatever extent desired. In addition, the reduction implies parallel algorithm complexity limits for the ray-tracing sweep. The reduction applies to all closed, linear, one-cell functional (CLOF) spatial approximation methods, which encompasses most in current popular use. Scalability test results of an implementation of the algorithm on a 64-node nCube-2S hypercube-connected, message-passing, multi-computer are described. (author)
Reliability allocation problem in a series-parallel system
International Nuclear Information System (INIS)
Yalaoui, Alice; Chu, Chengbin; Chatelet, Eric
2005-01-01
In order to improve system reliability, designers may introduce in a system different technologies in parallel. When each technology is composed of components in series, the configuration belongs to the series-parallel systems. This type of system has not been studied as much as the parallel-series architecture. There exist no methods dedicated to the reliability allocation in series-parallel systems with different technologies. We propose in this paper theoretical and practical results for the allocation problem in a series-parallel system. Two resolution approaches are developed. Firstly, a one stage problem is studied and the results are exploited for the multi-stages problem. A theoretical condition for obtaining the optimal allocation is developed. Since this condition is too restrictive, we secondly propose an alternative approach based on an approximated function and the results of the one-stage study. This second approach is applied to numerical examples
Writing parallel programs that work
CERN. Geneva
2012-01-01
Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...
Exploiting Symmetry on Parallel Architectures.
Stiller, Lewis Benjamin
1995-01-01
This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.
Parallel algorithms for continuum dynamics
International Nuclear Information System (INIS)
Hicks, D.L.; Liebrock, L.M.
1987-01-01
Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors
Flexibility and Performance of Parallel File Systems
Kotz, David; Nieuwejaar, Nils
1996-01-01
As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
Oxytocin: parallel processing in the social brain?
Dölen, Gül
2015-06-01
Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.
Parallel Relational Universes – experiments in modularity
DEFF Research Database (Denmark)
Pagliarini, Luigi; Lund, Henrik Hautop
2015-01-01
: We here describe Parallel Relational Universes, an artistic method used for the psychological analysis of group dynamics. The design of the artistic system, which mediates group dynamics, emerges from our studies of modular playware and remixing playware. Inspired from remixing modular playware......, where users remix samples in the form of physical and functional modules, we created an artistic instantiation of such a concept with the Parallel Relational Universes, allowing arts alumni to remix artistic expressions. Here, we report the data emerged from a first pre-test, run with gymnasium’s alumni....... We then report both the artistic and the psychological findings. We discuss possible variations of such an instrument. Between an art piece and a psychological test, at a first cognitive analysis, it seems to be a promising research tool...
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Parallel computing in plasma physics: Nonlinear instabilities
International Nuclear Information System (INIS)
Pohn, E.; Kamelander, G.; Shoucri, M.
2000-01-01
A Vlasov-Poisson-system is used for studying the time evolution of the charge-separation at a spatial one- as well as a two-dimensional plasma-edge. Ions are advanced in time using the Vlasov-equation. The whole three-dimensional velocity-space is considered leading to very time-consuming four-resp. five-dimensional fully kinetic simulations. In the 1D simulations electrons are assumed to behave adiabatic, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson-equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed at an equidistant grid. A constant time-step is used for advancing the density-distribution function in time. The time-evolution of the distribution function is performed using a splitting scheme. Each dimension (x, y, υ x , υ y , υ z ) of the phase-space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an - in general - interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase-space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson-equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in y-direction, a proper parallelization strategy is to split the grid into p disjoint υ z -slices, which are sub-grids, each containing a different 1/p-th part of the υ z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between
SPRINT: A new parallel framework for R
Directory of Open Access Journals (Sweden)
Scharinger Florian
2008-12-01
Full Text Available Abstract Background Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. Results We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. Conclusion SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more
Second derivative parallel block backward differentiation type ...
African Journals Online (AJOL)
Second derivative parallel block backward differentiation type formulas for Stiff ODEs. ... Log in or Register to get access to full text downloads. ... and the methods are inherently parallel and can be distributed over parallel processors. They are ...
A Parallel Approach to Fractal Image Compression
Lubomir Dedera
2004-01-01
The paper deals with a parallel approach to coding and decoding algorithms in fractal image compressionand presents experimental results comparing sequential and parallel algorithms from the point of view of achieved bothcoding and decoding time and effectiveness of parallelization.
Directory of Open Access Journals (Sweden)
Letícia Maria Sicuro CORRÊA
1998-01-01
Full Text Available O contraste pro/pronome em orações coordenadas em português é aqui explorado de modo a distinguirem-se os procedimentos através dos quais formas pronominais sujeito são interpretadas em diferentes contextos sintáticos e discursivos — através de estratégias de seleção de um antecedente lingüístico ou da recuperação "automática" de uma representação mantida particularmente ativada na memória de trabalho. Dois experimentos são relatados. O primeiro testa a hipótese de que o vínculo sintático entre a oração que contém a forma pronominal e a que contém seus possíveis antecedentes define condições de processamento que favorecem o uso de um ou de outro procedimento de interpretação. O segundo testa a hipótese de que o grau de ativação de uma dada representação na memória de trabalho (definido em relação a um sistema que opera em três níveis afeta o modo como o sujeito pronominal de orações independentes é interpretado no discurso. Verifica-se que o contraste pro/ pronome pode ser re-estabelecido fora do âmbito de sentenças complexas uma vez que haja alteração local do foco da referência. A natureza deste contraste é discutida levando-se em conta as condições de processamento nas quais este se manifesta.The contrastive interpretation of pro and pronoun in co-ordinate sentences by Portuguese speakers is explored here, as a means of distinguishing the procedures whereby pronominal forms are interpreted in different syntactic/discourse contexts - by means of a search-identification strategy (such as the parallel function strategy or by means of the "automatic" recovery of a representation which is maintained particularly activated in working memory. Two experiments are reported. The first one tests the hypothesis that the kind of syntactic relationship that holds between the clause containing the pronominal form and the one containing possible antecedents defines processing conditions that favour one
General upper bounds on the runtime of parallel evolutionary algorithms.
Lässig, Jörg; Sudholt, Dirk
2014-01-01
We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which number of processors lead to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.
CALTRANS: A parallel, deterministic, 3D neutronics code
Energy Technology Data Exchange (ETDEWEB)
Carson, L.; Ferguson, J.; Rogers, J.
1994-04-01
Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.
Programming parallel architectures: The BLAZE family of languages
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Parallel fabrication of macroporous scaffolds.
Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal
2018-07-01
Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.
Parallel plasma fluid turbulence calculations
International Nuclear Information System (INIS)
Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.
1994-01-01
The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
Evaluating parallel optimization on transputers
Directory of Open Access Journals (Sweden)
A.G. Chalmers
2003-12-01
Full Text Available The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented which indicates under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
Pattern-Driven Automatic Parallelization
Directory of Open Access Journals (Sweden)
Christoph W. Kessler
1996-01-01
Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
Parallel data grabbing card based on PCI bus RS422
International Nuclear Information System (INIS)
Zhang Zhenghui; Shen Ji; Wei Dongshan; Chen Ziyu
2005-01-01
This article briefly introduces the developments of the parallel data grabbing card based on RS422 and PCI bus. It could be applied for grabbing the 14 bits parallel data in high speed, coming from the devices with RS422 interface. The methods of data acquisition which bases on the PCI protocol, the functions and their usages of the chips employed, the ideas and principles of the hardware and software designing are presented. (authors)
Directory of Open Access Journals (Sweden)
Ashkan Tousimojarad
2013-12-01
Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM, a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.
Parallel artificial liquid membrane extraction
DEFF Research Database (Denmark)
Gjelstad, Astrid; Rasmussen, Knut Einar; Parmer, Marthe Petrine
2013-01-01
This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated by an arti......This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated...... by an artificial liquid membrane. Parallel artificial liquid membrane extraction is a modification of hollow-fiber liquid-phase microextraction, where the hollow fibers are replaced by flat membranes in a 96-well plate format....
Parallel algorithms for mapping pipelined and parallel computations
Nicol, David M.
1988-01-01
Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Cellular automata a parallel model
Mazoyer, J
1999-01-01
Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
Fast parallel algorithm for CT image reconstruction.
Flores, Liubov A; Vidal, Vicent; Mayo, Patricia; Rodenas, Francisco; Verdú, Gumersindo
2012-01-01
In X-ray computed tomography (CT) the X rays are used to obtain the projection data needed to generate an image of the inside of an object. The image can be generated with different techniques. Iterative methods are more suitable for the reconstruction of images with high contrast and precision in noisy conditions and from a small number of projections. Their use may be important in portable scanners for their functionality in emergency situations. However, in practice, these methods are not widely used due to the high computational cost of their implementation. In this work we analyze iterative parallel image reconstruction with the Portable Extensive Toolkit for Scientific computation (PETSc).
Methods and models for the construction of weakly parallel tests
Adema, J.J.; Adema, Jos J.
1992-01-01
Several methods are proposed for the construction of weakly parallel tests [i.e., tests with the same test information function (TIF)]. A mathematical programming model that constructs tests containing a prespecified TIF and a heuristic that assigns items to tests with information functions that are
Methods and models for the construction of weakly parallel tests
Adema, J.J.; Adema, Jos J.
1990-01-01
Methods are proposed for the construction of weakly parallel tests, that is, tests with the same test information function. A mathematical programing model for constructing tests with a prespecified test information function and a heuristic for assigning items to tests such that their information
Parallel transport of long mean-free-path plasma along open magnetic field lines: Parallel heat flux
International Nuclear Information System (INIS)
Guo Zehua; Tang Xianzhu
2012-01-01
In a long mean-free-path plasma where temperature anisotropy can be sustained, the parallel heat flux has two components with one associated with the parallel thermal energy and the other the perpendicular thermal energy. Due to the large deviation of the distribution function from local Maxwellian in an open field line plasma with low collisionality, the conventional perturbative calculation of the parallel heat flux closure in its local or non-local form is no longer applicable. Here, a non-perturbative calculation is presented for a collisionless plasma in a two-dimensional flux expander bounded by absorbing walls. Specifically, closures of previously unfamiliar form are obtained for ions and electrons, which relate two distinct components of the species parallel heat flux to the lower order fluid moments such as density, parallel flow, parallel and perpendicular temperatures, and the field quantities such as the magnetic field strength and the electrostatic potential. The plasma source and boundary condition at the absorbing wall enter explicitly in the closure calculation. Although the closure calculation does not take into account wave-particle interactions, the results based on passing orbits from steady-state collisionless drift-kinetic equation show remarkable agreement with fully kinetic-Maxwell simulations. As an example of the physical implications of the theory, the parallel heat flux closures are found to predict a surprising observation in the kinetic-Maxwell simulation of the 2D magnetic flux expander problem, where the parallel heat flux of the parallel thermal energy flows from low to high parallel temperature region.
Parallel Sparse Matrix - Vector Product
DEFF Research Database (Denmark)
Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd
This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed and the report shows how these class can be used...
[Falsified medicines in parallel trade].
Muckenfuß, Heide
2017-11-01
The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.
The parallel adult education system
DEFF Research Database (Denmark)
Wahlgren, Bjarne
2015-01-01
for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...
Where are the parallel algorithms?
Voigt, R. G.
1985-01-01
Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.
Default Parallels Plesk Panel Page
services that small businesses want and need. Our software includes key building blocks of cloud service virtualized servers Service Provider Products ParallelsÂ® Automation Hosting, SaaS, and cloud computing , the leading hosting automation software. You see this page because there is no Web site at this
Parallel plate transmission line transformer
Voeten, S.J.; Brussaard, G.J.H.; Pemen, A.J.M.
2011-01-01
A Transmission Line Transformer (TLT) can be used to transform high-voltage nanosecond pulses. These transformers rely on the fact that the length of the pulse is shorter than the transmission lines used. This allows connecting the transmission lines in parallel at the input and in series at the
Massively parallel quantum computer simulator
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel Computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray
Parallel computing: numerics, applications, and trends
National Research Council Canada - National Science Library
Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter
2009-01-01
... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...
Experiments with parallel algorithms for combinatorial problems
G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens
1985-01-01
textabstractIn the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines
International Nuclear Information System (INIS)
Heggarty, J.W.
1999-06-01
For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
The numerical parallel computing of photon transport
International Nuclear Information System (INIS)
Huang Qingnan; Liang Xiaoguang; Zhang Lifa
1998-12-01
The parallel computing of photon transport is investigated, the parallel algorithm and the parallelization of programs on parallel computers both with shared memory and with distributed memory are discussed. By analyzing the inherent law of the mathematics and physics model of photon transport according to the structure feature of parallel computers, using the strategy of 'to divide and conquer', adjusting the algorithm structure of the program, dissolving the data relationship, finding parallel liable ingredients and creating large grain parallel subtasks, the sequential computing of photon transport into is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP) and very good parallel speedup has been gotten
Parallelizing AT with MatlabMPI
International Nuclear Information System (INIS)
2011-01-01
The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while
Declarative Parallel Programming in Spreadsheet End-User Development
DEFF Research Database (Denmark)
Biermann, Florian
2016-01-01
Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this liter...... can directly apply results from functional array programming to a spreadsheet model of computations.......Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations....... In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming and we...
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
International Nuclear Information System (INIS)
Barletta, A.
2008-01-01
The necessary condition for the onset of parallel flow in the fully developed region of an inclined duct is applied to the case of a circular tube. Parallel flow in inclined ducts is an uncommon regime, since in most cases buoyancy tends to produce the onset of secondary flow. The present study shows how proper thermal boundary conditions may preserve parallel flow regime. Mixed convection flow is studied for a special non-axisymmetric thermal boundary condition that, with a proper choice of a switch parameter, may be compatible with parallel flow. More precisely, a circumferentially variable heat flux distribution is prescribed on the tube wall, expressed as a sinusoidal function of the azimuthal coordinate θ with period 2π. A π/2 rotation in the position of the maximum heat flux, achieved by setting the switch parameter, may allow or not the existence of parallel flow. Two cases are considered corresponding to parallel and non-parallel flow. In the first case, the governing balance equations allow a simple analytical solution. On the contrary, in the second case, the local balance equations are solved numerically by employing a finite element method
Parallelization for X-ray crystal structural analysis program
Energy Technology Data Exchange (ETDEWEB)
Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji
1997-10-01
In this report we study vectorization and parallelization for X-ray crystal structural analysis program. The target machine is NEC SX-4 which is a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it enables to obtain the 12.0 times higher performance result that the original code. Besides the above-mentioned vectorization, the parallelization by micro-task functions on SX-4 reaches 13.7 times acceleration in the part of multi-dimensional discrete Fourier transform with 14 CPUs, and 3.0 times acceleration in the whole program. Totally 35.9 times acceleration to the original 1CPU scalar version is achieved with vectorization and parallelization on SX-4. (author)
Parallel computing solution of Boltzmann neutron transport equation
International Nuclear Information System (INIS)
Ansah-Narh, T.
2010-01-01
The focus of the research was on developing parallel computing algorithm for solving Eigen-values of the Boltzmam Neutron Transport Equation (BNTE) in a slab geometry using multi-grid approach. In response to the problem of slow execution of serial computing when solving large problems, such as BNTE, the study was focused on the design of parallel computing systems which was an evolution of serial computing that used multiple processing elements simultaneously to solve complex physical and mathematical problems. Finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' and the default value of the option was 'never'. When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and each domain of the power method generated. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine, which had Intel 32 bit quad-core x 86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
Predicting mining activity with parallel genetic algorithms
Talaie, S.; Leigh, R.; Louis, S.J.; Raines, G.L.; Beyer, H.G.; O'Reilly, U.M.; Banzhaf, Arnold D.; Blum, W.; Bonabeau, C.; Cantu-Paz, E.W.; ,; ,
2005-01-01
We explore several different techniques in our quest to improve the overall model performance of a genetic algorithm calibrated probabilistic cellular automata. We use the Kappa statistic to measure correlation between ground truth data and data predicted by the model. Within the genetic algorithm, we introduce a new evaluation function sensitive to spatial correctness and we explore the idea of evolving different rule parameters for different subregions of the land. We reduce the time required to run a simulation from 6 hours to 10 minutes by parallelizing the code and employing a 10-node cluster. Our empirical results suggest that using the spatially sensitive evaluation function does indeed improve the performance of the model and our preliminary results also show that evolving different rule parameters for different regions tends to improve overall model performance. Copyright 2005 ACM.
International Nuclear Information System (INIS)
Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.
1993-11-01
In order to improve the accuracy and calculating speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and measured of its efficiency in the highly parallel distributed memory type computer, AP1000. The code has been analyzed statically and dynamically, then the suitable algorithm for parallelization has been determined for the shielding analysis functions of MCNP 4. This includes a strategy where a new history is assigned to the idling processor element dynamically during the execution. Furthermore, to avoid the congestion of communicative processing, the batch concept, processing multi-histories by a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories by the AP1000 with 512 processor elements, the 82 % of parallelization efficiency is achieved, and the calculational speed has been estimated to be around 50 times as fast as that of FACOM M-780. (author)
Structural synthesis of parallel robots
Gogu, Grigore
This book represents the fifth part of a larger work dedicated to the structural synthesis of parallel robots. The originality of this work resides in the fact that it combines new formulae for mobility, connectivity, redundancy and overconstraints with evolutionary morphology in a unified structural synthesis approach that yields interesting and innovative solutions for parallel robotic manipulators. This is the first book on robotics that presents solutions for coupled, decoupled, uncoupled, fully-isotropic and maximally regular robotic manipulators with Schönflies motions systematically generated by using the structural synthesis approach proposed in Part 1. Overconstrained non-redundant/overactuated/redundantly actuated solutions with simple/complex limbs are proposed. Many solutions are presented here for the first time in the literature. The author had to make a difficult and challenging choice between protecting these solutions through patents and releasing them directly into the public domain. T...
GPU Parallel Bundle Block Adjustment
Directory of Open Access Journals (Sweden)
ZHENG Maoteng
2017-09-01
Full Text Available To deal with massive data in photogrammetry, we introduce the GPU parallel computing technology. The preconditioned conjugate gradient and inexact Newton method are also applied to decrease the iteration times while solving the normal equation. A brand new workflow of bundle adjustment is developed to utilize GPU parallel computing technology. Our method can avoid the storage and inversion of the big normal matrix, and compute the normal matrix in real time. The proposed method can not only largely decrease the memory requirement of normal matrix, but also largely improve the efficiency of bundle adjustment. It also achieves the same accuracy as the conventional method. Preliminary experiment results show that the bundle adjustment of a dataset with about 4500 images and 9 million image points can be done in only 1.5 minutes while achieving sub-pixel accuracy.
A tandem parallel plate analyzer
International Nuclear Information System (INIS)
Hamada, Y.; Fujisawa, A.; Iguchi, H.; Nishizawa, A.; Kawasumi, Y.
1996-11-01
By a new modification of a parallel plate analyzer the second-order focus is obtained in an arbitrary injection angle. This kind of an analyzer with a small injection angle will have an advantage of small operational voltage, compared to the Proca and Green analyzer where the injection angle is 30 degrees. Thus, the newly proposed analyzer will be very useful for the precise energy measurement of high energy particles in MeV range. (author)
International Nuclear Information System (INIS)
Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.
1985-01-01
This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec
An anthropologist in parallel structure
Directory of Open Access Journals (Sweden)
Noelle Molé Liston
2016-08-01
Full Text Available The essay examines the parallels between Molé Liston’s studies on labor and precarity in Italy and the United States’ anthropology job market. Probing the way economic shift reshaped the field of anthropology of Europe in the late 2000s, the piece explores how the neoliberalization of the American academy increased the value in studying the hardships and daily lives of non-western populations in Europe.
Combinatorics of spreads and parallelisms
Johnson, Norman
2010-01-01
Partitions of Vector Spaces Quasi-Subgeometry Partitions Finite Focal-SpreadsGeneralizing André SpreadsThe Going Up Construction for Focal-SpreadsSubgeometry Partitions Subgeometry and Quasi-Subgeometry Partitions Subgeometries from Focal-SpreadsExtended André SubgeometriesKantor's Flag-Transitive DesignsMaximal Additive Partial SpreadsSubplane Covered Nets and Baer Groups Partial Desarguesian t-Parallelisms Direct Products of Affine PlanesJha-Johnson SL(2,
New algorithms for parallel MRI
International Nuclear Information System (INIS)
Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A
2008-01-01
Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and its flavors we consider the problem as a non-linear inverse problem. However, in order to avoid cost intensive derivatives we will use Landweber-Kaczmarz iteration and in order to improve the overall results some additional sparsity constraints.
Wakefield calculations on parallel computers
International Nuclear Information System (INIS)
Schoessow, P.
1990-01-01
The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs
Aspects of computation on asynchronous parallel processors
International Nuclear Information System (INIS)
Wright, M.
1989-01-01
The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues
Parallel processing of genomics data
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E
2014-11-11
Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Algorithms for computational fluid dynamics n parallel processors
International Nuclear Information System (INIS)
Van de Velde, E.F.
1986-01-01
A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
Empirical study of parallel LRU simulation algorithms
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
Optical flow optimization using parallel genetic algorithm
Zavala-Romero, Olmo; Botella, Guillermo; Meyer-Bäse, Anke; Meyer Base, Uwe
2011-06-01
A new approach to optimize the parameters of a gradient-based optical flow model using a parallel genetic algorithm (GA) is proposed. The main characteristics of the optical flow algorithm are its bio-inspiration and robustness against contrast, static patterns and noise, besides working consistently with several optical illusions where other algorithms fail. This model depends on many parameters which conform the number of channels, the orientations required, the length and shape of the kernel functions used in the convolution stage, among many more. The GA is used to find a set of parameters which improve the accuracy of the optical flow on inputs where the ground-truth data is available. This set of parameters helps to understand which of them are better suited for each type of inputs and can be used to estimate the parameters of the optical flow algorithm when used with videos that share similar characteristics. The proposed implementation takes into account the embarrassingly parallel nature of the GA and uses the OpenMP Application Programming Interface (API) to speedup the process of estimating an optimal set of parameters. The information obtained in this work can be used to dynamically reconfigure systems, with potential applications in robotics, medical imaging and tracking.
Automatic Loop Parallelization via Compiler Guided Refactoring
DEFF Research Database (Denmark)
Larsen, Per; Ladelsky, Razya; Lidman, Jacob
For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...
Parallel kinematics type, kinematics, and optimal design
Liu, Xin-Jun
2014-01-01
Parallel Kinematics- Type, Kinematics, and Optimal Design presents the results of 15 year's research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, which is also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others. This book is intended for researchers, scientists, engineers and postgraduates or above with interes...
Applied Parallel Computing Industrial Computation and Optimization
DEFF Research Database (Denmark)
Madsen, Kaj; NA NA NA Olesen, Dorte
Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)......Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...
Parallel algorithms and cluster computing
Hoffmann, Karl Heinz
2007-01-01
This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.
Parallel computation of rotating flows
DEFF Research Database (Denmark)
Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær
1999-01-01
This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...
The parallel volume at large distances
DEFF Research Database (Denmark)
Kampf, Jürgen
In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to . This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
The parallel volume at large distances
DEFF Research Database (Denmark)
Kampf, Jürgen
In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....
Inter-dot coupling effects on transport through correlated parallel
Indian Academy of Sciences (India)
Transport through symmetric parallel coupled quantum dot system has been studied, using non-equilibrium Green function formalism. The inter-dot tunnelling with on-dot and inter-dot Coulomb repulsion is included. The transmission coefficient and Landaur–Buttiker like current formula are shown in terms of internal states ...
A Parallel Approach to Fractal Image Compression
Directory of Open Access Journals (Sweden)
Lubomir Dedera
2004-01-01
Full Text Available The paper deals with a parallel approach to coding and decoding algorithms in fractal image compressionand presents experimental results comparing sequential and parallel algorithms from the point of view of achieved bothcoding and decoding time and effectiveness of parallelization.
Parallel Computing Using Web Servers and "Servlets".
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
An Introduction to Parallel Computation R
Indian Academy of Sciences (India)
How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.
Comparison of parallel viscosity with neoclassical theory
International Nuclear Information System (INIS)
Ida, K.; Nakajima, N.
1996-04-01
Toroidal rotation profiles are measured with charge exchange spectroscopy for the plasma heated with tangential NBI in CHS heliotron/torsatron device to estimate parallel viscosity. The parallel viscosity derived from the toroidal rotation velocity shows good agreement with the neoclassical parallel viscosity plus the perpendicular viscosity. (μ perpendicular = 2 m 2 /s). (author)
Advances in randomized parallel computing
Rajasekaran, Sanguthevar
1999-01-01
The technique of randomization has been employed to solve numerous prob lems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms In the aflalysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. 2 For instance, the worst case run time of Hoare's quicksort is O(n ), whereas its average case run time is only O( n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...
Xyce parallel electronic simulator design.
Energy Technology Data Exchange (ETDEWEB)
Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.
2010-09-01
This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to insure a high level of code quality and robustness are essential. Version control, issue tracking customer support, C++ style guildlines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathmaticians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.
Application of parallel preprocessors in data acquisition
International Nuclear Information System (INIS)
Butler, H.S.; Cooper, M.D.; Williams, R.A.; Hughes, E.B.; Rolfe, J.R.; Wilson, S.L.; Zeman, H.D.
1981-01-01
A data-acquisition system is being developed for a large-scale experiment at LAMPF. It will make use of four microprocessors running in parallel to acquire and preprocess data from 432 photomultiplier tubes (PMT) attached to 396 NaI crystals. The microprocessors are LSI-11/23s operating through CAMAC Auxiliary Crate Controllers (ACC). Data acquired by the microprocessors will be collected through a programmable Branch Driver (MBD) which also will read data from 52 scintillators (88 PMTs) and 728 wires comprising a drift chamber. The MBD will transfer data from each event into a PDP-11/44 for further processing and taping. The microprocessors will perform the secondary function of monitoring the calibration of the NaI PMTs. A special trigger circuit allows the system to stack data from a second event while the first is still being processed. Major components of the system were tested in April 1981. Timing measurements from this test are reported
External parallel sorting with multiprocessor computers
International Nuclear Information System (INIS)
Comanceau, S.I.
1984-01-01
This article describes methods of external sorting in which the entire main computer memory is used for the internal sorting of entries, forming out of them sorted segments of the greatest possible size, and outputting them to external memories. The obtained segments are merged into larger segments until all entries form one ordered segment. The described methods are suitable for sequential files stored on magnetic tape. The needs of the sorting algorithm can be met by using the relatively slow peripheral storage devices (e.g., tapes, disks, drums). The efficiency of the external sorting methods is determined by calculating the total sorting time as a function of the number of entries to be sorted and the number of parallel processors participating in the sorting process
Evolution of a minimal parallel programming model
International Nuclear Information System (INIS)
Lusk, Ewing; Butler, Ralph; Pieper, Steven C.
2017-01-01
Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
Research on Control Strategy of Complex Systems through VSC-HVDC Grid Parallel Device
Directory of Open Access Journals (Sweden)
Xue Mei-Juan
2014-07-01
Full Text Available After the completion of grid parallel, the device can turn to be UPFC, STATCOM, SSSC, research on the conversion circuit and transform method by corresponding switching operation. Accomplish the grid parallel and comprehensive control of the tie-line and stable operation and control functions of grid after parallel. Defines the function select operation switch matrix and grid parallel system branch variable, forming a switch matrix to achieve corresponding function of the composite system. Formed a criterion of the selection means to choose control strategy according to the switch matrix, to accomplish corresponding function. Put the grid parallel, STATCOM, SSSC and UPFC together as a system, improve the stable operation and flexible control of the power system.
PDDP, A Data Parallel Programming Model
Directory of Open Access Journals (Sweden)
Karen H. Warren
1996-01-01
Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
Parallelization of quantum molecular dynamics simulation code
International Nuclear Information System (INIS)
Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu
1998-02-01
A quantum molecular dynamics simulation code has been developed for the analysis of the thermalization of photon energies in the molecule or materials in Kansai Research Establishment. The simulation code is parallelized for both Scalar massively parallel computer (Intel Paragon XP/S75) and Vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained with a distribution to processor units by division of particle group in both parallel computers. As a result of distribution to processor units not only by particle group but also by the particles calculation that is constructed with fine calculations, highly parallelization performance is achieved in Intel Paragon XP/S75. (author)
Implementation and performance of parallelized elegant
International Nuclear Information System (INIS)
Wang, Y.; Borland, M.
2008-01-01
The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.
Development of massively parallel quantum chemistry program SMASH
Energy Technology Data Exchange (ETDEWEB)
Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)
2015-12-31
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
Development of massively parallel quantum chemistry program SMASH
International Nuclear Information System (INIS)
Ishimura, Kazuya
2015-01-01
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C 150 H 30 ) 2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer
Ultrasound Vector Flow Imaging: Part II: Parallel Systems
DEFF Research Database (Denmark)
Jensen, Jørgen Arendt; Nikolov, Svetoslav Ivanov; Yu, Alfred C. H.
2016-01-01
The paper gives a review of the current state-of-theart in ultrasound parallel acquisition systems for flow imaging using spherical and plane waves emissions. The imaging methods are explained along with the advantages of using these very fast and sensitive velocity estimators. These experimental...... ultrasound imaging for studying brain function in animals. The paper explains the underlying acquisition and estimation methods for fast 2-D and 3-D velocity imaging and gives a number of examples. Future challenges and the potentials of parallel acquisition systems for flow imaging are also discussed....
Local and Nonlocal Parallel Heat Transport in General Magnetic Fields
International Nuclear Information System (INIS)
Castillo-Negrete, D. del; Chacon, L.
2011-01-01
A novel approach for the study of parallel transport in magnetized plasmas is presented. The method avoids numerical pollution issues of grid-based formulations and applies to integrable and chaotic magnetic fields with local or nonlocal parallel closures. In weakly chaotic fields, the method gives the fractal structure of the devil's staircase radial temperature profile. In fully chaotic fields, the temperature exhibits self-similar spatiotemporal evolution with a stretched-exponential scaling function for local closures and an algebraically decaying one for nonlocal closures. It is shown that, for both closures, the effective radial heat transport is incompatible with the quasilinear diffusion model.
Parallelization of 2-D lattice Boltzmann codes
International Nuclear Information System (INIS)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author)
Parallelization of 2-D lattice Boltzmann codes
Energy Technology Data Exchange (ETDEWEB)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author).
Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.
2017-01-01
The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Experiences in Data-Parallel Programming
Directory of Open Access Journals (Sweden)
Terry W. Clark
1997-01-01
Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
Parallel Algorithm for Adaptive Numerical Integration
International Nuclear Information System (INIS)
Sujatmiko, M.; Basarudin, T.
1997-01-01
This paper presents an automation algorithm for integration using adaptive trapezoidal method. The interval is adaptively divided where the width of sub interval are different and fit to the behavior of its function. For a function f, an integration on interval [a,b] can be obtained, with maximum tolerance ε, using estimation (f, a, b, ε). The estimated solution is valid if the error is still in a reasonable range, fulfil certain criteria. If the error is big, however, the problem is solved by dividing it into to similar and independent sub problem on to separate [a, (a+b)/2] and [(a+b)/2, b] interval, i. e. ( f, a, (a+b)/2, ε/2) and (f, (a+b)/2, b, ε/2) estimations. The problems are solved in two different kinds of processor, root processor and worker processor. Root processor function ti divide a main problem into sub problems and distribute them to worker processor. The division mechanism may go further until all of the sub problem are resolved. The solution of each sub problem is then submitted to the root processor such that the solution for the main problem can be obtained. The algorithm is implemented on C-programming-base distributed computer networking system under parallel virtual machine platform
DEFF Research Database (Denmark)
Zander, Mette; Madsbad, Sten; Madsen, Jan Lysgaard
2002-01-01
subcutaneous infusion of GLP-1 (n=10) or saline (n=10) for 6 weeks. Before (week 0) and at weeks 1 and 6, they underwent beta-cell function tests (hyperglycaemic clamps), 8 h profiles of plasma glucose, insulin, C-peptide, glucagon, and free fatty acids, and appetite and side-effect ratings on 100 mm visual...... analogue scales; at weeks 0 and 6 they also underwent dexascanning, measurement of insulin sensitivity (hyperinsulinaemic euglycaemic clamps), haemoglobin A(1c), and fructosamine. The primary endpoints were haemoglobin A(1c) concentration, 8-h profile of glucose concentration in plasma, and beta......-cell function (defined as the first-phase response to glucose and the maximum insulin secretory capacity of the cell). Analyses were per protocol. FINDINGS: One patient assigned saline was excluded because no veins were accessible. In the remaining nine patients in that group, no significant changes were...
Massively parallel diffuse optical tomography
Energy Technology Data Exchange (ETDEWEB)
Sandusky, John V.; Pitts, Todd A.
2017-09-05
Diffuse optical tomography systems and methods are described herein. In a general embodiment, the diffuse optical tomography system comprises a plurality of sensor heads, the plurality of sensor heads comprising respective optical emitter systems and respective sensor systems. A sensor head in the plurality of sensors heads is caused to act as an illuminator, such that its optical emitter system transmits a transillumination beam towards a portion of a sample. Other sensor heads in the plurality of sensor heads act as observers, detecting portions of the transillumination beam that radiate from the sample in the fields of view of the respective sensory systems of the other sensor heads. Thus, sensor heads in the plurality of sensors heads generate sensor data in parallel.
Embodied and Distributed Parallel DJing.
Cappelen, Birgitta; Andersson, Anders-Petter
2016-01-01
Everyone has a right to take part in cultural events and activities, such as music performances and music making. Enforcing that right, within Universal Design, is often limited to a focus on physical access to public areas, hearing aids etc., or groups of persons with special needs performing in traditional ways. The latter might be people with disabilities, being musicians playing traditional instruments, or actors playing theatre. In this paper we focus on the innovative potential of including people with special needs, when creating new cultural activities. In our project RHYME our goal was to create health promoting activities for children with severe disabilities, by developing new musical and multimedia technologies. Because of the users' extreme demands and rich contribution, we ended up creating both a new genre of musical instruments and a new art form. We call this new art form Embodied and Distributed Parallel DJing, and the new genre of instruments for Empowering Multi-Sensorial Things.
Device for balancing parallel strings
Mashikian, Matthew S.
1985-01-01
A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.
Linear parallel processing machines I
Energy Technology Data Exchange (ETDEWEB)
Von Kunze, M
1984-01-01
As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of contextsensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form aB ..-->.. A/sub 1/ ... A /sub n/ b/sub 1/...b /sub m/ . These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic contextfree languages, and they define an interesting class of contextsensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.
Parallel computing in enterprise modeling.
Energy Technology Data Exchange (ETDEWEB)
Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.
2008-08-01
This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.
Kinematics analysis and simulation of a new underactuated parallel robot
Directory of Open Access Journals (Sweden)
Wenxu YAN
2017-04-01
Full Text Available The number of degrees of freedom is equal to the number of the traditional robot driving motors, which causes defects such as low efficiency. To overcome that problem, based on the traditional parallel robot, a new underactuated parallel robot is presented. The structure characteristics and working principles of the underactuated parallel robot are analyzed. The forward and inverse solutions are derived by way of space analytic geometry and vector algebra. The kinematics model is established, and MATLAB is implied to verify the accuracy of forward and inverse solutions and identify the optimal work space. The simulation results show that the robot can realize the function of robot switch with three or four degrees of freedom when the number of driving motors is three, improving the efficiency of robot grasping, with the characteristics of large working space, high speed operation, high positioning accuracy, low manufacturing cost and so on, and it will have a wide range of industrial applications.
Parallel and distributed processing in power system simulation and control
Energy Technology Data Exchange (ETDEWEB)
Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia
1994-12-31
Recent advances in computer technology will certainly have a great impact in the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation while distributed processing computer systems are collection of computers joined together by high speed communication networks having many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work in these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.
Parallel iterative solvers and preconditioners using approximate hierarchical methods
Energy Technology Data Exchange (ETDEWEB)
Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)
1996-12-31
In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and paralellizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green`s function. Experimental results on a 256 processor Cray T3D are presented.
Parallel computing for homogeneous diffusion and transport equations in neutronics
International Nuclear Information System (INIS)
Pinchedez, K.
1999-06-01
Parallel computing meets the ever-increasing requirements for neutronic computer code speed and accuracy. In this work, two different approaches have been considered. We first parallelized the sequential algorithm used by the neutronics code CRONOS developed at the French Atomic Energy Commission. The algorithm computes the dominant eigenvalue associated with PN simplified transport equations by a mixed finite element method. Several parallel algorithms have been developed on distributed memory machines. The performances of the parallel algorithms have been studied experimentally by implementation on a T3D Cray and theoretically by complexity models. A comparison of various parallel algorithms has confirmed the chosen implementations. We next applied a domain sub-division technique to the two-group diffusion Eigen problem. In the modal synthesis-based method, the global spectrum is determined from the partial spectra associated with sub-domains. Then the Eigen problem is expanded on a family composed, on the one hand, from eigenfunctions associated with the sub-domains and, on the other hand, from functions corresponding to the contribution from the interface between the sub-domains. For a 2-D homogeneous core, this modal method has been validated and its accuracy has been measured. (author)
Parallel heat transport in integrable and chaotic magnetic fields
Energy Technology Data Exchange (ETDEWEB)
Castillo-Negrete, D. del; Chacon, L. [Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831-8071 (United States)
2012-05-15
The study of transport in magnetized plasmas is a problem of fundamental interest in controlled fusion, space plasmas, and astrophysics research. Three issues make this problem particularly challenging: (i) The extreme anisotropy between the parallel (i.e., along the magnetic field), {chi}{sub ||} , and the perpendicular, {chi}{sub Up-Tack }, conductivities ({chi}{sub ||} /{chi}{sub Up-Tack} may exceed 10{sup 10} in fusion plasmas); (ii) Nonlocal parallel transport in the limit of small collisionality; and (iii) Magnetic field lines chaos which in general complicates (and may preclude) the construction of magnetic field line coordinates. Motivated by these issues, we present a Lagrangian Green's function method to solve the local and non-local parallel transport equation applicable to integrable and chaotic magnetic fields in arbitrary geometry. The method avoids by construction the numerical pollution issues of grid-based algorithms. The potential of the approach is demonstrated with nontrivial applications to integrable (magnetic island), weakly chaotic (Devil's staircase), and fully chaotic magnetic field configurations. For the latter, numerical solutions of the parallel heat transport equation show that the effective radial transport, with local and non-local parallel closures, is non-diffusive, thus casting doubts on the applicability of quasilinear diffusion descriptions. General conditions for the existence of non-diffusive, multivalued flux-gradient relations in the temperature evolution are derived.
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation
International Nuclear Information System (INIS)
Dunigan, T.H.
1988-01-01
1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
Parallel Careers and their Consequences for Companies in Brazil
Directory of Open Access Journals (Sweden)
Maria Candida Baumer Azevedo
2014-04-01
Full Text Available Given the relevance of the need to manage parallel careers to attract and retain people in organizations, this paper provides insight into this phenomenon from an organizational perspective. The parallel career concept, introduced by Alboher (2007 and recently addressed by Schuiling (2012, has previously been examined only from the perspective of the parallel career holder (PC holder. The paper provides insight from both individual and organizational perspectives on the phenomenon of parallel careers and considers how it can function as an important tool for attracting and retaining people by contributing to human development. This paper employs a qualitative approach that includes 30 semi-structured one-on-one interviews. The organizational perspective arises from the 15 interviews with human resources (HR executives from different companies. The individual viewpoint originates from the interviews with 15 executives who are also PC holders. An inductive content analysis approach was used to examine Brazilian companies and the Brazilian office of multinationals. Companies that are concerned about having the best talent on their teams can benefit from a deeper understanding of parallel careers, which can be used to attract, develop, and retain talent. Limitations and directions for future research are discussed.
Analysis of multigrid methods on massively parallel computers: Architectural implications
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Computer-Aided Parallelizer and Optimizer
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Parallel trajectory similarity joins in spatial networks
Shang, Shuo
2018-04-04
The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
Parallel trajectory similarity joins in spatial networks
Shang, Shuo; Chen, Lisi; Wei, Zhewei; Jensen, Christian S.; Zheng, Kai; Kalnis, Panos
2018-01-01
The matching of similar pairs of objects, called similarity join, is fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold θ, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above θ. In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight in the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
2014-11-01
The Special issue presents the papers for the INERA Workshop entitled "Transition Metal Oxides as Functional Layers in Smart windows and Water Splitting Devices", which was held in Varna, St. Konstantin and Elena, Bulgaria, from the 4th-6th September 2014. The Workshop is organized within the context of the INERA "Research and Innovation Capacity Strengthening of ISSP-BAS in Multifunctional Nanostructures", FP7 Project REGPOT 316309 program, European project of the Institute of Solid State Physics at the Bulgarian Academy of Sciences. There were 42 participants at the workshop, 16 from Sweden, Germany, Romania and Hungary, 11 invited lecturers, and 28 young participants. There were researchers present from prestigious European laboratories which are leaders in the field of transition metal oxide thin film technologies. The event contributed to training young researchers in innovative thin film technologies, as well as thin films characterization techniques. The topics of the Workshop cover the field of technology and investigation of thin oxide films as functional layers in "Smart windows" and "Water splitting" devices. The topics are related to the application of novel technologies for the preparation of transition metal oxide films and the modification of chromogenic properties towards the improvement of electrochromic and termochromic device parameters for possible industrial deployment. The Workshop addressed the following topics: Metal oxide films-functional layers in energy efficient devices; Photocatalysts and chemical sensing; Novel thin film technologies and applications; Methods of thin films characterizations; From the 37 abstracts sent, 21 manuscripts were written and later refereed. We appreciate the comments from all the referees, and we are grateful for their valuable contributions. Guest Editors: Assoc. Prof. Dr.Tatyana Ivanova Prof. DSc Kostadinka Gesheva Prof. DSc Hassan Chamatti Assoc. Prof. Dr. Georgi Popkirov Workshop Organizing Committee Prof
Parallel processing for fluid dynamics applications
International Nuclear Information System (INIS)
Johnson, G.M.
1989-01-01
The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several example are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices
Design considerations for parallel graphics libraries
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Multi-petascale highly efficient parallel supercomputer
Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng
2015-07-14
A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Synchronization Techniques in Parallel Discrete Event Simulation
Lindén, Jonatan
2018-01-01
Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Parallel processing from applications to systems
Moldovan, Dan I
1993-01-01
This text provides one of the broadest presentations of parallelprocessing available, including the structure of parallelprocessors and parallel algorithms. The emphasis is on mappingalgorithms to highly parallel computers, with extensive coverage ofarray and multiprocessor architectures. Early chapters provideinsightful coverage on the analysis of parallel algorithms andprogram transformations, effectively integrating a variety ofmaterial previously scattered throughout the literature. Theory andpractice are well balanced across diverse topics in this concisepresentation. For exceptional cla
Parallel processing for artificial intelligence 1
Kanal, LN; Kumar, V; Suttner, CB
1994-01-01
Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus
A survey of parallel multigrid algorithms
Chan, Tony F.; Tuminaro, Ray S.
1987-01-01
A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.
Refinement of Parallel and Reactive Programs
Back, R. J. R.
1992-01-01
We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...
Kinematics/statics analysis of a novel serial-parallel robotic arm with hand
International Nuclear Information System (INIS)
Lu, Yi; Dai, Zhuohong; Ye, Nijia; Wang, Peng
2015-01-01
A robotic arm with fingered hand generally has multi-functions to complete various complicated operations. A novel serial-parallel robotic arm with a hand is proposed and its kinematics and statics are studied systematically. A 3D prototype of the serial-parallel robotic arm with a hand is constructed and analyzed by simulation. The serial-parallel robotic arm with a hand is composed of an upper 3RPS parallel manipulator, a lower 3SPR parallel manipulator and a hand with three finger mechanisms. Its kinematics formulae for solving the displacement, velocity, acceleration of are derived. Its statics formula for solving the active/constrained forces is derived. Its reachable workspace and orientation workspace are constructed and analyzed. Finally, an analytic example is given for solving the kinematics and statics of the serial-parallel robotic arm with a hand and the analytic solutions are verified by a simulation mechanism.
Kinematics/statics analysis of a novel serial-parallel robotic arm with hand
Energy Technology Data Exchange (ETDEWEB)
Lu, Yi; Dai, Zhuohong; Ye, Nijia; Wang, Peng [Yanshan University, Hebei (China)
2015-10-15
A robotic arm with fingered hand generally has multi-functions to complete various complicated operations. A novel serial-parallel robotic arm with a hand is proposed and its kinematics and statics are studied systematically. A 3D prototype of the serial-parallel robotic arm with a hand is constructed and analyzed by simulation. The serial-parallel robotic arm with a hand is composed of an upper 3RPS parallel manipulator, a lower 3SPR parallel manipulator and a hand with three finger mechanisms. Its kinematics formulae for solving the displacement, velocity, acceleration of are derived. Its statics formula for solving the active/constrained forces is derived. Its reachable workspace and orientation workspace are constructed and analyzed. Finally, an analytic example is given for solving the kinematics and statics of the serial-parallel robotic arm with a hand and the analytic solutions are verified by a simulation mechanism.
Parallel Prediction of Stock Volatility
Directory of Open Access Journals (Sweden)
Priscilla Jenq
2017-10-01
Full Text Available Volatility is a measurement of the risk of financial products. A stock will hit new highs and lows over time and if these highs and lows fluctuate wildly, then it is considered a high volatile stock. Such a stock is considered riskier than a stock whose volatility is low. Although highly volatile stocks are riskier, the returns that they generate for investors can be quite high. Of course, with a riskier stock also comes the chance of losing money and yielding negative returns. In this project, we will use historic stock data to help us forecast volatility. Since the financial industry usually uses S&P 500 as the indicator of the market, we will use S&P 500 as a benchmark to compute the risk. We will also use artificial neural networks as a tool to predict volatilities for a specific time frame that will be set when we configure this neural network. There have been reports that neural networks with different numbers of layers and different numbers of hidden nodes may generate varying results. In fact, we may be able to find the best configuration of a neural network to compute volatilities. We will implement this system using the parallel approach. The system can be used as a tool for investors to allocating and hedging assets.
Vectoring of parallel synthetic jets
Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume
2015-11-01
A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).
A Soft Parallel Kinematic Mechanism.
White, Edward L; Case, Jennifer C; Kramer-Bottiglio, Rebecca
2018-02-01
In this article, we describe a novel holonomic soft robotic structure based on a parallel kinematic mechanism. The design is based on the Stewart platform, which uses six sensors and actuators to achieve full six-degree-of-freedom motion. Our design is much less complex than a traditional platform, since it replaces the 12 spherical and universal joints found in a traditional Stewart platform with a single highly deformable elastomer body and flexible actuators. This reduces the total number of parts in the system and simplifies the assembly process. Actuation is achieved through coiled-shape memory alloy actuators. State observation and feedback is accomplished through the use of capacitive elastomer strain gauges. The main structural element is an elastomer joint that provides antagonistic force. We report the response of the actuators and sensors individually, then report the response of the complete assembly. We show that the completed robotic system is able to achieve full position control, and we discuss the limitations associated with using responsive material actuators. We believe that control demonstrated on a single body in this work could be extended to chains of such bodies to create complex soft robots.
Productive Parallel Programming: The PCN Approach
Directory of Open Access Journals (Sweden)
Ian Foster
1992-01-01
Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
Prabhat
2014-01-01
Gain Critical Insight into the Parallel I/O EcosystemParallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem.The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Parallel, Rapid Diffuse Optical Tomography of Breast
National Research Council Canada - National Science Library
Yodh, Arjun
2001-01-01
During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...
Parallel, Rapid Diffuse Optical Tomography of Breast
National Research Council Canada - National Science Library
Yodh, Arjun
2002-01-01
During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...
Parallel auto-correlative statistics with VTK.
Energy Technology Data Exchange (ETDEWEB)
Pebay, Philippe Pierre; Bennett, Janine Camille
2013-08-01
This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Conformal pure radiation with parallel rays
International Nuclear Information System (INIS)
Leistner, Thomas; Paweł Nurowski
2012-01-01
We define pure radiation metrics with parallel rays to be n-dimensional pseudo-Riemannian metrics that admit a parallel null line bundle K and whose Ricci tensor vanishes on vectors that are orthogonal to K. We give necessary conditions in terms of the Weyl, Cotton and Bach tensors for a pseudo-Riemannian metric to be conformal to a pure radiation metric with parallel rays. Then, we derive conditions in terms of the tractor calculus that are equivalent to the existence of a pure radiation metric with parallel rays in a conformal class. We also give analogous results for n-dimensional pseudo-Riemannian pp-waves. (paper)
Compiling Scientific Programs for Scalable Parallel Systems
National Research Council Canada - National Science Library
Kennedy, Ken
2001-01-01
...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...
Parallel thermal radiation transport in two dimensions
International Nuclear Information System (INIS)
Smedley-Stevenson, R.P.; Ball, S.R.
2003-01-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel Algorithms for the Exascale Era
Energy Technology Data Exchange (ETDEWEB)
Robey, Robert W. [Los Alamos National Laboratory
2016-10-19
New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
Parallel thermal radiation transport in two dimensions
Energy Technology Data Exchange (ETDEWEB)
Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)
2003-07-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Structured Parallel Programming Patterns for Efficient Computation
McCool, Michael; Robison, Arch
2012-01-01
Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th
Cryptography in constant parallel time
Applebaum, Benny
2013-01-01
Locally computable (NC0) functions are 'simple' functions for which every bit of the output can be computed by reading a small number of bits of their input. The study of locally computable cryptography attempts to construct cryptographic functions that achieve this strong notion of simplicity and simultaneously provide a high level of security. Such constructions are highly parallelizable and they can be realized by Boolean circuits of constant depth.This book establishes, for the first time, the possibility of local implementations for many basic cryptographic primitives such as one-way func
User's guide of parallel program development environment (PPDE). The 2nd edition
International Nuclear Information System (INIS)
Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio; Ohta, Hirofumi
2000-03-01
The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a parallelizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Parallel Computing for Brain Simulation.
Pastur-Romay, L A; Porto-Pazos, A B; Cedron, F; Pazos, A
2017-01-01
The human brain is the most complex system in the known universe, it is therefore one of the greatest mysteries. It provides human beings with extraordinary abilities. However, until now it has not been understood yet how and why most of these abilities are produced. For decades, researchers have been trying to make computers reproduce these abilities, focusing on both understanding the nervous system and, on processing data in a more efficient way than before. Their aim is to make computers process information similarly to the brain. Important technological developments and vast multidisciplinary projects have allowed creating the first simulation with a number of neurons similar to that of a human brain. This paper presents an up-to-date review about the main research projects that are trying to simulate and/or emulate the human brain. They employ different types of computational models using parallel computing: digital models, analog models and hybrid models. This review includes the current applications of these works, as well as future trends. It is focused on various works that look for advanced progress in Neuroscience and still others which seek new discoveries in Computer Science (neuromorphic hardware, machine learning techniques). Their most outstanding characteristics are summarized and the latest advances and future plans are presented. In addition, this review points out the importance of considering not only neurons: Computational models of the brain should also include glial cells, given the proven importance of astrocytes in information processing. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Stampi: a message passing library for distributed parallel computing. User's guide
International Nuclear Information System (INIS)
Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi
1998-11-01
A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on MPI2 specification. It realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Vender implemented MPI as a closed system in one parallel machine and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, and any communication on them can be processed on. (author)
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
The language parallel Pascal and other aspects of the massively parallel processor
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Customizable Memory Schemes for Data Parallel Architectures
Gou, C.
2011-01-01
Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses
Parallel Narrative Structure in Paul Harding's "Tinkers"
Çirakli, Mustafa Zeki
2014-01-01
The present paper explores the implications of parallel narrative structure in Paul Harding's "Tinkers" (2009). Besides primarily recounting the two sets of parallel narratives, "Tinkers" also comprises of seemingly unrelated fragments such as excerpts from clock repair manuals and diaries. The main stories, however, told…
Streaming nested data parallelism on multicores
DEFF Research Database (Denmark)
Madsen, Frederik Meisner; Filinski, Andrzej
2016-01-01
The paradigm of nested data parallelism (NDP) allows a variety of semi-regular computation tasks to be mapped onto SIMD-style hardware, including GPUs and vector units. However, some care is needed to keep down space consumption in situations where the available parallelism may vastly exceed...
Bayer image parallel decoding based on GPU
Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua
2012-11-01
In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.
Parallelization of TMVA Machine Learning Algorithms
Hajili, Mammad
2017-01-01
This report reflects my work on Parallelization of TMVA Machine Learning Algorithms integrated to ROOT Data Analysis Framework during summer internship at CERN. The report consists of 4 impor- tant part - data set used in training and validation, algorithms that multiprocessing applied on them, parallelization techniques and re- sults of execution time changes due to number of workers.
17 CFR 12.24 - Parallel proceedings.
2010-04-01
...) Definition. For purposes of this section, a parallel proceeding shall include: (1) An arbitration proceeding... the receivership includes the resolution of claims made by customers; or (3) A petition filed under... any of the foregoing with knowledge of a parallel proceeding shall promptly notify the Commission, by...
Parallel S/sub n/ iteration schemes
International Nuclear Information System (INIS)
Wienke, B.R.; Hiromoto, R.E.
1986-01-01
The iterative, multigroup, discrete ordinates (S/sub n/) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S/sub n/ transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S/sub n/ algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial
Parallel Computing Strategies for Irregular Algorithms
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Parallel fuzzy connected image segmentation on GPU
Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.
2011-01-01
Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm impleme...
Non-Cartesian parallel imaging reconstruction.
Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole
2014-11-01
Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.
Parallel Algorithms for Groebner-Basis Reduction
1987-09-25
22209 ELEMENT NO. NO. NO. ACCESSION NO. 11. TITLE (Include Security Classification) * PARALLEL ALGORITHMS FOR GROEBNER -BASIS REDUCTION 12. PERSONAL...All other editions are obsolete. Productivity Engineering in the UNIXt Environment p Parallel Algorithms for Groebner -Basis Reduction Technical Report
Parallel knock-out schemes in networks
Broersma, H.J.; Fomin, F.V.; Woeginger, G.J.
2004-01-01
We consider parallel knock-out schemes, a procedure on graphs introduced by Lampert and Slater in 1997 in which each vertex eliminates exactly one of its neighbors in each round. We are considering cases in which after a finite number of rounds, where the minimimum number is called the parallel
Building a parallel file system simulator
International Nuclear Information System (INIS)
Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J
2009-01-01
Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte- and exabyte scale. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the class room. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
Differences Between Distributed and Parallel Systems
Energy Technology Data Exchange (ETDEWEB)
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Parallel-In-Time For Moving Meshes
Energy Technology Data Exchange (ETDEWEB)
Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-02-04
With steadily growing computational resources available, scientists must develop e ective ways to utilize the increased resources. High performance, highly parallel software has be- come a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial di erential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing se- quential codes with only minor modi cations. In this work, a rezoning-type moving mesh is applied to a di usion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Parallel programming with Easy Java Simulations
Esquembre, F.; Christian, W.; Belloni, M.
2018-01-01
Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
A parallel implementation of 3D Zernike moment analysis
Berjón Díez, Daniel; Arnaldo Duart, Sergio; Morán Burgos, Francisco
2011-01-01
Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3...
Parallel algorithms for simulating continuous time Markov chains
Nicol, David M.; Heidelberger, Philip
1992-01-01
We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
Multi states electromechanical switch for energy efficient parallel data processing
Kloub, Hussam
2011-04-01
We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.
Multi states electromechanical switch for energy efficient parallel data processing
Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa
2011-01-01
We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
International Nuclear Information System (INIS)
Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.
2009-01-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f m pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Arkin, Ethem; Tekinerdogan, Bedir
2016-01-01
Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the
Directory of Open Access Journals (Sweden)
Mehiddin Al-Baali
2015-12-01
Full Text Available We deal with the design of parallel algorithms by using variable partitioning techniques to solve nonlinear optimization problems. We propose an iterative solution method that is very efficient for separable functions, our scope being to discuss its performance for general functions. Experimental results on an illustrative example have suggested some useful modifications that, even though they improve the efficiency of our parallel method, leave some questions open for further investigation.
Heavy ion acceleration at parallel shocks
Directory of Open Access Journals (Sweden)
V. L. Galinsky
2010-11-01
Full Text Available A study of alpha particle acceleration at parallel shock due to an interaction with Alfvén waves self-consistently excited in both upstream and downstream regions was conducted using a scale-separation model (Galinsky and Shevchenko, 2000, 2007. The model uses conservation laws and resonance conditions to find where waves will be generated or damped and hence where particles will be pitch-angle scattered. It considers the total distribution function (for the bulk plasma and high energy tail, so no standard assumptions (e.g. seed populations, or some ad-hoc escape rate of accelerated particles are required. The heavy ion scattering on hydromagnetic turbulence generated by both protons and ions themselves is considered. The contribution of alpha particles to turbulence generation is important because of their relatively large mass-loading parameter P_{α}=n_{α}m_{α}/n_{p}m_{p} (m_{p}, n_{p} and m_{α}, n_{α} are proton and alpha particle mass and density that defines efficiency of wave excitation. The energy spectra of alpha particles are found and compared with those obtained in test particle approximation.
Parallel algorithms for interactive manipulation of digital terrain models
Davis, E. W.; Mcallister, D. F.; Nagaraj, V.
1988-01-01
Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the massive parallel processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.
Acceleration and parallelization calculation of EFEN-SP_3 method
International Nuclear Information System (INIS)
Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao
2013-01-01
Due to the fact that the exponential function expansion nodal-SP_3 (EFEN-SP_3) method needs further improvement in computational efficiency to routinely carry out PWR whole core pin-by-pin calculation, the coarse mesh acceleration and spatial parallelization were investigated in this paper. The coarse mesh acceleration was built by considering discontinuity factor on each coarse mesh interface and preserving neutron balance within each coarse mesh in space, angle and energy. The spatial parallelization based on MPI was implemented by guaranteeing load balancing and minimizing communications cost to fully take advantage of the modern computing and storage abilities. Numerical results based on a commercial nuclear power reactor demonstrate an speedup ratio of about 40 for the coarse mesh acceleration and a parallel efficiency of higher than 60% with 40 CPUs for the spatial parallelization. With these two improvements, the EFEN code can complete a PWR whole core pin-by-pin calculation with 289 × 289 × 218 meshes and 4 energy groups within 100 s by using 48 CPUs (2.40 GHz frequency). (authors)
Design of high-performance parallelized gene predictors in MATLAB.
Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien
2012-04-10
This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
Optimal conductive constructal configurations with “parallel design”
International Nuclear Information System (INIS)
Eslami, M.
2016-01-01
Highlights: • A new parallel design is proposed for conductive cooling of heat generating rectangles. • The geometric features are optimized analytically. • The internal structure morph as a function of available conductive material. • Thermal performance is superior to the previously numerically optimized designs. - Abstract: Today, conductive volume to point cooling of heat generating bodies is under investigation as an alternative method for thermal management of electronic chipsets with high power density. In this paper, a new simple geometry called “parallel design” is proposed for effective conductive cooling of rectangular heat generating bodies. This configuration tries to minimize the thermal resistance associated with the temperature drop inside the heat generating volume. The geometric features of the design are all optimized analytically and expressed with simple explicit equations. It is proved that optimal number of parallel links is equal to the thermal conductivity ratio multiplied by the porosity (or the volume ratio). With the universal aspect ratio of H/L = 2, total thermal resistance of the present parallel design is lower than the recently proposed networks of various shapes that are optimized with help of numerical simulations; especially when more conducting material is available.
Portable parallel programming in a Fortran environment
International Nuclear Information System (INIS)
May, E.N.
1989-01-01
Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
Performance of the Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.
The kpx, a program analyzer for parallelization
International Nuclear Information System (INIS)
Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.
1997-03-01
The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, that shows how much execution time is spent in program segments. The second is ptool, that shows parallelization overhead on the Paragon system. The last is xtool, that shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN cord on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)
Synchronization Of Parallel Discrete Event Simulations
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Algorithm processes events optimistically in time cycles adapting while simulation in progress. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
Multistage parallel-serial time averaging filters
International Nuclear Information System (INIS)
Theodosiou, G.E.
1980-01-01
Here, a new time averaging circuit design, the 'parallel filter' is presented, which can reduce the time jitter, introduced in time measurements using counters of large dimensions. This parallel filter could be considered as a single stage unit circuit which can be repeated an arbitrary number of times in series, thus providing a parallel-serial filter type as a result. The main advantages of such a filter over a serial one are much less electronic gate jitter and time delay for the same amount of total time uncertainty reduction. (orig.)
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Speedup predictions on large scientific parallel programs
International Nuclear Information System (INIS)
Williams, E.; Bobrowicz, F.
1985-01-01
How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory
Language constructs for modular parallel programs
Energy Technology Data Exchange (ETDEWEB)
Foster, I.
1996-03-01
We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.
Distributed parallel messaging for multiprocessor systems
Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka
2013-06-04
A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Massively parallel Fokker-Planck code ALLAp
International Nuclear Information System (INIS)
Batishcheva, A.A.; Krasheninnikov, S.I.; Craddock, G.G.; Djordjevic, V.
1996-01-01
The recently developed for workstations Fokker-Planck code ALLA simulates the temporal evolution of 1V, 2V and 1D2V collisional edge plasmas. In this work we present the results of code parallelization on the CRI T3D massively parallel platform (ALLAp version). Simultaneously we benchmark the 1D2V parallel vesion against an analytic self-similar solution of the collisional kinetic equation. This test is not trivial as it demands a very strong spatial temperature and density variation within the simulation domain. (orig.)
Parallel Readout of Optical Disks
1992-08-01
r(x,y) is the apparent reflectance function of the disk surface including the phase error. The illuminat - ing optics should be chosen so that Er(x,y...of the light uniformly illuminat - ing the chip, Ap = 474\\im 2 is the area of photodiode, and rs is the time required to switch the synapses. Figure...reference beam that is incident from the right. Once the hologram is recorded the input is blocked and the disk is illuminat - ed. Lens LI takes the
User's guide of parallel program development environment (PPDE). The 2nd edition
Energy Technology Data Exchange (ETDEWEB)
Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)
2000-03-01
The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
User's guide of parallel program development environment (PPDE). The 2nd edition
Energy Technology Data Exchange (ETDEWEB)
Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)
2000-03-01
The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)
Massively Parallel Computing: A Sandia Perspective
Energy Technology Data Exchange (ETDEWEB)
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
Parallel generation of architecture on the GPU
Steinberger, Markus; Kenzel, Michael; Kainz, Bernhard K.; Mü ller, Jö rg; Wonka, Peter; Schmalstieg, Dieter
2014-01-01
they can take advantage of, or both, our method supports state of the art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies
New high voltage parallel plate analyzer
International Nuclear Information System (INIS)
Hamada, Y.; Kawasumi, Y.; Masai, K.; Iguchi, H.; Fujisawa, A.; Abe, Y.
1992-01-01
A new modification on the parallel plate analyzer for 500 keV heavy ions to eliminate the effect of the intense UV and visible radiations, is successfully conducted. Its principle and results are discussed. (author)
Parallel data encryption with RSA algorithm
Неретин, А. А.
2016-01-01
In this paper a parallel RSA algorithm with preliminary shuffling of source text was presented.Dependence of an encryption speed on the number of encryption nodes has been analysed, The proposed algorithm was implemented on C# language.
Data parallel sorting for particle simulation
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
Parallel debt in the Serbian finance law
Directory of Open Access Journals (Sweden)
Kuzman Miloš
2014-01-01
Full Text Available The purpose of this paper is to present the mechanism of parallel debt in the Serbian financial law. While considering whether the mechanism of parallel debt exists under the Serbian law, the Anglo-Saxon mechanism of trust is represented. Hence it is explained why the mechanism of trust is not allowed under the Serbian law. Further on, the mechanism of parallel debt is introduced as well as a debate on permissibility of its cause in the Serbian law. Comparative legal arguments about this issue are also presented in this paper. In conclusion, the author suggests that on the basis of the conclusions drawn in this paper, the parallel debt mechanism is to be declared admissible if it is ever taken into consideration by the Serbian courts.
Parallel Monte Carlo simulation of aerosol dynamics
Zhou, K.; He, Z.; Xiao, M.; Zhang, Z.
2014-01-01
is simulated with a stochastic method (Marcus-Lushnikov stochastic process). Operator splitting techniques are used to synthesize the deterministic and stochastic parts in the algorithm. The algorithm is parallelized using the Message Passing Interface (MPI
Stranger than fiction: parallel universes beguile science
2007-01-01
We may not be able - at least not yet - to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of effeaded imagination. (1/2 page)
Parallel computation of nondeterministic algorithms in VLSI
Energy Technology Data Exchange (ETDEWEB)
Hortensius, P D
1987-01-01
This work examines parallel VLSI implementations of nondeterministic algorithms. It is demonstrated that conventional pseudorandom number generators are unsuitable for highly parallel applications. Efficient parallel pseudorandom sequence generation can be accomplished using certain classes of elementary one-dimensional cellular automata. The pseudorandom numbers appear in parallel on each clock cycle. Extensive study of the properties of these new pseudorandom number generators is made using standard empirical random number tests, cycle length tests, and implementation considerations. Furthermore, it is shown these particular cellular automata can form the basis of efficient VLSI architectures for computations involved in the Monte Carlo simulation of both the percolation and Ising models from statistical mechanics. Finally, a variation on a Built-In Self-Test technique based upon cellular automata is presented. These Cellular Automata-Logic-Block-Observation (CALBO) circuits improve upon conventional design for testability circuitry.
Adapting algorithms to massively parallel hardware
Sioulas, Panagiotis
2016-01-01
In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Implementing Shared Memory Parallelism in MCBEND
Directory of Open Access Journals (Sweden)
Bird Adam
2017-01-01
Full Text Available MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers’s ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Domain decomposition methods and parallel computing
International Nuclear Information System (INIS)
Meurant, G.
1991-01-01
In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion of floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset
6th International Parallel Tools Workshop
Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang
2013-01-01
The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.
Parallel processor programs in the Federal Government
Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.
1985-01-01
In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.
High performance parallel computers for science
International Nuclear Information System (INIS)
Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.
1989-01-01
This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction
Massively parallel evolutionary computation on GPGPUs
Tsutsui, Shigeyoshi
2013-01-01
Evolutionary algorithms (EAs) are metaheuristics that learn from natural collective behavior and are applied to solve optimization problems in domains such as scheduling, engineering, bioinformatics, and finance. Such applications demand acceptable solutions with high-speed execution using finite computational resources. Therefore, there have been many attempts to develop platforms for running parallel EAs using multicore machines, massively parallel cluster machines, or grid computing environments. Recent advances in general-purpose computing on graphics processing units (GPGPU) have opened u
Freeman, Bryan
2013-01-01
This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs.This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.
Simulation Exploration through Immersive Parallel Planes: Preprint
Energy Technology Data Exchange (ETDEWEB)
Brunhart-Lupo, Nicholas; Bush, Brian W.; Gruchalla, Kenny; Smith, Steve
2016-03-01
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Alternative derivation of the parallel ion viscosity
International Nuclear Information System (INIS)
Bravenec, R.V.; Berk, H.L.; Hammer, J.H.
1982-01-01
A set of double-adiabatic fluid equations with additional collisional relaxation between the ion temperatures parallel and perpendicular to a magnetic field are shown to reduce to a set involving a single temperature and a parallel viscosity. This result is applied to a recently published paper [R. V. Bravenec, A. J. Lichtenberg, M. A. Leiberman, and H. L. Berk, Phys. Fluids 24, 1320 (1981)] on viscous flow in a multiple-mirror configuration
Acoustic simulation in architecture with parallel algorithm
Li, Xiaohong; Zhang, Xinrong; Li, Dan
2004-03-01
In allusion to complexity of architecture environment and Real-time simulation of architecture acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in scene is solved with this method. And then the impulse response between sources and receivers at frequency segment, which are calculated with multi-process, are combined into whole frequency response. The numerical experiment shows that parallel arithmetic can improve the acoustic simulating efficiency of complex scene.
PARALLEL SOLUTION METHODS OF PARTIAL DIFFERENTIAL EQUATIONS
Directory of Open Access Journals (Sweden)
Korhan KARABULUT
1998-03-01
Full Text Available Partial differential equations arise in almost all fields of science and engineering. Computer time spent in solving partial differential equations is much more than that of in any other problem class. For this reason, partial differential equations are suitable to be solved on parallel computers that offer great computation power. In this study, parallel solution to partial differential equations with Jacobi, Gauss-Siedel, SOR (Succesive OverRelaxation and SSOR (Symmetric SOR algorithms is studied.
Simulation Exploration through Immersive Parallel Planes
Energy Technology Data Exchange (ETDEWEB)
Brunhart-Lupo, Nicholas J [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Bush, Brian W [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Gruchalla, Kenny M [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Smith, Steve [Los Alamos Visualization Associates
2017-05-25
We present a visualization-driven simulation system that tightly couples systems dynamics simulations with an immersive virtual environment to allow analysts to rapidly develop and test hypotheses in a high-dimensional parameter space. To accomplish this, we generalize the two-dimensional parallel-coordinates statistical graphic as an immersive 'parallel-planes' visualization for multivariate time series emitted by simulations running in parallel with the visualization. In contrast to traditional parallel coordinate's mapping the multivariate dimensions onto coordinate axes represented by a series of parallel lines, we map pairs of the multivariate dimensions onto a series of parallel rectangles. As in the case of parallel coordinates, each individual observation in the dataset is mapped to a polyline whose vertices coincide with its coordinate values. Regions of the rectangles can be 'brushed' to highlight and select observations of interest: a 'slider' control allows the user to filter the observations by their time coordinate. In an immersive virtual environment, users interact with the parallel planes using a joystick that can select regions on the planes, manipulate selection, and filter time. The brushing and selection actions are used to both explore existing data as well as to launch additional simulations corresponding to the visually selected portions of the input parameter space. As soon as the new simulations complete, their resulting observations are displayed in the virtual environment. This tight feedback loop between simulation and immersive analytics accelerates users' realization of insights about the simulation and its output.
Current distribution characteristics of superconducting parallel circuits
International Nuclear Information System (INIS)
Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.
1994-01-01
In order to increase the current carrying capacity of the current path of the superconducting magnet system, the portion of parallel circuits such as insulated multi-strand cables or parallel persistent current switches (PCS) are made. In superconducting parallel circuits of an insulated multi-strand cable or a parallel persistent current switch (PCS), the current distribution during the current sweep, the persistent mode, and the quench process were investigated. In order to measure the current distribution, two methods were used. (1) Each strand was surrounded with a pure iron core with the air gap. In the air gap, a Hall probe was located. The accuracy of this method was deteriorated by the magnetic hysteresis of iron. (2) The Rogowski coil without iron was used for the current measurement of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS is very similar to that of an insulated multi-strand cable for the quench process
Parallel processing of structural integrity analysis codes
International Nuclear Information System (INIS)
Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.
1996-01-01
Structural integrity analysis forms an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. The first category codes such as those used for harmonic analysis, mechanistic fuel performance codes need not require the parallelisation of individual modules of the codes. The second category of codes such as conventional FEM codes require parallelisation of individual modules. In this category, parallelisation of equation solution module poses major difficulties. Different solution schemes such as domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS belonging to each of these categories have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Parallel computation of automatic differentiation applied to magnetic field calculations
International Nuclear Information System (INIS)
Hinkins, R.L.; Lawrence Berkeley Lab., CA
1994-09-01
The author presents a parallelization of an accelerator physics application to simulate magnetic field in three dimensions. The problem involves the evaluation of high order derivatives with respect to two variables of a multivariate function. Automatic differentiation software had been used with some success, but the computation time was prohibitive. The implementation runs on several platforms, including a network of workstations using PVM, a MasPar using MPFortran, and a CM-5 using CMFortran. A careful examination of the code led to several optimizations that improved its serial performance by a factor of 8.7. The parallelization produced further improvements, especially on the MasPar with a speedup factor of 620. As a result a problem that took six days on a SPARC 10/41 now runs in minutes on the MasPar, making it feasible for physicists at Lawrence Berkeley Laboratory to simulate larger magnets
Sequential and parallel image restoration: neural network implementations.
Figueiredo, M T; Leitao, J N
1994-01-01
Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high dimension convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm supported on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
A novel parallel pipeline structure of VP9 decoder
Qin, Huabiao; Chen, Wu; Yi, Sijun; Tan, Yunfei; Yi, Huan
2018-04-01
To improve the efficiency of VP9 decoder, a novel parallel pipeline structure of VP9 decoder is presented in this paper. According to the decoding workflow, VP9 decoder can be divided into sub-modules which include entropy decoding, inverse quantization, inverse transform, intra prediction, inter prediction, deblocking and pixel adaptive compensation. By analyzing the computing time of each module, hotspot modules are located and the causes of low efficiency of VP9 decoder can be found. Then, a novel pipeline decoder structure is designed by using mixed parallel decoding methods of data division and function division. The experimental results show that this structure can greatly improve the decoding efficiency of VP9.
Application of parallel processing for automatic inspection of printed circuits
International Nuclear Information System (INIS)
Lougheed, R.M.
1986-01-01
Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given
Parallel Geometries in Geant4 foundation and recent enhancements
Apostolakis, J; Cosmo, G; Howard, A; Ivanchenko, V; Verderi, M
2009-01-01
The Geant4 software toolkit simulates the passage of particles through matter. It is utilized in high energy and nuclear physics experiments, in medical physics and space applications. For many applications it is necessary to measure particle fluxes and radiation doses in parts of the setup where there are complex structures. To undertake this in a flexible way, Geant4 has tools to create and use additional, parallel, geometrical hierarchies within a single application. A separate, parallel geometry can be used for each one amongst shower parameterization, event biasing, scoring of radiation, and/or the creation of hits in detailed readout structures. We describe the existing basic capabilities of the Geant4 toolkit to create multiple geometries and the recent major enhancements undertaken to streamline, enhance and extend these. New functionality enables Geant4 developers to offer new embedded schemes for scoring (requiring no user C++ code); has simplified the implementation of processes or capabilities usi...
Rideaux, Reuben; Apthorp, Deborah; Edwards, Mark
2015-02-12
Recent findings have indicated the capacity to consolidate multiple items into visual short-term memory in parallel varies as a function of the type of information. That is, while color can be consolidated in parallel, evidence suggests that orientation cannot. Here we investigated the capacity to consolidate multiple motion directions in parallel and reexamined this capacity using orientation. This was achieved by determining the shortest exposure duration necessary to consolidate a single item, then examining whether two items, presented simultaneously, could be consolidated in that time. The results show that parallel consolidation of direction and orientation information is possible, and that parallel consolidation of direction appears to be limited to two. Additionally, we demonstrate the importance of adequate separation between feature intervals used to define items when attempting to consolidate in parallel, suggesting that when multiple items are consolidated in parallel, as opposed to serially, the resolution of representations suffer. Finally, we used facilitation of spatial attention to show that the deterioration of item resolution occurs during parallel consolidation, as opposed to storage. © 2015 ARVO.
Type synthesis for 4-DOF parallel press mechanism using GF set theory
He, Jun; Gao, Feng; Meng, Xiangdun; Guo, Weizhong
2015-07-01
Parallel mechanisms is used in the large capacity servo press to avoid the over-constraint of the traditional redundant actuation. Currently, the researches mainly focus on the performance analysis for some specific parallel press mechanisms. However, the type synthesis and evaluation of parallel press mechanisms is seldom studied, especially for the four degrees of freedom(DOF) press mechanisms. The type synthesis of 4-DOF parallel press mechanisms is carried out based on the generalized function(GF) set theory. Five design criteria of 4-DOF parallel press mechanisms are firstly proposed. The general procedure of type synthesis of parallel press mechanisms is obtained, which includes number synthesis, symmetrical synthesis of constraint GF sets, decomposition of motion GF sets and design of limbs. Nine combinations of constraint GF sets of 4-DOF parallel press mechanisms, ten combinations of GF sets of active limbs, and eleven combinations of GF sets of passive limbs are synthesized. Thirty-eight kinds of press mechanisms are presented and then different structures of kinematic limbs are designed. Finally, the geometrical constraint complexity( GCC), kinematic pair complexity( KPC), and type complexity( TC) are proposed to evaluate the press types and the optimal press type is achieved. The general methodologies of type synthesis and evaluation for parallel press mechanism are suggested.
Stampi: a message passing library for distributed parallel computing. User's guide, second edition
International Nuclear Information System (INIS)
Imamura, Toshiyuki; Koide, Hiroshi; Takemiya, Hiroshi
2000-02-01
A new message passing library, Stampi, has been developed to realize a computation with different kind of parallel computers arbitrarily and making MPI (Message Passing Interface) as an unique interface for communication. Stampi is based on the MPI2 specification, and it realizes dynamic process creation to different machines and communication between spawned one within the scope of MPI semantics. Main features of Stampi are summarized as follows: (i) an automatic switch function between external- and internal communications, (ii) a message routing/relaying with a routing module, (iii) a dynamic process creation, (iv) a support of two types of connection, Master/Slave and Client/Server, (v) a support of a communication with Java applets. Indeed vendors implemented MPI libraries as a closed system in one parallel machine or their systems, and did not support both functions; process creation and communication to external machines. Stampi supports both functions and enables us distributed parallel computing. Currently Stampi has been implemented on COMPACS (COMplex PArallel Computer System) introduced in CCSE, five parallel computers and one graphic workstation, moreover on eight kinds of parallel machines, totally fourteen systems. Stampi provides us MPI communication functionality on them. This report describes mainly the usage of Stampi. (author)
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Concurrent computation of attribute filters on shared memory parallel machines
Wilkinson, Michael H.F.; Gao, Hui; Hesselink, Wim H.; Jonker, Jan-Eppo; Meijster, Arnold
2008-01-01
Morphological attribute filters have not previously been parallelized mainly because they are both global and nonseparable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings, and thickenings,
A task parallel implementation of fast multipole methods
Taura, Kenjiro; Nakashima, Jun; Yokota, Rio; Maruyama, Naoya
2012-01-01
This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM
Parallel phase model : a programming model for high-end parallel machines with manycores.
Energy Technology Data Exchange (ETDEWEB)
Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian
2009-04-01
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
The specification of Stampi, a message passing library for distributed parallel computing
International Nuclear Information System (INIS)
Imamura, Toshiyuki; Takemiya, Hiroshi; Koide, Hiroshi
2000-03-01
At CCSE, Center for Promotion of Computational Science and Engineering, a new message passing library for heterogeneous and distributed parallel computing has been developed, and it is called as Stampi. Stampi enables us to communicate between any combination of parallel computers as well as workstations. Currently, a Stampi system is constructed from Stampi library and Stampi/Java. It provides functions to connect a Stampi application with not only those on COMPACS, COMplex Parallel Computer System, but also applets which work on WWW browsers. This report summarizes the specifications of Stampi and details the development of its system. (author)
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Parallelization of Subchannel Analysis Code MATRA
International Nuclear Information System (INIS)
Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk
2014-01-01
A stand-alone calculation of MATRA code used up pertinent computing time for the thermal margin calculations while a relatively considerable time is needed to solve the whole core pin-by-pin problems. In addition, it is strongly required to improve the computation speed of the MATRA code to satisfy the overall performance of the multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method and the modification of the previous code structure was minimized. An improvement is confirmed by comparing the results between the single and multiple processor algorithms. The speedup and efficiency are also evaluated when increasing the number of processors. The parallel algorithm was implemented to the subchannel code MATRA using the MPI. The performance of the parallel algorithm was verified by comparing the results with those from the MATRA with the single processor. It is also noticed that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8 core and whole core problems
Improvement of Parallel Algorithm for MATRA Code
International Nuclear Information System (INIS)
Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun
2014-01-01
The feasibility study to parallelize the MATRA code was conducted in KAERI early this year. As a result, a parallel algorithm for the MATRA code has been developed to decrease a considerably required computing time to solve a bigsize problem such as a whole core pin-by-pin problem of a general PWR reactor and to improve an overall performance of the multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For problems of a 1/8 core and whole core for SMART reactor, a speedup was evaluated as about 10 when the numbers of used processor were 25. However, it was also shown that the performance deteriorated as the axial node number increased. In this paper, the procedure of a communication between processors is optimized to improve the previous parallel algorithm.. To improve the performance deterioration of the parallelized MATRA code, the communication algorithm between processors was newly presented. It was shown that the speedup was improved and stable regardless of the axial node number
Iteration schemes for parallelizing models of superconductivity
Energy Technology Data Exchange (ETDEWEB)
Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)
1996-12-31
The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T{sub c} superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
Design, construction, and testing of a hysteresis controlled inverter for paralleling
Fillmore, Paul F.
2003-01-01
The U. S. Navy is pursuing an all electric ship that will require enormous amounts of power for applications such as electric propulsion. Reliability and redundancy in the electronics are imperative, since failure of a critical system could leave a ship stranded and vulnerable. A parallel inverter drive topology has been proposed to provide reliability and redundancy through load sharing. The parallel architecture enables some functionality in the event that one of the inverters fails. This t...
Fast electrostatic force calculation on parallel computer clusters
International Nuclear Information System (INIS)
Kia, Amirali; Kim, Daejoong; Darve, Eric
2008-01-01
The fast multipole method (FMM) and smooth particle mesh Ewald (SPME) are well known fast algorithms to evaluate long range electrostatic interactions in molecular dynamics and other fields. FMM is a multi-scale method which reduces the computation cost by approximating the potential due to a group of particles at a large distance using few multipole functions. This algorithm scales like O(N) for N particles. SPME algorithm is an O(NlnN) method which is based on an interpolation of the Fourier space part of the Ewald sum and evaluating the resulting convolutions using fast Fourier transform (FFT). Those algorithms suffer from relatively poor efficiency on large parallel machines especially for mid-size problems around hundreds of thousands of atoms. A variation of the FMM, called PWA, based on plane wave expansions is presented in this paper. A new parallelization strategy for PWA, which takes advantage of the specific form of this expansion, is described. Its parallel efficiency is compared with SPME through detail time measurements on two different computer clusters
Hybrid parallel computing architecture for multiview phase shifting
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
Real-time trajectory optimization on parallel processors
Psiaki, Mark L.
1993-01-01
A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm that is suitable to do real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on 3 example problems, the Goddard problem, the acceleration-limited, planar minimum-time to the origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32-nodes instead of 1-node to solve a 64-stage Goddard problem.
Parallel particle swarm optimization algorithm in nuclear problems
International Nuclear Information System (INIS)
Waintraub, Marcel; Pereira, Claudio M.N.A.; Schirru, Roberto
2009-01-01
Particle Swarm Optimization (PSO) is a population-based metaheuristic (PBM), in which solution candidates evolve through simulation of a simplified social adaptation model. Putting together robustness, efficiency and simplicity, PSO has gained great popularity. Many successful applications of PSO are reported, in which PSO demonstrated to have advantages over other well-established PBM. However, computational costs are still a great constraint for PSO, as well as for all other PBMs, especially in optimization problems with time consuming objective functions. To overcome such difficulty, parallel computation has been used. The default advantage of parallel PSO (PPSO) is the reduction of computational time. Master-slave approaches, exploring this characteristic are the most investigated. However, much more should be expected. It is known that PSO may be improved by more elaborated neighborhood topologies. Hence, in this work, we develop several different PPSO algorithms exploring the advantages of enhanced neighborhood topologies implemented by communication strategies in multiprocessor architectures. The proposed PPSOs have been applied to two complex and time consuming nuclear engineering problems: reactor core design and fuel reload optimization. After exhaustive experiments, it has been concluded that: PPSO still improves solutions after many thousands of iterations, making prohibitive the efficient use of serial (non-parallel) PSO in such kind of realworld problems; and PPSO with more elaborated communication strategies demonstrated to be more efficient and robust than the master-slave model. Advantages and peculiarities of each model are carefully discussed in this work. (author)
Plane-wave electronic structure calculations on a parallel supercomputer
International Nuclear Information System (INIS)
Nelson, J.S.; Plimpton, S.J.; Sears, M.P.
1993-01-01
The development of iterative solutions of Schrodinger's equation in a plane-wave (pw) basis over the last several years has coincided with great advances in the computational power available for performing the calculations. These dual developments have enabled many new and interesting condensed matter phenomena to be studied from a first-principles approach. The authors present a detailed description of the implementation on a parallel supercomputer (hypercube) of the first-order equation-of-motion solution to Schrodinger's equation, using plane-wave basis functions and ab initio separable pseudopotentials. By distributing the plane-waves across the processors of the hypercube many of the computations can be performed in parallel, resulting in decreases in the overall computation time relative to conventional vector supercomputers. This partitioning also provides ample memory for large Fast Fourier Transform (FFT) meshes and the storage of plane-wave coefficients for many hundreds of energy bands. The usefulness of the parallel techniques is demonstrated by benchmark timings for both the FFT's and iterations of the self-consistent solution of Schrodinger's equation for different sized Si unit cells of up to 512 atoms
Parallel visualization on leadership computing resources
Energy Technology Data Exchange (ETDEWEB)
Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)
2009-07-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Distributed Parallel Architecture for "Big Data"
Directory of Open Access Journals (Sweden)
Catalin BOJA
2012-01-01
Full Text Available This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey on current distributed and parallel data processing technologies and, based on them, will propose an architecture that can be used to solve the analyzed problem. In this version there is more emphasis put on distributed files systems and the ETL processes involved in a distributed environment.
Java parallel secure stream for grid computing
International Nuclear Information System (INIS)
Chen, J.; Akers, W.; Chen, Y.; Watson, W.
2001-01-01
The emergence of high speed wide area networks makes grid computing a reality. However grid applications that need reliable data transfer still have difficulties to achieve optimal TCP performance due to network tuning of TCP window size to improve the bandwidth and to reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a gird environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel stream is more effective than tuning TCP window size. In addition X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed
Applications of Parallel Processing in Mobile Banking
Directory of Open Access Journals (Sweden)
2007-01-01
Full Text Available The future of mobile banking will be represented by such applications that support mobile, Internet banking and EFT (Electronic Funds Transfer transactions in a single user interface. In such a way, the mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Excluding some limitations, the grid processing offers huge opportunities to exploit the parallelism. For this reason, a lot of applications of waiting queues in grid processing were developed in the last years. Grid networks represent a distinctive and very modern field of the parallel and distributed processing.
Parallel optoelectronic trinary signed-digit division
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Abstract Level Parallelization of Finite Difference Methods
Directory of Open Access Journals (Sweden)
Edwin Vollebregt
1997-01-01
Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”. The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automaticcode generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing. This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
Parallel visualization on leadership computing resources
International Nuclear Information System (INIS)
Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H
2009-01-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
A possibility of parallel and anti-parallel diffraction measurements on ...
Indian Academy of Sciences (India)
However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the / measured ...
International Nuclear Information System (INIS)
Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi
2000-03-01
Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)
A SPECT reconstruction method for extending parallel to non-parallel geometries
International Nuclear Information System (INIS)
Wen Junhai; Liang Zhengrong
2010-01-01
Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Parallel algorithms for numerical linear algebra
van der Vorst, H
1990-01-01
This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers.All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices.Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p
Keldysh formalism for multiple parallel worlds
International Nuclear Information System (INIS)
Ansari, M.; Nazarov, Y. V.
2016-01-01
We present a compact and self-contained review of the recently developed Keldysh formalism for multiple parallel worlds. The formalism has been applied to consistent quantum evaluation of the flows of informational quantities, in particular, to the evaluation of Renyi and Shannon entropy flows. We start with the formulation of the standard and extended Keldysh techniques in a single world in a form convenient for our presentation. We explain the use of Keldysh contours encompassing multiple parallel worlds. In the end, we briefly summarize the concrete results obtained with the method.
Keldysh formalism for multiple parallel worlds
Ansari, M.; Nazarov, Y. V.
2016-03-01
We present a compact and self-contained review of the recently developed Keldysh formalism for multiple parallel worlds. The formalism has been applied to consistent quantum evaluation of the flows of informational quantities, in particular, to the evaluation of Renyi and Shannon entropy flows. We start with the formulation of the standard and extended Keldysh techniques in a single world in a form convenient for our presentation. We explain the use of Keldysh contours encompassing multiple parallel worlds. In the end, we briefly summarize the concrete results obtained with the method.
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Lahdenoja Olli
2007-01-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
Xyce parallel electronic simulator release notes.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
Parallel transposition of sparse data structures
DEFF Research Database (Denmark)
Wang, Hao; Liu, Weifeng; Hou, Kaixi
2016-01-01
Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel tr...... transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves on average of 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor....
Temporal fringe pattern analysis with parallel computing
International Nuclear Information System (INIS)
Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca
2005-01-01
Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis
On radial flow between parallel disks
International Nuclear Information System (INIS)
Wee, A Y L; Gorin, A
2015-01-01
Approximate analytical solutions are presented for converging flow in between two parallel non rotating disks. The static pressure distribution and radial component of the velocity are developed by averaging the inertial term across the gap in between parallel disks. The predicted results from the first approximation are favourable to experimental results as well as results presented by other authors. The second approximation shows that as the fluid approaches the center, the velocity at the mid channel slows down which is due to the struggle between the inertial term and the flowrate. (paper)
Logical inference techniques for loop parallelization
DEFF Research Database (Denmark)
Oancea, Cosmin Eugen; Rauchwerger, Lawrence
2012-01-01
the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....
A PARALLEL EXTENSION OF THE UAL ENVIRONMENT
International Nuclear Information System (INIS)
MALITSKY, N.; SHISHLO, A.
2001-01-01
The deployment of the Unified Accelerator Library (UAL) environment on the parallel cluster is presented. The approach is based on the Message-Passing Interface (MPI) library and the Perl adapter that allows one to control and mix together the existing conventional UAL components with the new MPI-based parallel extensions. In the paper, we provide timing results and describe the application of the new environment to the SNS Ring complex beam dynamics studies, particularly, simulations of several physical effects, such as space charge, field errors, fringe fields, and others
Analysis of a parallel multigrid algorithm
Chan, Tony F.; Tuminaro, Ray S.
1989-01-01
The parallel multigrid algorithm of Frederickson and McBryan (1987) is considered. This algorithm uses multiple coarse-grid problems (instead of one problem) in the hope of accelerating convergence and is found to have a close relationship to traditional multigrid methods. Specifically, the parallel coarse-grid correction operator is identical to a traditional multigrid coarse-grid correction operator, except that the mixing of high and low frequencies caused by aliasing error is removed. Appropriate relaxation operators can be chosen to take advantage of this property. Comparisons between the standard multigrid and the new method are made.
Parallel processing for artificial intelligence 2
Kumar, V; Suttner, CB
1994-01-01
With the increasing availability of parallel machines and the raising of interest in large scale and real world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their
Configuration affects parallel stent grafting results.
Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L
2018-05-01
A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31
Keldysh formalism for multiple parallel worlds
Energy Technology Data Exchange (ETDEWEB)
Ansari, M.; Nazarov, Y. V., E-mail: y.v.nazarov@tudelft.nl [Delft University of Technology, Kavli Institute of Nanoscience (Netherlands)
2016-03-15
We present a compact and self-contained review of the recently developed Keldysh formalism for multiple parallel worlds. The formalism has been applied to consistent quantum evaluation of the flows of informational quantities, in particular, to the evaluation of Renyi and Shannon entropy flows. We start with the formulation of the standard and extended Keldysh techniques in a single world in a form convenient for our presentation. We explain the use of Keldysh contours encompassing multiple parallel worlds. In the end, we briefly summarize the concrete results obtained with the method.
Use of parallel counters for triggering
International Nuclear Information System (INIS)
Nikityuk, N.M.
1991-01-01
Results of investigation of using parallel counters, majority coincidence schemes, parallel compressors for triggering in multichannel high energy spectrometers are described. Concrete examples of methods of constructing fast and economic new devices used to determine multiplicity hits t>900 registered in a hodoscopic plane and a pixel detector are given. For this purpose the author uses the syndrome coding method and cellular arrays. In addition, an effective coding matrix has been created which can be used for light signal coding. For example, such signals are supplied from scintillators to photomultipliers. 23 refs.; 21 figs
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Ari Paasio
2006-12-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
Parallel processor for fast event analysis
International Nuclear Information System (INIS)
Hensley, D.C.
1983-01-01
Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the AM29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system
Parallel adaptive simulations on unstructured meshes
International Nuclear Information System (INIS)
Shephard, M S; Jansen, K E; Sahni, O; Diachin, L A
2007-01-01
This paper discusses methods being developed by the ITAPS center to support the execution of parallel adaptive simulations on unstructured meshes. The paper first outlines the ITAPS approach to the development of interoperable mesh, geometry and field services to support the needs of SciDAC application in these areas. The paper then demonstrates the ability of unstructured adaptive meshing methods built on such interoperable services to effectively solve important physics problems. Attention is then focused on ITAPs' developing ability to solve adaptive unstructured mesh problems on massively parallel computers
Structured building model reduction toward parallel simulation
Energy Technology Data Exchange (ETDEWEB)
Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University
2013-08-26
Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.
Parallel preconditioning techniques for sparse CG solvers
Energy Technology Data Exchange (ETDEWEB)
Basermann, A.; Reichel, B.; Schelthoff, C. [Central Institute for Applied Mathematics, Juelich (Germany)
1996-12-31
Conjugate gradient (CG) methods to solve sparse systems of linear equations play an important role in numerical methods for solving discretized partial differential equations. The large size and the condition of many technical or physical applications in this area result in the need for efficient parallelization and preconditioning techniques of the CG method. In particular for very ill-conditioned matrices, sophisticated preconditioner are necessary to obtain both acceptable convergence and accuracy of CG. Here, we investigate variants of polynomial and incomplete Cholesky preconditioners that markedly reduce the iterations of the simply diagonally scaled CG and are shown to be well suited for massively parallel machines.
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
SPINET: A Parallel Computing Approach to Spine Simulations
Directory of Open Access Journals (Sweden)
Peter G. Kropf
1996-01-01
Full Text Available Research in scientitic programming enables us to realize more and more complex applications, and on the other hand, application-driven demands on computing methods and power are continuously growing. Therefore, interdisciplinary approaches become more widely used. The interdisciplinary SPINET project presented in this article applies modern scientific computing tools to biomechanical simulations: parallel computing and symbolic and modern functional programming. The target application is the human spine. Simulations of the spine help us to investigate and better understand the mechanisms of back pain and spinal injury. Two approaches have been used: the first uses the finite element method for high-performance simulations of static biomechanical models, and the second generates a simulation developmenttool for experimenting with different dynamic models. A finite element program for static analysis has been parallelized for the MUSIC machine. To solve the sparse system of linear equations, a conjugate gradient solver (iterative method and a frontal solver (direct method have been implemented. The preprocessor required for the frontal solver is written in the modern functional programming language SML, the solver itself in C, thus exploiting the characteristic advantages of both functional and imperative programming. The speedup analysis of both solvers show very satisfactory results for this irregular problem. A mixed symbolic-numeric environment for rigid body system simulations is presented. It automatically generates C code from a problem specification expressed by the Lagrange formalism using Maple.
Integrated variable projection approach (IVAPA) for parallel magnetic resonance imaging.
Zhang, Qiao; Sheng, Jinhua
2012-10-01
Parallel magnetic resonance imaging (pMRI) is a fast method which requires algorithms for the reconstructing image from a small number of measured k-space lines. The accurate estimation of the coil sensitivity functions is still a challenging problem in parallel imaging. The joint estimation of the coil sensitivity functions and the desired image has recently been proposed to improve the situation by iteratively optimizing both the coil sensitivity functions and the image reconstruction. It regards both the coil sensitivities and the desired images as unknowns to be solved for jointly. In this paper, we propose an integrated variable projection approach (IVAPA) for pMRI, which integrates two individual processing steps (coil sensitivity estimation and image reconstruction) into a single processing step to improve the accuracy of the coil sensitivity estimation using the variable projection approach. The method is demonstrated to be able to give an optimal solution with considerably reduced artifacts for high reduction factors and a low number of auto-calibration signal (ACS) lines, and our implementation has a fast convergence rate. The performance of the proposed method is evaluated using a set of in vivo experiment data. Copyright © 2012 Elsevier Ltd. All rights reserved.