Prabhat
2014-01-01
Gain Critical Insight into the Parallel I/O EcosystemParallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem.The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har
Implementation and performance of parallelized elegant
International Nuclear Information System (INIS)
Wang, Y.; Borland, M.
2008-01-01
The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements
International Nuclear Information System (INIS)
Azmy, Y.Y.; Barnett, D.A.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates were updated to function in a UNI-COS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
Multitasking TORT under UNICOS: Parallel performance models and measurements
International Nuclear Information System (INIS)
Barnett, A.; Azmy, Y.Y.
1999-01-01
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead
Analysis of parallel computing performance of the code MCNP
International Nuclear Information System (INIS)
Wang Lei; Wang Kan; Yu Ganglin
2006-01-01
Parallel computing can reduce the running time of the code MCNP effectively. With the MPI message transmitting software, MCNP5 can achieve its parallel computing on PC cluster with Windows operating system. Parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP regarding with these factors and gives measures to improve the MCNP parallel computing performance. (authors)
The Performance of an Object-Oriented, Parallel Operating System
Directory of Open Access Journals (Sweden)
David R. Kohr, Jr.
1994-01-01
Full Text Available The nascent and rapidly evolving state of parallel systems often leaves parallel application developers at the mercy of inefficient, inflexible operating system software. Given the relatively primitive state of parallel systems software, maximizing the performance of parallel applications not only requires judicious tuning of the application software, but occasionally, the replacement of specific system software modules with others that can more readily respond to the imposed pattern of resource demands. To assess the feasibility of application and performance tuning via malleable system software and to understand the performance penalties for detailed operating system performance data capture, we describe a set of performance instrumentation techniques for parallel, object-oriented operating systems and a set of performance experiments with Choices, an experimental, object-oriented operating system designed for use with parallel sys- tems. These performance experiments show that (a the performance overhead for operating system data capture is modest, (b the penalty for malleable, object-oriented operating systems is negligible, but (c techniques are needed to strictly enforce adherence of implementation to design if operating system modules are to be replaced.
High performance parallel computers for science
International Nuclear Information System (INIS)
Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.
1989-01-01
This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction
Performance studies of the parallel VIM code
International Nuclear Information System (INIS)
Shi, B.; Blomquist, R.N.
1996-01-01
In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other's jobs
Performance Analysis of Parallel Mathematical Subroutine library PARCEL
International Nuclear Information System (INIS)
Yamada, Susumu; Shimizu, Futoshi; Kobayashi, Kenichi; Kaburaki, Hideo; Kishida, Norio
2000-01-01
The parallel mathematical subroutine library PARCEL (Parallel Computing Elements) has been developed by Japan Atomic Energy Research Institute for easy use of typical parallelized mathematical codes in any application problems on distributed parallel computers. The PARCEL includes routines for linear equations, eigenvalue problems, pseudo-random number generation, and fast Fourier transforms. It is shown that the results of performance for linear equations routines exhibit good parallelization efficiency on vector, as well as scalar, parallel computers. A comparison of the efficiency results with the PETSc (Portable Extensible Tool kit for Scientific Computations) library has been reported. (author)
Performance of the Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.
Identification of parallel and divergent optimization solutions for homologous metabolic enzymes
Directory of Open Access Journals (Sweden)
Robert F. Standaert
2018-06-01
Full Text Available Metabolic pathway assembly typically involves the expression of enzymes from multiple organisms in a single heterologous host. Ensuring that each enzyme functions effectively can be challenging, since many potential factors can disrupt proper pathway flux. Here, we compared the performance of two enzyme homologs in a pathway engineered to allow Escherichia coli to grow on 4-hydroxybenzoate (4-HB, a byproduct of lignocellulosic biomass deconstruction. Single chromosomal copies of the 4-HB 3-monooxygenase genes pobA and praI, from Pseudomonas putida KT2440 and Paenibacillus sp. JJ-1B, respectively, were introduced into a strain able to metabolize protocatechuate (PCA, the oxidation product of 4-HB. Neither enzyme initially supported consistent growth on 4-HB. Experimental evolution was used to identify mutations that improved pathway activity. For both enzymes, silent mRNA mutations were identified that increased enzyme expression. With pobA, duplication of the genes for PCA metabolism allowed growth on 4-HB. However, with praI, growth required a mutation in the 4-HB/PCA transporter pcaK that increased intracellular concentrations of 4-HB, suggesting that flux through PraI was limiting. These findings demonstrate the value of directed evolution strategies to rapidly identify and overcome diverse factors limiting enzyme activity. Keywords: Lignin, Protocatechuate, Experimental evolution
Identification of parallel and divergent optimization solutions for homologous metabolic enzymes.
Standaert, Robert F; Giannone, Richard J; Michener, Joshua K
2018-06-01
Metabolic pathway assembly typically involves the expression of enzymes from multiple organisms in a single heterologous host. Ensuring that each enzyme functions effectively can be challenging, since many potential factors can disrupt proper pathway flux. Here, we compared the performance of two enzyme homologs in a pathway engineered to allow Escherichia coli to grow on 4-hydroxybenzoate (4-HB), a byproduct of lignocellulosic biomass deconstruction. Single chromosomal copies of the 4-HB 3-monooxygenase genes pobA and praI , from Pseudomonas putida KT2440 and Paenibacillus sp. JJ-1B, respectively, were introduced into a strain able to metabolize protocatechuate (PCA), the oxidation product of 4-HB. Neither enzyme initially supported consistent growth on 4-HB. Experimental evolution was used to identify mutations that improved pathway activity. For both enzymes, silent mRNA mutations were identified that increased enzyme expression. With pobA , duplication of the genes for PCA metabolism allowed growth on 4-HB. However, with praI , growth required a mutation in the 4-HB/PCA transporter pcaK that increased intracellular concentrations of 4-HB, suggesting that flux through PraI was limiting. These findings demonstrate the value of directed evolution strategies to rapidly identify and overcome diverse factors limiting enzyme activity.
A high performance parallel approach to medical imaging
International Nuclear Information System (INIS)
Frieder, G.; Frieder, O.; Stytz, M.R.
1988-01-01
Research into medical imaging using general purpose parallel processing architectures is described and a review of the performance of previous medical imaging machines is provided. Results demonstrating that general purpose parallel architectures can achieve performance comparable to other, specialized, medical imaging machine architectures is presented. A new back-to-front hidden-surface removal algorithm is described. Results demonstrating the computational savings obtained by using the modified back-to-front hidden-surface removal algorithm are presented. Performance figures for forming a full-scale medical image on a mesh interconnected multiprocessor are presented
High Performance Parallel Multigrid Algorithms for Unstructured Grids
Frederickson, Paul O.
1996-01-01
We describe a high performance parallel multigrid algorithm for a rather general class of unstructured grid problems in two and three dimensions. The algorithm PUMG, for parallel unstructured multigrid, is related in structure to the parallel multigrid algorithm PSMG introduced by McBryan and Frederickson, for they both obtain a higher convergence rate through the use of multiple coarse grids. Another reason for the high convergence rate of PUMG is its smoother, an approximate inverse developed by Baumgardner and Frederickson.
Tuning HDF5 subfiling performance on parallel file systems
Energy Technology Data Exchange (ETDEWEB)
Byna, Suren [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chaarawi, Mohamad [Intel Corp. (United States); Koziol, Quincey [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mainzer, John [The HDF Group (United States); Willmore, Frank [The HDF Group (United States)
2017-05-12
Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach that instigates the lock contention problems on parallel file systems and having one file per process, which results in generating a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of recently implemented subfiling feature in HDF5. In specific, we explain the implementation strategy of subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune parallel I/O performance of this feature with parallel file systems of the Cray XC40 system at NERSC (Cori) that include a burst buffer storage and a Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X performance advantage with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets to storing files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations with using the subfiling feature.
Performance assessment of the SIMFAP parallel cluster at IFIN-HH Bucharest
International Nuclear Information System (INIS)
Adam, Gh.; Adam, S.; Ayriyan, A.; Dushanov, E.; Hayryan, E.; Korenkov, V.; Lutsenko, A.; Mitsyn, V.; Sapozhnikova, T.; Sapozhnikov, A; Streltsova, O.; Buzatu, F.; Dulea, M.; Vasile, I.; Sima, A.; Visan, C.; Busa, J.; Pokorny, I.
2008-01-01
Performance assessment and case study outputs of the parallel SIMFAP cluster at IFIN-HH Bucharest point to its effective and reliable operation. A comparison with results on the supercomputing system in LIT-JINR Dubna adds insight on resource allocation for problem solving by parallel computing. The solution of models asking for very large numbers of knots in the discretization mesh needs the migration to high performance computing based on parallel cluster architectures. The acquisition of ready-to-use parallel computing facilities being beyond limited budgetary resources, the solution at IFIN-HH was to buy the hardware and the inter-processor network, and to implement by own efforts the open software concerning both the operating system and the parallel computing standard. The present paper provides a report demonstrating the successful solution of these tasks. The implementation of the well-known HPL (High Performance LINPACK) Benchmark points to the effective and reliable operation of the cluster. The comparison of HPL outputs obtained on parallel clusters of different magnitudes shows that there is an optimum range of the order N of the linear algebraic system over which a given parallel cluster provides optimum parallel solutions. For the SIMFAP cluster, this range can be inferred to correspond to about 1 to 2 x 10 4 linear algebraic equations. For an algorithm of polynomial complexity N α the task sharing among p processors within a parallel solution mainly follows an (N/p)α behaviour under peak performance achievement. Thus, while the problem complexity remains the same, a substantial decrease of the coefficient of the leading order of the polynomial complexity is achieved. (authors)
Linking Hydrolysis Performance to Trichoderma reesei Cellulolytic Enzyme Profile
DEFF Research Database (Denmark)
Lehmann, Linda Olkjær; Petersen, Nanna; I. Jørgensen, Christian
2016-01-01
Trichoderma reesei expresses a large number of enzymes involved in lignocellulose hydrolysis and the mechanism of how these enzymes work together is too complex to study by traditional methods, e.g. by spiking with single enzymes and monitoring hydrolysis performance. In this study a multivariate...... approach, partial least squares regression, was used to see if it could help explain the correlation between enzyme profile and hydrolysis performance. Diverse enzyme mixtures were produced by Trichoderma reesei Rut-C30 by exploiting various fermentation conditions and used for hydrolysis of washed...
Enzyme Molecules in Solitary Confinement
Directory of Open Access Journals (Sweden)
Raphaela B. Liebherr
2014-09-01
Full Text Available Large arrays of homogeneous microwells each defining a femtoliter volume are a versatile platform for monitoring the substrate turnover of many individual enzyme molecules in parallel. The high degree of parallelization enables the analysis of a statistically representative enzyme population. Enclosing individual enzyme molecules in microwells does not require any surface immobilization step and enables the kinetic investigation of enzymes free in solution. This review describes various microwell array formats and explores their applications for the detection and investigation of single enzyme molecules. The development of new fabrication techniques and sensitive detection methods drives the field of single molecule enzymology. Here, we introduce recent progress in single enzyme molecule analysis in microwell arrays and discuss the challenges and opportunities.
Microwave tomography global optimization, parallelization and performance evaluation
Noghanian, Sima; Desell, Travis; Ashtari, Ali
2014-01-01
This book provides a detailed overview on the use of global optimization and parallel computing in microwave tomography techniques. The book focuses on techniques that are based on global optimization and electromagnetic numerical methods. The authors provide parallelization techniques on homogeneous and heterogeneous computing architectures on high performance and general purpose futuristic computers. The book also discusses the multi-level optimization technique, hybrid genetic algorithm and its application in breast cancer imaging.
Circuit mismatch influence on performance of paralleling silicon carbide MOSFETs
DEFF Research Database (Denmark)
Li, Helong; Munk-Nielsen, Stig; Pham, Cam
2014-01-01
This paper focuses on circuit mismatch influence on performance of paralleling SiC MOSFETs. Power circuit mismatch and gate driver mismatch influences are analyzed in detail. Simulation and experiment results show the influence of circuit mismatch and verify the analysis. This paper aims to give...... suggestions on paralleling discrete SiC MOSFETs and designing layout of power modules with paralleled SiC MOSFETs dies....
Analysis for Parallel Execution without Performing Hardware/Software Co-simulation
Muhammad Rashid
2014-01-01
Hardware/software co-simulation improves the performance of embedded applications by executing the applications on a virtual platform before the actual hardware is available in silicon. However, the virtual platform of the target architecture is often not available during early stages of the embedded design flow. Consequently, analysis for parallel execution without performing hardware/software co-simulation is required. This article presents an analysis methodology for parallel execution of ...
International Nuclear Information System (INIS)
Kimura, Toshiya.
1997-03-01
A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures in Center for Promotion of Computational Science and Engineering of Japan Atomic Energy Research Institute. These parallel computers are Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes changing the number of processors. Their effective performances for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, have been investigated and evaluated. The communication time among processors has been also measured and evaluated. As a result, the differences on the performance and the characteristics between vector-parallel and scalar-parallel computers can be pointed, and it will present the basic data for efficient use of parallel computers and for large scale CFD simulations on parallel computers. (author)
Parallel file system performances in fusion data storage
International Nuclear Information System (INIS)
Iannone, F.; Podda, S.; Bracco, G.; Manduchi, G.; Maslennikov, A.; Migliori, S.; Wolkersdorfer, K.
2012-01-01
High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing–For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling – Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.
Parallel file system performances in fusion data storage
Energy Technology Data Exchange (ETDEWEB)
Iannone, F., E-mail: francesco.iannone@enea.it [Associazione EURATOM-ENEA sulla Fusione, C.R.ENEA Frascati, via E.Fermi, 45 - 00044 Frascati, Rome (Italy); Podda, S.; Bracco, G. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Manduchi, G. [Associazione EURATOM-ENEA sulla Fusione, Consorzio RFX, Corso Stati Uniti, 4 - 35127 Padua (Italy); Maslennikov, A. [CASPUR Inter-University Consortium for the Application of Super-Computing for Research, via dei Tizii, 6b - 00185 Rome (Italy); Migliori, S. [ENEA Information Communication Tecnologies, Lungotevere Thaon di Revel, 76 - 00196 Rome (Italy); Wolkersdorfer, K. [Juelich Supercomputing Centre-FZJ, D-52425 Juelich (Germany)
2012-12-15
High I/O flow rates, up to 10 GB/s, are required in large fusion Tokamak experiments like ITER where hundreds of nodes store simultaneously large amounts of data acquired during the plasma discharges. Typical network topologies such as linear arrays (systolic), rings, meshes (2-D arrays), tori (3-D arrays), trees, butterfly, hypercube in combination with high speed data transports like Infiniband or 10G-Ethernet, are the main areas in which the effort to overcome the so-called parallel I/O bottlenecks is most focused. The high I/O flow rates were modelled in an emulated testbed based on the parallel file systems such as Lustre and GPFS, commonly used in High Performance Computing. The test runs on High Performance Computing-For Fusion (8640 cores) and ENEA CRESCO (3392 cores) supercomputers. Message Passing Interface based applications were developed to emulate parallel I/O on Lustre and GPFS using data archival and access solutions like MDSPLUS and Universal Access Layer. These methods of data storage organization are widely diffused in nuclear fusion experiments and are being developed within the EFDA Integrated Tokamak Modelling - Task Force; the authors tried to evaluate their behaviour in a realistic emulation setup.
Performance modeling of parallel algorithms for solving neutron diffusion problems
International Nuclear Information System (INIS)
Azmy, Y.Y.; Kirk, B.L.
1995-01-01
Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
Distributed and parallel approach for handle and perform huge datasets
Konopko, Joanna
2015-12-01
Big Data refers to the dynamic, large and disparate volumes of data comes from many different sources (tools, machines, sensors, mobile devices) uncorrelated with each others. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. Proper architecture of the system that perform huge data sets is needed. In this paper, the comparison of distributed and parallel system architecture is presented on the example of MapReduce (MR) Hadoop platform and parallel database platform (DBMS). This paper also analyzes the problem of performing and handling valuable information from petabytes of data. The both paradigms: MapReduce and parallel DBMS are described and compared. The hybrid architecture approach is also proposed and could be used to solve the analyzed problem of storing and processing Big Data.
Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S
International Nuclear Information System (INIS)
Azmy, Y.Y.; Barnett, D.A.
2000-01-01
Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (S n ) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard, and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms thus making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such comparison is possible on Symmetric Multi-Processors (SMP) architectures in which several CPUs physically share a common memory, and in addition are capable of emulating message passing functionality. Implementation of the two-dimensional,(S n ), Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and comparison of the observed speedup for the two programming models is reported. For the case presented in this paper it appears that the MPI implementation scales better than the Pthreads implementation on both platforms
10th International Workshop on Parallel Tools for High Performance Computing
Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang
2017-01-01
This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as collection of small helper script has now matured to production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.
Integration experiences and performance studies of A COTS parallel archive systems
Energy Technology Data Exchange (ETDEWEB)
Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Bary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory
2010-01-01
Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf(COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, ls, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petaflop/s computing system, LANL's Roadrunner, and demonstrated its capability to address requirements of
Integration experiments and performance studies of a COTS parallel archive system
Energy Technology Data Exchange (ETDEWEB)
Chen, Hsing-bung [Los Alamos National Laboratory; Scott, Cody [Los Alamos National Laboratory; Grider, Gary [Los Alamos National Laboratory; Torres, Aaron [Los Alamos National Laboratory; Turley, Milton [Los Alamos National Laboratory; Sanchez, Kathy [Los Alamos National Laboratory; Bremer, John [Los Alamos National Laboratory
2010-06-16
Current and future Archive Storage Systems have been asked to (a) scale to very high bandwidths, (b) scale in metadata performance, (c) support policy-based hierarchical storage management capability, (d) scale in supporting changing needs of very large data sets, (e) support standard interface, and (f) utilize commercial-off-the-shelf (COTS) hardware. Parallel file systems have been asked to do the same thing but at one or more orders of magnitude faster in performance. Archive systems continue to move closer to file systems in their design due to the need for speed and bandwidth, especially metadata searching speeds such as more caching and less robust semantics. Currently the number of extreme highly scalable parallel archive solutions is very small especially those that will move a single large striped parallel disk file onto many tapes in parallel. We believe that a hybrid storage approach of using COTS components and innovative software technology can bring new capabilities into a production environment for the HPC community much faster than the approach of creating and maintaining a complete end-to-end unique parallel archive software solution. In this paper, we relay our experience of integrating a global parallel file system and a standard backup/archive product with a very small amount of additional code to provide a scalable, parallel archive. Our solution has a high degree of overlap with current parallel archive products including (a) doing parallel movement to/from tape for a single large parallel file, (b) hierarchical storage management, (c) ILM features, (d) high volume (non-single parallel file) archives for backup/archive/content management, and (e) leveraging all free file movement tools in Linux such as copy, move, Is, tar, etc. We have successfully applied our working COTS Parallel Archive System to the current world's first petafiop/s computing system, LANL's Roadrunner machine, and demonstrated its capability to address
Performance of Air Pollution Models on Massively Parallel Computers
DEFF Research Database (Denmark)
Brown, John; Hansen, Per Christian; Wasniewski, Jerzy
1996-01-01
To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...
7th International Workshop on Parallel Tools for High Performance Computing
Gracia, José; Nagel, Wolfgang; Resch, Michael
2014-01-01
Current advances in High Performance Computing (HPC) increasingly impact efficient software development workflows. Programmers for HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 7th International Parallel Tools Workshop, held in Dresden, Germany, September 3-4, 2013.
PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP
Directory of Open Access Journals (Sweden)
R. Maheswari
2012-04-01
Full Text Available In this fast computing era, most of the embedded system requires more computing power to complete the complex function/ task at the lesser amount of time. One way to achieve this is by boosting up the processor performance which allows processor core to run faster. This paper presents a novel technique of increasing the performance by parallel HPRC (High Performance Reconfigurable Computing in the CPU/DSP (Digital Signal Processor unit of OR1200 (Open Reduced Instruction Set Computer (RISC 1200 using Gene Expression Programming (GEP an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of OR1200 a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase the performance. The GEP Parallel HPRC is activated /deactivated by triggering the signals i HPRC_Gene_Start ii HPRC_Gene_End. A Verilog HDL(Hardware Description language functional code for Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the later part of the work. The result of the implementation ensures the overall speed-up increased to 20.59% by GEP based parallel HPRC in the execution unit of OR1200.
8th International Workshop on Parallel Tools for High Performance Computing
Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang
2015-01-01
Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – which is a forum to discuss the latest advancements in the parallel tools.
A High-Performance Parallel FDTD Method Enhanced by Using SSE Instruction Set
Directory of Open Access Journals (Sweden)
Dau-Chyrh Chang
2012-01-01
Full Text Available We introduce a hardware acceleration technique for the parallel finite difference time domain (FDTD method using the SSE (streaming (single instruction multiple data SIMD extensions instruction set. The implementation of SSE instruction set to parallel FDTD method has achieved the significant improvement on the simulation performance. The benchmarks of the SSE acceleration on both the multi-CPU workstation and computer cluster have demonstrated the advantages of (vector arithmetic logic unit VALU acceleration over GPU acceleration. Several engineering applications are employed to demonstrate the performance of parallel FDTD method enhanced by SSE instruction set.
Parallel performance of TORT on the CRAY J90: Model and measurement
International Nuclear Information System (INIS)
Barnett, A.; Azmy, Y.Y.
1997-10-01
A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. The extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help understand which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared to measurements of actual timings of the code
Energy Technology Data Exchange (ETDEWEB)
Zhang, Ping [Zhejiang Vocational College of Commerce, Hangzhou, Binwen Road 470 (China); Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, 1206 West Green Street, Urbana, IL 61801 (United States); Hrnjak, P.S. [Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, 1206 West Green Street, Urbana, IL 61801 (United States)
2010-09-15
The thermal-hydraulic performance in periodic frosting conditions is experimentally studied for the parallel-flow parallel-fin heat exchanger, henceforth referred to as a PF{sup 2} heat exchanger, a new style of heat exchanger that uses louvered bent fins on flat tubes to enhance water drainage when the flat tubes are horizontal. Typically, it takes a few frosting/defrosting cycles to come to repeatable conditions. The criterion for the initiation of defrost and a sufficiently long defrost period are determined for the test PF{sup 2} heat exchanger and test condition. The effects of blower operation on the pressure drop, frost accumulation, water retention, and capacity in time are compared under the conditions of 15 sequential frosting cycles. Pressure drop across the heat exchanger and overall heat transfer coefficient are quantified under frost conditions as functions of the air humidity and air face velocity. The performances of two types of flat-tube heat exchangers, PF{sup 2} heat exchanger and conventional parallel-flow serpentine-fin (PFSF) heat exchanger, are compared and the results obtained are presented. (author)
Directory of Open Access Journals (Sweden)
Xing Cai
2005-01-01
Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
Flexibility and Performance of Parallel File Systems
Kotz, David; Nieuwejaar, Nils
1996-01-01
As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.
Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code
Energy Technology Data Exchange (ETDEWEB)
Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian
2017-02-01
The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.
Evaluation method for the drying performance of enzyme containing formulations
DEFF Research Database (Denmark)
Sloth, Jakob; Bach, P.; Jensen, Anker Degn
2008-01-01
A method is presented for fast and cheap evaluation of the performance of enzyme containing formulations in terms of preserving the highest enzyme activity during spray drying. The method is based on modeling the kinetics of the thermal inactivation reaction which occurs during the drying process....... Relevant kinetic parameters are determined from differential scanning calorimeter (DSC) experiments and the model is used to simulate the severity of the inactivation reaction for temperatures and moisture levels relevant for spray drying. After conducting experiments and subsequent simulations...... for a number of different formulations it may be deduced which formulation performs best. This is illustrated by a formulation design study where 4 different enzyme containing formulations are evaluated. The method is validated by comparison to pilot scale spray dryer experiments....
International Nuclear Information System (INIS)
Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth
2006-01-01
Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity, comes a new problem - the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines
Implementation of a high performance parallel finite element micromagnetics package
International Nuclear Information System (INIS)
Scholz, W.; Suess, D.; Dittrich, R.; Schrefl, T.; Tsiantos, V.; Forster, H.; Fidler, J.
2004-01-01
A new high performance scalable parallel finite element micromagnetics package has been implemented. It includes solvers for static energy minimization, time integration of the Landau-Lifshitz-Gilbert equation, and the nudged elastic band method
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
Routing performance analysis and optimization within a massively parallel computer
Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen
2013-04-16
An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
Performance study of a cluster calculation; parallelization and application under geant4
International Nuclear Information System (INIS)
Trabelsi, Abir
2007-01-01
This work concretizes the final studies project for engineering computer sciences, it is archived within the national center of nuclear sciences and technology. The project consists in studying the performance of a set of machines in order to determine the best architecture to assemble them in a cluster. As well as the parallelism and the parallel implementation of GEANT4, as a tool of simulation. The realisation of this project consists on : 1) programming with C++ and executing the two benchmarks P MV and PMM on each station; 2) Interpreting this result in order to show the best architecture of the cluster; 3) parallelism with TOP-C the two benchmarks; 4) Executing the two Top-C versions on the cluster; 5) Generalizing this results; 6)parallelism et executing the parallel version of GEANT4. (Author). 14 refs
Energy Technology Data Exchange (ETDEWEB)
McLay, R.T.; Carey, G.F.
1996-12-31
In this study we consider parallel solution of sparse linear systems arising from discretized PDE`s. As part of our continuing work on our parallel PCG Solver package, we have made improvements in two areas. The first is improving the performance of the matrix-vector product. Here on regular finite-difference grids, we are able to use the cache memory more efficiently for smaller domains or where there are multiple degrees of freedom. The second problem of interest in the present work is the construction of preconditioners in the context of the parallel PCG solver we are developing. Here the problem is partitioned over a set of processors subdomains and the matrix-vector product for PCG is carried out in parallel for overlapping grid subblocks. For problems of scaled speedup, the actual rate of convergence of the unpreconditioned system deteriorates as the mesh is refined. Multigrid and subdomain strategies provide a logical approach to resolving the problem. We consider the parallel trade-offs between communication and computation and provide a complexity analysis of a representative algorithm. Some preliminary calculations using the parallel package and comparisons with other preconditioners are provided together with parallel performance results.
P3T+: A Performance Estimator for Distributed and Parallel Programs
Directory of Open Access Journals (Sweden)
T. Fahringer
2000-01-01
Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particular distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+ which is a performance estimator for mostly regular HPF (High Performance Fortran programs but partially covers also message passing programs (MPI. P3T+ is unique by modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model most critical architecture specific factors such as cache lines sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC to support programmers develop parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both accuracy and usefulness of P3T+.
International Nuclear Information System (INIS)
Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu
2012-01-01
MPI parallelism are implemented on a SUN Workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. 23 input less obtained from MCNP Criticality Validation Suite are utilized for the purpose of evaluating the amount of speed up achievable by using the parallel capabilities of MPI. More importantly, we will study the economics of using more processors and the type of problem where the performance gain are obvious. This is important to enable better practices of resource sharing especially for the HPC facilities processing time. Future endeavours in this direction might even reveal clues for best MCNP5/ MCNPX coding practices for optimum performance of MPI parallelisms. (author)
International Nuclear Information System (INIS)
Fischer, J.W.; Azmy, Y.Y.
2003-01-01
A previously reported parallel performance model for Angular Domain Decomposition (ADD) of the Discrete Ordinates method for solving multidimensional neutron transport problems is revisited for further validation. Three communication schemes: native MPI, the bucket algorithm, and the distributed bucket algorithm, are included in the validation exercise that is successfully conducted on a Beowulf cluster. The parallel performance model is comprised of three components: serial, parallel, and communication. The serial component is largely independent of the number of participating processors, P, while the parallel component decreases like 1/P. These two components are independent of the communication scheme, in contrast with the communication component that typically increases with P in a manner highly dependent on the global reduced algorithm. Correct trends for each component and each communication scheme were measured for the Arbitrarily High Order Transport (AHOT) code, thus validating the performance models. Furthermore, extensive experiments illustrate the superiority of the bucket algorithm. The primary question addressed in this research is: for a given problem size, which domain decomposition method, angular or spatial, is best suited to parallelize Discrete Ordinates methods on a specific computational platform? We address this question for three-dimensional applications via parallel performance models that include parameters specifying the problem size and system performance: the above-mentioned ADD, and a previously constructed and validated Spatial Domain Decomposition (SDD) model. We conclude that for large problems the parallel component dwarfs the communication component even on moderately large numbers of processors. The main advantages of SDD are: (a) scalability to higher numbers of processors of the order of the number of computational cells; (b) smaller memory requirement; (c) better performance than ADD on high-end platforms and large number of
CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing
YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo
2009-01-01
[ABSTRACT]Graphics processing units (GP Us) originally designed for computer video cards have emerged as the most powerful chip in a high-performance workstation. In the high performance computation capabilities, graphic processing units (GPU) lead to much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...
Kinematic Analysis and Performance Evaluation of Novel PRS Parallel Mechanism
Balaji, K.; Khan, B. Shahul Hamid
2018-02-01
In this paper, a 3 DoF (Degree of Freedom) novel PRS (Prismatic-Revolute- Spherical) type parallel mechanisms has been designed and presented. The combination of striaght and arc type linkages for 3 DOF parallel mechanism is introduced for the first time. The performances of the mechanisms are evaluated based on the indices such as Minimum Singular Value (MSV), Condition Number (CN), Local Conditioning Index (LCI), Kinematic Configuration Index (KCI) and Global Conditioning Index (GCI). The overall reachable workspace of all mechanisms are presented. The kinematic measure, dexterity measure and workspace analysis for all the mechanism have been evaluated and compared.
Performance of broilers fed enzyme-supplemented tigernut ...
African Journals Online (AJOL)
A feeding trial was set up to study the effects of replacing maize with tigernut meal (TGN) at 0, 33.33, 66.67 and 100 per cent levels, with 0.10 per cent enzyme supplementation of all levels, on performance characteristics and carcass yield in broiler chicken for 8 weeks (56 days). A total of 200 Anak-2000 breed of broilers ...
Modifications Caused by Enzyme-Retting and Their Effect on Composite Performance
Directory of Open Access Journals (Sweden)
Jonn A. Foulk
2011-01-01
Full Text Available Bethune seed flax was collected from Canada with seed removed using a stripper header and straw pulled and left in field for several weeks. Unretted straw was decorticated providing a coarse fiber bundle feedstock for enzyme treatments. Enzyme treatments using a bacterial pectinolytic enzyme with lyase activity were conducted in lab-scale reactors. Four fiber specimens were created: no retting, minimal retting, moderate retting, and full retting. Fiber characterization tests: strength, elongation, diameter, metal content, wax content, and pH were conducted with significant differences between fibers. Thermosetting vinyl ester resin was used to produce composite panels via vacuum-assisted infusion. Composite performance was evaluated using fiber bundle pull-out, tensile, impact, and interlaminar shear tests. Composite tests indicate that composite panels are largely unchanged among fiber samples. Variation in composite performance might not be realized due to poor interfacial bonding being of larger impact than the more subtle changes incurred by the enzyme treatment.
Automatic performance tuning of parallel and accelerated seismic imaging kernels
Haberdar, Hakan
2014-01-01
With the increased complexity and diversity of mainstream high performance computing systems, significant effort is required to tune parallel applications in order to achieve the best possible performance for each particular platform. This task becomes more and more challenging and requiring a larger set of skills. Automatic performance tuning is becoming a must for optimizing applications such as Reverse Time Migration (RTM) widely used in seismic imaging for oil and gas exploration. An empirical search based auto-tuning approach is applied to the MPI communication operations of the parallel isotropic and tilted transverse isotropic kernels. The application of auto-tuning using the Abstract Data and Communication Library improved the performance of the MPI communications as well as developer productivity by providing a higher level of abstraction. Keeping productivity in mind, we opted toward pragma based programming for accelerated computation on latest accelerated architectures such as GPUs using the fairly new OpenACC standard. The same auto-tuning approach is also applied to the OpenACC accelerated seismic code for optimizing the compute intensive kernel of the Reverse Time Migration application. The application of such technique resulted in an improved performance of the original code and its ability to adapt to different execution environments.
Design of high-performance parallelized gene predictors in MATLAB.
Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien
2012-04-10
This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
High performance parallelism pearls 2 multicore and many-core programming approaches
Jeffers, Jim
2015-01-01
High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t
Performance Improvement of Shunt Active Power Filter With Dual Parallel Topology
DEFF Research Database (Denmark)
Asiminoaei, Lucian; Lascu, Cristian; Blaabjerg, Frede
2007-01-01
loop and the other is in a feedforward loop for harmonic compensation. Thus, both active power filters bring their own characteristic advantages, i.e., the feedback filter improves the steady-state performance of the harmonic mitigation and the feedforward filter improves the dynamic response. Another......This paper describes the control and parallel operation of two active power filters (APFs). Possible parallel operation situations of two APFs are investigated, and then the proposed topology is analyzed. The filters are coupled in a combined topology in which one filter is connected in a feedback...
Parallel integer sorting with medium and fine-scale parallelism
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Directory of Open Access Journals (Sweden)
Vanaja Gokul
2012-01-01
Full Text Available In distributed systems real time optimizations need to be performed dynamically for better utilization of the network resources. Real time optimizations can be performed effectively by using Cross Layer Optimization (CLO within the network operating system. This paper presents the performance evaluation of Cross Layer Optimization (CLO in comparison with the traditional approach of Single-Layer Optimization (SLO. In the parallel implementation of the approaches the experimental study carried out indicates that the CLO results in a significant improvement in network utilization when compared to SLO. A variant of the Particle Swarm Optimization technique that utilizes Digital Pheromones (PSODP for better performance has been used here. A significantly higher speed up in performance was observed from the parallel implementation of CLO that used PSODP on a cluster of nodes.
Effects of supplemental microbial phytase enzyme on performance ...
African Journals Online (AJOL)
This experiment was conducted to investigate the effects of supplemental phytase in a corn-wheatsoybean meal basal diet on phosphorus (P) digestibility and performance of broiler chicks. 378 one-day old broiler chicks (Ross 308) were allocated to 3×3 factorial arrangements with three levels of phytase enzyme (0, 500 ...
Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors
Directory of Open Access Journals (Sweden)
T. Kamachi
1997-01-01
Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.
HVI Ballistic Performance Characterization of Non-Parallel Walls
Bohl, William; Miller, Joshua; Christiansen, Eric
2012-01-01
The Double-Wall, "Whipple" Shield [1] has been the subject of many hypervelocity impact studies and has proven to be an effective shield system for Micro-Meteoroid and Orbital Debris (MMOD) impacts for spacecraft. The US modules of the International Space Station (ISS), with their "bumper shields" offset from their pressure holding rear walls provide good examples of effective on-orbit use of the double wall shield. The concentric cylinder shield configuration with its large radius of curvature relative to separation distance is easily and effectively represented for testing and analysis as a system of two parallel plates. The parallel plate double wall configuration has been heavily tested and characterized for shield performance for normal and oblique impacts for the ISS and other programs. The double wall shield and principally similar Stuffed Whipple Shield are very common shield types for MMOD protection. However, in some locations with many spacecraft designs, the rear wall cannot be modeled as being parallel or concentric with the outer bumper wall. As represented in Figure 1, there is an included angle between the two walls. And, with a cylindrical outer wall, the effective included angle constantly changes. This complicates assessment of critical spacecraft components located within outer spacecraft walls when using software tools such as NASA's BumperII. In addition, the validity of the risk assessment comes into question when using the standard double wall shield equations, especially since verification testing of every set of double wall included angles is impossible.
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
International Nuclear Information System (INIS)
Nash, T.; Areti, H.; Atac, R.
1988-08-01
Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs
Directory of Open Access Journals (Sweden)
Subhash Chander
2015-11-01
Full Text Available This paper presents a study on impact of temperature on the performance of series and parallel connected mono-crystalline silicon (mono-Si solar cell employing solar simulator. The experiment was carried out at constant light intensity 550 W/m2with cell temperature in the range 25–60 oC for single, series and parallel connected mono-Si solar cells. The performance parameters like open circuit voltage, maximum power, fill factor and efficiency are found to decrease with cell temperature while the short circuit current is observed to increase. The experimental results reveal that silicon solar cells connected in series and parallel combinations follow the Kirchhoff’s laws and the temperature has a significant effect on the performance parameters of solar cell.
Moon, Hongsik
What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited by the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared with the performance using benchmark software and the metric was FLoting-point Operations Per Seconds (FLOPS) which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore system? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPs and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations
Landge, A. G.
2012-12-01
The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view, that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D-s performance on an IBM Blue Gene/P system. © 1995-2012 IEEE.
Optimized Parallel Discrete Event Simulation (PDES) for High Performance Computing (HPC) Clusters
National Research Council Canada - National Science Library
Abu-Ghazaleh, Nael
2005-01-01
The aim of this project was to study the communication subsystem performance of state of the art optimistic simulator Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES...
Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana
2017-01-01
In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.
Park, Chul Min; Lee, Kyunghoon; Jun, Sun-Hee; Song, Sang Hoon; Song, Junghan
2017-08-15
Deficiencies in erythrocyte metabolic enzymes are associated with hereditary hemolytic anemia. Here, we report the development of a novel multiplex enzyme assay for six major enzymes, namely glucose-6-phosphate dehydrogenase, pyruvate kinase, pyrimidine 5'-nucleotidase, hexokinase, triosephosphate isomerase, and adenosine deaminase, deficiencies in which are implicated in erythrocyte enzymopathies. To overcome the drawbacks of traditional spectrophotometric enzyme assays, the present assay was based on ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS). The products of the six enzymes were directly measured by using ion pairing UPLC-MS/MS, and the precision, linearity, ion suppression, optimal sample amounts, and incubation times were evaluated. Eighty-three normal individuals and 13 patients with suspected enzymopathy were analyzed. The UPLC running time was within 5min. No ion suppression was observed at the retention time for the products or internal standards. We selected an optimal dilution factor and incubation time for each enzyme system. The intra- and inter-assay imprecision values (CVs) were 2.5-12.1% and 2.9-14.3%, respectively. The linearity of each system was good, with R 2 values >0.97. Patient samples showed consistently lower enzyme activities than those from normal individuals. The present ion paring UPLC-MS/MS assay enables facile and reproducible multiplex evaluation of the activity of enzymes implicated in enzymopathy-associated hemolytic anemia. Copyright © 2017 Elsevier B.V. All rights reserved.
The Parallel Algorithm Based on Genetic Algorithm for Improving the Performance of Cognitive Radio
Directory of Open Access Journals (Sweden)
Liu Miao
2018-01-01
Full Text Available The intercarrier interference (ICI problem of cognitive radio (CR is severe. In this paper, the machine learning algorithm is used to obtain the optimal interference subcarriers of an unlicensed user (un-LU. Masking the optimal interference subcarriers can suppress the ICI of CR. Moreover, the parallel ICI suppression algorithm is designed to improve the calculation speed and meet the practical requirement of CR. Simulation results show that the data transmission rate threshold of un-LU can be set, the data transmission quality of un-LU can be ensured, the ICI of a licensed user (LU is suppressed, and the bit error rate (BER performance of LU is improved by implementing the parallel suppression algorithm. The ICI problem of CR is solved well by the new machine learning algorithm. The computing performance of the algorithm is improved by designing a new parallel structure and the communication performance of CR is enhanced.
Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide
2015-09-01
The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
Directory of Open Access Journals (Sweden)
MOHAMMED FAIZ ABOALMAALY
2014-10-01
Full Text Available With the continuous revolution of multicore architecture, several parallel programming platforms have been introduced in order to pave the way for fast and efficient development of parallel algorithms. Back into its categories, parallel computing can be done through two forms: Data-Level Parallelism (DLP or Task-Level Parallelism (TLP. The former can be done by the distribution of data among the available processing elements while the latter is based on executing independent tasks concurrently. Most of the parallel programming platforms have built-in techniques to distribute the data among processors, these techniques are technically known as automatic distribution (scheduling. However, due to their wide range of purposes, variation of data types, amount of distributed data, possibility of extra computational overhead and other hardware-dependent factors, manual distribution could achieve better outcomes in terms of performance when compared to the automatic distribution. In this paper, this assumption is investigated by conducting a comparison between automatic and our newly proposed manual distribution of data among threads in parallel. Empirical results of matrix addition and matrix multiplication show a considerable performance gain when manual distribution is applied against automatic distribution.
Performance Assessment in a Heat Exchanger Tube with Opposite/Parallel Wing Twisted Tapes
Directory of Open Access Journals (Sweden)
S. Eiamsa-ard
2015-02-01
Full Text Available The thermohydraulic performance in a tube containing a modified twisted tape with alternate-axes and wing arrangements is reported. This work aims to investigate the effects of wing arrangements (opposite (O and parallel (P wings at different wing shapes (triangle (Tri, rectangular (Rec, and trapezoidal (Tra wings and on the thermohydraulic performance characteristics. The obtained results show that wing twisted tapes with all wing shape arrangements (O-Tri/O-Rec/O-Tra/P-Tri/P-Rec/P-Tra give superior thermohydraulic performance and heat transfer rate to the typical twisted tape. In addition, the tapes with opposite wing arrangement of O-Tra, O-Rec, and O-Tri give superior thermohydraulic performances to those with parallel wing arrangement of P-Tra, P-Rec, and P-Tri around 2.7%, 3.5%, and 3.2%, respectively.
Parallel Performance Optimizations on Unstructured Mesh-based Simulations
Energy Technology Data Exchange (ETDEWEB)
Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid
2015-01-01
© The Authors. Published by Elsevier B.V. This paper addresses two key parallelization challenges the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns, that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitioning with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intranode data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speed up on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
9th International Workshop on Parallel Tools for High Performance Computing
Hilbrich, Tobias; Niethammer, Christoph; Gracia, José; Nagel, Wolfgang; Resch, Michael
2016-01-01
High Performance Computing (HPC) remains a driver that offers huge potentials and benefits for science and society. However, a profound understanding of the computational matters and specialized software is needed to arrive at effective and efficient simulations. Dedicated software tools are important parts of the HPC software landscape, and support application developers. Even though a tool is by definition not a part of an application, but rather a supplemental piece of software, it can make a fundamental difference during the development of an application. Such tools aid application developers in the context of debugging, performance analysis, and code optimization, and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 9th International Parallel Tools Workshop held in Dresden, Germany, September 2-3, 2015, which offered an established forum for discussing the latest advances in paral...
Performance of a parallel plate volume calorimeter prototype
International Nuclear Information System (INIS)
Arefiev, A.; Bencze, Gy.L.; Bizzeti, A.; Choumilov, E.; Civinini, C; D'Alessandro, R.; Ferrando, A.; Fouz, M.C.; Iglesias, A.; Ivochkin, V.; Josa, M.I.; Malinin, A.; Meschini, M.; Misyura, S.; Pojidaev, V.; Salicio, J.M.; Sikler, F.
1995-01-01
An iron/gas parallel plate volume calorimeter prototype, working in the avalanche mode, has been tested using electrons of 20 to 150 GeV/c momentum with high voltages varying from 5400 to 5600 V (electric fields ranging from 36 to 37 KV/cm), and a gas mixture of CF4/CO, (80/20%). The collected charge was measured as a function of the high voltage and of the electron energy. The energy resolution was also measured. Comparisons are made with Monte-Carlo predictions. Agreement between data and simulation allows the calculation of the expected performance of a full size calorimeter. (Author)
Performance of a parallel plate volume calorimeter prototype
International Nuclear Information System (INIS)
Arefiev, A.; Bencze, G.L.; Bizzeti, A.
1995-09-01
An iron/gas parallel plate volume calorimeter prototype, working in the avalanche mode, has been tested using electrons of 20 to 150 GeV/c momentum with high voltages varying from 5400 to 5600 V (electric fields ranging from 36 to 37 KV/cm), and a gas mixture of CF 4 /CO 2 (80/20%). The collected charge was measured as a function of the high voltage and of the electron energy. The energy resolution was also measured. Comparisons are made with Monte-Carlo predictions. Agreement between data and simulation allows the calculation of the expected performance of a full size calorimeter
Rocchitta, Gaia; Spanu, Angela; Babudieri, Sergio; Latte, Gavinella; Madeddu, Giordano; Galleri, Grazia; Nuvoli, Susanna; Bagella, Paola; Demartis, Maria Ilaria; Fiore, Vito; Manetti, Roberto; Serra, Pier Andrea
2016-01-01
Enzyme-based chemical biosensors are based on biological recognition. In order to operate, the enzymes must be available to catalyze a specific biochemical reaction and be stable under the normal operating conditions of the biosensor. Design of biosensors is based on knowledge about the target analyte, as well as the complexity of the matrix in which the analyte has to be quantified. This article reviews the problems resulting from the interaction of enzyme-based amperometric biosensors with complex biological matrices containing the target analyte(s). One of the most challenging disadvantages of amperometric enzyme-based biosensor detection is signal reduction from fouling agents and interference from chemicals present in the sample matrix. This article, therefore, investigates the principles of functioning of enzymatic biosensors, their analytical performance over time and the strategies used to optimize their performance. Moreover, the composition of biological fluids as a function of their interaction with biosensing will be presented. PMID:27249001
Morse, H Stephen
1994-01-01
Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi
Energy Technology Data Exchange (ETDEWEB)
1991-10-23
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
International Nuclear Information System (INIS)
Pang, Kar Mun; Ng, Hoon Kiat; Gan, Suyin
2012-01-01
Highlights: ► A performance benchmarking exercise is conducted for diesel combustion simulations. ► The reduced chemical mechanism shows its advantages over base and skeletal models. ► High efficiency and great reduction of CPU runtime are achieved through 4-node solver. ► Increasing ISAT memory from 0.1 to 2 GB reduces the CPU runtime by almost 35%. ► Combustion and soot processes are predicted well with minimal computational cost. - Abstract: In the present study, in-cylinder diesel combustion simulation was performed with parallel processing on an Intel Xeon Quad-Core platform to allow both fluid dynamics and chemical kinetics of the surrogate diesel fuel model to be solved simultaneously on multiple processors. Here, Cartesian Z-Coordinate was selected as the most appropriate partitioning algorithm since it computationally bisects the domain such that the dynamic load associated with fuel particle tracking was evenly distributed during parallel computations. Other variables examined included number of compute nodes, chemistry sizes and in situ adaptive tabulation (ISAT) parameters. Based on the performance benchmarking test conducted, parallel configuration of 4-compute node was found to reduce the computational runtime most efficiently whereby a parallel efficiency of up to 75.4% was achieved. The simulation results also indicated that accuracy level was insensitive to the number of partitions or the partitioning algorithms. The effect of reducing the number of species on computational runtime was observed to be more significant than reducing the number of reactions. Besides, the study showed that an increase in the ISAT maximum storage of up to 2 GB reduced the computational runtime by 50%. Also, the ISAT error tolerance of 10 −3 was chosen to strike a balance between results accuracy and computational runtime. The optimised parameters in parallel processing and ISAT, as well as the use of the in-house reduced chemistry model allowed accurate
Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra
1989-01-01
In part 1 architecture of NETRA is presented. A performance evaluation of NETRA using several common vision algorithms is also presented. Performance of algorithms when they are mapped on one cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results. It is observed that the analysis is very accurate. Performance analysis of parallel algorithms when mapped across clusters is presented. Mappings across clusters illustrate the importance and use of shared as well as distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with the computation speeds, good speedups are possible when algorithms are mapped across clusters.
Performance of DS-CDMA systems with optimal hard-decision parallel interference cancellation
Hofstad, van der R.W.; Klok, M.J.
2003-01-01
We study a multiuser detection system for code-division multiple access (CDMA). We show that applying multistage hard-decision parallel interference cancellation (HD-PIC) significantly improves performance compared to the matched filter system. In (multistage) HD-PIC, estimates of the interfering
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
Evaluating the performance of the particle finite element method in parallel architectures
Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.
2014-05-01
This paper presents a high performance implementation for the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material derivative based formulation of the equations with a hybrid spatial discretization which uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the goodness of the method with large simulations, it is imperative to use of parallel environments. Parallel strategies for Finite Element Method have been widely studied and many libraries can be used to solve Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems with only one computer. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, using domain decomposition the memory is automatically isolated, thus avoiding race conditions; however new issues appear due to data distribution over the processes. Thus, a domain decomposition strategy for both particle and mesh is adopted, which minimizes the communication between processes. Finally, performance analysis running over multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However the total cputime for cases presented is lower than that obtained when using classical Eulerian strategies.
Effect of enzymes on anaerobic digestion of primary sludge and septic tank performance.
Diak, James; Örmeci, Banu; Kennedy, Kevin J
2012-11-01
Enzyme additives are believed to improve septic tank performance by increasing the hydrolysis and digestion rates and maintaining a healthy microbial population. Previous studies reported mixed results on the effectiveness of enzymes on mesophilic and thermophilic digestion, and it is not clear whether enzymes would be effective under septic tank conditions where there is no heating or mixing, quantities of enzymes added are small, and they can be washed out quickly. In this study, batch reactors and continuous-flow reactors designed and operated as septic tanks were used to evaluate whether enzymatic treatment would increase the hydrolysis and digestion rates in primary sludge. Total solids, volatile solids, total suspended solids, total and soluble chemical oxygen demand, concentrations of protein, carbohydrate, ammonia and volatile acids in sludge and effluent samples were measured to determine the differences in digestion rates in the presence and absence of enzymes. Overall, no significant improvement was observed in enzyme-treated reactors compared with the control reactors.
Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.
2017-01-01
The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed
Bernal, Claudia; Rodríguez, Karen; Martínez, Ronny
2018-06-09
Enzyme immobilization often achieves reusable biocatalysts with improved operational stability and solvent resistance. However, these modifications are generally associated with a decrease in activity or detrimental modifications in catalytic properties. On the other hand, protein engineering aims to generate enzymes with increased performance at specific conditions by means of genetic manipulation, directed evolution and rational design. However, the achieved biocatalysts are generally generated as soluble enzymes, -thus not reusable- and their performance under real operational conditions is uncertain. Combined protein engineering and enzyme immobilization approaches have been employed as parallel or consecutive strategies for improving an enzyme of interest. Recent reports show efforts on simultaneously improving both enzymatic and immobilization components through genetic modification of enzymes and optimizing binding chemistry for site-specific and oriented immobilization. Nonetheless, enzyme engineering and immobilization are usually performed as separate workflows to achieve improved biocatalysts. In this review, we summarize and discuss recent research aiming to integrate enzyme immobilization and protein engineering and propose strategies to further converge protein engineering and enzyme immobilization efforts into a novel "immobilized biocatalyst engineering" research field. We believe that through the integration of both enzyme engineering and enzyme immobilization strategies, novel biocatalysts can be obtained, not only as the sum of independently improved intrinsic and operational properties of enzymes, but ultimately tailored specifically for increased performance as immobilized biocatalysts, potentially paving the way for a qualitative jump in the development of efficient, stable biocatalysts with greater real-world potential in challenging bioprocess applications. Copyright © 2018. Published by Elsevier Inc.
Directory of Open Access Journals (Sweden)
Lin Yuan
Full Text Available Three hundred one-day-old male broiler chickens (Ross-308 were fed corn-soybean basal diets containing non-starch polysaccharide (NSP enzyme and different levels of acid protease from 1 to 42 days of age to investigate the effects of exogenous enzymes on growth performance, digestive function, activity of endogenous digestive enzymes in the pancreas and mRNA expression of pancreatic digestive enzymes. For days 1-42, compared to the control chickens, average daily feed intake (ADFI and average daily gain (ADG were significantly enhanced by the addition of NSP enzyme in combination with protease supplementation at 40 or 80 mg/kg (p<0.05. Feed-to-gain ratio (FGR was significantly improved by supplementation with NSP enzymes or NSP enzyme combined with 40 or 80 mg/kg protease compared to the control diet (p<0.05. Apparent digestibility of crude protein (ADCP was significantly enhanced by the addition of NSP enzyme or NSP enzyme combined with 40 or 80 mg/kg protease (p<0.05. Cholecystokinin (CCK level in serum was reduced by 31.39% with NSP enzyme combined with protease supplementation at 160 mg/kg (p<0.05, but the CCK level in serum was increased by 26.51% with NSP enzyme supplementation alone. After 21 days, supplementation with NSP enzyme and NSP enzyme combined with 40 or 80 mg/kg protease increased the activity of pancreatic trypsin by 74.13%, 70.66% and 42.59% (p<0.05, respectively. After 42 days, supplementation with NSP enzyme and NSP enzyme combined with 40 mg/kg protease increased the activity of pancreatic trypsin by 32.45% and 27.41%, respectively (p<0.05. However, supplementation with NSP enzyme and 80 or 160 mg/kg protease decreased the activity of pancreatic trypsin by 10.75% and 25.88%, respectively (p<0.05. The activities of pancreatic lipase and amylase were significantly higher in treated animals than they were in the control group (p<0.05. Supplementation with NSP enzyme, NSP enzyme combined with 40 or 80 mg/kg protease increased
Effect of Different Levels of Extruded Soybean and Avizyme Enzyme on Broiler Performance
Directory of Open Access Journals (Sweden)
H Nasiri Mogadam
2012-01-01
Full Text Available An experiment was conducted to examine the effect of different levels of extruded soybean and enzyme on broiler performance. In a completely randomized design with 2×3 factorial arrangement, 480 one day-old, Ross broiler chickens were divided into 40 groups, 12 chicks per pen. Treatments were consisting of combination of four levels of extruded soybean (0.0, 5.0, 10.0 and 15.0 % and two levels of enzyme (0.0 and 500 g per ton. Different levels of extruded soybean and enzyme had no significant effect on blood factors such as cholesterol, triglyceride and the weight of liver and heart. The usage of extruded soybean and enzyme showed significantly higher weight gain and better feed conversion (p
DEFF Research Database (Denmark)
Nielsen, Kaspar Kirstein; Engelbrecht, Kurt; Bahl, Christian R.H.
2013-01-01
of 50 random stacks having equal average channel thicknesses with 20 channels each are used to provide a statistical base. The standard deviation of the stacks is varied as are the flow rate (Reynolds number) and the thermal conductivity of the solid heat exchanger material. It is found that the heat...... transfer performance of inhomogeneous stacks of parallel plates may be reduced significantly due to the maldistribution of the fluid flow compared to the ideal homogeneous case. The individual channels experience different flow velocities and this further induces an inter-channel thermal cross talk.......The heat transfer performance of inhomogeneous parallel plate heat exchangers in transient operation is investigated using an established model. A performance parameter, denoted the Nusselt-scaling factor, is used as benchmark and calculated using a well-established single blow technique. A sample...
Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis
Directory of Open Access Journals (Sweden)
Martin Schulz
2008-01-01
Full Text Available Over the last decades a large number of performance tools has been developed to analyze and optimize high performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about 3 years ago to overcome these limitations and provide efficient, easy to apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two different faces: it provides an interoperable tool set covering the most common analysis steps as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple fully equivalent interfaces including an easy-to-use GUI as well as an interactive command line interface reducing the usage threshold for those tools.
Massively parallel performance of neutron transport response matrix algorithms
International Nuclear Information System (INIS)
Hanebutte, U.R.; Lewis, E.E.
1993-01-01
Massively parallel red/black response matrix algorithms for the solution of within-group neutron transport problems are implemented on the Connection Machines-2, 200 and 5. The response matrices are dericed from the diamond-differences and linear-linear nodal discrete ordinate and variational nodal P 3 approximations. The unaccelerated performance of the iterative procedure is examined relative to the maximum rated performances of the machines. The effects of processor partitions size, of virtual processor ratio and of problems size are examined in detail. For the red/black algorithm, the ratio of inter-node communication to computing times is found to be quite small, normally of the order of ten percent or less. Performance increases with problems size and with virtual processor ratio, within the memeory per physical processor limitation. Algorithm adaptation to courser grain machines is straight-forward, with total computing time being virtually inversely proportional to the number of physical processors. (orig.)
High performance parallel computing of flows in complex geometries: I. Methods
International Nuclear Information System (INIS)
Gourdain, N; Gicquel, L; Montagnac, M; Vermorel, O; Staffelbach, G; Garcia, M; Boussuge, J-F; Gazaix, M; Poinsot, T
2009-01-01
Efficient numerical tools coupled with high-performance computers, have become a key element of the design process in the fields of energy supply and transportation. However flow phenomena that occur in complex systems such as gas turbines and aircrafts are still not understood mainly because of the models that are needed. In fact, most computational fluid dynamics (CFD) predictions as found today in industry focus on a reduced or simplified version of the real system (such as a periodic sector) and are usually solved with a steady-state assumption. This paper shows how to overcome such barriers and how such a new challenge can be addressed by developing flow solvers running on high-end computing platforms, using thousands of computing cores. Parallel strategies used by modern flow solvers are discussed with particular emphases on mesh-partitioning, load balancing and communication. Two examples are used to illustrate these concepts: a multi-block structured code and an unstructured code. Parallel computing strategies used with both flow solvers are detailed and compared. This comparison indicates that mesh-partitioning and load balancing are more straightforward with unstructured grids than with multi-block structured meshes. However, the mesh-partitioning stage can be challenging for unstructured grids, mainly due to memory limitations of the newly developed massively parallel architectures. Finally, detailed investigations show that the impact of mesh-partitioning on the numerical CFD solutions, due to rounding errors and block splitting, may be of importance and should be accurately addressed before qualifying massively parallel CFD tools for a routine industrial use.
CERN. Geneva
2012-01-01
side bus or processor interconnections. Parallelism can only result in performance gain, if the memory usage is optimized, memory locality improved and the communication between threads is minimized. But the domain of concurrent programming has become a field for highly skilled experts, as the implementation of multithreading is difficult, error prone and labor intensive. A full re-implementation for parallel execution of existing offline frameworks, like AliRoot in ALICE, is thus unaffordable. An alternative method, is to use a semi-automatic source-to-source transformation for getting a simple parallel design, with almost no interference between threads. This reduces the need of rewriting the develop...
International Nuclear Information System (INIS)
Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt
2007-01-01
Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and in the future unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5
Study on High Performance of MPI-Based Parallel FDTD from WorkStation to Super Computer Platform
Directory of Open Access Journals (Sweden)
Z. L. He
2012-01-01
Full Text Available Parallel FDTD method is applied to analyze the electromagnetic problems of the electrically large targets on super computer. It is well known that the more the number of processors the less computing time consumed. Nevertheless, with the same number of processors, computing efficiency is affected by the scheme of the MPI virtual topology. Then, the influence of different virtual topology schemes on parallel performance of parallel FDTD is studied in detail. The general rules are presented on how to obtain the highest efficiency of parallel FDTD algorithm by optimizing MPI virtual topology. To show the validity of the presented method, several numerical results are given in the later part. Various comparisons are made and some useful conclusions are summarized.
Directory of Open Access Journals (Sweden)
Mark James Abraham
2015-09-01
Full Text Available GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
DEFF Research Database (Denmark)
Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo
2013-01-01
a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...
The ongoing investigation of high performance parallel computing in HEP
Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee
1993-01-01
Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.
High performance parallel computing of flows in complex geometries: II. Applications
International Nuclear Information System (INIS)
Gourdain, N; Gicquel, L; Staffelbach, G; Vermorel, O; Duchaine, F; Boussuge, J-F; Poinsot, T
2009-01-01
Present regulations in terms of pollutant emissions, noise and economical constraints, require new approaches and designs in the fields of energy supply and transportation. It is now well established that the next breakthrough will come from a better understanding of unsteady flow effects and by considering the entire system and not only isolated components. However, these aspects are still not well taken into account by the numerical approaches or understood whatever the design stage considered. The main challenge is essentially due to the computational requirements inferred by such complex systems if it is to be simulated by use of supercomputers. This paper shows how new challenges can be addressed by using parallel computing platforms for distinct elements of a more complex systems as encountered in aeronautical applications. Based on numerical simulations performed with modern aerodynamic and reactive flow solvers, this work underlines the interest of high-performance computing for solving flow in complex industrial configurations such as aircrafts, combustion chambers and turbomachines. Performance indicators related to parallel computing efficiency are presented, showing that establishing fair criterions is a difficult task for complex industrial applications. Examples of numerical simulations performed in industrial systems are also described with a particular interest for the computational time and the potential design improvements obtained with high-fidelity and multi-physics computing methods. These simulations use either unsteady Reynolds-averaged Navier-Stokes methods or large eddy simulation and deal with turbulent unsteady flows, such as coupled flow phenomena (thermo-acoustic instabilities, buffet, etc). Some examples of the difficulties with grid generation and data analysis are also presented when dealing with these complex industrial applications.
Directory of Open Access Journals (Sweden)
majid kalantar
2016-04-01
Ross-308 broiler chickens were allocated randomly to 3 treatments with 5 replicates using a CRD statistical design. Treatments were included control, barley and barley+ enzyme. The experimental diets were formulated to have similar contents of crude protein, metabolizable energy, total non-starch polysaccharides (NSP. Results and Discussion According to the results, effect of barley with or without enzyme on growth performance at starter, grower and the entire period and also on carcass characteristics, pancreas enzyme activity and measures of ileal acidity and viscosity at the age of 42 were significant (P
High performance parallel backprojection on FPGA
Energy Technology Data Exchange (ETDEWEB)
Pfanner, Florian; Knaup, Michael; Kachelriess, Marc [Erlangen-Nuernberg Univ., Erlangen (Germany). Inst. of Medical Physics (IMP)
2011-07-01
Reconstruction of tomographic images, i.e., images from a Computed Tomography scanner, is a very time consuming issue. The most calculation power is needed for the backprojection step. A closer inspection shows that the algorithm for backprojection is easy to parallelize. FPGAs are able to execute many operations in the same time, so a highly parallel algorithm is a requirement for a powerful acceleration. For data flow rate maximization, we realized the backprojection in a pipelined structure with data throughput of one clock cycle. Due the hardware limitations of the FPGA, it is not possible to reconstruct the image as a whole. So it is necessary to split up the image and reconstruct these parts separately. Despite that, a reconstruction of 512 projections into a 5122 image is calculated within 13 ms on a Virtex 5 FPGA. To save hardware resources we use fixed point arithmetic with an accuracy of 23 bit for calculation. A comparison of the result image and an image, calculated with floating point arithmetic on CPU, shows that there are no differences between these images. (orig.)
Advanced parallel processing with supercomputer architectures
International Nuclear Information System (INIS)
Hwang, K.
1987-01-01
This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers
Parallelization of 2-D lattice Boltzmann codes
International Nuclear Information System (INIS)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author)
Parallelization of 2-D lattice Boltzmann codes
Energy Technology Data Exchange (ETDEWEB)
Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo
1996-03-01
Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on vector parallel computer Fujitsu VPP500 and scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code to be vectorized along with the axis perpendicular to the direction of the decomposition. High parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with 1152x1152 grid and 88.6% by the scalar parallel calculation on 100 processors with 800x800 grid are obtained. The performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for the large scale and/or high resolution simulations. (author).
Design of parallel dual-energy X-ray beam and its performance for security radiography
International Nuclear Information System (INIS)
Kim, Kwang Hyun; Myoung, Sung Min; Chung, Yong Hyun
2011-01-01
A new concept of dual-energy X-ray beam generation and acquisition of dual-energy security radiography is proposed. Erbium (Er) and rhodium (Rh) with a copper filter were positioned in front of X-ray tube to generate low- and high-energy X-ray spectra. Low- and high-energy X-rays were guided to separately enter into two parallel detectors. Monte Carlo code of MCNPX was used to derive an optimum thickness of each filter for improved dual X-ray image quality. It was desired to provide separation ability between organic and inorganic matters for the condition of 140 kVp/0.8 mA as used in the security application. Acquired dual-energy X-ray beams were evaluated by the dual-energy Z-map yielding enhanced performance compared with a commercial dual-energy detector. A collimator for the parallel dual-energy X-ray beam was designed to minimize X-ray beam interference between low- and high-energy parallel beams for 500 mm source-to-detector distance.
Morita, Yukinori; Mori, Takahiro; Migita, Shinji; Mizubayashi, Wataru; Tanabe, Akihito; Fukuda, Koichi; Matsukawa, Takashi; Endo, Kazuhiko; O'uchi, Shin-ichi; Liu, Yongxun; Masahara, Meishoku; Ota, Hiroyuki
2014-12-01
The performance of parallel electric field tunnel field-effect transistors (TFETs), in which band-to-band tunneling (BTBT) was initiated in-line to the gate electric field was evaluated. The TFET was fabricated by inserting an epitaxially-grown parallel-plate tunnel capacitor between heavily doped source wells and gate insulators. Analysis using a distributed-element circuit model indicated there should be a limit of the drain current caused by the self-voltage-drop effect in the ultrathin channel layer.
Directory of Open Access Journals (Sweden)
S.O. Ayodele
2016-09-01
Full Text Available A feeding trial was conducted to study the performance, digestibility and health status of weaner rabbits fed diets including Alchornea cordifolia leaf meal (ALM: 18% crude protein [CP] and 12.9% crude fibre and supplemented with a multi-enzyme additive (cellulase, xylanase, β-glucanase, α-amylase, protease, lipase. Six experimental diets were arranged factorially: 3 levels of ALM (0, 5 and 10% substituting palm kernel cake: 16.3% CP and 39.1% neutral detergent fibre combined with 2 levels of enzyme supplementation (0 and 0.35 g/kg. One hundred and eighty healthy, 5-wk-old weaner rabbits of cross-breeds were randomly allotted to 6 dietary treatments (30 rabbits/treatment, 3 rabbits/replicate. Growth rate was not affected (P>0.05 by the main factors (exogenous enzyme and ALM inclusion and their interactions (13.5 g/d on av.. Daily feed intake and feed conversion ratio decreased (P=0.01 with the ALM inclusion by 8%, but did not affect faecal digestibility. However, enzyme supplementation improved crude protein and crude fibre digestibility (P<0.001 by 6%. In conclusion, ALM inclusion and enzyme supplementation had no adverse effect on the performance and digestibility of rabbits.
Parallel processing for fluid dynamics applications
International Nuclear Information System (INIS)
Johnson, G.M.
1989-01-01
The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several example are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices
International Nuclear Information System (INIS)
Delaney, C.J.; Opheim, K.E.; Smith, A.L.; Plorde, J.J.
1982-01-01
We compared the accuracy, precision, and between-method error of the microbiological assay, the radioenzymatic assay, the homogeneous enzyme immunoassay, and the high-performance liquid chromatographic assay for the quantitation of gentamicin in serum. Precision and accuracy were evaluated by reference samples prepared to contain 0.0 to 32.7 micrograms of gentamicin per ml. Correlations between the methods utilized patient sera with gentamicin concentrations ranging from 0.6 to 13.3 micrograms/ml. All methods were reliable within acceptable limits for routine clinical use; intermethod correlation coefficients exceeded 0.96. Relative to the microbiological assay, the alternative methods offer the advantage of rapid analysis. The elapsed times for acquiring data on a set of 10 specimens under routine operating conditions were 0.5 h by the enzyme immunoassay, 4 h by the radioenzymatic assay, 5 h by the high-performance liquid chromatographic assay, and 10 h by the microbiological assay
Energy Technology Data Exchange (ETDEWEB)
Lober, R.R.; Tautges, T.J.; Vaughan, C.T.
1997-03-01
Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving`s meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia`s {open_quotes}tiling{close_quotes} dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.
National Research Council Canada - National Science Library
Hisley, Dixie
1999-01-01
.... The goals of this report are: (1) to investigate the performance of message passing and loop level parallelization techniques, as they were implemented in the computational fluid dynamics (CFD...
Nam, Sung Sik; Alouini, Mohamed-Slim; Ko, Young-Chai
2018-01-01
In this paper, we statistically analyze the performance of a threshold-based parallel multiple beam selection scheme for a free-space optical (FSO) based system with wavelength division multiplexing (WDM) in cases where a pointing error has occurred
Effects of Maize Source and Complex Enzymes on Performance and Nutrient Utilization of Broilers
Directory of Open Access Journals (Sweden)
Defu Tang
2014-12-01
Full Text Available The objective of this study was to investigate the effect of maize source and complex enzymes containing amylase, xylanase and protease on performance and nutrient utilization of broilers. The experiment was a 4×3 factorial design with diets containing four source maize samples (M1, M2, M3, and M4 and without or with two kinds of complex enzyme A (Axtra XAP and B (Avizyme 1502. Nine hundred and sixty day old Arbor Acres broiler chicks were used in the trial (12 treatments with 8 replicate pens of 10 chicks. Birds fed M1 diet had better body weight gain (BWG and lower feed/gain ratio compared with those fed M3 diet and M4 diet (p0.05, respectively. The fresh feces output was significantly decreased by the addition of enzyme B (p<0.05. Maize source affects the nutrients digestibility and performance of broilers, and a combination of amylase, xylanase and protease is effective in improving the growth profiles of broilers fed maize-soybean-rapeseed-cotton mixed diets.
[Advances on enzymes and enzyme inhibitors research based on microfluidic devices].
Hou, Feng-Hua; Ye, Jian-Qing; Chen, Zuan-Guang; Cheng, Zhi-Yi
2010-06-01
With the continuous development in microfluidic fabrication technology, microfluidic analysis has evolved from a concept to one of research frontiers in last twenty years. The research of enzymes and enzyme inhibitors based on microfluidic devices has also made great progress. Microfluidic technology improved greatly the analytical performance of the research of enzymes and enzyme inhibitors by reducing the consumption of reagents, decreasing the analysis time, and developing automation. This review focuses on the development and classification of enzymes and enzyme inhibitors research based on microfluidic devices.
Automatic Loop Parallelization via Compiler Guided Refactoring
DEFF Research Database (Denmark)
Larsen, Per; Ladelsky, Razya; Lidman, Jacob
For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...
Parallel discrete event simulation
Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.
1991-01-01
In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis
Choudhary, Alok Nidhi
1989-01-01
Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
Enzyme activity and kinetics in substrate-amended river sediments
Energy Technology Data Exchange (ETDEWEB)
Duddridge, J E; Wainwright, M
1982-01-01
In determining the effects of heavy metals in microbial activity and litter degradation in river sediments, one approach is to determine the effects of these pollutants on sediment enzyme activity and synthesis. Methods to assay amylase, cellulase and urease activity in diverse river sediments are reported. Enzyme activity was low in non-amended sediments, but increased markedly when the appropriate substrate was added, paralleling both athropogenic and natural amendment. Linear relationships between enzyme activity, length of incubation, sample size and substrate concentration were established. Sediment enzyme activity generally obeyed Michaelis-Menton kinetics, but of the three enzymes, urease gave least significant correlation coefficients when the data for substrate concentration versus activity was applied to the Eadie-Hofstee transformation of the Michaelis-Menten equation. K/sub m/ and V/sub max/ for amylase, cellulase and urease in sediments are reported. (JMT)
6th International Parallel Tools Workshop
Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang
2013-01-01
The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
International Nuclear Information System (INIS)
Satake, Shinsuke; Okamoto, Masao; Nakajima, Noriyoshi; Takamaru, Hisanori
2005-11-01
A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. (author)
International Nuclear Information System (INIS)
Idomura, Yasuhiro; Jolliet, Sebastien
2010-01-01
A gyrokinetic toroidal five dimensional Eulerian code GT5D is ported on six advanced massively parallel platforms and comprehensive benchmark tests are performed. A parallelisation technique based on physical properties of the gyrokinetic equation is presented. By extending the parallelisation technique with a hybrid parallel model, the scalability of the code is improved on platforms with multi-core processors. In the benchmark tests, a good salability is confirmed up to several thousands cores on every platforms, and the maximum sustained performance of ∼18.6 Tflops is achieved using 16384 cores of BX900. (author)
Massively parallel mathematical sieves
Energy Technology Data Exchange (ETDEWEB)
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
Two experiments were conducted to determine the effects of treating sorghum WDG with solubles (SWDG) with an enzyme, or enzyme-buffer combination on diet digestibility and feedlot performance. Experimental treatments are; 1) untreated SWDG (control), 2) addition of an enzyme complex to SWDG (enzyme...
Performance of a parallel plate ionization chamber in beta radiation dosimetry
International Nuclear Information System (INIS)
Antonio, Patricia L.; Caldas, Linda V.E.
2011-01-01
A homemade parallel plate ionization chamber with graphite collecting electrode, and developed for use in mammography beams, was tested in relation to its usefulness in beta radiation dosimetry at the Calibration Laboratory of IPEN. Characterization tests of this ionization chamber were performed, using the Sr-90 + Y-90, Kr-85 and Pm-147 sources of a beta secondary standard system. The results of saturation, leakage current, stabilization time, response stability, linearity, angular dependence, and calibration coefficients are within the recommended limits of international recommendations that indicate that this chamber may be used for beta radiation dosimetry. (author)
Performance of a parallel plate ionization chamber in beta radiation dosimetry
Energy Technology Data Exchange (ETDEWEB)
Antonio, Patricia L.; Caldas, Linda V.E., E-mail: patrilan@ipen.b, E-mail: lcaldas@ipen.b [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)
2011-07-01
A homemade parallel plate ionization chamber with graphite collecting electrode, and developed for use in mammography beams, was tested in relation to its usefulness in beta radiation dosimetry at the Calibration Laboratory of IPEN. Characterization tests of this ionization chamber were performed, using the Sr-90 + Y-90, Kr-85 and Pm-147 sources of a beta secondary standard system. The results of saturation, leakage current, stabilization time, response stability, linearity, angular dependence, and calibration coefficients are within the recommended limits of international recommendations that indicate that this chamber may be used for beta radiation dosimetry. (author)
Simple, parallel, high-performance virtual machines for extreme computations
International Nuclear Information System (INIS)
Chokoufe Nejad, Bijan; Ohl, Thorsten; Reuter, Jurgen
2014-11-01
We introduce a high-performance virtual machine (VM) written in a numerically fast language like Fortran or C to evaluate very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present specifically a VM that is able to compute tree-level cross sections for any number of external legs, given the corresponding byte code from the optimal matrix element generator, O'Mega. Furthermore, this approach allows to formulate the parallel computation of a single phase space point in a simple and obvious way. We analyze hereby the scaling behaviour with multiple threads as well as the benefits and drawbacks that are introduced with this method. Our implementation of a VM can run faster than the corresponding native, compiled code for certain processes and compilers, especially for very high multiplicities, and has in general runtimes in the same order of magnitude. By avoiding the tedious compile and link steps, which may fail for source code files of gigabyte sizes, new processes or complex higher order corrections that are currently out of reach could be evaluated with a VM given enough computing power.
Directory of Open Access Journals (Sweden)
J. Q. Yi
2013-08-01
Full Text Available Two experiments were conducted to evaluate the effect of supplementing a corn-soybean meal-based diet with an enzyme complex containing amylase, protease and xylanase on the performance, intestinal health, apparent ileal digestibility of amino acids and nutrient digestibility of weaned pigs. In Exp. 1, 108 piglets weaned at 28 d of age were fed one of three diets containing 0 (control, 100, or 150 ppm enzyme complex for 4 wks, based on a two-phase feeding program namely 1 to 7 d (phase 1 and 8 to 28 d (phase 2. At the end of the experiment, six pigs from the control group and the group supplemented with 150 ppm enzyme complex were chosen to collect digesta samples from intestine to measure viscosity and pH in the stomach, ileum, and cecum, as well as volatile fatty acid concentrations and composition of the microflora in the cecum and colon. There were linear increases (p<0.01 in weight gain, gain: feed ratio and digestibility of gross energy with the increasing dose rate of enzyme supplementation during the whole experiment. Supplementation with enzyme complex increased the digesta viscosity in the stomach (p<0.05 and significantly increased (p<0.01 the concentrations of acetic, propionic and butyric acid in the cecum and colon. Enzyme supplementation also significantly increased the population of Lactobacilli (p<0.01 in the cecum and decreased the population of E. coli (p<0.05 in the colon. In Exp. 2, six crossbred barrows (initial body weight: 18.26±1.21 kg, fitted with a simple T-cannula at the distal ileum, were assigned to three dietary treatments according to a replicated 3×3 Latin Square design. The experimental diets were the same as the diets used in phase 2 in Exp. 1. Apparent ileal digestibility of isoleucine (p<0.01, valine (p<0.05 and aspartic acid (p<0.05 linearly increased with the increasing dose rate of enzyme supplementation. In conclusion, supplementation of the diet with an enzyme complex containing amylase, protease and
International Nuclear Information System (INIS)
Kreitner, K.F.; Romaneehsen, Bernd; Oberholzer, Katja; Dueber, Christoph; Krummenauer, Frank; Mueller, L.P.
2006-01-01
The performance of a magnetic resonance (MR) imaging strategy that uses multiple receiver coil elements and integrated parallel imaging techniques (iPAT) in traumatic and degenerative disorders of the knee and to compare this technique with a standard MR imaging protocol was evaluated. Ninety patients with suspected internal derangements of the knee joint prospectively underwent MR imaging at 1.5 T. For signal detection, a 6-channel array coil was used. All patients were investigated with a standard imaging protocol consisting of different turbo spin-echo sequences proton density (PD), T 2 -weighted turbo spin echo (TSE) with and without fat suppression in three imaging planes. All sequences were repeated with an integrated parallel acquisition technique (iPAT) using the modified sensitivity encoding (mSENSE) algorithm with an acceleration factor of 2. Two radiologists independently evaluated and scored all images with regard to overall image quality, artefacts and pathologic findings. Agreement of the parallel ratings between readers and imaging techniques, respectively, was evaluated by means of pairwise kappa coefficients that were stratified for the area of evaluation. Agreement between the parallel readers for both the iPAT imaging and the conventional technique, respectively, as well as between imaging techniques was found encouraging with inter-observer kappa values ranging between 0.78 and 0.98 for both imaging techniques, and the inter-method kappa values ranging between 0.88 and 1.00 for both clinical readers. All pathological findings (e.g. occult fractures, meniscal and cruciate ligament tears, torn and interpositioned Hoffa's cleft, cartilage damage) were detected by both techniques with comparable performance. The use of iPAT lead to a 48% reduction of acquisition time compared with standard technique. Parallel imaging using mSENSE proved to be an efficient and economic tool for fast musculoskeletal MR imaging of the knee joint with comparable
Zaidi, H; Morel, Christian
1998-01-01
This paper describes the implementation of the Eidolon Monte Carlo program designed to simulate fully three-dimensional (3D) cylindrical positron tomographs on a MIMD parallel architecture. The original code was written in Objective-C and developed under the NeXTSTEP development environment. Different steps involved in porting the software on a parallel architecture based on PowerPC 604 processors running under AIX 4.1 are presented. Basic aspects and strategies of running Monte Carlo calculations on parallel computers are described. A linear decrease of the computing time was achieved with the number of computing nodes. The improved time performances resulting from parallelisation of the Monte Carlo calculations makes it an attractive tool for modelling photon transport in 3D positron tomography. The parallelisation paradigm used in this work is independent from the chosen parallel architecture
Programming massively parallel processors a hands-on approach
Kirk, David B
2010-01-01
Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...
Monovalent Cation Activation of the Radical SAM Enzyme Pyruvate Formate-Lyase Activating Enzyme.
Shisler, Krista A; Hutcheson, Rachel U; Horitani, Masaki; Duschene, Kaitlin S; Crain, Adam V; Byer, Amanda S; Shepard, Eric M; Rasmussen, Ashley; Yang, Jian; Broderick, William E; Vey, Jessica L; Drennan, Catherine L; Hoffman, Brian M; Broderick, Joan B
2017-08-30
Pyruvate formate-lyase activating enzyme (PFL-AE) is a radical S-adenosyl-l-methionine (SAM) enzyme that installs a catalytically essential glycyl radical on pyruvate formate-lyase. We show that PFL-AE binds a catalytically essential monovalent cation at its active site, yet another parallel with B 12 enzymes, and we characterize this cation site by a combination of structural, biochemical, and spectroscopic approaches. Refinement of the PFL-AE crystal structure reveals Na + as the most likely ion present in the solved structures, and pulsed electron nuclear double resonance (ENDOR) demonstrates that the same cation site is occupied by 23 Na in the solution state of the as-isolated enzyme. A SAM carboxylate-oxygen is an M + ligand, and EPR and circular dichroism spectroscopies reveal that both the site occupancy and the identity of the cation perturb the electronic properties of the SAM-chelated iron-sulfur cluster. ENDOR studies of the PFL-AE/[ 13 C-methyl]-SAM complex show that the target sulfonium positioning varies with the cation, while the observation of an isotropic hyperfine coupling to the cation by ENDOR measurements establishes its intimate, SAM-mediated interaction with the cluster. This monovalent cation site controls enzyme activity: (i) PFL-AE in the absence of any simple monovalent cations has little-no activity; and (ii) among monocations, going down Group 1 of the periodic table from Li + to Cs + , PFL-AE activity sharply maximizes at K + , with NH 4 + closely matching the efficacy of K + . PFL-AE is thus a type I M + -activated enzyme whose M + controls reactivity by interactions with the cosubstrate, SAM, which is bound to the catalytic iron-sulfur cluster.
Adsorption and enzyme activity of asparaginase at lipid Langmuir and Langmuir-Blodgett films
International Nuclear Information System (INIS)
Rocha Junior, Carlos da; Caseli, Luciano
2017-01-01
In this present work, the surface activity of the enzyme asparaginase was investigated at the air-water interface, presenting surface activity in high ionic strengths. Asparaginase was incorporated in Langmuir monolayers of the phospholipid dipalmitoylphosphatidylcholine (DPPC), forming a mixed film, which was characterized with surface pressure-area isotherms, surface potential-area isotherms, polarization-modulated infrared reflection-absorption spectroscopy (PM-IRRAS), and Brewster angle microscopy (BAM). The adsorption of the enzyme at the air-water interface condensed the lipid monolayer and increased the film compressibility at high surface pressures. Amide bands in the PM-IRRAS spectra were identified, with the C−N and C =O dipole moments lying parallel to monolayer plane, revealing the structuring of the enzyme into α-helices and β-sheets. The floating monolayers were transferred to solid supports as Langmuir-Blodgett (LB) films and characterized with fluorescence spectroscopy and atomic force microscopy. Catalytic activities of the films were measured and compared to the homogenous medium. The enzyme accommodated in the LB films preserved more than 78% of the enzyme activity after 30 days, in contrast for the homogeneous medium, which preserved less than 13%. The method presented in this work not only allows for an enhanced catalytic activity, but also can help explain why certain film architectures exhibit better performance. - Highlights: • Biomembranes are mimicked with Langmuir monolayers. • Asparaginase is incorporated into the lipid monolayer. • Enzyme adsorption is confirmed with tensiometry and infrared spectroscopy. • Langmuir-Blodgett films of the enzyme present enzyme activity.
Adsorption and enzyme activity of asparaginase at lipid Langmuir and Langmuir-Blodgett films
Energy Technology Data Exchange (ETDEWEB)
Rocha Junior, Carlos da; Caseli, Luciano, E-mail: lcaseli@unifesp.br
2017-04-01
In this present work, the surface activity of the enzyme asparaginase was investigated at the air-water interface, presenting surface activity in high ionic strengths. Asparaginase was incorporated in Langmuir monolayers of the phospholipid dipalmitoylphosphatidylcholine (DPPC), forming a mixed film, which was characterized with surface pressure-area isotherms, surface potential-area isotherms, polarization-modulated infrared reflection-absorption spectroscopy (PM-IRRAS), and Brewster angle microscopy (BAM). The adsorption of the enzyme at the air-water interface condensed the lipid monolayer and increased the film compressibility at high surface pressures. Amide bands in the PM-IRRAS spectra were identified, with the C−N and C =O dipole moments lying parallel to monolayer plane, revealing the structuring of the enzyme into α-helices and β-sheets. The floating monolayers were transferred to solid supports as Langmuir-Blodgett (LB) films and characterized with fluorescence spectroscopy and atomic force microscopy. Catalytic activities of the films were measured and compared to the homogenous medium. The enzyme accommodated in the LB films preserved more than 78% of the enzyme activity after 30 days, in contrast for the homogeneous medium, which preserved less than 13%. The method presented in this work not only allows for an enhanced catalytic activity, but also can help explain why certain film architectures exhibit better performance. - Highlights: • Biomembranes are mimicked with Langmuir monolayers. • Asparaginase is incorporated into the lipid monolayer. • Enzyme adsorption is confirmed with tensiometry and infrared spectroscopy. • Langmuir-Blodgett films of the enzyme present enzyme activity.
Computational enzyme design: transitioning from catalytic proteins to enzymes.
Mak, Wai Shun; Siegel, Justin B
2014-08-01
The widespread interest in enzymes stem from their ability to catalyze chemical reactions under mild and ecologically friendly conditions with unparalleled catalytic proficiencies. While thousands of naturally occurring enzymes have been identified and characterized, there are still numerous important applications for which there are no biological catalysts capable of performing the desired chemical transformation. In order to engineer enzymes for which there is no natural starting point, efforts using a combination of quantum chemistry and force-field based protein molecular modeling have led to the design of novel proteins capable of catalyzing chemical reactions not catalyzed by naturally occurring enzymes. Here we discuss the current status and potential avenues to pursue as the field of computational enzyme design moves forward. Published by Elsevier Ltd.
Overview of Parallel Platforms for Common High Performance Computing
Directory of Open Access Journals (Sweden)
T. Fryza
2012-04-01
Full Text Available The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well.
International Nuclear Information System (INIS)
Takei, Toshifumi; Doi, Shun; Matsumoto, Hideki; Muramatsu, Kazuhiro
2000-01-01
We have developed a concurrent visualization system RVSLIB (Real-time Visual Simulation Library). This paper shows the effectiveness of the system when it is applied to large-scale unsteady simulations, for which the conventional post-processing approach may no longer work, on high-performance parallel vector supercomputers. The system performs almost all of the visualization tasks on a computation server and uses compressed visualized image data for efficient communication between the server and the user terminal. We have introduced several techniques, including vectorization and parallelization, into the system to minimize the computational costs of the visualization tools. The performance of RVSLIB was evaluated by using an actual CFD code on an NEC SX-4. The computational time increase due to the concurrent visualization was at most 3% for a smaller (1.6 million) grid and less than 1% for a larger (6.2 million) one. (author)
Harmonisation of seven common enzyme results through EQA.
Weykamp, Cas; Franck, Paul; Gunnewiek, Jacqueline Klein; de Jonge, Robert; Kuypers, Aldy; van Loon, Douwe; Steigstra, Herman; Cobbaert, Christa
2014-11-01
Equivalent results between different laboratories enable optimal patient care and can be achieved with harmonisation. We report on EQA-initiated national harmonisation of seven enzymes using commutable samples. EQA samples were prepared from human serum spiked with human recombinant enzymes. Target values were assigned with the IFCC Reference Measurement Procedures. The same samples were included at four occasions in the EQA programmes of 2012 and 2013. Laboratories were encouraged to report IFCC traceable results. A parallel study was done to confirm commutability of the samples. Of the 223 participating laboratories, 95% reported IFCC traceable results, ranging from 98% (ASAT) to 87% (amylase). Users of Roche and Siemens (97%) more frequently reported in IFCC traceable results than users of Abbott (91%), Beckman (90%), and Olympus (87%). The success of harmonisation, expressed as the recovery of assigned values and the inter-laboratory CV was: ALAT (recovery 100%; inter-lab CV 4%), ASAT (102%; 4%), LD (98%; 3%), CK (101%; 5%), GGT (98%; 4%), AP (96%; 6%), amylase (99%; 4%). There were no significant differences between the manufacturers. Commutability was demonstrated in the parallel study. Equal results in the same sample in the 2012 and 2013 EQA programmes demonstrated stability of the samples. The EQA-initiated national harmonisation of seven enzymes, using stable, commutable human serum samples, spiked with human recombinant enzymes, and targeted with the IFCC Reference Measurement Procedures, was successful in terms of implementation of IFCC traceable results (95%), recovery of the target (99%), and inter-laboratory CV (4%).
Parallel reservoir simulator computations
International Nuclear Information System (INIS)
Hemanth-Kumar, K.; Young, L.C.
1995-01-01
The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90
Adesogan, A T; Ma, Z X; Romero, J J; Arriola, K G
2014-04-01
This paper aimed to summarize published responses to treatment of cattle diets with exogenous fibrolytic enzymes (EFE), to discuss reasons for variable EFE efficacy in animal trials, to recommend strategies for improving enzyme testing and EFE efficacy in ruminant diets, and to identify proteomic differences between effective and ineffective EFE. A meta-analysis of 20 dairy cow studies with 30 experiments revealed that only a few increased lactational performance and the response was inconsistent. This variability is attributable to several enzyme, feed, animal, and management factors that were discussed in this paper. The variability reflects our limited understanding of the synergistic and sequential interactions between exogenous glycosyl hydrolases, autochthonous ruminal microbes, and endogenous fibrolytic enzymes that are necessary to optimize ruminal fiber digestion. An added complication is that many of the standard methods of assaying EFE activities may over- or underestimate their potential effects because they are based on pure substrate saccharification and do not simulate ruminal conditions. Our recent evaluation of 18 commercial EFE showed that 78 and 83% of them exhibited optimal endoglucanase and xylanase activities, respectively, at 50 °C, and 77 and 61% had optimal activities at pH 4 to 5, respectively, indicating that most would likely act suboptimally in the rumen. Of the many fibrolytic activities that act synergistically to degrade forage fiber, the few usually assayed, typically endoglucanase and xylanase, cannot hydrolyze the recalcitrant phenolic acid-lignin linkages that are the main constraints to ruminal fiber degradation. These factors highlight the futility of random addition of EFE to diets. This paper discusses reasons for the variable animal responses to dietary addition of fibrolytic enzymes, advances explanations for the inconsistency, suggests a strategy to improve enzyme efficacy in ruminant diets, and describes differences
H5Part A Portable High Performance Parallel Data Interface for Particle Simulations
Adelmann, Andreas; Shalf, John M; Siegerist, Cristina
2005-01-01
Largest parallel particle simulations, in six dimensional phase space generate wast amont of data. It is also desirable to share data and data analysis tools such as ParViT (Particle Visualization Toolkit) among other groups who are working on particle-based accelerator simulations. We define a very simple file schema built on top of HDF5 (Hierarchical Data Format version 5) as well as an API that simplifies the reading/writing of the data to the HDF5 file format. HDF5 offers a self-describing machine-independent binary file format that supports scalable parallel I/O performance for MPI codes on a variety of supercomputing systems and works equally well on laptop computers. The API is available for C, C++, and Fortran codes. The file format will enable disparate research groups with very different simulation implementations to share data transparently and share data analysis tools. For instance, the common file format will enable groups that depend on completely different simulation implementations to share c...
International Nuclear Information System (INIS)
Masahiro, Tatsumi; Akio, Yamamoto
2003-01-01
A production code SCOPE2 was developed based on the fine-grained parallel algorithm by the red/black iterative method targeting parallel computing environments such as a PC-cluster. It can perform a depletion calculation in a few hours using a PC-cluster with the model based on a 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry for in-core fuel management of commercial PWRs. The present algorithm guarantees the identical convergence process as that in serial execution, which is very important from the viewpoint of quality management. The fine-mesh geometry is constructed by hierarchical decomposition with introduction of intermediate management layer as a block that is a quarter piece of a fuel assembly in radial direction. A combination of a mesh division scheme forcing even meshes on each edge and a latency-hidden communication algorithm provided simplicity and efficiency to message passing to enhance parallel performance. Inter-processor communication and parallel I/O access were realized using the MPI functions. Parallel performance was measured for depletion calculations by the 9-group nodal-SP3 transport method in 3-dimensional pin-by-pin geometry with 340 x 340 x 26 meshes for full core geometry and 170 x 170 x 26 for quarter core geometry. A PC cluster that consists of 24 Pentium-4 processors connected by the Fast Ethernet was used for the performance measurement. Calculations in full core geometry gave better speedups compared to those in quarter core geometry because of larger granularity. Fine-mesh sweep and feedback calculation parts gave almost perfect scalability since granularity is large enough, while 1-group coarse-mesh diffusion acceleration gave only around 80%. The speedup and parallel efficiency for total computation time were 22.6 and 94%, respectively, for the calculation in full core geometry with 24 processors. (authors)
Forced-convection boiling tests performed in parallel simulated LMR fuel assemblies
International Nuclear Information System (INIS)
Rose, S.D.; Carbajo, J.J.; Levin, A.E.; Lloyd, D.B.; Montgomery, B.H.; Wantland, J.L.
1985-01-01
Forced-convection tests have been carried out using parallel simulated Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. The experimental evidence supports analytical results that heat removal from an LMR is possible with a degraded shutdown heat removal system
Performance analysis of a refrigeration system with parallel control of evaporation pressure
International Nuclear Information System (INIS)
Lee, Jong Suk
2008-01-01
The conventional refrigeration system is composed of a compressor, condenser, receiver, expansion valve or capillary tube, and an evaporator. The refrigeration system used in this study has additional expansion valve and evaporator along with an Evaporation Pressure Regulator(EPR) at the exit side of the evaporator. The two evaporators can be operated at different temperatures according to the opening of the EPR. The experimental results obtained using the refrigeration system with parallel control of evaporation pressure are presented and the performance analysis of the refrigeration system with two evaporators is conducted
Towards a streaming model for nested data parallelism
DEFF Research Database (Denmark)
Madsen, Frederik Meisner; Filinski, Andrzej
2013-01-01
The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening......The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism......-processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...
Improvement of Parallel Algorithm for MATRA Code
International Nuclear Information System (INIS)
Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun
2014-01-01
The feasibility study to parallelize the MATRA code was conducted in KAERI early this year. As a result, a parallel algorithm for the MATRA code has been developed to decrease a considerably required computing time to solve a bigsize problem such as a whole core pin-by-pin problem of a general PWR reactor and to improve an overall performance of the multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For problems of a 1/8 core and whole core for SMART reactor, a speedup was evaluated as about 10 when the numbers of used processor were 25. However, it was also shown that the performance deteriorated as the axial node number increased. In this paper, the procedure of a communication between processors is optimized to improve the previous parallel algorithm.. To improve the performance deterioration of the parallelized MATRA code, the communication algorithm between processors was newly presented. It was shown that the speedup was improved and stable regardless of the axial node number
NREL Discovers Enzyme Domains that Dramatically Improve Performance | News
of genomics data to find better enzymes, based on their genetic sequence alone. "In 10 years, it on these enzymes that can be targeted via genetic engineering to help break down cellulose faster to Decker, "At the time, tools for genetic engineering in Trichoderma were very limited, but we
Magnetically responsive enzyme powders
Energy Technology Data Exchange (ETDEWEB)
Pospiskova, Kristyna, E-mail: kristyna.pospiskova@upol.cz [Regional Centre of Advanced Technologies and Materials, Palacky University, Slechtitelu 11, 783 71 Olomouc (Czech Republic); Safarik, Ivo, E-mail: ivosaf@yahoo.com [Regional Centre of Advanced Technologies and Materials, Palacky University, Slechtitelu 11, 783 71 Olomouc (Czech Republic); Department of Nanobiotechnology, Institute of Nanobiology and Structural Biology of GCRC, Na Sadkach 7, 370 05 Ceske Budejovice (Czech Republic)
2015-04-15
Powdered enzymes were transformed into their insoluble magnetic derivatives retaining their catalytic activity. Enzyme powders (e.g., trypsin and lipase) were suspended in various liquid media not allowing their solubilization (e.g., saturated ammonium sulfate and highly concentrated polyethylene glycol solutions, ethanol, methanol, 2-propanol) and subsequently cross-linked with glutaraldehyde. Magnetic modification was successfully performed at low temperature in a freezer (−20 °C) using magnetic iron oxides nano- and microparticles prepared by microwave-assisted synthesis from ferrous sulfate. Magnetized cross-linked enzyme powders were stable at least for two months in water suspension without leakage of fixed magnetic particles. Operational stability of magnetically responsive enzymes during eight repeated reaction cycles was generally without loss of enzyme activity. Separation of magnetically modified cross-linked powdered enzymes from reaction mixtures was significantly simplified due to their magnetic properties. - Highlights: • Cross-linked enzyme powders were prepared in various liquid media. • Insoluble enzymes were magnetized using iron oxides particles. • Magnetic iron oxides particles were prepared by microwave-assisted synthesis. • Magnetic modification was performed under low (freezing) temperature. • Cross-linked powdered trypsin and lipase can be used repeatedly for reaction.
Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin
2011-04-01
To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
Data access performance through parallelization and vectored access. Some results
International Nuclear Information System (INIS)
Furano, F; Hanushevsky, A
2008-01-01
High Energy Physics data processing and analysis applications typically deal with the problem of accessing and processing data at high speed. Recent studies, development and test work have shown that the latencies due to data access can often be hidden by parallelizing them with the data processing, thus giving the ability to have applications which process remote data with a high level of efficiency. Techniques and algorithms able to reach this result have been implemented in the client side of the Scalla/xrootd system, and in this contribution we describe the results of some tests done in order to compare their performance and characteristics. These techniques, if used together with multiple streams data access, can also be effective in allowing to efficiently and transparently deal with data repositories accessible via a Wide Area Network
Parallel Monte Carlo reactor neutronics
International Nuclear Information System (INIS)
Blomquist, R.N.; Brown, F.B.
1994-01-01
The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved
Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.
2002-01-01
As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance scales have been shown to .apprxeq.100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.
Parallelization of Subchannel Analysis Code MATRA
International Nuclear Information System (INIS)
Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk
2014-01-01
A stand-alone calculation of MATRA code used up pertinent computing time for the thermal margin calculations while a relatively considerable time is needed to solve the whole core pin-by-pin problems. In addition, it is strongly required to improve the computation speed of the MATRA code to satisfy the overall performance of the multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method and the modification of the previous code structure was minimized. An improvement is confirmed by comparing the results between the single and multiple processor algorithms. The speedup and efficiency are also evaluated when increasing the number of processors. The parallel algorithm was implemented to the subchannel code MATRA using the MPI. The performance of the parallel algorithm was verified by comparing the results with those from the MATRA with the single processor. It is also noticed that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8 core and whole core problems
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing
Amooie, M. A.; Moortgat, J.
2017-12-01
We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
International Nuclear Information System (INIS)
Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki
2010-01-01
We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders i.e., quasi two-dimension cases. Such an extension is regarded to bring about several breakthroughs in e.g., quantum-physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes which are unsuitable for multi-core systems, which is a mainstream of current parallel computers. Therefore, we optimize the all-to-all communications by the following two steps. The first one is the elimination of the communications between all processes by only rearranging data distribution with the communication data amount kept. The second one is the avoidance of the communication conflict by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-steps tuning is quite effective. (author)
Energy Technology Data Exchange (ETDEWEB)
2017-04-04
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
Language interoperability for high-performance parallel scientific components
International Nuclear Information System (INIS)
Elliot, N; Kohn, S; Smolinski, B
1999-01-01
With the increasing complexity and interdisciplinary nature of scientific applications, code reuse is becoming increasingly important in scientific computing. One method for facilitating code reuse is the use of components technologies, which have been used widely in industry. However, components have only recently worked their way into scientific computing. Language interoperability is an important underlying technology for these component architectures. In this paper, we present an approach to language interoperability for a high-performance parallel, component architecture being developed by the Common Component Architecture (CCA) group. Our approach is based on Interface Definition Language (IDL) techniques. We have developed a Scientific Interface Definition Language (SIDL), as well as bindings to C and Fortran. We have also developed a SIDL compiler and run-time library support for reference counting, reflection, object management, and exception handling (Babel). Results from using Babel to call a standard numerical solver library (written in C) from C and Fortran show that the cost of using Babel is minimal, where as the savings in development time and the benefits of object-oriented development support for C and Fortran far outweigh the costs
MulticoreBSP for C : A high-performance library for shared-memory parallel programming
Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.
2014-01-01
The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the
Experiences in Data-Parallel Programming
Directory of Open Access Journals (Sweden)
Terry W. Clark
1997-01-01
Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.
Automatic Parallelization Tool: Classification of Program Code for Parallel Computing
Directory of Open Access Journals (Sweden)
Mustafa Basthikodi
2016-04-01
Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units empowered to enhance parallelism broadly. Couples of compilers are updated to developing challenges forsynchronization and threading issues. Appropriate program and algorithm classifications will have advantage to a great extent to the group of software engineers to get opportunities for effective parallelization. In present work we investigated current species for classification of algorithms, in that related work on classification is discussed along with the comparison of issues that challenges the classification. The set of algorithms are chosen which matches the structure with different issues and perform given task. We have tested these algorithms utilizing existing automatic species extraction toolsalong with Bones compiler. We have added functionalities to existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by original speciesof algorithms. We executed new theories into the device, empowering automatic characterization of program code.
Parallel pic plasma simulation through particle decomposition techniques
International Nuclear Information System (INIS)
Briguglio, S.; Vlad, G.; Di Martino, B.; Naples, Univ. 'Federico II'
1998-02-01
Particle-in-cell (PIC) codes are among the major candidates to yield a satisfactory description of the detail of kinetic effects, such as the resonant wave-particle interaction, relevant in determining the transport mechanism in magnetically confined plasmas. A significant improvement of the simulation performance of such codes con be expected from parallelization, e.g., by distributing the particle population among several parallel processors. Parallelization of a hybrid magnetohydrodynamic-gyrokinetic code has been accomplished within the High Performance Fortran (HPF) framework, and tested on the IBM SP2 parallel system, using a 'particle decomposition' technique. The adopted technique requires a moderate effort in porting the code in parallel form and results in intrinsic load balancing and modest inter processor communication. The performance tests obtained confirm the hypothesis of high effectiveness of the strategy, if targeted towards moderately parallel architectures. Optimal use of resources is also discussed with reference to a specific physics problem [it
HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization
Energy Technology Data Exchange (ETDEWEB)
2016-08-22
NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for $\\WW$ and $\\HH$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $\\WW$ and $\\HH$ within the alternating iterations.
PDDP, A Data Parallel Programming Model
Directory of Open Access Journals (Sweden)
Karen H. Warren
1996-01-01
Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
International Nuclear Information System (INIS)
Heggarty, J.W.
1999-06-01
For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in
Comparing the performance of different meta-heuristics for unweighted parallel machine scheduling
Directory of Open Access Journals (Sweden)
Adamu, Mumuni Osumah
2015-08-01
Full Text Available This article considers the due window scheduling problem to minimise the number of early and tardy jobs on identical parallel machines. This problem is known to be NP complete and thus finding an optimal solution is unlikely. Three meta-heuristics and their hybrids are proposed and extensive computational experiments are conducted. The purpose of this paper is to compare the performance of these meta-heuristics and their hybrids and to determine the best among them. Detailed comparative tests have also been conducted to analyse the different heuristics with the simulated annealing hybrid giving the best result.
Pattern-Driven Automatic Parallelization
Directory of Open Access Journals (Sweden)
Christoph W. Kessler
1996-01-01
Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
Computer-Aided Parallelizer and Optimizer
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Design strategies for irregularly adapting parallel applications
International Nuclear Information System (INIS)
Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal
2000-01-01
Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability
Parallel Computing Using Web Servers and "Servlets".
Lo, Alfred; Bloor, Chris; Choi, Y. K.
2000-01-01
Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm
Alderson, Rosanna G.; Ferrari, Luna De; Mavridis, Lazaros; McDonagh, James L.; Mitchell, John B. O.; Nath, Neetika
2012-01-01
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three dimensional structures becoming available. The bioinformatics of enzymes is well served by, mostly free, online databases. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology as a drug modulates plural targets, but also often leads to adverse side-effects. Many cheminformatics approaches can be used to model the interactions between druglike molecules and proteins in silico. We can even use quantum chemical techniques like DFT and QM/MM to compute the structural and energetic course of enzyme catalysed chemical reaction mechanisms, including a full description of bond making and breaking. PMID:23116471
Parallel Application Performance on Two Generations of Intel Xeon HPC Platforms
Energy Technology Data Exchange (ETDEWEB)
Chang, Christopher H.; Long, Hai; Sides, Scott; Vaidhynathan, Deepthi; Jones, Wesley
2015-10-15
Two next-generation node configurations hosting the Haswell microarchitecture were tested with a suite of microbenchmarks and application examples, and compared with a current Ivy Bridge production node on NREL" tm s Peregrine high-performance computing cluster. A primary conclusion from this study is that the additional cores are of little value to individual task performance--limitations to application parallelism, or resource contention among concurrently running but independent tasks, limits effective utilization of these added cores. Hyperthreading generally impacts throughput negatively, but can improve performance in the absence of detailed attention to runtime workflow configuration. The observations offer some guidance to procurement of future HPC systems at NREL. First, raw core count must be balanced with available resources, particularly memory bandwidth. Balance-of-system will determine value more than processor capability alone. Second, hyperthreading continues to be largely irrelevant to the workloads that are commonly seen, and were tested here, at NREL. Finally, perhaps the most impactful enhancement to productivity might occur through enabling multiple concurrent jobs per node. Given the right type and size of workload, more may be achieved by doing many slow things at once, than fast things in order.
Directory of Open Access Journals (Sweden)
Harald Michalik
2009-02-01
Full Text Available Multicore platforms are such that have one physical processor chip with multiple cores interconnected via a chip level bus. Because they deliver a greater computing power through concurrency, offer greater system density multicore platforms provide best qualifications to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU-load. Heavy load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The Sche-duling is specially designed for the processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.
Step by step parallel programming method for molecular dynamics code
International Nuclear Information System (INIS)
Orii, Shigeo; Ohta, Toshio
1996-07-01
Parallel programming for a numerical simulation program of molecular dynamics is carried out with a step-by-step programming technique using the two phase method. As a result, within the range of a certain computing parameters, it is found to obtain parallel performance by using the level of parallel programming which decomposes the calculation according to indices of do-loops into each processor on the vector parallel computer VPP500 and the scalar parallel computer Paragon. It is also found that VPP500 shows parallel performance in wider range computing parameters. The reason is that the time cost of the program parts, which can not be reduced by the do-loop level of the parallel programming, can be reduced to the negligible level by the vectorization. After that, the time consuming parts of the program are concentrated on less parts that can be accelerated by the do-loop level of the parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)
Directory of Open Access Journals (Sweden)
P. Kang
2013-02-01
Full Text Available Paddy rice is rarely used as a feed because of its high fiber content. In this study, two experiments were conducted to study the effects of supplementing an enzyme complex consisting of xylanase, beta-glucanase and cellulase, to paddy-based diets on the performance and nutrient digestibility in meat-type ducks. In the both experiments, meat-type ducks (Cherry Valley were randomly assigned to four treatments. Treatment 1 was a basal diet of corn-soybean; treatment 2 was a basal diet of corn-paddy-soybean; treatment 3, had enzyme complex added to the corn-paddy-soybean basal diet at levels of 0.5 g/kg diet; and treatment 4, had enzyme complex added to the corn-paddy-soybean diet at levels of 1.0 g/kg diet. The results showed that the enzyme complex increased the ADG, and decreased the ADFI and F/G significantly (p0.05. The outcome of this research indicates that the application of enzyme complex made up of xylanase, beta-glucanase, and cellulase, in the corn-paddy-soybean diet, can improve performance and nutrition digestibility in meat-type ducks.
Artificial Enzymes, "Chemzymes"
DEFF Research Database (Denmark)
Bjerre, Jeannette; Rousseau, Cyril Andre Raphaël; Pedersen, Lavinia Georgeta M
2008-01-01
Enzymes have fascinated scientists since their discovery and, over some decades, one aim in organic chemistry has been the creation of molecules that mimic the active sites of enzymes and promote catalysis. Nevertheless, even today, there are relatively few examples of enzyme models that successf......Enzymes have fascinated scientists since their discovery and, over some decades, one aim in organic chemistry has been the creation of molecules that mimic the active sites of enzymes and promote catalysis. Nevertheless, even today, there are relatively few examples of enzyme models...... that successfully perform Michaelis-Menten catalysis under enzymatic conditions (i.e., aqueous medium, neutral pH, ambient temperature) and for those that do, very high rate accelerations are seldomly seen. This review will provide a brief summary of the recent developments in artificial enzymes, so called...... "Chemzymes", based on cyclodextrins and other molecules. Only the chemzymes that have shown enzyme-like activity that has been quantified by different methods will be mentioned. This review will summarize the work done in the field of artificial glycosidases, oxidases, epoxidases, and esterases, as well...
Directory of Open Access Journals (Sweden)
M.G. Mithra
2017-08-01
Full Text Available Two strategies leading to enzyme saving during saccharification of pretreated lignocellulo-starch biomass (LCSB was investigated which included reducing enzyme dosage by varying their levels in enzyme cocktails and enhancing the fermentable sugar yield in enzyme-reduced systems using detoxification chemicals. Time course release of reducing sugars (RS during 24–120 h was significantly higher when an enzyme cocktail containing full dose of cellulase (16 FPU/g cellulose along with half dose each of xylanase (1.5 mg protein/g hemicelluloses and Stargen (12.5 μl/g biomass was used to saccharify conventional dilute sulphuric acid (DSA pretreated biomass compared to a parallel system where only one-fourth the dose of the latter two enzymes was used. The reduction in RS content in the 120 h saccharified mash to the extent of 3–4 g/L compared to the system saccharified with full complement of the three enzymes could be overcome considerably by supplementing the system (half dose of two enzymes with detoxification chemical mix incorporating Tween 20, PEG 4000 and sodium borohydride. Microwave (MW-assisted DSA pretreated biomass on saccharification with enzyme cocktail having full dose of cellulase and half dose of Stargen along with detoxification chemicals gave significantly higher RS yield than DSA pretreated system saccharified using three enzymes. The study showed that xylanase could be eliminated during saccharification of MW-assisted DSA pretreated biomass without affecting RS yield when detoxification chemicals were also supplemented. The Saccharification Efficiency and Overall Conversion Efficiency were also high for the MW-assisted DSA pretreated biomass. Since whole slurry saccharifcation of pretreated biomass is essential to conserve fermentable sugars in LCSB saccharification, detoxification of soluble inhibitors is equally important as channelling out of insoluble lignin remaining in the residue. As one of the major factors contributing
Parallel Framework for Cooperative Processes
Directory of Open Access Journals (Sweden)
Mitică Craus
2005-01-01
Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.
Workspace optimization and kinematic performance evaluation of 2-DOF parallel mechanisms
International Nuclear Information System (INIS)
Nam, Yun Joo; Park, Myeong Kwan
2006-01-01
This paper presents the kinematics and workspace optimization of the two different 2-DOF (Degrees-of-Freedom) planar parallel mechanisms: one (called 2-RPR mechanism) with translational actuators and the other (called 2-RRR mechanism) with rotational ones. First of all, the inverse kinematics and Jacobian matrix for each mechanism are derived analytically. Then, the workspace including the output-space and the joint-space is systematically analyzed in order to determine the geometric parameters and the operating range of the actuators. Finally, the kinematic optimization of the mechanisms is performed in consideration of their dexterity and rigidity. It is expected that the optimization results can be effectively used as a basic material for the applications of the presented mechanisms to more industrial fields
Parallel plasma fluid turbulence calculations
International Nuclear Information System (INIS)
Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.
1994-01-01
The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated
Parallel Computing Strategies for Irregular Algorithms
Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)
2002-01-01
Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
A survey of parallel multigrid algorithms
Chan, Tony F.; Tuminaro, Ray S.
1987-01-01
A typical multigrid algorithm applied to well-behaved linear-elliptic partial-differential equations (PDEs) is described. Criteria for designing and evaluating parallel algorithms are presented. Before evaluating the performance of some parallel multigrid algorithms, consideration is given to some theoretical complexity results for solving PDEs in parallel and for executing the multigrid algorithm. The effect of mapping and load imbalance on the partial efficiency of the algorithm is studied.
Parallel 3-D method of characteristics in MPACT
International Nuclear Information System (INIS)
Kochunas, B.; Dovvnar, T. J.; Liu, Z.
2013-01-01
A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to 0(104) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k eff differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors; all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
Parallelization characteristics of the DeCART code
International Nuclear Information System (INIS)
Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.
2003-12-01
This report is to describe the parallelization characteristics of the DeCART code and also examine its parallel performance. Parallel computing algorithms are implemented to DeCART to reduce the tremendous computational burden and memory requirement involved in the three-dimensional whole core transport calculation. In the parallelization of the DeCART code, the axial domain decomposition is first realized by using MPI (Message Passing Interface), and then the azimuthal angle domain decomposition by using either MPI or OpenMP. When using the MPI for both the axial and the angle domain decomposition, the concept of MPI grouping is employed for convenient communication in each communication world. For the parallel computation, most of all the computing modules except for the thermal hydraulic module are parallelized. These parallelized computing modules include the MOC ray tracing, CMFD, NEM, region-wise cross section preparation and cell homogenization modules. For the distributed allocation, most of all the MOC and CMFD/NEM variables are allocated only for the assigned planes, which reduces the required memory by a ratio of the number of the assigned planes to the number of all planes. The parallel performance of the DeCART code is evaluated by solving two problems, a rodded variation of the C5G7 MOX three-dimensional benchmark problem and a simplified three-dimensional SMART PWR core problem. In the aspect of parallel performance, the DeCART code shows a good speedup of about 40.1 and 22.4 in the ray tracing module and about 37.3 and 20.2 in the total computing time when using 48 CPUs on the IBM Regatta and 24 CPUs on the LINUX cluster, respectively. In the comparison between the MPI and OpenMP, OpenMP shows a somewhat better performance than MPI. Therefore, it is concluded that the first priority in the parallel computation of the DeCART code is in the axial domain decomposition by using MPI, and then in the angular domain using OpenMP, and finally the angular
Impact of interference on the performance of selection based parallel multiuser scheduling
Nam, Sungsik
2012-02-01
In conventional multiuser parallel scheduling schemes, every scheduled user is interfering with every other scheduled user, which limits the capacity and performance of multiuser systems, and the level of interference becomes substantial as the number of scheduled users increases. Based on the above observations, we investigate the trade-off between the system throughput and the number of scheduled users through the exact analysis of the total average sum rate capacity and the average spectral efficiency. Our analytical results can help the system designer to carefully select the appropriate number of scheduled users to maximize the overall throughput while maintaining an acceptable quality of service under certain channel conditions. © 2012 IEEE.
Effect of Enzyme Supplementation and Irradiation of Barley on Broiler Chicks Performance
International Nuclear Information System (INIS)
Farag, D.H.M.; Abd El-Hakeim, N.F.
1999-01-01
The experiments were conducted to study the influence of irradiation treatment at dose levels of 0.20 and 60 kGy on barley beta-glucan and the effect of enzyme supplementation and irradiation of barley on broiler chicks performance. The amount of total and water-soluble beta-glucan in raw barley was 36 kg -1 , respectively. The effect of irradiation treatment on total beta-glucan was insignificant while the level of soluble beta-glucan was increased with increasing the dose levels of irradiation. The effect of irradiation treatment and enzyme supplementation of barley diets on growth and conversion performance of broiler chicks indicated that birds fed raw barley diet had lower body weight, body weight gain and feed conversion than those fed control diet throughout the experimental period. Irradiation of barley at dose of 20 kGy did not affect the chick performance (feed consumption, weight gain feed-gain ratio) that received the B 20 diet from 7 to 21 days of age, but when bird maintained on B 20 diet from 7 28 days of age, only feed-gain ratio was improved by 14.4%. The results indicate that there was a significant effect of irradiation of barley at 60 kGy (B 60) on feed -gain ratio of chicks when were fed B 60 diet from 7 to 21 days of age. The corresponding improvement in feed-gain ratio was 16.4%. When birds were fed B 60 diet from 7-28 days of age, the improvement in body weight and feed-gain ratio was 25.5 and 19.6%, respectively
Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing
Directory of Open Access Journals (Sweden)
Cordes Ben
2009-01-01
Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.
Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing
Directory of Open Access Journals (Sweden)
2009-03-01
Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.
Russkova, Tatiana V.
2017-11-01
One tool to improve the performance of Monte Carlo methods for numerical simulation of light transport in the Earth's atmosphere is the parallel technology. A new algorithm oriented to parallel execution on the CUDA-enabled NVIDIA graphics processor is discussed. The efficiency of parallelization is analyzed on the basis of calculating the upward and downward fluxes of solar radiation in both a vertically homogeneous and inhomogeneous models of the atmosphere. The results of testing the new code under various atmospheric conditions including continuous singlelayered and multilayered clouds, and selective molecular absorption are presented. The results of testing the code using video cards with different compute capability are analyzed. It is shown that the changeover of computing from conventional PCs to the architecture of graphics processors gives more than a hundredfold increase in performance and fully reveals the capabilities of the technology used.
Design considerations for parallel graphics libraries
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Parallelism in computations in quantum and statistical mechanics
International Nuclear Information System (INIS)
Clementi, E.; Corongiu, G.; Detrich, J.H.
1985-01-01
Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341 s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)
Performance of optical biosensor using alcohol oxidase enzyme for formaldehyde detection
Sari, A. P.; Rachim, A.; Nurlely, Fauzia, V.
2017-07-01
The recent issue in the world is the long exposure of formaldehyde which is can increase the risk of human health, therefore, that is very important to develop a device and method that can be optimized to detect the formaldehyde elements accurately, have a long lifetime and can be fabricated and produced in large quantities. A new and simple prepared optical biosensor for detection of formaldehyde in aqueous solutions using alcohol oxidase (AOX) enzyme was successfully fabricated. The poly-n-butyl acrylic-co-N-acryloxysuccinimide (nBA-NAS) membranes containing chromoionophore ETH5294 were used for immobilization of alcohol oxidase enzyme (AOX). Biosensor response was based on the colour change of chromoionophore as a result of enzymatic oxidation of formaldehyde and correlated with the detection concentration of formaldehyde. The performance of biosensor parameters were measured through the optical absorption value using UV-Vis spectrophotometer including the repeatability, reproducibility, selectivity and lifetime. The results showed that the prepared biosensor has good repeatability (RSD = 1.9 %) and good reproducibility (RSD = 2.1 %). The biosensor was selective formaldehyde with no disturbance by methanol, ethanol, and acetaldehyde, and also stable before 49 days and decrease by 41.77 % after 49 days.
Parallel visualization on leadership computing resources
Energy Technology Data Exchange (ETDEWEB)
Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)
2009-07-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Parallel visualization on leadership computing resources
International Nuclear Information System (INIS)
Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H
2009-01-01
Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.
Determination of digoxin by enzyme immunoassay and radioimmunoassay
International Nuclear Information System (INIS)
Mueller, H.; Braeuer, H.; Foerster, G.; Reinhardt, M.
1978-01-01
The results of parallel determinations of digoxin in the sera of non selected patients (n=104) by enzyme immunoassay (EMIT.EIA) and radioimmunoassay (J-125 labeled RIA) were compared with each other. The determinations revealed considerably different concentrations; the values determined by EIA were statistical lower (for EIA 1,09+'0,99ng/ml, for RIA 1,34+'1,01ng/ml, p [de
Shared Variable Oriented Parallel Precompiler for SPMD Model
Institute of Scientific and Technical Information of China (English)
无
1995-01-01
For the moment,commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compliers,which are just traditional sequential FORTRAN or C compilers expanded with communication statements.Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for SPMD(Single Program Multiple Data) computation model and greatly ease the parallel programming with high communication efficiency.The core function of parallel C precompiler has been successfully verified on a transputer-based parallel computer.Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique.
Energy Technology Data Exchange (ETDEWEB)
Cossar, D; Canevascini, G
1986-07-01
The cellulolytic fungus Sporotrichum (Chrysosporium) thermophile produces an extracellular cellobiose dehydrogenase during batch culture on cellulose or cellobiose. In chemostat culture at pH 5.6 on cellobiose this enzyme was produced in parallel with endo-cellulase. At pH 5.0 in continuous or fed-batch culture such a pattern was not evident. At constant growth rate in a chemostat with varying pH, activity of these enzymes was found to be poorly correlated. Thus the induction of cellobiose dehydrogenase shows a dependence on pH and cellobiose concentration which is different to that for endo-cellulase. The natural inducer of these enzymes and the role of cellubiose dehydrogenase remain to be elucidated.
Development of parallel/serial program analyzing tool
International Nuclear Information System (INIS)
Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa
1999-03-01
Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of the science and engineering computation program. KMtool analyzes the performance of program written by FORTRAN77 and MPI, and it reduces the effort for parallelization. This paper describes development purpose, design, utilization and evaluation of KMtool. (author)
Automatic Management of Parallel and Distributed System Resources
Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.
1990-01-01
Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Parallel sparse direct solver for integrated circuit simulation
Chen, Xiaoming; Yang, Huazhong
2017-01-01
This book describes algorithmic methods and parallelization techniques to design a parallel sparse direct solver which is specifically targeted at integrated circuit simulation problems. The authors describe a complete flow and detailed parallel algorithms of the sparse direct solver. They also show how to improve the performance by simple but effective numerical techniques. The sparse direct solver techniques described can be applied to any SPICE-like integrated circuit simulator and have been proven to be high-performance in actual circuit simulation. Readers will benefit from the state-of-the-art parallel integrated circuit simulation techniques described in this book, especially the latest parallel sparse matrix solution techniques. · Introduces complicated algorithms of sparse linear solvers, using concise principles and simple examples, without complex theory or lengthy derivations; · Describes a parallel sparse direct solver that can be adopted to accelerate any SPICE-like integrated circuit simulato...
Productive Parallel Programming: The PCN Approach
Directory of Open Access Journals (Sweden)
Ian Foster
1992-01-01
Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.
The STAPL Parallel Graph Library
Harshvardhan,
2013-01-01
This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
The Galley Parallel File System
Nieuwejaar, Nils; Kotz, David
1996-01-01
Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Data parallel sorting for particle simulation
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O (N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimun performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
International Nuclear Information System (INIS)
Hoisie, A.; Lubeck, O.; Wasserman, H.
1998-01-01
The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor
Performing a local reduction operation on a parallel computer
Blocksome, Michael A.; Faraj, Daniel A.
2012-12-11
A parallel computer including compute nodes, each including two reduction processing cores, a network write processing core, and a network read processing core, each processing core assigned an input buffer. Copying, in interleaved chunks by the reduction processing cores, contents of the reduction processing cores' input buffers to an interleaved buffer in shared memory; copying, by one of the reduction processing cores, contents of the network write processing core's input buffer to shared memory; copying, by another of the reduction processing cores, contents of the network read processing core's input buffer to shared memory; and locally reducing in parallel by the reduction processing cores: the contents of the reduction processing core's input buffer; every other interleaved chunk of the interleaved buffer; the copied contents of the network write processing core's input buffer; and the copied contents of the network read processing core's input buffer.
Color enhancement of Fetească neagră wines by using pectolytic enzymes during maceration
Directory of Open Access Journals (Sweden)
Cezar Bichescu
2012-08-01
Full Text Available Red wine is the result of red grape must fermentation and the parallel extraction of various polyphenolic compounds from the grape berry skin. The aim of this work was to evaluate the influence of pectolytic enzymes addition during maceration fermentation on the chromatic characteristics of the Fetească neagră wine.An increase in concentrations of anthocyanins was observed when pectolytic enzymes were used during maceration fermentation process.
Beam dynamics simulations using a parallel version of PARMILA
International Nuclear Information System (INIS)
Ryne, R.D.
1996-01-01
The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code
Beam dynamics simulations using a parallel version of PARMILA
International Nuclear Information System (INIS)
Ryne, Robert
1996-01-01
The computer code PARMILA has been the primary tool for the design of proton and ion linacs in the United States for nearly three decades. Previously it was sufficient to perform simulations with of order 10000 particles, but recently the need to perform high resolution halo studies for next-generation, high intensity linacs has made it necessary to perform simulations with of order 100 million particles. With the advent of massively parallel computers such simulations are now within reach. Parallel computers already make it possible, for example, to perform beam dynamics calculations with tens of millions of particles, requiring over 10 GByte of core memory, in just a few hours. Also, parallel computers are becoming easier to use thanks to the availability of mature, Fortran-like languages such as Connection Machine Fortran and High Performance Fortran. We will describe our experience developing a parallel version of PARMILA and the performance of the new code. (author)
Directory of Open Access Journals (Sweden)
P. Saenphoom
2013-04-01
Full Text Available This study examined whether pre-treating palm kernel expeller (PKE with exogenous enzyme would degrade its fiber content; thus improving its metabolizable energy (ME, growth performance, villus height and digesta viscosity in broiler chickens fed diets containing PKE. Our results showed that enzyme treatment decreased (p0.05 among treatment groups in the finisher period, ADG of chickens in the control (PKE-free diet was higher (p0.05 FCR. The intestinal villus height and crypt depth (duodenum, jejunum and ileum were not different (p>0.05 among treatments except for duodenal crypt depth. The villus height and crypt depth of birds in enzyme treated PKE diets were higher (p0.05 among treatments. Results of this study suggest that exogenous enzyme is effective in hydrolyzing the fiber (hemicellulose and cellulose component and improved the ME values of PKE, however, the above positive effects were not reflected in the growth performance in broiler chickens fed the enzyme treated PKE compared to those received raw PKE. The results suggest that PKE can be included up to 5% in the grower diet and 20% in the finisher diet without any significant negative effect on FCR in broiler chickens.
Performance of Polycrystalline Photovoltaic and Thermal Collector (PVT on Serpentine-Parallel Absor
Directory of Open Access Journals (Sweden)
Mustofa
2015-10-01
Full Text Available This paper presents the performance of an unglazed polycrystalline photovoltaic-thermal PVT on 0.045 kg/s mass flow rate. PVT combine photovoltaic modules and solar thermal collectors, forming a single device that receive solar radiation and produces heat and electricity simultaneously. The collector figures out serpentine-parallel tubes that can prolong fluid heat conductivity from morning till afternoon. During testing, cell PV, inlet and outlet fluid temperatures were recorded by thermocouple digital LM35 Arduino Mega 2560. Panel voltage and electric current were also noted in which they were connected to computer and presented each second data recorded. But, in this performance only shows in the certain significant time data. This because the electric current was only noted by multimeter device not the digital one. Based on these testing data, average cell efficiency was about 19%, while thermal efficiency of above 50% and correspondent cell efficiency of 11%, respectively.
An efficient parallel algorithm for matrix-vector multiplication
Energy Technology Data Exchange (ETDEWEB)
Hendrickson, B.; Leland, R.; Plimpton, S.
1993-03-01
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
Aspects of computation on asynchronous parallel processors
International Nuclear Information System (INIS)
Wright, M.
1989-01-01
The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues
Energy Technology Data Exchange (ETDEWEB)
Daneshmand, Morteza [University of Tartu, Tartu (Estonia); Saadatzi, Mohammad Hossein [Colorado School of Mines, Golden (United States); Kaloorazi, Mohammad Hadi [École de Technologie Supérieur, Montréal (Canada); Masouleh, Mehdi Tale [University of Tehran, Tehran (Iran, Islamic Republic of); Anbarjafari, Gholamreza [Hasan Kalyoncu University, Gaziantep (Turkmenistan)
2016-03-15
This study aims to provide an optimal design for a Spherical parallel manipulator (SPM), namely, the Agile Eye. This aim is approached by investigating kinetostatic performance and workspace and searching for the most promising design. Previously recommended designs are examined to determine whether they provide acceptable kinetostatic performance and workspace. Optimal designs are provided according to different kinetostatic performance indices, especially kinematic sensitivity. The optimization process is launched based on the concept of the genetic algorithm. A single-objective process is implemented in accordance with the guidelines of an evolutionary algorithm called differential evolution. A multi-objective procedure is then provided following the reasoning of the nondominated sorting genetic algorithm-II. This process results in several sets of Pareto points for reconciliation between kinetostatic performance indices and workspace. The concept of numerous kinetostatic performance indices and the results of optimization algorithms are elaborated. The conclusions provide hints on the provided set of designs and their credibility to provide a well-conditioned workspace and acceptable kinetostatic performance for the SPM under study, which can be well extended to other types of SPMs.
Evaluation of thermal performance of all-GaN power module in parallel operation
International Nuclear Information System (INIS)
Chou, Po-Chien; Cheng, Stone; Chen, Szu-Hao
2014-01-01
This work presents an extensive thermal characterization of a single discrete GaN high-electron-mobility transistor (HEMT) device when operated in parallel at temperatures of 25 °C–175 °C. The maximum drain current (I D max ), on-resistance (R ON ), pinch-off voltage (V P ) and peak transconductance (g m ) at various chamber temperatures are measured and correlations among these parameters studied. Understanding the dependence of key transistor parameters on temperature is crucial to inhibiting the generation of hot spots and the equalization of currents in the parallel operation of HEMTs. A detailed analysis of the current imbalance between two parallel HEMT cells and its consequential effect on the junction temperature are also presented. The results from variations in the characteristics of the parallel-connected devices further verify that the thermal stability and switching behavior of these cells are balanced. Two parallel HEMT cells are operated at a safe working distance from thermal runaway to prevent destruction of the hottest cell. - Highlights: • This work reveals the sorting process of GaN devices for parallel operation. • The variations of I D max , R ON , V P , and g m with temperature are established. • The temperature-dependence parameters are crucial to prevent hot spots generation. • Safe working operation prevents thermal runaway and hottest cell destruction
Parallel kinematics type, kinematics, and optimal design
Liu, Xin-Jun
2014-01-01
Parallel Kinematics- Type, Kinematics, and Optimal Design presents the results of 15 year's research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, which is also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others. This book is intended for researchers, scientists, engineers and postgraduates or above with interes...
Parallelization of an existing high energy physics event reconstruction software package
International Nuclear Information System (INIS)
Schiefer, R.; Francis, D.
1996-01-01
Software parallelization allows an efficient use of available computing power to increase the performance of applications. In a case study the authors have investigated the parallelization of high energy physics event reconstruction software in terms of costs (effort, computing resource requirements), benefits (performance increase) and the feasibility of a systematic parallelization approach. Guidelines facilitating a parallel implementation are proposed for future software development
Parallel processing of structural integrity analysis codes
International Nuclear Information System (INIS)
Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.
1996-01-01
Structural integrity analysis forms an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising the high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. The first category codes such as those used for harmonic analysis, mechanistic fuel performance codes need not require the parallelisation of individual modules of the codes. The second category of codes such as conventional FEM codes require parallelisation of individual modules. In this category, parallelisation of equation solution module poses major difficulties. Different solution schemes such as domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS belonging to each of these categories have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab
Parallelization of quantum molecular dynamics simulation code
International Nuclear Information System (INIS)
Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu
1998-02-01
A quantum molecular dynamics simulation code has been developed for the analysis of the thermalization of photon energies in the molecule or materials in Kansai Research Establishment. The simulation code is parallelized for both Scalar massively parallel computer (Intel Paragon XP/S75) and Vector parallel computer (Fujitsu VPP300/12). Scalable speed-up has been obtained with a distribution to processor units by division of particle group in both parallel computers. As a result of distribution to processor units not only by particle group but also by the particles calculation that is constructed with fine calculations, highly parallelization performance is achieved in Intel Paragon XP/S75. (author)
Xyce parallel electronic simulator : users' guide.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick
2011-05-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is
High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores
International Nuclear Information System (INIS)
Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi
2012-01-01
We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harvest the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. The timing to process one XRD image using one pipeline is about ten times faster than the best C program at present. With the support of InfoSphere Streams platform, our software is able to be scaled up to operate on clusters of multi-cores for processing multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time data analysis of image data from synchrotron experiments.
Performance modeling and analysis of parallel Gaussian elimination on multi-core computers
Directory of Open Access Journals (Sweden)
Fadi N. Sibai
2014-01-01
Full Text Available Gaussian elimination is used in many applications and in particular in the solution of systems of linear equations. This paper presents mathematical performance models and analysis of four parallel Gaussian Elimination methods (precisely the Original method and the new Meet in the Middle –MiM– algorithms and their variants with SIMD vectorization on multi-core systems. Analytical performance models of the four methods are formulated and presented followed by evaluations of these models with modern multi-core systems’ operation latencies. Our results reveal that the four methods generally exhibit good performance scaling with increasing matrix size and number of cores. SIMD vectorization only makes a large difference in performance for low number of cores. For a large matrix size (n ⩾ 16 K, the performance difference between the MiM and Original methods falls from 16× with four cores to 4× with 16 K cores. The efficiencies of all four methods are low with 1 K cores or more stressing a major problem of multi-core systems where the network-on-chip and memory latencies are too high in relation to basic arithmetic operations. Thus Gaussian Elimination can greatly benefit from the resources of multi-core systems, but higher performance gains can be achieved if multi-core systems can be designed with lower memory operation, synchronization, and interconnect communication latencies, requirements of utmost importance and challenge in the exascale computing age.
Parallel thermal radiation transport in two dimensions
International Nuclear Information System (INIS)
Smedley-Stevenson, R.P.; Ball, S.R.
2003-01-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Parallel thermal radiation transport in two dimensions
Energy Technology Data Exchange (ETDEWEB)
Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)
2003-07-01
This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massive parallel processors) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)
Optimization approaches to mpi and area merging-based parallel buffer algorithm
Directory of Open Access Journals (Sweden)
Junfu Fan
Full Text Available On buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we proposed a parallel buffer algorithm based on area merging and MPI (Message Passing Interface to improve the performances of buffer analyses on processing large datasets. Experimental results reveal that there are three major performance bottlenecks which significantly impact the serial and parallel buffer construction efficiencies, including the area merging strategy, the task load balance method and the MPI inter-process results merging strategy. Corresponding optimization approaches involving tree-like area merging strategy, the vertex number oriented parallel task partition method and the inter-process results merging strategy were suggested to overcome these bottlenecks. Experiments were carried out to examine the performance efficiency of the optimized parallel algorithm. The estimation results suggested that the optimization approaches could provide high performance and processing ability for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithm.
High-energy physics software parallelization using database techniques
International Nuclear Information System (INIS)
Argante, E.; Van der Stok, P.D.V.; Willers, I.
1997-01-01
A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)
Directory of Open Access Journals (Sweden)
Hadavi A
2015-12-01
Full Text Available An experiment was conducted to study the effects of carbon tetrachloride (CCl4 on post-peak performance and serum enzymes of Hy-Line W-36 laying hens from 32-36 weeks of age. The experiment was carried out with a total of 192 laying hens in a completely randomized block design. During the experiment laying hens were allocated to 4 groups consisted of T1 no CCl4 as control diet, T2, T3 and T4 control diet supplemented with 1, 3 and 5 mL CCl4/100 g diet, respectively. Each experimental group was divided into 6 blocks of 8 hens each. Egg production, cracked egg percentage and feed intake were recorded weekly. Blood samples were taken from wing veins of hens at the middle and end of the experiment to measure serum hepatic enzymes of alkaline phosphatase, alanine aminotransferase and aspartate aminotransferase. Data showed that in comparison with the control group, the inclusion of CCl4 to the diets had no significant effect on performance parameters. However, by increasing the level of CCl4, egg production was linearly decreased and feed intake was linearly increased (P < 0.05. The effect of CCl4 on cracked eggs was significant and this effect was linearly increased (P < 0.05. Dietary supplementation of 3 and 5 mL CCl4 elevated the serum concentration of hepatic enzymes of alkaline phosphatase, aspartate aminotransferase and alanine aminotransferase, linearly (P < 0.0001. In conclusion, the dietary supplementation of CCl4 has the ability to decrease the performance and egg quality. CCl4 is also a potent hepatic toxicity inducer and may damage liver hepatocytes. Therefore, the level of 3 mL CCl4 was assigned as the one had the maximum negative effect on serum hepatic enzymes concentration (maximum liver damage alongside the minimum negative effect on laying hen performance for further studies.
Efficient multitasking: parallel versus serial processing of multiple tasks.
Fischer, Rico; Plessow, Franziska
2015-01-01
In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.
Nam, Sung Sik; Yoon, Chang Seok; Alouini, Mohamed-Slim
2017-01-01
In this paper, we statistically analyze the performance of a threshold-based parallel multiple beam selection scheme (TPMBS) for Free-space optical (FSO) based system with wavelength division multiplexing (WDM) in cases where a pointing error has
Synchronization Techniques in Parallel Discrete Event Simulation
Lindén, Jonatan
2018-01-01
Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...
Massively Parallel Finite Element Programming
Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang
2010-01-01
Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Massively Parallel Finite Element Programming
Heister, Timo
2010-01-01
Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.
Parallel Algorithms for the Exascale Era
Energy Technology Data Exchange (ETDEWEB)
Robey, Robert W. [Los Alamos National Laboratory
2016-10-19
New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this work has been done by undergraduates and published in leading scientific journals.
Enzyme alterations in mediastine during and after radiotherapy. 2
International Nuclear Information System (INIS)
Alheit, H.D.; Alheit, C.; Herrmann, T.
1986-01-01
Results are presented estimating the serum activity of transaminases (ASAT and ALAT) in 72 patients after mediastinal irradiation. During and after mediastinal irradiation both enzymes showed essentially a parallel reaction. One day after irradiation a decrease of enzymes in patients who were irradiated with high single dosis (5 Gy) was observed, while patients irradiated with low or middle single dosis showed an increase of enzyme activity. A different temporal enzyme reaction is discussed to be the cause for this reaction in dependence on the applied single dose so that in patients with high single doses an initial enzyme increase caused by the radiation insult has changed into a following decrease under the starting level at the first control 24 hours later. Because patients without mediastinal tumors react in the same manner, the normal tissue surrounding the tumor is discussed to be the original place of enzyme secretion. Up to the end of irradiation a decrease of enzymes was observed in patients with high single dose or with high total dose (60 Gy) which is interpreted as an enzyme deficiency in tissue in consequence of destruction in formation places. In patients with middle total and low single doses an enzyme increase is registered with a still sufficient restoration capacity of the tissue discussed to be the cause of it. An enzyme increase, observed from the end of irradiation to the control date 3 to 6 months after irradiation, is mainly caused by a tumor progression (increased rate of liver metastases, especially in bronchial carcinoma) and can still be intensified by occurrence of pulmonal or cardiac radioreactions. (author)
Semi-coarsening multigrid methods for parallel computing
Energy Technology Data Exchange (ETDEWEB)
Jones, J.E.
1996-12-31
Standard multigrid methods are not well suited for problems with anisotropic coefficients which can occur, for example, on grids that are stretched to resolve a boundary layer. There are several different modifications of the standard multigrid algorithm that yield efficient methods for anisotropic problems. In the paper, we investigate the parallel performance of these multigrid algorithms. Multigrid algorithms which work well for anisotropic problems are based on line relaxation and/or semi-coarsening. In semi-coarsening multigrid algorithms a grid is coarsened in only one of the coordinate directions unlike standard or full-coarsening multigrid algorithms where a grid is coarsened in each of the coordinate directions. When both semi-coarsening and line relaxation are used, the resulting multigrid algorithm is robust and automatic in that it requires no knowledge of the nature of the anisotropy. This is the basic multigrid algorithm whose parallel performance we investigate in the paper. The algorithm is currently being implemented on an IBM SP2 and its performance is being analyzed. In addition to looking at the parallel performance of the basic semi-coarsening algorithm, we present algorithmic modifications with potentially better parallel efficiency. One modification reduces the amount of computational work done in relaxation at the expense of using multiple coarse grids. This modification is also being implemented with the aim of comparing its performance to that of the basic semi-coarsening algorithm.
Energy Technology Data Exchange (ETDEWEB)
Fang, Chin [SLAC National Accelerator Lab., Menlo Park, CA (United States); Corttrell, R. A. [SLAC National Accelerator Lab., Menlo Park, CA (United States)
2015-05-06
This Technical Note provides an overview of high-performance parallel Big Data transfers with and without encryption for data in-transit over multiple network channels. It shows that with the parallel approach, it is feasible to carry out high-performance parallel "encrypted" Big Data transfers without serious impact to throughput. But other impacts, e.g. the energy-consumption part should be investigated. It also explains our rationales of using a statistics-based approach for gaining understanding from test results and for improving the system. The presentation is of high-level nature. Nevertheless, at the end we will pose some questions and identify potentially fruitful directions for future work.
Parallel-In-Time For Moving Meshes
Energy Technology Data Exchange (ETDEWEB)
Falgout, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Manteuffel, T. A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Southworth, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Schroder, J. B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)
2016-02-04
With steadily growing computational resources available, scientists must develop e ective ways to utilize the increased resources. High performance, highly parallel software has be- come a standard. However until recent years parallelism has focused primarily on the spatial domain. When solving a space-time partial di erential equation (PDE), this leads to a sequential bottleneck in the temporal dimension, particularly when taking a large number of time steps. The XBraid parallel-in-time library was developed as a practical way to add temporal parallelism to existing se- quential codes with only minor modi cations. In this work, a rezoning-type moving mesh is applied to a di usion problem and formulated in a parallel-in-time framework. Tests and scaling studies are run using XBraid and demonstrate excellent results for the simple model problem considered herein.
Parallel community climate model: Description and user`s guide
Energy Technology Data Exchange (ETDEWEB)
Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others
1996-07-15
This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
Introduction to massively-parallel computing in high-energy physics
AUTHOR|(CDS)2083520
1993-01-01
Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has lead all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...
Parallel hierarchical radiosity rendering
Energy Technology Data Exchange (ETDEWEB)
Carter, Michael [Iowa State Univ., Ames, IA (United States)
1993-07-01
In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.
Reducing Concurrency Bottlenecks in Parallel I/O Workloads
Energy Technology Data Exchange (ETDEWEB)
Manzanares, Adam C. [Los Alamos National Laboratory; Bent, John M. [Los Alamos National Laboratory; Wingate, Meghan [Los Alamos National Laboratory
2011-01-01
To enable high performance parallel checkpointing we introduced the Parallel Log Structured File System (PLFS). PLFS is middleware interposed on the file system stack to transform concurrent writing of one application file into many non-concurrently written component files. The promising effectiveness of PLFS makes it important to examine its performance for workloads other than checkpoint capture, notably the different ways that state snapshots may be later read, to make the case for using PLFS in the Exascale I/O stack. Reading a PLFS file involved reading each of its component files. In this paper we identify performance limitations on broader workloads in an early version of PLFS, specifically the need to build and distribute an index for the overall file, and the pressure on the underlying parallel file system's metadata server, and show how PLFS's decomposed components architecture can be exploited to alleviate bottlenecks in the underlying parallel file system.
21 CFR 864.9400 - Stabilized enzyme solution.
2010-04-01
... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Stabilized enzyme solution. 864.9400 Section 864... and Blood Products § 864.9400 Stabilized enzyme solution. (a) Identification. A stabilized enzyme... enzyme solutions include papain, bromelin, ficin, and trypsin. (b) Classification. Class II (performance...
Evaluating parallel optimization on transputers
Directory of Open Access Journals (Sweden)
A.G. Chalmers
2003-12-01
Full Text Available The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented which indicates under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.
Customizable Memory Schemes for Data Parallel Architectures
Gou, C.
2011-01-01
Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data parallel machines. Processing capabilities of parallel lanes will be wasted, when data requests are not accomplished in a sustainable and timely manner. Irregular vector memory accesses
Performance of a plasma fluid code on the Intel parallel computers
International Nuclear Information System (INIS)
Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel σ machine gives an improvement factor close to 64 over the single-processor CRAY-2
Performance of a plasma fluid code on the Intel parallel computers
Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.
Alghamdi, Amal Mohammed
2012-04-01
Clawpack, a conservation laws package implemented in Fortran, and its Python-based version, PyClaw, are existing tools providing nonlinear wave propagation solvers that use state of the art finite volume methods. Simulations using those tools can have extensive computational requirements to provide accurate results. Therefore, a number of tools, such as BearClaw and MPIClaw, have been developed based on Clawpack to achieve significant speedup by exploiting parallel architectures. However, none of them has been shown to scale on a large number of cores. Furthermore, these tools, implemented in Fortran, achieve parallelization by inserting parallelization logic and MPI standard routines throughout the serial code in a non modular manner. Our contribution in this thesis research is three-fold. First, we demonstrate an advantageous use case of Python in implementing easy-to-use modular extensible scalable scientific software tools by developing an implementation of a parallelization framework, PetClaw, for PyClaw using the well-known Portable Extensible Toolkit for Scientific Computation, PETSc, through its Python wrapper petsc4py. Second, we demonstrate the possibility of getting acceptable Python code performance when compared to Fortran performance after introducing a number of serial optimizations to the Python code including integrating Clawpack Fortran kernels into PyClaw for low-level computationally intensive parts of the code. As a result of those optimizations, the Python overhead in PetClaw for a shallow water application is only 12 percent when compared to the corresponding Fortran Clawpack application. Third, we provide a demonstration of PetClaw scalability on up to the entirety of Shaheen; a 16-rack Blue Gene/P IBM supercomputer that comprises 65,536 cores and located at King Abdullah University of Science and Technology (KAUST). The PetClaw solver achieved above 0.98 weak scaling efficiency for an Euler application on the whole machine excluding the
Cu₂O-Au nanocomposites for enzyme-free glucose sensing with enhanced performances.
Hu, Qiyan; Wang, Fenyun; Fang, Zhen; Liu, Xiaowang
2012-06-15
A facile method for the synthesis of Cu(2)O-Au nanocomposites has been reported by injecting Cu(2)O nanocubes into Au precursor directly with the assistance of ultrasound radiation at room temperature. The ultrasound radiation is not a necessary requirement but can make the distribution of Au nanoparticles more homogenous. The formation of Cu(2)O-Au nanocomposites is attributed to following two reasons. The first one is the difference in the reduction potential between Cu(2+)/Cu(2)O and AuCl(4)(-)/Au, which can also be considered as the driving force for the redox reaction. The other one is the low lattice mismatch between (200) planes of Cu(2)O and (200) facets of Au, which is favorable for the formation of heterostructure. The electrochemical investigation demonstrates that the performances of Cu(2)O nanocubes in enzyme-free glucose sensing have been improved significantly after the decoration of Au nanoparticles which may be derived from the polarization effect provided by Au nanoparticles. As-prepared Cu(2)O-Au nanocomposites have great potential in enzyme-free glucose sensing. Copyright © 2012 Elsevier B.V. All rights reserved.
Parallelization of a spherical Sn transport theory algorithm
International Nuclear Information System (INIS)
Haghighat, A.
1989-01-01
The work described in this paper derives a parallel algorithm for an R-dependent spherical S N transport theory algorithm and studies its performance by testing different sample problems. The S N transport method is one of the most accurate techniques used to solve the linear Boltzmann equation. Several studies have been done on the vectorization of the S N algorithms; however, very few studies have been performed on the parallelization of this algorithm. Weinke and Hommoto have looked at the parallel processing of the different energy groups, and Azmy recently studied the parallel processing of the inner iterations of an X-Y S N nodal transport theory method. Both studies have reported very encouraging results, which have prompted us to look at the parallel processing of an R-dependent S N spherical geometry algorithm. This geometry was chosen because, in spite of its simplicity, it contains the complications of the curvilinear geometries (i.e., redistribution of neutrons over the discretized angular bins)
CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation
International Nuclear Information System (INIS)
Dunigan, T.H.
1988-01-01
1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated
A high-speed linear algebra library with automatic parallelism
Boucher, Michael L.
1994-01-01
Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
JANUS: A Compilation System for Balancing Parallelism and Performance in OpenVX
Omidian, Hossein; Lemieux, Guy G. F.
2018-04-01
Embedded systems typically do not have enough on-chip memory for entire an image buffer. Programming systems like OpenCV operate on entire image frames at each step, making them use excessive memory bandwidth and power. In contrast, the paradigm used by OpenVX is much more efficient; it uses image tiling, and the compilation system is allowed to analyze and optimize the operation sequence, specified as a compute graph, before doing any pixel processing. In this work, we are building a compilation system for OpenVX that can analyze and optimize the compute graph to take advantage of parallel resources in many-core systems or FPGAs. Using a database of prewritten OpenVX kernels, it automatically adjusts the image tile size as well as using kernel duplication and coalescing to meet a defined area (resource) target, or to meet a specified throughput target. This allows a single compute graph to target implementations with a wide range of performance needs or capabilities, e.g. from handheld to datacenter, that use minimal resources and power to reach the performance target.
International Nuclear Information System (INIS)
Orii, Shigeo
1998-06-01
A benchmark specification for performance evaluation of parallel computers for numerical analysis is proposed. Level 1 benchmark, which is a conventional type benchmark using processing time, measures performance of computers running a code. Level 2 benchmark proposed in this report is to give the reason of the performance. As an example, scalar-parallel computer SP2 is evaluated with this benchmark specification in case of a molecular dynamics code. As a result, the main causes to suppress the parallel performance are maximum band width and start-up time of communication between nodes. Especially the start-up time is proportional not only to the number of processors but also to the number of particles. (author)
Kalman Filter Tracking on Parallel Architectures
International Nuclear Information System (INIS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2016-01-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment
Directory of Open Access Journals (Sweden)
Anja Peters
2015-09-01
Full Text Available The supplementation of exogenous fibrolytic enzymes (EFE to dairy cows diets could be a strategy to improve fiber degradation in the rumen which is especially important for the early lactating cows characterized by a high milk energy output and an insufficient energy intake. The objective of this study was to examine the effects of a fibrolytic enzyme product (Roxazyme G2 Liquid, 3.8 and 3.9 mL/kg total mixed ration [TMR] DM supplemented to a TMR on production performance and blood parameters of dairy cows during early (trial 1 and mid-lactation (trail 2. In addition, rumination activity was measured in trial 2. The nutrient digestibility of the experimental TMR was obtained by using wethers. In the digestibility trial, EFE was supplemented at a rate of 4.4 mL/kg Roxazyme G2 Liquid TMR-DM. The TMR contained 60% forage and 40% concentrate (DM basis. Twenty eight 50 ± 16 days in milk (DIM and twenty six 136 ± 26 DIM Holstein cows were used in two 8-wk completely randomized trails, stratified by parity and milk yield level. One milliliter of the enzyme product contained primarily cellulase and xylanase activities (8,000 units endo-1,4-ß glucanase, 18,000 units endo-1,3(4-ß glucanase and 26,000 units 1,4-ß xylanase. No differences in digestibility of DM, OM, CP, NDF and ADF were observed (P > 0.05 between the control and the EFE supplemented TMR. Addition of EFE to the TMR fed to early (trial 1 and mid-lactation cows (trial 2 did not affect daily dry matter intake (DMI, milk yield, 4% fat-corrected milk, energy-corrected milk (ECM, concentration of milk fat, protein, fat-protein-quotients, somatic cell score, energy balance, and gross feed efficiency of early and mid-lactation cows (P > 0.05. Mid-lactation cows (trial 2 fed with TMR enzyme showed a tendency of a slightly higher ECM yield (P = 0.09. The tested blood parameters were not affected by treatment in trials 1 and 2 (P > 0.05. Exogenous fibrolytic enzymes supplementation did not alter
Testing New Programming Paradigms with NAS Parallel Benchmarks
Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.
2000-01-01
Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage
Event parallelism: Distributed memory parallel computing for high energy physics experiments
International Nuclear Information System (INIS)
Nash, T.
1989-05-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs
Event parallelism: Distributed memory parallel computing for high energy physics experiments
International Nuclear Information System (INIS)
Nash, T.
1989-01-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)
Event parallelism: Distributed memory parallel computing for high energy physics experiments
Nash, Thomas
1989-12-01
This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Thermodynamic activity-based intrinsic enzyme kinetic sheds light on enzyme-solvent interactions.
Grosch, Jan-Hendrik; Wagner, David; Nistelkas, Vasilios; Spieß, Antje C
2017-01-01
The reaction medium has major impact on biocatalytic reaction systems and on their economic significance. To allow for tailored medium engineering, thermodynamic phenomena, intrinsic enzyme kinetics, and enzyme-solvent interactions have to be discriminated. To this end, enzyme reaction kinetic modeling was coupled with thermodynamic calculations based on investigations of the alcohol dehydrogenase from Lactobacillus brevis (LbADH) in monophasic water/methyl tert-butyl ether (MTBE) mixtures as a model solvent. Substrate concentrations and substrate thermodynamic activities were varied separately to identify the individual thermodynamic and kinetic effects on the enzyme activity. Microkinetic parameters based on concentration and thermodynamic activity were derived to successfully identify a positive effect of MTBE on the availability of the substrate to the enzyme, but a negative effect on the enzyme performance. In conclusion, thermodynamic activity-based kinetic modeling might be a suitable tool to initially curtail the type of enzyme-solvent interactions and thus, a powerful first step to potentially understand the phenomena that occur in nonconventional media in more detail. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 33:96-103, 2017. © 2016 American Institute of Chemical Engineers.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Directory of Open Access Journals (Sweden)
Stephen L. Olivier
2013-01-01
Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
International Nuclear Information System (INIS)
Ding Yu; Qi Yujin; Zhang Xuezhu; Zhao Cuilan
2011-01-01
In this paper, we report the development of a high-performance image processing platform, which is based on CPU-GPU heterogeneous cluster. Currently, it consists of a Dell Precision T7500 and HP XW8600 workstations with parallel programming and runtime environment, using the message-passing interface (MPI) and CUDA (Compute Unified Device Architecture). We succeeded in developing parallel image processing techniques for 3D image reconstruction of X-ray micro-CT imaging. The results show that a GPU provides a computing efficiency of about 194 times faster than a single CPU, and the CPU-GPU clusters provides a computing efficiency of about 46 times faster than the CPU clusters. These meet the requirements of rapid 3D image reconstruction and real time image display. In conclusion, the use of CPU-GPU heterogeneous cluster is an effective way to build high-performance image processing platform. (authors)
Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection
International Nuclear Information System (INIS)
Camenzuli, Michelle; Terry, Jessica M.; Shalliker, R. Andrew; Conlan, Xavier A.; Barnett, Neil W.; Francis, Paul S.
2013-01-01
Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered
Parallel segmented outlet flow high performance liquid chromatography with multiplexed detection
Energy Technology Data Exchange (ETDEWEB)
Camenzuli, Michelle [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Terry, Jessica M. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Shalliker, R. Andrew, E-mail: r.shalliker@uws.edu.au [Australian Centre for Research on Separation Science (ACROSS), School of Science and Health, University of Western Sydney (Parramatta), Sydney, NSW (Australia); Conlan, Xavier A.; Barnett, Neil W. [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia); Francis, Paul S., E-mail: paul.francis@deakin.edu.au [Centre for Chemistry and Biotechnology, School of Life and Environmental Sciences, Deakin University, Geelong, Victoria 3216 (Australia)
2013-11-25
Graphical abstract: -- Highlights: •Multiplexed detection for liquid chromatography. •‘Parallel segmented outlet flow’ distributes inner and outer portions of the analyte zone. •Three detectors were used simultaneously for the determination of opiate alkaloids. -- Abstract: We describe a new approach to multiplex detection for HPLC, exploiting parallel segmented outlet flow – a new column technology that provides pressure-regulated control of eluate flow through multiple outlet channels, which minimises the additional dead volume associated with conventional post-column flow splitting. Using three detectors: one UV-absorbance and two chemiluminescence systems (tris(2,2′-bipyridine)ruthenium(III) and permanganate), we examine the relative responses for six opium poppy (Papaver somniferum) alkaloids under conventional and multiplexed conditions, where approximately 30% of the eluate was distributed to each detector and the remaining solution directed to a collection vessel. The parallel segmented outlet flow mode of operation offers advantages in terms of solvent consumption, waste generation, total analysis time and solute band volume when applying multiple detectors to HPLC, but the manner in which each detection system is influenced by changes in solute concentration and solution flow rates must be carefully considered.
Emergence of dynamic cooperativity in the stochastic kinetics of fluctuating enzymes
International Nuclear Information System (INIS)
Kumar, Ashutosh; Chatterjee, Sambarta; Nandi, Mintu; Dua, Arti
2016-01-01
Dynamic co-operativity in monomeric enzymes is characterized in terms of a non-Michaelis-Menten kinetic behaviour. The latter is believed to be associated with mechanisms that include multiple reaction pathways due to enzymatic conformational fluctuations. Recent advances in single-molecule fluorescence spectroscopy have provided new fundamental insights on the possible mechanisms underlying reactions catalyzed by fluctuating enzymes. Here, we present a bottom-up approach to understand enzyme turnover kinetics at physiologically relevant mesoscopic concentrations informed by mechanisms extracted from single-molecule stochastic trajectories. The stochastic approach, presented here, shows the emergence of dynamic co-operativity in terms of a slowing down of the Michaelis-Menten (MM) kinetics resulting in negative co-operativity. For fewer enzymes, dynamic co-operativity emerges due to the combined effects of enzymatic conformational fluctuations and molecular discreteness. The increase in the number of enzymes, however, suppresses the effect of enzymatic conformational fluctuations such that dynamic co-operativity emerges solely due to the discrete changes in the number of reacting species. These results confirm that the turnover kinetics of fluctuating enzyme based on the parallel-pathway MM mechanism switches over to the single-pathway MM mechanism with the increase in the number of enzymes. For large enzyme numbers, convergence to the exact MM equation occurs in the limit of very high substrate concentration as the stochastic kinetics approaches the deterministic behaviour.
Emergence of dynamic cooperativity in the stochastic kinetics of fluctuating enzymes
Energy Technology Data Exchange (ETDEWEB)
Kumar, Ashutosh; Chatterjee, Sambarta; Nandi, Mintu; Dua, Arti, E-mail: arti@iitm.ac.in [Department of Chemistry, Indian Institute of Technology, Madras, Chennai 600036 (India)
2016-08-28
Dynamic co-operativity in monomeric enzymes is characterized in terms of a non-Michaelis-Menten kinetic behaviour. The latter is believed to be associated with mechanisms that include multiple reaction pathways due to enzymatic conformational fluctuations. Recent advances in single-molecule fluorescence spectroscopy have provided new fundamental insights on the possible mechanisms underlying reactions catalyzed by fluctuating enzymes. Here, we present a bottom-up approach to understand enzyme turnover kinetics at physiologically relevant mesoscopic concentrations informed by mechanisms extracted from single-molecule stochastic trajectories. The stochastic approach, presented here, shows the emergence of dynamic co-operativity in terms of a slowing down of the Michaelis-Menten (MM) kinetics resulting in negative co-operativity. For fewer enzymes, dynamic co-operativity emerges due to the combined effects of enzymatic conformational fluctuations and molecular discreteness. The increase in the number of enzymes, however, suppresses the effect of enzymatic conformational fluctuations such that dynamic co-operativity emerges solely due to the discrete changes in the number of reacting species. These results confirm that the turnover kinetics of fluctuating enzyme based on the parallel-pathway MM mechanism switches over to the single-pathway MM mechanism with the increase in the number of enzymes. For large enzyme numbers, convergence to the exact MM equation occurs in the limit of very high substrate concentration as the stochastic kinetics approaches the deterministic behaviour.
Emergence of dynamic cooperativity in the stochastic kinetics of fluctuating enzymes
Kumar, Ashutosh; Chatterjee, Sambarta; Nandi, Mintu; Dua, Arti
2016-08-01
Dynamic co-operativity in monomeric enzymes is characterized in terms of a non-Michaelis-Menten kinetic behaviour. The latter is believed to be associated with mechanisms that include multiple reaction pathways due to enzymatic conformational fluctuations. Recent advances in single-molecule fluorescence spectroscopy have provided new fundamental insights on the possible mechanisms underlying reactions catalyzed by fluctuating enzymes. Here, we present a bottom-up approach to understand enzyme turnover kinetics at physiologically relevant mesoscopic concentrations informed by mechanisms extracted from single-molecule stochastic trajectories. The stochastic approach, presented here, shows the emergence of dynamic co-operativity in terms of a slowing down of the Michaelis-Menten (MM) kinetics resulting in negative co-operativity. For fewer enzymes, dynamic co-operativity emerges due to the combined effects of enzymatic conformational fluctuations and molecular discreteness. The increase in the number of enzymes, however, suppresses the effect of enzymatic conformational fluctuations such that dynamic co-operativity emerges solely due to the discrete changes in the number of reacting species. These results confirm that the turnover kinetics of fluctuating enzyme based on the parallel-pathway MM mechanism switches over to the single-pathway MM mechanism with the increase in the number of enzymes. For large enzyme numbers, convergence to the exact MM equation occurs in the limit of very high substrate concentration as the stochastic kinetics approaches the deterministic behaviour.
Performance of a plasma fluid code on the Intel parallel computers
International Nuclear Information System (INIS)
Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2. 12 refs
Parallel computing by Monte Carlo codes MVP/GMVP
International Nuclear Information System (INIS)
Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa
2001-01-01
General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve more speedups, we parallelized the codes on the different types of parallel computing platforms or by using a standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel paragon and a distributed-memory scalar-parallel computer Hitachi SR2201, IBM SP2. As mentioned generally, linear speedup could be obtained for large-scale problems but parallelization efficiency decreased as the batch size per a processing element(PE) was smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% by the PWR full-core calculation with more than 10 million histories and it took about 1.5 hours by massively parallel computing. (author)
Enzyme based soil stabilization for unpaved road construction
Directory of Open Access Journals (Sweden)
Renjith Rintu
2017-01-01
Full Text Available Enzymes as soil stabilizers have been successfully used in road construction in several countries for the past 30 years. However, research has shown that the successful application of these enzymes is case specific, emphasizing that enzyme performance is dependent on subgrade soil type, condition and the type of enzyme used as the stabilizer. A universal standard or a tool for road engineers to assess the performance of stabilized unbound pavements using well-established enzymes is not available to date. The research aims to produce a validated assessment tool which can be used to predict strength enhancement within a generalized statistical framework. The objective of the present study is to identify new materials for developing the assessment tool which supports enzyme based stabilization, as well as to identify the correct construction sequence for such new materials. A series of characterization tests were conducted on several soil types obtained from proposed construction sites. Having identified the suitable soil type to mix with the enzyme, a trial road construction has been performed to investigate the efficiency of the enzyme stabilization along with the correct construction sequence. The enzyme stabilization has showed significant improvement of the road performance as was evidenced from the test results which were based on site soil obtained before and after stabilization. The research will substantially benefit the road construction industry by not only replacing traditional construction methods with economical/reliable approaches, but also eliminating site specific tests required in current practice of enzyme based road construction.
Directory of Open Access Journals (Sweden)
Mustofa Mustofa
2017-03-01
Full Text Available This paper presents the performance of an unglazed polycrystalline photovoltaic-thermal PVT on 0.045 kg/s mass flow rate. PVT combine photovoltaic modules and solar thermal collectors, forming a single device that receive solar radiation and produces heat and electricity simultaneously. The collector figures out serpentine-parallel tubes that can prolong fluid heat conductivity from morning till afternoon. During testing, cell PV, inlet and outlet fluid temperaturs were recorded by thermocouple digital LM35 Arduino Mega 2560. Panel voltage and electric current were also noted in which they were connected to computer and presented each second data recorded. But, in this performance only shows in the certain significant time data. This because the electric current was only noted by multimeter device not the digital one. Based on these testing data, average cell efficieny was about 19%, while thermal efficiency of above 50% and correspondeng cell efficiency of 11%, respectively
Schoonen, Lise; Nolte, Roeland J. M.; van Hest, Jan C. M.
2016-07-01
The study of enzyme behavior in small nanocompartments is crucial for the understanding of biocatalytic processes in the cellular environment. We have developed an enzymatic conjugation strategy to attach a model enzyme to the interior of a cowpea chlorotic mottle virus capsid. It is shown that with this methodology high encapsulation efficiencies can be achieved. Additionally, we demonstrate that the encapsulation does not affect the enzyme performance in terms of a decreased activity or a hampered substrate diffusion. Finally, it is shown that the encapsulated enzymes are protected against proteases. We believe that our strategy can be used to study enzyme kinetics in an environment that approaches physiological conditions.The study of enzyme behavior in small nanocompartments is crucial for the understanding of biocatalytic processes in the cellular environment. We have developed an enzymatic conjugation strategy to attach a model enzyme to the interior of a cowpea chlorotic mottle virus capsid. It is shown that with this methodology high encapsulation efficiencies can be achieved. Additionally, we demonstrate that the encapsulation does not affect the enzyme performance in terms of a decreased activity or a hampered substrate diffusion. Finally, it is shown that the encapsulated enzymes are protected against proteases. We believe that our strategy can be used to study enzyme kinetics in an environment that approaches physiological conditions. Electronic supplementary information (ESI) available: Experimental procedures for the cloning, expression, and purification of all proteins, as well as supplementary figures and calculations. See DOI: 10.1039/c6nr04181g
SPINning parallel systems software
International Nuclear Information System (INIS)
Matlin, O.S.; Lusk, E.; McCune, W.
2002-01-01
We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin
Parallel Application Development Using Architecture View Driven Model Transformations
Arkin, E.; Tekinerdogan, B.
2015-01-01
o realize the increased need for computing performance the current trend is towards applying parallel computing in which the tasks are run in parallel on multiple nodes. On its turn we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity
Parallel Programming with Intel Parallel Studio XE
Blair-Chappell , Stephen
2012-01-01
Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the
Discrete Hadamard transformation algorithm's parallelism analysis and achievement
Hu, Hui
2009-07-01
With respect to Discrete Hadamard Transformation (DHT) wide application in real-time signal processing while limitation in operation speed of DSP. The article makes DHT parallel research and its parallel performance analysis. Based on multiprocessor platform-TMS320C80 programming structure, the research is carried out to achieve two kinds of parallel DHT algorithms. Several experiments demonstrated the effectiveness of the proposed algorithms.
Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code
International Nuclear Information System (INIS)
Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen
2013-01-01
Parallel programming with mixed mode of messages-passing and shared-memory has several advantages when used in Monte Carlo neutron transport code, such as fitting hardware of distributed-shared clusters, economizing memory demand of Monte Carlo transport, improving parallel performance, and so on. MPI/OpenMP hybrid parallelism was implemented based on a one dimension Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as contention access, lock contention and false sharing. After optimization the code was tested finally. It is shown that the hybrid parallel code can reach good performance just as pure MPI parallel program, while it saves a lot of memory usage at the same time. Therefore hybrid parallel is efficient for achieving large-scale parallel of Monte Carlo neutron transport. (authors)
Parallel Robot for Lower Limb Rehabilitation Exercises
Directory of Open Access Journals (Sweden)
Alireza Rastegarpanah
2016-01-01
Full Text Available The aim of this study is to investigate the capability of a 6-DoF parallel robot to perform various rehabilitation exercises. The foot trajectories of twenty healthy participants have been measured by a Vicon system during the performing of four different exercises. Based on the kinematics and dynamics of a parallel robot, a MATLAB program was developed in order to calculate the length of the actuators, the actuators’ forces, workspace, and singularity locus of the robot during the performing of the exercises. The calculated length of the actuators and the actuators’ forces were used by motion analysis in SolidWorks in order to simulate different foot trajectories by the CAD model of the robot. A physical parallel robot prototype was built in order to simulate and execute the foot trajectories of the participants. Kinect camera was used to track the motion of the leg’s model placed on the robot. The results demonstrate the robot’s capability to perform a full range of various rehabilitation exercises.
Directory of Open Access Journals (Sweden)
Alexander G Bulakhov
Full Text Available Penicillium verruculosum is an efficient producer of highly active cellulase multienzyme system. One of the approaches for enhancing cellulase performance in hydrolysis of cellulosic substrates is to enrich the reaction system with β -glucosidase and/or accessory enzymes, such as lytic polysaccharide monooxygenases (LPMO displaying a synergism with cellulases.Genes bglI, encoding β-glucosidase from Aspergillus niger (AnBGL, and eglIV, encoding LPMO (formerly endoglucanase IV from Trichoderma reesei (TrLPMO, were cloned and expressed by P. verruculosum B1-537 strain under the control of the inducible gla1 gene promoter. Content of the heterologous AnBGL in the secreted multienzyme cocktails (hBGL1, hBGL2 and hBGL3 varied from 4 to 10% of the total protein, while the content of TrLPMO in the hLPMO sample was ~3%. The glucose yields in 48-h hydrolysis of Avicel and milled aspen wood by the hBGL1, hBGL2 and hBGL3 preparations increased by up to 99 and 80%, respectively, relative to control enzyme preparations without the heterologous AnBGL (at protein loading 5 mg/g substrate for all enzyme samples. The heterologous TrLPMO in the hLPMO preparation boosted the conversion of the lignocellulosic substrate by 10-43%; however, in hydrolysis of Avicel the hLPMO sample was less effective than the control preparations. The highest product yield in hydrolysis of aspen wood was obtained when the hBGL2 and hLPMO preparations were used at the ratio 1:1.The enzyme preparations produced by recombinant P. verruculosum strains, expressing the heterologous AnBGL or TrLPMO under the control of the gla1 gene promoter in a starch-containing medium, proved to be more effective in hydrolysis of a lignocellulosic substrate than control enzyme preparations without the heterologous enzymes. The enzyme composition containing both AnBGL and TrLPMO demonstrated the highest performance in lignocellulose hydrolysis, providing a background for developing a fungal strain capable
Differences Between Distributed and Parallel Systems
Energy Technology Data Exchange (ETDEWEB)
Brightwell, R.; Maccabe, A.B.; Rissen, R.
1998-10-01
Distributed systems have been studied for twenty years and are now coming into wider use as fast networks and powerful workstations become more readily available. In many respects a massively parallel computer resembles a network of workstations and it is tempting to port a distributed operating system to such a machine. However, there are significant differences between these two environments and a parallel operating system is needed to get the best performance out of a massively parallel system. This report characterizes the differences between distributed systems, networks of workstations, and massively parallel systems and analyzes the impact of these differences on operating system design. In the second part of the report, we introduce Puma, an operating system specifically developed for massively parallel systems. We describe Puma portals, the basic building blocks for message passing paradigms implemented on top of Puma, and show how the differences observed in the first part of the report have influenced the design and implementation of Puma.
Reproductive performance of female goats fed life-enzyme ...
African Journals Online (AJOL)
Direct-fed-microbes (DFM) (life-enzyme) was prepared in a traditional setting using Zymomonas mobilis (bacteria from palm sap) to ferment sawdust. The result revealed an improvement in the nutrient content of the sawdust and its feed values (protein, fibre etc.), and the feed usage efficiency. The reproductive ...
Adapting algorithms to massively parallel hardware
Sioulas, Panagiotis
2016-01-01
In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.
Implementing Shared Memory Parallelism in MCBEND
Directory of Open Access Journals (Sweden)
Bird Adam
2017-01-01
Full Text Available MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers’s ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
Peformance Tuning and Evaluation of a Parallel Community Climate Model
Energy Technology Data Exchange (ETDEWEB)
Drake, J.B.; Worley, P.H.; Hammond, S.
1999-11-13
The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGIKray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.
Ackerman, S B; Kelley, E A
1983-03-01
The performance of a fiberoptic probe colorimeter (model PC800; Brinkmann Instruments, Inc., Westbury, N.Y.) for quantitating enzymatic or colorimetric assays in 96-well microtiter plates was compared with the performances of a spectrophotometer (model 240; Gilford Instrument Laboratories, Inc., Oberlin, Ohio) and a commercially available enzyme immunoassay reader (model MR590; Dynatech Laboratories, Inc., Alexandria, Va.). Alkaline phosphatase-p-nitrophenyl phosphate in 3 M NaOH was used as the chromophore source. Six types of plates were evaluated for use with the probe colorimeter; they generated reproducibility values (100% coefficient of variation) ranging from 91 to 98% when one individual made 24 independent measurements on the same dilution of chromophore on each plate. Eleven individuals each performed 24 measurements with the colorimeter on either a visually light (absorbance of 0.10 at 420 nm) or a dark (absorbance of 0.80 at 420 nm) dilution of chromophore; reproducibilities averaged 87% for the light dilution and 97% for the dark dilution. When one individual measured the same chromophore sample at least 20 times in the colorimeter, in the spectrophotometer or in the enzyme immunoassay reader, reproducibility for each instrument was greater than 99%. Measurements of a dilution series of chromophore in a fixed volume indicated that the optical responses of each instrument were linear in a range of 0.05 to 1.10 absorbance units.
Energy Technology Data Exchange (ETDEWEB)
Ullmann, P.
2007-06-15
The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission
A parallel approach to the stable marriage problem
DEFF Research Database (Denmark)
Larsen, Jesper
1997-01-01
This paper describes two parallel algorithms for the stable marriage problem implemented on a MIMD parallel computer. The algorithms are tested against sequential algorithms on randomly generated and worst-case instances. The results clearly show that the combination fo a very simple problem...... and a commercial MIMD system results in parallel algorithms which are not competitive with sequential algorithms wrt. practical performance. 1 Introduction In 1962 the Stable Marriage Problem was....
Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.
2018-05-01
As one of the non-conventional mathematics concepts, Parallel Coordinates is potential to be learned by pre-service mathematics teachers in order to give them experiences in constructing richer schemes and doing abstraction process. Unfortunately, the study related to this issue is still limited. This study wants to answer a research question “to what extent the abstraction process of pre-service mathematics teachers in learning concept of Parallel Coordinates could indicate their performance in learning Analytic Geometry”. This is a case study that part of a larger study in examining mathematical abstraction of pre-service mathematics teachers in learning non-conventional mathematics concept. Descriptive statistics method is used in this study to analyze the scores from three different tests: Cartesian Coordinate, Parallel Coordinates, and Analytic Geometry. The participants in this study consist of 45 pre-service mathematics teachers. The result shows that there is a linear association between the score on Cartesian Coordinate and Parallel Coordinates. There also found that the higher levels of the abstraction process in learning Parallel Coordinates are linearly associated with higher student achievement in Analytic Geometry. The result of this study shows that the concept of Parallel Coordinates has a significant role for pre-service mathematics teachers in learning Analytic Geometry.
Distance-two interpolation for parallel algebraic multigrid
International Nuclear Information System (INIS)
Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M
2007-01-01
In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability is compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers
An Expert System for the Development of Efficient Parallel Code
Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit
2004-01-01
We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.
PRISMA/DB: A Parallel Main-Memory Relational DBMS
Apers, Peter M.G.; Flokstra, Jan; van den Berg, Carel A.; Grefen, P.W.P.J.; Wilschut, A.N.; Kersten, Martin L.; van den Berg, C.A.
1992-01-01
PRISMA/DB, a full-fledged parallel, main memory relational database management system (DBMS) is described. PRISMA/DB's high performance is obtained by the use of parallelism for query processing and main memory storage of the entire database. A flexible architecture for experimenting with
Frame-Based and Subpicture-Based Parallelization Approaches of the HEVC Video Encoder
Directory of Open Access Journals (Sweden)
Héctor Migallón
2018-05-01
Full Text Available The most recent video coding standard, High Efficiency Video Coding (HEVC, is able to significantly improve the compression performance at the expense of a huge computational complexity increase with respect to its predecessor, H.264/AVC. Parallel versions of the HEVC encoder may help to reduce the overall encoding time in order to make it more suitable for practical applications. In this work, we study two parallelization strategies. One of them follows a coarse-grain approach, where parallelization is based on frames, and the other one follows a fine-grain approach, where parallelization is performed at subpicture level. Two different frame-based approaches have been developed. The first one only uses MPI and the second one is a hybrid MPI/OpenMP algorithm. An exhaustive experimental test was carried out to study the performance of both approaches in order to find out the best setup in terms of parallel efficiency and coding performance. Both frame-based and subpicture-based approaches are compared under the same hardware platform. Although subpicture-based schemes provide an excellent performance with high-resolution video sequences, scalability is limited by resolution, and the coding performance worsens by increasing the number of processes. Conversely, the proposed frame-based approaches provide the best results with respect to both parallel performance (increasing scalability and coding performance (not degrading the rate/distortion behavior.
Mohammadi Gheisar, M; Hosseindoust, A; Kim, I H
2016-06-01
This research was conducted to study the performance and carcass parameters of broiler chickens fed diets supplemented with heat-treated non-starch polysaccharide degrading enzyme. A total of 432 one-day old Ross 308 broiler chickens were allocated to five treatments: (i) CON (basal diet), (ii) E1: CON + 0.05% multi-enzyme, (iii) E2: CON + 0.1% multi-enzyme, (iv) E3: CON + 0.05% thermo-resistant multi-enzyme and (v) E4: CON + 0.1% thermo-resistant multi-enzyme, each treatment consisted of six replications and 12 chickens in each replication. The chickens were housed in three floor battery cages during 28-day experimental period. On days 1-7, gain in body weight (BWG) improved by feeding the diets supplemented with thermo-resistant multi-enzyme. On days 7-21 and 1-28, chickens fed the diets containing thermo-resistant multi-enzyme showed improved (p thermo-resistant multi-enzyme affected the percentage of drip loss on d 1 (p thermo-resistant multi-enzyme did not affect the relative weights of organs but compared to CON group, relative weight of breast muscle increased and abdominal fat decreased (p thermo-resistant multi-enzyme showed higher (p thermo-resistant multi-enzyme improved performance of broiler chickens. Journal of Animal Physiology and Animal Nutrition © 2015 Blackwell Verlag GmbH.
A parallel calibration utility for WRF-Hydro on high performance computers
Wang, J.; Wang, C.; Kotamarthi, V. R.
2017-12-01
A successful modeling of complex hydrological processes comprises establishing an integrated hydrological model which simulates the hydrological processes in each water regime, calibrates and validates the model performance based on observation data, and estimates the uncertainties from different sources especially those associated with parameters. Such a model system requires large computing resources and often have to be run on High Performance Computers (HPC). The recently developed WRF-Hydro modeling system provides a significant advancement in the capability to simulate regional water cycles more completely. The WRF-Hydro model has a large range of parameters such as those in the input table files — GENPARM.TBL, SOILPARM.TBL and CHANPARM.TBL — and several distributed scaling factors such as OVROUGHRTFAC. These parameters affect the behavior and outputs of the model and thus may need to be calibrated against the observations in order to obtain a good modeling performance. Having a parameter calibration tool specifically for automate calibration and uncertainty estimates of WRF-Hydro model can provide significant convenience for the modeling community. In this study, we developed a customized tool using the parallel version of the model-independent parameter estimation and uncertainty analysis tool, PEST, to enabled it to run on HPC with PBS and SLURM workload manager and job scheduler. We also developed a series of PEST input file templates that are specifically for WRF-Hydro model calibration and uncertainty analysis. Here we will present a flood case study occurred in April 2013 over Midwest. The sensitivity and uncertainties are analyzed using the customized PEST tool we developed.
Wu, Xingfu; Taylor, Valerie
2011-01-01
The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
Wu, Xingfu
2011-03-29
The NAS Parallel Benchmarks (NPB) are well-known applications with the fixed algorithms for evaluating parallel systems and tools. Multicore supercomputers provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node and MPI can be used with the communication between nodes. In this paper, we use SP and BT benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore supercomputers. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76%, and the hybrid BT outperforms the MPI BT by up to 8.58% on up to 10,000 cores on BlueGene/P at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. We also use performance tools and MPI trace libraries available on these supercomputers to further investigate the performance characteristics of the hybrid SP and BT.
Feed-forward volume rendering algorithm for moderately parallel MIMD machines
Yagel, Roni
1993-01-01
Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of others, processors transform their assigned slices with no communication, thus providing maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Also, coherency across slices can be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of the vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and improving performance.
Influence of diphenylhydantoin on lysosomal enzyme release during bone resorption in vitro
International Nuclear Information System (INIS)
Lerner, U.; Haenstroem, L.
1980-01-01
The effect of diphenylhydantoin (DPH) on the release of lysosomal enzymes during resorption of cultured mouse calvarial bone was studied. The enzyme activities of β-glucuronidase and β-galactosidase in the culture medium was taken as indicators for lysosomal enzyme release. In concentrations 50 μg/ml or higher, DPH inhibited the release of β-glucuronidase and β-galactosidase in parallel with bone resorption as indicated by reduced release of 4 Ca, Ca 2 , Psub(i) and hydroxyproline. The release of the cytosolic enzyme lactate dehydrogenase was not influenced by concentrations of DPH up to 50 μg/ml but higher concentrations caused an increased release indicating cell injury. When bone resorption was stimulated by prostaglandin E 2 , DPH(50 μg/ml) also reduced the mobilization of bone mineral and the release of β- glucuronidase without influencing the release of lactate dehydrogenase. It is suggested that DPH by interfering with cellular release processes reduces the resorption on bone. (author)
A task parallel implementation of fast multipole methods
Taura, Kenjiro
2012-11-01
This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using a lightweight task parallel library MassiveThreads. Although there have been many attempts on parallelizing FMM, experiences have almost exclusively been limited to formulation based on flat homogeneous parallel loops. FMM in fact contains operations that cannot be readily expressed in such conventional but restrictive models. We show that task parallelism, or parallel recursions in particular, allows us to parallelize all operations of FMM naturally and scalably. Moreover it allows us to parallelize a \\'\\'mutual interaction\\'\\' for force/potential evaluation, which is roughly twice as efficient as a more conventional, unidirectional force/potential evaluation. The net result is an open source FMM that is clearly among the fastest single node implementations, including those on GPUs; with a million particles on a 32 cores Sandy Bridge 2.20GHz node, it completes a single time step including tree construction and force/potential evaluation in 65 milliseconds. The study clearly showcases both programmability and performance benefits of flexible parallel constructs over more monolithic parallel loops. © 2012 IEEE.
Parallel-Architecture Simulator Development Using Hardware Transactional Memory
Armejach Sanosa, Adrià
2009-01-01
To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. However, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...
Argentina's experience with parallel exchange markets: 1981-1990
Steven B. Kamin
1991-01-01
This paper surveys the development and operation of the parallel exchange market in Argentina during the 1980s, and evaluates its impact upon macroeconomic performance and policy. The historical evolution of Argentina's exchange market policies is reviewed in order to understand the government's motives for imposing exchange controls. The parallel exchange market engendered by these controls is then analyzed, and econometric methods are used to evaluate the behavior of the parallel exchange r...
Parallel Object-Oriented Computation Applied to a Finite Element Problem
Directory of Open Access Journals (Sweden)
Jon B. Weissman
1993-01-01
Full Text Available The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application. Mentat, the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.
International Nuclear Information System (INIS)
Mitala, J.J.; Michael, A.C.
2006-01-01
Microsensors based on carbon fiber microelectrodes coated with enzyme-entrapping redox hydrogels facilitate the in vivo detection of substances of interest within the central nervous system, including hydrogen peroxide, glucose, choline and glutamate. The hydrogel, formed by cross-linking a redox polymer, entraps the enzymes and mediates electron transfer between the enzymes and the electrode. It is important that the enzymes are entrapped in their enzymatically active state. Should entrapment cause enzyme denaturation, the sensitivity and the selectivity of the sensor may be compromised. Synthesis of the redox polymer according to published procedures may yield a product that precipitates when added to aqueous enzyme solutions. Casting hydrogels from solutions that contain the precipitate produces microsensors with low sensitivity and selectivity, suggesting that the precipitation disrupts the structure of the enzymes. Herein, we show that a surfactant, sodium dodecyl sulfate (SDS), can prevent the precipitation and improve the sensitivity and selectivity of the sensors
Angular parallelization of a curvilinear Sn transport theory method
International Nuclear Information System (INIS)
Haghighat, A.
1991-01-01
In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical S n transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the S s algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ∼35 and 50% are achieved in situations where the relative difference of parallel to serial iterations is ∼50%. High parallel efficiencies between ∼60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%
Parallel evolutionary computation in bioinformatics applications.
Pinho, Jorge; Sobral, João Luis; Rocha, Miguel
2013-05-01
A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
Transmission Index Research of Parallel Manipulators Based on Matrix Orthogonal Degree
Shao, Zhu-Feng; Mo, Jiao; Tang, Xiao-Qiang; Wang, Li-Ping
2017-11-01
Performance index is the standard of performance evaluation, and is the foundation of both performance analysis and optimal design for the parallel manipulator. Seeking the suitable kinematic indices is always an important and challenging issue for the parallel manipulator. So far, there are extensive studies in this field, but few existing indices can meet all the requirements, such as simple, intuitive, and universal. To solve this problem, the matrix orthogonal degree is adopted, and generalized transmission indices that can evaluate motion/force transmissibility of fully parallel manipulators are proposed. Transmission performance analysis of typical branches, end effectors, and parallel manipulators is given to illustrate proposed indices and analysis methodology. Simulation and analysis results reveal that proposed transmission indices possess significant advantages, such as normalized finite (ranging from 0 to 1), dimensionally homogeneous, frame-free, intuitive and easy to calculate. Besides, proposed indices well indicate the good transmission region and relativity to the singularity with better resolution than the traditional local conditioning index, and provide a novel tool for kinematic analysis and optimal design of fully parallel manipulators.
Arriola, Kathy G; Oliveira, Andre S; Ma, Zhengxin X; Lean, Ian J; Giurcanu, Mihai C; Adesogan, Adegbola T
2017-06-01
The aim of this study was to use meta-analytical methods to estimate effects of adding exogenous fibrolytic enzymes (EFE) to dairy cow diets on their performance and to determine which factors affect the response. Fifteen studies with 17 experiments and 36 observations met the study selection criteria for inclusion in the meta-analysis. The effects were compared by using random-effect models to examine the raw mean difference (RMD) and standardized mean difference between EFE and control treatments after both were weighted with the inverse of the study variances. Heterogeneity sources evaluated by meta-regression included experimental duration, EFE type and application rate, form (liquid or solid), and method (application to the forage, concentrate, or total mixed ration). Only the cellulase-xylanase (C-X) enzymes had a substantial number of observations (n = 13 studies). Application of EFE, overall, did not affect dry matter intake, feed efficiency but tended to increase total-tract dry matter digestibility and neutral detergent fiber digestibility (NDFD) by relatively small amounts (1.36 and 2.30%, respectively, or 50%) was detected for total-tract dry matter digestibility and NDFD. Milk production responses were higher for the C-X enzymes (RMD = 1.04 kg/d; 95% confidence interval: 0.33 to 1.74), but were still only moderate, about 0.35 standardized mean difference. A 24% numerical increase in the RMD resulting from examining only C-X enzymes instead of all enzymes (RMD = 1.04 vs. 0.83 kg/d) suggests that had more studies met the inclusion criteria, the C-X enzymes would have statistically increased the milk response relative to that for all enzymes. Increasing the EFE application rate had no effect on performance measures. Application of EFE to the total mixed ration improved only milk protein concentration, and application to the forage or concentrate had no effect. Applying EFE tended to increase dry matter digestibility and NDFD and increased milk yield by
K.I.S.S. Parallel Coding (lecture 2)
CERN. Geneva
2018-01-01
K.I.S.S.ing parallel computing means, finally, loving it. Parallel computing will be approached in a theoretical and experimental way, using the most advanced and used C API: OpenMP. OpenMP is an open source project constantly developed and updated to hide the awful complexity of parallel coding in an awesome interface. The result is a tool which leaves plenty of space for clever solutions and terrific results in terms of efficiency and performance maximisation.
Parallel Tensor Compression for Large-Scale Scientific Data.
Energy Technology Data Exchange (ETDEWEB)
Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)
2015-10-01
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
Electro-ultrafiltration of industrial enzyme solutions
DEFF Research Database (Denmark)
Enevoldsen, Ann Dorrit; Hansen, Erik Børresen; Jonsson, Gunnar Eigil
2007-01-01
To reduce the problems with fouling and concentration polarization during crossflow ultrafiltration of industrial enzyme solutions an electric field is applied across the membrane. The filtration performance during electro-ultrafiltration (EUF) has been tested with several enzymes. Results show...
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
Parallel optoelectronic trinary signed-digit division
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Substrate mediated enzyme prodrug therapy
DEFF Research Database (Denmark)
Fejerskov, Betina; Jarlstad Olesen, Morten T; Zelikin, Alexander N
2017-01-01
Substrate mediated enzyme prodrug therapy (SMEPT) is a biomedical platform developed to perform a localized synthesis of drugs mediated by implantable biomaterials. This approach combines the benefits and at the same time offers to overcome the drawbacks for traditional pill-based drug administra......Substrate mediated enzyme prodrug therapy (SMEPT) is a biomedical platform developed to perform a localized synthesis of drugs mediated by implantable biomaterials. This approach combines the benefits and at the same time offers to overcome the drawbacks for traditional pill-based drug...
Parallel PDE-Based Simulations Using the Common Component Architecture
International Nuclear Information System (INIS)
McInnes, Lois C.; Allan, Benjamin A.; Armstrong, Robert; Benson, Steven J.; Bernholdt, David E.; Dahlgren, Tamara L.; Diachin, Lori; Krishnan, Manoj Kumar; Kohl, James A.; Larson, J. Walter; Lefantzi, Sophia; Nieplocha, Jarek; Norris, Boyana; Parker, Steven G.; Ray, Jaideep; Zhou, Shujia
2006-01-01
The complexity of parallel PDE-based simulations continues to increase as multimodel, multiphysics, and multi-institutional projects become widespread. A goal of component based software engineering in such large-scale simulations is to help manage this complexity by enabling better interoperability among various codes that have been independently developed by different groups. The Common Component Architecture (CCA) Forum is defining a component architecture specification to address the challenges of high-performance scientific computing. In addition, several execution frameworks, supporting infrastructure, and general purpose components are being developed. Furthermore, this group is collaborating with others in the high-performance computing community to design suites of domain-specific component interface specifications and underlying implementations. This chapter discusses recent work on leveraging these CCA efforts in parallel PDE-based simulations involving accelerator design, climate modeling, combustion, and accidental fires and explosions. We explain how component technology helps to address the different challenges posed by each of these applications, and we highlight how component interfaces built on existing parallel toolkits facilitate the reuse of software for parallel mesh manipulation, discretization, linear algebra, integration, optimization, and parallel data redistribution. We also present performance data to demonstrate the suitability of this approach, and we discuss strategies for applying component technologies to both new and existing applications
Scalability of Parallel Scientific Applications on the Cloud
Directory of Open Access Journals (Sweden)
Satish Narayana Srirama
2011-01-01
Full Text Available Cloud computing, with its promise of virtually infinite resources, seems to suit well in solving resource greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications like matrix–vector operations and NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids on the cloud. DOUG is an open source software package for parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit a lot and scale reasonable on the cloud. We could also observe the limitations of the cloud and its comparison with cluster in terms of performance. However, for efficiently running the scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms are reduced to the MapReduce model and their performance is measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits well for embarrassingly parallel algorithms. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
An Algorithm for Parallel Sn Sweeps on Unstructured Meshes
International Nuclear Information System (INIS)
Pautz, Shawn D.
2002-01-01
A new algorithm for performing parallel S n sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result, since although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in the parallel efficiency is observed with modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel S n methods
Design, analysis and control of cable-suspended parallel robots and its applications
Zi, Bin
2017-01-01
This book provides an essential overview of the authors’ work in the field of cable-suspended parallel robots, focusing on innovative design, mechanics, control, development and applications. It presents and analyzes several typical mechanical architectures of cable-suspended parallel robots in practical applications, including the feed cable-suspended structure for super antennae, hybrid-driven-based cable-suspended parallel robots, and cooperative cable parallel manipulators for multiple mobile cranes. It also addresses the fundamental mechanics of cable-suspended parallel robots on the basis of their typical applications, including the kinematics, dynamics and trajectory tracking control of the feed cable-suspended structure for super antennae. In addition it proposes a novel hybrid-driven-based cable-suspended parallel robot that uses integrated mechanism design methods to improve the performance of traditional cable-suspended parallel robots. A comparative study on error and performance indices of hybr...
Integrated parallel reception, excitation, and shimming (iPRES).
Han, Hui; Song, Allen W; Truong, Trong-Kha
2013-07-01
To develop a new concept for a hardware platform that enables integrated parallel reception, excitation, and shimming. This concept uses a single coil array rather than separate arrays for parallel excitation/reception and B0 shimming. It relies on a novel design that allows a radiofrequency current (for excitation/reception) and a direct current (for B0 shimming) to coexist independently in the same coil. Proof-of-concept B0 shimming experiments were performed with a two-coil array in a phantom, whereas B0 shimming simulations were performed with a 48-coil array in the human brain. Our experiments show that individually optimized direct currents applied in each coil can reduce the B0 root-mean-square error by 62-81% and minimize distortions in echo-planar images. The simulations show that dynamic shimming with the 48-coil integrated parallel reception, excitation, and shimming array can reduce the B0 root-mean-square error in the prefrontal and temporal regions by 66-79% as compared with static second-order spherical harmonic shimming and by 12-23% as compared with dynamic shimming with a 48-coil conventional shim array. Our results demonstrate the feasibility of the integrated parallel reception, excitation, and shimming concept to perform parallel excitation/reception and B0 shimming with a unified coil system as well as its promise for in vivo applications. Copyright © 2013 Wiley Periodicals, Inc.
International Nuclear Information System (INIS)
Misawa, Takeharu; Yoshida, Hiroyuki; Akimoto, Hajime
2008-01-01
In Japan Atomic Energy Agency (JAEA), the Innovative Water Reactor for Flexible Fuel Cycle (FLWR) has been developed. For thermal design of FLWR, it is necessary to develop analytical method to predict boiling transition of FLWR. Japan Atomic Energy Agency (JAEA) has been developing three-dimensional two-fluid model analysis code ACE-3D, which adopts boundary fitted coordinate system to simulate complex shape channel flow. In this paper, as a part of development of ACE-3D to apply to rod bundle analysis, introduction of parallelization to ACE-3D and assessments of ACE-3D are shown. In analysis of large-scale domain such as a rod bundle, even two-fluid model requires large number of computational cost, which exceeds upper limit of memory amount of 1 CPU. Therefore, parallelization was introduced to ACE-3D to divide data amount for analysis of large-scale domain among large number of CPUs, and it is confirmed that analysis of large-scale domain such as a rod bundle can be performed by parallel computation with keeping parallel computation performance even using large number of CPUs. ACE-3D adopts two-phase flow models, some of which are dependent upon channel geometry. Therefore, analyses in the domains, which simulate individual subchannel and 37 rod bundle, are performed, and compared with experiments. It is confirmed that the results obtained by both analyses using ACE-3D show agreement with past experimental result qualitatively. (author)
DEFF Research Database (Denmark)
Zhang, Chi; Guerrero, Josep M.; Vasquez, Juan Carlos
2015-01-01
In this paper, a control strategy for the parallel operation of three-phase inverters forming an online uninterruptible power system (UPS) is presented. The UPS system consists of a cluster of paralleled inverters with LC filters directly connected to an AC critical bus and an AC/DC forming a DC...... bus. The proposed control scheme is performed on two layers: (i) a local layer that contains a “reactive power vs phase” in order to synchronize the phase angle of each inverter and a virtual resistance loop that guarantees equal power sharing among inverters; (ii) a central controller that guarantees...... synchronization with an external real/fictitious utility, and critical bus voltage restoration. Constant transient and steady-state frequency, active, reactive and harmonic power sharing, and global phase-locked loop resynchronization capability are achieved. Detailed system topology and control architecture...
Directory of Open Access Journals (Sweden)
Kalantar M
2016-06-01
Full Text Available The effects of grain and carbohydrase enzyme supplementation were investigated on digestive physiology of chickens. A total of 625 one-day-old chicks (Ross 308 were randomly assigned to five treatments in a completely randomized design. Treatments included two different types of grains (wheat, and barley with or without a multi-carbohydrase supplement. A corn-based diet was also considered to serve as a control. Feeding barley-based diet with multi-carbohydrase led to higher feed intake (P < 0.01 than those fed corn- and wheat-based diets. Birds fed on barley and wheat diets had lower weight gain despite a higher feed conversion ratio (P < 0.01. Total count and number of different type of bacteria including Gram-negative, E. coli, and Clostridia increased after feeding wheat and barley but the number of Lactobacilli and Bifidobacteria decreased (P < 0.01. Feeding barley and wheat diets reduced villus height in different parts of the small intestine when compared to those fed on a corn diet. However, enzyme supplementation of barley and wheat diets improved weight gain and feed conversion ratio and resulted in reduced number of E. coli and Clostridia and increased number of Lactobacilli and Bifidobacteria, and also restored the negative effects on intestinal villi height (P < 0.01. The activities of pancreatic α-amylase and lipase were (P < 0.01 increased in chickens fed wheat and barley diets when compared to the control fed on a corn diet. Enzyme supplementation reduced the activities of pancreatic α-amylase and lipase (P < 0.01. In conclusion, various dietary non-starch polysaccharides without enzyme supplementation have an adverse effect on digesta viscosity, ileal microflora, villi morphology, and pancreatic enzyme activity.
International Nuclear Information System (INIS)
Laurent, C.; Chassery, J.M.; Peyrin, F.; Girerd, C.
1996-01-01
This paper deals with the parallel implementations of reconstruction methods in 3D tomography. 3D tomography requires voluminous data and long computation times. Parallel computing, on MIMD computers, seems to be a good approach to manage this problem. In this study, we present the different steps of the parallelization on an abstract parallel computer. Depending on the method, we use two main approaches to parallelize the algorithms: the local approach and the global approach. Experimental results on MIMD computers are presented. Two 3D images reconstructed from realistic data are showed
Chen, Dong; Wang, Wei; Ru, Shaoguo
2015-01-01
An 8-week feeding experiment was performed to evaluate the effect of dietary genistein on growth performance, body composition, and digestive enzymes activity of juvenile Nile tilapia ( Oreochromis niloticus). Four isonitrogenous and isoenergetic diets were formulated containing four graded supplements of genistein: 0, 30, 300, and 3 000 μg/g. Each diet was randomly assigned in triplicate to tanks stocked with 15 juvenile tilapia (10.47±1.24 g). The results show that 30 and 300 μg/g dietary genistein had no significant effect on growth performance of Nile tilapia, but the higher level of genistein (3 000 μg/g) significantly depressed the final body weight and specific growth rate. There was no significant difference in survival rate, feed intake, feed efficiency ratio or whole body composition among all dietary treatments. An assay of digestive enzymes showed that the diet containing 3 000 μg/ggenistein decreased stomach and hepatopancreas protease activity, and amylase activity in the liver and intestine, while a dietary level of 300 μg/g genistein depressed stomach protease and intestine amylase activities. However, no significant difference in stomach amylase activity was found among dietary treatments. Overall, the results of the present study indicate that a high level of dietary genistein (3 000 μg/g, or above) would significantly reduce the growth of Nile tilapia, partly because of its inhibitory effect on the activity of major digestive enzymes. Accordingly, the detrimental effects of genistein, as found in soybean products, should not be ignored when applied as an alternative ingredient source in aquaculture.
A two-level parallel direct search implementation for arbitrarily sized objective functions
Energy Technology Data Exchange (ETDEWEB)
Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others
1994-12-31
In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p{sub 2}) of processors. If, p, the number of processors available, is greater than or equal to 2p{sub 2} then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for a nCUBE 2 implementation.
Narayanan, Kiran
2012-07-17
A hybrid parallelization method composed of a coarse-grained genetic algorithm (GA) and fine-grained objective function evaluations is implemented on a heterogeneous computational resource consisting of 16 IBM Blue Gene/P racks, a single x86 cluster node and a high-performance file system. The GA iterator is coupled with a finite-element (FE) analysis code developed in house to facilitate computational steering in order to calculate the optimal impact velocities of a projectile colliding with a polyurea/structural steel composite plate. The FE code is capable of capturing adiabatic shear bands and strain localization, which are typically observed in high-velocity impact applications, and it includes several constitutive models of plasticity, viscoelasticity and viscoplasticity for metals and soft materials, which allow simulation of ductile fracture by void growth. A strong scaling study of the FE code was conducted to determine the optimum number of processes run in parallel. The relative efficiency of the hybrid, multi-level parallelization method is studied in order to determine the parameters for the parallelization. Optimal impact velocities of the projectile calculated using the proposed approach, are reported. © The Author(s) 2012.
Crockett, Thomas W.
1995-01-01
This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.
Parallel computing in plasma physics: Nonlinear instabilities
International Nuclear Information System (INIS)
Pohn, E.; Kamelander, G.; Shoucri, M.
2000-01-01
A Vlasov-Poisson-system is used for studying the time evolution of the charge-separation at a spatial one- as well as a two-dimensional plasma-edge. Ions are advanced in time using the Vlasov-equation. The whole three-dimensional velocity-space is considered leading to very time-consuming four-resp. five-dimensional fully kinetic simulations. In the 1D simulations electrons are assumed to behave adiabatic, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson-equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed at an equidistant grid. A constant time-step is used for advancing the density-distribution function in time. The time-evolution of the distribution function is performed using a splitting scheme. Each dimension (x, y, υ x , υ y , υ z ) of the phase-space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an - in general - interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase-space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson-equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in y-direction, a proper parallelization strategy is to split the grid into p disjoint υ z -slices, which are sub-grids, each containing a different 1/p-th part of the υ z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between
DEFF Research Database (Denmark)
Baum, Andreas
the understanding of the structural properties of the extracted pectin. Secondly, enzyme kinetics of biomass converting enzymes was examined in terms of measuring enzyme activity by spectral evolution profiling utilizing FTIR. Chemometric multiway methods were used to analyze the tensor datasets enabling the second......-order calibration advantage (reference Theory of Analytical chemistry). As PAPER 3 illustrates the method is universally applicable without the need of any external standards and was exemplified by performing quantitative enzyme activity determinations for glucose oxidase, pectin lyase and a cellolytic enzyme blend...... (Celluclast 1.5L). In PAPER 4, the concept is extended to quantify enzyme activity of two simultaneously acting enzymes, namely pectin lyase and pectin methyl esterase. By doing so the multiway methods PARAFAC, TUCKER3 and NPLS were compared and evaluated towards accuracy and precision....
Zhang, Xue-Jian; Wang, Xiao-Wei; Sun, Jiaxing; Su, Chao; Yang, Shuguang; Zhang, Wen-Bin
2018-05-16
Protein immobilization is critical to utilize their unique functions in diverse applications. Herein, we report that orthogonal peptide-protein chemistry enabled multilayer construction can facilitate the incorporation of various folded structural domains, including calmodulin in different states, affibody and dihydrofolate reductase (DHFR). An extended conformation is found to be the most advantageous for steady film growth. The resulting protein thin films exhibit sensitive and selective responsive behaviors to bio-signals (Ca2+, TFP, NADPH, etc.) and fully maintain the catalytic activity of DHFR. The approach is applicable to different substrates such as hydrophobic gold and hydrophilic silica microparticles. The DHFR enzyme can be immobilized onto silica microparticles with tunable amounts. The multi-layer set-up exhibits a synergistic enhancement of DHFR activity with increasing number of bilayers and also makes the embedded DHFR more resilient to lyophilization. Therefore, this is a convenient and versatile method for protein immobilization with potential benefits of synergistic enhancement in enzyme performance and resilience.
Parallelization of applications for networks with homogeneous and heterogeneous processors
International Nuclear Information System (INIS)
Colombet, L.
1994-01-01
The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs
Efficient parallel implicit methods for rotary-wing aerodynamics calculations
Wissink, Andrew M.
Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR). The Newton method demonstrates good
Energy Technology Data Exchange (ETDEWEB)
Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou
1996-03-01
The parallelization of Electro-Magnetic Cascade Monte Carlo Simulation Code, EGS4 on distributed memory scalar parallel computer: Intel Paragon XP/S15-256 is described. EGS4 has the feature that calculation time for one incident particle is quite different from each other because of the dynamic generation of secondary particles and different behavior of each particle. Granularity for parallel processing, parallel programming model and the algorithm of parallel random number generation are discussed and two kinds of method, each of which allocates particles dynamically or statically, are used for the purpose of realizing high speed parallel processing of this code. Among four problems chosen for performance evaluation, the speedup factors for three problems have been attained to nearly 100 times with 128 processor. It has been found that when both the calculation time for each incident particles and its dispersion are large, it is preferable to use dynamic particle allocation method which can average the load for each processor. And it has also been found that when they are small, it is preferable to use static particle allocation method which reduces the communication overhead. Moreover, it is pointed out that to get the result accurately, it is necessary to use double precision variables in EGS4 code. Finally, the workflow of program parallelization is analyzed and tools for program parallelization through the experience of the EGS4 parallelization are discussed. (author).
1982-01-01
Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn
Plasma Physics Calculations on a Parallel Macintosh Cluster
Decyk, Viktor; Dauger, Dean; Kokelaar, Pieter
2000-03-01
We have constructed a parallel cluster consisting of 16 Apple Macintosh G3 computers running the MacOS, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. This is fast enough that 3D calculations can be routinely done. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Full details are available on our web site: http://exodus.physics.ucla.edu/appleseed/.
Samaratunga, Ashani; Kudina, Olena; Nahar, Nurun; Zakharchenko, Andrey; Minko, Sergiy; Voronov, Andriy; Pryor, Scott W
2015-03-01
Cellulase and β-glucosidase were adsorbed on a polyacrylic acid polymer brush grafted on silica nanoparticles to produce enzymogels as a form of enzyme immobilization. Enzyme loading on the enzymogels was increased to a saturation level of approximately 110 μg (protein) mg(-1) (particle) for each enzyme. Enzymogels with varied enzyme loadings were then used to determine the impact on hydrolysis rate and enzyme recovery. Soluble sugar concentrations during the hydrolysis of filter paper and Solka-Floc with the enzymogels were 45 and 53%, respectively, of concentrations when using free cellulase. β-Glucosidase enzymogels showed lower performance; hydrolyzate glucose concentrations were just 38% of those using free enzymes. Increasing enzyme loading on the enzymogels did not reduce net efficacy for cellulase and improved efficacy for β-glucosidase. The use of free cellulases and cellulase enzymogels resulted in hydrolyzates with different proportions of cellobiose and glucose, suggesting differential attachment or efficacy of endoglucanases, exoglucanases, and β-glucosidases present in cellulase mixtures. When loading β-glucosidase individually, higher enzyme loadings on the enzymogels produced higher hydrolyzate glucose concentrations. Approximately 96% of cellulase and 66 % of β-glucosidase were recovered on the enzymogels, while enzyme loading level did not impact recovery for either enzyme.
Parallel processing of neutron transport in fuel assembly calculation
International Nuclear Information System (INIS)
Song, Jae Seung
1992-02-01
Group constants, which are used for reactor analyses by nodal method, are generated by fuel assembly calculations based on the neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes from the serial program was enough to parallelize the depletion calculation which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple and easy because of the many coupling parameters in the calculation and data communications among CPU's. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on Quadputer with 4 CPU's operating in NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which lead to the overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on the parallel computer with small size CPU's
Akl, Selim G
1985-01-01
Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
Parallel Breadth-First Search on Distributed Memory Systems
Energy Technology Data Exchange (ETDEWEB)
Computational Research Division; Buluc, Aydin; Madduri, Kamesh
2011-04-15
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned par- allel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix- partitioning-based approach that mitigates parallel commu- nication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execu- tion regimes in which these approaches will be competitive, and we demonstrate extremely high performance on lead- ing distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny- Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
Parallel processing of Monte Carlo code MCNP for particle transport problem
Energy Technology Data Exchange (ETDEWEB)
Higuchi, Kenji; Kawasaki, Takuji
1996-06-01
It is possible to vectorize or parallelize Monte Carlo codes (MC code) for photon and neutron transport problem, making use of independency of the calculation for each particle. Applicability of existing MC code to parallel processing is mentioned. As for parallel computer, we have used both vector-parallel processor and scalar-parallel processor in performance evaluation. We have made (i) vector-parallel processing of MCNP code on Monte Carlo machine Monte-4 with four vector processors, (ii) parallel processing on Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for parallel computers used in the present work as a part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)
EFFECTS OF EXOGENOUS ENZYMES ON NUTRIENTS DIGESTIBILITY AND GROWTH PERFORMANCE IN SHEEP AND GOATS
Directory of Open Access Journals (Sweden)
Abdel-Fattah Z.M. Salem
2011-07-01
Full Text Available Six crossbred sheep (32.00±0.603 kg BW and 6 Baladi goats (18.00±0.703 kg BW were used in 2×2 factorial design to evaluate the effect of exogenous enzymes of ZADO® (i.e., ENZ and on digestibility and growth performance. Animals were fed on wheat straw ad libitum and restricted amount of commercial concentrate with (+ENZ or without (-ENZ 10 g/animal/day of ZADO to cover 120% of their maintenance requirements. Nutrients digestibilities were increased (P
Test generation for digital circuits using parallel processing
Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.
1990-12-01
The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.
Development of Industrial High-Speed Transfer Parallel Robot
International Nuclear Information System (INIS)
Kim, Byung In; Kyung, Jin Ho; Do, Hyun Min; Jo, Sang Hyun
2013-01-01
Parallel robots used in industry require high stiffness or high speed because of their structural characteristics. Nowadays, the importance of rapid transportation has increased in the distribution industry. In this light, an industrial parallel robot has been developed for high-speed transfer. The developed parallel robot can handle a maximum payload of 3 kg. For a payload of 0.1 kg, the trajectory cycle time is 0.3 s (come and go), and the maximum velocity is 4.5 m/s (pick amp, place work, adept cycle). In this motion, its maximum acceleration is very high and reaches approximately 13g. In this paper, the design, analysis, and performance test results of the developed parallel robot system are introduced
Parallel Scaling Characteristics of Selected NERSC User ProjectCodes
Energy Technology Data Exchange (ETDEWEB)
Skinner, David; Verdier, Francesca; Anand, Harsh; Carter,Jonathan; Durst, Mark; Gerber, Richard
2005-03-05
This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.
Parallel k-means++ for Multiple Shared-Memory Architectures
Energy Technology Data Exchange (ETDEWEB)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
Parallel adaptation of a vectorised quantumchemical program system
International Nuclear Information System (INIS)
Van Corler, L.C.H.; Van Lenthe, J.H.
1987-01-01
Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clockcycle times more than a factor 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallellism on the host computer (IBM 4381) and real parallellism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of that many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallellise the programs are shown. Also, some examples are presented to illustrate effectiveness and performance of the system in Rome for these type of calculations
Multilevel Parallelization of AutoDock 4.2
Directory of Open Access Journals (Sweden)
Norgan Andrew P
2011-04-01
Full Text Available Abstract Background Virtual (computational screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4. Results Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Conclusions Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI and node-level (OpenMP parallelization to best fit both workloads and computational resources.
Multilevel Parallelization of AutoDock 4.2.
Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P
2011-04-28
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
Computationally efficient implementation of combustion chemistry in parallel PDF calculations
International Nuclear Information System (INIS)
Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.
2009-01-01
In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f m pi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
Java parallel secure stream for grid computing
International Nuclear Information System (INIS)
Chen, J.; Akers, W.; Chen, Y.; Watson, W.
2001-01-01
The emergence of high speed wide area networks makes grid computing a reality. However grid applications that need reliable data transfer still have difficulties to achieve optimal TCP performance due to network tuning of TCP window size to improve the bandwidth and to reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a gird environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel stream is more effective than tuning TCP window size. In addition X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package will be discussed
Abstract Level Parallelization of Finite Difference Methods
Directory of Open Access Journals (Sweden)
Edwin Vollebregt
1997-01-01
Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”. The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automaticcode generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing. This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
Block-Parallel Data Analysis with DIY2
Energy Technology Data Exchange (ETDEWEB)
Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)
2017-08-30
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
Advanced optical signal processing of broadband parallel data signals
DEFF Research Database (Denmark)
Oxenløwe, Leif Katsuo; Hu, Hao; Kjøller, Niels-Kristian
2016-01-01
Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration.......Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration....
Parallel science and engineering applications the Charm++ approach
Kale, Laxmikant V
2016-01-01
Developed in the context of science and engineering applications, with each abstraction motivated by and further honed by specific application needs, Charm++ is a production-quality system that runs on almost all parallel computers available. Parallel Science and Engineering Applications: The Charm++ Approach surveys a diverse and scalable collection of science and engineering applications, most of which are used regularly on supercomputers by scientists to further their research. After a brief introduction to Charm++, the book presents several parallel CSE codes written in the Charm++ model, along with their underlying scientific and numerical formulations, explaining their parallelization strategies and parallel performance. These chapters demonstrate the versatility of Charm++ and its utility for a wide variety of applications, including molecular dynamics, cosmology, quantum chemistry, fracture simulations, agent-based simulations, and weather modeling. The book is intended for a wide audience of people i...
Survey on present status and trend of parallel programming environments
International Nuclear Information System (INIS)
Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.
1997-03-01
This report intends to provide useful information on software tools for parallel programming through the survey on parallel programming environments of the following six parallel computers, Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at Japan Atomic Energy Research Institute (JAERI), moreover, the present status of R and D's on parallel softwares of parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools is reported. This survey has been made as a part of our project of developing a basic software for parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)
Jacobson, R H; Downing, D R; Lynch, T J
1982-11-15
A computer-assisted enzyme-linked immunosorbent assay (ELISA) system, based on kinetics of the reaction between substrate and enzyme molecules, was developed for testing large numbers of sera in laboratory applications. Systematic and random errors associated with conventional ELISA technique were identified leading to results formulated on a statistically validated, objective, and standardized basis. In a parallel development, an inexpensive system for field and veterinary office applications contained many of the qualities of the computer-assisted ELISA. This system uses a fluorogenic indicator (rather than the enzyme-substrate interaction) in a rapid test (15 to 20 minutes' duration) which promises broad application in serodiagnosis.
CALTRANS: A parallel, deterministic, 3D neutronics code
Energy Technology Data Exchange (ETDEWEB)
Carson, L.; Ferguson, J.; Rogers, J.
1994-04-01
Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.
Parallel tempering in full QCD with Wilson fermions
International Nuclear Information System (INIS)
Ilgenfritz, E.-M.; Kerler, W.; Mueller-Preussker, M.; Stueben, H.
2002-01-01
We study the performance of QCD simulations with dynamical Wilson fermions by combining the hybrid Monte Carlo algorithm with parallel tempering on 10 4 and 12 4 lattices. In order to compare tempered with standard simulations, covariance matrices between subensembles have to be formulated and evaluated using the general properties of autocorrelations of the parallel tempering algorithm. We find that rendering the hopping parameter κ dynamical does not lead to an essential improvement. We point out possible reasons for this observation and discuss more suitable ways of applying parallel tempering to QCD
Load Balancing of Parallel Monte Carlo Transport Calculations
International Nuclear Information System (INIS)
Procassini, R J; O'Brien, M J; Taylor, J M
2005-01-01
The performance of parallel Monte Carlo transport calculations which use both spatial and particle parallelism is increased by dynamically assigning processors to the most worked domains. Since he particle work load varies over the course of the simulation, this algorithm determines each cycle if dynamic load balancing would speed up the calculation. If load balancing is required, a small number of particle communications are initiated in order to achieve load balance. This method has decreased the parallel run time by more than a factor of three for certain criticality calculations
Enzyme Engineering for In Situ Immobilization.
Rehm, Fabian B H; Chen, Shuxiong; Rehm, Bernd H A
2016-10-14
Enzymes are used as biocatalysts in a vast range of industrial applications. Immobilization of enzymes to solid supports or their self-assembly into insoluble particles enhances their applicability by strongly improving properties such as stability in changing environments, re-usability and applicability in continuous biocatalytic processes. The possibility of co-immobilizing various functionally related enzymes involved in multistep synthesis, conversion or degradation reactions enables the design of multifunctional biocatalyst with enhanced performance compared to their soluble counterparts. This review provides a brief overview of up-to-date in vitro immobilization strategies while focusing on recent advances in enzyme engineering towards in situ self-assembly into insoluble particles. In situ self-assembly approaches include the bioengineering of bacteria to abundantly form enzymatically active inclusion bodies such as enzyme inclusions or enzyme-coated polyhydroxyalkanoate granules. These one-step production strategies for immobilized enzymes avoid prefabrication of the carrier as well as chemical cross-linking or attachment to a support material while the controlled oriented display strongly enhances the fraction of accessible catalytic sites and hence functional enzymes.
Parallel algorithms for numerical linear algebra
van der Vorst, H
1990-01-01
This is the first in a new series of books presenting research results and developments concerning the theory and applications of parallel computers, including vector, pipeline, array, fifth/future generation computers, and neural computers.All aspects of high-speed computing fall within the scope of the series, e.g. algorithm design, applications, software engineering, networking, taxonomy, models and architectural trends, performance, peripheral devices.Papers in Volume One cover the main streams of parallel linear algebra: systolic array algorithms, message-passing systems, algorithms for p
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Lahdenoja Olli
2007-01-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
A Massively Parallel Face Recognition System
Directory of Open Access Journals (Sweden)
Ari Paasio
2006-12-01
Full Text Available We present methods for processing the LBPs (local binary patterns with a massively parallel hardware, especially with CNN-UM (cellular nonlinear network-universal machine. In particular, we present a framework for implementing a massively parallel face recognition system, including a dedicated highly accurate algorithm suitable for various types of platforms (e.g., CNN-UM and digital FPGA. We study in detail a dedicated mixed-mode implementation of the algorithm and estimate its implementation cost in the view of its performance and accuracy restrictions.
International Nuclear Information System (INIS)
Mo Zeyao
2004-11-01
Multiphysics parallel numerical simulations are usually essential to simplify researches on complex physical phenomena in which several physics are tightly coupled. It is very important on how to concatenate those coupled physics for fully scalable parallel simulation. Meanwhile, three objectives should be balanced, the first is efficient data transfer among simulations, the second and the third are efficient parallel executions and simultaneously developments of those simulation codes. Two concatenating algorithms for multiphysics parallel numerical simulations coupling radiation hydrodynamics with neutron transport on unstructured grid are presented. The first algorithm, Fully Loosely Concatenation (FLC), focuses on the independence of code development and the independence running with optimal performance of code. The second algorithm. Two Level Tightly Concatenation (TLTC), focuses on the optimal tradeoffs among above three objectives. Theoretical analyses for communicational complexity and parallel numerical experiments on hundreds of processors on two parallel machines have showed that these two algorithms are efficient and can be generalized to other multiphysics parallel numerical simulations. In especial, algorithm TLTC is linearly scalable and has achieved the optimal parallel performance. (authors)
Directory of Open Access Journals (Sweden)
Seyed Adel Moftakharzadeh
2017-02-01
Full Text Available In this study, we have evaluated the effect of three multi-enzymes nutrient matrix values and compared the results with that fed barley and the corn diets without enzyme. In entire period, addition of all enzymes to the barley-based diet significantly (p 0.05. Litter moisture and water to feed ratio at 15, 25, and 33 days of age significantly decreased by addition of all enzymes (p < 0.05. In conclusion, considering nutrient matrix values for all used enzymes improved performance of broilers and can be used in formulating diets commercial broiler diets based on barley.
Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
Rostrup, Scott; De Sterck, Hans
2010-12-01
Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole
2012-07-01
Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
Declarative Parallel Programming in Spreadsheet End-User Development
DEFF Research Database (Denmark)
Biermann, Florian
2016-01-01
Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this liter...... can directly apply results from functional array programming to a spreadsheet model of computations.......Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations....... In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming and we...
Parallel processing of genomics data
Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
2016-10-01
The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
Silica Sol-Gel Entrapment of the Enzyme Chloro peroxidase
International Nuclear Information System (INIS)
Le, T.; Chan, S.; Ebaid, B.; Sommerhalter, M.
2015-01-01
The enzyme chloro peroxidase (CPO) was immobilized in silica sol-gel beads prepared from tetramethoxysilane. The average pore diameter of the silica host structure (∼3 nm) was smaller than the globular CPO diameter (∼6 nm) and the enzyme remained entrapped after sol-gel maturation. The catalytic performance of the entrapped enzyme was assessed via the pyrogallol peroxidation reaction. Sol-gel beads loaded with 4 μg CPO per mL sol solution reached 9-12% relative activity compared to free CPO in solution. Enzyme kinetic analysis revealed a decrease in K_cat but no changes in K_M or K_I . Product release or enzyme damage might thus limit catalytic performance. Yet circular dichroism and visible absorption spectra of transparent CPO sol-gel sheets did not indicate enzyme damage. Activity decline due to methanol exposure was shown to be reversible in solution. To improve catalytic performance the sol-gel protocol was modified. The incorporation of 5, 20, or 40% methyltrimethoxysilane resulted in more brittle sol-gel beads but the catalytic performance increased to 14% relative to free CPO in solution. The use of more acidic casting buffers (ph 4.5 or 5.5 instead of 6.5) resulted in a more porous silica host reaching up to 18% relative activity
International Nuclear Information System (INIS)
Liebrock, L.M.; McGrath, J.F.; Hicks, D.L.
1986-01-01
Parallel programming promises improved processing speeds for hydrocodes, magnetohydrocodes, multiphase flow codes, thermal-hydraulics codes, wavecodes and other continuum dynamics codes. This paper presents the results of some investigations of parallel algorithms on three parallel processors: the CRAY X-MP, ELXSI and the HEP computers. Introduction and Background: We report the results of investigations of parallel algorithms for computational continuum dynamics. These programs (hydrocodes, wavecodes, etc.) produce simulations of the solutions to problems arising in the motion of continua: solid dynamics, liquid dynamics, gas dynamics, plasma dynamics, multiphase flow dynamics, thermal-hydraulic dynamics and multimaterial flow dynamics. This report restricts its scope to one-dimensional algorithms such as the von Neumann-Richtmyer (1950) scheme
Directory of Open Access Journals (Sweden)
M.E. Abd El-Hack
2017-02-01
Full Text Available An experiment was conducted with 160 Hisex Brown laying hens to evaluate the effect of different inclusion levels of faba bean (FB and enzyme supplementation on productive performance and egg quality parameters. The experimental diets consisted of five levels of FB: 0% (control, 25%, 50%, 75% and 100%, substituting soybean meal (SBM, and two levels of enzyme supplementation (0 or 250 mg/kg. Each dietary treatment was assigned to four replicate groups and the experiment lasted 22 weeks. A positive relationship (P 0.05. The main effect of FB levels replacing for SBM affected (P < 0.05 yolk and shell percentages, yolk index, yolk to albumen ratio, shell thickness and egg shape index. It can be concluded that FB and enzyme supplementation could be included in hens diet at less than 50% instead of SBM to support egg productive performance, however higher raw FB levels negatively affected egg production indices and quality.
Wu, X.; Taylor, V.
2011-01-01
The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
Wu, X.
2011-07-18
The NAS Parallel Benchmarks (NPB) are well-known applications with fixed algorithms for evaluating parallel systems and tools. Multicore clusters provide a natural programming paradigm for hybrid programs, whereby OpenMP can be used with the data sharing with the multicores that comprise a node, and MPI can be used with the communication between nodes. In this paper, we use Scalar Pentadiagonal (SP) and Block Tridiagonal (BT) benchmarks of MPI NPB 3.3 as a basis for a comparative approach to implement hybrid MPI/OpenMP versions of SP and BT. In particular, we can compare the performance of the hybrid SP and BT with the MPI counterparts on large-scale multicore clusters, Intrepid (BlueGene/P) at Argonne National Laboratory and Jaguar (Cray XT4/5) at Oak Ridge National Laboratory. Our performance results indicate that the hybrid SP outperforms the MPI SP by up to 20.76 %, and the hybrid BT outperforms the MPI BT by up to 8.58 % on up to 10 000 cores on Intrepid and Jaguar. We also use performance tools and MPI trace libraries available on these clusters to further investigate the performance characteristics of the hybrid SP and BT. © 2011 The Author. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved.
Iteration schemes for parallelizing models of superconductivity
Energy Technology Data Exchange (ETDEWEB)
Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)
1996-12-31
The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T{sub c} superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.
Murugesan, Ganapathi Raj; Persia, Michael E
2015-09-01
Efficacy of a multi-strain direct-fed microbial product (PoultryStar(®) ME; PS) and a xylanase enzyme product on the dietary energy utilization efficiency and resulting performance in broiler chickens was evaluated. Apart from performance parameters, cecal and serum metabolites and activities of hepatic enzymes involved in energy metabolism were also determined. Ross 308 chicks were fed one of four experimental diets [control (CON), CON + PS, CON + xylanase and CON + PS + xylanase] using a 2 × 2 factorial arrangement from 1-21 days of age. Cecal proportions of propionate and butyrate, as well as total short-chain fatty acid concentration were increased (P energy uptake and hepatic energy retention. The combination additively increased the FCR, suggesting involvement of synergistic modes of actions. © 2014 Society of Chemical Industry.
Practical steady-state enzyme kinetics.
Lorsch, Jon R
2014-01-01
Enzymes are key components of most biological processes. Characterization of enzymes is therefore frequently required during the study of biological systems. Steady-state kinetics provides a simple and rapid means of assessing the substrate specificity of an enzyme. When combined with site-directed mutagenesis (see Site-Directed Mutagenesis), it can be used to probe the roles of particular amino acids in the enzyme in substrate recognition and catalysis. Effects of interaction partners and posttranslational modifications can also be assessed using steady-state kinetics. This overview explains the general principles of steady-state enzyme kinetics experiments in a practical, rather than theoretical, way. Any biochemistry textbook will have a section on the theory of Michaelis-Menten kinetics, including derivations of the relevant equations. No specific enzymatic assay is described here, although a method for monitoring product formation or substrate consumption over time (an assay) is required to perform the experiments described. © 2014 Elsevier Inc. All rights reserved.
Dataflow Query Execution in a Parallel Main-Memory Environment
Wilschut, A.N.; Apers, Peter M.G.
1991-01-01
The performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results are a step in the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others, synchronization issues are identified
Xyce parallel electronic simulator : users' guide. Version 5.1.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-11-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Xyce Parallel Electronic Simulator : users' guide, version 4.1.
Energy Technology Data Exchange (ETDEWEB)
Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick
2009-02-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a
Comparison of multihardware parallel implementations for a phase unwrapping algorithm
Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo
2018-04-01
Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
Algorithms for computational fluid dynamics n parallel processors
International Nuclear Information System (INIS)
Van de Velde, E.F.
1986-01-01
A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries
Simulation of neutron transport equation using parallel Monte Carlo for deep penetration problems
International Nuclear Information System (INIS)
Bekar, K. K.; Tombakoglu, M.; Soekmen, C. N.
2001-01-01
Neutron transport equation is simulated using parallel Monte Carlo method for deep penetration neutron transport problem. Monte Carlo simulation is parallelized by using three different techniques; direct parallelization, domain decomposition and domain decomposition with load balancing, which are used with PVM (Parallel Virtual Machine) software on LAN (Local Area Network). The results of parallel simulation are given for various model problems. The performances of the parallelization techniques are compared with each other. Moreover, the effects of variance reduction techniques on parallelization are discussed
Parallel computing in genomic research: advances and applications
Directory of Open Access Journals (Sweden)
Ocaña K
2015-11-01
Full Text Available Kary Ocaña,1 Daniel de Oliveira2 1National Laboratory of Scientific Computing, Petrópolis, Rio de Janeiro, 2Institute of Computing, Fluminense Federal University, Niterói, Brazil Abstract: Today's genomic experiments have to process the so-called "biological big data" that is now reaching the size of Terabytes and Petabytes. To process this huge amount of data, scientists may require weeks or months if they use their own workstations. Parallelism techniques and high-performance computing (HPC environments can be applied for reducing the total processing time and to ease the management, treatment, and analyses of this data. However, running bioinformatics experiments in HPC environments such as clouds, grids, clusters, and graphics processing unit requires the expertise from scientists to integrate computational, biological, and mathematical techniques and technologies. Several solutions have already been proposed to allow scientists for processing their genomic experiments using HPC capabilities and parallelism techniques. This article brings a systematic review of literature that surveys the most recently published research involving genomics and parallel computing. Our objective is to gather the main characteristics, benefits, and challenges that can be considered by scientists when running their genomic experiments to benefit from parallelism techniques and HPC capabilities. Keywords: high-performance computing, genomic research, cloud computing, grid computing, cluster computing, parallel computing
Applications of Parallel Processing in Mobile Banking
Directory of Open Access Journals (Sweden)
2007-01-01
Full Text Available The future of mobile banking will be represented by such applications that support mobile, Internet banking and EFT (Electronic Funds Transfer transactions in a single user interface. In such a way, the mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Excluding some limitations, the grid processing offers huge opportunities to exploit the parallelism. For this reason, a lot of applications of waiting queues in grid processing were developed in the last years. Grid networks represent a distinctive and very modern field of the parallel and distributed processing.
Laccase Enzymes in Inocula Pleurotus spp
Directory of Open Access Journals (Sweden)
Nora García-Oduardo
2017-01-01
Full Text Available The cultivation of edible and medicinal mushrooms Pleurotus has been aimed at promoting alternative management for agricultural products. This basidiomicete has been the subject of numerous studies because of its fruiting body constitutes a food, being a producer of enzymes with industrial interest and for its ability of biotransformation of lignocellulosic substrates. Pleurotus inocula in the established technology for growing edible and medicinal mushrooms in the CEBI Research- Production Plant were performed using sorghum or wheat. However, it is possible to expand the possibilities with other substrates. In this paper, the results of laccase enzymes production in inocula prepared with sorghum, corn and coffee pulp with two strains Pleurotus ostreatus CCEBI 3021 and Pleurotus ostreatus CCEBI 3024 are presented. The period of preparation of seed reaches 15-21 days, the measurements of laccase activity were performed in periods of seven days. Extraction of crude enzyme was performed in aqueous phase, the determination of the laccase enzyme activity, using guaiacol as substrate. The results obtained in this work with studies in previous work using sorghum as inocula are compared. It is found that higher yields are obtained laccase in coffee pulp. This study contributes to the theoretical knowledge and to provide alternatives for securing the production process of the plant.
IMPLEMENTATION OF SERIAL AND PARALLEL BUBBLE SORT ON FPGA
Directory of Open Access Journals (Sweden)
Dwi Marhaendro Jati Purnomo
2016-06-01
Full Text Available Sorting is common process in computational world. Its utilization are on many fields from research to industry. There are many sorting algorithm in nowadays. One of the simplest yet powerful is bubble sort. In this study, bubble sort is implemented on FPGA. The implementation was taken on serial and parallel approach. Serial and parallel bubble sort then compared by means of its memory, execution time, and utility which comprises slices and LUTs. The experiments show that serial bubble sort required smaller memory as well as utility compared to parallel bubble sort. Meanwhile, parallel bubble sort performed faster than serial bubble sort
Dataflow Query Execution in a Parallel, Main-memory Environment
Wilschut, A.N.; Apers, Peter M.G.
In this paper, the performance and characteristics of the execution of various join-trees on a parallel DBMS are studied. The results of this study are a step into the direction of the design of a query optimization strategy that is fit for parallel execution of complex queries. Among others,
Energy Technology Data Exchange (ETDEWEB)
Mazhari, M.; Golian, A.; Kermanshahi, H.
2015-07-01
An experiment was carried out to study the effect of corn replacement with five levels of wheat screening (0, 150, 300, 450 and 600 g/kg of diet) with (0.5 g/kg of diet) or without xylanase-glucanase enzyme on performance, blood lipids, viscosity and jejunal histomorphology of finisher broilers (25-42 days of age). Five hundred day-old Ross-308 male broiler chicks were fed by a standard commercial diet up to 24 days of age, then randomly assigned to 10 diets. Each diet was fed to five groups of ten chicks each. There was not significant differences in body weight gain (BWG), feed intake, and feed conversion ratio of birds fed with different levels of wheat screening (WS), whereas enzyme increased (p<0.05) BWG. Different levels of WS and enzyme did not have a significant effect on relative weights of carcass, breast, thigh, and abdominal fat of broilers. Relative weights of gizzard, pancreas, small and large intestine, and relative length of jejunum and jejunal and ileal viscosity were increased (p<0.05) by WS, while were decreased (p<0.05) by enzyme. The serum cholesterol level decreased (p<0.05) by increasing levels of WS. Jejunal histomorphological observations showed (p<0.05) shorter and thicker villus and lower crypt depth by increasing levels of WS, while addition of enzyme to the diets, affected (p<0.05) reversely to these parameters. The results showed that the addition of wheat screening up to an inclusion level of 600 g/kg of diet had no adverse effect on broiler performance in the finisher (25-42 d) phases whereas decreased serum cholesterol levels, increased viscosity and villus atrophy. The dietary administration of exogenous enzyme improved performance parameters and decreased viscosity and villus atrophy of broiler jejunum. (Author)
A hybrid parallel framework for the cellular Potts model simulations
Energy Technology Data Exchange (ETDEWEB)
Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which can't be used for large scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming POE solving, cell division, and cell reaction operation are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the POE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied the avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large scale simulation ({approx}10{sup 8} sites) of complex collective behavior of numerous cells ({approx}10{sup 6}).
Castor Oil Transesterification Catalysed by Liquid Enzymes
DEFF Research Database (Denmark)
Andrade, Thalles; Errico, Massimiliano; Christensen, Knud Villy
2017-01-01
In the present work, biodiesel production by reaction of non-edible castor oil with methanol under enzymatic catalysis is investigated. Two liquid enzymes were tested: Eversa Transform and Resinase HT. Reactions were performed at 35 °C and with a molar ratio of methanol to oil of 6:1. The reaction...... time was 8 hours. Stepwise addition of methanol was necessary to avoid enzyme inhibition by methanol. In order to minimize the enzyme costs, the influence of enzyme activity loss during reuse of both enzymes was evaluated under two distinct conditions. In the former, the enzymes were recovered...... and fully reused; in the latter, a mixture of 50 % reused and 50 % fresh enzymes was tested. In the case of total reuse after three cycles, both enzymes achieved only low conversions. The biodiesel content in the oil-phase using Eversa Transform was 94.21 % for the first cycle, 68.39 % in the second, and 33...
Advances in randomized parallel computing
Rajasekaran, Sanguthevar
1999-01-01
The technique of randomization has been employed to solve numerous prob lems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms In the aflalysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. 2 For instance, the worst case run time of Hoare's quicksort is O(n ), whereas its average case run time is only O( n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...
A CS1 pedagogical approach to parallel thinking
Rague, Brian William
Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of
A SPECT reconstruction method for extending parallel to non-parallel geometries
International Nuclear Information System (INIS)
Wen Junhai; Liang Zhengrong
2010-01-01
Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.
Electromagnetic Physics Models for Parallel Computing Architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
The language parallel Pascal and other aspects of the massively parallel processor
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Parallel Atomistic Simulations
Energy Technology Data Exchange (ETDEWEB)
HEFFELFINGER,GRANT S.
2000-01-18
Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
Parallel discrete-event simulation of FCFS stochastic queueing networks
Nicol, David M.
1988-01-01
Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction on a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, give performance data that demonstrates the method's effectiveness under moderate to heavy loads, and discuss performance tradeoffs between the quality of lookahead, and the cost of computing lookahead.
Parallelization of ITOUGH2 using PVM
International Nuclear Information System (INIS)
Finsterle, Stefan
1998-01-01
ITOUGH2 inversions are computationally intensive because the forward problem must be solved many times to evaluate the objective function for different parameter combinations or to numerically calculate sensitivity coefficients. Most of these forward runs are independent from each other and can therefore be performed in parallel. Message passing based on the Parallel Virtual Machine (PVM) system has been implemented into ITOUGH2 to enable parallel processing of ITOUGH2 jobs on a heterogeneous network of Unix workstations. This report describes the PVM system and its implementation into ITOUGH2. Instructions are given for installing PVM, compiling ITOUGH2-PVM for use on a workstation cluster, the preparation of an 1.TOUGH2 input file under PVM, and the execution of an ITOUGH2-PVM application. Examples are discussed, demonstrating the use of ITOUGH2-PVM
Compiling Scientific Programs for Scalable Parallel Systems
National Research Council Canada - National Science Library
Kennedy, Ken
2001-01-01
...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...
The new landscape of parallel computer architecture
International Nuclear Information System (INIS)
Shalf, John
2007-01-01
The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models
The new landscape of parallel computer architecture
Energy Technology Data Exchange (ETDEWEB)
Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)
2007-07-15
The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Directions in parallel processor architecture, and GPUs too
CERN. Geneva
2014-01-01
Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.
Parallel Evolutionary Optimization for Neuromorphic Network Training
Energy Technology Data Exchange (ETDEWEB)
Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)
2016-01-01
One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pi's. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.
Parallel iterative solvers and preconditioners using approximate hierarchical methods
Energy Technology Data Exchange (ETDEWEB)
Grama, A.; Kumar, V.; Sameh, A. [Univ. of Minnesota, Minneapolis, MN (United States)
1996-12-31
In this paper, we report results of the performance, convergence, and accuracy of a parallel GMRES solver for Boundary Element Methods. The solver uses a hierarchical approximate matrix-vector product based on a hybrid Barnes-Hut / Fast Multipole Method. We study the impact of various accuracy parameters on the convergence and show that with minimal loss in accuracy, our solver yields significant speedups. We demonstrate the excellent parallel efficiency and scalability of our solver. The combined speedups from approximation and parallelism represent an improvement of several orders in solution time. We also develop fast and paralellizable preconditioners for this problem. We report on the performance of an inner-outer scheme and a preconditioner based on truncated Green`s function. Experimental results on a 256 processor Cray T3D are presented.
Parallelization of the model-based iterative reconstruction algorithm DIRA
International Nuclear Information System (INIS)
Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.
2016-01-01
New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
Directory of Open Access Journals (Sweden)
M. R. Monazzam ، P. Nassiri
2009-10-01
Full Text Available This paper presents the results of an investigation on the acoustic performance of tilted profile parallel barriers with quadratic residue diffuser (QRD tops and faces. A 2D boundary element method (BEM is used to predict the barrier insertion loss. The results of rigid and with absorptive coverage are also calculated for comparisons. Using QRD on the top surface and faces of all tilted profile parallel barrier models introduced here is found to improve the efficiency of barriers compared with rigid equivalent parallel barrier at the examined receiver positions. Applying a QRD with frequency design of 400 Hz on 5 degrees tilted parallel barrier improves the overall performance of its equivalent rigid barrier by 1.8 dB(A. Increase in the treated surfaces with reactive elements shifts the effective performance toward lower frequencies. It is found that by tilting the barriers from 0 to 10 degrees in parallel set up, the degradation effects in parallel barriers is reduced but the absorption effect of fibrous materials and also diffusivity of the quadratic residue diffuser is reduced significantly. In this case all the designed barriers have better performance with 10 degrees tilting in parallel set up. The most economic traffic noise parallel barrier which produces significantly high performance, is achieved by covering the top surface of the barrier closed to the receiver by just a QRD with frequency design of 400 Hz and tilting angle of 10 degrees. The average A-weighted insertion loss in this barrier is predicted to be 16.3 dB (A.
An Automatic Instruction-Level Parallelization of Machine Code
Directory of Open Access Journals (Sweden)
MARINKOVIC, V.
2018-02-01
Full Text Available Prevailing multicores and novel manycores have made a great challenge of modern day - parallelization of embedded software that is still written as sequential. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. The novel instruction-level parallelization algorithm for assembly code which uses the register names after SSA to find independent blocks of code and then to schedule independent blocks using METIS to achieve good load balance is developed. The sequential consistency is verified and the validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.. In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. An approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.
An inherently parallel method for solving discretized diffusion equations
International Nuclear Information System (INIS)
Eccleston, B.R.; Palmer, T.S.
1999-01-01
A Monte Carlo approach to solving linear systems of equations is being investigated in the context of the solution of discretized diffusion equations. While the technique was originally devised decades ago, changes in computer architectures (namely, massively parallel machines) have driven the authors to revisit this technique. There are a number of potential advantages to this approach: (1) Analog Monte Carlo techniques are inherently parallel; this is not necessarily true to today's more advanced linear equation solvers (multigrid, conjugate gradient, etc.); (2) Some forms of this technique are adaptive in that they allow the user to specify locations in the problem where resolution is of particular importance and to concentrate the work at those locations; and (3) These techniques permit the solution of very large systems of equations in that matrix elements need not be stored. The user could trade calculational speed for storage if elements of the matrix are calculated on the fly. The goal of this study is to compare the parallel performance of Monte Carlo linear solvers to that of a more traditional parallelized linear solver. The authors observe the linear speedup that they expect from the Monte Carlo algorithm, given that there is no domain decomposition to cause significant communication overhead. Overall, PETSc outperforms the Monte Carlo solver for the test problem. The PETSc parallel performance improves with larger numbers of unknowns for a given number of processors. Parallel performance of the Monte Carlo technique is independent of the size of the matrix and the number of processes. They are investigating modifications to the scheme to accommodate matrix problems with positive off-diagonal elements. They are also currently coding an on-the-fly version of the algorithm to investigate the solution of very large linear systems
Modelling and parallel calculation of a kinetic boundary layer
International Nuclear Information System (INIS)
Perlat, Jean Philippe
1998-01-01
This research thesis aims at addressing reliability and cost issues in the calculation by numeric simulation of flows in transition regime. The first step has been to reduce calculation cost and memory space for the Monte Carlo method which is known to provide performance and reliability for rarefied regimes. Vector and parallel computers allow this objective to be reached. Here, a MIMD (multiple instructions, multiple data) machine has been used which implements parallel calculation at different levels of parallelization. Parallelization procedures have been adapted, and results showed that parallelization by calculation domain decomposition was far more efficient. Due to reliability issue related to the statistic feature of Monte Carlo methods, a new deterministic model was necessary to simulate gas molecules in transition regime. New models and hyperbolic systems have therefore been studied. One is chosen which allows thermodynamic values (density, average velocity, temperature, deformation tensor, heat flow) present in Navier-Stokes equations to be determined, and the equations of evolution of thermodynamic values are described for the mono-atomic case. Numerical resolution of is reported. A kinetic scheme is developed which complies with the structure of all systems, and which naturally expresses boundary conditions. The validation of the obtained 14 moment-based model is performed on shock problems and on Couette flows [fr
Zhang, M Z; Zhang, X F; Chen, X M; Chen, X; Wu, S; Xu, L L
2015-08-10
The enzyme-linked probe hybridization chip utilizes a method based on ligase-hybridizing probe chip technology, with the principle of using thio-primers for protection against enzyme digestion, and using lambda DNA exonuclease to cut multiple PCR products obtained from the sample being tested into single-strand chains for hybridization. The 5'-end amino-labeled probe was fixed onto the aldehyde chip, and hybridized with the single-stranded PCR product, followed by addition of a fluorescent-modified probe that was then enzymatically linked with the adjacent, substrate-bound probe in order to achieve highly specific, parallel, and high-throughput detection. Specificity and sensitivity testing demonstrated that enzyme-linked probe hybridization technology could be applied to the specific detection of eight genetic modification events at the same time, with a sensitivity reaching 0.1% and the achievement of accurate, efficient, and stable results.
Parallelization of a numerical simulation code for isotropic turbulence
International Nuclear Information System (INIS)
Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.
1996-03-01
A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed and execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, Fujitsu VPP500 with up to 16 processors is used for this calculation for Fourier modes up to 256x256x256 using 16 processors. Good scalability for number of processors is achieved when number of Fourier mode is fixed. For small Fourier modes, calculation time of the program is proportional to NlogN which is ideal complexity of calculation for 3D-FFT on vector parallel processors. It is found that the calculation performance decreases as the increase of the Fourier modes. (author)
Poya, Roman; Gil, Antonio J.; Ortigosa, Rogelio
2017-07-01
The paper presents aspects of implementation of a new high performance tensor contraction framework for the numerical analysis of coupled and multi-physics problems on streaming architectures. In addition to explicit SIMD instructions and smart expression templates, the framework introduces domain specific constructs for the tensor cross product and its associated algebra recently rediscovered by Bonet et al. (2015, 2016) in the context of solid mechanics. The two key ingredients of the presented expression template engine are as follows. First, the capability to mathematically transform complex chains of operations to simpler equivalent expressions, while potentially avoiding routes with higher levels of computational complexity and, second, to perform a compile time depth-first or breadth-first search to find the optimal contraction indices of a large tensor network in order to minimise the number of floating point operations. For optimisations of tensor contraction such as loop transformation, loop fusion and data locality optimisations, the framework relies heavily on compile time technologies rather than source-to-source translation or JIT techniques. Every aspect of the framework is examined through relevant performance benchmarks, including the impact of data parallelism on the performance of isomorphic and nonisomorphic tensor products, the FLOP and memory I/O optimality in the evaluation of tensor networks, the compilation cost and memory footprint of the framework and the performance of tensor cross product kernels. The framework is then applied to finite element analysis of coupled electro-mechanical problems to assess the speed-ups achieved in kernel-based numerical integration of complex electroelastic energy functionals. In this context, domain-aware expression templates combined with SIMD instructions are shown to provide a significant speed-up over the classical low-level style programming techniques.
Parallelization of a Monte Carlo particle transport simulation code
Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.
2010-05-01
We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Enzymes are complex proteins that cause a specific chemical change in all parts of the body. For ... use them. Blood clotting is another example of enzymes at work. Enzymes are needed for all body ...
About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems
Directory of Open Access Journals (Sweden)
Loredana MOCEAN
2009-01-01
Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution
International Nuclear Information System (INIS)
Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.
1999-01-01
In the solution of large-scale numerical prob- lems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely com- plex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of manag- ing parallelism and data locality from the user. We present innovative algorithms, based on the macro -dataflow model, for detecting data parallelism and efficiently executing data- parallel statements on shared-memory multiprocessors. We also desclibe how these algorithms can be implemented on clusters of SMPS
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution
Energy Technology Data Exchange (ETDEWEB)
Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.
1999-01-04
In the solution of large-scale numerical prob- lems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely com- plex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of manag- ing parallelism and data locality from the user. We present innovative algorithms, based on the macro -dataflow model, for detecting data parallelism and efficiently executing data- parallel statements on shared-memory multiprocessors. We also desclibe how these algorithms can be implemented on clusters of SMPS.
Fox, Geoffrey C; Messina, Guiseppe C
2014-01-01
A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop
Performance of a parallel plate volume cell prototype for a fast iron/gas calorimeter
International Nuclear Information System (INIS)
Bizzeti, A.; Civinini, C.; D'alessandro, R.; Ferrando, A.
1993-01-01
We present the first test of the application of the parallel plate chamber principles for the design of a very fast and radiation-hard iron/gas sampling calorimeter, suitable for very forward regions in detectors for LBC; based on the use of thick iron plates as electrodes. We have built a one cell prototype consisting of three parallel thick iron plates (117 mn each). Results on efficiencies and mean collected charge for minimum ionizing particles with different gases are presented. (Author)
Performance of a parallel plate volume cell prototype for a fast iron/gas calorimeter
Energy Technology Data Exchange (ETDEWEB)
Bizzeti, A.; Civinini, C.; D' Alessandro, R.; Ferrando, A.; Malinin, A.; Martinez-Laso, L.; Pojidaev, V.
1993-07-01
We present the first test of the application of the parallel plate chamber principles for the design of a very fast and radiation-hard iron/gas sampling calorimeter, suitable for very forward regions in detectors for LHC, based on the use of thick iron plates as electrodes. We have built a one cell prototype consisting of three parallel thick iron plates (17 mm each). Results on efficiencies and mean collected charge for minimum ionizing particles with different gases are presented. (Author) 7 refs.
Performance of a parallel plate volume cell prototype for a fast iron/gas calorimeter
International Nuclear Information System (INIS)
Bizzeti, A.; Civinini, C.; D'Alessandro, R.; Ferrando, A.; Malinin, A.; Martinez-Laso, L.; Pojidaev, V.
1993-01-01
We present the first test of the application of the parallel plate chamber principles for the design of a very fast and radiation-hard iron/gas sampling calorimeter, suitable for very forward regions in detectors for LHC, based on the use of thick iron plates as electrodes. We have built a one cell prototype consisting of three parallel thick iron plates (17 mm each). Results on efficiencies and mean collected charge for minimum ionizing particles with different gases are presented. (Author) 7 refs
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Parallelization and automatic data distribution for nuclear reactor simulations
Energy Technology Data Exchange (ETDEWEB)
Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Parallelization and automatic data distribution for nuclear reactor simulations
International Nuclear Information System (INIS)
Liebrock, L.M.
1997-01-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed
Recent advances in rational approaches for enzyme engineering
Directory of Open Access Journals (Sweden)
Kerstin Steiner
2012-09-01
Full Text Available Enzymes are an attractive alternative in the asymmetric syntheses of chiral building blocks. To meet the requirements of industrial biotechnology and to introduce new functionalities, the enzymes need to be optimized by protein engineering. This article specifically reviews rational approaches for enzyme engineering and de novo enzyme design involving structure-based approaches developed in recent years for improvement of the enzymes’ performance, broadened substrate range, and creation of novel functionalities to obtain products with high added value for industrial applications.
Provably optimal parallel transport sweeps on regular grids
International Nuclear Information System (INIS)
Adams, M. P.; Adams, M. L.; Hawkins, W. D.; Smith, T.; Rauchwerger, L.; Amato, N. M.; Bailey, T. S.; Falgout, R. D.
2013-01-01
We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a 'proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given P x x P y x P z partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present-we are still improving its implementation - but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching 10 6 . (authors)
Provably optimal parallel transport sweeps on regular grids
Energy Technology Data Exchange (ETDEWEB)
Adams, M. P.; Adams, M. L.; Hawkins, W. D. [Dept. of Nuclear Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Smith, T.; Rauchwerger, L.; Amato, N. M. [Dept. of Computer Science and Engineering, Texas A and M University, 3133 TAMU, College Station, TX 77843-3133 (United States); Bailey, T. S.; Falgout, R. D. [Lawrence Livermore National Laboratory (United States)
2013-07-01
We have found provably optimal algorithms for full-domain discrete-ordinate transport sweeps on regular grids in 3D Cartesian geometry. We describe these algorithms and sketch a 'proof that they always execute the full eight-octant sweep in the minimum possible number of stages for a given P{sub x} x P{sub y} x P{sub z} partitioning. Computational results demonstrate that our optimal scheduling algorithms execute sweeps in the minimum possible stage count. Observed parallel efficiencies agree well with our performance model. An older version of our PDT transport code achieves almost 80% parallel efficiency on 131,072 cores, on a weak-scaling problem with only one energy group, 80 directions, and 4096 cells/core. A newer version is less efficient at present-we are still improving its implementation - but achieves almost 60% parallel efficiency on 393,216 cores. These results conclusively demonstrate that sweeps can perform with high efficiency on core counts approaching 10{sup 6}. (authors)
pcircle - A Suite of Scalable Parallel File System Tools
Energy Technology Data Exchange (ETDEWEB)
2015-10-01
Most of the software related to file system are written for conventional local file system, they are serialized and can't take advantage of the benefit of a large scale parallel file system. "pcircle" software builds on top of ubiquitous MPI in cluster computing environment and "work-stealing" pattern to provide a scalable, high-performance suite of file system tools. In particular - it implemented parallel data copy and parallel data checksumming, with advanced features such as async progress report, checkpoint and restart, as well as integrity checking.
Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations
International Nuclear Information System (INIS)
O'Brien, M; Taylor, J; Procassini, R
2004-01-01
The performance of parallel Monte Carlo transport calculations which use both spatial and particle parallelism is increased by dynamically assigning processors to the most worked domains. Since the particle work load varies over the course of the simulation, this algorithm determines each cycle if dynamic load balancing would speed up the calculation. If load balancing is required, a small number of particle communications are initiated in order to achieve load balance. This method has decreased the parallel run time by more than a factor of three for certain criticality calculations
Isolation and characterization of an insulin-degrading enzyme from Drosophila melanogaster
International Nuclear Information System (INIS)
Garcia, J.V.; Fenton, B.W.; Rosner, M.R.
1988-01-01
An insulin-degrading enzyme (IDE) from the cytoplasm of Drosophila Kc cells has been purified and characterized. The purified enzyme is a monomer with an s value of 7.2 S, an apparent K/sub m/ for porcine insulin of 3 μM, and a specific activity of 3.3 nmol of porcine insulin degraded/(min x mg). N-Terminal sequence analysis of the gel-purified enzyme gave a single, serine-rich sequence. The Drosophila IDE shares a number of properties in common with its mammalian counterpart. The enzyme could be specifically affinity-labeled with [ 125 I]insulin, has a molecular weight of 110K, and has a pI of 5.3. Although Drosophila Kc cells grow at room temperature, the optimal enzyme activity assay conditions parallel those of the mammalian IDE: 37 0 C and a pH range of 7-8. The Drosophila IDE activity, like the mammalian enzymes, is inhibited by bacitracin and sulfhydryl-specific reagents. Similarly, the Drosophila IDE activity is insensitive to glutathione as well as protease inhibitors such as aprotinin and leupeptin. Insulin-like growth factor II, equine insulin, and porcine insulin compete for degradation of [ 125 I]insulin at comparable concentrations (approximately 10 -6 M), whereas insulin-like growth factor I and the individual A and B chains of insulin are less effective. The high degree of evolutionary conservation between the Drosophila and mammalian IDE suggest an important role for this enzyme in the metabolism of insulin and also provides further evidence for the existence of a complete insulin-like system in invertebrate organisms such as Drosophila
Identifying failure in a tree network of a parallel computer
Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.
2010-08-24
Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.
Electromagnetic Physics Models for Parallel Computing Architectures
International Nuclear Information System (INIS)
Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L
2016-01-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)
International Nuclear Information System (INIS)
Solis, C.; Oliver, A.; Andrade, E.; Ruvalcaba-Sil, J.L.; Romero, I.; Celis, H.
1999-01-01
Zinc is a necessary component in the action and structural stability of many enzymes. Some of them are well characterized, but in others, Zn stoichiometry and its association is not known. PIXE has been proven to be a suitable technique for analyzing metallic proteins embedded in electrophoresis gels. In this study, PIXE has been used to investigate the Zn content of enzymes that are known to carry Zn atoms. These include the carbonic anhydrase, an enzyme well characterized by other methods and the cytoplasmic pyrophosphatase of Rhodospirillum rubrum that is known to require Zn to be stable but not how many metal ions are involved or how they are bound to the enzyme. Native proteins have been purified by polyacrylamide gel electrophoresis and direct identification and quantification of Zn in the gel bands was performed with an external proton beam of 3.7 MeV energy
Parallelization for X-ray crystal structural analysis program
Energy Technology Data Exchange (ETDEWEB)
Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji
1997-10-01
In this report we study vectorization and parallelization for X-ray crystal structural analysis program. The target machine is NEC SX-4 which is a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it enables to obtain the 12.0 times higher performance result that the original code. Besides the above-mentioned vectorization, the parallelization by micro-task functions on SX-4 reaches 13.7 times acceleration in the part of multi-dimensional discrete Fourier transform with 14 CPUs, and 3.0 times acceleration in the whole program. Totally 35.9 times acceleration to the original 1CPU scalar version is achieved with vectorization and parallelization on SX-4. (author)
The use of enzymes for beer brewing
Donkelaar, van Laura H.G.; Mostert, Joost; Zisopoulos, Filippos K.; Boom, Remko M.; Goot, van der Atze Jan
2016-01-01
The exergetic performance of beer produced by the conventional malting and brewing process is compared with that of beer produced using an enzyme-assisted process. The aim is to estimate if the use of an exogenous enzyme formulation reduces the environmental impact of the overall brewing process.
Enzymes for Enhanced Oil Recovery (EOR)
Energy Technology Data Exchange (ETDEWEB)
Nasiri, Hamidreza
2011-04-15
Primary oil recovery by reservoir pressure depletion and secondary oil recovery by waterflooding usually result in poor displacement efficiency. As a consequence there is always some trapped oil remaining in oil reservoirs. Oil entrapment is a result of complex interactions between viscous, gravity and capillary forces. Improving recovery from hydrocarbon fields typically involves altering the relative importance of the viscous and capillary forces. The potential of many EOR methods depends on their influence on fluid/rock interactions related to wettability and fluid/fluid interactions reflected in IFT. If the method has the potential to change the interactions favorably, it may be considered for further investigation, i.e. core flooding experiment, pilot and reservoir implementation. Enzyme-proteins can be introduced as an enhanced oil recovery method to improve waterflood performance by affecting interactions at the oil-water-rock interfaces. An important part of this thesis was to investigate how selected enzymes may influence wettability and capillary forces in a crude oil-brine-rock system, and thus possibly contribute to enhanced oil recovery. To investigate further by which mechanisms selected enzyme-proteins may contribute to enhance oil recovery, groups of enzymes with different properties and catalytic functions, known to be interfacially active, were chosen to cover a wide range of possible effects. These groups include (1) Greenzyme (GZ) which is a commercial EOR enzyme and consists of enzymes and stabilizers (surfactants), (2) The Zonase group consists of two types of pure enzyme, Zonase1 and Zonase2 which are protease enzymes and whose catalytic functions are to hydrolyze (breakdown) peptide bonds, (3) The Novozyme (NZ) group consists of three types of pure enzyme, NZ2, NZ3 and NZ6 which are esterase enzymes and whose catalytic functions are to hydrolyze ester bonds, and (4) Alpha-Lactalbumin ( -La) which is an important whey protein. The effect of
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Parallel optimization of IDW interpolation algorithm on multicore platform
Guan, Xuefeng; Wu, Huayi
2009-10-01
Due to increasing power consumption, heat dissipation, and other physical issues, the architecture of central processing unit (CPU) has been turning to multicore rapidly in recent years. Multicore processor is packaged with multiple processor cores in the same chip, which not only offers increased performance, but also presents significant challenges to application developers. As a matter of fact, in GIS field most of current GIS algorithms were implemented serially and could not best exploit the parallelism potential on such multicore platforms. In this paper, we choose Inverse Distance Weighted spatial interpolation algorithm (IDW) as an example to study how to optimize current serial GIS algorithms on multicore platform in order to maximize performance speedup. With the help of OpenMP, threading methodology is introduced to split and share the whole interpolation work among processor cores. After parallel optimization, execution time of interpolation algorithm is greatly reduced and good performance speedup is achieved. For example, performance speedup on Intel Xeon 5310 is 1.943 with 2 execution threads and 3.695 with 4 execution threads respectively. An additional output comparison between pre-optimization and post-optimization is carried out and shows that parallel optimization does to affect final interpolation result.
Parallel and distributed processing in two SGBDS: A case study
Directory of Open Access Journals (Sweden)
Francisco Javier Moreno
2017-04-01
Full Text Available Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS, such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.
Contribution of diffuser surfaces to efficiency of tilted T shape parallel highway noise barriers
Directory of Open Access Journals (Sweden)
N. Javid Rouzi
2009-04-01
Full Text Available Background and aimsThe paper presents the results of an investigation on the acoustic performance of tilted profile parallel barriers with quadratic residue diffuser tops and faces.MethodsA2D boundary element method (BEM is used to predict the barrier insertion loss. The results of rigid and with absorptive coverage are also calculated for comparisons. Using QRD on the top surface and faces of all tilted profile parallel barrier models introduced here is found to improve the efficiency of barriers compared with rigid equivalent parallel barrier at the examined receiver positions.Results Applying a QRD with frequency design of 400 Hz on 5 degrees tilted parallel barrier improves the overall performance of its equivalent rigid barrier by 1.8 dB(A. Increase the treated surfaces with reactive elements shifts the effective performance toward lower frequencies. It is found that by tilting the barriers from 0 to 10 degrees in parallel set up, the degradation effects in parallel barriers is reduced but the absorption effect of fibrous materials and also diffusivity of thequadratic residue diffuser is reduced significantly. In this case all the designed barriers have better performance with 10 degrees tilting in parallel set up.ConclusionThe most economic traffic noise parallel barrier, which produces significantly high performance, is achieved by covering the top surface of the barrier closed to the receiver by just a QRD with frequency design of 400 Hz and tilting angle of 10 degrees. The average Aweighted insertion loss in this barrier is predicted to be 16.3 dB (A.
An object-oriented programming paradigm for parallelization of computational fluid dynamics
International Nuclear Information System (INIS)
Ohta, Takashi.
1997-03-01
We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)
Directory of Open Access Journals (Sweden)
MA Al-Harthi
Full Text Available ABSTRACT An experiment was carried out to investigate the effect of using olive cake (OC in broiler diets, when it is supplemented with multi-enzymes or phytase enzyme. The OC was included in isocaloric, isonitorgneous diets at 5 and 10% levels and fed to broilers from 1-28 days of age. Experimental diets were fed with or without either of the two enzymes: galzym or phytase. This resulted in 3 OC levels (0, 5, 10% × 3 enzyme supplementations (no enzyme, galzym enzyme, phytase enzyme. This included nine treatments, and each treatment was replicated eight times with seven broiler chickens each. Feed intake, feed conversion ratio, body weight gain, survival rate, dressing, inner and immune organ´s weights (compared to live body weight; and blood lipids constituents were investigated. According to the findings in this study, it could be concluded that OC is a valuable ingredient and might be included in the broiler diet up to 10% without galzym or phytase enzyme addition. Also, further studies should investigate the possibility of using higher ratios of it or mixed with another by-product in poultry diets; as a very cheap by-product. Moreover, these studies can be associated with suitable additives at different concentrations that might help to increase the utilization of olive cake or at least to keep performance equal to the control. On the other hand, it is worthwhile to follow the positive effect of phytase enzyme on cholesterol and very low density lipoprotein (VLDL concentrations, which may relate it´s use with chicken´s health.
Khempaka, Sutisa; Maliwan, Prapot; Okrathok, Supattra; Molee, Wittawat
2018-02-24
Two experiments were conducted to investigate the potential use of dried cassava pulp (DCP) supplemented with enzymes as an alternative feed ingredient in laying hen diets. In experiment 1, 45 laying hens (Isa Brown) aged 45 weeks were placed in individual cages to measure nutrient digestibility for 10 days. Nine dietary treatments were control and DCP as a replacement for corn at 20, 25, 30, and 35% supplemented with mixed enzymes (cellulase, glucanase, and xylanase) at 0.10 and 0.15%. Results showed that the use of DCP at 20-35% added with mixed enzymes had no negative effects on dry matter digestibility, while organic matter digestibility and nitrogen retention decreased with increased DCP up to 30-35% in diets. Both enzyme levels (0.10 and 0.15%) showed similar results on nutrient digestibility and retention. In experiment 2, a total of 336 laying hens aged 32 weeks were randomly allocated to seven dietary treatments (control and DCP-substituted diets at 20, 25, and 30%) supplemented with mixed enzymes (0.10 and 0.15%). Diets incorporated with 20-30% of DCP and supplemented with mixed enzymes at both levels had no significant effects on egg production, egg weight, feed intake, egg mass, feed conversion ratio, protein efficiency ratio, or egg quality, except for egg yolk color being decreased with an increase of DCP in diets (P digestibility, productive performance, or egg quality.
DEFF Research Database (Denmark)
Romero, N. A.; Glinsvad, Christian; Larsen, Ask Hjorth
2013-01-01
Density function theory (DFT) is the most widely employed electronic structure method because of its favorable scaling with system size and accuracy for a broad range of molecular and condensed-phase systems. The advent of massively parallel supercomputers has enhanced the scientific community...
Directory of Open Access Journals (Sweden)
Y. Feng
2012-05-01
Full Text Available Immune stress is the loss of immune homeostasis caused by external forces. The purpose of this experiment was to investigate the effects of immune stress on the growth performance, small intestinal enzymes and peristalsis rate, and mRNA expression of nutrient transporters in broiler chickens. Four hundred and thirty-two 1-d-old broilers (Cobb500 were randomly assigned to four groups for treatment; each group included nine cages with 12 birds per cage. Group 1 = no vaccine (NV; Group 2 = conventional vaccine (CV; group 3 = lipopolysaccharide (LPS+conventional vaccine (LPS; group 4 = cyclophosphamide (CYP+conventional vaccine (CYP. The results demonstrated that immune stress by LPS and CYP reduced body weight gain (BWG, feed intake (FI, small intestine peristalsis rate and sIgA content in small intestinal digesta (p<0.05. However, the feed conversion ratio (FCR remained unchanged during the feeding period. LPS and CYP increased intestinal enzyme activity, relative expression of SGLT-1, CaBP-D28k and L-FABP mRNAs (p<0.05. LPS and CYP injection had a negative effect on the growth performance of healthy broiler chickens. The present study demonstrated that NV and CV could improve growth performance while enzyme activity in small intestine and relative expression of nutrient transporter mRNA of NV and CV were decreased in the conditions of a controlled rational feeding environment. It is generally recommended that broilers only need to be vaccinated for the diseases to which they might be exposed.
Implantable enzyme amperometric biosensors.
Kotanen, Christian N; Moussy, Francis Gabriel; Carrara, Sandro; Guiseppi-Elie, Anthony
2012-05-15
The implantable enzyme amperometric biosensor continues as the dominant in vivo format for the detection, monitoring and reporting of biochemical analytes related to a wide range of pathologies. Widely used in animal studies, there is increasing emphasis on their use in diabetes care and management, the management of trauma-associated hemorrhage and in critical care monitoring by intensivists in the ICU. These frontier opportunities demand continuous indwelling performance for up to several years, well in excess of the currently approved seven days. This review outlines the many challenges to successful deployment of chronically implantable amperometric enzyme biosensors and emphasizes the emerging technological approaches in their continued development. The foreign body response plays a prominent role in implantable biotransducer failure. Topics considering the approaches to mitigate the inflammatory response, use of biomimetic chemistries, nanostructured topographies, drug eluting constructs, and tissue-to-device interface modulus matching are reviewed. Similarly, factors that influence biotransducer performance such as enzyme stability, substrate interference, mediator selection and calibration are reviewed. For the biosensor system, the opportunities and challenges of integration, guided by footprint requirements, the limitations of mixed signal electronics, and power requirements, has produced three systems approaches. The potential is great. However, integration along the multiple length scales needed to address fundamental issues and integration across the diverse disciplines needed to achieve success of these highly integrated systems, continues to be a challenge in the development and deployment of implantable amperometric enzyme biosensor systems. Copyright © 2012 Elsevier B.V. All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Franz, A., LLNL
1998-02-17
The numerical simulation of chemically reacting flows is a topic, that has attracted a great deal of current research At the heart of numerical reactive flow simulations are large sets of coupled, nonlinear Partial Differential Equations (PDES). Due to the stiffness that is usually present, explicit time differencing schemes are not used despite their inherent simplicity and efficiency on parallel and vector machines, since these schemes require prohibitively small numerical stepsizes. Implicit time differencing schemes, although possessing good stability characteristics, introduce a great deal of computational overhead necessary to solve the simultaneous algebraic system at each timestep. This thesis examines an algorithm based on a preconditioned time differencing scheme. The algorithm is explicit and permits a large stable time step. An investigation of the algorithm`s accuracy, stability and performance on a parallel architecture is presented
Directory of Open Access Journals (Sweden)
Yeu-Long Jiang
2013-08-01
Full Text Available This paper describes a light-addressed electrolytic system used to perform an electrodeposition of enzyme-entrapped chitosan membranes for multiplexed enzyme-based bioassays using a digital micromirror device (DMD. In this system, a patterned light illumination is projected onto a photoconductive substrate serving as a photo-cathode to electrolytically produce hydroxide ions, which leads to an increased pH gradient. The high pH generated at the cathode can cause a local gelation of chitosan through sol-gel transition. By controlling the illumination pattern on the DMD, a light-addressed electrodeposition of chitosan membranes with different shapes and sizes, as well as multiplexed micropatterning, was performed. The effect of the illumination time of the light pattern on the dimensional resolution of chitosan membrane formation was examined experimentally. Moreover, multiplexed enzyme-based bioassay of enzyme-entrapped chitosan membranes was also successfully demonstrated through the electrodeposition of the chitosan membranes with various shapes/sizes and entrapping different enzymes. As a model experiment, glucose and ethanol were simultaneously detected in a single detection chamber without cross-talk using shape-coded chitosan membranes entrapped with glucose oxidase (GOX, peroxidase (POD, and Amplex Red (AmR or alcohol oxidase (AOX, POD, and AmR by using same fluorescence indicator (AmR.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Spatially parallel processing of within-dimension conjunctions.
Linnell, K J; Humphreys, G W
2001-01-01
Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.
Kinematics analysis of a novel planar parallel manipulator with kinematic redundancy
Energy Technology Data Exchange (ETDEWEB)
Qu, Haibo; Guo, Sheng [Beijing Jiaotong University, Beijing (China)
2017-04-15
In this paper, a novel planar parallel manipulator with kinematic redundancy is proposed. First, the Degrees of freedom (DOF) of the whole parallel manipulator and the Relative DOF (RDOF) between the moving platform and fixed base are studied. The results indicate that the proposed mechanism is kinematically redundant. Then, the kinematics, Jacobian matrices and workspace of this proposed parallel manipulator with kinematic redundancy are analyzed. Finally, the statics simulation of the proposed parallel manipulator is performed. The obtained stress and displacement distribution can be used to determine the easily destroyed place in the mechanism configurations.
Kinematics analysis of a novel planar parallel manipulator with kinematic redundancy
International Nuclear Information System (INIS)
Qu, Haibo; Guo, Sheng
2017-01-01
In this paper, a novel planar parallel manipulator with kinematic redundancy is proposed. First, the Degrees of freedom (DOF) of the whole parallel manipulator and the Relative DOF (RDOF) between the moving platform and fixed base are studied. The results indicate that the proposed mechanism is kinematically redundant. Then, the kinematics, Jacobian matrices and workspace of this proposed parallel manipulator with kinematic redundancy are analyzed. Finally, the statics simulation of the proposed parallel manipulator is performed. The obtained stress and displacement distribution can be used to determine the easily destroyed place in the mechanism configurations
DEEPre: sequence-based enzyme EC number prediction by deep learning
Li, Yu
2017-10-20
Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manuallycrafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre\\'s ability to capture the functional difference of enzyme isoforms.The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
DEEPre: sequence-based enzyme EC number prediction by deep learning
Li, Yu; Wang, Sheng; Umarov, Ramzan; Xie, Bingqing; Fan, Ming; Li, Lihua; Gao, Xin
2017-01-01
Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number.We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manuallycrafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms.The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
International Nuclear Information System (INIS)
Obata, Shiro; Yamanishi, Mikio; Katsumi, Shingo
2015-01-01
Preoperative chemoradiotherapy was applied to two cases of advanced rectal cancer. In addition, radiation sensitizers were injected to the lesion endoscopically at a pace of twice a week in order to enhance therapeutic effects (so-called enzyme-targeting and radio-sensitization treatment: KORTUC [Kochi Oxydol Radio-sensitization Treatment for Unresectable Carcinomas]). The flattening of the lesion shape was observed for both cases in a short period of time, then, Mile's and lateral lymphnode dissection was performed. The remnant of lesion was not pointed out in postoperative pathological specimens for both cases, and histological judgment after the treatment was ranked as Grade 3. In light of the better-than-expected results, this hospital is preparing for clinical trials, and planning to carefully accumulate the cases. As one of the curative treatment strategies against advanced rectal cancer, the authors are willing to make this KORTUC more objectively reliable as a safe and minimally invasive therapy. (A.O.)
Xyce parallel electronic simulator release notes.
Energy Technology Data Exchange (ETDEWEB)
Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.
2010-05-01
The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: Hardware and software requirements New features and enhancements Any defects fixed since the last release Current known defects and defect workarounds For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.
A Parallel Processing Algorithm for Remote Sensing Classification
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
Analysis of Retransmission Policies for Parallel Data Transmission
Directory of Open Access Journals (Sweden)
I. A. Halepoto
2018-06-01
Full Text Available Stream control transmission protocol (SCTP is a transport layer protocol, which is efficient, reliable, and connection-oriented as compared to transmission control protocol (TCP and user datagram protocol (UDP. Additionally, SCTP has more innovative features like multihoming, multistreaming and unordered delivery. With multihoming, SCTP establishes multiple paths between a sender and receiver. However, it only uses the primary path for data transmission and the secondary path (or paths for fault tolerance. Concurrent multipath transfer extension of SCTP (CMT-SCTP allows a sender to transmit data in parallel over multiple paths, which increases the overall transmission throughput. Parallel data transmission is beneficial for higher data rates. Parallel transmission or connection is also good in services such as video streaming where if one connection is occupied with errors the transmission continues on alternate links. With parallel transmission, the unordered data packets arrival is very common at receiver. The receiver has to wait until the missing data packets arrive, causing performance degradation while using CMT-SCTP. In order to reduce the transmission delay at the receiver, CMT-SCTP uses intelligent retransmission polices to immediately retransmit the missing packets. The retransmission policies used by CMT-SCTP are RTX-SSTHRESH, RTX-LOSSRATE and RTX-CWND. The main objective of this paper is the performance analysis of the retransmission policies. This paper evaluates RTX-SSTHRESH, RTX-LOSSRATE and RTX-CWND. Simulations are performed on the Network Simulator 2. In the simulations with various scenarios and parameters, it is observed that the RTX-LOSSRATE is a suitable policy.
Massively parallel sparse matrix function calculations with NTPoly
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
A hybrid algorithm for parallel molecular dynamics simulations
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Development of parallel Fokker-Planck code ALLAp
International Nuclear Information System (INIS)
Batishcheva, A.A.; Sigmar, D.J.; Koniges, A.E.
1996-01-01
We report on our ongoing development of the 3D Fokker-Planck code ALLA for a highly collisional scrape-off-layer (SOL) plasma. A SOL with strong gradients of density and temperature in the spatial dimension is modeled. Our method is based on a 3-D adaptive grid (in space, magnitude of the velocity, and cosine of the pitch angle) and a second order conservative scheme. Note that the grid size is typically 100 x 257 x 65 nodes. It was shown in our previous work that only these capabilities make it possible to benchmark a 3D code against a spatially-dependent self-similar solution of a kinetic equation with the Landau collision term. In the present work we show results of a more precise benchmarking against the exact solutions of the kinetic equation using a new parallel code ALLAp with an improved method of parallelization and a modified boundary condition at the plasma edge. We also report first results from the code parallelization using Message Passing Interface for a Massively Parallel CRI T3D platform. We evaluate the ALLAp code performance versus the number of T3D processors used and compare its efficiency against a Work/Data Sharing parallelization scheme and a workstation version
Parallel single-cell analysis microfluidic platform
van den Brink, Floris Teunis Gerardus; Gool, Elmar; Frimat, Jean-Philippe; Bomer, Johan G.; van den Berg, Albert; le Gac, Severine
2011-01-01
We report a PDMS microfluidic platform for parallel single-cell analysis (PaSCAl) as a powerful tool to decipher the heterogeneity found in cell populations. Cells are trapped individually in dedicated pockets, and thereafter, a number of invasive or non-invasive analysis schemes are performed.
Parallel quantum computing in a single ensemble quantum computer
International Nuclear Information System (INIS)
Long Guilu; Xiao, L.
2004-01-01
We propose a parallel quantum computing mode for ensemble quantum computer. In this mode, some qubits are in pure states while other qubits are in mixed states. It enables a single ensemble quantum computer to perform 'single-instruction-multidata' type of parallel computation. Parallel quantum computing can provide additional speedup in Grover's algorithm and Shor's algorithm. In addition, it also makes a fuller use of qubit resources in an ensemble quantum computer. As a result, some qubits discarded in the preparation of an effective pure state in the Schulman-Varizani and the Cleve-DiVincenzo algorithms can be reutilized
Parallel phase model : a programming model for high-end parallel machines with manycores.
Energy Technology Data Exchange (ETDEWEB)
Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian
2009-04-01
This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.
Parallelized event chain algorithm for dense hard sphere and polymer systems
International Nuclear Information System (INIS)
Kampmann, Tobias A.; Boltz, Horst-Holger; Kierfeld, Jan
2015-01-01
We combine parallelization and cluster Monte Carlo for hard sphere systems and present a parallelized event chain algorithm for the hard disk system in two dimensions. For parallelization we use a spatial partitioning approach into simulation cells. We find that it is crucial for correctness to ensure detailed balance on the level of Monte Carlo sweeps by drawing the starting sphere of event chains within each simulation cell with replacement. We analyze the performance gains for the parallelized event chain and find a criterion for an optimal degree of parallelization. Because of the cluster nature of event chain moves massive parallelization will not be optimal. Finally, we discuss first applications of the event chain algorithm to dense polymer systems, i.e., bundle-forming solutions of attractive semiflexible polymers
A Parallel Algebraic Multigrid Solver on Graphics Processing Units
Haase, Gundolf
2010-01-01
The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a singe Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
Parallel simulation of tsunami inundation on a large-scale supercomputer
Oishi, Y.; Imamura, F.; Sugawara, D.
2013-12-01
An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation and the computational power of recent massively parallel supercomputers is helpful to enable faster than real-time execution of a tsunami inundation simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the
Performance response and egg qualities of laying birds fed enzyme ...
African Journals Online (AJOL)
Theperformance response and egg qualities o laying birds fed enzyme supplemented PKC diets asreplacement for maize was investigated wth 210, 20 week old layng pullets of Dominant Black strain at the Teaching and Research Farm of the Delta State University, Asaba Campus, Nigeria. The birds which ust come into ...
Parallel real-time visualization system for large-scale simulation. Application to WSPEEDI
International Nuclear Information System (INIS)
Muramatsu, Kazuhiro; Otani, Takayuki; Kitabata, Hideyuki; Matsumoto, Hideki; Takei, Toshifumi; Doi, Shun
2000-01-01
The real-time visualization system, PATRAS (PArallel TRAcking Steering system) has been developed on parallel computing servers. The system performs almost all of the visualization tasks on a parallel computing server, and uses image data compression technique for efficient communication between the server and the client terminal. Therefore, the system realizes high performance concurrent visualization in an internet computing environment. The experience in applying PATRAS to WSPEEDI (Worldwide version of System for Prediction Environmental Emergency Dose Information) is reported. The application of PATRAS to WSPEEDI enables users to understand behaviours of radioactive tracers from different release points easily and quickly. (author)
International Nuclear Information System (INIS)
Ullmann, P.
2007-06-01
The primary objective of this work was the first experimental realization of parallel RF transmission for accelerating spatially selective excitation in magnetic resonance imaging. Furthermore, basic aspects regarding the performance of this technique were investigated, potential risks regarding the specific absorption rate (SAR) were considered and feasibility studies under application-oriented conditions as first steps towards a practical utilisation of this technique were undertaken. At first, based on the RF electronics platform of the Bruker Avance MRI systems, the technical foundations were laid to perform simultaneous transmission of individual RF waveforms on different RF channels. Another essential requirement for the realization of Parallel Excitation (PEX) was the design and construction of suitable RF transmit arrays with elements driven by separate transmit channels. In order to image the PEX results two imaging methods were implemented based on a spin-echo and a gradient-echo sequence, in which a parallel spatially selective pulse was included as an excitation pulse. In the course of this work PEX experiments were successfully performed on three different MRI systems, a 4.7 T and a 9.4 T animal system and a 3 T human scanner, using 5 different RF coil setups in total. In the last part of this work investigations regarding possible applications of Parallel Excitation were performed. A first study comprised experiments of slice-selective B1 inhomogeneity correction by using 3D-selective Parallel Excitation. The investigations were performed in a phantom as well as in a rat fixed in paraformaldehyde solution. In conjunction with these experiments a novel method of calculating RF pulses for spatially selective excitation based on a so-called Direct Calibration approach was developed, which is particularly suitable for this type of experiments. In the context of these experiments it was demonstrated how to combine the advantages of parallel transmission
Scheduling Parallel Jobs Using Migration and Consolidation in the Cloud
Directory of Open Access Journals (Sweden)
Xiaocheng Liu
2012-01-01
Full Text Available An increasing number of high performance computing parallel applications leverages the power of the cloud for parallel processing. How to schedule the parallel applications to improve the quality of service is the key to the successful host of parallel applications in the cloud. The large scale of the cloud makes the parallel job scheduling more complicated as even simple parallel job scheduling problem is NP-complete. In this paper, we propose a parallel job scheduling algorithm named MEASY. MEASY adopts migration and consolidation to enhance the most popular EASY scheduling algorithm. Our extensive experiments on well-known workloads show that our algorithm takes very good care of the quality of service. For two common parallel job scheduling objectives, our algorithm produces an up to 41.1% and an average of 23.1% improvement on the average response time; an up to 82.9% and an average of 69.3% improvement on the average slowdown. Our algorithm is robust even in terms that it allows inaccurate CPU usage estimation and high migration cost. Our approach involves trivial modification on EASY and requires no additional technique; it is practical and effective in the cloud environment.
A Parallel Prefix Algorithm for Almost Toeplitz Tridiagonal Systems
Sun, Xian-He; Joslin, Ronald D.
1995-01-01
A compact scheme is a discretization scheme that is advantageous in obtaining highly accurate solutions. However, the resulting systems from compact schemes are tridiagonal systems that are difficult to solve efficiently on parallel computers. Considering the almost symmetric Toeplitz structure, a parallel algorithm, simple parallel prefix (SPP), is proposed. The SPP algorithm requires less memory than the conventional LU decomposition and is efficient on parallel machines. It consists of a prefix communication pattern and AXPY operations. Both the computation and the communication can be truncated without degrading the accuracy when the system is diagonally dominant. A formal accuracy study has been conducted to provide a simple truncation formula. Experimental results have been measured on a MasPar MP-1 SIMD machine and on a Cray 2 vector machine. Experimental results show that the simple parallel prefix algorithm is a good algorithm for symmetric, almost symmetric Toeplitz tridiagonal systems and for the compact scheme on high-performance computers.
Casanova, Henri; Robert, Yves
2008-01-01
""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi
A portable, parallel, object-oriented Monte Carlo neutron transport code in C++
International Nuclear Information System (INIS)
Lee, S.R.; Cummings, J.C.; Nolen, S.D.
1997-01-01
We have developed a multi-group Monte Carlo neutron transport code using C++ and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α-eigenvalues and is portable to and runs parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities of MC++ are discussed, along with physics and performance results on a variety of hardware, including all Accelerated Strategic Computing Initiative (ASCI) hardware. Current parallel performance indicates the ability to compute α-eigenvalues in seconds to minutes rather than hours to days. Future plans and the implementation of a general transport physics framework are also discussed
International Nuclear Information System (INIS)
Wellmann, E.; Hrazdina, G.; Grisebach, H.
1976-01-01
Only UV light below 345 nm stimulates anthocyanin formation in dark grown cell suspension cultures of Haplopappus gracilis. A linear relationship between UV dose and flavonoid accumulation, as found previously with parsley cell cultures was not observed with the H.gracilis cells. Only continuous irradiation with high doses of UV was effective. Drastic increases in the activities of the enzymes phenylalanine ammonia-lyase, chalcone isomerase and flavanone synthase were observed under continuous UV light. The increase in enzyme activities paralleled anthocyanin formation. (author)
Extending the POSIX I/O interface: a parallel file system perspective.
Energy Technology Data Exchange (ETDEWEB)
Vilayannur, M.; Lang, S.; Ross, R.; Klundt, R.; Ward, L.; Mathematics and Computer Science; VMWare, Inc.; SNL
2008-12-11
The POSIX interface does not lend itself well to enabling good performance for high-end applications. Extensions are needed in the POSIX I/O interface so that high-concurrency HPC applications running on top of parallel file systems perform well. This paper presents the rationale, design, and evaluation of a reference implementation of a subset of the POSIX I/O interfaces on a widely used parallel file system (PVFS) on clusters. Experimental results on a set of micro-benchmarks confirm that the extensions to the POSIX interface greatly improve scalability and performance.
Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers
Energy Technology Data Exchange (ETDEWEB)
Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)
2016-05-01
The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study
Directory of Open Access Journals (Sweden)
Hari Radhakrishnan
2015-01-01
Full Text Available This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
Directory of Open Access Journals (Sweden)
Dwi Purnomo
2015-08-01
Full Text Available Register is one of the computer components that have a key role in computer organisation. Every computer contains millions of registers that are manifested by flip-flop. This research focuses on the investigation of flip-flop performance based on its type (D, T, S-R, and J-K and architecture (structural, behavioural, and hybrid. Each type of flip-flop on each architecture would be tested in different bit of shift register with parallel load applications. The experiment criteria that will be assessed are power consumption, resources required, memory required, latency, and efficiency. Based on the experiment, it could be shown that D flip-flop and hybrid architecture showed the best performance in required memory, latency, power consumption, and efficiency. In addition, the experiment results showed that the greater the register number, the less efficient the system would be.
Parallel imaging with phase scrambling.
Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel
2015-04-01
Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x(2) - y(2) shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
Xiong, Bo; Wang, Ling-Ling; Li, Qiong; Nie, Yu-Ting; Cheng, Shuang-Shuang; Zhang, Hui; Sun, Ren-Qiang; Wang, Yu-Jiao; Zhou, Hong-Bin
2015-11-01
A parallel microscope-based laser-induced fluorescence (LIF), ultraviolet-visible absorbance (UV) and time-of-flight mass spectrometry (TOF-MS) detection for high performance liquid chromatography (HPLC) was achieved and used to determine glucosamine in urines. First, a reliable and convenient LIF detection was developed based on an inverted microscope and corresponding modulations. Parallel HPLC-LIF/UV/TOF-MS detection was developed by the combination of preceding Microscope-based LIF detection and HPLC coupled with UV and TOF-MS. The proposed setup, due to its parallel scheme, was free of the influence from photo bleaching in LIF detection. Rhodamine B, glutamic acid and glucosamine have been determined to evaluate its performance. Moreover, the proposed strategy was used to determine the glucosamine in urines, and subsequent results suggested that glucosamine, which was widely used in the prevention of the bone arthritis, was metabolized to urines within 4h. Furthermore, its concentration in urines decreased to 5.4mM at 12h. Efficient glucosamine detection was achieved based on a sensitive quantification (LIF), a universal detection (UV) and structural characterizations (TOF-MS). This application indicated that the proposed strategy was sensitive, universal and versatile, and it was capable of improved analysis, especially for analytes with low concentrations in complex samples, compared with conventional HPLC-UV/TOF-MS. Copyright © 2015 Elsevier B.V. All rights reserved.
Fast robot kinematics modeling by using a parallel simulator (PSIM)
International Nuclear Information System (INIS)
El-Gazzar, H.M.; Ayad, N.M.A.
2002-01-01
High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather predication, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing-time for models of very high mathematical structure such as the kinematics molding of robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task-allocations and dividing the whole job into sub tasks. Detailed analysis is performed and results are obtained for the case of six DOF (degree of freedom) robot arms (Stanford Arm). Execution times comparisons between Von Neumann (uni processor) and parallel processor architectures by using parallel simulator package (PSIM) are presented. The gained results are much in favour for the parallel techniques by at least fifty-percent improvements. Of course, further studies are needed to achieve the convenient and optimum number of processors has to be done
Fast robot kinematics modeling by using a parallel simulator (PSIM)
Energy Technology Data Exchange (ETDEWEB)
El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)
2002-09-15
High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather predication, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing-time for models of very high mathematical structure such as the kinematics molding of robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task-allocations and dividing the whole job into sub tasks. Detailed analysis is performed and results are obtained for the case of six DOF (degree of freedom) robot arms (Stanford Arm). Execution times comparisons between Von Neumann (uni processor) and parallel processor architectures by using parallel simulator package (PSIM) are presented. The gained results are much in favour for the parallel techniques by at least fifty-percent improvements. Of course, further studies are needed to achieve the convenient and optimum number of processors has to be done.
Hybrid parallel computing architecture for multiview phase shifting
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained paralleling computing. Experimental results verify that the developed system can perform 50 fps (frame per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using a NVIDIA GT560Ti graphics card rather than a sequential C in a 3.4 GHZ Inter Core i7 3770.
A novel parallel architecture for local histogram equalization
Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan
2005-07-01
Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
Benefits of Parallel I/O in Ab Initio Nuclear Physics Calculations
International Nuclear Information System (INIS)
Laghave, Nikhil; Sosonkina, Masha; Maris, Pieter; Vary, James P.
2009-01-01
Many modern scientific applications rely on highly parallel calculations, which scale to 10's of thousands processors. However, most applications do not concentrate on parallelizing input/output operations. In particular, sequential I/O has been identified as a bottleneck for the highly scalable MFDn (Many Fermion Dynamics for nuclear structure) code performing ab initio nuclear structure calculations. In this paper, we develop interfaces and parallel I/O procedures to use a well-known parallel I/O library in MFDn. As a result, we gain efficient input/output of large datasets along with their portability and ease of use in the downstream processing.
Mikulski, D; Juskiewicz, J; Przybylska-Gornowicz, B; Sosnowska, E; Slominski, B A; Jankowski, J; Zdunczyk, Z
2017-12-01
The aim of this study was to investigate the effect of dietary replacement of soya bean meal (SBM) with faba bean (FB) and a blend of non-starch polysaccharide (NSP) degrading enzymes on the gastrointestinal function, growth performance and welfare of young turkeys (1 to 56 days of age). An experiment with a 2×2 factorial design was performed to compare the efficacy of four diets: a SBM-based diet and a diet containing FB, with and without enzyme supplementation (C, FB, CE and FBE, respectively). In comparison with groups C, higher dry matter content and lower viscosity of the small intestinal digesta were noted in groups FB. The content of short-chain fatty acids (SCFAs) in the small intestinal digesta was higher in groups FB, but SCFA concentrations in the caecal digesta were comparable in groups C and FB. In comparison with control groups, similar BW gains, higher feed conversion ratio (FCR), higher dry matter content of excreta and milder symptoms of footpad dermatitis (FPD) were noted in groups FB. Enzyme supplementation increased the concentrations of acetate, butyrate and total SCFAs, but it did not increase the SCFA pool in the caecal digesta. The enzymatic preparation significantly improved FCR, reduced excreta hydration and the severity of FPD in turkeys. It can be concluded that in comparison with the SBM-based diet, the diet containing 30% of FB enables to achieve comparable BW gains accompanied by lower feed efficiency during the first 8 weeks of rearing. Non-starch polysaccharide-degrading enzymes can be used to improve the nutritional value of diets for young turkeys, but more desirable results of enzyme supplementation were noted in the SBM-based diet than in the FB-based diet.
Selective Adaptive Parallel Interference Cancellation for Multicarrier DS-CDMA Systems
Directory of Open Access Journals (Sweden)
Ahmed El-Sayed El-Mahdy
2013-07-01
Full Text Available In this paper, Selective Adaptive Parallel Interference Cancellation (SA-PIC technique is presented for Multicarrier Direct Sequence Code Division Multiple Access (MC DS-CDMA scheme. The motivation of using SA-PIC is that it gives high performance and at the same time, reduces the computational complexity required to perform interference cancellation. An upper bound expression of the bit error rate (BER for the SA-PIC under Rayleigh fading channel condition is derived. Moreover, the implementation complexities for SA-PIC and Adaptive Parallel Interference Cancellation (APIC are discussed and compared. The performance of SA-PIC is investigated analytically and validated via computer simulations.
Optimization of enzyme parameters for fermentative production of biorenewable fuels and chemicals
Directory of Open Access Journals (Sweden)
Ping Liu
2012-10-01
Full Text Available Microbial biocatalysts such as Escherichia coli and Saccharomyces cerevisiae have been extensively subjected to Metabolic Engineering for the fermentative production of biorenewable fuels and chemicals. This often entails the introduction of new enzymes, deletion of unwanted enzymes and efforts to fine-tune enzyme abundance in order to attain the desired strain performance. Enzyme performance can be quantitatively described in terms of the Michaelis-Menten type parameters Km, turnover number kcat and Ki, which roughly describe the affinity of an enzyme for its substrate, the speed of a reaction and the enzyme sensitivity to inhibition by regulatory molecules. Here we describe examples of where knowledge of these parameters have been used to select, evolve or engineer enzymes for the desired performance and enabled increased production of biorenewable fuels and chemicals. Examples include production of ethanol, isobutanol, 1-butanol and tyrosine and furfural tolerance. The Michaelis-Menten parameters can also be used to judge the cofactor dependence of enzymes and quantify their preference for NADH or NADPH. Similarly, enzymes can be selected, evolved or engineered for the preferred cofactor preference. Examples of exporter engineering and selection are also discussed in the context of production of malate, valine and limonene.
OPTIMIZATION OF ENZYME PARAMETERS FOR FERMENTATIVE PRODUCTION OF BIORENEWABLE FUELS AND CHEMICALS
Directory of Open Access Journals (Sweden)
Laura R. Jarboe
2012-10-01
Full Text Available Microbial biocatalysts such as Escherichia coli and Saccharomyces cerevisiae have been extensively subjected to Metabolic Engineering for the fermentative production of biorenewable fuels and chemicals. This often entails the introduction of new enzymes, deletion of unwanted enzymes and efforts to fine-tune enzyme abundance in order to attain the desired strain performance. Enzyme performance can be quantitatively described in terms of the Michaelis-Menten type parameters Km, turnover number kcat and Ki, which roughly describe the affinity of an enzyme for its substrate, the speed of a reaction and the enzyme sensitivity to inhibition by regulatory molecules. Here we describe examples of where knowledge of these parameters have been used to select, evolve or engineer enzymes for the desired performance and enabled increased production of biorenewable fuels and chemicals. Examples include production of ethanol, isobutanol, 1-butanol and tyrosine and furfural tolerance. The Michaelis-Menten parameters can also be used to judge the cofactor dependence of enzymes and quantify their preference for NADH or NADPH. Similarly, enzymes can be selected, evolved or engineered for the preferred cofactor preference. Examples of exporter engineering and selection are also discussed in the context of production of malate, valine and limonene.
penORNL: a parallel Monte Carlo photon and electron transport package using PENELOPE
International Nuclear Information System (INIS)
Bekar, Kursat B.; Miller, Thomas Martin; Patton, Bruce W.; Weber, Charles F.
2015-01-01
The parallel Monte Carlo photon and electron transport code package penORNL was developed at Oak Ridge National Laboratory to enable advanced scanning electron microscope (SEM) simulations on high-performance computing systems. This paper discusses the implementations, capabilities and parallel performance of the new code package. penORNL uses PENELOPE for its physics calculations and provides all available PENELOPE features to the users, as well as some new features including source definitions specifically developed for SEM simulations, a pulse-height tally capability for detailed simulations of gamma and x-ray detectors, and a modified interaction forcing mechanism to enable accurate energy deposition calculations. The parallel performance of penORNL was extensively tested with several model problems, and very good linear parallel scaling was observed with up to 512 processors. penORNL, along with its new features, will be available for SEM simulations upon completion of the new pulse-height tally implementation.
Brzostek, Edward R.; Finzi, Adrien C.
2012-03-01
Increasing soil temperature has the potential to alter the activity of the extracellular enzymes that mobilize nitrogen (N) from soil organic matter (SOM) and ultimately the availability of N for primary production. Proteolytic enzymes depolymerize N from proteinaceous components of SOM into amino acids, and their activity is a principal driver of the within-system cycle of soil N. The objectives of this study were to investigate whether the soils of temperate forest tree species differ in the temperature sensitivity of proteolytic enzyme activity over the growing season and the role of substrate limitation in regulating temperature sensitivity. Across species and sampling dates, proteolytic enzyme activity had relatively low sensitivity to temperature with a mean activation energy (Ea) of 33.5 kJ mol-1. Ea declined in white ash, American beech, and eastern hemlock soils across the growing season as soils warmed. By contrast, Eain sugar maple soil increased across the growing season. We used these data to develop a species-specific empirical model of proteolytic enzyme activity for the 2009 calendar year and studied the interactive effects of soil temperature (ambient or +5°C) and substrate limitation (ambient or elevated protein) on enzyme activity. Declines in substrate limitation had a larger single-factor effect on proteolytic enzyme activity than temperature, particularly in the spring. There was, however, a large synergistic effect of increasing temperature and substrate supply on proteolytic enzyme activity. Our results suggest limited increases in N availability with climate warming unless there is a parallel increase in the availability of protein substrates.
Visual analysis of inter-process communication for large-scale parallel computing.
Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu
2009-01-01
In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt char t with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.