WorldWideScience

Sample records for parallel factor analysis

  1. Evaluation of Parallel Analysis Methods for Determining the Number of Factors

    Science.gov (United States)

    Crawford, Aaron V.; Green, Samuel B.; Levy, Roy; Lo, Wen-Juo; Scott, Lietta; Svetina, Dubravka; Thompson, Marilyn S.

    2010-01-01

    Population and sample simulation approaches were used to compare the performance of parallel analysis using principal component analysis (PA-PCA) and parallel analysis using principal axis factoring (PA-PAF) to identify the number of underlying factors. Additionally, the accuracies of the mean eigenvalue and the 95th percentile eigenvalue criteria…
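
    As a minimal illustration of the technique these records evaluate, the sketch below (our Python/numpy illustration, not the authors' code) implements Horn-style parallel analysis in its PA-PCA form with both retention criteria; PA-PAF would instead use reduced correlation matrices with communality estimates on the diagonal.

        import numpy as np

        def parallel_analysis(data, n_sims=500, percentile=95, seed=0):
            """Retain factors whose observed eigenvalues exceed those of random data."""
            rng = np.random.default_rng(seed)
            n, p = data.shape
            obs = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
            rand = np.empty((n_sims, p))
            for s in range(n_sims):
                noise = rng.standard_normal((n, p))  # uncorrelated reference data
                rand[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]

            def count(criterion):
                k = 0
                while k < p and obs[k] > criterion[k]:  # stop at the first failure
                    k += 1
                return k

            return {"mean": count(rand.mean(axis=0)),
                    "95th percentile": count(np.percentile(rand, percentile, axis=0))}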

  2. Parallel factor analysis PARAFAC of process affected water

    Energy Technology Data Exchange (ETDEWEB)

    Ewanchuk, A.M.; Ulrich, A.C.; Sego, D. [Alberta Univ., Edmonton, AB (Canada). Dept. of Civil and Environmental Engineering; Alostaz, M. [Thurber Engineering Ltd., Calgary, AB (Canada)

    2010-07-01

    A parallel factor analysis (PARAFAC) of oil sands process-affected water was presented. Naphthenic acids (NA) are traditionally described as monobasic carboxylic acids, but research has indicated that oil sands NA do not fit the classical definition. Oil sands organic acids have toxic and corrosive properties. When analyzed by fluorescence spectroscopy, oil sands process-affected water displays a characteristic peak at 290 nm excitation and approximately 346 nm emission. In this study, PARAFAC was used to decompose process-affected water multi-way data into components representing analytes, chemical compounds, and groups of compounds. Water samples from various oil sands operations were analyzed in order to obtain excitation-emission matrices (EEMs). The EEMs were then arranged into a large matrix, ordered by decreasing process-affected water content, for PARAFAC. The data were resolved into 5 components. A comparison with commercially prepared NA samples suggested that oil sands NA are fundamentally different. Further research is needed to determine what each of the 5 components represents. tabs., figs.
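
    As a minimal sketch of this kind of decomposition (assuming the open-source tensorly library and invented array sizes; the study's actual data pipeline is not shown in this record), a non-negative PARAFAC of a stack of EEMs might look like:

        import numpy as np
        from tensorly.decomposition import non_negative_parafac

        # Hypothetical data cube: 40 samples x 31 excitation x 81 emission wavelengths
        eems = np.random.rand(40, 31, 81)

        # Five trilinear components, as in the study above; non-negativity is the
        # usual constraint for fluorescence intensities.
        weights, factors = non_negative_parafac(eems, rank=5, n_iter_max=500)
        sample_scores, excitation_loadings, emission_loadings = factors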

  3. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments

    NARCIS (Netherlands)

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode,

  4. Sparse Probabilistic Parallel Factor Analysis for the Modeling of PET and Task-fMRI Data

    DEFF Research Database (Denmark)

    Beliveau, Vincent; Papoutsakis, Georgios; Hinrich, Jesper Løve

    2017-01-01

    Modern datasets are often multiway in nature and can contain patterns common to a mode of the data (e.g. space, time, and subjects). Multiway decomposition methods such as parallel factor analysis (PARAFAC) take into account the intrinsic structure of the data, and sparse versions of these methods improv...

  5. Spectral analysis of parallel incomplete factorizations with implicit pseudo-overlap

    NARCIS (Netherlands)

    Magolu monga Made, Mardochée; Vorst, H.A. van der

    2000-01-01

    Two general parallel incomplete factorization strategies are investigated. The techniques may be interpreted as generalized domain decomposition methods. In contrast to classical domain decomposition methods, adjacent subdomains exchange data during the construction of the incomplete

  6. Monitoring organic loading to swimming pools by fluorescence excitation–emission matrix with parallel factor analysis (PARAFAC)

    DEFF Research Database (Denmark)

    Seredynska-Sobecka, Bozena; Stedmon, Colin; Boe-Hansen, Rasmus

    2011-01-01

    Fluorescence Excitation–Emission Matrix spectroscopy combined with parallel factor analysis was employed to monitor water quality and organic contamination in swimming pools. The fluorescence signal of the swimming pool organic matter was low but increased slightly through the day. The analysis...... revealed that the organic matter fluorescence was characterised by five different components, one of which was unique to swimming pool organic matter and one which was specific to organic contamination. The latter component had an emission peak at 420 nm and was found to be a sensitive indicator of organic...... loading in swimming pool water. The fluorescence at 420 nm gradually increased during opening hours and represented material accumulating through the day....

  7. Exploiting fine-grain parallelism in recursive LU factorization

    KAUST Repository

    Dongarra, Jack

    2012-01-01

    The LU factorization is an important numerical algorithm for solving systems of linear equations. This paper proposes a novel approach for computing the LU factorization in parallel on multicore architectures. It improves the overall performance and also achieves the numerical quality of the standard LU factorization with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization, due to its memory-bound characteristic and the atomicity of selecting the appropriate pivots. We remedy this in our new approach to LU factorization of (narrow and tall) panel submatrices. We use a parallel fine-grained recursive formulation of the factorization, based on conflict-free partitioning of the data and lock-less synchronization mechanisms. Our implementation lets the overall computation naturally flow with limited contention. Our recursive panel factorization provides the necessary performance increase for the inherently problematic portion of the LU factorization of square matrices. As our experiments revealed, a large panel width results in a larger Amdahl fraction, which is consistent with related efforts. The performance results of our implementation reveal superlinear speedup and far exceed what can be achieved with equivalent MKL and/or LAPACK routines. © 2012 The authors and IOS Press. All rights reserved.
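
    The recursion at the heart of the method can be sketched compactly (our simplified Python/numpy illustration of recursive LU with partial pivoting, not the paper's tuned multicore code): factor the left half of the panel, update the right half with a triangular solve and a Schur-complement update, then recurse on it.

        import numpy as np

        def rlu(A, ipiv, c0, c1):
            """Factor columns c0:c1 of A in place, LAPACK-style pivoting."""
            if c1 - c0 == 1:
                j = c0
                p = j + int(np.argmax(np.abs(A[j:, j])))  # pivot search in the column
                ipiv[j] = p
                if p != j:
                    A[[j, p], :] = A[[p, j], :]           # swap entire rows
                A[j + 1:, j] /= A[j, j]                   # multipliers below the diagonal
                return
            mid = (c0 + c1) // 2
            rlu(A, ipiv, c0, mid)                         # recurse on the left half
            L11 = np.tril(A[c0:mid, c0:mid], -1) + np.eye(mid - c0)
            A[c0:mid, mid:c1] = np.linalg.solve(L11, A[c0:mid, mid:c1])  # U12 block
            A[mid:, mid:c1] -= A[mid:, c0:mid] @ A[c0:mid, mid:c1]       # Schur update
            rlu(A, ipiv, mid, c1)                         # recurse on the right half

        rng = np.random.default_rng(1)
        A0 = rng.standard_normal((6, 6))
        A, ipiv = A0.copy(), np.zeros(6, dtype=int)
        rlu(A, ipiv, 0, 6)
        L, U = np.tril(A, -1) + np.eye(6), np.triu(A)
        PA = A0.copy()
        for j, p in enumerate(ipiv):                      # apply the row interchanges
            PA[[j, p]] = PA[[p, j]]
        assert np.allclose(L @ U, PA)                     # P*A = L*U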

  8. Metabolic profiling based on two-dimensional J-resolved 1H NMR data and parallel factor analysis

    DEFF Research Database (Denmark)

    Yilmaz, Ali; Nyberg, Nils T; Jaroszewski, Jerzy W.

    2011-01-01

    the intensity variances along the chemical shift axis are taken into account. Here, we describe the use of parallel factor analysis (PARAFAC) as a tool to preprocess a set of two-dimensional J-resolved spectra with the aim of keeping the J-coupling information intact. PARAFAC is a mathematical decomposition......-model was done automatically by evaluating the amount of explained variance and core consistency values. Score plots showing the distribution of objects in relation to each other, and loading plots in the form of two-dimensional pseudo-spectra with the same appearance as the original J-resolved spectra...

  9. Analysis of parallel computing performance of the code MCNP

    International Nuclear Information System (INIS)

    Wang Lei; Wang Kan; Yu Ganglin

    2006-01-01

    Parallel computing can reduce the running time of the code MCNP effectively. With MPI message-passing software, MCNP5 can perform parallel computing on a PC cluster with the Windows operating system. The parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with respect to these factors and gives measures to improve the MCNP parallel computing performance. (authors)

  10. Regional-scale calculation of the LS factor using parallel processing

    Science.gov (United States)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method that maintains the integrity of the results, an optimized workflow that avoids exporting unnecessary intermediate data, and a buffer-communication-computation strategy that improves communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
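
    The local/global distinction can be illustrated with a minimal mpi4py sketch (ours, with invented strip sizes): a local operator such as slope needs only a one-row halo exchange between neighbouring strips of the DEM, whereas global steps such as flow accumulation need further rounds of communication.

        # run with: mpiexec -n 4 python halo_demo.py
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        strip = np.random.rand(100, 400)                 # this process's DEM strip
        up = rank - 1 if rank > 0 else MPI.PROC_NULL
        down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        halo_top, halo_bot = np.zeros(400), np.zeros(400)
        # Exchange boundary rows with neighbours; edge strips keep zero halos here.
        comm.Sendrecv(strip[0], dest=up, recvbuf=halo_bot, source=down)
        comm.Sendrecv(strip[-1], dest=down, recvbuf=halo_top, source=up)

        padded = np.vstack([halo_top, strip, halo_bot])
        dzdy = (padded[2:, :] - padded[:-2, :]) / 2.0    # local slope component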

  11. Exploiting fine-grain parallelism in recursive LU factorization

    KAUST Repository

    Dongarra, Jack; Faverge, Mathieu; Ltaief, Hatem; Luszczek, Piotr R.

    2012-01-01

    is the panel factorization due to its memory-bound characteristic and the atomicity of selecting the appropriate pivots. We remedy this in our new approach to LU factorization of (narrow and tall) panel submatrices. We use a parallel fine-grained recursive

  12. Physics Structure Analysis of Parallel Waves Concept of Physics Teacher Candidate

    International Nuclear Information System (INIS)

    Sarwi, S; Linuwih, S; Supardi, K I

    2017-01-01

    The aim of this research was to find the parallel structure of wave physics concepts and the factors that influence the formation of parallel conceptions among physics teacher candidates. The study used a qualitative method with a cross-sectional design. The subjects were five third-semester students in basic physics and six fifth-semester students in a wave course. Data collection used think-aloud protocols and written tests. Quantitative data were analysed with descriptive percentage techniques, and beliefs and awareness of answers were examined by explanatory analysis. Results of the research include: 1) the structure of the concept can be displayed through the illustration of a map containing the theoretical core, supplements to the theory and phenomena that occur daily; 2) trends toward parallel conceptions of wave physics were identified for stationary waves, resonance of sound and the propagation of transverse electromagnetic waves; 3) parallel conceptions were influenced by less comprehensive reading of textbooks and by partial understanding forming the structure of the theory. (paper)

  13. Fast parallel molecular algorithms for DNA-based computation: factoring integers.

    Science.gov (United States)

    Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui

    2005-06-01

    The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encryption and converts the unrecognizable data back into its original decryption form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for a parallel subtractor, a parallel comparator, and parallel modular arithmetic, and formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.

  14. Parallel processing of structural integrity analysis codes

    International Nuclear Information System (INIS)

    Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

    1996-01-01

    Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising high speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes play an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance, do not require parallelisation of individual modules. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules; here, parallelisation of the equation-solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), a parallel active column solver and the substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

  15. Functional Parallel Factor Analysis for Functions of One- and Two-dimensional Arguments.

    Science.gov (United States)

    Choi, Ji Yeh; Hwang, Heungsun; Timmerman, Marieke E

    2018-03-01

    Parallel factor analysis (PARAFAC) is a useful multivariate method for decomposing three-way data that consist of three different types of entities simultaneously. This method estimates trilinear components, each of which is a low-dimensional representation of a set of entities, often called a mode, to explain the maximum variance of the data. Functional PARAFAC permits the entities in different modes to be smooth functions or curves, varying over a continuum, rather than a collection of unconnected responses. The existing functional PARAFAC methods handle functions of a one-dimensional argument (e.g., time) only. In this paper, we propose a new extension of functional PARAFAC for handling three-way data whose responses are sequenced along both a two-dimensional domain (e.g., a plane with x- and y-axis coordinates) and a one-dimensional argument. Technically, the proposed method combines PARAFAC with basis function expansion approximations, using a set of piecewise quadratic finite element basis functions for estimating two-dimensional smooth functions and a set of one-dimensional basis functions for estimating one-dimensional smooth functions. In a simulation study, the proposed method appeared to outperform the conventional PARAFAC. We apply the method to EEG data to demonstrate its empirical usefulness.

  16. Binocular optical axis parallelism detection precision analysis based on Monte Carlo method

    Science.gov (United States)

    Ying, Jiaju; Liu, Bingqi

    2018-02-01

    Based on the working principle of the digital calibration instrument for binocular photoelectric instrument optical axis parallelism, the various factors affecting system precision are analyzed across all components of the instrument, and a precision analysis model is established. Given the error distributions, the Monte Carlo method is used to analyze the relationship between the comprehensive error and the change of the center coordinate of the circle-target image. The method can further guide error allocation, optimize control of the factors with the greatest influence on the comprehensive error, and improve the measurement accuracy of the optical axis parallelism digital calibration instrument.
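
    Generically, such a Monte Carlo precision analysis draws each component error from its assumed distribution and observes the spread of the resulting image-centre displacement; a sketch with invented error budgets (the paper's actual distributions and values are not given in this record):

        import numpy as np

        rng = np.random.default_rng(42)
        N = 100_000                                   # number of Monte Carlo trials

        # Hypothetical angular error sources, in milliradians.
        collimator = rng.normal(0.0, 0.05, N)         # Gaussian-distributed
        detector = rng.uniform(-0.03, 0.03, N)        # uniformly distributed
        mount = rng.normal(0.0, 0.02, N)

        total = collimator + detector + mount         # comprehensive axis error
        # Shift of the circle-target image centre for an assumed 200 mm focal
        # length: mrad x mm gives micrometres.
        centre_shift_um = total * 200.0

        print(f"std = {total.std():.4f} mrad, 95% interval = "
              f"[{np.percentile(total, 2.5):.4f}, {np.percentile(total, 97.5):.4f}]")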

  17. Impact of Optimization and Parallelism on Factorization Speed of SIQS

    Directory of Open Access Journals (Sweden)

    Dominik Breitenbacher

    2016-06-01

    This paper examines optimization possibilities of the Self-Initialization Quadratic Sieve (SIQS), an enhanced version of the Quadratic Sieve factorization method. SIQS is considered the second fastest factorization method overall and the fastest for numbers shorter than 100 decimal digits. Even so, it does not run in polynomial time, so it is desirable to look for ways to speed the method up as much as possible. Two feasible ways of achieving this are code optimization and parallelism, and both are utilized in this paper. The goal of this paper is to show how to take advantage of parallelism in SIQS and to reach a large speed-up through detailed source code analysis and optimization. Our implementation process consists of two phases. In the first phase, the complete serial algorithm is implemented in the simplest way, without regard for execution speed. The solution from the first phase serves as the reference implementation for further experiments. Factorization speed is improved in the second phase of the SIQS implementation, where we use iterative modifications to examine the contribution of each proposed step. The final optimized version of the SIQS implementation achieves an over 200x speed-up.

  18. Massive Asynchronous Parallelization of Sparse Matrix Factorizations

    Energy Technology Data Exchange (ETDEWEB)

    Chow, Edmond [Georgia Inst. of Technology, Atlanta, GA (United States)

    2018-01-08

    Solving sparse problems is at the core of many DOE computational science applications. We focus on the challenge of developing sparse algorithms that can fully exploit the parallelism in extreme-scale computing systems, in particular systems with massive numbers of cores per node. Our approach is to express a sparse matrix factorization as a large number of bilinear constraint equations and then to solve these equations via an asynchronous iterative method. The unknowns in these equations are the entries of the desired factorization.
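
    The idea can be made concrete with a synchronous, dense-for-clarity Python sketch of the fixed-point iteration (ours; the project's algorithms are asynchronous and sparse): each entry of L and U is repeatedly updated from the bilinear equation (LU)_ij = a_ij restricted to the sparsity pattern, and in the asynchronous setting these updates run concurrently without any ordering.

        import numpy as np

        def iterative_ilu(A, sweeps=10):
            """Fixed-point incomplete LU on the pattern of A (dense for clarity)."""
            n = A.shape[0]
            pattern = list(zip(*np.nonzero(A)))
            U = np.triu(A).astype(float)
            L = np.tril(A, -1) / np.diag(A) + np.eye(n)   # common initial guess
            for _ in range(sweeps):
                for i, j in pattern:       # updates run concurrently in practice
                    m = min(i, j)
                    s = L[i, :m] @ U[:m, j]
                    if i > j:
                        L[i, j] = (A[i, j] - s) / U[j, j]
                    else:
                        U[i, j] = A[i, j] - s
            return L, U

        A = np.array([[4.0, 1, 0], [1, 4, 1], [0, 1, 4]])
        L, U = iterative_ilu(A)
        print(np.abs(L @ U - A).max())     # pattern residual shrinks with sweeps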

  19. Decoupling Principle Analysis and Development of a Parallel Three-Dimensional Force Sensor.

    Science.gov (United States)

    Zhao, Yanzhi; Jiao, Leihao; Weng, Dacheng; Zhang, Dan; Zheng, Rencheng

    2016-09-15

    In the development of multi-dimensional force sensors, dimension coupling is the ubiquitous factor restricting the improvement of measurement accuracy. To effectively reduce the influence of dimension coupling on the parallel multi-dimensional force sensor, a novel parallel three-dimensional force sensor is proposed using a mechanical decoupling principle, and the influence of friction on dimension coupling is effectively reduced by replacing sliding friction with rolling friction. In this paper, the mathematical model is established in combination with the structural model of the parallel three-dimensional force sensor, and the modeling and analysis of mechanical decoupling are carried out. The coupling degree (ε) of the designed sensor is defined and calculated, and the calculation results show that the mechanically decoupled parallel structure of the sensor possesses good decoupling performance. A prototype of the parallel three-dimensional force sensor was developed, and FEM analysis was carried out. A load calibration and data acquisition experiment system was built, and calibration experiments were then performed. According to the calibration experiments, the measurement error is less than 2.86% and the coupling error is less than 3.02%. The experimental results show that the sensor system possesses high measuring accuracy, which provides a basis for applied research on parallel multi-dimensional force sensors.

  20. Parallelization of the Physical-Space Statistical Analysis System (PSAS)

    Science.gov (United States)

    Larson, J. W.; Guo, J.; Lyster, P. M.

    1999-01-01

    Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational
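
    Schematically, the innovation equation is a large symmetric positive-definite system, so its solver reduces to a preconditioned CG loop over operator callbacks, which is also how the factored-operator matrix-vector products enter; a generic sketch (ours, with a simple Jacobi preconditioner standing in for PSAS's actual preconditioner):

        import numpy as np

        def pcg(apply_A, b, apply_M, tol=1e-8, max_iter=500):
            """Preconditioned conjugate gradients with operator callbacks."""
            x = np.zeros_like(b)
            r = b - apply_A(x)
            z = apply_M(r)
            p = z.copy()
            rz = r @ z
            for _ in range(max_iter):
                Ap = apply_A(p)
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                z = apply_M(r)                    # preconditioner application
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        A = np.array([[4.0, 1], [1, 3]])          # small SPD stand-in
        b = np.array([1.0, 2])
        x = pcg(lambda v: A @ v, b, lambda r: r / np.diag(A))
        assert np.allclose(A @ x, b)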

  1. Parallel single-cell analysis microfluidic platform

    NARCIS (Netherlands)

    van den Brink, Floris Teunis Gerardus; Gool, Elmar; Frimat, Jean-Philippe; Bomer, Johan G.; van den Berg, Albert; le Gac, Severine

    2011-01-01

    We report a PDMS microfluidic platform for parallel single-cell analysis (PaSCAl) as a powerful tool to decipher the heterogeneity found in cell populations. Cells are trapped individually in dedicated pockets, and thereafter, a number of invasive or non-invasive analysis schemes are performed.

  2. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    The Discrete Hadamard Transformation (DHT) is widely used in real-time signal processing, but the operation speed of a single DSP limits its throughput. This article studies the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the multiprocessor platform TMS320C80, two kinds of parallel DHT algorithms were implemented. Several experiments demonstrated the effectiveness of the proposed algorithms.
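
    The serial kernel being parallelized is the butterfly-structured fast Walsh-Hadamard transform; a reference sketch in Python (the article's implementations target the TMS320C80, but the structure is the same). The independent blocks of butterflies in the inner loop are what map naturally onto parallel processors.

        import numpy as np

        def fwht(x):
            """Fast Walsh-Hadamard transform; len(x) must be a power of two."""
            x = np.asarray(x, dtype=float).copy()
            h = 1
            while h < len(x):
                for i in range(0, len(x), 2 * h):     # independent butterfly blocks
                    a = x[i:i + h].copy()
                    b = x[i + h:i + 2 * h].copy()
                    x[i:i + h] = a + b                # butterfly sums
                    x[i + h:i + 2 * h] = a - b        # butterfly differences
                h *= 2
            return x

        print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))         # 8-point transform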

  3. Parallelization for X-ray crystal structural analysis program

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

    1997-10-01

    In this report we study vectorization and parallelization for an X-ray crystal structural analysis program. The target machine is the NEC SX-4, a distributed/shared-memory vector-parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, enabling it to achieve 12.0 times the performance of the original code. Beyond this vectorization, parallelization by micro-task functions on the SX-4 reaches a 13.7-fold acceleration in the multi-dimensional discrete Fourier transform part with 14 CPUs, and a 3.0-fold acceleration in the whole program. In total, a 35.9-fold acceleration over the original single-CPU scalar version is achieved with vectorization and parallelization on the SX-4. (author)

  4. HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Energy Technology Data Exchange (ETDEWEB)

    2016-08-22

    NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems for W and H. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementations, our algorithm is also flexible: it performs well for both dense and sparse matrices, and allows the user to choose any one of multiple algorithms for solving the updates to the low rank factors W and H within the alternating iterations.
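
    The alternating scheme is easy to state in serial form; a minimal numpy/scipy sketch of the W/H alternation (without the distributed-memory machinery that is the paper's actual contribution):

        import numpy as np
        from scipy.optimize import nnls

        def anls_nmf(A, rank, iters=50, seed=0):
            """NMF by alternating nonnegative least squares: fix H, solve W; swap."""
            rng = np.random.default_rng(seed)
            m, n = A.shape
            H = rng.random((rank, n))
            W = np.zeros((m, rank))
            for _ in range(iters):
                for i in range(m):                  # min ||A - WH||: one NNLS per row
                    W[i], _ = nnls(H.T, A[i])
                for j in range(n):                  # ... and one NNLS per column
                    H[:, j], _ = nnls(W, A[:, j])
            return W, H

        A = np.abs(np.random.rand(20, 12))
        W, H = anls_nmf(A, rank=3)
        print(np.linalg.norm(A - W @ H))            # reconstruction error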

  5. A new scheduling algorithm for parallel sparse LU factorization with static pivoting

    Energy Technology Data Exchange (ETDEWEB)

    Grigori, Laura; Li, Xiaoye S.

    2002-08-20

    In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real world application matrices on an IBM SP RS/6000 distributed memory machine.

  6. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis

  7. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the AM29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system

  8. Parallel interactive data analysis with PROOF

    International Nuclear Information System (INIS)

    Ballintijn, Maarten; Biskup, Marek; Brun, Rene; Canal, Philippe; Feichtinger, Derek; Ganis, Gerardo; Kickinger, Guenter; Peters, Andreas; Rademakers, Fons

    2006-01-01

    The Parallel ROOT Facility, PROOF, enables the analysis of much larger data sets on a shorter time scale. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. The system provides transparent and interactive access to gigabytes today. Being part of the ROOT framework PROOF inherits the benefits of a performant object storage system and a wealth of statistical and visualization tools. This paper describes the data analysis model of ROOT and the latest developments on closer integration of PROOF into that model and the ROOT user environment, e.g. support for PROOF-based browsing of trees stored remotely, and the popular TTree::Draw() interface. We also outline the ongoing developments aimed to improve the flexibility and user-friendliness of the system

  9. Impact analysis on a massively parallel computer

    International Nuclear Information System (INIS)

    Zacharia, T.; Aramayo, G.A.

    1994-01-01

    Advanced mathematical techniques and computer simulation play a major role in evaluating and enhancing the design of beverage cans, industrial, and transportation containers for improved performance. Numerical models are used to evaluate the impact requirements of containers used by the Department of Energy (DOE) for transporting radioactive materials. Many of these models are highly compute-intensive. An analysis may require several hours of computational time on current supercomputers despite the simplicity of the models being studied. As computer simulations and materials databases grow in complexity, massively parallel computers have become important tools. Massively parallel computational research at the Oak Ridge National Laboratory (ORNL) and its application to the impact analysis of shipping containers is briefly described in this paper

  10. Comparing the Effects of Different Smoothing Algorithms on the Assessment of Dimensionality of Ordered Categorical Items with Parallel Analysis.

    Science.gov (United States)

    Debelak, Rudolf; Tran, Ulrich S

    2016-01-01

    Analysis of polychoric correlations via principal component analysis and exploratory factor analysis is a well-known approach to determine the dimensionality of ordered categorical items. However, the application of these approaches has been considered critical due to the possible indefiniteness of the polychoric correlation matrix. A possible solution to this problem is the application of smoothing algorithms. This study compared the effects of three smoothing algorithms, based on the Frobenius norm, the adaptation of the eigenvalues and eigenvectors, and minimum-trace factor analysis, on the accuracy of various variations of parallel analysis by means of a simulation study. We simulated different datasets which varied with respect to the size of the respondent sample, the size of the item set, the underlying factor model, the skewness of the response distributions and the number of response categories in each item. We found that parallel analysis and principal component analysis of smoothed polychoric and Pearson correlations led to the most accurate results in detecting the number of major factors in simulated datasets, compared with the other methods we investigated. Of the methods used for smoothing polychoric correlation matrices, we recommend the algorithm based on minimum-trace factor analysis.
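
    Of the smoothing ideas compared, the eigenvalue-adjustment approach is the simplest to sketch: floor the eigenvalues of the indefinite polychoric matrix and rescale back to unit diagonal (our generic illustration, not necessarily the exact algorithm evaluated in the study):

        import numpy as np

        def smooth_corr(R, eps=1e-6):
            """Make an indefinite correlation matrix positive semi-definite."""
            vals, vecs = np.linalg.eigh(R)
            vals = np.clip(vals, eps, None)           # remove negative eigenvalues
            S = vecs @ np.diag(vals) @ vecs.T
            d = np.sqrt(np.diag(S))
            return S / np.outer(d, d)                 # restore the unit diagonal

        R = np.array([[1.0, 0.9, 0.7],
                      [0.9, 1.0, 0.3],
                      [0.7, 0.3, 1.0]])               # indefinite (det < 0)
        print(np.linalg.eigvalsh(smooth_corr(R)))     # all eigenvalues now >= 0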

  11. Dictionary Learning Based on Nonnegative Matrix Factorization Using Parallel Coordinate Descent

    Directory of Open Access Journals (Sweden)

    Zunyi Tang

    2013-01-01

    Sparse representation of signals via an overcomplete dictionary has recently received much attention as it has produced promising results in various applications. Since nonnegativity of both the signals and the dictionary is required in some applications, for example multispectral data analysis, conventional dictionary learning methods with nonnegativity simply imposed may become inapplicable. In this paper, we propose a novel method for learning a nonnegative, overcomplete dictionary for such a case. This is accomplished by posing the sparse representation of nonnegative signals as a problem of nonnegative matrix factorization (NMF) with a sparsity constraint. By employing the coordinate descent strategy for optimization and extending it to the multivariable case for processing in parallel, we develop a so-called parallel coordinate descent dictionary learning (PCDDL) algorithm, which is structured by iteratively solving two optimization problems: the learning process of the dictionary and the estimating process of the coefficients for constructing the signals. Numerical experiments demonstrate that the proposed algorithm performs better than the conventional nonnegative K-SVD (NN-KSVD) algorithm and several other algorithms for comparison. What is more, its computational consumption is remarkably lower than that of the compared algorithms.

  12. Graph Transformation and Designing Parallel Sparse Matrix Algorithms beyond Data Dependence Analysis

    Directory of Open Access Journals (Sweden)

    H.X. Lin

    2004-01-01

    Algorithms are often parallelized based on data dependence analysis, manually or by means of parallel compilers. Some vector/matrix computations, such as matrix-vector products with simple data dependence structures (data parallelism), can be easily parallelized. For problems with more complicated data dependence structures, parallelization is less straightforward. The data dependence graph is a powerful means for designing and analyzing parallel algorithms. However, for sparse matrix computations, parallelization based solely on exploiting the existing parallelism in an algorithm does not always give satisfactory results. For example, the conventional Gaussian elimination algorithm for the solution of a tri-diagonal system is inherently sequential, so algorithms specifically designed for parallel computation have to be devised. After briefly reviewing different parallelization approaches, a powerful graph formalism for designing parallel algorithms is introduced. This formalism is discussed using a tri-diagonal system as an example. Its application to general matrix computations is also discussed. Its power in designing parallel algorithms beyond the ability of data dependence analysis is shown by means of a new algorithm called ACER (Alternating Cyclic Elimination and Reduction).

  13. Assessment on the leakage hazard of landfill leachate using three-dimensional excitation-emission fluorescence and parallel factor analysis method.

    Science.gov (United States)

    Pan, Hongwei; Lei, Hongjun; Liu, Xin; Wei, Huaibin; Liu, Shufang

    2017-09-01

    A large number of simple and informal landfills exist in developing countries, posing tremendous soil and groundwater pollution threats. Early warning and monitoring of landfill leachate pollution status are of great importance, yet affordable and effective tools and methods are in short supply. In this study, a soil column experiment was performed to simulate the pollution status of leachate using three-dimensional excitation-emission fluorescence (3D-EEMF) and parallel factor analysis (PARAFAC) models. The sum of squared residuals (SSR) and principal component analysis (PCA) were used to determine the optimal number of components for PARAFAC. A one-way analysis of variance showed that the component scores of the soil column leachate were significantly influenced by landfill leachate (p < 0.05), and the ratio of the component scores of leachate-affected soil to those of natural soil could be used to evaluate the leakage status of landfill leachate. Furthermore, a hazard index (HI) and a hazard evaluation standard were established. A case study of the Kaifeng landfill indicated a low hazard (level 5) by use of the HI. In summary, the HI is presented as a tool to evaluate landfill pollution status and to guide municipal solid waste management. Copyright © 2017 Elsevier Ltd. All rights reserved.

  14. Parallel database search and prime factorization with magnonic holographic memory devices

    Energy Technology Data Exchange (ETDEWEB)

    Khitun, Alexander [Electrical and Computer Engineering Department, University of California - Riverside, Riverside, California 92521 (United States)

    2015-12-28

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing phase patterns of arbitrary form to be produced. This makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  15. Parallel database search and prime factorization with magnonic holographic memory devices

    Science.gov (United States)

    Khitun, Alexander

    2015-12-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing phase patterns of arbitrary form to be produced. This makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  16. Parallel database search and prime factorization with magnonic holographic memory devices

    International Nuclear Information System (INIS)

    Khitun, Alexander

    2015-01-01

    In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing phase patterns of arbitrary form to be produced. This makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.

  17. A Comparative Study of the Application of Fluorescence Excitation-Emission Matrices Combined with Parallel Factor Analysis and Nonnegative Matrix Factorization in the Analysis of Zn Complexation by Humic Acids

    Directory of Open Access Journals (Sweden)

    Patrycja Boguta

    2016-10-01

    The main aim of this study was the application of excitation-emission fluorescence matrices (EEMs) combined with two decomposition methods, parallel factor analysis (PARAFAC) and nonnegative matrix factorization (NMF), to study the interaction mechanisms between humic acids (HAs) and Zn(II) over a wide concentration range (0–50 mg·dm−3). The influence of HA properties on Zn(II) complexation was also investigated. Stability constants, quenching degree and complexation capacity were estimated for binding sites found in raw EEM, EEM-PARAFAC and EEM-NMF data using mathematical models. A combination of EEM fluorescence analysis with one of the proposed decomposition methods enabled separation of overlapping binding sites and yielded more accurate calculations of the binding parameters. PARAFAC and NMF processing allowed finding binding sites invisible in a few raw EEM datasets as well as finding totally new maxima attributed to structures of the lowest humification. Decomposed data showed an increase in Zn complexation with an increase in humification, aromaticity and molecular weight of HAs. EEM-PARAFAC analysis also revealed that the most stable compounds were formed by structures containing the highest amounts of nitrogen. The content of oxygen functional groups did not influence the binding parameters, mainly due to stronger competition between the metal cation and protons. EEM spectra coupled with NMF and especially PARAFAC processing gave more adequate assessments of the interactions than raw EEM data and should be especially recommended for modeling complexation processes where the fluorescence intensity (FI) changes are weak or where the processes are interfered with by the presence of other fluorophores.

  18. Design and Transmission Analysis of an Asymmetrical Spherical Parallel Manipulator

    DEFF Research Database (Denmark)

    Wu, Guanglei; Caro, Stéphane; Wang, Jiawei

    2015-01-01

    This paper presents an asymmetrical spherical parallel manipulator and its transmissibility analysis. This manipulator contains a center shaft to both generate a decoupled unlimited-torsion motion and support the mobile platform for high positioning accuracy. This work addresses the transmission analysis and optimal design of the proposed manipulator based on its kinematic analysis. The input and output transmission indices of the manipulator are defined for its optimum design based on the virtual coefficient between the transmission wrenches and twist screws. The sets of optimal parameters are identified and the distribution of the transmission index is visualized. Moreover, a comparative study regarding the performances of the symmetrical spherical parallel manipulators is conducted, and the comparison shows the advantages of the proposed manipulator with respect to its spherical parallel...

  19. Analysis for Parallel Execution without Performing Hardware/Software Co-simulation

    OpenAIRE

    Muhammad Rashid

    2014-01-01

    Hardware/software co-simulation improves the performance of embedded applications by executing the applications on a virtual platform before the actual hardware is available in silicon. However, the virtual platform of the target architecture is often not available during early stages of the embedded design flow. Consequently, analysis for parallel execution without performing hardware/software co-simulation is required. This article presents an analysis methodology for parallel execution of ...

  20. Parallel algorithms for nuclear reactor analysis via domain decomposition method

    International Nuclear Information System (INIS)

    Kim, Yong Hee

    1995-02-01

    In this thesis, the neutron diffusion equation of reactor physics is discretized by the finite difference method and solved on a parallel computer network composed of T-800 transputers. The T-800 transputer is a message-passing MIMD (multiple instruction streams, multiple data streams) architecture. A parallel variant of the Schwarz alternating procedure for overlapping subdomains is developed with domain decomposition. The thesis provides a convergence analysis and improvements to the convergence of the algorithm. The convergence of the parallel Schwarz algorithms with DN (or ND), DD, NN, and mixed pseudo-boundary conditions (a weighted combination of Dirichlet and Neumann conditions) is analyzed for both continuous and discrete models in the two-subdomain case, and various underlying features are explored. The analysis shows that the convergence rate of the algorithm depends strongly on the pseudo-boundary conditions, and that the theoretically best choice is the mixed boundary conditions (MM conditions). It is also shown that there may be a significant discrepancy between continuous-model analysis and discrete-model analysis. In order to accelerate the convergence of the parallel Schwarz algorithm, relaxation of the pseudo-boundary conditions is introduced and a convergence analysis of the algorithm for the two-subdomain case is carried out. The analysis shows that under-relaxation of the pseudo-boundary conditions accelerates the convergence of the parallel Schwarz algorithm if the convergence rate without relaxation is negative, and any relaxation (under or over) decelerates convergence if the convergence rate without relaxation is positive. Numerical implementation of the parallel Schwarz algorithm on an MIMD system requires multi-level iterations: two levels for fixed source problems, three levels for eigenvalue problems. Performance of the algorithm turns out to be very sensitive to the iteration strategy. In general, multi-level iterations provide good performance when
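
    A serial toy version of the alternating Schwarz procedure with Dirichlet pseudo-boundary conditions (the DD variant) can be written for -u'' = 1 on [0, 1] with two overlapping subdomains (our demo, unrelated to the transputer implementation):

        import numpy as np

        def subdomain_solve(n, h, left, right):
            """Direct solve of -u'' = 1 on n interior points, Dirichlet ends."""
            A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
                 - np.diag(np.ones(n - 1), -1)) / h**2
            b = np.ones(n)
            b[0] += left / h**2
            b[-1] += right / h**2
            return np.linalg.solve(A, b)

        N = 99                                # interior grid points on [0, 1]
        h = 1.0 / (N + 1)
        mid, ov = 50, 10                      # split point and overlap width
        u = np.zeros(N + 2)                   # global grid incl. boundary values
        for _ in range(20):                   # alternating Schwarz sweeps
            # subdomain 1: right pseudo-boundary value taken from current iterate
            u[1:mid + ov + 1] = subdomain_solve(mid + ov, h, 0.0, u[mid + ov + 1])
            # subdomain 2: left pseudo-boundary value taken from current iterate
            u[mid - ov + 1:N + 1] = subdomain_solve(N - mid + ov, h, u[mid - ov], 0.0)

        x = np.linspace(0, 1, N + 2)
        print(np.abs(u - 0.5 * x * (1 - x)).max())   # error vs. exact solution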

  1. Supercritical Fluid Chromatography of Drugs: Parallel Factor Analysis for Column Testing in a Wide Range of Operational Conditions

    Science.gov (United States)

    Al-Degs, Yahya; Andri, Bertyl; Thiébaut, Didier; Vial, Jérôme

    2017-01-01

    Retention mechanisms involved in supercritical fluid chromatography (SFC) are influenced by interdependent parameters (temperature, pressure, chemistry of the mobile phase, and nature of the stationary phase), a complexity which makes the selection of a proper stationary phase for a given separation a challenging step. For the first time in SFC studies, Parallel Factor Analysis (PARAFAC) was employed to evaluate the chromatographic behavior of eight different stationary phases in a wide range of chromatographic conditions (temperature, pressure, and gradient elution composition). Design of Experiment was used to optimize experiments involving 14 pharmaceutical compounds present in biological and/or environmental samples and with dissimilar physicochemical properties. The results showed the superiority of PARAFAC for the analysis of the three-way (column × drug × condition) data array over unfolding the multiway array to matrices and performing several classical principal component analyses. Thanks to the PARAFAC components, similarity in columns' function, chromatographic trend of drugs, and correlation between separation conditions could be simply depicted: columns were grouped according to their H-bonding forces, while gradient composition was dominating for condition classification. Also, the number of drugs could be efficiently reduced for columns classification as some of them exhibited a similar behavior, as shown by hierarchical clustering based on PARAFAC components. PMID:28695040

  2. Supercritical Fluid Chromatography of Drugs: Parallel Factor Analysis for Column Testing in a Wide Range of Operational Conditions

    Directory of Open Access Journals (Sweden)

    Ramia Z. Al Bakain

    2017-01-01

    Retention mechanisms involved in supercritical fluid chromatography (SFC) are influenced by interdependent parameters (temperature, pressure, chemistry of the mobile phase, and nature of the stationary phase), a complexity which makes the selection of a proper stationary phase for a given separation a challenging step. For the first time in SFC studies, Parallel Factor Analysis (PARAFAC) was employed to evaluate the chromatographic behavior of eight different stationary phases in a wide range of chromatographic conditions (temperature, pressure, and gradient elution composition). Design of Experiment was used to optimize experiments involving 14 pharmaceutical compounds present in biological and/or environmental samples and with dissimilar physicochemical properties. The results showed the superiority of PARAFAC for the analysis of the three-way (column × drug × condition) data array over unfolding the multiway array to matrices and performing several classical principal component analyses. Thanks to the PARAFAC components, similarity in columns’ function, chromatographic trend of drugs, and correlation between separation conditions could be simply depicted: columns were grouped according to their H-bonding forces, while gradient composition was dominating for condition classification. Also, the number of drugs could be efficiently reduced for columns classification as some of them exhibited a similar behavior, as shown by hierarchical clustering based on PARAFAC components.

  3. Block-Parallel Data Analysis with DIY2

    Energy Technology Data Exchange (ETDEWEB)

    Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-08-30

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.

  4. Characterizing fluorescent dissolved organic matter in a membrane bioreactor via excitation-emission matrix combined with parallel factor analysis.

    Science.gov (United States)

    Maqbool, Tahir; Quang, Viet Ly; Cho, Jinwoo; Hur, Jin

    2016-06-01

    In this study, we successfully tracked the dynamic changes in different constituents of bound extracellular polymeric substances (bEPS), soluble microbial products (SMP), and permeate during the operation of bench-scale membrane bioreactors (MBRs) via fluorescence excitation-emission matrix (EEM) spectroscopy combined with parallel factor analysis (PARAFAC). Three fluorescent groups were identified, including two protein-like components (tryptophan-like C1 and tyrosine-like C2) and one microbial humic-like component (C3). In bEPS, the protein-like components were consistently more dominant than C3 during the MBR operation, while their relative abundance in SMP depended on aeration intensities. C1 of bEPS exhibited a linear correlation (R² = 0.738, p < 0.05) with bEPS amounts in sludge, and C2 was closely related to the stability of sludge. The protein-like components were largely responsible for membrane fouling. Our study suggests that EEM-PARAFAC can be a promising monitoring tool to provide further insight into process evaluation and membrane fouling during MBR operation. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Measurement and analysis on dynamic behaviour of parallel-plate assembly in nuclear reactors

    International Nuclear Information System (INIS)

    Chen Junjie; Guo Changqing; Zou Changchuan

    1997-01-01

    Measurement and analysis of the dynamic behaviour of parallel-plate assemblies in nuclear reactors are explored. An electromagnetic method is presented as a new way of measuring and analysing the dynamic behaviour of the parallel-plate assembly, treated as a structure of multiple parallel beams joined to a single beam. Theoretical analysis and computed dry-modal natural frequencies show good agreement with experimental measurements.

  6. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    Science.gov (United States)

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.

  7. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    Science.gov (United States)

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
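
    In the same spirit as the article's MATLAB and R examples, a Python analogue (our sketch, with an invented toy risk model) of an embarrassingly parallel simulation using the standard-library multiprocessing module:

        import numpy as np
        from multiprocessing import Pool

        def one_replication(seed):
            """One independent simulation run; no communication between workers."""
            rng = np.random.default_rng(seed)
            demand = rng.lognormal(mean=3.0, sigma=0.8)   # illustrative loads
            capacity = rng.normal(loc=25.0, scale=4.0)
            return demand > capacity                      # failure indicator

        if __name__ == "__main__":
            with Pool() as pool:                          # one worker per core
                failures = pool.map(one_replication, range(100_000))
            print("estimated failure probability:", np.mean(failures))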

  8. A g-factor metric for k-t SENSE and k-t PCA based parallel imaging.

    Science.gov (United States)

    Binter, Christian; Ramb, Rebecca; Jung, Bernd; Kozerke, Sebastian

    2016-02-01

    To propose and validate a g-factor formalism for k-t SENSE, k-t PCA and related k-t methods for assessing SNR and temporal fidelity. An analytical gxf-factor formulation in the spatiotemporal frequency domain is derived, enabling assessment of noise and depiction fidelity in both the spatial and frequency domain. Using pseudoreplica analysis of cardiac cine data, the gxf-factor description is validated and example data are used to analyze the performance of k-t methods for various parameter settings. Analytical gxf-factor maps were found to agree well with pseudoreplica analysis for 3x, 5x, and 7x k-t SENSE and k-t PCA. While k-t SENSE resulted in lower average gxf values (gx,avg) in static regions when compared with k-t PCA, k-t PCA yielded lower gx,avg values in dynamic regions. Temporal transfer was better preserved with k-t PCA for increasing undersampling factors. The proposed gxf-factor and temporal transfer formalism allows assessing the noise performance and temporal depiction fidelity of k-t methods including k-t SENSE and k-t PCA. The framework enables quantitative comparison of different k-t methods relative to frame-by-frame parallel imaging reconstruction. © 2015 Wiley Periodicals, Inc.

  9. Locality-Driven Parallel Static Analysis for Power Delivery Networks

    KAUST Repository

    Zeng, Zhiyu

    2011-06-01

    Large VLSI on-chip Power Delivery Networks (PDNs) are challenging to analyze due to the sheer network complexity. In this article, a novel parallel partitioning-based PDN analysis approach is presented. We use the boundary circuit responses of each partition to divide the full grid simulation problem into a set of independent subgrid simulation problems. Instead of solving exact boundary circuit responses, a more efficient scheme is proposed to provide near-exact approximation to the boundary circuit responses by exploiting the spatial locality of the flip-chip-type power grids. This scheme is also used in a block-based iterative error reduction process to achieve fast convergence. Detailed computational cost analysis and performance modeling is carried out to determine the optimal (or near-optimal) number of partitions for parallel implementation. Through the analysis of several large power grids, the proposed approach is shown to have excellent parallel efficiency, fast convergence, and favorable scalability. Our approach can solve a 16-million-node power grid in 18 seconds on an IBM p5-575 processing node with 16 Power5+ processors, which is 18.8X faster than a state-of-the-art direct solver. © 2011 ACM.

  10. Characterization of CDOM from urban waters in Northern-Northeastern China using excitation-emission matrix fluorescence and parallel factor analysis.

    Science.gov (United States)

    Zhao, Ying; Song, Kaishan; Li, Sijia; Ma, Jianhang; Wen, Zhidan

    2016-08-01

    Chromophoric dissolved organic matter (CDOM) plays an important role in aquatic systems, but high concentrations of organic materials are considered pollutants. The fluorescent component characteristics of CDOM in urban waters sampled from Northern and Northeastern China were examined by excitation-emission matrix fluorescence and parallel factor analysis (EEM-PARAFAC) to investigate the source and compositional changes of CDOM across space and pollution levels. One humic-like (C1), one tryptophan-like (C2), and one tyrosine-like component (C3) were identified by PARAFAC. Mean fluorescence intensities of the three CDOM components varied spatially and by pollution level in cities of Northern and Northeastern China during July-August, 2013 and 2014. Principal components analysis (PCA) was conducted to identify the relative distribution of all water samples. Cluster analysis (CA) was also used to categorize the samples into groups of similar pollution levels within a study area. A strong positive linear relationship was revealed between the CDOM absorption coefficient a(254) and the fluorescent components (R² = 0.89), suggesting that fluorescent CDOM components can be applied to monitor water quality in real time, in contrast to traditional approaches. These results demonstrate that EEM-PARAFAC is useful to evaluate the dynamics of CDOM fluorescent components in urban waters from Northern and Northeastern China and that this method has potential applications for monitoring urban water quality in different regions with various hydrological conditions and pollution levels.
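
    For readers who want to try the decomposition step itself, here is a minimal sketch using the open-source tensorly package (not the software used in this study) on a synthetic stand-in for an EEM data cube:

    ```python
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import parafac

    # Synthetic stand-in for an EEM cube:
    # samples x excitation wavelengths x emission wavelengths.
    rng = np.random.default_rng(0)
    eem = tl.tensor(rng.random((40, 25, 60)))

    # Decompose into three trilinear components, mirroring C1-C3 above;
    # tensorly also provides non_negative_parafac, the usual choice for
    # fluorescence data.
    weights, factors = parafac(eem, rank=3, normalize_factors=True)
    scores, excitation, emission = factors
    print(scores.shape, excitation.shape, emission.shape)  # (40, 3) (25, 3) (60, 3)
    ```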

  11. Exploratory factor analysis in Rehabilitation Psychology: a content analysis.

    Science.gov (United States)

    Roberson, Richard B; Elliott, Timothy R; Chang, Jessica E; Hill, Jessica N

    2014-11-01

    Our objective was to examine the use and quality of exploratory factor analysis (EFA) in articles published in Rehabilitation Psychology. Trained raters examined 66 separate exploratory factor analyses in 47 articles published between 1999 and April 2014. The raters recorded the aim of the EFAs, the distributional statistics, sample size, factor retention method(s), extraction and rotation method(s), and whether the pattern coefficients, structure coefficients, and the matrix of association were reported. The primary use of the EFAs was scale development, but the most widely used extraction and rotation method was principal component analysis, with varimax rotation. When determining how many factors to retain, multiple methods (e.g., scree plot, parallel analysis) were used most often. Many articles did not report enough information to allow for the duplication of their results. EFA relies on authors' choices (e.g., factor retention rules, extraction and rotation methods), and few articles adhered to all of the best practices. The current findings are compared to other empirical investigations into the use of EFA in published research. Recommendations for improving EFA reporting practices in rehabilitation psychology research are provided.

  12. State-plane analysis of parallel resonant converter

    Science.gov (United States)

    Oruganti, R.; Lee, F. C.

    1985-01-01

    A method for analyzing the complex operation of a parallel resonant converter is developed, utilizing graphical state-plane techniques. The comprehensive mode analysis uncovers, for the first time, the presence of other complex modes besides the continuous conduction mode and the discontinuous conduction mode and determines their theoretical boundaries. Based on the insight gained from the analysis, a novel, high-frequency resonant buck converter is proposed. The voltage conversion ratio of the new converter is almost independent of load.

  13. A parallel solution for high resolution histological image analysis.

    Science.gov (United States)

    Bueno, G; González, R; Déniz, O; García-Rojo, M; González-García, J; Fernández-Carrobles, M M; Vállez, N; Salido, J

    2012-10-01

    This paper describes a general methodology for developing parallel image processing algorithms based on message passing for high resolution images (on the order of several gigabytes). These algorithms have been applied to histological images and must be executed on massively parallel processing architectures. Advances in new technologies for complete slide digitalization in pathology have been combined with developments in biomedical informatics. However, the efficient use of these digital slide systems is still a challenge. The image processing that these slides are subject to is still limited both in terms of data processed and processing methods. The work presented here focuses on the need to design and develop parallel image processing tools capable of obtaining and analyzing the entire gamut of information included in digital slides. Tools have been developed to assist pathologists in image analysis and diagnosis, and they cover low- and high-level image processing methods applied to histological images. Code portability, reusability and scalability have been tested by using the following parallel computing architectures: distributed memory with massively parallel processors and two networks, INFINIBAND and Myrinet, composed of 17 and 1024 nodes respectively. The proposed parallel framework is a flexible, high-performance solution, and it shows that the efficient processing of digital microscopic images is possible and may offer important benefits to pathology laboratories. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  14. Workspace Analysis for Parallel Robot

    Directory of Open Access Journals (Sweden)

    Ying Sun

    2013-05-01

    As a comparatively new type of robot, the parallel robot possesses advantages that the serial robot does not, such as high rigidity, great load-carrying capacity, small error, high precision, low self-weight/load ratio, good dynamic behavior and easy control; hence its range of application domains keeps extending. To find the workspace of a parallel mechanism, a numerical boundary-searching algorithm based on the inverse kinematic solution and the link-length limits is introduced. This paper analyses the position workspace and orientation workspace of a six-degree-of-freedom parallel robot. The results show that changing the lengths of the branches of the parallel mechanism is the main means of enlarging or reducing its workspace, and that the radius of the moving platform has no effect on the size of the workspace but does change its position.
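
    A minimal sketch of this boundary-searching idea (with hypothetical planar geometry and a fixed platform orientation, rather than the paper's six-degree-of-freedom mechanism): sample candidate platform positions, solve the inverse kinematics for the leg lengths, and keep the positions whose legs stay within their stroke limits.

    ```python
    import numpy as np

    # Hypothetical planar 3-leg mechanism: base joints, platform joints
    # (relative to the platform centre), and leg stroke limits.
    base = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
    plat = np.array([[-0.2, -0.1], [0.2, -0.1], [0.0, 0.2]])
    lmin, lmax = 0.8, 2.2

    # Sample candidate platform positions and keep those whose inverse
    # kinematics (here just leg lengths) respect the link-length limits.
    rng = np.random.default_rng(0)
    pts = rng.uniform([-1.0, -1.0], [3.0, 3.0], size=(20_000, 2))
    legs = np.linalg.norm(pts[:, None, :] + plat[None] - base[None], axis=2)
    inside = np.all((legs >= lmin) & (legs <= lmax), axis=1)
    print("points inside the position workspace:", inside.sum())
    ```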

  15. Optical escape factors for Doppler profiles in spherical, cylindrical and plane parallel geometries

    International Nuclear Information System (INIS)

    Otsuka, Masamoto.

    1977-12-01

    Optical escape factors for Doppler profiles in spherical, cylindrical and plane parallel geometries are tabulated over the range of optical depths from 10^-3 to 10^5. Relations with the known formulae are discussed also. (auth.)

  16. Absorbed dose calibration factors for parallel-plate chambers in high energy photon beams

    International Nuclear Information System (INIS)

    McEwen, M.R.; Duane, S.; Thomas, R.A.S.

    2002-01-01

    An investigation was carried out into the performance of parallel-plate chambers in 60Co and MV photon beams. The aim was to derive calibration factors, investigate chamber-to-chamber variability and provide much-needed information on the use of parallel-plate chambers in high-energy X-ray beams. A set of NE2561/NE2611 reference chambers, calibrated against the primary standard graphite calorimeter, is used for the dissemination of absorbed dose to water. The parallel-plate chambers were calibrated by comparison with the NPL reference chambers in a water phantom. Two types of parallel-plate chamber were investigated - the NACP-02 and the Roos - and measurements were made at 60Co and 6 linac photon energies (6-19 MV). Calibration factors were derived together with polarity corrections. The standard uncertainty in the calibration of a chamber in terms of absorbed dose to water is estimated to be ±0.75%. The results of the polarity measurements were somewhat confusing. One would expect the correction to be small, and previous measurements in electron beams have indicated that there is little variation between chambers of these types. However, some chambers gave unexpectedly large polarity corrections, up to 0.8%. By contrast, the measured polarity correction for a NE2611 chamber was less than 0.13% at all energies. The reason for these large polarity corrections is not clear, but experimental error and linac variations have been ruled out. By combining the calibration data for the different chambers it was possible to obtain experimental kQ factors for the two chamber types. It would appear from the data that the variations between chambers of the same type are random and one can therefore define a generic curve for each chamber type. These are presented in Figure 1, together with equivalent data for two cylindrical chamber types - NE2561/NE2611 and NE2571. As can be seen, there is a clear difference between the curves for the cylindrical chambers and those for the

  17. State-space-based harmonic stability analysis for paralleled grid-connected inverters

    DEFF Research Database (Denmark)

    Wang, Yanbo; Wang, Xiongfei; Chen, Zhe

    2016-01-01

    This paper addresses a state-space-based harmonic stability analysis of a paralleled grid-connected inverters system. A small-signal model of an individual inverter is developed, in which the LCL filter, the equivalent delay of the control system, and the current controller are modeled. Then, the overall small-signal model of the paralleled grid-connected inverters is built. Finally, the state-space-based stability analysis approach is developed to explain the harmonic resonance phenomenon. The eigenvalue traces associated with time delay and coupled grid impedance are obtained, which account for how an unstable inverter produces harmonic resonance and leads to the instability of the whole paralleled system. The proposed approach reveals the contributions of the grid impedance as well as the coupled effect on other grid-connected inverters under different grid conditions. Simulation and experimental results...
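
    The stability test at the heart of such an analysis is easy to state: a continuous-time state-space model dx/dt = Ax is stable when every eigenvalue of A lies in the left half-plane. A toy sketch (the second-order system below is hypothetical, not the paper's inverter model; tracing its eigenvalues as a parameter varies mimics the eigenvalue-trace study):

    ```python
    import numpy as np

    def is_stable(A):
        """Continuous-time stability: all eigenvalues of A must have a
        strictly negative real part."""
        return bool(np.all(np.linalg.eigvals(A).real < 0))

    # Toy second-order system with damping d; d = 0 puts the eigenvalues
    # on the imaginary axis (marginal), d < 0 is unstable.
    for d in (2.0, 0.5, 0.0, -0.5):
        A = np.array([[0.0, 1.0], [-4.0, -d]])
        print(d, np.round(np.linalg.eigvals(A), 3), is_stable(A))
    ```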

  18. Kinematic analysis of parallel manipulators by algebraic screw theory

    CERN Document Server

    Gallardo-Alvarado, Jaime

    2016-01-01

    This book reviews the fundamentals of screw theory concerned with velocity analysis of rigid-bodies, confirmed with detailed and explicit proofs. The author additionally investigates acceleration, jerk, and hyper-jerk analyses of rigid-bodies following the trend of the velocity analysis. With the material provided in this book, readers can extend the theory of screws into the kinematics of optional order of rigid-bodies. Illustrative examples and exercises to reinforce learning are provided. Of particular note, the kinematics of emblematic parallel manipulators, such as the Delta robot as well as the original Gough and Stewart platforms are revisited applying, in addition to the theory of screws, new methods devoted to simplify the corresponding forward-displacement analysis, a challenging task for most parallel manipulators. Stands as the only book devoted to the acceleration, jerk and hyper-jerk (snap) analyses of rigid-body by means of screw theory; Provides new strategies to simplify the forward kinematic...

  19. Parallel Factor-Based Model for Two-Dimensional Direction Estimation

    Directory of Open Access Journals (Sweden)

    Nizar Tayem

    2017-01-01

    Two-dimensional (2D) Direction-of-Arrival (DOA) estimation of elevation and azimuth angles for noncoherent, mixed coherent and noncoherent, and coherent sources using extended three parallel uniform linear arrays (ULAs) is proposed. Most existing schemes have drawbacks in estimating 2D DOAs for multiple narrowband incident sources, as follows: use of a large number of snapshots, an estimation failure problem for elevation and azimuth angles in the range typical of mobile communication, and difficulty estimating coherent sources. Moreover, DOA estimation for multiple sources requires complex pair-matching methods. The algorithm proposed in this paper is based on a first-order data matrix to overcome these problems. The main contributions of the proposed method are as follows: (1) it avoids the estimation failure problem using a new antenna configuration and estimates elevation and azimuth angles for coherent sources; (2) it reduces the estimation complexity by constructing Toeplitz data matrices, which are based on a single or a few snapshots; (3) it derives a parallel factor (PARAFAC) model to avoid pair-matching problems between multiple sources. Simulation results demonstrate the effectiveness of the proposed algorithm.
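
    As an illustration of one ingredient, the snippet below builds a Toeplitz data matrix from a single array snapshot, which is the kind of rank-restoring construction the abstract refers to (the array size and source direction are hypothetical):

    ```python
    import numpy as np
    from scipy.linalg import toeplitz

    # One snapshot from an 8-element ULA observing a single source
    # (half-wavelength spacing, direction 0.3 rad; toy values).
    x = np.exp(1j * np.pi * np.sin(0.3) * np.arange(8))

    # Toeplitz data matrix T[i, j] = x[4 + i - j] built from the single
    # snapshot; stacking such matrices restores rank for coherent sources.
    T = toeplitz(x[4:], x[4::-1])
    print(T.shape)  # (4, 5)
    ```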

  20. Analysis of multigrid methods on massively parallel computers: Architectural implications

    Science.gov (United States)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.

  1. Modeling and Grid impedance Variation Analysis of Parallel Connected Grid Connected Inverter based on Impedance Based Harmonic Analysis

    DEFF Research Database (Denmark)

    Kwon, JunBum; Wang, Xiongfei; Bak, Claus Leth

    2014-01-01

    This paper addresses the harmonic compensation error problem that arises with parallel-connected inverters under the same grid interface conditions, by means of impedance-based analysis and modeling. Unlike the single grid-connected inverter case, it is found that multiple parallel-connected inverters and the grid impedance can influence each other if they each have a harmonic compensation function. The analysis method proposed in this paper is based on the relationship between the overall output impedance and the input impedance of the parallel-connected inverters, where a controller gain design method, which can...

  2. Vacuum Large Current Parallel Transfer Numerical Analysis

    Directory of Open Access Journals (Sweden)

    Enyuan Dong

    2014-01-01

    The stable operation and reliable breaking of large generator currents are a difficult problem in power systems. It can be solved successfully by parallel interrupters and a proper timing sequence with phase-control technology, in which the breaker's control strategy is decided by the opening times of both the first-opening phase and the second-opening phase. A precise model of the transfer current can provide the proper timing sequence for breaking the generator circuit breaker. By analysing transfer-current experiments and data, the real vacuum arc resistance and a precise corrected model of the large transfer-current process are obtained in this paper. The transfer time calculated by the corrected model of the transfer current is very close to the actual transfer time. It can provide guidance for planning a proper timing sequence and breaking the vacuum generator circuit breaker with parallel interrupters.

  3. Parallel O(log n) algorithms for open- and closed-chain rigid multibody systems based on a new mass matrix factorization technique

    Science.gov (United States)

    Fijany, Amir

    1993-01-01

    In this paper, parallel O(log n) algorithms for computation of rigid multibody dynamics are developed. These parallel algorithms are derived by parallelization of new O(n) algorithms for the problem. The underlying feature of these O(n) algorithms is a drastically different strategy for decomposition of interbody force which leads to a new factorization of the mass matrix (M). Specifically, it is shown that a factorization of the inverse of the mass matrix in the form of the Schur complement is derived as M^-1 = C - B*A^-1B, wherein matrices C, A, and B are block tridiagonal matrices. The new O(n) algorithm is then derived as a recursive implementation of this factorization of M^-1. For the closed-chain systems, similar factorizations and O(n) algorithms for computation of the Operational Space Mass Matrix λ and its inverse λ^-1 are also derived. It is shown that these O(n) algorithms are strictly parallel, that is, they are less efficient than other algorithms for serial computation of the problem. But, to our knowledge, they are the only known algorithms that can be parallelized and that lead to both time- and processor-optimal parallel algorithms for the problem, i.e., parallel O(log n) algorithms with O(n) processors. The developed parallel algorithms, in addition to their theoretical significance, are also practical from an implementation point of view due to their simple architectural requirements.
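
    The generic identity behind such a Schur-complement factorization can be checked numerically: for a symmetric block matrix K = [[A, B], [B^T, D]], the lower-right block of K^-1 equals the inverse of the Schur complement D - B^T A^-1 B. The sketch below verifies this generic identity (it does not reproduce the paper's block-tridiagonal construction):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    n = 4

    def spd(scale=10.0):
        """A small, safely diagonally dominant symmetric block."""
        M = np.eye(n) * scale + rng.random((n, n))
        return (M + M.T) / 2

    A, D = spd(), spd()
    B = rng.random((n, n))
    K = np.block([[A, B], [B.T, D]])

    # Lower-right block of K^-1 == inverse of the Schur complement of A.
    S = D - B.T @ np.linalg.inv(A) @ B
    assert np.allclose(np.linalg.inv(K)[n:, n:], np.linalg.inv(S))
    print("Schur-complement identity verified")
    ```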

  4. Asterias: A Parallelized Web-based Suite for the Analysis of Expression and aCGH Data

    Directory of Open Access Journals (Sweden)

    Ramón Díaz-Uriarte

    2007-01-01

    The analysis of expression and CGH arrays plays a central role in the study of complex diseases, especially cancer, including finding markers for early diagnosis and prognosis, choosing an optimal therapy, or increasing our understanding of cancer development and metastasis. Asterias (http://www.asterias.info) is an integrated collection of freely-accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI) and run on a server with 60 CPUs for computation; compared to a desktop or server-based but not parallelized application, parallelization provides speedups of factors up to 50. Most of our applications allow the user to obtain additional information for user-selected genes (chromosomal location, PubMed ids, Gene Ontology terms, etc.) by using clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data (DNMAD); converting between different types of gene/clone and protein identifiers (IDconverter/IDClight); filtering and imputation (preP); finding differentially expressed genes related to patient class and survival data (Pomelo II); searching for models of class prediction (Tnasas); using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity (GeneSrF); searching for molecular signatures and predictive genes with survival data (SignS); and detecting regions of genomic DNA gain or loss (ADaCGH). The capability to send results between different applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications.

  5. Parallelization of Subchannel Analysis Code MATRA

    International Nuclear Information System (INIS)

    Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk

    2014-01-01

    A stand-alone calculation with the MATRA code takes appreciable computing time for thermal-margin calculations, and a considerably longer time is needed to solve whole-core pin-by-pin problems. In addition, improving the computation speed of the MATRA code is strongly required to satisfy the overall performance of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. The improvement is confirmed by comparing the results between the single- and multiple-processor algorithms. The speedup and efficiency are also evaluated when increasing the number of processors. The parallel algorithm was implemented in the subchannel code MATRA using MPI. The performance of the parallel algorithm was verified by comparing the results with those from MATRA with a single processor. It is also noted that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8-core and whole-core problems.
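
    The abstract does not show MATRA's source; below is a generic sketch (Python with mpi4py, purely for illustration, not the MATRA code) of the MPI pattern it describes: each rank processes its slice of the subchannels and the partial results are combined with a collective reduction.

    ```python
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Split the channel indices across ranks.
    n_channels = 10_000
    my_channels = np.array_split(np.arange(n_channels), size)[rank]

    # Stand-in for the per-channel thermal-hydraulic calculation.
    local_result = float(np.sum(np.sqrt(my_channels + 1.0)))

    # Combine the partial results on rank 0.
    total = comm.reduce(local_result, op=MPI.SUM, root=0)
    if rank == 0:
        print("combined result:", total)
    ```

    Such a script would typically be launched with, e.g., mpiexec -n 4 python script.py.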

  6. Analysis of jacobian and singularity of planar parallel robots using screw theory

    Energy Technology Data Exchange (ETDEWEB)

    Choi, Jung Hyun; Lee, Jeh Won; Lee, Hyuk Jin [Yeungnam Univ., Gyeongsan (Korea, Republic of)

    2012-11-15

    The Jacobian and singularity analysis of parallel robots is necessary to analyze robot motion. The derivations of the Jacobian matrix and the singularity configurations are complicated, and the velocity form of the Jacobian matrix has no geometrical meaning. In this study, screw theory is used to derive the Jacobian of parallel robots. The statics form of the Jacobian has a geometrical meaning. In addition, singularity analysis can be performed by using the geometrical values. Furthermore, this study shows that screw theory is applicable to redundantly actuated robots as well as non-redundant robots.
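
    A common numerical companion to such an analysis: near a singular configuration the Jacobian loses rank, so its smallest singular value approaches zero. The sketch below uses a textbook two-link planar arm as a hypothetical stand-in (the paper's Jacobians come from screw theory, not from this formula):

    ```python
    import numpy as np

    def singularity_measure(J):
        """Smallest singular value of the Jacobian; it tends to zero as
        the mechanism approaches a singular configuration."""
        return np.linalg.svd(J, compute_uv=False).min()

    def jacobian(theta1, theta2, l1=1.0, l2=1.0):
        """Velocity Jacobian of a two-link planar arm (toy example)."""
        s1, s12 = np.sin(theta1), np.sin(theta1 + theta2)
        c1, c12 = np.cos(theta1), np.cos(theta1 + theta2)
        return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                         [ l1 * c1 + l2 * c12,  l2 * c12]])

    for t2 in (1.0, 0.1, 0.0):  # theta2 -> 0 is the stretched-out singularity
        print(t2, round(singularity_measure(jacobian(0.5, t2)), 4))
    ```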

  7. Analysis of Retransmission Policies for Parallel Data Transmission

    Directory of Open Access Journals (Sweden)

    I. A. Halepoto

    2018-06-01

    Stream control transmission protocol (SCTP) is a transport layer protocol, which is efficient, reliable, and connection-oriented as compared to transmission control protocol (TCP) and user datagram protocol (UDP). Additionally, SCTP has more innovative features like multihoming, multistreaming and unordered delivery. With multihoming, SCTP establishes multiple paths between a sender and receiver. However, it only uses the primary path for data transmission and the secondary path (or paths) for fault tolerance. The concurrent multipath transfer extension of SCTP (CMT-SCTP) allows a sender to transmit data in parallel over multiple paths, which increases the overall transmission throughput. Parallel data transmission is beneficial for higher data rates. Parallel transmission is also useful in services such as video streaming, where transmission continues on alternate links if one connection suffers errors. With parallel transmission, out-of-order packet arrival is very common at the receiver. The receiver has to wait until the missing data packets arrive, causing performance degradation while using CMT-SCTP. In order to reduce the transmission delay at the receiver, CMT-SCTP uses intelligent retransmission policies to immediately retransmit the missing packets. The retransmission policies used by CMT-SCTP are RTX-SSTHRESH, RTX-LOSSRATE and RTX-CWND. The main objective of this paper is the performance analysis of these retransmission policies. This paper evaluates RTX-SSTHRESH, RTX-LOSSRATE and RTX-CWND. Simulations are performed on the Network Simulator 2. In the simulations with various scenarios and parameters, it is observed that RTX-LOSSRATE is a suitable policy.

  8. Data-Parallel Mesh Connected Components Labeling and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Harrison, Cyrus; Childs, Hank; Gaither, Kelly

    2011-04-10

    We present a data-parallel algorithm for identifying and labeling the connected sub-meshes within a domain-decomposed 3D mesh. The identification task is challenging in a distributed-memory parallel setting because connectivity is transitive and the cells composing each sub-mesh may span many or all processors. Our algorithm employs a multi-stage application of the Union-find algorithm and a spatial partitioning scheme to efficiently merge information across processors and produce a global labeling of connected sub-meshes. Marking each vertex with its corresponding sub-mesh label allows us to isolate mesh features based on topology, enabling new analysis capabilities. We briefly discuss two specific applications of the algorithm and present results from a weak scaling study. We demonstrate the algorithm at concurrency levels up to 2197 cores and analyze meshes containing up to 68 billion cells.
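
    The core primitive of the labeling step is the union-find (disjoint-set) structure: cells that share a face are unioned, and the final roots serve as sub-mesh labels. A minimal serial sketch follows (the paper applies the primitive in multiple stages across processors; this illustrates only the data structure):

    ```python
    class UnionFind:
        """Disjoint-set structure with path halving."""

        def __init__(self, n):
            self.parent = list(range(n))

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]  # path halving
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra != rb:
                self.parent[rb] = ra

    # Six cells; union those that share a face.
    uf = UnionFind(6)
    for a, b in [(0, 1), (1, 2), (4, 5)]:
        uf.union(a, b)
    print([uf.find(i) for i in range(6)])  # [0, 0, 0, 3, 4, 4] -> three sub-meshes
    ```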

  9. [Resolving excitation emission matrix spectroscopy of estuarine CDOM with parallel factor analysis and its application in organic pollution monitoring].

    Science.gov (United States)

    Guo, Wei-Dong; Huang, Jian-Ping; Hong, Hua-Sheng; Xu, Jing; Deng, Xun

    2010-06-01

    The distribution and estuarine behavior of fluorescent components of chromophoric dissolved organic matter (CDOM) in the Jiulong Estuary were determined by fluorescence excitation-emission matrix spectroscopy (EEMs) combined with parallel factor analysis (PARAFAC). The feasibility of these components as tracers for organic pollution in estuarine environments was also evaluated. Four separate fluorescent components were identified by PARAFAC, including three humic-like components (C1: 240, 310/382 nm; C2: 230, 250, 340/422 nm; C4: 260, 390/482 nm) and one protein-like component (C3: 225, 275/342 nm). These results indicated that the UV humic-like peak A area designated by the traditional "peak-picking" method is not a single peak but actually a combination of several fluorescent components, and that it also has inherent links to the so-called marine humic-like peak M and terrestrial humic-like peak C. Component C2, which includes peak M, decreased with increasing salinity in the Jiulong Estuary, demonstrating that peak M cannot be regarded as a specific indicator of the "marine" humic-like component. The two humic-like components C1 and C2 showed addition behavior in the turbidity maximum region (low salinities), suggesting that fluorescent components of CDOM may provide a fast in-situ way to monitor the variation of the degree of organic pollution in estuarine environments.

  10. [Application of excitation-emission matrix spectrum combined with parallel factor analysis in dissolved organic matter in East China Sea].

    Science.gov (United States)

    Lü, Li-Sha; Zhao, Wei-Hong; Miao, Hui

    2013-03-01

    Excitation-emission matrix spectroscopy (EEMs) combined with parallel factor analysis (PARAFAC) was used to examine the fluorescent components of dissolved organic matter (DOM) sampled from the East China Sea in summer and autumn. The type, distribution and origin of the fluorescent dissolved organic matter were also discussed. Three fluorescent components were identified by PARAFAC: a protein-like component C1 (235, 280/330), a terrestrial or marine humic-like component C2 (255, 330/400) and a terrestrial humic-like component C3 (275, 360/480). The good linear relationship between the two humic-like components suggests a common source or some relationship between their chemical constitutions. As a whole, fluorescence intensity levels in the coastal ocean were higher than in the open ocean in the different water layers in both seasons. The relationships of the three components with chlorophyll-a and salinity showed that the DOM in the study area is hardly influenced by living algal matter, but that the freshwater outflow of the Yangtze River might be its source in the Yangtze River estuary in summer. From the above, we can conclude that EEM-PARAFAC modeling is a valuable tool for research on dissolved organic matter.

  11. Parallel Dynamic Analysis of a Large-Scale Water Conveyance Tunnel under Seismic Excitation Using ALE Finite-Element Method

    Directory of Open Access Journals (Sweden)

    Xiaoqing Wang

    2016-01-01

    Parallel analyses of the dynamic responses of a large-scale water conveyance tunnel under seismic excitation are presented in this paper. A full three-dimensional numerical model considering the water-tunnel-soil coupling is established and adopted to investigate the tunnel's dynamic responses. The movement and sloshing of the internal water are simulated using the multi-material Arbitrary Lagrangian Eulerian (ALE) method. Nonlinear fluid–structure interaction (FSI) between the tunnel and the inner water is treated by using the penalty method. Nonlinear soil-structure interaction (SSI) between the soil and the tunnel is dealt with by using a surface-to-surface contact algorithm. To overcome computing power limitations and to deal with such a large-scale calculation, a parallel algorithm based on modified recursive coordinate bisection (MRCB) considering the balance of SSI and FSI loads is proposed and used. The whole simulation is accomplished on Dawning 5000 A using the proposed MRCB-based parallel algorithm optimized to run on supercomputers. The simulation model and the proposed approaches are validated by comparison with the added mass method. Dynamic responses of the tunnel are analyzed and the parallelism is discussed. Besides, factors affecting the dynamic responses are investigated. Good speedup and parallel efficiency show the scalability of the parallel method, and the analysis results can be used to aid in the design of water conveyance tunnels.

  12. Dynamic and Control Analysis of Modular Multi-Parallel Rectifiers (MMR)

    DEFF Research Database (Denmark)

    Zare, Firuz; Ghosh, Arindam; Davari, Pooya

    2017-01-01

    This paper presents a dynamic analysis of a Modular Multi-Parallel Rectifier (MMR) based on state-space modelling and analysis. The proposed topology is suitable for high-power applications and can reduce line-current harmonic emissions significantly. However, a proper controller is required to share and control the current through each rectifier. Mathematical analysis and preliminary simulations have been carried out to verify the proposed controller under different operating conditions.

  13. A Dual Super-Element Domain Decomposition Approach for Parallel Nonlinear Finite Element Analysis

    Science.gov (United States)

    Jokhio, G. A.; Izzuddin, B. A.

    2015-05-01

    This article presents a new domain decomposition method for nonlinear finite element analysis introducing the concept of dual partition super-elements. The method extends ideas from the displacement frame method and is ideally suited for parallel nonlinear static/dynamic analysis of structural systems. In the new method, domain decomposition is realized by replacing one or more subdomains in a "parent system," each with a placeholder super-element, where the subdomains are processed separately as "child partitions," each wrapped by a dual super-element along the partition boundary. The analysis of the overall system, including the satisfaction of equilibrium and compatibility at all partition boundaries, is realized through direct communication between all pairs of placeholder and dual super-elements. The proposed method has particular advantages for matrix solution methods based on the frontal scheme, and can be readily implemented for existing finite element analysis programs to achieve parallelization on distributed memory systems with minimal intervention, thus overcoming memory bottlenecks typically faced in the analysis of large-scale problems. Several examples are presented in this article which demonstrate the computational benefits of the proposed parallel domain decomposition approach and its applicability to the nonlinear structural analysis of realistic structural systems.

  14. Analysis and implementation of LLC-T series parallel resonant ...

    African Journals Online (AJOL)

    A prototype 300 W, 100 kHz converter is designed and built to experimentally demonstrate the dynamic and steady-state performance of the LLC-T series parallel resonant converter. A comparative study is performed between the experimental results and the simulation studies. The analysis shows that the output of converter is ...

  15. Performance Analysis of Parallel Mathematical Subroutine library PARCEL

    International Nuclear Information System (INIS)

    Yamada, Susumu; Shimizu, Futoshi; Kobayashi, Kenichi; Kaburaki, Hideo; Kishida, Norio

    2000-01-01

    The parallel mathematical subroutine library PARCEL (Parallel Computing Elements) has been developed by the Japan Atomic Energy Research Institute to make typical parallelized mathematical codes easy to use in application problems on distributed-memory parallel computers. PARCEL includes routines for linear equations, eigenvalue problems, pseudo-random number generation, and fast Fourier transforms. The performance results for the linear equation routines exhibit good parallelization efficiency on vector, as well as scalar, parallel computers. A comparison of the efficiency results with the PETSc (Portable Extensible Toolkit for Scientific Computation) library is reported. (author)

  16. Kinematic Analysis and Performance Evaluation of Novel PRS Parallel Mechanism

    Science.gov (United States)

    Balaji, K.; Khan, B. Shahul Hamid

    2018-02-01

    In this paper, a novel 3-DoF (degree-of-freedom) PRS (Prismatic-Revolute-Spherical) parallel mechanism has been designed and presented. The combination of straight and arc-type linkages for a 3-DoF parallel mechanism is introduced for the first time. The performance of the mechanisms is evaluated based on indices such as the Minimum Singular Value (MSV), Condition Number (CN), Local Conditioning Index (LCI), Kinematic Configuration Index (KCI) and Global Conditioning Index (GCI). The overall reachable workspace of all the mechanisms is presented. The kinematic measure, dexterity measure and workspace analysis for all the mechanisms have been evaluated and compared.

  17. Digital tomosynthesis parallel imaging computational analysis with shift and add and back projection reconstruction algorithms.

    Science.gov (United States)

    Chen, Ying; Balla, Apuroop; Rayford II, Cleveland E; Zhou, Weihua; Fang, Jian; Cong, Linlin

    2010-01-01

    Digital tomosynthesis is a novel technology that has been developed for various clinical applications. The parallel imaging configuration is utilised in a few tomosynthesis imaging areas such as digital chest tomosynthesis. Recently, parallel imaging configurations for breast tomosynthesis have begun to appear as well. In this paper, we present a computational analysis of impulse response characterisation as the starting point of our research efforts to optimise parallel imaging configurations. Results suggest that impulse response computational analysis is an effective method to compare and optimise imaging configurations.

  18. Screw-System-Based Mobility Analysis of a Family of Fully Translational Parallel Manipulators

    Directory of Open Access Journals (Sweden)

    Ernesto Rodriguez-Leal

    2013-01-01

    This paper investigates the mobility of a family of fully translational parallel manipulators based on screw system analysis by identifying the common constraint and redundant constraints, providing a case study of this approach. The paper presents the branch motion-screws for the 3-RP̲C-Y parallel manipulator, the 3-RCC-Y (or 3-RP̲RC-Y) parallel manipulator, and a newly proposed 3-RP̲C-T parallel manipulator. The paper then determines the sets of platform constraint-screws for each of these three manipulators. The constraints exerted on the platforms of the 3-RP̲C architectures and the 3-RCC-Y manipulator are analyzed using the screw system approach and have been identified as couples. A similarity has been identified in the axes of the couples: they are perpendicular to the R joint axes, but in the former the axes are coplanar with the base and in the latter the axes are perpendicular to the limb. The remaining couples act about the axis that is normal to the base. The motion-screw and constraint-screw system analysis leads to an insightful understanding of the mobility of the platform, which is then obtained by determining the screws reciprocal to the platform constraint-screw sets, resulting in three independent instantaneous translational degrees-of-freedom. To validate the mobility analysis of the three parallel manipulators, the paper includes motion simulations using commercially available kinematics software.

  19. Parallelization of pressure equation solver for incompressible N-S equations

    International Nuclear Information System (INIS)

    Ichihara, Kiyoshi; Yokokawa, Mitsuo; Kaburaki, Hideo.

    1996-03-01

    A pressure equation solver in a code for 3-dimensional incompressible flow analysis has been parallelized by using the red-black SOR method and the PCG method on the Fujitsu VPP500, a vector-parallel computer with distributed memory. For comparison of scalability, the solver using the red-black SOR method has also been parallelized on the Intel Paragon, a scalar-parallel computer with distributed memory. The scalability of the red-black SOR method on both the VPP500 and the Paragon was lost as the number of processor elements increased. The reason for the non-scalability on both systems is the increasing communication time between processor elements. In addition, parallelization by DO-loop division lowers the vectorization efficiency on the VPP500. For an effective implementation on the VPP500, a large-scale problem which holds very long vectorized DO-loops in the parallel program should be solved. The PCG method with the red-black SOR method applied to incomplete LU factorization (red-black PCG) needs more iteration steps than the normal PCG method with forward and backward substitution, in spite of the same number of floating point operations in a DO-loop of the incomplete LU factorization. The parallelized red-black PCG method has fewer merits than the parallelized red-black SOR method when the computational region has fewer grids, because of the low vectorization efficiency of the red-black PCG method. (author)
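
    The red-black ordering is what makes the SOR sweep parallel: points of one colour depend only on points of the other colour, so each colour can be updated simultaneously across processors. A minimal serial sketch for the 2-D Poisson problem -Δu = f (a toy illustration, not the production solver):

    ```python
    import numpy as np

    def red_black_sor_sweep(u, f, h, omega=1.8):
        """One red-black SOR sweep for -Δu = f on a uniform grid with
        Dirichlet boundaries. All points of one colour are independent,
        which is what allows them to be updated in parallel."""
        for colour in (0, 1):
            for i in range(1, u.shape[0] - 1):
                for j in range(1, u.shape[1] - 1):
                    if (i + j) % 2 == colour:
                        gauss_seidel = 0.25 * (u[i - 1, j] + u[i + 1, j]
                                               + u[i, j - 1] + u[i, j + 1]
                                               + h * h * f[i, j])
                        u[i, j] += omega * (gauss_seidel - u[i, j])
        return u

    n = 33
    u, f, h = np.zeros((n, n)), np.ones((n, n)), 1.0 / (n - 1)
    for _ in range(200):
        u = red_black_sor_sweep(u, f, h)
    print(round(u.max(), 5))  # approximate peak of the solution
    ```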

  20. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.

  1. Parallel imports of hospital pharmaceuticals: An empirical analysis of price effects from parallel imports and the design of procurement procedures in the Danish hospital sector

    OpenAIRE

    Hostenkamp, Gisela; Kronborg, Christian; Arendt, Jacob Nielsen

    2012-01-01

    We analyse pharmaceutical imports in the Danish hospital sector. In this market medicines are publicly tendered using first-price sealed-bid procurement auctions. We analyse whether parallel imports have an effect on pharmaceutical prices and whether the way tenders were organised matters for the competitive effect of parallel imports on prices. Our theoretical analysis shows that the design of the procurement rules affects both market structure and pharmaceutical prices. Parallel imports may...

  2. Parallel imaging: is GRAPPA a useful acquisition tool for MR imaging intended for volumetric brain analysis?

    Directory of Open Access Journals (Sweden)

    Frank Anders

    2009-08-01

    Background: The work presented here investigates parallel imaging applied to T1-weighted high-resolution imaging for use in longitudinal volumetric clinical studies involving Alzheimer's disease (AD) and Mild Cognitive Impairment (MCI) patients, in an effort to shorten acquisition times and minimise the risk of motion artefacts caused by patient discomfort and disorientation. The principal question is, "Can parallel imaging be used to acquire images at 1.5 T of sufficient quality to allow volumetric analysis of patient brains?" Methods: Optimisation studies were performed on a young healthy volunteer, and the selected protocol (including the use of two different parallel imaging acceleration factors) was then tested on a cohort of 15 elderly volunteers including MCI and AD patients. In addition to automatic brain segmentation, hippocampus volumes were manually outlined and measured in all patients. The 15 patients were scanned on a second occasion approximately one week later using the same protocol and evaluated in the same manner to test the repeatability of measurements using images acquired with the GRAPPA parallel imaging technique applied to the MPRAGE sequence. Results: Intraclass correlation tests show almost perfect agreement between repeated measurements of both segmented brain parenchyma fraction and regional measurements of the hippocampi. The protocol is suitable for both global and regional volumetric measurement in dementia patients. Conclusion: These results indicate that parallel imaging can be used without detrimental effect on brain tissue segmentation and volumetric measurement and should be considered for both clinical and research studies where longitudinal measurements of brain tissue volumes are of interest.

  3. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.; Rauchwerger, Lawrence

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.

  5. Modeling, analysis, and design of stationary reference frame droop controlled parallel three-phase voltage source inverters

    DEFF Research Database (Denmark)

    Vasquez, Juan Carlos; Guerrero, Josep M.; Savaghebi, Mehdi

    2013-01-01

    Power-electronics-based MicroGrids consist of a number of voltage source inverters (VSIs) operating in parallel. In this paper, the modeling, control design, and stability analysis of parallel-connected three-phase VSIs are derived, including the proposed voltage and current inner control loops. The secondary control restores the frequency and amplitude deviations produced by the primary control. Also, a synchronization algorithm is presented in order to connect the MicroGrid to the grid. Experimental results are provided to validate the performance and robustness of the parallel VSI system control.

  6. Analysis of series resonant converter with series-parallel connection

    Science.gov (United States)

    Lin, Bor-Ren; Huang, Chien-Lan

    2011-02-01

    In this study, a parallel inductor-inductor-capacitor (LLC) resonant converter series-connected on the primary side and parallel-connected on the secondary side is presented for server power supply systems. Based on series resonant behaviour, the power metal-oxide-semiconductor field-effect transistors are turned on at zero voltage switching and the rectifier diodes are turned off at zero current switching. Thus, the switching losses on the power semiconductors are reduced. In the proposed converter, the primary windings of the two LLC converters are connected in series. Thus, the two converters have the same primary currents to ensure that they can supply the balance load current. On the output side, two LLC converters are connected in parallel to share the load current and to reduce the current stress on the secondary windings and the rectifier diodes. In this article, the principle of operation, steady-state analysis and design considerations of the proposed converter are provided and discussed. Experiments with a laboratory prototype with a 24 V/21 A output for server power supply were performed to verify the effectiveness of the proposed converter.

  7. Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir

    2016-01-01

    Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the

  8. New Structural Representation and Digital-Analysis Platform for Symmetrical Parallel Mechanisms

    Directory of Open Access Journals (Sweden)

    Wenao Cao

    2013-05-01

    Full Text Available Abstract An automatic design platform capable of automatic structural analysis, structural synthesis and the application of parallel mechanisms will be a great aid in the conceptual design of mechanisms, though up to now such a platform has only existed as an idea. The work in this paper constitutes part of such a platform. Based on the screw theory and a new structural representation method proposed here which builds a one-to-one correspondence between the strings of representative characters and the kinematic structures of symmetrical parallel mechanisms (SPMs, this paper develops a fully-automatic approach for mobility (degree-of-freedom analysis, and further establishes an automatic digital-analysis platform for SPMs. With this platform, users simply have to enter the strings of representative characters, and the kinematic structures of the SPMs will be generated and displayed automatically, and the mobility and its properties will also be analysed and displayed automatically. Typical examples are provided to show the effectiveness of the approach.

  9. Optimization of headspace experimental factors to determine chlorophenols in water by means of headspace solid-phase microextraction and gas chromatography coupled with mass spectrometry and parallel factor analysis.

    Science.gov (United States)

    Morales, Rocío; Cruz Ortiz, M; Sarabia, Luis A

    2012-11-19

    In this work an analytical procedure based on headspace solid-phase microextraction and gas chromatography coupled with mass spectrometry (HS-SPME-GC/MS) is proposed to determine chlorophenols, with a prior derivatization step to improve analyte volatility and therefore the decision limit (CCα). After optimization, the analytical procedure was applied to analyze river water samples. The following analytes are studied: 2,4-dichlorophenol (2,4-DCP), 2,4,6-trichlorophenol (2,4,6-TrCP), 2,3,4,6-tetrachlorophenol (2,3,4,6-TeCP) and pentachlorophenol (PCP). A D-optimal design is used to study the parameters affecting the HS-SPME process and the derivatization step. Four experimental factors at two levels and one factor at three levels were considered: (i) equilibrium/extraction temperature, (ii) extraction time, (iii) sample volume, (iv) agitation time and (v) equilibrium time. In addition, two interactions between four of them were considered. The D-optimal design enables the reduction of the number of experiments from 48 to 18 while maintaining enough precision in the estimation of the effects. As every analysis took 1 h, the design was blocked over 2 days. The second-order property of the PARAFAC (parallel factor analysis) decomposition avoids the need to fit a new calibration model each time the experimental conditions change. In consequence, the standardized loadings in the sample mode estimated by a PARAFAC decomposition are the response used in the design, because they are proportional to the amount of analyte extracted. It has been found that the block effect is significant and that a 60°C equilibrium temperature together with a 25 min extraction time is necessary to achieve the best extraction of the chlorophenols analyzed. The other factors and interactions were not significant. After that, a calibration based on a PARAFAC2 decomposition provided the following values of CCα: 120, 208, 86 and 39 ng L^-1 for 2,4-DCP, 2,4,6-TrCP, 2,3,4,6-TeCP and PCP respectively for a

  10. Analysis and Modeling of Circulating Current in Two Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Maheshwari, Ram Krishan; Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand

    2015-01-01

    Parallel-connected inverters are gaining attention for high power applications because of the limited power handling capability of the power modules. Moreover, the parallel-connected inverters may have low total harmonic distortion of the ac current if they are operated with interleaved pulse-width modulation (PWM). However, the interleaved PWM causes a circulating current between the inverters, which in turn causes additional losses. A model describing the dynamics of the circulating current is presented in this study, which shows that the circulating current depends on the common-mode voltage. Using this model, the circulating current between two parallel-connected inverters is analysed. The peak and root mean square (rms) values of the normalised circulating current are calculated for different PWM methods, which makes this analysis a valuable tool to design a filter for the circulating current.

  11. Analysis on detection accuracy of binocular photoelectric instrument optical axis parallelism digital calibration instrument

    Science.gov (United States)

    Ying, Jia-ju; Yin, Jian-ling; Wu, Dong-sheng; Liu, Jie; Chen, Yu-dan

    2017-11-01

    Low-light-level night vision devices and thermal infrared imaging binocular photoelectric instruments are widely used. Misalignment of the parallelism of a binocular instrument's ocular axes causes symptoms such as dizziness and nausea in the observer after prolonged use. A digital calibration instrument for binocular photoelectric equipment was developed to detect ocular-axis parallelism, so that the optical axis deviation can be measured quantitatively. As a testing instrument, its precision must be much higher than that of the instruments under test. This paper analyses the factors that influence detection accuracy. Such factors exist in each link of the testing process and can be divided into two categories: factors that directly affect the position of the reticle image, and factors that affect the calculation of the centre of the reticle image. The synthesized error is then calculated, and the errors are further distributed reasonably to ensure the accuracy of the calibration instrument.

  12. Implementation and analysis of a Navier-Stokes algorithm on parallel computers

    Science.gov (United States)

    Fatoohi, Raad A.; Grosch, Chester E.

    1988-01-01

    The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm maps onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit-serial processors; the Flex/32, an MIMD machine with 20 processors; and the Cray/2, a vector machine. The implementation of the algorithm is discussed in relation to these architectures, and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

  13. A model for optimizing file access patterns using spatio-temporal parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Boonthanome, Nouanesengsy [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Patchett, John [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Geveci, Berk [Kitware Inc., Clifton Park, NY (United States); Ahrens, James [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bauer, Andy [Kitware Inc., Clifton Park, NY (United States); Chaudhary, Aashish [Kitware Inc., Clifton Park, NY (United States); Miller, Ross G. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-01-01

    For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
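
    The read-time model itself is not reproduced in this record; the toy function below only illustrates the general shape of such a model, in which bandwidth is limited by the number of storage targets actually engaged and each request pays a fixed latency. All names and constants are assumptions, not the authors' model.

```python
# Toy read-time estimate for a striped parallel filesystem (illustrative only).
def estimated_read_time(n_readers, total_bytes, stripe_count,
                        bandwidth_per_target=500e6, request_latency=0.01):
    """Streaming time scales with the storage targets actually used; each
    reader also pays a per-request latency."""
    effective_streams = min(n_readers, stripe_count)
    streaming = total_bytes / (effective_streams * bandwidth_per_target)
    return streaming + request_latency

# A spatio-temporal decomposition lets groups of readers work on different
# time steps, so each group reads a smaller file with the same stripe width.
whole_file = estimated_read_time(1024, 1e12, stripe_count=64)
per_timestep = estimated_read_time(64, 1e12 / 16, stripe_count=64)
print(whole_file, per_timestep)
```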

  14. Parallelization and scheduling of data intensive particle physics analysis jobs on clusters of PCs

    CERN Document Server

    Ponce, S

    2004-01-01

    Summary form only given. Scheduling policies are proposed for parallelizing data intensive particle physics analysis applications on computer clusters. Particle physics analysis jobs require the analysis of tens of thousands of particle collision events, each event requiring typically 200 ms of processing time and 600 KB of data. Many jobs are launched concurrently by a large number of physicists. At first view, particle physics jobs seem easy to parallelize, since particle collision events can be processed independently of one another. However, since large amounts of data need to be accessed, the real challenge resides in making efficient use of the underlying computing resources. We propose several job parallelization and scheduling policies aimed at reducing job processing times and at increasing the sustainable load of a cluster server. Since particle collision events are usually reused by several jobs, cache-based job splitting strategies considerably increase cluster utilization and reduce job ...

  15. Tracking senescence-induced patterns in leaf litter leachate using parallel factor analysis (PARAFAC) modeling and self-organizing maps

    Science.gov (United States)

    Wheeler, K. I.; Levia, D. F.; Hudson, J. E.

    2017-09-01

    In autumn, the dissolved organic matter (DOM) contribution of leaf litter leachate to streams in forested watersheds changes as trees undergo resorption, senescence, and leaf abscission. Despite its biogeochemical importance, little work has investigated how leaf litter leachate DOM changes throughout autumn and how any changes might differ interspecifically and intraspecifically. Since climate change is expected to cause vegetation migration, it is necessary to learn how changes in forest composition could affect DOM inputs via leaf litter leachate. We examined changes in leaf litter leachate fluorescent DOM (FDOM) from American beech (Fagus grandifolia Ehrh.) leaves in Maryland, Rhode Island, Vermont, and North Carolina and from yellow poplar (Liriodendron tulipifera L.) leaves from Maryland. FDOM in leachate samples was characterized by excitation-emission matrices (EEMs). A six-component parallel factor analysis (PARAFAC) model was created to identify components that accounted for the majority of the variation in the data set. Self-organizing maps (SOM) compared the PARAFAC component proportions of leachate samples. Phenophase and species exerted much stronger influence on the determination of a sample's SOM placement than geographic origin. As expected, FDOM from all trees transitioned from more protein-like components to more humic-like components with senescence. Percent greenness of sampled leaves and the proportion of tyrosine-like component 1 were found to be significantly different between the two genetic beech clusters, suggesting differences in photosynthesis and resorption. Our results highlight the need to account for interspecific and intraspecific variations in leaf litter leachate FDOM throughout autumn when examining the influence of allochthonous inputs to streams.
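
    A minimal sketch of the SOM comparison step, assuming each leachate sample is represented by its six PARAFAC component proportions and using the third-party minisom package; the grid size, training parameters, and placeholder data are illustrative assumptions.

```python
# Map samples (rows of six PARAFAC component proportions) onto a SOM grid.
import numpy as np
from minisom import MiniSom

proportions = np.random.dirichlet(np.ones(6), size=120)   # placeholder samples
som = MiniSom(8, 8, input_len=6, sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(proportions, num_iteration=5000)

# Each sample's best-matching node; samples with similar FDOM composition
# (e.g. same phenophase or species) should land on nearby nodes.
placements = [som.winner(p) for p in proportions]
```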

  16. Nuclear respiratory factor 2 regulates the expression of the same NMDA receptor subunit genes as NRF-1: both factors act by a concurrent and parallel mechanism to couple energy metabolism and synaptic transmission.

    Science.gov (United States)

    Priya, Anusha; Johar, Kaid; Wong-Riley, Margaret T T

    2013-01-01

    Neuronal activity and energy metabolism are tightly coupled processes. Previously, we found that nuclear respiratory factor 1 (NRF-1) transcriptionally co-regulates energy metabolism and neuronal activity by regulating all 13 subunits of the critical energy generating enzyme, cytochrome c oxidase (COX), as well as N-methyl-d-aspartate (NMDA) receptor subunits 1 and 2B, GluN1 (Grin1) and GluN2B (Grin2b). We also found that another transcription factor, nuclear respiratory factor 2 (NRF-2 or GA-binding protein) regulates all subunits of COX as well. The goal of the present study was to test our hypothesis that NRF-2 also regulates specific subunits of NMDA receptors, and that it functions with NRF-1 via one of three mechanisms: complementary, concurrent and parallel, or a combination of complementary and concurrent/parallel. By means of multiple approaches, including in silico analysis, electrophoretic mobility shift and supershift assays, in vivo chromatin immunoprecipitation of mouse neuroblastoma cells and rat visual cortical tissue, promoter mutations, real-time quantitative PCR, and western blot analysis, NRF-2 was found to functionally regulate Grin1 and Grin2b genes, but not any other NMDA subunit genes. Grin1 and Grin2b transcripts were up-regulated by depolarizing KCl, but silencing of NRF-2 prevented this up-regulation. On the other hand, over-expression of NRF-2 rescued the down-regulation of these subunits by the impulse blocker TTX. NRF-2 binding sites on Grin1 and Grin2b are conserved among species. Our data indicate that NRF-2 and NRF-1 operate in a concurrent and parallel manner in mediating the tight coupling between energy metabolism and neuronal activity at the molecular level. Copyright © 2012 Elsevier B.V. All rights reserved.

  17. Parallel analysis tools and new visualization techniques for ultra-large climate data set

    Energy Technology Data Exchange (ETDEWEB)

    Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)

    2014-12-10

    ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.

  18. Using parallel factor analysis modeling (PARAFAC) and self-organizing maps to track senescence-induced patterns in leaf litter leachate

    Science.gov (United States)

    Wheeler, K. I.; Levia, D. F., Jr.; Hudson, J. E.

    2017-12-01

    As trees undergo autumnal processes such as resorption, senescence, and leaf abscission, the dissolved organic matter (DOM) contribution of leaf litter leachate to streams changes. However, little research has investigated how the fluorescent DOM (FDOM) changes throughout the autumn and how this differs inter- and intraspecifically. Two of the major impacts of global climate change on forested ecosystems are altered phenology and the restructuring of forest community species and subspecies composition. We examined changes in FDOM in leachate from American beech (Fagus grandifolia Ehrh.) leaves in Maryland, Rhode Island, Vermont, and North Carolina and yellow poplar (Liriodendron tulipifera L.) leaves from Maryland throughout three different phenophases: green, senescing, and freshly abscised. Beech leaves from Maryland and Rhode Island have previously been identified as belonging to one distinct genetic cluster, and beech trees from Vermont and the study site in North Carolina to the other. FDOM in samples was characterized using excitation-emission matrices (EEMs) and a six-component parallel factor analysis (PARAFAC) model was created to identify components. Self-organizing maps (SOMs) were used to visualize variation and patterns in the PARAFAC component proportions of the leachate samples. Phenophase and species had the greatest influence on determining where a sample mapped on the SOM when compared to genetic clusters and geographic origin. Throughout senescence, FDOM from all the trees transitioned from more protein-like components to more humic-like ones. Percent greenness of the sampled leaves and the proportion of the tyrosine-like component 1 were found to significantly differ between the two genetic beech clusters. This suggests possible differences in photosynthesis and resorption between the two genetic clusters of beech. The use of SOMs to visualize differences in patterns of senescence between the different species and genetic ...

  19. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  20. Seasonal characterization of CDOM for lakes in semiarid regions of Northeast China using excitation-emission matrix fluorescence and parallel factor analysis (EEM-PARAFAC)

    Science.gov (United States)

    Zhao, Ying; Song, Kaishan; Wen, Zhidan; Li, Lin; Zang, Shuying; Shao, Tiantian; Li, Sijia; Du, Jia

    2016-03-01

    The seasonal characteristics of fluorescent components in chromophoric dissolved organic matter (CDOM) for lakes in the semiarid region of Northeast China were examined by excitation-emission matrix (EEM) spectra and parallel factor analysis (PARAFAC). Two humic-like (C1 and C2) and two protein-like (C3 and C4) components were identified using PARAFAC. The average fluorescence intensity of the four components differed under seasonal variation from June and August 2013 to February and April 2014. Components 1 and 2 exhibited a strong linear correlation (R2 = 0.628). Significantly positive linear relationships were found between the CDOM absorption coefficient a(254) (R2 = 0.72, 0.46) and dissolved organic carbon (DOC). However, almost no obvious correlation was found between salinity and the EEM-PARAFAC-extracted components except for C3 (R2 = 0.469). Results from this investigation demonstrate that the EEM-PARAFAC technique can be used to evaluate the seasonal dynamics of CDOM fluorescent components for inland waters in the semiarid regions of Northeast China, and to quantify CDOM components for other waters with similar environmental conditions.

  1. The relationship of chromophoric dissolved organic matter parallel factor analysis fluorescence and polycyclic aromatic hydrocarbons in natural surface waters.

    Science.gov (United States)

    Li, Sijia; Chen, Ya'nan; Zhang, Jiquan; Song, Kaishan; Mu, Guangyi; Sun, Caiyun; Ju, Hanyu; Ji, Meichen

    2018-01-01

    Polycyclic aromatic hydrocarbons (PAHs), a large group of persistent organic pollutants (POPs), have caused widespread environmental pollution and ecological effects. Chromophoric dissolved organic matter (CDOM), which consists of complex compounds, is often used as a proxy for water quality. An attempt was made to understand the relationships of CDOM absorption parameters and parallel factor analysis (PARAFAC) components with PAHs under seasonal variation in the riverine, reservoir, and urban waters of the Yinma River watershed in 2016. These different types of water bodies provided wide CDOM and PAH concentration ranges, with CDOM absorption coefficients at a wavelength of 350 nm (aCDOM(350)) of 1.17-20.74 m⁻¹ and total PAHs of 0-1829 ng/L. Two fluorescent components, a terrestrial humic-like component (C1) and a tryptophan-like component (C2), were identified in the CDOM excitation-emission matrices (EEMs) using PARAFAC. Tryptophan-like, protein-associated fluorescence often dominates the EEM signatures of sewage samples. We found that seasonal CDOM EEM-PARAFAC components and PAH concentrations showed a consistent tendency, indicating that PAHs are non-negligible pollutants. The disparities in seasonal CDOM-PAH relationships relate to the similar sources of CDOM and PAHs and to the proportion of PAHs in CDOM. Though overlooked and poorly appreciated, quantifying the relationship between CDOM and PAHs has important implications, because these results simplify ecological and health-based risk assessment of pollutants compared to traditional chemical measurements.

  2. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system and the Paragon system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this part, the vectorization of the General Tokamak Circuit Simulation Program code GTCSP, and the vectorization and parallelization of the Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, the Eddy Current Analysis code EDDYCAL, the Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and the MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of the Monte Carlo N-Particle Transport code MCNP4B2, the Plasma Hydrodynamics code using the Cubic Interpolated Propagation Method PHCIP and the Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of the Monte Carlo N-Particle Transport code MCNP4B2 and the Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)

  3. Parallel computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, D. C.; Murthy, D. V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive; hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic analysis capability on a distributed-memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a three-dimensional unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent are demonstrated using 32 processors. The effects of subtask ordering, problem size and network topology are presented. A comparison to results on a shared-memory computer indicates that higher speedup is achieved on the distributed-memory system.

  4. Analysis of Parallel Burn Without Crossfeed TSTO RLV Architectures and Comparison to Parallel Burn With Crossfeed and Series Burn Architectures

    Science.gov (United States)

    Smith, Garrett; Phillips, Alan

    2002-01-01

    There are currently three dominant TSTO class architectures. These are Series Burn (SB), Parallel Burn with crossfeed (PBw/cf), and Parallel Burn without crossfeed (PBncf). The goal of this study was to determine what factors uniquely affect PBncf architectures, how each of these factors interact, and to determine from a performance perspective whether a PBncf vehicle could be competitive with a PBw/cf or SB vehicle using equivalent technology and assumptions. In all cases, performance was evaluated on a relative basis for a fixed payload and mission by comparing gross and dry vehicle masses of a closed vehicle. Propellant combinations studied were LOX:LH2 propelled orbiter and booster (HH) and LOX:kerosene booster with LOX:LH2 orbiter (KH). The study conclusions were: 1) a PBncf orbiter should be throttled as deeply as possible after launch until the staging point; 2) a detailed structural model is essential to accurate architecture analysis and evaluation; 3) a PBncf TSTO architecture is feasible for systems that stage at Mach 7; 3a) HH architectures can achieve a mass growth relative to PBw/cf of ratio and to the position of the orbiter required to align the nozzle heights at liftoff; 5) thrust-to-weight ratios of 1.3 at liftoff and between 1.0 and 0.9 when staging at Mach 7 appear to be close to ideal for PBncf vehicles; 6) performance for all vehicles studied is better when staged at Mach 7 instead of Mach 5. The study showed that a Series Burn architecture has the lowest gross mass for HH cases, and has the lowest dry mass for KH cases. The potential disadvantages of SB are the required use of an air-start for the orbiter engines and potential CG control issues. A Parallel Burn with crossfeed architecture solves both these problems, but the mechanics of a large bipropellant crossfeed system pose significant technical difficulties. Parallel Burn without crossfeed vehicles start both booster and orbiter engines on the ground and thus avoid both the risk of ...

  5. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Interactions between parallel channels are examined. Results of experimental research on non-stationary flow regimes in three parallel vertical channels are presented, including an analysis of the phenomena and the mechanisms of parallel-channel interaction under adiabatic conditions for single-phase fluid and two-phase mixture flow. (author)

  6. Comprehensive quantification of signal-to-noise ratio and g-factor for image-based and k-space-based parallel imaging reconstructions.

    Science.gov (United States)

    Robson, Philip M; Grant, Aaron K; Madhuranthakam, Ananth J; Lattanzi, Riccardo; Sodickson, Daniel K; McKenzie, Charles A

    2008-10-01

    Parallel imaging reconstructions result in spatially varying noise amplification characterized by the g-factor, precluding conventional measurements of noise from the final image. A simple Monte Carlo based method is proposed for all linear image reconstruction algorithms, which allows measurement of signal-to-noise ratio and g-factor and is demonstrated for SENSE and GRAPPA reconstructions for accelerated acquisitions that have not previously been amenable to such assessment. Only a simple "prescan" measurement of noise amplitude and correlation in the phased-array receiver, and a single accelerated image acquisition are required, allowing robust assessment of signal-to-noise ratio and g-factor. The "pseudo multiple replica" method has been rigorously validated in phantoms and in vivo, showing excellent agreement with true multiple replica and analytical methods. This method is universally applicable to the parallel imaging reconstruction techniques used in clinical applications and will allow pixel-by-pixel image noise measurements for all parallel imaging strategies, allowing quantitative comparison between arbitrary k-space trajectories, image reconstruction, or noise conditioning techniques. (c) 2008 Wiley-Liss, Inc.
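
    The essence of the pseudo multiple replica idea can be sketched in a few lines: inject synthetic noise coloured by the measured coil covariance, repeat the linear reconstruction many times, and read the noise off the pixelwise statistics. The reconstruction function and covariance below are stand-ins for a real pipeline and prescan measurement, not the authors' implementation.

```python
import numpy as np

def pseudo_replica_snr(kspace, recon, Psi, n_replicas=256, seed=0):
    """kspace: (coils, ky, kx) array; recon: any linear reconstruction
    function; Psi: coil noise covariance from a prescan measurement."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Psi)        # colours white noise like the coil array
    replicas = []
    for _ in range(n_replicas):
        n = rng.standard_normal(kspace.shape) + 1j * rng.standard_normal(kspace.shape)
        replicas.append(recon(kspace + np.tensordot(L, n, axes=(1, 0)) / np.sqrt(2)))
    noise_map = np.std(np.abs(np.array(replicas)), axis=0)    # pixelwise noise
    # The g-factor then follows pixel by pixel as SNR_full / (SNR_accel * sqrt(R)).
    return np.abs(recon(kspace)) / noise_map
```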

  7. Combining Compile-Time and Run-Time Parallelization

    Directory of Open Access Journals (Sweden)

    Sungdo Moon

    1999-01-01

    Full Text Available This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high-quality compile-time analysis with low-cost run-time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system. We present results of measurements on programs from two benchmark suites - SPECFP95 and NAS sample benchmarks - which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run-time testing, analysis of control flow, or some combination of the two. We present a new compile-time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed not only to improve the results of compile-time parallelization, but also to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run time when safety cannot be proven statically. We call this approach predicated array data-flow analysis. We augment array data-flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data-flow values. Predicated array data-flow analysis allows the compiler to derive "optimistic" data-flow values guarded by predicates; these predicates can be used to derive a run-time test guaranteeing the safety of parallelization.

  8. Correlation analysis of respiratory signals by using parallel coordinate plots.

    Science.gov (United States)

    Saatci, Esra

    2018-01-01

    The understanding of the bonds and relationships between respiratory signals, i.e. the airflow, the mouth pressure, the relative temperature and the relative humidity during breathing, may allow improvements in the measurement of respiratory mechanics and in sensor design, or the exploration of several possible applications in the analysis of respiratory disorders. Therefore, the main objective of this study was to propose a new combination of methods to determine the relationship between respiratory signals treated as multidimensional data. In order to reveal the coupling between the processes, two very different methods were used: the well-known statistical correlation analysis (i.e. Pearson's correlation and the cross-correlation coefficient) and parallel coordinate plots (PCPs). Curve bundling with the number of intersections for the correlation analysis, a Least Mean Square Time Delay Estimator (LMS-TDE) for point-delay detection, and visual metrics for the recognition of visual structures were proposed and utilized in the PCP. The number of intersections increased when the correlation coefficient changed from high positive to high negative correlation between the respiratory signals, especially if the whole breath was processed. LMS-TDE coefficients plotted in the PCP indicated point-delay results that matched the findings of the correlation analysis well. Visual inspection of the PCP by visual metrics showed ranges, dispersions, entropy comparisons and linear and sinusoidal-like relationships between the respiratory signals. It is demonstrated that basic correlation analysis together with parallel coordinate plots perceptually motivates the visual metrics in the display and thus can be considered an aid to user analysis by providing meaningful views of the data. Copyright © 2017 Elsevier B.V. All rights reserved.
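
    As an illustration of combining the two views, the sketch below computes a Pearson coefficient and draws a parallel-coordinate plot with pandas; the signal names, synthetic waveforms, and phase labels are illustrative assumptions, not the study's data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
from pandas.plotting import parallel_coordinates

t = np.linspace(0, 10, 500)
df = pd.DataFrame({
    "flow": np.sin(2 * np.pi * 0.25 * t),
    "pressure": np.sin(2 * np.pi * 0.25 * t + 0.4),
    "temperature": 0.5 + 0.1 * np.sin(2 * np.pi * 0.25 * t + 1.0),
    "humidity": 0.8 + 0.1 * np.sin(2 * np.pi * 0.25 * t + 1.2),
})
r, p = pearsonr(df["flow"], df["pressure"])      # classical correlation view

# Parallel-coordinate view: one polyline per time sample, coloured by phase.
df["phase"] = np.where(df["flow"] >= 0, "inspiration", "expiration")
parallel_coordinates(df.iloc[::10], class_column="phase", alpha=0.4)
plt.show()
```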

  9. Analysis of a parallel multigrid algorithm

    Science.gov (United States)

    Chan, Tony F.; Tuminaro, Ray S.

    1989-01-01

    The parallel multigrid algorithm of Frederickson and McBryan (1987) is considered. This algorithm uses multiple coarse-grid problems (instead of one problem) in the hope of accelerating convergence and is found to have a close relationship to traditional multigrid methods. Specifically, the parallel coarse-grid correction operator is identical to a traditional multigrid coarse-grid correction operator, except that the mixing of high and low frequencies caused by aliasing error is removed. Appropriate relaxation operators can be chosen to take advantage of this property. Comparisons between the standard multigrid and the new method are made.

  10. PERFORMANCE ANALYSIS BETWEEN EXPLICIT SCHEDULING AND IMPLICIT SCHEDULING OF PARALLEL ARRAY-BASED DOMAIN DECOMPOSITION USING OPENMP

    Directory of Open Access Journals (Sweden)

    MOHAMMED FAIZ ABOALMAALY

    2014-10-01

    Full Text Available With the continuous revolution of multicore architecture, several parallel programming platforms have been introduced in order to pave the way for fast and efficient development of parallel algorithms. Broadly, parallel computing can be done in two forms: Data-Level Parallelism (DLP) or Task-Level Parallelism (TLP). The former is achieved by distributing data among the available processing elements, while the latter is based on executing independent tasks concurrently. Most parallel programming platforms have built-in techniques to distribute the data among processors; these techniques are technically known as automatic distribution (scheduling). However, due to their wide range of purposes, the variation of data types, the amount of distributed data, the possibility of extra computational overhead and other hardware-dependent factors, manual distribution can achieve better outcomes in terms of performance when compared to automatic distribution. In this paper, this assumption is investigated by conducting a comparison between automatic and our newly proposed manual distribution of data among threads in parallel. Empirical results for matrix addition and matrix multiplication show a considerable performance gain when manual distribution is applied instead of automatic distribution.
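
    The paper's experiments use OpenMP; purely as an illustration of the manual-versus-automatic idea, the Python sketch below contrasts handing each worker one contiguous block with letting the pool schedule many small tasks. Sizes, worker counts, and chunking are arbitrary assumptions.

```python
# Manual vs. automatic data distribution for a row-blocked matrix multiply.
# (Relies on fork semantics so workers can see the globals A and B.)
import numpy as np
from multiprocessing import Pool

A = np.random.rand(1024, 1024)
B = np.random.rand(1024, 1024)

def row_block(bounds):
    i, j = bounds
    return A[i:j] @ B                      # each task multiplies a row block

def manual(n_workers=4):
    edges = np.linspace(0, A.shape[0], n_workers + 1, dtype=int)
    tasks = list(zip(edges[:-1], edges[1:]))          # one block per worker
    with Pool(n_workers) as pool:
        return np.vstack(pool.map(row_block, tasks))

def automatic(n_workers=4):
    tasks = [(i, i + 8) for i in range(0, A.shape[0], 8)]  # pool schedules them
    with Pool(n_workers) as pool:
        return np.vstack(pool.map(row_block, tasks))

if __name__ == "__main__":
    assert np.allclose(manual(), automatic())
```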

  11. Parallel Wavefront Analysis for a 4D Interferometer

    Science.gov (United States)

    Rao, Shanti R.

    2011-01-01

    This software provides a programming interface for automating data collection with a PhaseCam interferometer from 4D Technology, and for distributing the image-processing algorithm across a cluster of general-purpose computers. Multiple instances of 4Sight (4D Technology's proprietary software) run on a networked cluster of computers. Each connects to a single server (the controller) and waits for instructions. The controller directs the interferometer to capture several images, then assigns each image to a different computer for processing. When the image processing is finished, the server directs one of the computers to collate and combine the processed images, saving the resulting measurement in a file on disk. The available software captures approximately 100 images and analyzes them immediately. This software separates the capture and analysis processes, so that analysis can be done at a different time and faster by running the algorithm in parallel across several processors. The PhaseCam family of interferometers can measure an optical system in milliseconds, but it takes many seconds to process the data so that it is usable. In characterizing an adaptive optics system, like the next generation of astronomical observatories, thousands of measurements are required, and the processing time quickly becomes excessive. A programming interface distributes data processing for a PhaseCam interferometer across a Windows computing cluster. A scriptable controller program coordinates data acquisition from the interferometer, storage on networked hard disks, and parallel processing. Idle time of the interferometer is minimized. This architecture is implemented in Python and JavaScript, and may be altered to fit a customer's needs.

  12. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    International Nuclear Information System (INIS)

    2015-01-01

    ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical studies. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency-domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities, in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)

  13. Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

    Science.gov (United States)

    Chiou, Jin-Chern

    1990-01-01

    Computational procedures for the kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized, and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion are treated with a two-stage staggered explicit-implicit numerical algorithm that takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference scheme to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques and the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices, from which the Schur complement form was derived. By fully exploiting sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the system equations written in Schur complement form. A software testbed was designed and implemented on both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speedup of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.

  14. Chromophoric dissolved organic matter (CDOM) variability in Barataria Basin using excitation-emission matrix (EEM) fluorescence and parallel factor analysis (PARAFAC).

    Science.gov (United States)

    Singh, Shatrughan; D'Sa, Eurico J; Swenson, Erick M

    2010-07-15

    Chromophoric dissolved organic matter (CDOM) variability in Barataria Basin, Louisiana, USA, was examined by excitation-emission matrix (EEM) fluorescence combined with parallel factor analysis (PARAFAC). CDOM optical properties of absorption and fluorescence at 355 nm along an axial transect (36 stations) during March, April, and May 2008 showed an increasing trend from the marine end member to the upper basin, with mean CDOM absorption of 11.06 ± 5.01, 10.05 ± 4.23, and 11.67 ± 6.03 m⁻¹ and fluorescence of 0.80 ± 0.37, 0.78 ± 0.39, and 0.75 ± 0.51 RU, respectively. PARAFAC analysis identified two terrestrial humic-like components (components 1 and 2), one non-humic-like component (component 3), and one soil-derived humic-acid-like component (component 4). The spatial variation of the components showed an increasing trend from station 1 (near the mouth of the basin) to station 36 (end member of the bay; upper basin). Deviations from this increasing trend were observed at a bayou channel with very high chlorophyll-a concentrations, especially for component 3 in May 2008, which suggested autochthonous production of CDOM. The variability of the components with salinity indicated conservative mixing along the middle part of the transect. Components 1 and 4 were found to be relatively constant, while components 2 and 3 revealed an inverse relationship over the sampling period. Total organic carbon showed an increasing trend with each of the components. An increase in humification and a decrease in fluorescence indices along the transect indicated an increase in terrestrially derived organic matter and reduced microbial activity from the lower to the upper basin. The use of these indices along with PARAFAC results improved dissolved organic matter characterization in the Barataria Basin. Copyright 2010 Elsevier B.V. All rights reserved.

  15. Using exploratory factor analysis in personality research: Best-practice recommendations

    Directory of Open Access Journals (Sweden)

    Sumaya Laher

    2010-11-01

    Research purpose: This article presents more objective methods to determine the number of factors, most notably parallel analysis and Velicer's minimum average partial (MAP). The benefits of rotation are also discussed. The article argues for more consistent use of Procrustes rotation and congruence coefficients in factor analytic studies. Motivation for the study: Exploratory factor analysis is often criticised for not being rigorous and objective enough in terms of the methods used to determine the number of factors, the rotations to be used and, ultimately, the validity of the factor structure. Research design, approach and method: The article adopts a theoretical stance to discuss the best-practice recommendations for factor analytic research in the field of psychology. Following this, an example located within personality assessment and using the NEO-PI-R specifically is presented. A total of 425 students at the University of the Witwatersrand completed the NEO-PI-R. These responses were subjected to a principal components analysis using varimax rotation. The rotated solution was subjected to a Procrustes rotation with Costa and McCrae's (1992) matrix as the target matrix. Congruence coefficients were also computed. Main findings: The example indicates the use of the methods recommended in the article and demonstrates an objective way of determining the number of factors. It also provides an example of Procrustes rotation with coefficients of agreement as an indication of how factor analytic results may be presented more rigorously in local research. Practical/managerial implications: It is hoped that the recommendations in this article will have best-practice implications for both researchers and practitioners in the field who employ factor analysis regularly. Contribution/value-add: This article will prove useful to all researchers employing factor analysis and has the potential to set the trend for better use of factor analysis in the South African context.
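
    Since the article's central recommendation is parallel analysis, a minimal sketch of Horn's procedure is given below: factors are retained while observed eigenvalues exceed the chosen percentile of eigenvalues from random data of the same dimensions. The data matrix here is a placeholder, not the NEO-PI-R responses.

```python
import numpy as np

def parallel_analysis(data, n_sims=1000, percentile=95, seed=0):
    """Count the factors whose observed correlation-matrix eigenvalues exceed
    the given percentile of eigenvalues from same-sized random data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    rand = np.empty((n_sims, p))
    for k in range(n_sims):
        r = rng.standard_normal((n, p))
        rand[k] = np.sort(np.linalg.eigvalsh(np.corrcoef(r, rowvar=False)))[::-1]
    threshold = np.percentile(rand, percentile, axis=0)
    return int(np.sum(observed > threshold))   # components beating the criterion

# e.g. 425 respondents on 30 items (placeholder data):
print(parallel_analysis(np.random.standard_normal((425, 30))))
```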

  16. Chromatographic background drift correction coupled with parallel factor analysis to resolve coelution problems in three-dimensional chromatographic data: quantification of eleven antibiotics in tap water samples by high-performance liquid chromatography coupled with a diode array detector.

    Science.gov (United States)

    Yu, Yong-Jie; Wu, Hai-Long; Fu, Hai-Yan; Zhao, Juan; Li, Yuan-Na; Li, Shu-Fang; Kang, Chao; Yu, Ru-Qin

    2013-08-09

    Chromatographic background drift correction has been an important field of research in chromatographic analysis. In the present work, orthogonal spectral space projection for background drift correction of three-dimensional chromatographic data is described in detail and combined with parallel factor analysis (PARAFAC) to resolve overlapped chromatographic peaks and obtain the second-order advantage. This strategy was verified on simulated chromatographic data and afforded significant improvement in quantitative results. Finally, the strategy was successfully utilized to quantify eleven antibiotics in tap water samples. Compared with the traditional methodology of introducing excessive factors into the PARAFAC model to eliminate the effect of background drift, a clear improvement in the quantitative performance of PARAFAC was observed after background drift correction by orthogonal spectral space projection. Copyright © 2013 Elsevier B.V. All rights reserved.
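
    A hedged sketch of the orthogonal-projection idea (not necessarily the authors' exact algorithm): estimate a low-rank spectral basis for the drift from analyte-free scans, then project every spectrum onto its orthogonal complement. The array shapes, background region, and basis size are assumptions.

```python
import numpy as np

def correct_background(X, bg_rows, n_basis=2):
    """X: (elution time, wavelength) slab from HPLC-DAD; bg_rows: a slice of
    analyte-free scans used to estimate the drift's spectral subspace."""
    _, _, Vt = np.linalg.svd(X[bg_rows], full_matrices=False)
    V = Vt[:n_basis].T                       # spectral basis of the drift
    P = np.eye(X.shape[1]) - V @ V.T         # projector onto its complement
    return X @ P                             # drift-suppressed data

X = np.random.rand(300, 120)                 # placeholder chromatographic slab
X_corrected = correct_background(X, bg_rows=slice(0, 20))
```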

  17. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs and discovered a number of new results. Second, parallel algorithms for Fourier transforms over finite groups are developed, and preliminary parallel implementations of group transforms for dihedral and symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  18. Heterogeneous adsorption behavior of landfill leachate on granular activated carbon revealed by fluorescence excitation emission matrix (EEM)-parallel factor analysis (PARAFAC).

    Science.gov (United States)

    Lee, Sonmin; Hur, Jin

    2016-04-01

    Heterogeneous adsorption behavior of landfill leachate on granular activated carbon (GAC) was investigated by fluorescence excitation-emission matrix (EEM) combined with parallel factor analysis (PARAFAC). The equilibrium adsorption of two leachates on GAC was well described by simple Langmuir and Freundlich isotherm models. More nonlinear isotherm and a slower adsorption rate were found for the leachate with the higher values of specific UV absorbance and humification index, suggesting that the leachate containing more aromatic content and condensed structures might have less accessible sites of GAC surface and a lower degree of diffusive adsorption. Such differences in the adsorption behavior were found even within the bulk leachate as revealed by the dissimilarity in the isotherm and kinetic model parameters between two identified PARAFAC components. For both leachates, terrestrial humic-like fluorescence (C1) component, which is likely associated with relatively large sized and condensed aromatic structures, exhibited a higher isotherm nonlinearity and a slower kinetic rate for GAC adsorption than microbial humic-like (C2) component. Our results were consistent with size exclusion effects, a well-known GAC adsorption mechanism. This study demonstrated the promising benefit of using EEM-PARAFAC for GAC adsorption processes of landfill leachate through fast monitoring of the influent and treated leachate, which can provide valuable information on optimizing treatment processes and predicting further environmental impacts of the treated effluent. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool

    KAUST Repository

    AlTurki, Musab

    2011-01-01

    Statistical model checking is an attractive formal analysis method for probabilistic systems such as, for example, cyber-physical systems which are often probabilistic in nature. This paper is about drastically increasing the scalability of statistical model checking, and making such scalability of analysis available to tools like Maude, where probabilistic systems can be specified at a high level as probabilistic rewrite theories. It presents PVeStA, an extension and parallelization of the VeStA statistical model checking tool [10]. PVeStA supports statistical model checking of probabilistic real-time systems specified as either: (i) discrete or continuous Markov Chains; or (ii) probabilistic rewrite theories in Maude. Furthermore, the properties that it can model check can be expressed in either: (i) PCTL/CSL, or (ii) the QuaTEx quantitative temporal logic. As our experiments show, the performance gains obtained from parallelization can be very high. © 2011 Springer-Verlag.

  20. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    Science.gov (United States)

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  1. Analysis of gamma irradiator dose rate using spent fuel elements with parallel configuration

    International Nuclear Information System (INIS)

    Setiyanto; Pudjijanto MS; Ardani

    2006-01-01

    To enhance the utilization of RSG-GAS reactor spent fuel, a gamma irradiator using spent fuel elements as the gamma source is a suitable choice. Such an irradiator can be used for food sterilization and preservation. As a first step before realization, it is necessary to determine the gamma dose rate theoretically. The assessment was carried out for a parallel configuration of fuel elements, in which the irradiation space is placed between rows of fuel elements. The parallel model was chosen for comparison with the circular model, and to provide as much space as possible for irradiation and for manipulation of the irradiation target. Dose rate calculations were done with MCNP, while the gamma activities of the fuel elements were estimated with the ORIGEN code assuming an average delay time of one year. The calculation results show that the gamma dose rate of the parallel model is up to 50% lower than that of the circular model, but the value is still sufficient for sterilization and preservation. For food preservation in particular, the parallel model is more flexible, since the gamma dose rate can be adjusted to the irradiation needs. The conclusion of this assessment is that using reactor spent fuel in a gamma irradiator with the parallel model offers more advantages than the circular model. (author)

  2. Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

    Energy Technology Data Exchange (ETDEWEB)

    Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)

    2017-06-15

    Flexible multibody simulations are widely used in the industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relatively in the body reference frame that is floating in the space or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of the flexible multibody system always deals with a huge number of degrees of freedom and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most of the parallel computational methods are focused on the efficient solution of large-sized linear equations. For multibody analysis, we need to develop an efficient formulation that could be suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.

  3. Combining analysis of variance and three‐way factor analysis methods for studying additive and multiplicative effects in sensory panel data

    DEFF Research Database (Denmark)

    Romano, Rosaria; Næs, Tormod; Brockhoff, Per Bruun

    2015-01-01

    Data from descriptive sensory analysis are essentially three-way data with assessors, samples and attributes as the three ways in the data set. Because of this, there are several ways that the data can be analysed. The paper focuses on the analysis of sensory characteristics of products while ... in the use of the scale with reference to the existing structure of relationships between sensory descriptors. The multivariate assessor model will be tested on a data set from milk. Relations between the proposed model and other multiplicative models like parallel factor analysis and analysis of variance ...

  4. Effective damping for SSR analysis of parallel turbine-generators

    International Nuclear Information System (INIS)

    Agrawal, B.L.; Farmer, R.G.

    1988-01-01

    Damping is a dominant parameter in studies to determine SSR problem severity and countermeasure requirements. To reach valid conclusions for multi-unit plants, it is essential that the net effective damping of unequally loaded units be known. For the Palo Verde Nuclear Generating Station, extensive testing and analysis have been performed to verify and develop an accurate means of determining the effective damping of unequally loaded units in parallel. This has led to a unique and simple algorithm which correlates well with two other analytic techniques

  5. Stability of tapered and parallel-walled dental implants: A systematic review and meta-analysis.

    Science.gov (United States)

    Atieh, Momen A; Alsabeeha, Nabeel; Duncan, Warwick J

    2018-05-15

    Clinical trials have suggested that dental implants with a tapered configuration have improved stability at placement, allowing immediate placement and/or loading. The aim of this systematic review and meta-analysis was to evaluate the implant stability of tapered dental implants compared to standard parallel-walled dental implants. Applying the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement, randomized controlled trials (RCTs) were searched for in electronic databases, complemented by hand searching. The risk of bias was assessed using the Cochrane Collaboration's Risk of Bias tool, and data were analyzed using statistical software. A total of 1199 studies were identified, of which five trials were included, with 336 dental implants in 303 participants. Overall meta-analysis showed that tapered dental implants had higher implant stability values than parallel-walled dental implants at insertion and at 8 weeks, but the difference was not statistically significant. Tapered dental implants had significantly less marginal bone loss compared to parallel-walled dental implants. No significant differences in implant failure rate were found between tapered and parallel-walled dental implants. There is limited evidence to demonstrate the effectiveness of tapered dental implants in achieving greater implant stability compared to parallel-walled dental implants. Superior short-term results in maintaining peri-implant marginal bone with tapered dental implants are possible. Further properly designed RCTs are required to endorse the supposed advantages of tapered dental implants in immediate loading protocols and other complex clinical scenarios. © 2018 Wiley Periodicals, Inc.

  6. Techniques and environments for big data analysis parallel, cloud, and grid computing

    CERN Document Server

    Dehuri, Satchidananda; Kim, Euiwhan; Wang, Gi-Name

    2016-01-01

    This volume is aimed at a wide range of readers and researchers in the area of Big Data, presenting the recent advances in the field of Big Data Analysis as well as the techniques and tools used to analyze it. The book includes 10 distinct chapters providing a concise introduction to Big Data Analysis and recent techniques and environments for Big Data Analysis. It gives insight into how the expensive fitness evaluation of evolutionary learning can play a vital role in big data analysis by adopting parallel, grid, and cloud computing environments.

  7. Calibrationless Parallel Magnetic Resonance Imaging: A Joint Sparsity Model

    Directory of Open Access Journals (Sweden)

    Angshul Majumdar

    2013-12-01

    Full Text Available State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE and SMASH, and interpolation weights for GRAPPA and SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we have proposed a parallel MRI technique that does not require any calibration but yields reconstruction results that are on par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method requires solving non-convex analysis and synthesis prior joint-sparsity problems; this work also derives the algorithms for solving them. Experimental validation was carried out on two datasets - an eight-channel brain and an eight-channel Shepp-Logan phantom. Two sampling methods were used - variable-density random sampling and non-Cartesian radial sampling. For the brain data an acceleration factor of 4 was used, and for the other an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed image and the original. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques: two calibrated methods - CS SENSE and l1SPIRiT - and two calibration-free techniques - Distributed CS and SAKE. Our method yields better reconstruction results than all of them.

  8. On synchronous parallel computations with independent probabilistic choice

    International Nuclear Information System (INIS)

    Reif, J.H.

    1984-01-01

    This paper introduces probabilistic choice to synchronous parallel machine models, in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrated by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time, space, and processor bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity.

  9. Parallel Hybrid Gas-Electric Geared Turbofan Engine Conceptual Design and Benefits Analysis

    Science.gov (United States)

    Lents, Charles; Hardin, Larry; Rheaume, Jonathan; Kohlman, Lee

    2016-01-01

    The conceptual design of a parallel gas-electric hybrid propulsion system for a conventional single aisle twin engine tube and wing vehicle has been developed. The study baseline vehicle and engine technology are discussed, followed by results of the hybrid propulsion system sizing and performance analysis. The weights analysis for the electric energy storage & conversion system and thermal management system is described. Finally, the potential system benefits are assessed.

  10. Fuzzy Controlled Parallel AC-DC Converter for PFC

    Directory of Open Access Journals (Sweden)

    M Subba Rao

    2011-01-01

    Full Text Available Paralleling of converter modules is a well-known technique often used in medium-power applications to achieve the desired output power with smaller high-frequency transformers and inductors. In this paper, a parallel-connected single-phase PFC topology using flyback and forward converters is proposed to improve output voltage regulation with simultaneous input power factor correction (PFC) and control. The goal of the control is to stabilize the output voltage of the converter against load variations. The paper presents the derivation of fuzzy control rules for the dc/dc converter circuit and a control algorithm for regulating the dc/dc converter. A design example and circuit analysis for a 200 W power supply are presented. The proposed approach offers a cost-effective, compact, and efficient AC/DC converter through parallel power processing. MATLAB/SIMULINK is used for implementation, and simulation results show the performance improvement.
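
    As an illustration of the kind of fuzzy control rules the paper derives, a minimal Mamdani-style controller mapping the normalised output-voltage error to a duty-cycle correction might look as follows. The membership functions and the three-rule base are hypothetical stand-ins, not the rules derived in the paper:

      import numpy as np

      def tri(x, a, b, c):
          # Triangular membership function with peak at b and support [a, c].
          return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

      def fuzzy_duty_correction(e, de):
          """Map normalised voltage error e and its change de (both in [-1, 1])
          to a duty-cycle correction. Illustrative rule base:
          e NEG -> decrease; e ZERO and de ZERO -> hold; e POS -> increase."""
          w_dec  = tri(e, -2.0, -1.0, 0.0)               # e is NEG
          w_hold = min(tri(e, -0.5, 0.0, 0.5),           # e is ZERO and
                       tri(de, -0.5, 0.0, 0.5))          # de is ZERO (min = AND)
          w_inc  = tri(e, 0.0, 1.0, 2.0)                 # e is POS
          # Defuzzification: weighted average of singleton consequents -1, 0, +1
          return (-w_dec + w_inc) / (w_dec + w_hold + w_inc + 1e-12)

      print(fuzzy_duty_correction(0.4, -0.1))   # positive correction raises duty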

  11. Parallel auto-correlative statistics with VTK.

    Energy Technology Data Exchange (ETDEWEB)

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10], which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by means of C++ code snippets, and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the auto-correlative statistics engine.
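
    The VTK engine interfaces themselves are not reproduced here; the sketch below only illustrates why auto-correlative statistics parallelize well: each lag is an independent task, so the lags can simply be distributed over a process pool (the AR(1) test series and pool size are illustrative):

      import numpy as np
      from multiprocessing import Pool

      def autocorr_at_lag(args):
          # Sample autocorrelation of series x at one lag.
          x, lag = args
          x = x - x.mean()
          return lag, float(np.dot(x[:len(x) - lag], x[lag:]) / np.dot(x, x))

      def parallel_autocorrelation(x, max_lag, n_procs=4):
          # Trivially parallel: one (series, lag) task per requested lag.
          with Pool(n_procs) as pool:
              return dict(pool.map(autocorr_at_lag,
                                   [(x, k) for k in range(max_lag + 1)]))

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          rho, n = 0.8, 20_000
          x = np.zeros(n)
          for i in range(1, n):              # AR(1): true autocorr = rho**lag
              x[i] = rho * x[i - 1] + rng.standard_normal()
          print(parallel_autocorrelation(x, max_lag=4))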

  12. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  13. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time-consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section (SENSE, SMASH, g-SMASH and GRAPPA) selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. It is shown how to recognize potential failure modes and their associated artefacts. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed, and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)
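
    For readers unfamiliar with the reconstruction problem the review discusses, a minimal Cartesian SENSE unfolding sketch is given below: for acceleration factor R, each aliased pixel is a linear mixture of R true pixels weighted by the coil sensitivities, so unfolding reduces to a small least-squares solve per pixel. Known sensitivity maps are assumed, and noise correlations (whose effect the review describes via the g-factor) are ignored:

      import numpy as np

      def sense_unfold(aliased, sens, R):
          """Cartesian SENSE unfolding.
          aliased: (n_coils, ny // R, nx) aliased coil images
          sens:    (n_coils, ny, nx) coil sensitivity maps
          Returns the unaliased (ny, nx) image."""
          n_coils, ny_r, nx = aliased.shape
          ny = ny_r * R
          img = np.zeros((ny, nx), dtype=complex)
          for y in range(ny_r):
              for x in range(nx):
                  # The R pixels folded onto (y, x) lie ny/R apart in y.
                  ys = [y + r * ny_r for r in range(R)]
                  S = sens[:, ys, x]           # (n_coils, R) encoding matrix
                  a = aliased[:, y, x]         # (n_coils,) folded measurements
                  rho, *_ = np.linalg.lstsq(S, a, rcond=None)
                  img[ys, x] = rho
          return img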

  14. Fourier analysis of parallel block-Jacobi splitting with transport synthetic acceleration in two-dimensional geometry

    International Nuclear Information System (INIS)

    Rosa, M.; Warsa, J. S.; Chang, J. H.

    2007-01-01

    A Fourier analysis is conducted in two-dimensional (2D) Cartesian geometry for the discrete-ordinates (SN) approximation of the neutron transport problem solved with Richardson iteration (Source Iteration) and Richardson iteration preconditioned with Transport Synthetic Acceleration (TSA), using the Parallel Block-Jacobi (PBJ) algorithm. The results for the un-accelerated algorithm show that convergence of PBJ can degrade, leading in particular to stagnation of GMRES(m) in problems containing optically thin sub-domains. The results for the accelerated algorithm indicate that TSA can be used to efficiently precondition an iterative method in the optically thin case when implemented in the 'modified' version MTSA, in which only the scattering in the low order equations is reduced by some non-negative factor β<1. (authors)

  15. Unified Singularity Modeling and Reconfiguration of 3rTPS Metamorphic Parallel Mechanisms with Parallel Constraint Screws

    Directory of Open Access Journals (Sweden)

    Yufeng Zhuang

    2015-01-01

    Full Text Available This paper presents a unified singularity modeling and reconfiguration analysis of the variable topologies of a class of metamorphic parallel mechanisms with parallel constraint screws. The new parallel mechanisms consist of three reconfigurable rTPS limbs that have two working phases stemming from the reconfigurable Hooke (rT) joint. While one phase has full mobility, the other supplies a constraint force to the platform. Based on these, the platform constraint screw systems show that the new metamorphic parallel mechanisms have four topologies, obtained by altering the limb phases, with mobility changing among 1R2T (one rotation with two translations), 2R2T, and 3R2T and mobility 6. Geometric conditions of the mechanism design are investigated, with some special topologies illustrated considering the limb arrangement. Following this and the actuation scheme analysis, a unified Jacobian matrix is formed using screw theory to include the change between geometric constraints and actuation constraints in the topology reconfiguration. Various singular configurations are identified by analyzing screw dependency in the Jacobian matrix. The work in this paper provides a basis for singularity-free workspace analysis and optimal design of this class of metamorphic parallel mechanisms with parallel constraint screws, which show simple geometric constraints with potentially simple kinematics and dynamics properties.

  16. A dataflow analysis tool for parallel processing of algorithms

    Science.gov (United States)

    Jones, Robert L., III

    1993-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.
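
    Although the tool itself is described only at a high level, the two bounds mentioned (performance bounds and resource requirements) can be illustrated on a toy dataflow graph: the critical path bounds the latency, and total work divided by that latency bounds the processor count from below. Task names and execution times below are invented for illustration:

      import math
      from collections import defaultdict

      def dataflow_bounds(tasks, edges):
          """tasks: {name: execution_time}; edges: [(u, v)] meaning u feeds v.
          Returns (critical path length, min processor count for that latency)."""
          succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
          for u, v in edges:
              succ[u].append(v); pred[v].append(u); indeg[v] += 1
          order = [t for t in tasks if indeg[t] == 0]   # Kahn's topological sort
          for u in order:
              for v in succ[u]:
                  indeg[v] -= 1
                  if indeg[v] == 0:
                      order.append(v)
          finish = {}
          for u in order:                               # earliest finish times
              finish[u] = max((finish[p] for p in pred[u]), default=0.0) + tasks[u]
          t_crit = max(finish.values())
          return t_crit, math.ceil(sum(tasks.values()) / t_crit)

      # Four tasks: a feeds b and c (parallel branches), both feed d
      tasks = {"a": 2.0, "b": 3.0, "c": 1.0, "d": 2.0}
      print(dataflow_bounds(tasks, [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
      # -> (7.0, 2): latency bound 7, at least ceil(8/7) = 2 processors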

  17. An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems

    KAUST Repository

    Kuwahara, Hiroyuki

    2011-01-01

    Gene therapy has a great potential to become an effective treatment for a wide variety of diseases. One of the main challenges in making gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. While efficient parallelizations of the SSA are still an open problem in a general setting, the proposed parallel simulation method is able to substantially accelerate the next-reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structures of stochastic gene delivery models. This makes computationally intensive analyses such as parameter optimization and gene dosage control for specific cell types, gene vectors, and transgene expression stability substantially more practical than they would otherwise be with the standard SSA. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics, and as a case study we applied our parallel simulation to this stochastic model. Our results show that our simulation method is able to increase the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
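
    The paper's structure-exploiting parallelization is not reproduced here, but the underlying direct-method SSA that it accelerates is compact enough to sketch. The linear transfer chain and rate constants below are a hypothetical stand-in for the multi-step intracellular delivery pathway, not the Varga et al. model:

      import numpy as np

      def ssa_direct(x0, rates, t_end, rng):
          """Direct-method SSA for a linear chain X1 -> X2 -> ... -> Xn
          (a toy stand-in for the multi-step gene-delivery pathway).
          x0: initial copy numbers; rates: per-step rate constants."""
          x = np.array(x0, dtype=int)
          t, times, states = 0.0, [0.0], [x.copy()]
          while t < t_end:
              a = rates * x[:-1]                 # propensities of the transfers
              a0 = a.sum()
              if a0 == 0:
                  break
              t += rng.exponential(1.0 / a0)     # time to next reaction
              j = rng.choice(len(a), p=a / a0)   # which reaction fires
              x[j] -= 1; x[j + 1] += 1
              times.append(t); states.append(x.copy())
          return np.array(times), np.array(states)

      rng = np.random.default_rng(1)
      # Hypothetical 4-compartment chain: membrane -> endosome -> cytosol -> nucleus
      times, states = ssa_direct([500, 0, 0, 0], np.array([0.05, 0.02, 0.01]), 200.0, rng)
      print(states[-1])   # copy numbers at the final event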

  18. Design and Analysis of Cooperative Cable Parallel Manipulators for Multiple Mobile Cranes

    Directory of Open Access Journals (Sweden)

    Bin Zi

    2012-11-01

    Full Text Available This paper presents the design, dynamic modelling, and workspace analysis of cooperative cable parallel manipulators for multiple mobile cranes (CPMMCs). The CPMMCs can handle complex tasks that are more difficult or even impossible for a single mobile crane. The kinematics and dynamics of the CPMMCs are studied on the basis of geometric methodology and d'Alembert's principle, and a mathematical model of the CPMMCs is developed and presented with dynamic simulation. A constant-orientation workspace analysis of the CPMMCs is carried out as well. As an example, a cooperative cable parallel manipulator for triple mobile cranes with 6 degrees of freedom is investigated on the basis of the above design objectives.

  19. Study of talcum charging status in parallel plate electrostatic separator based on particle trajectory analysis

    Science.gov (United States)

    Yunxiao, CAO; Zhiqiang, WANG; Jinjun, WANG; Guofeng, LI

    2018-05-01

    Electrostatic separation has been extensively used in mineral processing and has the potential to separate gangue minerals from raw talcum ore. In electrostatic separation, the particle charging status is one of the important influencing factors. To accurately describe the charging status of talcum particles in a parallel plate electrostatic separator, this paper proposes a modern image-processing method. Based on the actual trajectories obtained from sequential images of particle movement and an analysis of the physical forces applied to a charged particle, a numerical model is built which calculates the charge-to-mass ratio representing the charging status of a particle and simulates the particle trajectories. The simulated trajectories agree well with the experimental results obtained by image processing. In addition, chemical composition analysis is employed to reveal the relationship between iron gangue mineral content and charge-to-mass ratios. The results show that the proposed method is effective for describing the particle charging status in electrostatic separation.

  20. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique was applied that dynamically maps histories to processors and maps the control process to a particular processor. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)
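
    The MCNP4 implementation itself is not shown in the record, but the dynamic history-mapping idea (idle processors pull the next batch of histories, and tallies are combined at the end) can be sketched with a process pool and a deliberately crude slab-transmission toy problem. All physics constants below are illustrative:

      import numpy as np
      from multiprocessing import Pool

      def run_batch(args):
          """Track a batch of particle histories through a 1D slab shield
          (a toy stand-in for the shielding analysis function)."""
          n_histories, seed = args
          rng = np.random.default_rng(seed)
          sigma_t, thickness, transmitted = 1.0, 5.0, 0
          for _ in range(n_histories):
              x = 0.0
              while True:
                  x += rng.exponential(1.0 / sigma_t)   # distance to collision
                  if x >= thickness:
                      transmitted += 1                  # escaped through slab
                      break
                  if rng.random() < 0.5:                # absorbed (toy physics)
                      break
          return transmitted

      if __name__ == "__main__":
          n_total, n_batches = 100_000, 64
          batches = [(n_total // n_batches, seed) for seed in range(n_batches)]
          with Pool(8) as pool:
              # Dynamic mapping: idle workers pull the next batch of histories
              counts = pool.imap_unordered(run_batch, batches)
              print("transmission prob:", sum(counts) / n_total)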

  1. Behaviour of parallel girders stabilised with U-frames

    DEFF Research Database (Denmark)

    Virdi, Kuldeep; Azzi, Walid

    2010-01-01

    Lateral torsional buckling is a key factor in the design of steel girders. Stability can be enhanced by cross-bracing, reducing the effective length and thus increasing the ultimate capacity. U-frames are an option often used to brace the girders when designing through-type bridges and where overhead bracing is not practical. This paper investigates the effect of the U-frame spacing on the stability of the parallel girders. Eigenvalue buckling analysis was undertaken with four different spacings of the U-frames. Results were extracted from finite element analysis, interpreted and conclusions...

  2. PAPIRUS, a parallel computing framework for sensitivity analysis, uncertainty propagation, and estimation of parameter distribution

    International Nuclear Information System (INIS)

    Heo, Jaeseok; Kim, Kyung Doo

    2015-01-01

    Highlights: • We developed an interface between an engineering simulation code and statistical analysis software. • Multiple packages of the sensitivity analysis, uncertainty quantification, and parameter estimation algorithms are implemented in the framework. • Parallel computing algorithms are also implemented in the framework to solve multiple computational problems simultaneously. - Abstract: This paper introduces a statistical data analysis toolkit, PAPIRUS, designed to perform the model calibration, uncertainty propagation, Chi-square linearity test, and sensitivity analysis for both linear and nonlinear problems. The PAPIRUS was developed by implementing multiple packages of methodologies, and building an interface between an engineering simulation code and the statistical analysis algorithms. A parallel computing framework is implemented in the PAPIRUS with multiple computing resources and proper communications between the server and the clients of each processor. It was shown that even though a large amount of data is considered for the engineering calculation, the distributions of the model parameters and the calculation results can be quantified accurately with significant reductions in computational effort. A general description of PAPIRUS with its graphical user interface is presented in Section 2. Sections 2.1–2.5 present the methodologies of data assimilation, uncertainty propagation, Chi-square linearity test, and sensitivity analysis implemented in the toolkit with some results obtained by each module of the software. Parallel computing algorithms adopted in the framework to solve multiple computational problems simultaneously are also summarized in the paper

  3. PAPIRUS, a parallel computing framework for sensitivity analysis, uncertainty propagation, and estimation of parameter distribution

    Energy Technology Data Exchange (ETDEWEB)

    Heo, Jaeseok, E-mail: jheo@kaeri.re.kr; Kim, Kyung Doo, E-mail: kdkim@kaeri.re.kr

    2015-10-15

    Highlights: • We developed an interface between an engineering simulation code and statistical analysis software. • Multiple packages of the sensitivity analysis, uncertainty quantification, and parameter estimation algorithms are implemented in the framework. • Parallel computing algorithms are also implemented in the framework to solve multiple computational problems simultaneously. - Abstract: This paper introduces a statistical data analysis toolkit, PAPIRUS, designed to perform the model calibration, uncertainty propagation, Chi-square linearity test, and sensitivity analysis for both linear and nonlinear problems. The PAPIRUS was developed by implementing multiple packages of methodologies, and building an interface between an engineering simulation code and the statistical analysis algorithms. A parallel computing framework is implemented in the PAPIRUS with multiple computing resources and proper communications between the server and the clients of each processor. It was shown that even though a large amount of data is considered for the engineering calculation, the distributions of the model parameters and the calculation results can be quantified accurately with significant reductions in computational effort. A general description of PAPIRUS with its graphical user interface is presented in Section 2. Sections 2.1–2.5 present the methodologies of data assimilation, uncertainty propagation, Chi-square linearity test, and sensitivity analysis implemented in the toolkit with some results obtained by each module of the software. Parallel computing algorithms adopted in the framework to solve multiple computational problems simultaneously are also summarized in the paper.

  4. Parallel Imports, Drug Price Control and Pharmaceutical Innovation

    OpenAIRE

    Ken Tabata; Testuya Shinkai; Satoru Tanaka; Makoto Okamura

    2005-01-01

    This paper examines how parallel importation influences pharmaceutical innovation and the welfare of the economy when cross-national drug price differentials occur not only because of demand-elasticity-based factors but also because of governmental drug price control factors. By explicitly considering the governmental drug price control factors, this paper shows that parallel importation may enhance pharmaceutical innovation when the bargaining power of a foreign government is strong and t...

  5. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program, ACT, is being developed for the purpose of evaluating detailed in-core thermal-hydraulic phenomena of fast reactors, including the effect of the flow between wrapper-tube walls (inter-wrapper flow), under various reactor operating conditions. As appropriate boundary conditions, in addition to a detailed modeling of the core, are essential for accurate simulations of in-core thermal hydraulics, ACT consists not only of fuel assembly and inter-wrapper flow analysis modules but also of a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and the parallelization of the program by a message passing method, toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core; however, this may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to several parts of the fuel assemblies in a core where detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization was carried out, including improvement of the numerical simulation algorithm of ACT. The Message Passing Interface (MPI) is applied to data communication between processes and to synchronization in parallel calculations. The parallelized ACT was verified through a comparison simulation with the original one. In addition to the above work, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  6. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, and the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face these issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze the data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that is able to handle high-dimensional data with good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
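
    The paper's algorithm is described only at a high level; the following sketch shows one common shape such a pipeline takes: the expression matrix is split into chunks of genes, each worker runs an independent statistical test per gene, and markers are those passing a significance threshold. The use of Welch's t-test and the threshold value are assumptions for illustration:

      import numpy as np
      from multiprocessing import Pool
      from scipy import stats

      def test_chunk(args):
          """Welch t-test per gene for one chunk of the expression matrix."""
          chunk, responders, start = args
          out = []
          for i, gene in enumerate(chunk):
              t, p = stats.ttest_ind(gene[responders], gene[~responders],
                                     equal_var=False)
              out.append((start + i, p))
          return out

      def find_markers(expr, responders, alpha=1e-4, n_procs=4):
          """expr: (n_genes, n_samples); responders: boolean (n_samples,)."""
          chunks = np.array_split(expr, n_procs)
          starts = np.cumsum([0] + [len(c) for c in chunks[:-1]])
          with Pool(n_procs) as pool:
              results = pool.map(test_chunk, [(c, responders, s)
                                              for c, s in zip(chunks, starts)])
          return [idx for part in results for idx, p in part if p < alpha]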

  7. Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

    International Nuclear Information System (INIS)

    Toshimitsu, Fujisawa; Genki, Yagawa

    2003-01-01

    In this paper, a FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)

  8. Node-based finite element method for large-scale adaptive fluid analysis in parallel environments

    Energy Technology Data Exchange (ETDEWEB)

    Toshimitsu, Fujisawa [Tokyo Univ., Collaborative Research Center of Frontier Simulation Software for Industrial Science, Institute of Industrial Science (Japan); Genki, Yagawa [Tokyo Univ., Department of Quantum Engineering and Systems Science (Japan)

    2003-07-01

    In this paper, a FEM-based (finite element method) mesh-free method with a probabilistic node generation technique is presented. In the proposed method, all computational procedures, from the mesh generation to the solution of a system of equations, can be performed fluently in parallel in terms of nodes. Local finite element mesh is generated robustly around each node, even for harsh boundary shapes such as cracks. The algorithm and the data structure of finite element calculation are based on nodes, and parallel computing is realized by dividing a system of equations by the row of the global coefficient matrix. In addition, the node-based finite element method is accompanied by a probabilistic node generation technique, which generates good-natured points for nodes of finite element mesh. Furthermore, the probabilistic node generation technique can be performed in parallel environments. As a numerical example of the proposed method, we perform a compressible flow simulation containing strong shocks. Numerical simulations with frequent mesh refinement, which are required for this kind of analysis, can effectively be performed on parallel processors by using the proposed method. (authors)

  9. A qualitative single case study of parallel processes

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    2007-01-01

    Parallel process in psychotherapy and supervision is a phenomenon, manifest in relationships and interactions, that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive, randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...

  10. Linear stability analysis of heated parallel channels

    International Nuclear Information System (INIS)

    Nourbakhsh, H.P.; Isbin, H.S.

    1982-01-01

    An analysis is presented of the thermal-hydraulic stability of flow in parallel channels, covering the range from inlet subcooling to exit superheat. The model is based on a one-dimensional drift velocity formulation of the two-phase flow conservation equations. The system of equations is linearized by assuming small disturbances about the steady state. The dynamic response of the system to an inlet flow perturbation is derived, yielding the characteristic equation which predicts the onset of instabilities. A specific application is carried out for homogeneous and regionally uniformly heated systems. The particular case of equal characteristic frequencies of the two-phase and single-phase vapor regions is studied in detail. The D-partition method and the Mikhailov stability criterion are used for determining the marginal stability boundary. Stability predictions from the present analysis are compared with experimental data from the solar test facility. 8 references

  11. Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others

    1997-12-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. The results are reported in three parts, i.e., the vectorization part, the parallelization part and the porting part. This report describes the parallelization part, covering the parallelization of the two-dimensional relativistic electromagnetic particle code EM2D, the cylindrical direct numerical simulation code CYLDNS, and DGR, a molecular dynamics code for simulating radiation damage in diamond crystals. The vectorization part describes the vectorization of the two- and three-dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU. The porting part describes the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinates transport code TWOTRAN-II, together with a survey for the porting of the command-driven interactive data analysis plotting program IPLOT. (author)

  12. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  13. Research on parallel algorithm for sequential pattern mining

    Science.gov (United States)

    Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

    2008-03-01

    Sequential pattern mining is the mining of frequent sequences, related to time or other orders, from a sequence database. Its initial motivation was to discover regularities in customer purchasing over a time period by finding frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases; it has extended to new data sources such as the Web and advanced scientific fields such as DNA analysis. The data of sequential pattern mining have the following characteristics: massive data volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. According to the traits mentioned above, and combining them with parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets by applying the frequent concept and search-space partition theory, and the second task is to construct frequent sequences using a depth-first search at each processor. The algorithm only needs to access the database twice and does not generate candidate sequences, which reduces the access time and improves mining efficiency. Based on a random data generation procedure and different information structures, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm has an excellent speedup factor and efficiency.
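
    The SPP algorithm itself is not reproduced here; the sketch below only illustrates its two parallel tasks in miniature: frequent items are found globally, and then each frequent-item prefix defines an independent partition of the search space that a worker mines in its projected database. The tiny database and support threshold are invented, and only 2-sequences are mined for brevity:

      from collections import Counter
      from multiprocessing import Pool

      DB = [["a", "b", "c"], ["a", "c"], ["a", "b", "c"], ["b", "c"]]
      MIN_SUP = 2

      def frequent_items(db, min_sup):
          counts = Counter(item for seq in db for item in set(seq))
          return {i for i, c in counts.items() if c >= min_sup}

      def mine_prefix(args):
          """Mine frequent 2-sequences <prefix, item> in the projected DB."""
          prefix, db, freq, min_sup = args
          counts = Counter()
          for seq in db:
              if prefix in seq:
                  suffix = seq[seq.index(prefix) + 1:]      # projection
                  counts.update({i for i in suffix if i in freq})
          return [(prefix, i) for i, c in counts.items() if c >= min_sup]

      if __name__ == "__main__":
          freq = frequent_items(DB, MIN_SUP)
          tasks = [(p, DB, freq, MIN_SUP) for p in sorted(freq)]
          with Pool(2) as pool:          # one search-space partition per task
              patterns = [pat for part in pool.map(mine_prefix, tasks)
                          for pat in part]
          print(patterns)   # e.g. [('a', 'b'), ('a', 'c'), ('b', 'c')]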

  14. Functional efficiency comparison between split- and parallel-hybrid using advanced energy flow analysis methods

    Energy Technology Data Exchange (ETDEWEB)

    Guttenberg, Philipp; Lin, Mengyan [Romax Technology, Nottingham (United Kingdom)

    2009-07-01

    The following paper presents a comparative efficiency analysis of the Toyota Prius versus the Honda Insight using advanced Energy Flow Analysis methods. The sample study shows that even very different hybrid concepts, such as a split- and a parallel-hybrid, can be compared at a high level of detail, and it demonstrates the benefit by showing exemplary results. (orig.)

  15. Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

    Science.gov (United States)

    Grzeszczuk, A.; Kowalski, S.

    2015-04-01

    Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to increase the speed of graphics processing by carrying out calculations in parallel. The success of this solution has opened the General-Purpose Graphics Processing Unit (GPGPU) technology to applications not coupled with graphics. GPGPU systems can be applied as an effective tool for reducing the huge volumes of data produced by pulse shape analysis measurements, either by on-line recalculation or by very fast compression. The simplified structure of the CUDA system and the programming model, based on the example of the Nvidia GeForce GTX 580 card, are presented in our poster contribution, both in a stand-alone version and as a ROOT application.

  16. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated to be embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation, used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation, in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given for getting the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
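
    The qualitative behaviour described (speedup saturating, or even degrading, as processors are added) follows directly from the per-cycle rendezvous. A minimal cost model makes this concrete; all timing constants below are hypothetical, chosen only to show the trend:

      def parallel_efficiency(n_proc, t_history_us, n_histories, t_sync_ms, n_cycles):
          """Simple model of a cycle-based Monte Carlo criticality run:
          each cycle tracks n_histories/n_proc histories in parallel, then all
          processors meet at a rendezvous costing t_sync_ms (source gathering,
          k_eff estimation, population control)."""
          t_track = n_cycles * (n_histories / n_proc) * t_history_us * 1e-6
          t_sync  = n_cycles * t_sync_ms * 1e-3      # serial, paid every cycle
          t_serial = n_cycles * n_histories * t_history_us * 1e-6
          speedup = t_serial / (t_track + t_sync)
          return speedup, speedup / n_proc

      for p in (1, 8, 64, 512):
          s, e = parallel_efficiency(p, t_history_us=50, n_histories=100_000,
                                     t_sync_ms=20, n_cycles=300)
          print(f"{p:4d} processors: speedup {s:7.1f}, efficiency {e:5.1%}")

    With these illustrative numbers the efficiency stays near 80% at 64 processors but drops to roughly a third at 512, which is the pattern the paper analyzes.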

  17. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated to be embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation, used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation, in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given for getting the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  18. Position Analysis of a Hybrid Serial-Parallel Manipulator in Immersion Lithography

    Directory of Open Access Journals (Sweden)

    Jie-jie Shao

    2015-01-01

    Full Text Available This paper proposes a novel hybrid serial-parallel mechanism with 6 degrees of freedom. The new mechanism combines two different parallel modules in serial form. The 3-P̲(PH parallel module is a 3-degree-of-freedom architecture based on higher joints that specializes in describing the relative pose of two planes. The 3-P̲SP parallel module is a typical architecture which has been widely investigated in recent research. In this paper, the direct and inverse position problems of the 3-P̲SP parallel module in the coupled mixed-type mode are analyzed in detail, and the solutions are obtained in analytical form. Furthermore, the solutions for the direct and inverse position problems of the novel hybrid serial-parallel mechanism are also derived in analytical form. The proposed hybrid serial-parallel mechanism is applied to regulate the pose of the immersion hood in an immersion lithography system. By measuring and regulating the pose of the immersion hood with respect to the wafer surface simultaneously, the immersion hood can track the wafer surface's pose in real time and the gap status is stabilized. This is another exploration of the application of hybrid serial-parallel mechanisms.

  19. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  20. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  1. Operation Analysis of the Series-Parallel Resonant Converter Working above Resonance Frequency

    Directory of Open Access Journals (Sweden)

    Peter Dzurko

    2006-01-01

    Full Text Available The present article deals with a theoretical analysis of the operation of a series-parallel resonant converter working above the resonance frequency. The principal equations for the individual operating intervals are derived. Based on these, waveforms of the individual quantities are constructed for both loaded and no-load operation of the inverter. The waveforms may be used when designing the individual parts of the inverter.

  2. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Full Text Available Graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg, which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to significantly reduce the run time of the Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.
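
    The per-voxel parallelism described above can be sketched without a GPU: below, each worker process draws from the posterior of a single-voxel linear model. This is a flat-prior, single-level stand-in for the package's multilevel Gibbs sampler, and the design matrix, draw count and seeds are illustrative:

      import numpy as np
      from multiprocessing import Pool

      def fit_voxel(args):
          """Posterior draws for y = X beta + eps at one voxel (flat prior)."""
          y, X, n_draws, seed = args
          rng = np.random.default_rng(seed)
          n, k = X.shape
          XtX_inv = np.linalg.inv(X.T @ X)
          beta_hat = XtX_inv @ X.T @ y
          resid = y - X @ beta_hat
          s2 = resid @ resid / (n - k)
          draws = []
          for _ in range(n_draws):
              sigma2 = (n - k) * s2 / rng.chisquare(n - k)   # inv-chi^2 draw
              draws.append(rng.multivariate_normal(beta_hat, sigma2 * XtX_inv))
          return np.mean(draws, axis=0)    # posterior-mean activation estimate

      def fit_brain(Y, X, n_procs=4):
          """Y: (n_voxels, n_scans) fMRI time series; one task per voxel."""
          tasks = [(y, X, 200, i) for i, y in enumerate(Y)]
          with Pool(n_procs) as pool:
              return np.array(pool.map(fit_voxel, tasks))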

  3. Quantitative analysis of pulmonary perfusion using time-resolved parallel 3D MRI - initial results

    International Nuclear Information System (INIS)

    Fink, C.; Buhmann, R.; Plathow, C.; Puderbach, M.; Kauczor, H.U.; Risse, F.; Ley, S.; Meyer, F.J.

    2004-01-01

    Purpose: To assess the use of time-resolved parallel 3D MRI for a quantitative analysis of pulmonary perfusion in patients with cardiopulmonary disease. Materials and methods: Eight patients with pulmonary embolism or pulmonary hypertension were examined with a time-resolved 3D gradient echo pulse sequence with parallel imaging techniques (FLASH 3D, TE/TR: 0.8/1.9 ms; flip angle: 40°; GRAPPA). A quantitative perfusion analysis based on indicator dilution theory was performed using dedicated software. Results: Patients with pulmonary embolism or chronic thromboembolic pulmonary hypertension revealed characteristic wedge-shaped perfusion defects at perfusion MRI. These were characterized by a decreased pulmonary blood flow (PBF) and pulmonary blood volume (PBV) and an increased mean transit time (MTT). Patients with primary pulmonary hypertension or Eisenmenger syndrome showed a more homogeneous perfusion pattern. The mean MTT of all patients was 3.3-4.7 s. The mean PBF and PBV showed a broader interindividual variation (PBF: 104-322 ml/100 ml/min; PBV: 8-21 ml/100 ml). Conclusion: Time-resolved parallel 3D MRI allows at least a semi-quantitative assessment of lung perfusion. Future studies will have to assess the clinical value of this quantitative information for the diagnosis and management of cardiopulmonary disease. (orig.)

  4. Generalized Analytical Program of Thyristor Phase Control Circuit with Series and Parallel Resonance Load

    OpenAIRE

    Nakanishi, Sen-ichiro; Ishida, Hideaki; Himei, Toyoji

    1981-01-01

    A systematic analytical method is required for the AC phase control circuit based on an inverse-parallel thyristor pair with a series and parallel L-C resonant load, because the phase control action causes abnormal and interesting phenomena, such as an extreme increase of voltage and current, a unique increase and decrease of the contained higher harmonics, and a wide variation of power factor, etc. In this paper, the program for the analysis of the thyristor phase control circuit with...

  5. Screw Theory Based Singularity Analysis of Lower-Mobility Parallel Robots considering the Motion/Force Transmissibility and Constrainability

    Directory of Open Access Journals (Sweden)

    Xiang Chen

    2015-01-01

    Full Text Available Singularity is an inherent characteristic of parallel robots and is also a typical mathematical problem in engineering applications. In general, to identify a singular configuration, the singular solutions must be derived mathematically. This work introduces an alternative approach to the singularity identification of lower-mobility parallel robots considering the motion/force transmissibility and constrainability. The theory of screws is used as the mathematical tool to define the transmission and constraint indices of parallel robots. The singularity is hereby classified into four types concerning both input and output members of a parallel robot, that is, input transmission singularity, output transmission singularity, input constraint singularity, and output constraint singularity. Furthermore, we take several typical parallel robots as examples to illustrate the process of singularity analysis. In particular, the input and output constraint singularities, which are first proposed in this work, are depicted in detail. The results demonstrate that the method can not only identify all possible singular configurations but also explain their physical meanings. Therefore, the proposed approach is proved to be comprehensible and effective in solving singularity problems in parallel mechanisms.
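
    In screw-theoretic singularity analysis of this kind, the test ultimately reduces to a linear-dependency check on the screws assembled into the Jacobian. The sketch below builds line screws in Plücker coordinates and flags a configuration as singular when the stacked 6x6 screw matrix loses rank; the example screws are arbitrary, not taken from the robots analyzed in the paper:

      import numpy as np

      def line_screw(point, direction, pitch=0.0):
          """Unit screw in Pluecker coordinates: (s; r x s + h s)."""
          s = np.asarray(direction, float)
          s /= np.linalg.norm(s)
          r = np.asarray(point, float)
          return np.hstack([s, np.cross(r, s) + pitch * s])

      def singularity_measure(wrenches, tol=1e-8):
          """Stack six transmission/constraint wrench screws into a Jacobian;
          the mechanism is singular when the screws become linearly dependent."""
          J = np.vstack(wrenches)                    # (6, 6)
          sigma = np.linalg.svd(J, compute_uv=False)
          return sigma[-1], np.linalg.matrix_rank(J, tol)

      # Example: six line screws; making two coincide would drop the rank
      screws = [line_screw(p, d) for p, d in [
          ([0, 0, 0], [1, 0, 0]), ([0, 0, 0], [0, 1, 0]), ([0, 0, 0], [0, 0, 1]),
          ([1, 0, 0], [0, 1, 0]), ([0, 1, 0], [0, 0, 1]), ([0, 0, 1], [1, 0, 0])]]
      print(singularity_measure(screws))   # full rank 6: non-singular pose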

  6. Fourier analysis of parallel inexact Block-Jacobi splitting with transport synthetic acceleration in slab geometry

    International Nuclear Information System (INIS)

    Rosa, M.; Warsa, J. S.; Chang, J. H.

    2006-01-01

    A Fourier analysis is conducted for the discrete-ordinates (SN) approximation of the neutron transport problem solved with Richardson iteration (Source Iteration) and Richardson iteration preconditioned with Transport Synthetic Acceleration (TSA), using the Parallel Block-Jacobi (PBJ) algorithm. Both 'traditional' TSA (TTSA) and a 'modified' TSA (MTSA), in which only the scattering in the low-order equations is reduced by some non-negative factor β < 1, are considered. The results for the un-accelerated algorithm show that convergence of the PBJ algorithm can degrade. The PBJ algorithm with TTSA can be effective provided the β parameter is properly tuned for a given scattering ratio c, but it is potentially unstable. Compared to TTSA, MTSA is less sensitive to the choice of β, more effective for the same computational effort (c'), and it is unconditionally stable. (authors)

  7. DEA Sensitivity Analysis for Parallel Production Systems

    Directory of Open Access Journals (Sweden)

    J. Gerami

    2011-06-01

    Full Text Available In this paper, we introduce systems consisting of several production units, each of which includes several subunits working in parallel, with each subunit working independently. The input and output of each production unit are the sums of the inputs and outputs of its subunits, respectively. We consider each of these subunits as an independent decision making unit (DMU) and create the production possibility set (PPS) generated by these DMUs, in which the frontier points are considered as efficient DMUs. We then introduce models for obtaining the efficiency of the production subunits. Using super-efficiency models, we categorize all efficient subunits into different efficiency classes. We follow by presenting the sensitivity analysis and stability problem for efficient subunits, including extreme efficient and non-extreme efficient subunits, assuming simultaneous perturbations in all inputs and outputs of the subunits such that the efficiency of the subunit under evaluation declines while the efficiencies of the other subunits improve.
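
    The DEA efficiency scores that the sensitivity analysis perturbs can be computed with a small linear program per subunit. Below is an input-oriented CCR model in multiplier form (maximize the weighted output of the evaluated subunit, with its weighted input normalized to one); the toy data are invented, and the super-efficiency and sensitivity extensions of the paper are not included:

      import numpy as np
      from scipy.optimize import linprog

      def ccr_efficiency(X, Y, j0):
          """Input-oriented CCR (multiplier form) efficiency of subunit j0.
          X: (m, n) inputs, Y: (s, n) outputs for the n subunits (DMUs)."""
          m, n = X.shape
          s = Y.shape[0]
          c = np.concatenate([-Y[:, j0], np.zeros(m)])      # maximise u.y0
          A_ub = np.hstack([Y.T, -X.T])                     # u.yj - v.xj <= 0
          b_ub = np.zeros(n)
          A_eq = np.concatenate([np.zeros(s), X[:, j0]])[None, :]   # v.x0 = 1
          res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                        bounds=[(0, None)] * (s + m))
          return -res.fun                                   # efficiency in (0, 1]

      # Toy data: 4 subunits, 2 inputs, 1 output
      X = np.array([[2., 3., 4., 5.],
                    [3., 2., 5., 4.]])
      Y = np.array([[1., 1., 1., 1.]])
      print([round(ccr_efficiency(X, Y, j), 3) for j in range(4)])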

  8. Automatic Parallelization An Overview of Fundamental Compiler Techniques

    CERN Document Server

    Midkiff, Samuel P

    2012-01-01

    Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and

  9. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    Science.gov (United States)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth-weighting matrix, and other methods. To address the problems brought by big data in exploration, we present a parallel algorithm and its performance analysis, combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) acceleration. In tests on a synthetic model and on real data from the Vinton Dome, we obtain improved results, and the improved inversion algorithm is shown to be effective and feasible. The performance of the parallel algorithm we designed is better than that of the other CUDA implementations; the maximum speedup exceeds 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger data volumes, and the new analysis method is practical.

  10. A CS1 pedagogical approach to parallel thinking

    Science.gov (United States)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of

  11. VALIDATION OF CRACK INTERACTION LIMIT MODEL FOR PARALLEL EDGE CRACKS USING TWO-DIMENSIONAL FINITE ELEMENT ANALYSIS

    Directory of Open Access Journals (Sweden)

    R. Daud

    2013-06-01

    Full Text Available Shielding interaction effects of two parallel edge cracks in finite-thickness plates subjected to remote tension load are analyzed using a developed finite element analysis program. In the present study, the crack interaction limit is evaluated based on the fitness-for-service (FFS) code, and focus is given to the weak crack interaction region where the crack interval exceeds the crack length (b > a). Crack interaction factors are evaluated based on Mode I stress intensity factors (SIFs) computed using a displacement extrapolation technique. Parametric studies involved a wide range of crack-to-width ratios (0.05 ≤ a/W ≤ 0.5) and crack interval ratios (b/a > 1). For validation, crack interaction factors are compared with single edge crack SIFs as a state of zero interaction. Within the considered range of parameters, the proposed numerical evaluation used to predict the crack interaction factor reduces the error of the existing analytical solution from 1.92% to 0.97% at higher a/W. In reference to FFS codes, the small discrepancy in the prediction of the crack interaction factor validates the reliability of the numerical model to predict crack interaction limits under shielding interaction effects. In conclusion, the numerical model gave a successful prediction of the crack interaction limit, which can be used as a reference for the shielding orientation of other cracks.
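
    The crack interaction factor used in the paper compares the SIF of an interacting crack against the single-crack reference. Under the standard Brown-Srawley geometry factor for a single edge crack in tension (valid for a/W up to about 0.6), that reference and the resulting interaction factor can be computed as below; the finite element SIF for the parallel-crack case is a hypothetical input here:

      import math

      def sif_single_edge_crack(sigma, a, W):
          """Mode I SIF for a single edge crack under remote tension
          (Brown-Srawley polynomial, valid for a/W <= 0.6)."""
          r = a / W
          F = 1.12 - 0.231 * r + 10.55 * r**2 - 21.72 * r**3 + 30.39 * r**4
          return sigma * F * math.sqrt(math.pi * a)

      def interaction_factor(K_parallel, sigma, a, W):
          """gamma < 1 indicates shielding by the neighbouring parallel crack;
          gamma -> 1 marks the crack interaction limit (zero interaction)."""
          return K_parallel / sif_single_edge_crack(sigma, a, W)

      # Hypothetical FE result for two parallel cracks, a/W = 0.2, b/a = 2
      sigma, a, W = 100.0, 10.0, 50.0                    # MPa, mm, mm
      K_fe = 0.95 * sif_single_edge_crack(sigma, a, W)   # assumed 5 % shielding
      print(f"gamma = {interaction_factor(K_fe, sigma, a, W):.3f}")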

  12. Advanced mathematical on-line analysis in nuclear experiments. Usage of parallel computing CUDA routines in standard root analysis

    Directory of Open Access Journals (Sweden)

    Grzeszczuk A.

    2015-01-01

    Full Text Available Compute Unified Device Architecture (CUDA) is a parallel computing platform developed by Nvidia to increase the speed of graphics processing by carrying out calculations in parallel. The success of this solution has opened the General-Purpose Graphics Processing Unit (GPGPU) technology to applications not coupled with graphics. GPGPU systems can be applied as an effective tool for reducing the huge volumes of data produced by pulse shape analysis measurements, either by on-line recalculation or by very fast compression. The simplified structure of the CUDA system and the programming model, based on the example of the Nvidia GeForce GTX 580 card, are presented in our poster contribution, both in a stand-alone version and as a ROOT application.

  13. PARALLEL IMPORT: REALITY FOR RUSSIA

    Directory of Open Access Journals (Sweden)

    Т. А. Сухопарова

    2014-01-01

    Full Text Available The problem of parallel import is a pressing question today. Legalization of parallel import in Russia is expedient; this statement is based on an analysis of opposing expert opinions. At the same time, it is necessary to consider the negative consequences of this decision and to apply remedies to minimize them.

  14. Open | SpeedShop: An Open Source Infrastructure for Parallel Performance Analysis

    Directory of Open Access Journals (Sweden)

    Martin Schulz

    2008-01-01

    Full Text Available Over the last decades a large number of performance tools have been developed to analyze and optimize high performance applications. Their acceptance by end users, however, has been slow: each tool alone is often limited in scope and comes with widely varying interfaces and workflow constraints, requiring different changes in the often complex build and execution infrastructure of the target application. We started the Open | SpeedShop project about three years ago to overcome these limitations and provide efficient, easy-to-apply, and integrated performance analysis for parallel systems. Open | SpeedShop has two different faces: it provides an interoperable tool set covering the most common analysis steps, as well as a comprehensive plugin infrastructure for building new tools. In both cases, the tools can be deployed to large-scale parallel applications using DPCL/Dyninst for distributed binary instrumentation. Further, all tools developed within or on top of Open | SpeedShop are accessible through multiple, fully equivalent interfaces, including an easy-to-use GUI as well as an interactive command line interface, reducing the usage threshold for those tools.

  15. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla...

  16. Massively Parallel, Molecular Analysis Platform Developed Using a CMOS Integrated Circuit With Biological Nanopores

    Science.gov (United States)

    Roever, Stefan

    2012-01-01

    A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30 fA (rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data show that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.

  17. Parallel and vector implementation of APROS simulator code

    International Nuclear Information System (INIS)

    Niemi, J.; Tommiska, J.

    1990-01-01

    In this paper the vector and parallel processing implementation of a general-purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)

  18. Kinematics and dynamics analysis of a novel serial-parallel dynamic simulator

    Energy Technology Data Exchange (ETDEWEB)

    Hu, Bo; Zhang, Lian Dong; Yu, Jingjing [Parallel Robot and Mechatronic System Laboratory of Hebei Province, Yanshan University, Qinhuangdao, Hebei (China)

    2016-11-15

    A serial-parallel dynamics simulator based on a serial-parallel manipulator is proposed. According to the motion requirements of the dynamics simulator, the proposed simulator, formed by 3-RRS (active revolute joint-revolute joint-spherical joint) and 3-SPR (spherical joint-active prismatic joint-revolute joint) PMs, adopts an outer and inner layout. By integrating the kinematics, constraint and coupling information of the 3-RRS and 3-SPR PMs into the serial-parallel manipulator, the inverse Jacobian matrix, velocity, and acceleration of the serial-parallel dynamics simulator are studied. Based on the principle of virtual work and the kinematics model, the inverse dynamic model is established. Finally, the workspace of the (3-RRS)+(3-SPR) dynamics simulator is constructed.

  19. Kinematics and dynamics analysis of a novel serial-parallel dynamic simulator

    International Nuclear Information System (INIS)

    Hu, Bo; Zhang, Lian Dong; Yu, Jingjing

    2016-01-01

    A serial-parallel dynamics simulator based on a serial-parallel manipulator is proposed. According to the motion requirements of the dynamics simulator, the proposed simulator, formed by 3-RRS (active revolute joint-revolute joint-spherical joint) and 3-SPR (spherical joint-active prismatic joint-revolute joint) PMs, adopts an outer and inner layout. By integrating the kinematics, constraint and coupling information of the 3-RRS and 3-SPR PMs into the serial-parallel manipulator, the inverse Jacobian matrix, velocity, and acceleration of the serial-parallel dynamics simulator are studied. Based on the principle of virtual work and the kinematics model, the inverse dynamic model is established. Finally, the workspace of the (3-RRS)+(3-SPR) dynamics simulator is constructed.

  20. The cost of conservative synchronization in parallel discrete event simulations

    Science.gov (United States)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  1. Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

    Science.gov (United States)

    Laura, Jason R.; Rey, Sergio J.

    2017-01-01

    Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.

  2. Spatiotemporal Distribution, Sources, and Photobleaching Imprint of Dissolved Organic Matter in the Yangtze Estuary and Its Adjacent Sea Using Fluorescence and Parallel Factor Analysis

    Science.gov (United States)

    Li, Penghui; Chen, Ling; Zhang, Wen; Huang, Qinghui

    2015-01-01

    To investigate the seasonal and interannual dynamics of dissolved organic matter (DOM) in the Yangtze Estuary, surface and bottom water samples in the Yangtze Estuary and its adjacent sea were collected and characterized using fluorescence excitation-emission matrices (EEMs) and parallel factor analysis (PARAFAC) in both dry and wet seasons in 2012 and 2013. Two protein-like components and three humic-like components were identified. Three humic-like components decreased linearly with increasing salinity (r>0.90, p<0.001), suggesting their distribution could primarily be controlled by physical mixing. By contrast, two protein-like components fell below the theoretical mixing line, largely due to microbial degradation and removal during mixing. Higher concentrations of humic-like components found in 2012 could be attributed to higher freshwater discharge relative to 2013. There was a lack of systematic patterns for three humic-like components between seasons and years, probably due to variations of other factors such as sources and characteristics. Highest concentrations of fluorescent components, observed in estuarine turbidity maximum (ETM) region, could be attributed to sediment resuspension and subsequent release of DOM, supported by higher concentrations of fluorescent components in bottom water than in surface water at two stations where sediments probably resuspended. Meanwhile, photobleaching could be reflected from the changes in the ratios between fluorescence intensity (Fmax) of humic-like components and chromophoric DOM (CDOM) absorption coefficient (a355) along the salinity gradient. This study demonstrates the abundance and composition of DOM in estuaries are controlled not only by hydrological conditions, but also by its sources, characteristics and related estuarine biogeochemical processes. PMID:26107640
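
    The PARAFAC step used in this record factorizes the samples × excitation × emission fluorescence tensor into trilinear components. Below is a minimal, hedged sketch with the open-source tensorly library, run on synthetic data; the component count, array sizes and variable names are illustrative assumptions, not values from the study.

```python
# Hedged EEM-PARAFAC sketch on synthetic data (all sizes illustrative).
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

rng = np.random.default_rng(0)
n_samples, n_ex, n_em, rank = 50, 40, 60, 3

# Build a synthetic trilinear EEM tensor from three "fluorophores".
scores = rng.uniform(0, 1, (n_samples, rank))   # per-sample intensities
ex = np.abs(rng.normal(0, 1, (n_ex, rank)))     # excitation loadings
em = np.abs(rng.normal(0, 1, (n_em, rank)))     # emission loadings
eem = np.einsum('ir,jr,kr->ijk', scores, ex, em)
eem = np.clip(eem + 0.01 * rng.standard_normal(eem.shape), 0.0, None)

# Non-negative PARAFAC, as commonly applied to fluorescence EEMs.
weights, factors = non_negative_parafac(tl.tensor(eem), rank=rank)
sample_mode, ex_mode, em_mode = factors
print(sample_mode.shape, ex_mode.shape, em_mode.shape)  # (50, 3) (40, 3) (60, 3)
```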

  3. Spatiotemporal Distribution, Sources, and Photobleaching Imprint of Dissolved Organic Matter in the Yangtze Estuary and Its Adjacent Sea Using Fluorescence and Parallel Factor Analysis.

    Directory of Open Access Journals (Sweden)

    Penghui Li

    To investigate the seasonal and interannual dynamics of dissolved organic matter (DOM) in the Yangtze Estuary, surface and bottom water samples in the Yangtze Estuary and its adjacent sea were collected and characterized using fluorescence excitation-emission matrices (EEMs) and parallel factor analysis (PARAFAC) in both dry and wet seasons in 2012 and 2013. Two protein-like components and three humic-like components were identified. Three humic-like components decreased linearly with increasing salinity (r>0.90, p<0.001), suggesting their distribution could primarily be controlled by physical mixing. By contrast, two protein-like components fell below the theoretical mixing line, largely due to microbial degradation and removal during mixing. Higher concentrations of humic-like components found in 2012 could be attributed to higher freshwater discharge relative to 2013. There was a lack of systematic patterns for three humic-like components between seasons and years, probably due to variations of other factors such as sources and characteristics. Highest concentrations of fluorescent components, observed in the estuarine turbidity maximum (ETM) region, could be attributed to sediment resuspension and subsequent release of DOM, supported by higher concentrations of fluorescent components in bottom water than in surface water at two stations where sediments probably resuspended. Meanwhile, photobleaching could be reflected from the changes in the ratios between fluorescence intensity (Fmax) of humic-like components and chromophoric DOM (CDOM) absorption coefficient (a355) along the salinity gradient. This study demonstrates the abundance and composition of DOM in estuaries are controlled not only by hydrological conditions, but also by its sources, characteristics and related estuarine biogeochemical processes.

  4. Vectorization, parallelization and porting of nuclear codes. Vectorization and parallelization. Progress report fiscal 1999

    Energy Technology Data Exchange (ETDEWEB)

    Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawasaki, Nobuo; Kawai, Wataru [Fujitsu Ltd., Tokyo (Japan); Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)

    2001-02-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported onto the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in 3 parts, i.e., the vectorization and parallelization part on vector processors, the parallelization part on scalar processors, and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this part, we describe the vectorization of the relativistic molecular orbital calculation code RSCAT, the microscopic transport code for high-energy nuclear collisions JAM, the three-dimensional non-steady thermal-fluid analysis code STREAM, the relativistic density functional theory code RDFT, and the high-speed three-dimensional nodal diffusion code MOSRA-Light on the VPP500 system and the SX-4 system. (author)

  5. Identification of Genetic Susceptibility to Childhood Cancer through Analysis of Genes in Parallel

    Science.gov (United States)

    Plon, Sharon E.; Wheeler, David A.; Strong, Louise C.; Tomlinson, Gail E.; Pirics, Michael; Meng, Qingchang; Cheung, Hannah C.; Begin, Phyllis R.; Muzny, Donna M.; Lewis, Lora; Biegel, Jaclyn A.; Gibbs, Richard A.

    2011-01-01

    Clinical cancer genetic susceptibility analysis typically proceeds sequentially beginning with the most likely causative gene. The process is time consuming and the yield is low particularly for families with unusual patterns of cancer. We determined the results of in parallel mutation analysis of a large cancer-associated gene panel. We performed deletion analysis and sequenced the coding regions of 45 genes (8 oncogenes and 37 tumor suppressor or DNA repair genes) in 48 childhood cancer patients who also (1) were diagnosed with a second malignancy under age 30, (2) have a sibling diagnosed with cancer under age 30 and/or (3) have a major congenital anomaly or developmental delay. Deleterious mutations were identified in 6 of 48 (13%) families, 4 of which met the sibling criteria. Mutations were identified in genes previously implicated in both dominant and recessive childhood syndromes including SMARCB1, PMS2, and TP53. No pathogenic deletions were identified. This approach has provided efficient identification of childhood cancer susceptibility mutations and will have greater utility as additional cancer susceptibility genes are identified. Integrating parallel analysis of large gene panels into clinical testing will speed results and increase diagnostic yield. The failure to detect mutations in 87% of families highlights that a number of childhood cancer susceptibility genes remain to be discovered. PMID:21356188

  6. Stiffness analysis and comparison of a Biglide parallel grinder with alternative spatial modular parallelograms

    DEFF Research Database (Denmark)

    Wu, Guanglei; Zou, Ping

    2017-01-01

    This paper deals with the stiffness modeling, analysis and comparison of a Biglide parallel grinder with two alternative modular parallelograms. It turns out that the Cartesian stiffness matrix of the manipulator has the property that it can be decoupled into two homogeneous matrices, correspondi...

  7. Parallel visualization on leadership computing resources

    Energy Technology Data Exchange (ETDEWEB)

    Peterka, T; Ross, R B [Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439 (United States); Shen, H-W [Department of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 (United States); Ma, K-L [Department of Computer Science, University of California at Davis, Davis, CA 95616 (United States); Kendall, W [Department of Electrical Engineering and Computer Science, University of Tennessee at Knoxville, Knoxville, TN 37996 (United States); Yu, H, E-mail: tpeterka@mcs.anl.go [Sandia National Laboratories, California, Livermore, CA 94551 (United States)

    2009-07-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  8. Parallel visualization on leadership computing resources

    International Nuclear Information System (INIS)

    Peterka, T; Ross, R B; Shen, H-W; Ma, K-L; Kendall, W; Yu, H

    2009-01-01

    Changes are needed in the way that visualization is performed, if we expect the analysis of scientific data to be effective at the petascale and beyond. By using similar techniques as those used to parallelize simulations, such as parallel I/O, load balancing, and effective use of interprocess communication, the supercomputers that compute these datasets can also serve as analysis and visualization engines for them. Our team is assessing the feasibility of performing parallel scientific visualization on some of the most powerful computational resources of the U.S. Department of Energy's National Laboratories in order to pave the way for analyzing the next generation of computational results. This paper highlights some of the conclusions of that research.

  9. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  10. A Parallel Priority Queue with Constant Time Operations

    DEFF Research Database (Denmark)

    Brodal, Gerth Stølting; Träff, Jesper Larsson; Zaroliagis, Christos D.

    1998-01-01

    We present a parallel priority queue that supports the following operations in constant time: parallel insertion of a sequence of elements ordered according to key, parallel decrease key for a sequence of elements ordered according to key, deletion of the minimum key element, and deletion of an arbitrary ... application is a parallel implementation of Dijkstra's algorithm for the single-source shortest path problem, which runs in O(n) time and O(m log n) work on a CREW PRAM on graphs with n vertices and m edges. This is a logarithmic factor improvement in the running time compared with previous approaches.
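
    For context, the sketch below is the conventional serial Dijkstra baseline with a binary-heap priority queue, whose O((n + m) log n) cost carries exactly the logarithmic factor that the parallel priority queue above eliminates; the graph encoding is an illustrative choice, not from the paper.

```python
# Serial Dijkstra baseline with a binary heap (illustrative encoding).
import heapq

def dijkstra(adj, source):
    """adj maps each vertex to a list of (neighbor, edge_weight) pairs."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry; a shorter path was found since
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

print(dijkstra({'a': [('b', 2), ('c', 5)], 'b': [('c', 1)], 'c': []}, 'a'))
# {'a': 0.0, 'b': 2.0, 'c': 3.0}
```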

  11. An environment for parallel structuring of Fortran programs

    International Nuclear Information System (INIS)

    Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

    1990-01-01

    The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)

  12. Parallel S_n iteration schemes

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.

    1986-01-01

    The iterative, multigroup, discrete ordinates (S_n) technique for solving the linear transport equation enjoys widespread usage and appeal. Serial iteration schemes and numerical algorithms developed over the years provide a timely framework for parallel extension. On the Denelcor HEP, the authors investigate three parallel iteration schemes for solving the one-dimensional S_n transport equation. The multigroup representation and serial iteration methods are also reviewed. This analysis represents a first attempt to extend serial S_n algorithms to parallel environments and provides good baseline estimates on ease of parallel implementation, relative algorithm efficiency, comparative speedup, and some future directions. The authors examine ordered and chaotic versions of these strategies, with and without concurrent rebalance and diffusion acceleration. Two strategies efficiently support high degrees of parallelization and appear to be robust parallel iteration techniques. The third strategy is a weaker parallel algorithm. Chaotic iteration, difficult to simulate on serial machines, holds promise and converges faster than ordered versions of the schemes. Actual parallel speedup and efficiency are high and payoff appears substantial

  13. Kinematics analysis of a novel planar parallel manipulator with kinematic redundancy

    Energy Technology Data Exchange (ETDEWEB)

    Qu, Haibo; Guo, Sheng [Beijing Jiaotong University, Beijing (China)

    2017-04-15

    In this paper, a novel planar parallel manipulator with kinematic redundancy is proposed. First, the degrees of freedom (DOF) of the whole parallel manipulator and the relative DOF (RDOF) between the moving platform and the fixed base are studied. The results indicate that the proposed mechanism is kinematically redundant. Then, the kinematics, Jacobian matrices and workspace of the proposed manipulator are analyzed. Finally, a statics simulation of the manipulator is performed; the resulting stress and displacement distributions can be used to identify the locations in the mechanism configuration most prone to failure.

  14. Kinematics analysis of a novel planar parallel manipulator with kinematic redundancy

    International Nuclear Information System (INIS)

    Qu, Haibo; Guo, Sheng

    2017-01-01

    In this paper, a novel planar parallel manipulator with kinematic redundancy is proposed. First, the degrees of freedom (DOF) of the whole parallel manipulator and the relative DOF (RDOF) between the moving platform and the fixed base are studied. The results indicate that the proposed mechanism is kinematically redundant. Then, the kinematics, Jacobian matrices and workspace of the proposed manipulator are analyzed. Finally, a statics simulation of the manipulator is performed; the resulting stress and displacement distributions can be used to identify the locations in the mechanism configuration most prone to failure.

  15. Distance-two interpolation for parallel algebraic multigrid

    International Nuclear Information System (INIS)

    Sterck, H de; Falgout, R D; Nolting, J W; Yang, U M

    2007-01-01

    In this paper we study the use of long distance interpolation methods with the low complexity coarsening algorithm PMIS. AMG performance and scalability is compared for classical as well as long distance interpolation methods on parallel computers. It is shown that the increased interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers

  16. Visual Analysis of North Atlantic Hurricane Trends Using Parallel Coordinates and Statistical Techniques

    National Research Council Canada - National Science Library

    Steed, Chad A; Fitzpatrick, Patrick J; Jankun-Kelly, T. J; Swan II, J. E

    2008-01-01

    ... for a particular dependent variable. These capabilities are combined into a unique visualization system that is demonstrated via a North Atlantic hurricane climate study using a systematic workflow. This research corroborates the notion that enhanced parallel coordinates coupled with statistical analysis can be used for more effective knowledge discovery and confirmation in complex, real-world data sets.

  17. Design, analysis and control of cable-suspended parallel robots and its applications

    CERN Document Server

    Zi, Bin

    2017-01-01

    This book provides an essential overview of the authors’ work in the field of cable-suspended parallel robots, focusing on innovative design, mechanics, control, development and applications. It presents and analyzes several typical mechanical architectures of cable-suspended parallel robots in practical applications, including the feed cable-suspended structure for super antennae, hybrid-driven-based cable-suspended parallel robots, and cooperative cable parallel manipulators for multiple mobile cranes. It also addresses the fundamental mechanics of cable-suspended parallel robots on the basis of their typical applications, including the kinematics, dynamics and trajectory tracking control of the feed cable-suspended structure for super antennae. In addition it proposes a novel hybrid-driven-based cable-suspended parallel robot that uses integrated mechanism design methods to improve the performance of traditional cable-suspended parallel robots. A comparative study on error and performance indices of hybr...

  18. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    Science.gov (United States)

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734

  19. Insight into the heterogeneous adsorption of humic acid fluorescent components on multi-walled carbon nanotubes by excitation-emission matrix and parallel factor analysis.

    Science.gov (United States)

    Yang, Chenghu; Liu, Yangzhi; Cen, Qiulin; Zhu, Yaxian; Zhang, Yong

    2018-02-01

    The heterogeneous adsorption behavior of commercial humic acid (HA) on pristine and functionalized multi-walled carbon nanotubes (MWCNTs) was investigated by fluorescence excitation-emission matrix and parallel factor (EEM-PARAFAC) analysis. The kinetics, isotherms, thermodynamics and mechanisms of adsorption of HA fluorescent components onto MWCNTs were the focus of the present study. Three humic-like fluorescent components were distinguished, including one carboxylic-like fluorophore, C1 (λex/λem = (250, 310) nm/428 nm), and two phenolic-like fluorophores, C2 (λex/λem = (300, 460) nm/552 nm) and C3 (λex/λem = (270, 375) nm/520 nm). The Lagergren pseudo-second-order model can be used to describe the adsorption kinetics of the HA fluorescent components. In addition, both the Freundlich and Langmuir models can suitably describe the adsorption of the HA fluorescent components onto MWCNTs, with significantly high correlation coefficients (R² > 0.94). A clear difference in adsorption affinity (Kd) and in the degree of nonlinear adsorption from the HA fluorescent components to the MWCNTs was observed. The adsorption mechanism suggested that the π-π electron donor-acceptor (EDA) interaction played an important role in the interaction between the HA fluorescent components and the three MWCNTs. Furthermore, the values of the thermodynamic parameters, including the Gibbs free energy change (ΔG°), enthalpy change (ΔH°) and entropy change (ΔS°), showed that the adsorption of the HA fluorescent components on MWCNTs was spontaneous and exothermic. Copyright © 2017 Elsevier Inc. All rights reserved.
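
    As a pointer for readers reproducing the isotherm step, the sketch below fits Langmuir and Freundlich models with SciPy; the data points, parameter names and initial guesses are invented placeholders, not values from the study.

```python
# Hedged sketch: fitting Langmuir and Freundlich isotherms (fake data).
import numpy as np
from scipy.optimize import curve_fit

def langmuir(Ce, q_max, K_L):
    # q_e = q_max * K_L * Ce / (1 + K_L * Ce)
    return q_max * K_L * Ce / (1.0 + K_L * Ce)

def freundlich(Ce, K_F, n):
    # q_e = K_F * Ce**(1/n)
    return K_F * Ce ** (1.0 / n)

Ce = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0])  # equilibrium concentration
qe = np.array([1.8, 3.1, 4.9, 7.6, 9.2, 10.4])   # adsorbed amount (made up)

p_lang, _ = curve_fit(langmuir, Ce, qe, p0=[12.0, 0.2])
p_freu, _ = curve_fit(freundlich, Ce, qe, p0=[3.0, 2.0])
print('Langmuir   q_max, K_L =', p_lang)
print('Freundlich K_F, n     =', p_freu)
```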

  20. Alleviating Search Uncertainty through Concept Associations: Automatic Indexing, Co-Occurrence Analysis, and Parallel Computing.

    Science.gov (United States)

    Chen, Hsinchun; Martinez, Joanne; Kirchhoff, Amy; Ng, Tobun D.; Schatz, Bruce R.

    1998-01-01

    Grounded on object filtering, automatic indexing, and co-occurrence analysis, an experiment was performed using a parallel supercomputer to analyze over 400,000 abstracts in an INSPEC computer engineering collection. A user evaluation revealed that system-generated thesauri were better than the human-generated INSPEC subject thesaurus in concept…

  1. Development of GPU Based Parallel Computing Module for Solving Pressure Equation in the CUPID Component Thermo-Fluid Analysis Code

    International Nuclear Information System (INIS)

    Lee, Jin Pyo; Joo, Han Gyu

    2010-01-01

    In the thermo-fluid analysis code named CUPID, a linear system of pressure equations must be solved at each iteration step. The time spent repeatedly solving this linear system can be quite significant, because large sparse matrices of rank greater than 50,000 are involved and the diagonal dominance of the system hardly holds. Parallelization of the linear system solver is therefore essential to reduce the computing time. Meanwhile, Graphics Processing Units (GPUs) have been developed as highly parallel, multi-core processors driven by the global demand for high-quality 3D graphics. Given a suitable interface, GPU parallelization becomes available to engineering computing. NVIDIA provides a Software Development Kit (SDK) named CUDA (Compute Unified Device Architecture) so that code developers can manage GPUs for parallelization using the C language. In this research, we implement parallel routines for the linear system solver using CUDA and examine the performance of the parallelization. In the next section, we describe the method of CUDA parallelization for the CUPID code, and then the performance of the CUDA parallelization is discussed
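
    At its core the pressure step is a preconditioned iterative solve of a large sparse system. The CPU-side sketch below (SciPy; matrix, size and tolerances are illustrative, not from CUPID) shows the same idea with BiCGSTAB and an incomplete-LU preconditioner, a common pairing when diagonal dominance is weak; the CUPID work performs the analogous solve on the GPU through CUDA.

```python
# Illustrative CPU sketch of a pressure-type solve (not the CUPID code).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 50_000  # rank comparable to the one quoted in the record
# Stand-in sparse matrix with only weak diagonal dominance (2.05 vs 2).
A = sp.diags([-1.0, 2.05, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-5)                 # incomplete-LU factors
M = spla.LinearOperator((n, n), matvec=ilu.solve)  # preconditioner
x, info = spla.bicgstab(A, b, M=M)
print('converged' if info == 0 else f'info={info}',
      float(np.linalg.norm(A @ x - b)))
```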

  2. Parallelization of quantum molecular dynamics simulation code

    International Nuclear Information System (INIS)

    Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu

    1998-02-01

    A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for the analysis of the thermalization of photon energies in molecules and materials. The simulation code has been parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector parallel computer (Fujitsu VPP300/12). Scalable speed-up was obtained on both parallel computers by distributing particle groups across the processor units. On the Intel Paragon XP/S75, distributing work to processor units not only by particle group but also by the fine-grained per-particle calculations achieved high parallelization performance. (author)

  3. Operation States Analysis of the Series-Parallel resonant Converter Working Above Resonance Frequency

    Directory of Open Access Journals (Sweden)

    Peter Dzurko

    2007-01-01

    An operation-states analysis of a series-parallel resonant converter working above the resonance frequency is described in the paper. Principal equations are derived for the individual operation states, and diagrams are constructed on their basis. The diagrams give a complete picture of the converter's behaviour for the individual circuit parameters. The waveforms may be utilised in designing the individual parts of the inverter.

  4. BCYCLIC: A parallel block tridiagonal matrix cyclic solver

    Science.gov (United States)

    Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.

    2010-09-01

    A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
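
    Cyclic reduction halves the number of unknowns per sweep by eliminating every other row, and all eliminations within a sweep are mutually independent, which is the property BCYCLIC exploits across block rows. The NumPy sketch below is a scalar (1×1 block) rendition for systems of size n = 2^k − 1, written only to illustrate the recurrence; it is not the BCYCLIC implementation.

```python
# Scalar cyclic reduction for a tridiagonal system, n = 2**k - 1.
import numpy as np

def cyclic_reduction(a, b, c, d):
    """a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
    (c[-1] unused), d: right-hand side. Returns the solution x."""
    a, b, c, d = (np.asarray(v, dtype=float).copy() for v in (a, b, c, d))
    n = len(b)
    assert (n + 1) & n == 0, 'n must be 2**k - 1'
    x = np.zeros(n)
    stride = 1
    while 2 * stride <= n:          # forward reduction
        # All row updates in this sweep are independent (parallelizable).
        for i in range(2 * stride - 1, n, 2 * stride):
            lo, hi = i - stride, i + stride
            alpha = a[i] / b[lo]    # eliminate x[lo] from row i
            d[i] -= alpha * d[lo]
            b[i] -= alpha * c[lo]
            a[i] = -alpha * a[lo]
            if hi < n:
                gamma = c[i] / b[hi]   # eliminate x[hi] from row i
                d[i] -= gamma * d[hi]
                b[i] -= gamma * a[hi]
                c[i] = -gamma * c[hi]
        stride *= 2
    while stride >= 1:              # back substitution, level by level
        for i in range(stride - 1, n, 2 * stride):
            xl = x[i - stride] if i - stride >= 0 else 0.0
            xr = x[i + stride] if i + stride < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
        stride //= 2
    return x

# Quick check against a dense solve on a random dominant system.
rng = np.random.default_rng(0)
n = 2**9 - 1
a, c = rng.random(n), rng.random(n)
b = 2.0 + a + c                     # strictly diagonally dominant
d = rng.random(n)
x = cyclic_reduction(a, b, c, d)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
print(np.allclose(A @ x, d))        # True
```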

  5. Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

    International Nuclear Information System (INIS)

    Ratnajeevan, S.; Hoole, H.

    1990-01-01

    Finite element field analysis algorithms lend themselves to parallelization, and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution, and therefore this paper concentrates on the gains from parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given, although the actual implementation of parallelism on the Sequent Symmetry 81 was in sparsity computation, matrix assembly and matrix solution. In all cases, the algorithms were modified to suit the parallel programming application rather than allowing the compiler to parallelize existing algorithms

  6. Kinematics and dynamics analysis of a quadruped walking robot with parallel leg mechanism

    Science.gov (United States)

    Wang, Hongbo; Sang, Lingfeng; Hu, Xing; Zhang, Dianfan; Yu, Hongnian

    2013-09-01

    A walking robot for the elderly and the disabled is required to have large load capacity, high stiffness, stability, etc. However, existing walking robots cannot meet these requirements because of their weight-payload ratio and limited functionality, so improving the capacity and functions of walking robots is an important research issue. According to walking requirements, and combining modularization and reconfigurable ideas, a quadruped/biped reconfigurable walking robot with a parallel leg mechanism is proposed; it can be used as either a biped or a quadruped walking robot. The kinematics and performance of the 3-UPU parallel mechanism, which is the basic leg mechanism of the quadruped walking robot, are analyzed and the structural parameters are optimized. The results show that the performance of the walking robot is optimal when the circumradii R, r of the upper and lower platforms of the leg mechanism are 161.7 mm and 57.7 mm, respectively. Based on these optimal results, the kinematics and dynamics of the quadruped walking robot in the static walking mode are derived by applying parallel mechanism and influence coefficient theory, and the optimal coordinated distribution of the dynamic load for the quadruped walking robot with over-determinate inputs is analyzed, which resolves the dynamic-load coupling caused by the branches' constraints during walking. Besides laying a theoretical foundation for development of the prototype, the kinematics and dynamics studies on the quadruped walking robot also advance the theory of quadruped walking and the practical application of parallel mechanisms.

  7. Parallel and Efficient Sensitivity Analysis of Microscopy Image Segmentation Workflows in Hybrid Systems.

    Science.gov (United States)

    Barreiros, Willian; Teodoro, George; Kurc, Tahsin; Kong, Jun; Melo, Alba C M A; Saltz, Joel

    2017-09-01

    We investigate efficient sensitivity analysis (SA) of algorithms that segment and classify image features in a large dataset of high-resolution images. Algorithm SA is the process of evaluating variations of methods and parameter values to quantify differences in the output. An SA can be very compute-demanding because it requires re-processing the input dataset several times with different parameters to assess variations in output. In this work, we introduce strategies to efficiently speed up SA via runtime optimizations targeting distributed hybrid systems and reuse of computations from runs with different parameters. We evaluate our approach using a cancer image analysis workflow on a hybrid cluster with 256 nodes, each with an Intel Phi and a dual socket CPU. The SA attained a parallel efficiency of over 90% on 256 nodes. The cooperative execution using the CPUs and the Phi available in each node with smart task assignment strategies resulted in an additional speedup of about 2×. Finally, multi-level computation reuse led to an additional speedup of up to 2.46× on the parallel version. The level of performance attained with the proposed optimizations will allow the use of SA in large-scale studies.

  8. An interactive parallel processor for data analysis

    International Nuclear Information System (INIS)

    Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

    1984-01-01

    A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors

  9. A parallel implementation of 3D Zernike moment analysis

    Science.gov (United States)

    Berjón, Daniel; Arnaldo, Sergio; Morán, Francisco

    2011-01-01

    Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3D Zernike moments analysis, written in C with CUDA extensions, which makes it practical to employ Zernike descriptors in interactive applications, yielding a performance of several frames per second in voxel datasets about 200³ in size. In our contribution, we describe the challenges of implementing 3D Zernike analysis in a general-purpose GPU. These include how to deal with numerical inaccuracies, due to the high precision demands of the algorithm, or how to deal with the high volume of input data so that it does not become a bottleneck for the system.

  10. Foundations of factor analysis

    CERN Document Server

    Mulaik, Stanley A

    2009-01-01

    Introduction: Factor Analysis and Structural Theories; Brief History of Factor Analysis as a Linear Model; Example of Factor Analysis. Mathematical Foundations for Factor Analysis: Introduction; Scalar Algebra; Vectors; Matrix Algebra; Determinants; Treatment of Variables as Vectors; Maxima and Minima of Functions. Composite Variables and Linear Transformations: Introduction; Composite Variables; Unweighted Composite Variables; Differentially Weighted Composites; Matrix Equations; Multi...

  11. Characterizing chromophoric dissolved organic matter in Lake Tianmuhu and its catchment basin using excitation-emission matrix fluorescence and parallel factor analysis.

    Science.gov (United States)

    Zhang, Yunlin; Yin, Yan; Feng, Longqing; Zhu, Guangwei; Shi, Zhiqiang; Liu, Xiaohan; Zhang, Yuanzhi

    2011-10-15

    Chromophoric dissolved organic matter (CDOM) is an important optically active substance that transports nutrients, heavy metals, and other pollutants from terrestrial to aquatic systems and is used as a measure of water quality. To investigate how the source and composition of CDOM change in both space and time, we used chemical, spectroscopic, and fluorescence analyses to characterize CDOM in Lake Tianmuhu (a drinking water source) and its catchment in China. Parallel factor analysis (PARAFAC) identified three individual fluorophore moieties that were attributed to humic-like and protein-like materials in 224 water samples collected between December 2008 and September 2009. The upstream rivers contained significantly higher concentrations of CDOM than did the lake water (a(350) of 4.27 ± 2.51 and 2.32 ± 0.59 m⁻¹, respectively), indicating that the rivers carried a substantial load of organic matter to the lake. Of the three main rivers that flow into Lake Tianmuhu, the Pingqiao River brought in the most CDOM from the catchment to the lake. CDOM absorption and the microbial and terrestrial humic-like components, but not the protein-like component, were significantly higher in the wet season than in other seasons, indicating that the frequency of rainfall and runoff could significantly impact the quantity and quality of CDOM collected from the catchment. The different relationships between the maximum fluorescence intensities of the three PARAFAC components, CDOM absorption, and chemical oxygen demand (COD) concentration in riverine and lake water indicated the difference in the composition of CDOM between Lake Tianmuhu and the rivers that feed it. This study demonstrates the utility of combining excitation-emission matrix fluorescence and PARAFAC to study CDOM dynamics in inland waters. Copyright © 2011 Elsevier Ltd. All rights reserved.

  12. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. The RBM is one such deep learning algorithm; in this project its time performance is improved through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. The RBM is a useful neural network building block for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of the RBM are available using models such as OpenMP and CUDA, but this project has been the first attempt to apply the OpenACC model to the RBM.

  13. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. The RBM is one such deep learning algorithm; in this project its time performance is improved through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. The RBM is a useful neural network building block for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of the RBM are available using models such as OpenMP and CUDA, but this project has been the first attempt to apply the OpenACC model to the RBM.
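
    Contrastive Divergence, the training method named above, estimates the RBM gradient from a single Gibbs step; the dense matrix products below are exactly the kind of work the cited project offloads to the GPU with OpenACC. This NumPy CD-1 sketch is illustrative only: the sizes, learning rate and fake batch are assumptions, not the project's code.

```python
# Hedged NumPy sketch of RBM training with CD-1 (all sizes illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden, lr = 784, 128, 0.05
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_gradients(v0):
    # Positive phase: hidden activations driven by the data.
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to visible, then hidden.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Correlations under the data minus under the one-step model.
    return (v0.T @ p_h0 - p_v1.T @ p_h1,
            (v0 - p_v1).sum(0), (p_h0 - p_h1).sum(0))

batch = (rng.random((64, n_visible)) < 0.3).astype(float)  # fake binary data
dW, db_v, db_h = cd1_gradients(batch)
W += lr * dW / len(batch)
b_v += lr * db_v / len(batch)
b_h += lr * db_h / len(batch)
```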

  14. Kinematics/statics analysis of a novel serial-parallel robotic arm with hand

    Energy Technology Data Exchange (ETDEWEB)

    Lu, Yi; Dai, Zhuohong; Ye, Nijia; Wang, Peng [Yanshan University, Hebei (China)

    2015-10-15

    A robotic arm with a fingered hand generally has multiple functions for completing various complicated operations. A novel serial-parallel robotic arm with a hand is proposed, and its kinematics and statics are studied systematically. A 3D prototype of the serial-parallel robotic arm with a hand is constructed and analyzed by simulation. The serial-parallel robotic arm with a hand is composed of an upper 3RPS parallel manipulator, a lower 3SPR parallel manipulator and a hand with three finger mechanisms. Its kinematics formulae for solving the displacement, velocity and acceleration are derived. Its statics formula for solving the active/constrained forces is derived. Its reachable workspace and orientation workspace are constructed and analyzed. Finally, an analytic example is given for solving the kinematics and statics of the serial-parallel robotic arm with a hand, and the analytic solutions are verified by a simulation mechanism.

  15. Kinematics/statics analysis of a novel serial-parallel robotic arm with hand

    International Nuclear Information System (INIS)

    Lu, Yi; Dai, Zhuohong; Ye, Nijia; Wang, Peng

    2015-01-01

    A robotic arm with a fingered hand generally has multiple functions for completing various complicated operations. A novel serial-parallel robotic arm with a hand is proposed, and its kinematics and statics are studied systematically. A 3D prototype of the serial-parallel robotic arm with a hand is constructed and analyzed by simulation. The serial-parallel robotic arm with a hand is composed of an upper 3RPS parallel manipulator, a lower 3SPR parallel manipulator and a hand with three finger mechanisms. Its kinematics formulae for solving the displacement, velocity and acceleration are derived. Its statics formula for solving the active/constrained forces is derived. Its reachable workspace and orientation workspace are constructed and analyzed. Finally, an analytic example is given for solving the kinematics and statics of the serial-parallel robotic arm with a hand, and the analytic solutions are verified by a simulation mechanism.

  16. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  17. Sensitivity Analysis of the Proximal-Based Parallel Decomposition Methods

    Directory of Open Access Journals (Sweden)

    Feng Ma

    2014-01-01

    The proximal-based parallel decomposition methods were recently proposed to solve structured convex optimization problems. These algorithms are eligible for parallel computation and can be used efficiently for solving large-scale separable problems. In this paper, compared with the previous theoretical results, we show that the range of the involved parameters can be enlarged while convergence can still be established. Preliminary numerical tests on the stable principal component pursuit problem testify to the advantages of the enlargement.

  18. Parallelization of TMVA Machine Learning Algorithms

    CERN Document Server

    Hajili, Mammad

    2017-01-01

    This report reflects my work on the parallelization of TMVA machine learning algorithms integrated into the ROOT data analysis framework, carried out during a summer internship at CERN. The report consists of four important parts: the data set used in training and validation, the algorithms to which multiprocessing was applied, the parallelization techniques, and the resulting changes in execution time as the number of workers varies.

  19. High sensitivity and high Q-factor nanoslotted parallel quadrabeam photonic crystal cavity for real-time and label-free sensing

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Daquan [Rowland Institute at Harvard University, Cambridge, Massachusetts 02142 (United States); State Key Laboratory of Information Photonics and Optical Communications, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876 (China); School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138 (United States); Kita, Shota; Wang, Cheng; Lončar, Marko [School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138 (United States); Liang, Feng; Quan, Qimin [Rowland Institute at Harvard University, Cambridge, Massachusetts 02142 (United States); Tian, Huiping; Ji, Yuefeng [State Key Laboratory of Information Photonics and Optical Communications, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876 (China)

    2014-08-11

    We experimentally demonstrate a label-free sensor based on nanoslotted parallel quadrabeam photonic crystal cavity (NPQC). The NPQC possesses both high sensitivity and high Q-factor. We achieved sensitivity (S) of 451 nm/refractive index unit and Q-factor >7000 in water at telecom wavelength range, featuring a sensor figure of merit >2000, an order of magnitude improvement over the previous photonic crystal sensors. In addition, we measured the streptavidin-biotin binding affinity and detected 10 ag/mL concentrated streptavidin in the phosphate buffered saline solution.
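
    The quoted numbers are mutually consistent under the common definition FOM = S/FWHM with FWHM ≈ λ/Q. In the check below, the operating wavelength of 1550 nm is an assumption (the record only says "telecom wavelength range"), as is the FOM convention.

```python
# Back-of-envelope consistency check (assumed: FOM = S / FWHM, FWHM = lambda / Q).
S = 451.0            # sensitivity, nm per refractive-index unit (from record)
Q = 7000.0           # quality factor in water (from record)
wavelength = 1550.0  # nm; assumed telecom-band operating point

fwhm = wavelength / Q     # resonance linewidth, about 0.22 nm
fom = S / fwhm
print(round(fom))         # ~2037, consistent with "figure of merit > 2000"
```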

  20. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

    2010-09-30

    Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running, the analysis can make progress, whereas an MPI job fails as soon as a single node fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  1. Finding Tropical Cyclones on a Cloud Computing Cluster: Using Parallel Virtualization for Large-Scale Climate Simulation Analysis

    International Nuclear Information System (INIS)

    Hasenkamp, Daren; Sim, Alexander; Wehner, Michael; Wu, Kesheng

    2010-01-01

    Extensive computing power has been used to tackle issues such as climate change, fusion energy, and other pressing scientific challenges. These computations produce a tremendous amount of data; however, many of the data analysis programs currently run on only a single processor. In this work, we explore the possibility of using the emerging cloud computing platform to parallelize such sequential data analysis tasks. As a proof of concept, we wrap a program for analyzing trends of tropical cyclones in a set of virtual machines (VMs). This approach allows the user to keep their familiar data analysis environment in the VMs, while we provide the coordination and data transfer services to ensure the necessary input and output are directed to the desired locations. This work extensively exercises the networking capability of cloud computing systems and has revealed a number of weaknesses in the current cloud system software. In our tests, we are able to scale the parallel data analysis job to a modest number of VMs and achieve a speedup that is comparable to running the same analysis task using MPI. However, compared to MPI-based parallelization, the cloud-based approach has a number of advantages. The cloud-based approach is more flexible because the VMs can capture arbitrary software dependencies without requiring the user to rewrite their programs. The cloud-based approach is also more resilient to failure: as long as a single VM is running, the analysis can make progress, whereas an MPI job fails as soon as a single node fails. In short, this initial work demonstrates that a cloud computing system is a viable platform for distributed scientific data analyses traditionally conducted on dedicated supercomputing systems.

  2. ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data.

    Directory of Open Access Journals (Sweden)

    Ramón Díaz-Uriarte

    Full Text Available BACKGROUND: Copy number alterations (CNAs) in genomic DNA have been associated with complex human diseases, including cancer. One of the most common techniques to detect CNAs is array-based comparative genomic hybridization (aCGH). The availability of aCGH platforms and the need for identification of CNAs have resulted in a wealth of methodological studies. METHODOLOGY/PRINCIPAL FINDINGS: ADaCGH is an R package and a web-based application for the analysis of aCGH data. It implements eight methods for detection of CNAs, gains and losses of genomic DNA, including all of the best performing ones from two recent reviews (CBS, GLAD, CGHseg, HMM). For improved speed, we use parallel computing (via MPI). Additional information (GO terms, PubMed citations, KEGG and Reactome pathways) is available for individual genes, and for sets of genes with altered copy numbers. CONCLUSIONS/SIGNIFICANCE: ADaCGH represents a qualitative increase in the standards of these types of applications: a) all of the best performing algorithms are included, not just one or two; b) we do not limit ourselves to providing a thin layer of CGI on top of existing BioConductor packages, but instead carefully use parallelization, examining different schemes, and are able to achieve significant decreases in user waiting time (factors of up to 45x); c) we have added functionality not currently available in some methods, to adapt to recent recommendations (e.g., merging of segmentation results in the wavelet-based and CGHseg algorithms); d) we incorporate redundancy, fault tolerance and checkpointing, which are unique among web-based, parallelized applications; e) all of the code is available under open source licenses, allowing others to build upon, copy, and adapt our code for other software projects.

  3. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  4. Energization of Long HVAC Cables in Parallel - Analysis and Estimation Formulas

    DEFF Research Database (Denmark)

    Silva, Filipe Faria Da; Bak, Claus Leth

    2012-01-01

    The installation of long HVAC cables has recently become more common, and it is expected to increase in the coming years. Consequently, the energization of long HVAC cables in parallel is also becoming a more common condition. The energization of HVAC cables in parallel resembles the energization of capacitor...... has several simplifications and does not always provide accurate results. This paper proposes a new formula that can be used for the estimation of these two quantities for two HVAC cables in parallel.

  5. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  6. Instantaneous Kinematics Analysis via Screw-Theory of a Novel 3-CRC Parallel Mechanism

    Directory of Open Access Journals (Sweden)

    Hussein de la Torre

    2016-06-01

    Full Text Available This paper presents the mobility and kinematics analysis of a novel parallel mechanism that is composed of one base, one platform and three identical limbs with CRC joints. The paper obtains closed-form solutions to the direct and inverse kinematics problems, and determines the mobility of the mechanism and its instantaneous kinematics by applying screw theory. The obtained results show that this parallel robot belongs to the 2R1T family, since the platform exhibits 3 DOF: one translation perpendicular to the base and two rotations about skew axes. In order to calculate the direct instantaneous kinematics, this paper introduces the vector mh, which is part of the joint velocity vector that multiplies the overall inverse Jacobian matrix. The paper compares the results of simulations and numerical examples using Mathematica and SolidWorks in order to prove the accuracy of the analytical results.

  7. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of the algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single program, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
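
    The four steps listed above map almost directly onto a message-passing sketch. The following mpi4py fragment is a hedged illustration, not the GRASS-based implementation: the grid size, sample data and power parameter are invented, and the row assignment is the simplest static round-robin.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        samples = np.random.rand(500, 3) if rank == 0 else None  # x, y, value
        samples = comm.bcast(samples, root=0)      # step 1: broadcast shared data

        ny, nx, power = 256, 256, 2.0
        local = []
        for j in range(rank, ny, size):            # step 2: rows owned by this rank
            y = j / (ny - 1)
            xs = np.linspace(0.0, 1.0, nx)
            d2 = (samples[:, 0, None] - xs) ** 2 + (samples[:, 1, None] - y) ** 2
            w = 1.0 / np.maximum(d2, 1e-12) ** (power / 2.0)
            local.append((j, (w * samples[:, 2, None]).sum(0) / w.sum(0)))

        gathered = comm.gather(local, root=0)      # step 3: master gathers rows
        if rank == 0:
            grid = np.empty((ny, nx))
            for part in gathered:
                for j, row in part:
                    grid[j] = row                  # step 4: assemble and output
            print(grid.mean())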

  8. Ethnicity Modifies Associations between Cardiovascular Risk Factors and Disease Severity in Parallel Dutch and Singapore Coronary Cohorts.

    Directory of Open Access Journals (Sweden)

    Crystel M Gijsberts

    Full Text Available In 2020 the largest number of patients with coronary artery disease (CAD) will be found in Asia. Published epidemiological and clinical reports are overwhelmingly derived from western (White) cohorts, and data from Asia are scant. We compared CAD severity and all-cause mortality among 4 of the world's most populous ethnicities: Whites, Chinese, Indians and Malays. The UNIted CORoNary cohort (UNICORN) simultaneously enrolled parallel populations of consecutive patients undergoing coronary angiography or intervention for suspected CAD in the Netherlands and Singapore. Using multivariable ordinal regression, we investigated the independent association of ethnicity with CAD severity and interactions between risk factors and ethnicity on CAD severity. Also, we compared all-cause mortality among the ethnic groups using multivariable Cox regression analysis. We included 1,759 White, 685 Chinese, 201 Indian and 224 Malay patients undergoing coronary angiography. We found distinct inter-ethnic differences in cardiovascular risk factors. Furthermore, the associations of gender and diabetes with severity of CAD were significantly stronger in Chinese than Whites. Chinese (OR 1.3 [1.1-1.7], p = 0.008) and Malay (OR 1.9 [1.4-2.6], p<0.001) ethnicity were independently associated with more severe CAD as compared to White ethnicity. Strikingly, when stratified for diabetes status, we found a significant association of all three Asian ethnic groups as compared to White ethnicity with more severe CAD among diabetics, but not in non-diabetics. Crude all-cause mortality did not differ, but when adjusted for covariates mortality was higher in Malays than the other ethnic groups. In this population of individuals undergoing coronary angiography, ethnicity is independently associated with the severity of CAD and modifies the strength of association between certain risk factors and CAD severity. Furthermore, mortality differs among ethnic groups. Our data provide insight in

  9. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    International Nuclear Information System (INIS)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah

    2014-01-01

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results, but unfortunately, not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain based on the computing devices. The heterogeneous platform utilized in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising, whereby the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially high arithmetic-intensity calculation. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.

  10. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research has been done on utilizing GPU massive parallelization in core simulation with impressive results, but unfortunately, not much commercial application has been made in the nuclear field, especially in reactor core simulation. The purpose of this paper is to give an introductory concept on the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. To do a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain based on the computing devices. The heterogeneous platform utilized in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising, whereby the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to a potentially high arithmetic-intensity calculation. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
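
    The data-parallel principle behind the reported speedup — updating a large batch of independent particle histories in lockstep — can be sketched without a GPU at all. The NumPy fragment below is a hedged, CPU-only toy: the 2D random walk, absorption probability and step distribution are invented for illustration, with vectorization standing in for the GPU kernel.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 100_000                       # batch of histories advanced in lockstep
        pos = np.zeros((n, 2))
        alive = np.ones(n, dtype=bool)
        for _ in range(100):              # cap on the number of flights
            k = int(alive.sum())
            if k == 0:
                break
            theta = rng.uniform(0.0, 2.0 * np.pi, k)
            step = rng.exponential(1.0, k)          # free path to next collision
            pos[alive] += np.column_stack((step * np.cos(theta),
                                           step * np.sin(theta)))
            absorbed = rng.random(k) < 0.3          # toy absorption probability
            idx = np.flatnonzero(alive)
            alive[idx[absorbed]] = False
        print("mean distance from origin:", float(np.linalg.norm(pos, axis=1).mean()))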

  11. A parallel finite element method for the analysis of crystalline solids

    DEFF Research Database (Denmark)

    Sørensen, N.J.; Andersen, B.S.

    1996-01-01

    A parallel finite element method suitable for the analysis of 3D quasi-static crystal plasticity problems has been developed. The method is based on substructuring of the original mesh into a number of substructures which are treated as isolated finite element models related via the interface...... conditions. The resulting interface equations are solved using a direct solution method. The method shows a good speedup when increasing the number of processors from 1 to 8 and the effective solution of 3D crystal plasticity problems whose size is much too large for a single work station becomes possible....

  12. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    1997-10-01

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  13. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  14. Mesh Partitioning Algorithm Based on Parallel Finite Element Analysis and Its Actualization

    Directory of Open Access Journals (Sweden)

    Lei Zhang

    2013-01-01

    Full Text Available In parallel computing based on finite element analysis, domain decomposition is a key technique for preprocessing. Generally, a domain decomposition of a mesh can be realized through partitioning of a graph which is converted from a finite element mesh. This paper discusses the method for graph partitioning and the way to carry out mesh partitioning. Relevant software is introduced, and the data structures and key functions of Metis and ParMetis are described. The writing, compiling, and testing of a mesh partitioning interface program based on these key functions are performed. The results indicate objective laws and characteristics that guide users who apply graph partitioning algorithms and software to write PFEM programs, and ideal partitioning effects can be achieved by carrying out mesh partitioning through the program. The interface program can also be used directly by engineering researchers as a module of PFEM software, lowering the barrier to applying graph partitioning algorithms, improving calculation efficiency, and promoting the application of graph theory and parallel computing.

  15. Analysis and Design of High-Order Parallel Resonant Converters

    Science.gov (United States)

    Batarseh, Issa Eid

    1990-01-01

    In this thesis, a special state variable transformation technique has been derived for the analysis of high-order dc-to-dc resonant converters. Converters comprised of high-order resonant tanks have the advantage of utilizing the parasitic elements by making them part of the resonant tank. A new set of state variables is defined in order to make use of two-dimensional state-plane diagrams in the analysis of high-order converters. Such a method has been successfully used for the analysis of the conventional Parallel Resonant Converter (PRC). Consequently, two-dimensional state-plane diagrams are used to analyze the steady-state response of third- and fourth-order PRCs when these converters are operated in the continuous conduction mode. Based on this analysis, a set of control characteristic curves for the LCC-, LLC- and LLCC-type PRC is presented, from which various converter design parameters are obtained. Various design curves for component value selection and device ratings are given. This analysis of high-order resonant converters shows that the addition of reactive components to the resonant tank results in converters with better performance characteristics when compared with the conventional second-order PRC. A complete design procedure along with design examples for 2nd-, 3rd- and 4th-order converters is presented. Practical power supply units, normally used for computer applications, were built and tested using the LCC-, LLC- and LLCC-type commutation schemes. In addition, computer simulation results are presented for these converters in order to verify the theoretical results.

  16. Load Balancing of Parallel Monte Carlo Transport Calculations

    International Nuclear Information System (INIS)

    Procassini, R J; O'Brien, M J; Taylor, J M

    2005-01-01

    The performance of parallel Monte Carlo transport calculations which use both spatial and particle parallelism is increased by dynamically assigning processors to the most worked domains. Since the particle work load varies over the course of the simulation, this algorithm determines, each cycle, whether dynamic load balancing would speed up the calculation. If load balancing is required, a small number of particle communications are initiated in order to achieve load balance. This method has decreased the parallel run time by more than a factor of three for certain criticality calculations.
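
    The per-cycle decision described here — test whether the imbalance justifies moving particles, then plan a small number of transfers — reduces to a few lines of logic. The sketch below is a hedged stand-in with an invented threshold and invented counts, not the production algorithm.

        def needs_rebalance(counts, threshold=1.25):
            # Rebalance only when the busiest rank exceeds the mean load by
            # the given factor, mirroring the per-cycle test described above.
            return max(counts) > threshold * (sum(counts) / len(counts))

        def plan_transfers(counts):
            # Pair overloaded ranks with underloaded ones; returns (src, dst, n).
            mean = sum(counts) // len(counts)
            surplus = [(i, c - mean) for i, c in enumerate(counts) if c > mean]
            deficit = [(i, mean - c) for i, c in enumerate(counts) if c < mean]
            moves = []
            for src, extra in surplus:
                for k, (dst, need) in enumerate(deficit):
                    if extra == 0:
                        break
                    n = min(extra, need)
                    if n > 0:
                        moves.append((src, dst, n))
                        extra -= n
                        deficit[k] = (dst, need - n)
            return moves

        counts = [900, 100, 250, 350]        # invented per-rank particle counts
        if needs_rebalance(counts):
            print(plan_transfers(counts))    # [(0, 1, 300), (0, 2, 150), (0, 3, 50)]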

  17. Microprocessor event analysis in parallel with Camac data acquisition

    International Nuclear Information System (INIS)

    Cords, D.; Eichler, R.; Riege, H.

    1981-01-01

    The Plessey MIPROC-16 microprocessor (16 bits, 250 ns execution time) has been connected to a Camac system (GEC-ELLIOTT System Crate) and shares the Camac access with a Nord-10S computer. Interfaces have been designed and tested for execution of Camac cycles, communication with the Nord-10S computer and DMA transfer from Camac to the MIPROC-16 memory. The system is used in the JADE data-acquisition system at PETRA, where it receives the data from the detector in parallel with the Nord-10S computer via DMA through the indirect-data-channel mode. The microprocessor performs an on-line analysis of events and the result of various checks is appended to the event. In case of spurious triggers or clear beam-gas events, the Nord-10S buffer will be reset and the event omitted from further processing. (orig.)

  18. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code EGS4 on the distributed-memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time for one incident particle differs greatly from particle to particle, because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles dynamically or statically, are used to realize high-speed parallel processing of this code. Among the four problems chosen for performance evaluation, the speedup factors for three problems reached nearly 100 with 128 processors. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load across processors. It has also been found that when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to obtain accurate results, it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in light of the experience gained from the EGS4 parallelization. (author)
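
    The dynamic-versus-static trade-off reported here is easy to reproduce on a workstation. In the hedged sketch below, history is an invented stand-in whose runtime varies strongly per particle; chunked map plays the role of static allocation and imap_unordered with chunk size 1 plays the role of the dynamic task farm.

        import multiprocessing as mp
        import random
        import time

        def history(seed):
            # Invented stand-in for one incident particle with highly
            # variable cost, as in the EGS4 workloads described above.
            random.seed(seed)
            time.sleep(random.expovariate(50.0))
            return seed

        if __name__ == "__main__":
            seeds = list(range(200))
            with mp.Pool(4) as pool:
                t0 = time.perf_counter()
                pool.map(history, seeds, chunksize=50)                  # static
                t_static = time.perf_counter() - t0
                t0 = time.perf_counter()
                list(pool.imap_unordered(history, seeds, chunksize=1))  # dynamic
                t_dynamic = time.perf_counter() - t0
            print(f"static {t_static:.2f} s, dynamic {t_dynamic:.2f} s")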

  19. A parallel algorithm for transient solid dynamics simulations with contact detection

    International Nuclear Information System (INIS)

    Attaway, S.; Hendrickson, B.; Plimpton, S.; Gardner, D.; Vaughan, C.; Heinstein, M.; Peery, J.

    1996-01-01

    Solid dynamics simulations with Lagrangian finite elements are used to model a wide variety of problems, such as the calculation of impact damage to shipping containers for nuclear waste and the analysis of vehicular crashes. Using parallel computers for these simulations has been hindered by the difficulty of searching efficiently for material surface contacts in parallel. A new parallel algorithm for calculation of arbitrary material contacts in finite element simulations has been developed and implemented in the PRONTO3D transient solid dynamics code. This paper will explore some of the issues involved in developing efficient, portable, parallel finite element models for nonlinear transient solid dynamics simulations. The contact-detection problem poses interesting challenges for efficient implementation of a solid dynamics simulation on a parallel computer. The finite element mesh is typically partitioned so that each processor owns a localized region of the finite element mesh. This mesh partitioning is optimal for the finite element portion of the calculation since each processor must communicate only with the few connected neighboring processors that share boundaries with the decomposed mesh. However, contacts can occur between surfaces that may be owned by any two arbitrary processors. Hence, a global search across all processors is required at every time step to search for these contacts. Load imbalance can become a problem since the finite element decomposition divides the volumetric mesh evenly across processors but typically leaves the surface elements unevenly distributed. In practice, these complications have been limiting factors in the performance and scalability of transient solid dynamics on massively parallel computers. In this paper, the authors present a new parallel algorithm for contact detection that overcomes many of these limitations.

  20. Parallel replica dynamics method for bistable stochastic reaction networks: Simulation and sensitivity analysis

    Science.gov (United States)

    Wang, Ting; Plecháč, Petr

    2017-12-01

    Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.
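
    The core of the parallel replica idea for a continuous time Markov chain can be shown with a toy birth-death process: replicas explore the metastable region independently, and for (approximately) exponential exit times the physical time advanced is the number of replicas times the first exit time found. The rates, states and boundary below are invented; this is a didactic sketch, not the authors' method.

        import random

        def exit_time(x0, boundary, rates, rng):
            # Gillespie simulation of a birth-death chain until it leaves
            # the metastable region; returns the local exit time.
            x, t = x0, 0.0
            while x != boundary:
                up, down = rates(x)
                t += rng.expovariate(up + down)
                x = x + 1 if rng.random() < up / (up + down) else x - 1
            return t

        def parallel_replica_exit(x0, boundary, rates, n_rep=8, seed=1):
            # Replicas run independently (sequentially here for simplicity);
            # accumulated time = (number of replicas) x (first exit found).
            master = random.Random(seed)
            first = min(exit_time(x0, boundary, rates, random.Random(master.random()))
                        for _ in range(n_rep))
            return n_rep * first

        rates = lambda x: (1.0, 0.1 * x)    # toy birth/death rates
        print(parallel_replica_exit(x0=5, boundary=20, rates=rates))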

  1. Parallel replica dynamics method for bistable stochastic reaction networks: Simulation and sensitivity analysis.

    Science.gov (United States)

    Wang, Ting; Plecháč, Petr

    2017-12-21

    Stochastic reaction networks that exhibit bistable behavior are common in systems biology, materials science, and catalysis. Sampling of stationary distributions is crucial for understanding and characterizing the long-time dynamics of bistable stochastic dynamical systems. However, simulations are often hindered by the insufficient sampling of rare transitions between the two metastable regions. In this paper, we apply the parallel replica method for a continuous time Markov chain in order to improve sampling of the stationary distribution in bistable stochastic reaction networks. The proposed method uses parallel computing to accelerate the sampling of rare transitions. Furthermore, it can be combined with the path-space information bounds for parametric sensitivity analysis. With the proposed methodology, we study three bistable biological networks: the Schlögl model, the genetic switch network, and the enzymatic futile cycle network. We demonstrate the algorithmic speedup achieved in these numerical benchmarks. More significant acceleration is expected when multi-core or graphics processing unit computer architectures and programming tools such as CUDA are employed.

  2. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed-memory message-passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was the dominant time-consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the problem of deteriorating parallel efficiency and opens a new path to parallel computations of NDEs on distributed-memory message-passing parallel computers. (author)
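
    The decisive change described here — neighbour-to-neighbour exchanges with no supervising host — corresponds to the following hedged mpi4py fragment for a 1-D chain of subdomains. The interface array is an invented stand-in for the boundary-element data.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        interface = np.full(64, float(rank))   # stand-in for interface data
        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
        from_left = np.zeros_like(interface)   # stays zero at the chain ends
        from_right = np.zeros_like(interface)

        # Each repeated exchange involves only neighbouring node processors;
        # no host process mediates the communication.
        comm.Sendrecv(interface, dest=right, recvbuf=from_left, source=left)
        comm.Sendrecv(interface, dest=left, recvbuf=from_right, source=right)
        print(rank, from_left[0], from_right[0])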

  3. Instabilities in parallel channel of forced-convection boiling upflow system, 5

    International Nuclear Information System (INIS)

    Aritomi, Masanori; Aoki, Shigebumi; Inoue, Akira

    1983-01-01

    The density wave instability in a parallel boiling channel system heated electrically has been studied experimentally and analytically by the authors. In our country, the steam generator for the LMFBR has been investigated, with the Power Reactor and Nuclear Fuel Development Corp. playing the central role in its development, and many results on this instability have been reported. Their results differed from ours with regard to the governing factor of the period of flow oscillation in the unstable region and to the effect of the slip ratio on stability in the analysis. A new linear analytical model is proposed in this paper, and its analytical results are compared with those of two-phase analyses based on the same linear method. Subsequently, the effect of the slip ratio on stability is studied analytically with this model. The parallel boiling channel system is studied experimentally and analytically, using Freon-113 as the test fluid heated by hot water to simulate the SG for the LMFBR. The governing factor of the period of flow oscillation is made clear. (author)

  4. Transmission Index Research of Parallel Manipulators Based on Matrix Orthogonal Degree

    Science.gov (United States)

    Shao, Zhu-Feng; Mo, Jiao; Tang, Xiao-Qiang; Wang, Li-Ping

    2017-11-01

    A performance index is the standard of performance evaluation and the foundation of both performance analysis and optimal design for a parallel manipulator. Seeking suitable kinematic indices is always an important and challenging issue for the parallel manipulator. So far, there have been extensive studies in this field, but few existing indices meet all the requirements, such as being simple, intuitive, and universal. To solve this problem, the matrix orthogonal degree is adopted, and generalized transmission indices that can evaluate the motion/force transmissibility of fully parallel manipulators are proposed. Transmission performance analysis of typical branches, end effectors, and parallel manipulators is given to illustrate the proposed indices and analysis methodology. Simulation and analysis results reveal that the proposed transmission indices possess significant advantages: they are normalized and finite (ranging from 0 to 1), dimensionally homogeneous, frame-free, intuitive and easy to calculate. Besides, the proposed indices indicate the good transmission region and the proximity to singularity with better resolution than the traditional local conditioning index, and provide a novel tool for kinematic analysis and optimal design of fully parallel manipulators.

  5. Performance analysis of a refrigeration system with parallel control of evaporation pressure

    International Nuclear Information System (INIS)

    Lee, Jong Suk

    2008-01-01

    The conventional refrigeration system is composed of a compressor, condenser, receiver, expansion valve or capillary tube, and an evaporator. The refrigeration system used in this study has an additional expansion valve and evaporator, along with an Evaporation Pressure Regulator (EPR) at the exit side of the evaporator. The two evaporators can be operated at different temperatures according to the opening of the EPR. The experimental results obtained using the refrigeration system with parallel control of evaporation pressure are presented, and a performance analysis of the refrigeration system with two evaporators is conducted.

  6. Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications

    Science.gov (United States)

    Papalexakis, Evangelos E.; Faloutsos, Christos; Mitchell, Tom M.; Talukdar, Partha Pratim; Sidiropoulos, Nicholas D.; Murphy, Brian

    2016-01-01

    How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like ’edible’, ’fits in hand’)? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver, so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm, parallelizes it (with speedups of up to 65-fold), and produces sparse and interpretable solutions. Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, ’friends’, wall-postings); there, Turbo-SMT spots spammer-like anomalies. PMID:27672406
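
    A minimal alternating-least-squares (ALS) solver for the coupled problem — a third-order tensor and a side matrix sharing the factor of their common mode — fits in a few dozen NumPy lines. This is a plain textbook CMTF-ALS sketch, not Turbo-SMT itself, and all sizes are invented.

        import numpy as np

        def khatri_rao(B, C):
            # Column-wise Kronecker product used by the unfolded CP updates.
            return (B[:, None, :] * C[None, :, :]).reshape(-1, B.shape[1])

        def cmtf_als(X, Y, r=3, n_iter=100):
            # X: I x J x K tensor, Y: I x M matrix, coupled along mode 1;
            # the shared factor A must explain both data sets at once.
            I, J, K = X.shape
            rng = np.random.default_rng(0)
            B = rng.standard_normal((J, r))
            C = rng.standard_normal((K, r))
            D = rng.standard_normal((Y.shape[1], r))
            X1 = X.reshape(I, J * K)                      # mode-1 unfolding
            X2 = X.transpose(1, 0, 2).reshape(J, I * K)   # mode-2 unfolding
            X3 = X.transpose(2, 0, 1).reshape(K, I * J)   # mode-3 unfolding
            for _ in range(n_iter):
                Q = np.vstack([khatri_rao(B, C), D])      # coupled design for A
                A = np.linalg.lstsq(Q, np.hstack([X1, Y]).T, rcond=None)[0].T
                B = np.linalg.lstsq(khatri_rao(A, C), X2.T, rcond=None)[0].T
                C = np.linalg.lstsq(khatri_rao(A, B), X3.T, rcond=None)[0].T
                D = np.linalg.lstsq(A, Y, rcond=None)[0].T
            return A, B, C, D

        rng = np.random.default_rng(1)
        A0, B0, C0, D0 = (rng.standard_normal((n, 3)) for n in (20, 15, 10, 8))
        X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)        # synthetic coupled data
        Y = A0 @ D0.T
        A, B, C, D = cmtf_als(X, Y)
        Xh = np.einsum("ir,jr,kr->ijk", A, B, C)
        print(np.linalg.norm(X - Xh) / np.linalg.norm(X)) # typically near zero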

  7. Microprocessor event analysis in parallel with CAMAC data acquisition

    CERN Document Server

    Cords, D; Riege, H

    1981-01-01

    The Plessey MIPROC-16 microprocessor (16 bits, 250 ns execution time) has been connected to a CAMAC system (GEC-ELLIOTT System Crate) and shares the CAMAC access with a Nord-10S computer. Interfaces have been designed and tested for execution of CAMAC cycles, communication with the Nord-10S computer and DMA transfer from CAMAC to the MIPROC-16 memory. The system is used in the JADE data-acquisition system at PETRA, where it receives the data from the detector in parallel with the Nord-10S computer via DMA through the indirect-data-channel mode. The microprocessor performs an on-line analysis of events, and the results of various checks are appended to the event. In case of spurious triggers or clear beam-gas events, the Nord-10S buffer will be reset and the event omitted from further processing. (5 refs).

  8. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    Science.gov (United States)

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
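
    The "multiple independent searches" strategy that wins on the high-synchronization-overhead MIMD machine amounts to running several annealers with different seeds and keeping the best ordering. The sketch below is a hedged toy on an invented similarity matrix with a plain linear-arrangement cost, not the authors' clone-ordering code.

        import math
        import multiprocessing as mp
        import random

        def anneal(args):
            # One independent simulated-annealing search over orderings.
            sim, seed = args
            rng = random.Random(seed)
            n = len(sim)
            order = list(range(n))
            rng.shuffle(order)
            def cost(o):
                return sum(sim[o[i]][o[j]] * (j - i)
                           for i in range(n) for j in range(i + 1, n))
            c, T = cost(order), 10.0
            for _ in range(5000):
                i, j = rng.randrange(n), rng.randrange(n)
                order[i], order[j] = order[j], order[i]      # propose a swap
                c2 = cost(order)
                if c2 < c or rng.random() < math.exp(-(c2 - c) / T):
                    c = c2                                   # accept
                else:
                    order[i], order[j] = order[j], order[i]  # undo
                T *= 0.999
            return c, order

        if __name__ == "__main__":
            rng = random.Random(0)
            n = 20
            sim = [[rng.random() for _ in range(n)] for _ in range(n)]
            with mp.Pool(4) as pool:  # four independent searches, keep the best
                best_cost, best_order = min(pool.map(anneal, [(sim, s) for s in range(4)]))
            print(best_cost)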

  9. Conceptual design and kinematic analysis of a novel parallel robot for high-speed pick-and-place operations

    Science.gov (United States)

    Meng, Qizhi; Xie, Fugui; Liu, Xin-Jun

    2018-06-01

    This paper deals with the conceptual design, kinematic analysis and workspace identification of a novel four degrees-of-freedom (DOFs) high-speed spatial parallel robot for pick-and-place operations. The proposed spatial parallel robot consists of a base, four arms and a 1½ mobile platform. The mobile platform is a major innovation that avoids output singularity and offers the advantages of both single and double platforms. To investigate the characteristics of the robot's DOFs, a line graph method based on Grassmann line geometry is adopted in mobility analysis. In addition, the inverse kinematics is derived, and the constraint conditions to identify the correct solution are also provided. On the basis of the proposed concept, the workspace of the robot is identified using a set of presupposed parameters by taking input and output transmission index as the performance evaluation criteria.

  10. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  11. Parallelization methods study of thermal-hydraulics codes

    International Nuclear Information System (INIS)

    Gaudart, Catherine

    2000-01-01

    The variety of parallelization methods and machines leads to a wide range of choices for programmers. In this study we suggest, in an industrial context, some solutions drawn from the experience acquired with different parallelization methods. The study concerns several scientific codes which simulate a large variety of thermal-hydraulics phenomena. A bibliography on parallelization methods and a first analysis of the codes showed the difficulty of applying our process to the applications as a whole. It was therefore necessary to identify and extract a representative part of these applications for the parallelization methods. The linear solver part of the codes was the natural choice. Several parallelization methods had already been used on this particular part. From these developments one can estimate the work required for a novice programmer to parallelize an application, and the impact of the development constraints. The parallelization methods tested are the numerical library PETSc, the parallelizer PAF, the language HPF, the formalism PEI and the communication libraries MPI and PVM. In order to test several methods on different applications while respecting the constraint of minimizing modifications to the codes, a tool called SPS (Server of Parallel Solvers) was developed. We describe the various constraints on code optimization in an industrial context, present the solutions provided by the SPS tool, show the development of the linear solver part with the tested parallelization methods, and finally compare the results against the imposed criteria. (author)

  12. Reliability-Based Optimization of Series Systems of Parallel Systems

    DEFF Research Database (Denmark)

    Enevoldsen, I.; Sørensen, John Dalsgaard

    1993-01-01

    Reliability-based design of structural systems is considered. In particular, systems where the reliability model is a series system of parallel systems are treated. A sensitivity analysis for this class of problems is presented. Optimization problems with series systems of parallel systems...... optimization of series systems of parallel systems, but it is also efficient in reliability-based optimization of series systems in general....
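
    For independent components, the probability algebra behind a series system of parallel systems is compact: a parallel block survives unless every component fails, and the series system survives only if every block does. The reliabilities below are invented; the paper's system reliability model and sensitivity analysis generalize well beyond this toy.

        def series_of_parallel(blocks):
            # blocks: list of lists of component reliabilities; each inner
            # list is a parallel subsystem, the outer list is the series chain.
            r_sys = 1.0
            for comps in blocks:
                p_fail = 1.0
                for r in comps:
                    p_fail *= (1.0 - r)   # parallel block fails only if all fail
                r_sys *= (1.0 - p_fail)   # series: every block must survive
            return r_sys

        print(series_of_parallel([[0.9, 0.9], [0.95], [0.8, 0.8, 0.8]]))  # ~0.933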

  13. SWAMP+: multiple subsequence alignment using associative massive parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Steinfadt, Shannon Irene [Los Alamos National Laboratory; Baker, Johnnie W [KENT STATE UNIV.

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
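
    The O(m+n) parallel time quoted above rests on a classical observation: in the Smith-Waterman recurrence, every cell on an anti-diagonal i + j = d depends only on diagonals d-1 and d-2, so an entire anti-diagonal can be updated at once. The reference implementation below is serial and hedged (the scoring parameters are invented); the inner loop is the one an associative SIMD machine effectively collapses into a single parallel step.

        import numpy as np

        def smith_waterman(a, b, match=2, mismatch=-1, gap=1):
            # Plain DP for reference; all (i, j) with i + j constant are
            # independent, which is what SWAMP+ exploits on the ASC model.
            m, n = len(a), len(b)
            H = np.zeros((m + 1, n + 1), dtype=int)
            for i in range(1, m + 1):
                for j in range(1, n + 1):   # parallelizable along anti-diagonals
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    H[i, j] = max(0, H[i - 1, j - 1] + s,
                                  H[i - 1, j] - gap, H[i, j - 1] - gap)
            return int(H.max())

        print(smith_waterman("ACACACTA", "AGCACACA"))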

  14. Reliability and mass analysis of dynamic power conversion systems with parallel or standby redundancy

    Science.gov (United States)

    Juhasz, Albert J.; Bloomfield, Harvey S.

    1987-01-01

    A combinatorial reliability approach was used to identify potential dynamic power conversion systems for space mission applications. A reliability and mass analysis was also performed, specifically for a 100-kWe nuclear Brayton power conversion system with parallel redundancy. Although this study was done for a reactor outlet temperature of 1100 K, preliminary system mass estimates are also included for reactor outlet temperatures ranging up to 1500 K.

  15. Reliability and mass analysis of dynamic power conversion systems with parallel or standby redundancy

    Science.gov (United States)

    Juhasz, A. J.; Bloomfield, H. S.

    1985-01-01

    A combinatorial reliability approach is used to identify potential dynamic power conversion systems for space mission applications. A reliability and mass analysis is also performed, specifically for a 100 kWe nuclear Brayton power conversion system with parallel redundancy. Although this study is done for a reactor outlet temperature of 1100 K, preliminary system mass estimates are also included for reactor outlet temperatures ranging up to 1500 K.

  16. Dynamic Load Balancing of Parallel Monte Carlo Transport Calculations

    International Nuclear Information System (INIS)

    O'Brien, M; Taylor, J; Procassini, R

    2004-01-01

    The performance of parallel Monte Carlo transport calculations which use both spatial and particle parallelism is increased by dynamically assigning processors to the most worked domains. Since the particle work load varies over the course of the simulation, this algorithm determines, each cycle, whether dynamic load balancing would speed up the calculation. If load balancing is required, a small number of particle communications are initiated in order to achieve load balance. This method has decreased the parallel run time by more than a factor of three for certain criticality calculations.

  17. Analysis of flow distribution instability in parallel thin rectangular multi-channel system

    Energy Technology Data Exchange (ETDEWEB)

    Xia, G.L. [School of Nuclear Science and Technology, Xi’an Jiaotong University, Xi’an City 710049 (China); Fundamental Science on Nuclear Safety and Simulation Technology Laboratory, Harbin Engineering University, Harbin City 150001 (China); Su, G.H., E-mail: ghsu@mail.xjtu.edu.cn [School of Nuclear Science and Technology, Xi’an Jiaotong University, Xi’an City 710049 (China); Peng, M.J. [Fundamental Science on Nuclear Safety and Simulation Technology Laboratory, Harbin Engineering University, Harbin City 150001 (China)

    2016-08-15

    Highlights: • Flow distribution instability in a parallel thin rectangular multi-channel system is studied using the RELAP5 code. • Flow excursion may bring a parallel heating channel into the density wave oscillation region. • Flow distribution instability is more likely to happen at low power/flow ratio conditions. • Increasing the channel number does not affect the flow distribution instability boundary. • Asymmetric inlet throttling and heating will make the system more unstable. - Abstract: The flow distribution instability in a parallel thin rectangular multi-channel system is studied in the present work. The model of the parallel channel system is established using the RELAP5/MOD3.4 code. The transient process of flow distribution instability is studied under imposed inlet mass flow rate and imposed pressure drop conditions. The influence of heating power, mass flow rate, system pressure and channel number on flow distribution instability is analyzed. Furthermore, the flow distribution instability of a parallel two-channel system under asymmetric inlet throttling and heating power is studied. The results show that, if the multi-channel system operates in the negative-slope region of the channel ΔP–G curve, a small disturbance in pressure drop will lead to flow redistribution between the parallel channels. Flow excursion may bring the operating point of a heating channel into the density-wave oscillation region, which results in out-of-phase or in-phase flow oscillations. Flow distribution instability is more likely to happen at low power/flow ratio conditions; the stability of the parallel channel system increases with system pressure; the channel number has little effect on system stability; but asymmetric inlet throttling or heating power will make the system more unstable.
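
    The flow-excursion criterion invoked above — operating points on the negative-slope branch of the channel ΔP–G curve are unstable to flow redistribution — can be illustrated with a stylized internal characteristic. The cubic below is purely illustrative (not a physical model and not from the paper); the point is the slope test.

        import numpy as np

        G = np.linspace(0.1, 3.0, 300)            # normalized mass flux
        dP = 2.0 * G**3 - 7.5 * G**2 + 9.0 * G    # stylized N-shaped dP(G) curve
        slope = np.gradient(dP, G)
        unstable = G[slope < 0]                   # flow-excursion (Ledinegg) region
        print(f"negative dP/dG for G in [{unstable.min():.2f}, {unstable.max():.2f}]")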

  18. Influences of Device and Circuit Mismatches on Paralleling Silicon Carbide MOSFETs

    DEFF Research Database (Denmark)

    Li, Helong; Munk-Nielsen, Stig; Wang, Xiongfei

    2016-01-01

    This paper addresses the influences of device and circuit mismatches on paralleling Silicon Carbide (SiC) MOSFETs. Comprehensive theoretical analysis and experimental validation, from paralleled discrete devices to paralleled dies in multichip power modules, are first presented. Then, the influence of circuit mismatch on paralleling SiC MOSFETs is investigated and experimentally evaluated for the first time. It is found that the mismatch of the switching-loop stray inductance can also lead to on-state current unbalance with inductive output current, in addition to the on-state resistance of the device. It is further revealed that circuit mismatches and a current coupling among the paralleled dies exist in a SiC MOSFET multichip power module, which is critical for the transient current distribution in the power module. Thus, a power module layout with an auxiliary source connection is developed...

  19. Particularities of fully-parallel manipulators in 6-DOFs robots design: a review of critical aspects

    Directory of Open Access Journals (Sweden)

    Milica Lucian

    2017-01-01

    Full Text Available A whole range of industrial applications requires the presence of parallel mechanisms with six degrees of freedom (6-DOF), which have been developed in the last fifteen years, and one of the reasons why they are still a current topic is that present-day computers are capable of executing, in real time, motion laws of great complexity associated with these types of parallel mechanisms. The present work underlines the particularities of parallel manipulators and their importance in the design of 6-DOF robots. The paper reviews the progress made in the last twenty years in the development of 6-DOF parallel manipulators, which increasingly find a wide scope of applications in different industrial areas such as robotics, manufacturing and assisted medicine. It also emphasizes the need to determine singular configurations and the effect of kinematic redundancy, which can increase the workspace of the manipulators by adding active joints in one or more branches of the manipulator. Throughout the work, three types of singularities encountered in the modelling of different types of parallel manipulators are outlined, together with three types of redundancy. Furthermore, the dimension of the workspace is analysed for a series of parallel manipulators, highlighting a number of factors that influence its size.

  20. Parallel Integer Factorization Using Quadratic Forms

    National Research Council Canada - National Science Library

    McMath, Stephen S

    2005-01-01

    Factorization is important for both practical and theoretical reasons. In secure digital communication, security of the commonly used RSA public key cryptosystem depends on the difficulty of factoring large integers...

  1. Kinematics analysis and simulation of a new underactuated parallel robot

    Directory of Open Access Journals (Sweden)

    Wenxu YAN

    2017-04-01

    Full Text Available In a traditional parallel robot, the number of degrees of freedom is equal to the number of driving motors, which causes defects such as low efficiency. To overcome that problem, based on the traditional parallel robot, a new underactuated parallel robot is presented. The structural characteristics and working principles of the underactuated parallel robot are analyzed. The forward and inverse solutions are derived by way of space analytic geometry and vector algebra, and a sketch of this inverse-kinematics route is given below. The kinematics model is established, and MATLAB is used to verify the accuracy of the forward and inverse solutions and to identify the optimal workspace. The simulation results show that the robot can switch between three and four degrees of freedom with only three driving motors, improving the efficiency of robot grasping, and that it combines a large workspace, high-speed operation, high positioning accuracy and low manufacturing cost, so it will have a wide range of industrial applications.
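
    The inverse-kinematics route mentioned above — platform pose in, actuator coordinates out, via plain vector algebra — is shown here for a generic planar 3-RPR mechanism. This is a hedged stand-in, not the paper's robot: the anchors and pose are invented, and each leg length is simply the distance between a base anchor and the corresponding platform anchor rotated into the world frame.

        import math

        base = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.8)]       # fixed base anchors
        plat = [(-0.2, -0.1), (0.2, -0.1), (0.0, 0.15)]   # anchors, platform frame

        def inverse_kinematics(x, y, phi):
            lengths = []
            for (bx, by), (px, py) in zip(base, plat):
                wx = x + px * math.cos(phi) - py * math.sin(phi)  # world frame
                wy = y + px * math.sin(phi) + py * math.cos(phi)
                lengths.append(math.hypot(wx - bx, wy - by))      # leg length
            return lengths

        print([round(L, 4) for L in inverse_kinematics(1.0, 0.9, 0.1)])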

  2. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  3. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  4. Scalability of Parallel Scientific Applications on the Cloud

    Directory of Open Access Journals (Sweden)

    Satish Narayana Srirama

    2011-01-01

    Full Text Available Cloud computing, with its promise of virtually infinite resources, seems to suit well in solving resource-greedy scientific computing problems. To study the effects of moving parallel scientific applications onto the cloud, we deployed several benchmark applications, such as matrix-vector operations, the NAS parallel benchmarks, and DOUG (Domain decomposition On Unstructured Grids), on the cloud. DOUG is an open source software package for the parallel iterative solution of very large sparse systems of linear equations. The detailed analysis of DOUG on the cloud showed that parallel applications benefit considerably and scale reasonably well on the cloud. We could also observe the limitations of the cloud and compare its performance with that of a cluster. However, for efficiently running scientific applications on the cloud infrastructure, the applications must be reduced to frameworks that can successfully exploit the cloud resources, like the MapReduce framework. Several iterative and embarrassingly parallel algorithms were reduced to the MapReduce model and their performance was measured and analyzed. The analysis showed that Hadoop MapReduce has significant problems with iterative methods, while it suits embarrassingly parallel algorithms well. Scientific computing often uses iterative methods to solve large problems. Thus, for scientific computing on the cloud, this paper raises the necessity for better frameworks or optimizations for MapReduce.
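
    Why iterative methods sit awkwardly in MapReduce is easy to see in miniature: each sweep of, say, Jacobi iteration is one complete map/shuffle/reduce round, and on Hadoop every round is a separate job whose state is re-materialized through the file system. The single-process sketch below is hedged and invented (a tiny diagonally dominant system); only the round structure matters.

        # Toy MapReduce-style Jacobi iteration for A x = b.
        A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
        b = [6.0, 12.0, 14.0]
        x = [0.0, 0.0, 0.0]

        def map_row(i):
            # "Map": emit the updated value for row i from the previous x.
            s = sum(A[i][j] * x[j] for j in range(len(x)) if j != i)
            return i, (b[i] - s) / A[i][i]

        for _ in range(25):                       # 25 map/reduce rounds
            pairs = map(map_row, range(len(x)))   # map phase
            x = [v for _, v in sorted(pairs)]     # trivial shuffle/reduce phase
        print([round(v, 4) for v in x])           # converges to [1, 2, 3]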

  5. Quantitative analysis of titanium-induced artifacts and correlated factors during micro-CT scanning.

    Science.gov (United States)

    Li, Jun Yuan; Pow, Edmond Ho Nang; Zheng, Li Wu; Ma, Li; Kwong, Dora Lai Wan; Cheung, Lim Kwong

    2014-04-01

    To investigate the impact of the cover screw, resin embedment, and implant angulation on artifacts in microcomputed tomography (micro-CT) scanning of implants. A total of twelve implants were randomly divided into 4 groups: (i) implant only; (ii) implant with cover screw; (iii) implant with resin embedment; and (iv) implant with cover screw and resin embedment. Implants angled at 0°, 45°, and 90° were scanned by micro-CT. Images were assessed, and the ratio of artifact volume to total volume (AV/TV) was calculated. A stepwise multiple regression analysis was used to determine the significance of the different factors. One-way ANOVA was performed to identify which combination of factors could minimize the artifact. In the regression analysis, implant angulation was identified as the best predictor of artifact among the factors (P < 0.05). Non-embedded implants with the axis parallel to the X-ray source of the micro-CT produced minimal artifact. Implant angulation and resin embedment affected the artifact volume of micro-CT scanning for implants, while the cover screw did not. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  6. Design and analysis of all-dielectric broadband nonpolarizing parallel-plate beam splitters.

    Science.gov (United States)

    Wang, Wenliang; Xiong, Shengming; Zhang, Yundong

    2007-06-01

    Past research on all-dielectric nonpolarizing beam splitters is reviewed. With the aid of the needle thin-film synthesis method and the conjugate gradient refinement method, three nonpolarizing parallel-plate beam splitters with different split ratios, covering a 200 nm spectral range centered at 550 nm at an incidence angle of 45 degrees, are designed. The choice of materials and the initial stack are based on the theories of Costich and Thelen. The results of design and analysis show that the designs maintain a very low polarization ratio over the working range of the spectrum and have a reasonable angular field.

  7. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed. Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techniques…

  8. Parallel scalability and efficiency of vortex particle method for aeroelasticity analysis of bluff bodies

    Science.gov (United States)

    Tolba, Khaled Ibrahim; Morgenthal, Guido

    2018-01-01

    This paper presents an analysis of the scalability and efficiency of a simulation framework based on the vortex particle method. The code is applied to the numerical aerodynamic analysis of line-like structures. The numerical code runs on multicore CPU and GPU architectures using the OpenCL framework. The focus of this paper is the analysis of the parallel efficiency and scalability of the method applied to an engineering test case, specifically the aeroelastic response of a long-span bridge girder at the construction stage. The target is to assess the optimal configuration and the required computer architecture, such that it becomes feasible to utilise the method efficiently within the computational resources available to a regular engineering office. The simulations and the scalability analysis are performed on a regular gaming-type computer.

  9. Pteros 2.0: Evolution of the fast parallel molecular analysis library for C++ and python.

    Science.gov (United States)

    Yesylevskyy, Semen O

    2015-07-15

    Pteros is a high-performance open-source library for molecular modeling and analysis of molecular dynamics trajectories. Starting from version 2.0, Pteros is available for the C++ and Python programming languages with very similar interfaces. This makes it suitable for writing complex reusable programs in C++ and simple interactive scripts in Python alike. The new version improves the facilities for asynchronous trajectory reading and parallel execution of analysis tasks by introducing analysis plugins, which can be written in either C++ or Python in a completely uniform way. The high level of abstraction provided by analysis plugins greatly simplifies prototyping and implementation of complex analysis algorithms. Pteros is available for free under the Artistic License from http://sourceforge.net/projects/pteros/. © 2015 Wiley Periodicals, Inc.

  10. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on parallel sorting problems. The text also presents twenty different algorithms for architectures such as linear arrays, mesh-connected computers, and cube-connected computers. Another example where an algorithm can be applied is the shared-memory SIMD (single instruction stream, multiple data stream) computers, in which the whole sequence to be sorted can fit in the…

  11. Massively parallel sequencing of forensic STRs

    DEFF Research Database (Denmark)

    Parson, Walther; Ballard, David; Budowle, Bruce

    2016-01-01

    The DNA Commission of the International Society for Forensic Genetics (ISFG) is reviewing factors that need to be considered ahead of the adoption by the forensic community of short tandem repeat (STR) genotyping by massively parallel sequencing (MPS) technologies. MPS produces sequence data that...

  12. Parallel Factor Analysis as an exploratory tool for wavelet transformed event-related EEG

    DEFF Research Database (Denmark)

    Mørup, Morten; Hansen, Lars Kai; Hermann, Cristoph S.

    2006-01-01

    by the inter-trial phase coherence (ITPC), encompassing ANOVA analysis of differences between conditions and a 5-way analysis of channel x frequency x time x subject x condition. A flow chart is presented on how to perform data exploration using the PARAFAC decomposition on multi-way arrays. This includes (A) channel x frequency x time 3-way arrays of F test values from a repeated measures analysis of variance (ANOVA) between two stimulus conditions; (B) subject-specific 3-way analyses; and (C) an overall 5-way analysis of channel x frequency x time x subject x condition. The PARAFAC decomposition of the 3-way array of ANOVA F test values clearly showed the difference of regions of interest across modalities, while the 5-way analysis enabled visualization of both quantitative and qualitative differences. Consequently, PARAFAC is a promising data exploratory tool in the analysis of wavelet transformed event-related EEG.
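
    As an illustration of this kind of decomposition, the sketch below fits a rank-3 PARAFAC model to a synthetic channel x frequency x time array. It assumes the Python tensorly library rather than the tools used in the study, and the array contents and rank are placeholders.

        # Rank-3 PARAFAC of a synthetic 3-way array (illustrative only).
        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import parafac

        rng = np.random.default_rng(0)
        tensor = tl.tensor(rng.random((64, 30, 100)))  # channel x freq x time

        # CP/PARAFAC model: tensor ~ sum_r a_r (outer) b_r (outer) c_r
        weights, factors = parafac(tensor, rank=3, n_iter_max=200)
        channel_modes, freq_modes, time_modes = factors
        print(channel_modes.shape, freq_modes.shape, time_modes.shape)
        # -> (64, 3) (30, 3) (100, 3): one signature per mode per component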

  13. Signal-to-noise ratio measurement in parallel MRI with subtraction mapping and consecutive methods

    International Nuclear Information System (INIS)

    Imai, Hiroshi; Miyati, Tosiaki; Ogura, Akio; Doi, Tsukasa; Tsuchihashi, Toshio; Machida, Yoshio; Kobayashi, Masato; Shimizu, Kouzou; Kitou, Yoshihiro

    2008-01-01

    When measuring the signal-to-noise ratio (SNR) of images acquired with parallel magnetic resonance imaging, it was confirmed that there is a problem in applying conventional SNR measurement methods. With the method of measuring the noise from the background signal, the SNR with parallel imaging was higher than that without parallel imaging. In the subtraction method (NEMA standard), which sets a wide region of interest, the white noise was not evaluated correctly although the SNR was close to the theoretical value. We proposed two techniques, because the SNR in parallel imaging is not uniform owing to inhomogeneity of the coil sensitivity distribution and the geometry factor. Using the first method (subtraction mapping), two images were scanned with identical parameters. The SNR in each pixel was obtained by dividing the running mean (over a 7 x 7 pixel neighborhood) by the standard deviation/√2 in the same region of interest. Using the second (consecutive) method, more than fifty consecutive scans of a uniform phantom were obtained with identical scan parameters. The SNR was then calculated from the ratio of the mean signal intensity to the standard deviation in each pixel over the series of images. Moreover, geometry factors were calculated from the SNRs with and without parallel imaging. The SNR and geometry factor obtained with the subtraction mapping method agreed with those of the consecutive method. Both methods make it possible to obtain a more detailed determination of SNR in parallel imaging and to calculate the geometry factor. (author)
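
    A rough numpy rendering of the subtraction-mapping idea, under stated assumptions (two co-registered scans with identical parameters, a 7 x 7 running mean, local noise estimated from the difference image); this is a sketch, not the authors' implementation.

        # Pixel-wise SNR map from two identical scans (subtraction mapping).
        import numpy as np
        from scipy.ndimage import uniform_filter

        def snr_map(img1, img2, win=7):
            mean_map = uniform_filter((img1 + img2) / 2.0, size=win)
            diff = img1 - img2                    # signal cancels, noise remains
            m = uniform_filter(diff, size=win)    # local stats of the noise
            s2 = uniform_filter(diff ** 2, size=win) - m ** 2
            local_sd = np.sqrt(np.maximum(s2, 1e-12))
            return mean_map / (local_sd / np.sqrt(2.0))

        rng = np.random.default_rng(1)
        truth = np.full((128, 128), 100.0)
        img1 = truth + rng.normal(0, 5, truth.shape)
        img2 = truth + rng.normal(0, 5, truth.shape)
        print(snr_map(img1, img2).mean())         # roughly 100 / 5 = 20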

  14. Capacity Analysis for Parallel Runway through Agent-Based Simulation

    Directory of Open Access Journals (Sweden)

    Yang Peng

    2013-01-01

    Full Text Available Parallel runways are the mainstream structure of Chinese hub airports; the runway is often the bottleneck of an airport, and the evaluation of its capacity is of great importance to airport management. This study outlines a model, multiagent architecture, implementation approach, and software prototype of a simulation system for evaluating runway capacity. Agent Unified Modeling Language (AUML) is applied to illustrate the inbound and departure procedures of planes and to design the agent-based model. The model is evaluated experimentally, and its quality is studied in comparison with models created with SIMMOD and Arena. The results seem to be highly efficient, so the method can be applied to parallel runway capacity evaluation, and the model offers favorable flexibility and extensibility.

  15. Parallel Expansions of Sox Transcription Factor Group B Predating the Diversifications of the Arthropods and Jawed Vertebrates

    Science.gov (United States)

    Zhong, Lei; Wang, Dengqiang; Gan, Xiaoni; Yang, Tong; He, Shunping

    2011-01-01

    Group B of the Sox transcription factor family is crucial in embryo development in the insects and vertebrates. Sox group B, unlike the other Sox groups, has an unusually enlarged functional repertoire in insects, but the timing and mechanism of the expansion of this group were unclear. We collected and analyzed data for Sox group B from 36 species of 12 phyla representing the major metazoan clades, with an emphasis on arthropods, to reconstruct the evolutionary history of SoxB in bilaterians and to date the expansion of Sox group B in insects. We found that the genome of the bilaterian last common ancestor probably contained one SoxB1 and one SoxB2 gene only and that tandem duplications of SoxB2 occurred before the arthropod diversification but after the arthropod-nematode divergence, resulting in the basal repertoire of Sox group B in diverse arthropod lineages. The arthropod Sox group B repertoire expanded differently from the vertebrate repertoire, which resulted from genome duplications. The parallel increases in the Sox group B repertoires of the arthropods and vertebrates are consistent with the parallel increases in the complexity and diversification of these two important organismal groups. PMID:21305035

  16. Visual analysis of inter-process communication for large-scale parallel computing.

    Science.gov (United States)

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  17. Diffusion tensor tractography of the brainstem pyramidal tract; A study on the optimal reduction factor in parallel imaging

    Energy Technology Data Exchange (ETDEWEB)

    Bae, Yun Jung; Park, Jong Bin; Kim, Jae Hyoung; Choi, Byung Se; Jung, Cheol Kyu [Dept. of of Radiology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam (Korea, Republic of)

    2016-08-15

    Parallel imaging mitigates susceptibility artifacts that can adversely affect diffusion tensor tractography (DTT) of the pons, depending on the reduction (R) factor. We aimed to find the optimal R factor for DTT of the pons that would allow us to visualize the largest possible number of pyramidal tract fibers. Diffusion tensor imaging was performed on 10 healthy subjects at 3 Tesla based on single-shot echo-planar imaging using the following parameters: b value, 1000 s/mm²; gradient directions, 15; voxel size, 2 × 2 × 2 mm³; and R factors, 1, 2, 3, 4, and 5. DTT of the right and left pyramidal tracts in the pons was conducted in all subjects. Signal-to-noise ratio (SNR), image distortion, and the number of fibers in the tracts were compared across R factors. SNR, image distortion, and fiber number were significantly different according to R factor. Maximal SNR was achieved with an R factor of 2. Image distortion was minimal with an R factor of 5. The number of visible fibers was greatest with an R factor of 3. An R factor of 3 is optimal for DTT of the pontine pyramidal tract. A balanced consideration of SNR and image distortion, which do not have the same dependence on the R factor, is necessary for DTT of the pons.

  18. Determining the Number of Factors in P-Technique Factor Analysis

    Science.gov (United States)

    Lo, Lawrence L.; Molenaar, Peter C. M.; Rovine, Michael

    2017-01-01

    Determining the number of factors is a critical first step in exploratory factor analysis. Although various criteria and methods for determining the number of factors have been evaluated in the usual between-subjects R-technique factor analysis, there is still the question of how these methods perform in within-subjects P-technique factor analysis. A…
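
    One widely used criterion in this setting is Horn's parallel analysis: observed eigenvalues are retained while they exceed those obtained from comparable random data. A minimal numpy sketch of the idea (the simulation count and the 95th-percentile threshold are conventional but arbitrary choices, not the article's procedure):

        # Horn's parallel analysis: compare eigenvalues against random data.
        import numpy as np

        def parallel_analysis(X, n_sims=500, pct=95, seed=0):
            n, p = X.shape
            rng = np.random.default_rng(seed)
            obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
            sim = np.empty((n_sims, p))
            for s in range(n_sims):
                R = np.corrcoef(rng.standard_normal((n, p)), rowvar=False)
                sim[s] = np.linalg.eigvalsh(R)[::-1]
            return int(np.sum(obs > np.percentile(sim, pct, axis=0)))

        rng = np.random.default_rng(2)
        F = rng.standard_normal((300, 2))            # two latent factors
        X = F @ rng.standard_normal((2, 8)) + 0.5 * rng.standard_normal((300, 8))
        print(parallel_analysis(X), "factors retained")  # typically 2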

  19. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  20. Parallel computing solution of Boltzmann neutron transport equation

    International Nuclear Information System (INIS)

    Ansah-Narh, T.

    2010-01-01

    The focus of the research was on developing a parallel computing algorithm for solving the eigenvalues of the Boltzmann Neutron Transport Equation (BNTE) in a slab geometry using a multi-grid approach. In response to the slow execution of serial computing when solving large problems such as the BNTE, the study focused on the design of parallel computing systems, an evolution of serial computing that uses multiple processing elements simultaneously to solve complex physical and mathematical problems. The finite element method (FEM) was used for the spatial discretization scheme, while angular discretization was accomplished by expanding the angular dependence in terms of Legendre polynomials. The eigenvalues representing the multiplication factors in the BNTE were determined by the power method. MATLAB Compiler Version 4.1 (R2009a) was used to compile the MATLAB codes of the BNTE. The implemented parallel algorithms were enabled with matlabpool, a Parallel Computing Toolbox function. The option UseParallel was set to 'always' (the default value of the option is 'never'). When those conditions held, the solvers computed estimated gradients in parallel. The parallel computing system was used to handle all the bottlenecks in the matrix generated from the finite element scheme and in each domain generated by the power method. The parallel algorithm was implemented on a Symmetric Multi Processor (SMP) cluster machine with Intel 32-bit quad-core x86 processors. Convergence rates and timings for the algorithm on the SMP cluster machine were obtained. Numerical experiments indicated that the designed parallel algorithm could reach perfect speedup and had good stability and scalability. (au)
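
    The power method mentioned above iterates operator application and normalization until the dominant eigenvalue converges. A minimal sketch, with a small dense matrix standing in for the FEM-discretized transport operator:

        # Power method for the dominant eigenvalue (multiplication factor).
        import numpy as np

        def power_method(A, tol=1e-10, max_iter=10_000):
            x = np.ones(A.shape[0])
            lam = 0.0
            for _ in range(max_iter):
                y = A @ x
                lam_new = np.linalg.norm(y)   # converges to |lambda_max|
                x = y / lam_new               # re-normalize the iterate
                if abs(lam_new - lam) < tol:
                    break
                lam = lam_new
            return lam_new, x

        A = np.array([[4.0, 1.0], [2.0, 3.0]])  # eigenvalues 5 and 2
        lam, vec = power_method(A)
        print(lam)                               # ~5.0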

  1. A Beginner’s Guide to Factor Analysis: Focusing on Exploratory Factor Analysis

    Directory of Open Access Journals (Sweden)

    An Gie Yong

    2013-10-01

    Full Text Available The following paper discusses exploratory factor analysis and gives an overview of the statistical technique and how it is used in various research designs and applications. A basic outline of how the technique works and its criteria, including its main assumptions are discussed as well as when it should be used. Mathematical theories are explored to enlighten students on how exploratory factor analysis works, an example of how to run an exploratory factor analysis on SPSS is given, and finally a section on how to write up the results is provided. This will allow readers to develop a better understanding of when to employ factor analysis and how to interpret the tables and graphs in the output.
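
    The paper demonstrates EFA in SPSS; a rough Python analogue, assuming scikit-learn's FactorAnalysis in place of SPSS (rotation options and retention criteria differ between the tools):

        # Two-factor EFA with varimax rotation on synthetic item data.
        import numpy as np
        from sklearn.decomposition import FactorAnalysis

        rng = np.random.default_rng(3)
        latent = rng.standard_normal((500, 2))     # two underlying factors
        loadings = rng.standard_normal((2, 6))     # six observed items
        X = latent @ loadings + 0.3 * rng.standard_normal((500, 6))

        fa = FactorAnalysis(n_components=2, rotation="varimax")
        fa.fit(X)
        print(np.round(fa.components_.T, 2))  # item-by-factor loading matrix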

  2. Detection of Copper (II) and Cadmium (II) binding to dissolved organic matter from macrophyte decomposition by fluorescence excitation-emission matrix spectra combined with parallel factor analysis

    International Nuclear Information System (INIS)

    Yuan, Dong-hai; Guo, Xu-jing; Wen, Li; He, Lian-sheng; Wang, Jing-gang; Li, Jun-qi

    2015-01-01

    Fluorescence excitation-emission matrix (EEM) spectra coupled with parallel factor analysis (PARAFAC) were used to characterize dissolved organic matter (DOM) derived from macrophyte decomposition, and to study its complexation with Cu (II) and Cd (II). Both the protein-like and the humic-like components showed a marked quenching effect by Cu (II). Negligible quenching effects were found for Cd (II) by components 1, 5 and 6. The stability constants and the fraction of the binding fluorophores for humic-like components and Cu (II) can be influenced by macrophyte decomposition of various weight gradients in aquatic plants. Macrophyte decomposition within an appropriate range of aquatic phytomass can maximize the stability constant of DOM-metal complexes. A large amount of organic matter was introduced into the aquatic environment by macrophyte decomposition, suggesting that the potential risk of DOM as a carrier of heavy metal contamination in macrophytic lakes should not be ignored. - Highlights: • Macrophyte decomposition increases fluorescent DOM components in the upper sediment. • Protein-like components are quenched or enhanced by adding Cu (II) and Cd (II). • Macrophyte decomposition DOM can impact the affinity of Cu (II) and Cd (II). • The log K_M and f values showed a marked change due to macrophyte decomposition. • Macrophyte decomposition can maximize the stability constant of DOM-Cu (II) complexes. - Macrophyte decomposition DOM can influence the binding affinity of metal ions in macrophytic lakes

  3. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.
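
    In the simplest case, SENSE-type unfolding reduces to solving a small linear system per aliased pixel using the known coil sensitivities. A toy numpy illustration for an acceleration factor of 2 and two coils (all numbers invented):

        # SENSE unfolding for R = 2: separate two pixels folded onto one.
        import numpy as np

        S = np.array([[0.9, 0.2],    # coil 1 sensitivity at pixels A and B
                      [0.3, 0.8]])   # coil 2 sensitivity at pixels A and B
        true_pixels = np.array([10.0, 4.0])

        aliased = S @ true_pixels    # each coil sees a folded combination
        unfolded, *_ = np.linalg.lstsq(S, aliased, rcond=None)
        print(unfolded)              # recovers [10., 4.]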

  4. Development of Industrial High-Speed Transfer Parallel Robot

    International Nuclear Information System (INIS)

    Kim, Byung In; Kyung, Jin Ho; Do, Hyun Min; Jo, Sang Hyun

    2013-01-01

    Parallel robots used in industry require high stiffness or high speed because of their structural characteristics. Nowadays, the importance of rapid transportation has increased in the distribution industry. In this light, an industrial parallel robot has been developed for high-speed transfer. The developed parallel robot can handle a maximum payload of 3 kg. For a payload of 0.1 kg, the trajectory cycle time is 0.3 s (come and go), and the maximum velocity is 4.5 m/s (pick & place work, adept cycle). In this motion, its maximum acceleration is very high and reaches approximately 13g. In this paper, the design, analysis, and performance test results of the developed parallel robot system are introduced

  5. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
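
    The fictitious cells mentioned above give each subdomain a copy of its neighbour's boundary values, so a stencil can be applied uniformly across the cut without special cases. A serial numpy illustration of the idea (the 1-D field and 3-point smoothing stencil are invented for the example):

        # Ghost ("fictitious") cell exchange between two 1-D subdomains.
        import numpy as np

        field = np.arange(16, dtype=float)     # global field
        left, right = field[:8], field[8:]     # two subdomains

        def with_ghosts(inner, lo, hi):
            g = np.empty(inner.size + 2)
            g[1:-1] = inner
            g[0], g[-1] = lo, hi               # fictitious boundary cells
            return g

        # the "exchange": each side copies the neighbour's edge value
        gl = with_ghosts(left, left[0], right[0])
        gr = with_ghosts(right, left[-1], right[-1])

        # a 3-point stencil now sweeps each subdomain without edge cases
        sl = 0.25 * gl[:-2] + 0.5 * gl[1:-1] + 0.25 * gl[2:]
        sr = 0.25 * gr[:-2] + 0.5 * gr[1:-1] + 0.25 * gr[2:]
        print(sl[-1], sr[0])  # 7.0 8.0, matching a global undecomposed sweep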

  6. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchtone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchtone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel σ machine gives an improvement factor close to 64 over the single-processor CRAY-2

  8. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    Science.gov (United States)

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
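
    The parallel E step can be sketched in a few lines: responsibilities for disjoint blocks of observations are computed independently, and only the sufficient statistics are aggregated for the M step. The toy below uses a 1-D two-component Gaussian mixture with invented parameters; it is not the report's implementation.

        # One EM update with the E step split across worker processes.
        import numpy as np
        from multiprocessing import Pool

        MEANS = np.array([0.0, 5.0])     # current model parameters
        STDS = np.array([1.0, 1.0])
        WEIGHTS = np.array([0.5, 0.5])

        def e_step_block(x):
            # responsibilities r[n, k] for one block of observations
            d = (x[:, None] - MEANS) / STDS
            p = WEIGHTS * np.exp(-0.5 * d * d) / STDS
            return p / p.sum(axis=1, keepdims=True)

        if __name__ == "__main__":
            rng = np.random.default_rng(6)
            data = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
            with Pool(4) as pool:
                r = np.vstack(pool.map(e_step_block, np.array_split(data, 4)))
            # M-step update of the means from the gathered statistics
            print(np.round((r * data[:, None]).sum(axis=0) / r.sum(axis=0), 2))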

  9. High fidelity thermal-hydraulic analysis using CFD and massively parallel computers

    International Nuclear Information System (INIS)

    Weber, D.P.; Wei, T.Y.C.; Brewster, R.A.; Rock, Daniel T.; Rizwan-uddin

    2000-01-01

    Thermal-hydraulic analyses play an important role in design and reload analysis of nuclear power plants. These analyses have historically relied on early generation computational fluid dynamics capabilities, originally developed in the 1960s and 1970s. Over the last twenty years, however, dramatic improvements in both computational fluid dynamics codes in the commercial sector and in computing power have taken place. These developments offer the possibility of performing large scale, high fidelity, core thermal hydraulics analysis. Such analyses will allow a determination of the conservatism employed in traditional design approaches and possibly justify the operation of nuclear power systems at higher powers without compromising safety margins. The objective of this work is to demonstrate such a large scale analysis approach using a state of the art CFD code, STAR-CD, and the computing power of massively parallel computers, provided by IBM. A high fidelity representation of a current generation PWR was analyzed with the STAR-CD CFD code and the results were compared to traditional analyses based on the VIPRE code. Current design methodology typically involves a simplified representation of the assemblies, where a single average pin is used in each assembly to determine the hot assembly from a whole core analysis. After determining this assembly, increased refinement is used in the hot assembly, and possibly some of its neighbors, to refine the analysis for purposes of calculating DNBR. This latter calculation is performed with sub-channel codes such as VIPRE. The modeling simplifications that are used involve the approximate treatment of surrounding assemblies and coarse representation of the hot assembly, where the subchannel is the lowest level of discretization. In the high fidelity analysis performed in this study, both restrictions have been removed. Within the hot assembly, several hundred thousand to several million computational zones have been used, to

  10. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation
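
    Angular domain decomposition assigns each processing element a subset of the discrete ordinates, and the partial quadrature sums are then combined. A toy sketch (the integrand is a stand-in for the angular flux; the split into four workers is arbitrary):

        # Quadrature over discrete ordinates split across worker processes.
        import numpy as np
        from multiprocessing import Pool

        ANGLES, WTS = np.polynomial.legendre.leggauss(16)  # 16 ordinates

        def partial_flux(idx):
            # each worker accumulates the contribution of its own angles
            mu, w = ANGLES[idx], WTS[idx]
            return float(np.sum(w * (1.0 + 0.5 * mu)))     # toy angular flux

        if __name__ == "__main__":
            chunks = np.array_split(np.arange(16), 4)
            with Pool(4) as pool:
                parts = pool.map(partial_flux, chunks)
            print(sum(parts))  # integral of 1 + mu/2 over [-1, 1] = 2.0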

  11. Basic design of parallel computational program for probabilistic structural analysis

    International Nuclear Information System (INIS)

    Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  13. Parallelization of the unstructured Navier-Stokes solver LILAC for the aero-thermal analysis of a gas-cooled reactor

    International Nuclear Information System (INIS)

    Kim, J. T.; Kim, S. B.; Lee, W. J.

    2004-01-01

    Currently, the LILAC code is under development to analyse the thermal-hydraulics of the gas-cooled reactor (GCR), especially the high-temperature GCR, which is one of the Gen IV nuclear reactors. The LILAC code was originally developed for the analysis of thermal-hydraulics in a molten pool, and it is now being modified to resolve the compressible gas flows in the GCR. As the internal flow geometries of the GCR and its aero-thermal flows become more complex, the number of computational cells increases and finally exceeds the computing power of current desktop computers. To overcome this problem and properly resolve the physics of interest in the GCR, the LILAC code has been parallelized by decomposition of the computational domain, or grid. Benchmark problems are solved with the parallelized LILAC code, and its speed-up characteristics under parallel computation are evaluated and described in this article

  14. A Parallel Software Pipeline for DMET Microarray Genotyping Data Analysis

    Directory of Open Access Journals (Sweden)

    Giuseppe Agapito

    2018-06-01

    Full Text Available Personalized medicine is an aspect of P4 medicine (predictive, preventive, personalized and participatory) based precisely on the customization of all medical characteristics of each subject. In personalized medicine, the development of medical treatments and drugs is tailored to the individual characteristics and needs of each subject, according to the study of diseases at different scales, from the genotype to the phenotype scale. To make the goal of personalized medicine concrete, it is necessary to employ high-throughput methodologies such as Next Generation Sequencing (NGS), Genome-Wide Association Studies (GWAS), Mass Spectrometry or Microarrays, which are able to investigate a single disease from a broader perspective. A side effect of high-throughput methodologies is the massive amount of data produced for each single experiment, which poses several challenges (e.g., high execution time and memory requirements) to bioinformatics software. Thus, a main requirement of modern bioinformatics software is the use of good software engineering methods and efficient programming techniques able to face those challenges, including the use of parallel programming and of efficient and compact data structures. This paper presents the design and the experimentation of a comprehensive software pipeline, named microPipe, for the preprocessing, annotation and analysis of microarray-based Single Nucleotide Polymorphism (SNP) genotyping data. A use case in pharmacogenomics is presented. The main advantages of using microPipe are: the reduction of errors that may happen when trying to make data compatible among different tools; the possibility to analyze huge datasets in parallel; and the easy annotation and integration of data. microPipe is available under a Creative Commons license and is freely downloadable for academic and not-for-profit institutions.

  15. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  16. Parallel workflow for high-throughput (>1,000 samples/day quantitative analysis of human insulin-like growth factor 1 using mass spectrometric immunoassay.

    Directory of Open Access Journals (Sweden)

    Paul E Oran

    Full Text Available Insulin-like growth factor 1 (IGF1) is an important biomarker for the management of growth hormone disorders. Recently there has been rising interest in deploying mass spectrometric (MS) methods of detection for measuring IGF1. However, widespread clinical adoption of any MS-based IGF1 assay will require increased throughput and speed to justify the costs of analyses, and robust industrial platforms that are reproducible across laboratories. Presented here is an MS-based quantitative IGF1 assay with a performance rating of >1,000 samples/day and the capability of quantifying IGF1 point mutations and posttranslational modifications. The throughput of the IGF1 mass spectrometric immunoassay (MSIA) benefited from a simplified sample preparation step, IGF1 immunocapture in a tip format, and high-throughput MALDI-TOF MS analysis. The Limit of Detection and Limit of Quantification of the resulting assay were 1.5 μg/L and 5 μg/L, respectively, with intra- and inter-assay precision CVs of less than 10%, and good linearity and recovery characteristics. The IGF1 MSIA was benchmarked against a commercially available IGF1 ELISA via a Bland-Altman method comparison test, resulting in a slight positive bias of 16%. The IGF1 MSIA was employed in an optimized parallel workflow utilizing two pipetting robots and MALDI-TOF-MS instruments synced into one-hour phases of sample preparation, extraction and MSIA pipette tip elution, MS data collection, and data processing. Using this workflow, high-throughput IGF1 quantification of 1,054 human samples was achieved in approximately 9 hours. This rate of assaying is a significant improvement over existing MS-based IGF1 assays, and is on par with that of the enzyme-based immunoassays. Furthermore, a mutation was detected in ∼1% of the samples (SNP: rs17884626, creating an A→T substitution at position 67 of IGF1), demonstrating the capability of the IGF1 MSIA to detect point mutations and posttranslational modifications.

  17. An easy guide to factor analysis

    CERN Document Server

    Kline, Paul

    2014-01-01

    Factor analysis is a statistical technique widely used in psychology and the social sciences. With the advent of powerful computers, factor analysis and other multivariate methods are now available to many more people. An Easy Guide to Factor Analysis presents and explains factor analysis as clearly and simply as possible. The author, Paul Kline, carefully defines all statistical terms and demonstrates step-by-step how to work out a simple example of principal components analysis and rotation. He further explains other methods of factor analysis, including confirmatory and path analysis, a

  18. A SPECT reconstruction method for extending parallel to non-parallel geometries

    International Nuclear Information System (INIS)

    Wen Junhai; Liang Zhengrong

    2010-01-01

    Due to its simplicity, parallel-beam geometry is usually assumed for the development of image reconstruction algorithms. The established reconstruction methodologies are then extended to fan-beam, cone-beam and other non-parallel geometries for practical application. This situation occurs for quantitative SPECT (single photon emission computed tomography) imaging in inverting the attenuated Radon transform. Novikov reported an explicit parallel-beam formula for the inversion of the attenuated Radon transform in 2000. Thereafter, a formula for fan-beam geometry was reported by Bukhgeim and Kazantsev (2002 Preprint N. 99 Sobolev Institute of Mathematics). At the same time, we presented a formula for varying focal-length fan-beam geometry. Sometimes, the reconstruction formula is so implicit that we cannot obtain the explicit reconstruction formula in the non-parallel geometries. In this work, we propose a unified reconstruction framework for extending parallel-beam geometry to any non-parallel geometry using ray-driven techniques. Studies by computer simulations demonstrated the accuracy of the presented unified reconstruction framework for extending parallel-beam to non-parallel geometries in inverting the attenuated Radon transform.

  19. A novel six-degrees-of-freedom series-parallel manipulator

    Energy Technology Data Exchange (ETDEWEB)

    Gallardo-Alvarado, J.; Rodriguez-Castro, R.; Aguilar-Najera, C. R.; Perez-Gonzalez, L. [Instituto Tecnologico de Celaya, Celaya (Mexico)

    2012-06-15

    This paper addresses the description and kinematic analysis of a new non-redundant series-parallel manipulator. The primary feature of the robot is its decoupled topology, consisting of a lower parallel manipulator, for controlling the orientation of the coupler platform, assembled in series connection with an upper parallel manipulator, for controlling the position of the output platform, capable of providing arbitrary poses to the output platform with respect to the fixed platform. The forward displacement analysis is carried out in semi-closed form solutions by resorting to simple closure equations. On the other hand, the velocity, acceleration and singularity analyses of the manipulator are approached by means of the theory of screws. Simple and compact expressions are derived here for solving the infinitesimal kinematics by taking advantage of the concept of reciprocal screws. Furthermore, the analysis of the Jacobians of the robot shows that the lower parallel manipulator is practically free of singularities. In order to illustrate the performance of the manipulator, a numerical example is provided, which consists of solving the inverse/forward kinematics of the series-parallel manipulator as well as finding its singular configurations.

  20. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  1. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed: the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories: those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods, such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination, are also reviewed, and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains, are discussed.
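
    Of the three strategies, replicated data is the simplest to sketch: every worker holds all coordinates, the O(N^2) pair loop is split by first-atom index, and the partial force arrays are summed at the end. A toy Lennard-Jones example in reduced units (invented positions; not production MD code):

        # Replicated-data force evaluation split across worker processes.
        import numpy as np
        from multiprocessing import Pool

        POS = np.random.default_rng(4).random((200, 3)) * 10.0

        def forces_for_range(bounds):
            lo, hi = bounds
            f = np.zeros_like(POS)
            for i in range(lo, hi):
                d = POS[i] - POS[i + 1:]                      # pairs (i, j>i)
                r2 = (d * d).sum(axis=1)
                mag = 24.0 * (2.0 / r2 ** 7 - 1.0 / r2 ** 4)  # LJ force / r
                fij = mag[:, None] * d
                f[i] += fij.sum(axis=0)
                f[i + 1:] -= fij                              # Newton's 3rd law
            return f

        if __name__ == "__main__":
            chunks = [(0, 50), (50, 100), (100, 150), (150, 200)]
            with Pool(4) as pool:
                total = sum(pool.map(forces_for_range, chunks))
            print(np.abs(total.sum(axis=0)).max())  # net force ~ 0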

  2. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    Science.gov (United States)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  3. Analysis of the IDR(s) Family of Solvers for Reservoir Simulations on Different Parallel Architectures

    Directory of Open Access Journals (Sweden)

    Seignole Vincent

    2016-09-01

    Full Text Available The present contribution provides a detailed analysis of several realizations of the IDR(s) family of solvers under different facets: robustness, performance, and implementation in different parallel environments, compared with a sequential IDR(s) implementation, tested on several industrial, geologically and structurally coherent 3D field-case reservoir models. This work is the result of continuous efforts towards improving the response time of Storengy's three-dimensional reservoir simulator named Multi, dedicated to gas-storage applications.

  4. Circuit mismatch influence on performance of paralleling silicon carbide MOSFETs

    DEFF Research Database (Denmark)

    Li, Helong; Munk-Nielsen, Stig; Pham, Cam

    2014-01-01

    This paper focuses on the influence of circuit mismatch on the performance of paralleled SiC MOSFETs. The influences of power circuit mismatch and gate driver mismatch are analyzed in detail. Simulation and experimental results show the influence of circuit mismatch and verify the analysis. This paper aims to give suggestions on paralleling discrete SiC MOSFETs and on designing the layout of power modules with paralleled SiC MOSFET dies.

  5. Comparison of multihardware parallel implementations for a phase unwrapping algorithm

    Science.gov (United States)

    Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo

    2018-04-01

    Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and more accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm for a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; the minimization is achieved by means of a serial Gauss-Seidel-type algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi-class algorithm with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel in the same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architectures, such as CPU multicore, the Xeon Phi coprocessor, and Nvidia graphics processing units. In all cases, we obtain superior performance with our parallel algorithm when compared with the original serial version. In addition, we present a detailed performance comparison of the developed parallel versions.
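
    The chessboard scheme can be demonstrated on any pixel-wise minimization with nearest-neighbour coupling. In the numpy sketch below, Laplace smoothing stands in for the unwrapping cost function (grid size, boundary values and iteration count are arbitrary); each colour class updates wholly in parallel:

        # Red-black (chessboard) relaxation: each colour updates at once.
        import numpy as np

        u = np.zeros((64, 64))
        u[0, :] = 1.0                         # fixed boundary values
        iy, ix = np.indices(u.shape)
        interior = np.ones_like(u, dtype=bool)
        interior[0, :] = interior[-1, :] = False
        interior[:, 0] = interior[:, -1] = False
        red = ((iy + ix) % 2 == 0) & interior
        black = ((iy + ix) % 2 == 1) & interior

        def neighbour_mean(v):
            return 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0)
                           + np.roll(v, 1, 1) + np.roll(v, -1, 1))

        for _ in range(500):
            u[red] = neighbour_mean(u)[red]      # all red cells, no conflicts
            u[black] = neighbour_mean(u)[black]  # then all black cells
        print(round(float(u[32, 32]), 4))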

  7. Forms and factors of peer violence and victimisation

    OpenAIRE

    Dinić Bojana; Sokolovska Valentina; Milovanović Ilija; Oljača Milan

    2014-01-01

    The main aim of this study was to explore the latent structure of violence and victimisation based on a factor analysis of the Peer Violence and Victimisation Questionnaire (PVVQ), as well as to examine the correlates of violence and victimisation. The sample included 649 secondary school students (61.8% male) from an urban area. Besides the PVVQ, the Aggressiveness questionnaire AVDH was administered. Based on parallel analysis, three factors were extracted…

  8. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message-passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
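
    The core idea behind barrel-sort can be sketched without explicit message passing: keys are routed into per-processor key ranges ("barrels"), each barrel is sorted locally, and concatenation yields the sorted sequence. A Python illustration (the process count and key range are invented; multiprocessing stands in for the iPSC/860 nodes):

        # Barrel-sort sketch: range partitioning, then local sorts.
        import numpy as np
        from multiprocessing import Pool

        def sort_barrel(barrel):
            return np.sort(barrel)           # local sort on one "processor"

        if __name__ == "__main__":
            rng = np.random.default_rng(5)
            keys = rng.integers(0, 1_000_000, size=100_000)
            edges = np.linspace(0, 1_000_000, 5)   # boundaries of 4 barrels
            barrels = [keys[(keys >= lo) & (keys < hi)]
                       for lo, hi in zip(edges[:-1], edges[1:])]
            with Pool(4) as pool:
                out = np.concatenate(pool.map(sort_barrel, barrels))
            print(bool(np.all(out[:-1] <= out[1:])))  # True: globally sorted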

  9. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Wang Yongbo [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); The State Key Laboratory of Mechanical Transmission, Chongqing University (China); Pessi, Pekka [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); Wu Huapeng [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)], E-mail: huapeng@lut.fi; Handroos, Heikki [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)

    2009-06-15

    This paper presents a novel mobile parallel robot, which is able to carry welding and machining processes from inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematics design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by the stiffness and manufacture process have to be compensated or limited to a minimum value. In this paper kinematics errors and stiffness modeling are given. The simulation results are presented.

  11. Xyce Parallel Electronic Simulator Users' Guide Version 6.8

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Aadithya, Karthik Venkatraman [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-10-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms, including serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  12. Control rod drop transient analysis with the coupled parallel code pCTF-PARCSv2.7

    International Nuclear Information System (INIS)

    Ramos, Enrique; Roman, Jose E.; Abarca, Agustín; Miró, Rafael; Bermejo, Juan A.

    2016-01-01

    Highlights: • An MPI parallel version of the thermal–hydraulic subchannel code COBRA-TF has been developed. • The parallel code has been coupled to the 3D neutron diffusion code PARCSv2.7. • The new codes are validated with a control rod drop transient. - Abstract: In order to reduce the response time when simulating large reactors in detail, a parallel version of the thermal–hydraulic subchannel code COBRA-TF (CTF) has been developed using the standard Message Passing Interface (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian matrix is parallelized, in such a way that each processor is in charge of generating the data associated with a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit. With the goal of creating a powerful tool to simulate the reactor core behavior during asymmetrical transients, the 3D neutron diffusion code PARCSv2.7 (PARCS) has been coupled with the parallel version of CTF (pCTF) using the Parallel Virtual Machine (PVM) technology. In order to validate the correctness of the parallel coupled code, a control rod drop transient has been simulated, comparing the results with the experimental measurements acquired during a real NPP test.

  13. Hybrid parallel execution model for logic-based specification languages

    CERN Document Server

    Tsai, Jeffrey J P

    2001-01-01

    Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for parallel execution of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable large-scale and complex software systems. Contents: Current Approaches; Overview of the New Approach; FRORL Requirements Specification Language and Its Decomposition; Rewriting and Data Dependency, Control Flow Analysis of a Lo

  14. Program Transformation to Identify List-Based Parallel Skeletons

    Directory of Open Access Journals (Sweden)

    Venkatesh Kannan

    2016-07-01

    Full Text Available Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.
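
    For illustration, the two classic list-based skeletons that such transformations target, map and reduce, can be sketched in Python; this is a generic example, not the paper's transformation system:

```python
# Map skeleton: apply a worker function to every list element in parallel.
# Reduce skeleton: combine the results with an associative operator.
from functools import reduce
from multiprocessing import Pool
from operator import add

def square(x):          # the "worker" handed to the map skeleton
    return x * x

if __name__ == "__main__":
    xs = list(range(1_000))
    with Pool() as pool:
        mapped = pool.map(square, xs)   # data-parallel map
    total = reduce(add, mapped, 0)      # reduce with associative op
    print(total)                        # 332833500
```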

  15. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    International Nuclear Information System (INIS)

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated

  16. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In recent years, efforts have been made to delineate a stable and unified framework in which the problems of parallel logical processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with these efforts. This paper aims to make a small contribution to them. We propose an overview of parallel programming, parallel execution and collaborative systems.

  17. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop

  18. Time complexity analysis for distributed memory computers: implementation of parallel conjugate gradient method

    NARCIS (Netherlands)

    Hoekstra, A.G.; Sloot, P.M.A.; Haan, M.J.; Hertzberger, L.O.; van Leeuwen, J.

    1991-01-01

    New developments in Computer Science, both hardware and software, offer researchers, such as physicists, unprecedented possibilities to solve their computationally intensive problems. However, full exploitation of e.g. new massively parallel computers, parallel languages or runtime environments

  19. Vectorization, parallelization and implementation of Quantum molecular dynamics codes (QQQF, MONTEV)

    Energy Technology Data Exchange (ETDEWEB)

    Kato, Kaori [High Energy Accelerator Research Organization, Tsukuba, Ibaraki (Japan); Kunugi, Tomoaki; Kotake, Susumu; Shibahara, Masahiko

    1998-03-01

    This report describes the parallelization, vectorization and implementation of two simulation codes, the quantum molecular dynamics simulation code QQQF and the photon Monte Carlo molecular dynamics simulation code MONTEV, that have been developed for the analysis of the thermalization of photon energies in molecules or materials. QQQF has been vectorized and parallelized on the Fujitsu VPP, and has been ported from the VPP to the Intel Paragon XP/S and parallelized there. MONTEV has been ported from the VPP to the Paragon and parallelized. (author)

  20. Parallel computers and three-dimensional computational electromagnetics

    International Nuclear Information System (INIS)

    Madsen, N.K.

    1994-01-01

    The authors have continued to enhance their ability to use new massively parallel processing computers to solve time-domain electromagnetic problems. New vectorization techniques have improved the performance of their code DSI3D by factors of 5 to 15, depending on the computer used. New radiation boundary conditions and far-field transformations now allow the computation of radar cross-section values for complex objects. A new parallel-data extraction code has been developed that allows the extraction of data subsets from large problems, which have been run on parallel computers, for subsequent post-processing on workstations with enhanced graphics capabilities. A new charged-particle-pushing version of DSI3D is under development. Finally, DSI3D has become a focal point for several new Cooperative Research and Development Agreement activities with industrial companies such as Lockheed Advanced Development Company, Varian, Hughes Electron Dynamics Division, General Atomic, and Cray

  1. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  2. A Linguistic Technique for Marking and Analyzing Syntactic Parallelism.

    Science.gov (United States)

    Sackler, Jessie Brome

    Sentences in rhetoric texts were used in this study to determine a way in which rhetorical syntactic parallelism can be analyzed. A tagmemic analysis determined tagmas which were parallel, identical, or similar to one another. These were distinguished from tagmas which were identical because of the syntactic constraints of the language…

  3. Parallelization characteristics of the DeCART code

    International Nuclear Information System (INIS)

    Cho, J. Y.; Joo, H. G.; Kim, H. Y.; Lee, C. C.; Chang, M. H.; Zee, S. Q.

    2003-12-01

    domain using MPI. In the memory distribution capability, the memory requirement of about 11 GBytes for a simplified SMART core problem is reduced by a factor of about 11 when using 12 processors. Therefore it is concluded that the parallel capability of the DeCART code, with its accompanying memory distribution, makes it possible not only to solve problems efficiently via parallel computing but also to solve huge problems via memory distribution on affordable LINUX clusters; this parallel execution feature is an important element of DeCART, since it significantly increases the practical applicability of the code

  4. Compiling Scientific Programs for Scalable Parallel Systems

    National Research Council Canada - National Science Library

    Kennedy, Ken

    2001-01-01

    ...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...

  5. Resonance analysis in parallel voltage-controlled Distributed Generation inverters

    DEFF Research Database (Denmark)

    Wang, Xiongfei; Blaabjerg, Frede; Chen, Zhe

    2013-01-01

    Thanks to the fast responses of the inner voltage and current control loops, the dynamic behavior of parallel voltage-controlled Distributed Generation (DG) inverters depends not only on the stability of load sharing among them, but also on the interactions between the voltage control loops...

  6. A simple and efficient parallel FFT algorithm using the BSP model

    NARCIS (Netherlands)

    Bisseling, R.H.; Inda, M.A.

    2000-01-01

    In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case
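
    The parallelism exploited by radix FFT algorithms comes from the Cooley-Tukey split: an N-point DFT decomposes into two independent half-size DFTs combined with twiddle factors. A single-level NumPy illustration of that split (not the BSP algorithm itself):

```python
import numpy as np

def fft_one_split(x):
    n = len(x)                      # n must be even
    even = np.fft.fft(x[0::2])      # two independent half-size DFTs --
    odd = np.fft.fft(x[1::2])       # these are what parallel FFTs distribute
    tw = np.exp(-2j * np.pi * np.arange(n // 2) / n) * odd  # twiddle factors
    return np.concatenate([even + tw, even - tw])

x = np.random.rand(16)
assert np.allclose(fft_one_split(x), np.fft.fft(x))
```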

  7. A time-variant analysis of the 1/f^(2) phase noise in CMOS parallel LC-Tank quadrature oscillators

    DEFF Research Database (Denmark)

    Andreani, Pietro

    2006-01-01

    This paper presents a study of 1/f^2 phase noise in quadrature oscillators built by connecting two differential LC-tank oscillators in a parallel fashion. The analysis clearly demonstrates the necessity of adopting a time-variant theory of phase noise, where a more simplistic, time...

  8. Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

    Science.gov (United States)

    Farhat, Charbel

    1997-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
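
    For reference, the trade-off invoked here can be stated quantitatively; these are standard structural-dynamics results, not claims from this abstract. The undamped central-difference explicit scheme is stable only for

    \[ \Delta t \le \frac{2}{\omega_{\max}} = \frac{T_{\min}}{\pi}, \]

    where \(\omega_{\max}\) is the highest natural frequency of the discretized structure (set by the smallest element), whereas an unconditionally stable implicit scheme (e.g. the trapezoidal Newmark scheme, \(\beta = 1/4\), \(\gamma = 1/2\)) allows \(\Delta t\) to be chosen on accuracy grounds alone, at the cost of a linear solve per step.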

  9. Vectorization, parallelization and porting of nuclear codes (porting). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Nemoto, Toshiyuki; Kawai, Wataru; Ishizuki, Shigeru; Kawasaki, Nobuo; Kume, Etsuo; Adachi, Masaaki; Ogasawara, Shinobu

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system and the Paragon system at the Center for Promotion of Computational Science and Engineering in the Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the porting. In this porting part, the porting of the Monte Carlo N-Particle Transport code MCNP4B2 and the Reactor Safety Analysis code RELAP5 to the AP3000 is described. In the vectorization and parallelization on vector processors part, the vectorization of the General Tokamak Circuit Simulation Program code GTCSP, and the vectorization and parallelization of the Molecular Dynamics Ntv Simulation code MSP2, the Eddy Current Analysis code EDDYCAL, the Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and the MHD Equilibrium code SELENEJ on the VPP500, are described. In the parallelization on scalar processors part, the parallelization of the Monte Carlo N-Particle Transport code MCNP4B2, the Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and the Vectorized Monte Carlo code (continuous energy model/multi-group model) MVP/GMVP on the Paragon is described. (author)

  10. Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

    International Nuclear Information System (INIS)

    Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

    2013-01-01

    Parallel programming with a mixed mode of message-passing and shared-memory has several advantages when used in a Monte Carlo neutron transport code, such as fitting the hardware of distributed-shared clusters, economizing the memory demand of Monte Carlo transport, and improving parallel performance. MPI/OpenMP hybrid parallelism was implemented based on a one-dimensional Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed, and solutions were proposed for several problems such as contention access, lock contention and false sharing. After these optimizations the code was tested. It is shown that the hybrid parallel code can reach performance as good as a pure MPI parallel program, while saving a large amount of memory at the same time. Therefore hybrid parallelism is an efficient approach to achieving large-scale parallelization of Monte Carlo neutron transport. (authors)
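
    A toy sketch of the tally pattern alluded to above: each worker keeps a private accumulator (avoiding lock contention and false sharing on a shared tally) and the results are reduced once at the end. Python multiprocessing stands in for MPI+OpenMP here; this is not a neutron transport code.

```python
import random
from multiprocessing import Pool

def run_histories(args):
    seed, n = args
    rng = random.Random(seed)
    local_tally = 0                          # worker-private tally
    for _ in range(n):
        x, y = rng.random(), rng.random()
        local_tally += (x * x + y * y < 1.0) # toy "collision" estimator
    return local_tally

if __name__ == "__main__":
    n_workers, n_hist = 4, 250_000
    with Pool(n_workers) as pool:
        tallies = pool.map(run_histories, [(s, n_hist) for s in range(n_workers)])
    pi_est = 4.0 * sum(tallies) / (n_workers * n_hist)  # single final reduction
    print(pi_est)
```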

  11. Parallel, Rapid Diffuse Optical Tomography of Breast

    National Research Council Canada - National Science Library

    Yodh, Arjun

    2001-01-01

    During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...

  12. Parallel, Rapid Diffuse Optical Tomography of Breast

    National Research Council Canada - National Science Library

    Yodh, Arjun

    2002-01-01

    During the last year we have experimentally and computationally investigated rapid acquisition and analysis of informationally dense diffuse optical data sets in the parallel plate compressed breast geometry...

  13. Factor analysis

    CERN Document Server

    Gorsuch, Richard L

    2013-01-01

    Comprehensive and comprehensible, this classic covers the basic and advanced topics essential for using factor analysis as a scientific tool in psychology, education, sociology, and related areas. Emphasizing the usefulness of the techniques, it presents sufficient mathematical background for understanding and sufficient discussion of applications for effective use. This includes not only theory but also the empirical evaluations of the importance of mathematical distinctions for applied scientific analysis.

  14. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  15. Parallel finite elements with domain decomposition and its pre-processing

    International Nuclear Information System (INIS)

    Yoshida, A.; Yagawa, G.; Hamada, S.

    1993-01-01

    This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. On the other hand, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. In order to achieve high efficiency in such parallel computing environments, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed a domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)

  16. Parallel magnetic resonance imaging as approximation in a reproducing kernel Hilbert space

    International Nuclear Information System (INIS)

    Athalye, Vivek; Lustig, Michael; Martin Uecker

    2015-01-01

    In magnetic resonance imaging, data samples are collected in the spatial frequency domain (k-space), typically by time-consuming line-by-line scanning on a Cartesian grid. Scans can be accelerated by simultaneous acquisition of data using multiple receivers (parallel imaging), and by using more efficient non-Cartesian sampling schemes. To understand and design k-space sampling patterns, a theoretical framework is needed to analyze how well arbitrary sampling patterns reconstruct unsampled k-space using receive coil information. As shown here, reconstruction from samples at arbitrary locations can be understood as approximation of vector-valued functions from the acquired samples and formulated using a reproducing kernel Hilbert space with a matrix-valued kernel defined by the spatial sensitivities of the receive coils. This establishes a formal connection between approximation theory and parallel imaging. Theoretical tools from approximation theory can then be used to understand reconstruction in k-space and to extend the analysis of the effects of sample selection beyond the traditional image-domain g-factor noise analysis to both noise amplification and approximation errors in k-space. This is demonstrated with numerical examples. (paper)
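
    As a sketch of the construction (notation assumed here, and possibly differing from the paper's in normalization): with coil sensitivities \(c_j(\mathbf{r})\) and image \(m(\mathbf{r})\), each coil signal is a Fourier integral, and the vector of coil signals lies in a reproducing kernel Hilbert space whose matrix-valued kernel is built from the sensitivities:

    \[ f_j(\mathbf{k}) = \int c_j(\mathbf{r})\, m(\mathbf{r})\, e^{-2\pi i \mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}, \qquad K_{jl}(\mathbf{k},\mathbf{k}') = \int c_j(\mathbf{r})\, \overline{c_l(\mathbf{r})}\, e^{-2\pi i (\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}}\, d\mathbf{r}. \]

    Reconstruction of unsampled k-space then amounts to kernel interpolation from the acquired samples.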

  17. Xyce™ Parallel Electronic Simulator Users' Guide, Version 6.5.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Aadithya, Karthik V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Mei, Ting [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Russo, Thomas V. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Schiek, Richard L. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Sholander, Peter E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Thornquist, Heidi K. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation; Verley, Jason C. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Electrical Models and Simulation

    2016-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The information herein is subject to change without notice. Copyright © 2002-2016 Sandia Corporation. All rights reserved.

  18. Exploratory Bi-factor Analysis: The Oblique Case

    OpenAIRE

    Jennrich, Robert L.; Bentler, Peter M.

    2011-01-01

    Bi-factor analysis is a form of confirmatory factor analysis originally introduced by Holzinger and Swineford (1937). The bi-factor model has a general factor, a number of group factors, and an explicit bi-factor structure. Jennrich and Bentler (2011) introduced an exploratory form of bi-factor analysis that does not require one to provide an explicit bi-factor structure a priori. They use exploratory factor analysis and a bi-factor rotation criterion designed to produce a rotated loading mat...

  19. Parallel GPU implementation of iterative PCA algorithms.

    Science.gov (United States)

    Andrecut, M

    2009-11-01

    Principal component analysis (PCA) is a key statistical technique for multivariate data analysis. For large data sets, the common approach to PCA computation is based on the standard NIPALS-PCA algorithm, which unfortunately suffers from loss of orthogonality, and therefore its applicability is usually limited to the estimation of the first few components. Here we present an algorithm based on Gram-Schmidt orthogonalization (called GS-PCA), which eliminates this shortcoming of NIPALS-PCA. Also, we discuss the GPU (Graphics Processing Unit) parallel implementation of both NIPALS-PCA and GS-PCA algorithms. The numerical results show that the GPU parallel optimized versions, based on CUBLAS (NVIDIA), are substantially faster (up to 12 times) than the CPU optimized versions based on CBLAS (GNU Scientific Library).
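
    A minimal NumPy sketch of the GS-PCA idea described above: NIPALS power iterations with explicit Gram-Schmidt re-orthogonalization of each new score/loading against the previously extracted ones. This is an illustrative CPU version, not the paper's CUBLAS implementation.

```python
import numpy as np

def gs_pca(X, n_comp, tol=1e-12, max_iter=1000):
    """NIPALS power iterations with Gram-Schmidt re-orthogonalization."""
    R = X - X.mean(axis=0)                     # centered residual matrix
    n, m = R.shape
    T = np.zeros((n, n_comp))                  # orthonormal scores
    P = np.zeros((m, n_comp))                  # orthonormal loadings
    s = np.zeros(n_comp)                       # singular values
    for k in range(n_comp):
        t = R[:, 0] / np.linalg.norm(R[:, 0])  # crude starting vector
        for _ in range(max_iter):
            p = R.T @ t
            p -= P[:, :k] @ (P[:, :k].T @ p)   # Gram-Schmidt on loadings
            p /= np.linalg.norm(p)
            t_new = R @ p
            t_new -= T[:, :k] @ (T[:, :k].T @ t_new)  # ... and on scores
            s[k] = np.linalg.norm(t_new)
            t_new /= s[k]
            converged = np.linalg.norm(t_new - t) < tol
            t = t_new
            if converged:
                break
        T[:, k], P[:, k] = t, p
        R = R - s[k] * np.outer(t, p)          # deflate extracted component
    return T, P, s

X = np.random.rand(200, 50)
T, P, s = gs_pca(X, 5)
assert np.allclose(T.T @ T, np.eye(5), atol=1e-8)  # orthogonality restored
# compare against the exact singular values for reference
print(s[:3], np.linalg.svd(X - X.mean(0), compute_uv=False)[:3])
```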

  20. Convergence analysis of a class of massively parallel direction splitting algorithms for the Navier-Stokes equations in simple domains

    KAUST Repository

    Guermond, Jean-Luc; Minev, Peter D.; Salgado, Abner J.

    2012-01-01

    We provide a convergence analysis for a new fractional timestepping technique for the incompressible Navier-Stokes equations based on direction splitting. This new technique is of linear complexity, unconditionally stable and convergent, and suitable for massive parallelization. © 2012 American Mathematical Society.
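
    The generic form of the direction-splitting idea (a textbook sketch; the authors' scheme differs in its precise factorization) replaces one multidimensional implicit solve per time step with a product of one-dimensional factors, e.g.

    \[ \left(I - \Delta t\, \partial_{xx}\right)\left(I - \Delta t\, \partial_{yy}\right)\left(I - \Delta t\, \partial_{zz}\right) u^{n+1} = f^{n}, \]

    where each factor requires only tridiagonal solves along one coordinate direction; the solves on different grid lines are independent, which is what makes the approach suitable for massive parallelization.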

  1. The effect of earthquake on architecture geometry with non-parallel system irregularity configuration

    Science.gov (United States)

    Teddy, Livian; Hardiman, Gagoek; Nuroji; Tudjono, Sri

    2017-12-01

    Indonesia is an area prone to earthquakes, which may cause casualties and damage to buildings. Fatalities and injuries are largely caused not by the earthquake itself, but by building collapse. The collapse of a building results from its behaviour under the earthquake, and it depends on many factors, such as architectural design, the geometry configuration of structural elements in horizontal and vertical plans, earthquake zone, geographical location (distance to the earthquake center), soil type, material quality, and construction quality. One of the geometry configurations that may lead to the collapse of a building is the irregular configuration of a non-parallel system. In accordance with FEMA-451B, an irregular configuration in a non-parallel system is defined to exist if the vertical lateral force-retaining elements are neither parallel nor symmetric with the main orthogonal axes of the earthquake-retaining axis system. Such a configuration may lead to torque, diagonal translation and local damage to buildings. This does not mean that a non-parallel irregular configuration must never appear in architectural design; however, the designer must know the consequences of earthquake behaviour for buildings with an irregular configuration of a non-parallel system. The objective of the present research is to identify earthquake behaviour in architectural geometry with an irregular configuration of a non-parallel system. The research was quantitative, using a simulation-based experimental method. It considered 5 models, for which architectural and structural data were input and analyzed using the software SAP2000 to find out their performance, and ETAB2015 to determine the eccentricity that occurred. The output of the software analysis was tabulated, graphed, compared and analyzed against relevant theories. In strong earthquake zones, designers should avoid buildings that wholly form an irregular configuration of a non-parallel system. If it is inevitable to design a

  2. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
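
    The serial skeleton of the level-synchronous strategy described above, for reference: the frontier is expanded one level at a time, and it is this per-level loop that both distributed variants in the paper parallelize.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: returns {vertex: level} for reachable vertices."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:            # in the 1D variant, the frontier is
            for v in adj[u]:          # partitioned across processors here
                if v not in level:
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier      # implicit barrier between levels
    return level

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))             # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```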

  3. Comparing calibration methods of electron beams using plane-parallel chambers with absorbed-dose to water based protocols

    International Nuclear Information System (INIS)

    Stewart, K.J.; Seuntjens, J.P.

    2002-01-01

    Recent absorbed-dose-based protocols allow for two methods of calibrating electron beams using plane-parallel chambers, one using the 60Co absorbed-dose-to-water calibration factor N_D,w of the plane-parallel chamber, and the other relying on cross-calibration of the plane-parallel chamber in a high-energy electron beam against a cylindrical chamber which has a 60Co N_D,w factor. The second method is recommended, as it avoids problems associated with the P_wall correction factors at 60Co for plane-parallel chambers, which are used in the determination of the beam quality conversion factors. In this article we investigate the consistency of these two methods for the PTW Roos, Scanditronix NACP02, and PTW Markus chambers. We processed our data using both the AAPM TG-51 and the IAEA TRS-398 protocols. Wall correction factors in 60Co beams and absorbed-dose beam quality conversion factors for 20 MeV electrons were derived for these chambers by cross-calibration against a cylindrical ionization chamber. Systematic differences of up to 1.6% were found between our values of P_wall and those from the Monte Carlo calculations underlying AAPM TG-51, and up to 0.6% when comparing with the IAEA TRS-398 protocol. The differences in P_wall translate directly into differences in the beam quality conversion factors in the respective protocols. The relatively large spread in the experimental data for P_wall, and consequently in the absorbed-dose beam quality conversion factor, confirms the importance of the cross-calibration technique when using plane-parallel chambers for calibrating clinical electron beams. We confirmed that for well-guarded plane-parallel chambers, the fluence perturbation correction factor at d_max is not significantly different from the value at d_ref. For the PTW Markus chamber the variation in the latter factor is consistent with published fits relating it to the average energy at depth.

  4. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  5. Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

    2017-01-01

    The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed

  6. A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

    Directory of Open Access Journals (Sweden)

    Dawen Xia

    2018-01-01

    Full Text Available Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges are how to overcome the inherent defects of Hadoop to cope with taxi trajectory big data, including massive small files, and how to discover the implicit spatiotemporal frequent patterns with MapReduce. To conquer these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm to analyze the spatiotemporal characteristics of taxi operations using large-scale taxi trajectories, with massive small file processing strategies on a Hadoop platform. More specifically, we first implement three methods, that is, Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and then implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operations in both the spatial and temporal dimensions by MR-PFP in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
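
    A much-simplified stand-in for the MR-PFP pipeline, to show the map -> shuffle -> reduce shape of such computations: it counts candidate item pairs rather than running FP-growth, and uses Python multiprocessing in place of Hadoop.

```python
from collections import Counter
from itertools import combinations
from multiprocessing import Pool

def map_pairs(transaction):
    # map phase: emit every item pair in one "trajectory record"
    return Counter(combinations(sorted(set(transaction)), 2))

if __name__ == "__main__":
    transactions = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c"]]
    with Pool(2) as pool:
        partials = pool.map(map_pairs, transactions)
    counts = sum(partials, Counter())          # reduce phase
    min_support = 3
    print({p: n for p, n in counts.items() if n >= min_support})
    # {('a', 'b'): 3, ('b', 'c'): 3}
```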

  7. Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

    Science.gov (United States)

    Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-06-01

    An edge switch is an operation on a graph (or network) where two edges are selected randomly and one end vertex of each is swapped with that of the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
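
    A serial sketch of the single edge-switch step defined above: pick two edges at random, swap an endpoint of each, and reject the move if it would create a self-loop or a parallel edge (keeping the graph simple). This is the operation the paper parallelizes, not its parallel algorithm.

```python
import random

def canon(a, b):
    return (a, b) if a < b else (b, a)

def edge_switch(edges, edge_set, rng=random):
    """One switch attempt; returns True if the move was accepted."""
    (u, v), (x, y) = rng.sample(edges, 2)
    if u == y or x == v:
        return False                           # would create a self-loop
    new1, new2 = canon(u, y), canon(x, v)      # proposed replacement edges
    if new1 in edge_set or new2 in edge_set:
        return False                           # would create a parallel edge
    edges.remove((u, v)); edges.remove((x, y))
    edge_set.difference_update([(u, v), (x, y)])
    edges.extend([new1, new2])
    edge_set.update([new1, new2])
    return True

# a 5-cycle; undirected edges stored as canonical (min, max) tuples
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (1, 5)]
edge_set = set(edges)
rng = random.Random(1)
accepted = sum(edge_switch(edges, edge_set, rng) for _ in range(200))
print(accepted, sorted(edges))   # the degree sequence is preserved throughout
```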

  8. Models of parallel computation :a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper, the state-of-the-art parallel computational model research is reviewed. We introduce various models that were developed during the past decades. According to their target architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on this three-generation classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  9. Parallel peak pruning for scalable SMP contour tree computation

    Energy Technology Data Exchange (ETDEWEB)

    Carr, Hamish A. [Univ. of Leeds (United Kingdom); Weber, Gunther H. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Univ. of California, Davis, CA (United States); Sewell, Christopher M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Ahrens, James P. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-03-09

    As data sets grow to exascale, automated data analysis and visualisation are increasingly important, both to aid human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high performance computing systems necessitate analysis algorithms that make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. In this paper, we report the first shared-memory SMP algorithm for fully parallel contour tree computation, with formal guarantees of O(lg n lg t) parallel steps and O(n lg n) work, and implementations with up to 10x parallel speedup in OpenMP and up to 50x speedup using NVIDIA Thrust.

  10. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  11. Parallel adaptation of a vectorised quantumchemical program system

    International Nuclear Information System (INIS)

    Van Corler, L.C.H.; Van Lenthe, J.H.

    1987-01-01

    Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clock-cycle times more than a factor of 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers, one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI, in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallelism on the host computer (IBM 4381) and real parallelism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of that many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallelize the programs are shown. Also, some examples are presented to illustrate the effectiveness and performance of the system in Rome for these types of calculations

  12. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  13. Factors affecting construction performance: exploratory factor analysis

    Science.gov (United States)

    Soewin, E.; Chinda, T.

    2018-04-01

    The present work attempts to develop a multidimensional performance evaluation framework for a construction company by considering all relevant measures of performance. Based on previous studies, this study hypothesizes nine key factors, with a total of 57 associated items. The hypothesized factors, with their associated items, are then used to develop a questionnaire survey to gather data. Exploratory factor analysis (EFA) applied to the collected data gave rise to ten factors, with 57 items, affecting construction performance. The findings further reveal the items constituting ten key performance factors (KPIs), namely: 1) Time, 2) Cost, 3) Quality, 4) Safety & Health, 5) Internal Stakeholder, 6) External Stakeholder, 7) Client Satisfaction, 8) Financial Performance, 9) Environment, and 10) Information, Technology & Innovation. The analysis helps to develop a multidimensional performance evaluation framework for effective measurement of construction performance. The ten key performance factors can be broadly categorized into economic, social, environmental, and technology aspects. It is important to base a multidimensional performance evaluation framework on all key factors affecting the construction performance of a company, so that management can plan and implement an effective performance development plan that matches the mission and vision of the company.
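
    A hedged sketch of this kind of analysis pipeline on synthetic stand-in data (not the authors' survey); scikit-learn's FactorAnalysis fits a maximum-likelihood factor model, which differs from the rotation-based EFA typically used in such studies.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents, n_items, n_factors = 300, 57, 10
# synthetic questionnaire-like data: latent factors times loadings plus noise
latent = rng.normal(size=(n_respondents, n_factors))
loadings = rng.normal(size=(n_factors, n_items))
X = latent @ loadings + rng.normal(scale=0.5, size=(n_respondents, n_items))

fa = FactorAnalysis(n_components=n_factors, random_state=0)
fa.fit(X)
# inspect which items load most heavily on each extracted factor
for k in range(3):
    top = np.argsort(-np.abs(fa.components_[k]))[:5]
    print(f"factor {k}: items {top.tolist()}")
```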

  14. Parallel translation in warped product spaces: application to the Reissner-Nordstroem spacetime

    International Nuclear Information System (INIS)

    Raposo, A P; Del Riego, L

    2005-01-01

    A formal treatment of the parallel translation transformations in warped product manifolds is presented and related to those parallel translation transformations in each of the factor manifolds. A straightforward application to the Schwarzschild and Reissner-Nordstroem geometries, considered here as particular examples, explains some apparently surprising properties of the holonomy in these manifolds

  15. Xyce parallel electronic simulator users guide, version 6.0.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2013-08-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  16. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  17. Xyce parallel electronic simulator users guide, version 6.1

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas; Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers; A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message-passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  18. A factor analysis to detect factors influencing building national brand

    Directory of Open Access Journals (Sweden)

    Naser Azad

    Full Text Available Developing a national brand is one of the most important issues in the development of a brand. In this study, we present a factor analysis to detect the most important factors in building a national brand. The proposed study uses factor analysis to extract the most influential factors, and the sample has been drawn from two major auto makers in Iran called Iran Khodro and Saipa. The questionnaire was designed on a Likert scale and distributed among 235 experts. Cronbach's alpha is calculated as 0.84, which is well above the minimum desirable limit of 0.70. The implementation of factor analysis provides six factors including “cultural image of customers”, “exciting characteristics”, “competitive pricing strategies”, “perception image” and “previous perceptions”.

  19. Layout design and energetic analysis of a complex diesel parallel hybrid electric vehicle

    International Nuclear Information System (INIS)

    Finesso, Roberto; Spessa, Ezio; Venditti, Mattia

    2014-01-01

    Highlights: • Layout design, energetic and cost analysis of complex parallel hybrid vehicles. • Development of global and real-time optimizers for control strategy identification. • Rule-based control strategies to minimize fuel consumption and NO x . • Energy share across each working mode for battery and thermal engine. - Abstract: The present paper is focused on the design, optimization and analysis of a complex parallel hybrid electric vehicle, equipped with two electric machines on both the front and rear axles, and on the evaluation of its potential to reduce fuel consumption and NO x emissions over several driving missions. The vehicle has been compared with two conventional parallel hybrid vehicles, equipped with a single electric machine on the front axle or on the rear axle, as well as with a conventional vehicle. All the vehicles have been equipped with compression ignition engines. The optimal layout of each vehicle was identified on the basis of the minimization of the overall powertrain costs during the whole vehicle life. These costs include the initial investment due to the production of the components as well as the operating costs related to fuel consumption and to battery depletion. Identification of the optimal powertrain control strategy, in terms of the management of the power flows of the engine and electric machines, and of gear selection, is necessary in order to be able to fully exploit the potential of the hybrid architecture. To this end, two global optimizers, one of a deterministic nature and another of a stochastic type, and two real-time optimizers have been developed, applied and compared. A new mathematical technique has been developed and applied to the vehicle simulation model in order to decrease the computational time of the optimizers. First, the vehicle model equations were written in order to allow a coarse time grid to be used, then, the control variables (i.e., power flow and gear number) were discretized, and the

  20. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and calculation speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and its efficiency measured on the highly parallel distributed-memory computer AP1000. The code was analyzed statically and dynamically, and a suitable parallelization algorithm was determined for the shielding analysis functions of MCNP 4. This includes a strategy where a new history is assigned dynamically to an idling processor element during execution. Furthermore, to avoid congestion in the communication processing, the batch concept, in which multiple histories are processed as a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories on the AP1000 with 512 processor elements, a parallelization efficiency of 82% is achieved, and the calculation speed is estimated to be around 50 times that of a FACOM M-780. (author)
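
    A toy sketch of the scheduling idea described above: histories are grouped into batches, and an idle worker dynamically pulls the next batch, so no processor waits on a static partition. Python multiprocessing stands in for the AP1000 implementation; the "history" here is a placeholder, not transport physics.

```python
import random
from multiprocessing import Pool

def run_batch(args):
    seed, batch_size = args
    rng = random.Random(seed)
    # placeholder "history": score 1 if a particle penetrates the shield
    return sum(rng.random() < 0.05 for _ in range(batch_size))

if __name__ == "__main__":
    n_histories, batch_size = 2_000_000, 10_000
    batches = [(i, batch_size) for i in range(n_histories // batch_size)]
    with Pool() as pool:
        # imap_unordered hands the next batch to whichever worker goes idle,
        # mirroring the dynamic history assignment in the abstract
        leakage = sum(pool.imap_unordered(run_batch, batches))
    print(leakage / n_histories)
```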

  1. AZTEC: A parallel iterative package for solving linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S. [Sandia National Labs., Albuquerque, NM (United States)

    1996-12-31

    We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
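
    To illustrate what such a package provides, here is a serial SciPy sketch of GMRES with a Jacobi (diagonal) preconditioner, conceptually the simplest of the preconditioners listed above; AZTEC itself is a parallel C library, and this is not its API.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
# 1D Poisson matrix: a standard sparse test problem
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

M = sp.diags(1.0 / A.diagonal())          # Jacobi preconditioner
x, info = spla.gmres(A, b, M=M, restart=30)
print(info, np.linalg.norm(A @ x - b))    # info == 0 means converged
```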

  2. 3-Way characterization of soils by Procrustes rotation, matrix-augmented principal components analysis and parallel factor analysis

    Czech Academy of Sciences Publication Activity Database

    Andrade, J.M.; Kubista, Mikael; Carlosena, A.; Prada, D.

    2007-01-01

    Roč. 603, č. 1 (2007), s. 20-29 ISSN 0003-2670 Institutional research plan: CEZ:AV0Z50520514 Keywords : PCA * heavy metals * soil Subject RIV: EB - Genetics ; Molecular Biology Impact factor: 3.186, year: 2007

  3. Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

    Science.gov (United States)

    Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

    2010-01-01

    Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…

  4. Massive hybrid parallelism for fully implicit multiphysics

    International Nuclear Information System (INIS)

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-01-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task that consumes substantial development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided. (authors)

  7. A comparative critical analysis of modern task-parallel runtimes.

    Energy Technology Data Exchange (ETDEWEB)

    Wheeler, Kyle Bruce; Stark, Dylan; Murphy, Richard C.

    2012-12-01

    The rise in node-level parallelism has increased interest in task-based parallel runtimes for a wide array of application areas. Applications have a wide variety of task spawning patterns which frequently change during the course of application execution, based on the algorithm or solver kernel in use. Task scheduling and load balancing regimes, however, are often highly optimized for specific patterns. This paper uses four basic task spawning patterns to quantify the impact of specific scheduling policy decisions on execution time. We compare the behavior of six publicly available tasking runtimes: Intel Cilk, Intel Threading Building Blocks (TBB), Intel OpenMP, GCC OpenMP, Qthreads, and High Performance ParalleX (HPX). With the exception of Qthreads, the runtimes prove to have schedulers that are highly sensitive to application structure. No runtime is able to provide the best performance in all cases, and those that provide the best performance in some cases unfortunately deliver extremely poor performance when the application structure does not match the scheduler's assumptions.

  8. On the mathematic simulation of the energy efficiency for heat exchangers with the systems of impingement plane-parallel jets

    Directory of Open Access Journals (Sweden)

    Haritonova Larisa

    2017-01-01

    The article gives an analytical generalization of data on the energy efficiency of heat exchangers with flat heat-exchange surfaces onto which systems of impinging plane-parallel jets are directed. Functional relations for the specific power consumption (per unit area) required to move the heat carrier, obtained for the first time using similarity-law techniques, are presented with regard to design and operating factors. The regression equations, which represent a mathematical model of the process, enable an analysis of the impact of various factors on the parameter to be determined. The obtained results can be used to optimize, or to create calculation techniques for, new highly efficient heat exchange devices with plane-parallel jet impingement systems, and also to reduce the power consumption for moving the heat carrier.

  9. Where are the parallel algorithms?

    Science.gov (United States)

    Voigt, R. G.

    1985-01-01

    Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.

  10. Adaptive-Multilevel BDDC and its parallel implementation

    Czech Academy of Sciences Publication Activity Database

    Sousedík, Bedřich; Šístek, Jakub; Mandel, J.

    2013-01-01

    Roč. 95, č. 12 (2013), s. 1087-1119 ISSN 0010-485X R&D Projects: GA ČR GA106/08/0403 Institutional support: RVO:61388998 ; RVO:67985840 Keywords : BDDC * parallel algorithms * domain decomposition Subject RIV: JC - Computer Hardware ; Software; BA - General Mathematics (MU-W) Impact factor: 1.055, year: 2013

  11. Parallel Relational Universes – experiments in modularity

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2015-01-01

    We here describe Parallel Relational Universes, an artistic method used for the psychological analysis of group dynamics. The design of the artistic system, which mediates group dynamics, emerges from our studies of modular playware and remixing playware. Inspired by remixing modular playware, where users remix samples in the form of physical and functional modules, we created an artistic instantiation of this concept with the Parallel Relational Universes, allowing arts alumni to remix artistic expressions. Here, we report the data that emerged from a first pre-test, run with gymnasium alumni. We then report both the artistic and the psychological findings, and discuss possible variations of such an instrument. Between an art piece and a psychological test, at a first cognitive analysis, it seems to be a promising research tool.

  12. Large-Scale Parallel Finite Element Analysis of the Stress Singular Problems

    International Nuclear Information System (INIS)

    Noriyuki Kushida; Hiroshi Okuda; Genki Yagawa

    2002-01-01

    In this paper, the convergence behavior of the large-scale parallel finite element method for stress singular problems was investigated. The convergence behavior of iterative solvers depends on the efficiency of the preconditioners. However, the efficiency of preconditioners may be influenced by the domain decomposition that is necessary for parallel FEM. In this study the following results were obtained: the conjugate gradient method without preconditioning and the diagonal-scaling preconditioned conjugate gradient method were not influenced by the domain decomposition, as expected; the symmetric successive over-relaxation preconditioned conjugate gradient method converged up to 6% faster when the stress singular area was contained in one sub-domain. (authors)
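
    A toy serial illustration of two of the preconditioning variants compared above, plain CG and diagonally scaled (Jacobi) CG, using SciPy; the test matrix is a stand-in and SSOR is omitted.

```python
# Sketch: CG without preconditioning vs. diagonal-scaling (Jacobi) CG.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
main = 2.0 + np.linspace(0.0, 9.0, n)   # varying diagonal, so scaling matters
A = sp.diags([-1.0, main, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
d = A.diagonal()

iters = {"plain": 0, "jacobi": 0}
def counter(key):
    def cb(xk):
        iters[key] += 1
    return cb

spla.cg(A, b, callback=counter("plain"))
M = spla.LinearOperator((n, n), lambda v: v / d)   # Jacobi preconditioner
spla.cg(A, b, M=M, callback=counter("jacobi"))
print(iters)   # iteration counts for each variant
```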

  13. Parallel Integer Factorization Using Quadratic Forms

    National Research Council Canada - National Science Library

    McMath, Stephen S

    2005-01-01

    … In 1975, Daniel Shanks used class group infrastructure to modify the Morrison-Brillhart algorithm and develop Square Forms Factorization, but he never published his work on this algorithm or provided…

  14. Electromagnetic ion-cyclotron instability in the presence of a parallel electric field with general loss-cone distribution function - particle aspect analysis

    Directory of Open Access Journals (Sweden)

    G. Ahirwar

    2006-08-01

    The effect of a parallel electric field on the growth rate, the parallel and perpendicular resonant energy, and the marginal stability of the electromagnetic ion-cyclotron (EMIC) wave with a general loss-cone distribution function in a low-β homogeneous plasma is investigated by the particle-aspect approach. The effect of the steepness of the loss-cone distribution on the electromagnetic ion-cyclotron wave is also investigated. The whole plasma is considered to consist of resonant and non-resonant particles. It is assumed that resonant particles participate in the energy exchange with the wave, whereas non-resonant particles support the oscillatory motion of the wave. The wave is assumed to propagate parallel to the static magnetic field. The effect of the parallel electric field with the general distribution function is to control the growth rate of the EMIC waves, whereas the effect of the steep loss-cone distribution is to enhance the growth rate and the perpendicular heating of the ions. This study is relevant to the analysis of ion conics in the presence of an EMIC wave in the auroral acceleration region of the Earth's magnetoplasma.

  15. Parallel multiple instance learning for extremely large histopathology image analysis.

    Science.gov (United States)

    Xu, Yan; Li, Yeshu; Shen, Zhengyang; Wu, Ziwei; Gao, Teng; Fan, Yubo; Lai, Maode; Chang, Eric I-Chao

    2017-08-03

    Histopathology images are critical for medical diagnosis, e.g., of cancer and its treatment. A standard histopathology slice can easily be scanned at a high resolution of, say, 200,000 × 200,000 pixels. Such high-resolution images can make most existing image processing tools infeasible or less effective when operated on a single machine with limited memory, disk space and computing power. In this paper, we propose an algorithm tackling this emerging "big data" problem utilizing parallel computing on High-Performance Computing (HPC) clusters. Experimental results on a large-scale data set (1318 images at a scale of 10 billion pixels each) demonstrate the efficiency and effectiveness of the proposed algorithm for low-latency real-time applications. The proposed framework provides an effective and efficient system for extremely large histopathology image analysis. It is based on the multiple instance learning formulation for weakly supervised learning for image classification, segmentation and clustering. When a max-margin concept is adopted for different clusters, we obtain a further improvement in clustering performance.

  16. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3)-time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  17. Factor analysis of multivariate data

    Digital Repository Service at National Institute of Oceanography (India)

    Fernandes, A.A.; Mahadevan, R.

    A brief introduction to factor analysis is presented. A FORTRAN program, which can perform Q-mode and R-mode factor analysis and the singular value decomposition of a given data matrix, is presented in Appendix B. This computer program uses…

  18. Configuration affects parallel stent grafting results.

    Science.gov (United States)

    Tanious, Adam; Wooster, Mathew; Armstrong, Paul A; Zwiebel, Bruce; Grundy, Shane; Back, Martin R; Shames, Murray L

    2018-05-01

    A number of adjunctive "off-the-shelf" procedures have been described to treat complex aortic diseases. Our goal was to evaluate parallel stent graft configurations and to determine an optimal formula for these procedures. This is a retrospective review of all patients at a single medical center treated with parallel stent grafts from January 2010 to September 2015. Outcomes were evaluated on the basis of parallel graft orientation, type, and main body device. Primary end points included parallel stent graft compromise and overall endovascular aneurysm repair (EVAR) compromise. There were 78 patients treated with a total of 144 parallel stents for a variety of pathologic processes. There was a significant correlation between main body oversizing and snorkel compromise (P = .0195) and overall procedural complication (P = .0019) but not with endoleak rates. Patients were organized into the following oversizing groups for further analysis: 0% to 10%, 10% to 20%, and >20%. Those oversized into the 0% to 10% group had the highest rate of overall EVAR complication (73%; P = .0003). There were no significant correlations between any one particular configuration and overall procedural complication. There was also no significant correlation between total number of parallel stents employed and overall complication. Composite EVAR configuration had no significant correlation with individual snorkel compromise, endoleak, or overall EVAR or procedural complication. The configuration most prone to individual snorkel compromise and overall EVAR complication was a four-stent configuration with two stents in an antegrade position and two stents in a retrograde position (60% complication rate). The configuration most prone to endoleak was one or two stents in retrograde position (33% endoleak rate), followed by three stents in an all-antegrade position (25%). There was a significant correlation between individual stent configuration and stent compromise (P = .0385), with 31

  19. Factor analysis and scintigraphy

    International Nuclear Information System (INIS)

    Di Paola, R.; Penel, C.; Bazin, J.P.; Berche, C.

    1976-01-01

    The goal of factor analysis is usually to achieve a reduction of a large set of data, extracting essential features without a previous hypothesis. Due to the development of computerized systems, the use of larger samples, the possibility of sequential data acquisition and the increase of dynamic studies, the problem of data compression can now be encountered in routine work. Thus, results obtained for the compression of scintigraphic images are first presented. Then the possibilities offered by factor analysis for scan processing are discussed. Finally, the use of this analysis for multidimensional studies, and especially dynamic studies, is considered for compression and processing. [fr]

  20. Using a Linux Cluster for Parallel Simulations of an Active Magnetic Regenerator Refrigerator

    DEFF Research Database (Denmark)

    Petersen, T.F.; Pryds, N.; Smith, A.

    2006-01-01

    This paper describes the implementation of a Comsol Multiphysics model on a Linux computer cluster. The magnetic refrigerator (MR) is a special type of refrigerator with the potential to reduce the energy consumption of household refrigeration by a factor of two or more. To conduct numerical analysis… The coupled set of equations and the transient convergence towards the final steady state mean that the model has an excessive solution time. To make parametric studies practical, the developed model was implemented on a cluster to allow parallel simulations, which has decreased the solution time…

  1. High spatial resolution CT image reconstruction using parallel computing

    International Nuclear Information System (INIS)

    Yin Yin; Liu Li; Sun Gongxing

    2003-01-01

    Using a PC cluster system with 16 dual-CPU nodes, we accelerate the FBP and OR-OSEM reconstruction of high spatial resolution images (2048 × 2048). Based on the number of projections, we rewrite the reconstruction algorithms in parallel form and dispatch the tasks to each CPU. With parallel computing, the speedup factor is roughly equal to the number of CPUs, reaching about 25 times when 25 CPUs are used. This technique is very suitable for real-time high spatial resolution CT image reconstruction. (authors)

  2. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  3. Vectorization, parallelization and porting of nuclear codes. 2001

    International Nuclear Information System (INIS)

    Akiyama, Mitsunaga; Katakura, Fumishige; Kume, Etsuo; Nemoto, Toshiyuki; Tsuruoka, Takuya; Adachi, Masaaki

    2003-07-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the super computer system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 10 codes in fiscal 2001. In this report, the parallelization of Neutron Radiography for 3 Dimensional CT code NR3DCT, the vectorization of unsteady-state heat conduction code THERMO3D, the porting of initial program of MHD simulation, the tuning of Heat And Mass Balance Analysis Code HAMBAC, the porting and parallelization of Monte Carlo N-Particle transport code MCNP4C3, the porting and parallelization of Monte Carlo N-Particle transport code system MCNPX2.1.5, the porting of induced activity calculation code CINAC-V4, the use of VisLink library in multidimensional two-fluid model code ACD3D and the porting of experiment data processing code from GS8500 to SR8000 are described. (author)

  4. Unified dataflow model for the analysis of data and pipeline parallelism, and buffer sizing

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.; Geuns, S.J.; Wiggers, M.H.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Real-time stream processing applications such as software defined radios are usually executed concurrently on multiprocessor systems. Exploiting coarse-grained data parallelism by duplicating tasks is often required, besides pipeline parallelism, to meet the temporal constraints of the applications.

  5. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of the large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is the SIMD instruction set, which allows packing several data items into one register and operating on all of them at once, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF-based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120,000, reaching 0.1 ms/track running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores, it shows a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi…

  6. General upper bounds on the runtime of parallel evolutionary algorithms.

    Science.gov (United States)

    Lässig, Jörg; Sudholt, Dirk

    2014-01-01

    We present a general method for analyzing the runtime of parallel evolutionary algorithms with spatially structured populations. Based on the fitness-level method, it yields upper bounds on the expected parallel runtime. This allows for a rigorous estimate of the speedup gained by parallelization. Tailored results are given for common migration topologies: ring graphs, torus graphs, hypercubes, and the complete graph. Example applications for pseudo-Boolean optimization show that our method is easy to apply and that it gives powerful results. In our examples the performance guarantees improve with the density of the topology. Surprisingly, even sparse topologies such as ring graphs lead to a significant speedup for many functions while not increasing the total number of function evaluations by more than a constant factor. We also identify which numbers of processors lead to the best guaranteed speedups, thus giving hints on how to parameterize parallel evolutionary algorithms.

  7. Parallel Tensor Compression for Large-Scale Scientific Data.

    Energy Technology Data Exchange (ETDEWEB)

    Kolda, Tamara G. [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Ballard, Grey [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Austin, Woody Nathan [Univ. of Texas, Austin, TX (United States)

    2015-10-01

    As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
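
    A minimal sketch of the idea, using the TensorLy library on a small synthetic tensor with planted low multilinear rank; the shapes, ranks and noise level are assumptions, and the distributed-memory layout of the paper is not reproduced.

```python
# Sketch: Tucker compression of a 4-way tensor with planted low rank.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

rng = np.random.default_rng(1)
dims, ranks = (40, 40, 40, 16), (8, 8, 8, 4)
G = rng.random(ranks)
U = [rng.random((s, r)) for s, r in zip(dims, ranks)]
X = tl.tensor(tl.tucker_to_tensor((G, U)) + 0.01 * rng.random(dims))

core, factors = tucker(X, rank=list(ranks))   # truncated multilinear ranks
X_hat = tl.tucker_to_tensor((core, factors))

stored = core.size + sum(f.size for f in factors)
print(f"compression ~{X.size / stored:.0f}x, "
      f"rel. error {float(tl.norm(X - X_hat) / tl.norm(X)):.3f}")
```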

  8. A parallel offline CFD and closed-form approximation strategy for computationally efficient analysis of complex fluid flows

    Science.gov (United States)

    Allphin, Devin

    Computational fluid dynamics (CFD) solution approximations for complex fluid flow problems have become a common and powerful engineering analysis technique. These tools, though qualitatively useful, remain limited in practice by their underlying inverse relationship between simulation accuracy and overall computational expense. While a great volume of research has focused on remedying these issues inherent to CFD, one traditionally overlooked area of resource reduction for engineering analysis concerns the basic definition and determination of functional relationships for the studied fluid flow variables. This artificial relationship-building technique, called meta-modeling or surrogate/offline approximation, uses design of experiments (DOE) theory to efficiently approximate non-physical coupling between the variables of interest in a fluid flow analysis problem. By mathematically approximating these variables, DOE methods can effectively reduce the required quantity of CFD simulations, freeing computational resources for other analytical focuses. An idealized interpretation of a fluid flow problem can also be employed to create suitably accurate approximations of fluid flow variables for the purposes of engineering analysis. When used in parallel with a meta-modeling approximation, a closed-form approximation can provide useful feedback concerning proper construction, suitability, or even necessity of an offline approximation tool. It also provides a short-circuit pathway for further reducing the overall computational demands of a fluid flow analysis, again freeing resources for otherwise unsuitable resource expenditures. To validate these inferences, a design optimization problem was presented requiring the inexpensive estimation of aerodynamic forces applied to a valve operating on a simulated piston-cylinder heat engine. The determination of these forces was to be found using parallel surrogate and exact approximation methods, thus evidencing the comparative

  9. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

    2017-02-01

    … efficient computation on an exascale computer. This project concludes with a functional prototype containing pervasively parallel algorithms that perform demonstrably well on many-core processors. These algorithms are fundamental for performing data analysis and visualization at extreme scale.

  10. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    … and the Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.

  11. Pulse mode counting system with parallel port interface

    International Nuclear Information System (INIS)

    Farooq, M.A.; Mushtaq, N.; Sultan, M.; Karim, A.

    2010-11-01

    A Pulse Mode Counting System (PPCS) module has been designed and developed that is compatible with the SPP (Standard Parallel Port) and the EPP (Enhanced Parallel Port). This system can capture, present and store real-time data in a well-formatted form. The stored data are in a format that can be imported into different packages for further analysis. The purpose of this system is to facilitate research experiments with frequencies of up to 4 MHz and storage of up to 16 million counts. (author)

  12. Analytical and experimental analysis of a parallel leaf spring guidance

    NARCIS (Netherlands)

    Meijaard, Jacob Philippus; Brouwer, Dannis Michel; Jonker, Jan B.; Denier, J.; Finn, M.

    2008-01-01

    A parallel leaf spring guidance is defined as a benchmark problem for flexible multibody formalisms and codes. The mechanism is loaded by forces and an additional moment or misalignment. Buckling loads, changes in compliance and frequencies, and large-amplitude vibrations are calculated. A…

  13. Web based parallel/distributed medical data mining using software agents

    Energy Technology Data Exchange (ETDEWEB)

    Kargupta, H.; Stafford, B.; Hamzaoglu, I.

    1997-12-31

    This paper describes an experimental parallel/distributed data mining system, PADMA (PArallel Data Mining Agents), that uses software agents for local data access and analysis and a web-based interface for interactive data visualization. It also presents the results of applying PADMA to detect patterns in unstructured texts of postmortem reports and laboratory test data for hepatitis C patients.

  14. Kinematic Analysis and Optimization of a New Compliant Parallel Micromanipulator

    Directory of Open Access Journals (Sweden)

    Qingsong Xu

    2008-11-01

    In this paper, a new three-translational-degrees-of-freedom (DOF) compliant parallel micromanipulator (CPM) is proposed, which offers the excellent accuracy of parallel mechanisms with flexure hinges. The system is established by a proper selection of hardware and analyzed via the derived pseudo-rigid-body model. In view of the physical constraints imposed by both the piezoelectric actuators and the flexure hinges, the CPM's reachable workspace is determined analytically, in which a maximum cylinder, defined as the usable workspace, can be inscribed. Moreover, the optimal design of the CPM, considering the usable workspace size and the global dexterity index simultaneously, is carried out utilizing the direct search method, a genetic algorithm (GA), and particle swarm optimization (PSO), respectively. The simulation results show that PSO is the best method for the optimization, and the results are valuable in the design of a new micromanipulator.

  15. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template-based parallel checkpoint save for a massively parallel supercomputer system, using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in storage and was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high-speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
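
    A minimal sketch of the rsync-like template comparison: only blocks whose checksum differs from the template checkpoint are kept. The block size, hash choice and in-memory buffers are illustrative assumptions.

```python
# Sketch: keep only blocks whose checksum differs from the template checkpoint.
import hashlib

BLOCK = 64 * 1024  # assumed block size

def blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def delta_checkpoint(template: bytes, current: bytes):
    """Return {block_index: block_bytes} for blocks that differ from template."""
    sums = [hashlib.sha1(b).digest() for b in blocks(template)]
    delta = {}
    for i, blk in enumerate(blocks(current)):
        if i >= len(sums) or hashlib.sha1(blk).digest() != sums[i]:
            delta[i] = blk
    return delta

template = bytes(4 * BLOCK)              # previously stored template checkpoint
current = bytearray(template)
current[BLOCK + 5] = 0xFF                # one block has changed since then
print(sorted(delta_checkpoint(template, bytes(current))))   # -> [1]
```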

  16. Development of parallel benchmark code by sheet metal forming simulator 'ITAS'

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Suzuki, Shintaro; Minami, Kazuo

    1999-03-01

    This report describes the development of a parallel benchmark code from the sheet metal forming simulator 'ITAS'. ITAS is a nonlinear elasto-plastic analysis program based on the finite element method for simulating sheet metal forming. ITAS adopts a dynamic analysis method that computes the displacement of the sheet metal at every time step and utilizes the implicit method with a direct linear equation solver; the simulator is therefore very robust. However, it requires a lot of computational time and memory capacity. In developing the parallel benchmark code, we designed the code with MPI programming to reduce the computational time. In numerical experiments on five kinds of parallel supercomputers at CCSE JAERI, i.e., SP2, SR2201, SX-4, T94 and VPP300, good performance is observed. The results will be made public through the WWW so that the benchmark results may serve as a guideline for research and development of parallel programs. (author)

  17. Scattering Analysis of a Compact Dipole Array with Series and Parallel Feed Network including Mutual Coupling Effect

    Directory of Open Access Journals (Sweden)

    H. L. Sneha

    2013-01-01

    The current focus in the defense arena is on stealth technology, with an emphasis on controlling the radar cross-section (RCS). The scattering from antennas mounted on a platform is of prime importance, especially for a low-observable aerospace vehicle. This paper presents an analysis of the scattering cross-section of a uniformly spaced linear dipole array. Two types of feed networks, i.e., series and parallel feed networks, are considered. The total RCS of a phased array with either kind of feed network is obtained by following the signal as it enters through the aperture and travels through the feed network. The RCS estimation of the array includes the mutual coupling effect between the dipole elements in three configurations, i.e., side-by-side, collinear, and parallel-in-echelon. The results presented can be useful when designing a phased array with optimum performance towards low observability.

  18. Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

    Science.gov (United States)

    Overman, Andrea L.; Poole, Eugene L.

    1991-01-01

    A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
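
    The serial kernel being parallelized, factor once and then solve by forward/back substitution, looks like this in a SciPy sketch; the variable-band storage scheme and CRAY multitasking directives are beyond a few lines.

```python
# Sketch: Cholesky factorization and triangular solves, the serial kernel
# that the multitasked CRAY implementations parallelize.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(2)
B = rng.random((500, 500))
K = B @ B.T + 500.0 * np.eye(500)   # symmetric positive definite "stiffness"
f = rng.random(500)

c, low = cho_factor(K)              # factor once...
u = cho_solve((c, low), f)          # ...then solve cheaply per load case
print("residual ok:", np.allclose(K @ u, f))
```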

  19. Acceleration and parallelization calculation of EFEN-SP3 method

    International Nuclear Information System (INIS)

    Yang Wen; Zheng Youqi; Wu Hongchun; Cao Liangzhi; Li Yunzhao

    2013-01-01

    Because the exponential function expansion nodal-SP3 (EFEN-SP3) method needs further improvement in computational efficiency to routinely carry out PWR whole-core pin-by-pin calculations, coarse-mesh acceleration and spatial parallelization were investigated in this paper. The coarse-mesh acceleration was built by considering a discontinuity factor on each coarse-mesh interface and preserving neutron balance within each coarse mesh in space, angle and energy. The spatial parallelization, based on MPI, was implemented by guaranteeing load balancing and minimizing communication cost to take full advantage of modern computing and storage abilities. Numerical results based on a commercial nuclear power reactor demonstrate a speedup ratio of about 40 for the coarse-mesh acceleration and a parallel efficiency higher than 60% with 40 CPUs for the spatial parallelization. With these two improvements, the EFEN code can complete a PWR whole-core pin-by-pin calculation with 289 × 289 × 218 meshes and 4 energy groups within 100 s using 48 CPUs (2.40 GHz). (authors)

  20. Analysis of parallel optical sampling rate and ADC requirements in digital coherent receivers

    DEFF Research Database (Denmark)

    Lorences Riesgo, Abel; Galili, Michael; Peucheret, Christophe

    2012-01-01

    We comprehensively assess analog-to-digital converter requirements in coherent digital receiver schemes with parallel optical sampling. We determine the electronic requirements in accordance with the properties of the free-running local oscillator.

  1. Parallel manipulators with two end-effectors : Getting a grip on Jacobian-based stiffness analysis

    NARCIS (Netherlands)

    Hoevenaars, A.G.L.

    2016-01-01

    Robots that are developed for applications which require a high stiffness-over-inertia ratio, such as pick-and-place robots, machining robots, or haptic devices, are often based on parallel manipulators. Parallel manipulators connect an end-effector to an inertial base using multiple serial…

  2. A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations

    Science.gov (United States)

    Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw

    2005-01-01

    A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
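
    For reference, the core synchronous PSO update is compact; the paper's asynchronous variant differs mainly in updating the global best as each evaluation returns rather than once per iteration. The objective and coefficients below are placeholders.

```python
# Sketch: the core (synchronous) particle swarm update. An asynchronous
# variant would refresh g as each evaluation returns, not once per sweep.
import numpy as np

def objective(x):                 # placeholder objective (sphere function)
    return float(np.sum(x * x))

rng = np.random.default_rng(3)
n, dim, w, c1, c2 = 30, 5, 0.7, 1.5, 1.5
x = rng.uniform(-5, 5, (n, dim))
v = np.zeros((n, dim))
pbest, pval = x.copy(), np.array([objective(p) for p in x])
g = pbest[pval.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((n, dim)), rng.random((n, dim))
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)   # velocity update
    x = x + v
    f = np.array([objective(p) for p in x])
    better = f < pval
    pbest[better], pval[better] = x[better], f[better]
    g = pbest[pval.argmin()].copy()

print(f"best value after 200 iterations: {pval.min():.2e}")
```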

  3. Three dimensional Burn-up program parallelization using socket programming

    International Nuclear Information System (INIS)

    Haliyati R, Evi; Su'ud, Zaki

    2002-01-01

    A computer parallelization process was built with the purpose of decreasing the execution time of a physics program. In this case, a multi-computer system was built and used to analyze the burn-up process of a nuclear reactor. This multi-computer system was designed using a communication protocol among sockets, i.e., TCP/IP. The system consists of one computer as a server and the rest as clients. The server has main control over all its clients. The server also divides the reactor core geometrically into n parts in accordance with the number of clients; each computer, including the server, has the task of conducting the burn-up analysis of 1/n of the total reactor core. This burn-up analysis was conducted simultaneously and in parallel by all computers, so the program execution time approached 1/n times that of one computer. An analysis was then carried out which showed that, to calculate the density of atoms in a reactor of 91 cm × 91 cm × 116 cm, a parallel system of 2 computers has the highest efficiency.
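
    A minimal sketch of the server/client split over TCP sockets described above, with the burn-up analysis replaced by a placeholder computation; the port, message format and slice sizes are invented.

```python
# Sketch: a server splits the work into n slices over TCP; each client
# computes one slice and sends its partial result back.
import json
import socket
import threading

HOST, PORT, N_CLIENTS = "127.0.0.1", 5000, 2   # invented values

def client():
    with socket.create_connection((HOST, PORT)) as s:
        lo, hi = json.loads(s.makefile().readline())   # receive our slice
        result = sum(range(lo, hi))                    # stand-in for the analysis
        s.sendall((json.dumps(result) + "\n").encode())

srv = socket.create_server((HOST, PORT))
for _ in range(N_CLIENTS):
    threading.Thread(target=client).start()

slices = [(i * 50, (i + 1) * 50) for i in range(N_CLIENTS)]
total = 0
for sl in slices:
    conn, _ = srv.accept()
    reader = conn.makefile()
    conn.sendall((json.dumps(sl) + "\n").encode())     # hand out the slice
    total += json.loads(reader.readline())             # collect partial result
    conn.close()
srv.close()
print("combined result:", total)   # sum(range(0, 100)) == 4950
```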

  4. The voltage-current relationship and equivalent circuit implementation of parallel flux-controlled memristive circuits

    International Nuclear Information System (INIS)

    Bao Bo-Cheng; Feng Fei; Dong Wei; Pan Sai-Hu

    2013-01-01

    A flux-controlled memristor characterized by smooth cubic nonlinearity is taken as an example, upon which the voltage-current relationships (VCRs) of two parallel memristive circuits, a parallel memristor and capacitor circuit (the parallel MC circuit) and a parallel memristor and inductor circuit (the parallel ML circuit), are investigated. The results indicate that the VCR of these two parallel memristive circuits is closely related to the circuit parameters and to the frequency and amplitude of the sinusoidal voltage stimulus. An equivalent circuit model of the memristor is built, upon which circuit simulations and experimental measurements of both the parallel MC circuit and the parallel ML circuit are performed; the results verify the theoretical analysis.

  5. Evaluating parallel relational databases for medical data analysis.

    Energy Technology Data Exchange (ETDEWEB)

    Rintoul, Mark Daniel; Wilson, Andrew T.

    2012-03-01

    Hospitals have always generated and consumed large amounts of data concerning patients, treatment and outcomes. As computers and networks have permeated the hospital environment, it has become feasible to collect and organize all of this data. This naturally raises the question of how to deal with the resulting mountain of information. In this report we detail a proof-of-concept test using two commercially available parallel database systems to analyze a set of real, de-identified medical records. We examine database scalability as data sizes increase as well as responsiveness under load from multiple users.

  6. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with complex design structures. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for the parallel decomposition of multidisciplinary problems, to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.

  7. Power stability methods for parallel systems

    International Nuclear Information System (INIS)

    Wallach, Y.

    1988-01-01

    Parallel-processing systems are already commercially available. This paper shows that if one of them - the Alternating Sequential Parallel, or ASP, system - is applied to network stability calculations, it will lead to a higher speed of solution. The ASP system is first described and is then shown to be cheaper, more reliable and more available than other parallel systems. Also, no deadlock need be feared, and the speedup is normally very high. A number of ASP systems have already been assembled (the SMS systems, Topps, DIRMU, etc.). At present, an IBM Local Area Network is being modified so that it too can work in the ASP mode. Existing ASP systems were programmed in Fortran or assembly language. Since newer systems (e.g., DIRMU) are programmed in Modula-2, this language can be used. Stability analysis is based on solving nonlinear differential and algebraic equations. The algorithm for solving the nonlinear differential equations on ASP is described and programmed in Modula-2. The speedup is computed and is shown to be almost optimal.

  8. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  9. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix Functions and Characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa…

  10. A discrete ordinate response matrix method for massively parallel computers

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Lewis, E.E.

    1991-01-01

    A discrete ordinate response matrix method is formulated for the solution of neutron transport problems on massively parallel computers. The response matrix formulation eliminates iteration on the scattering source. The nodal matrices which result from the diamond-differenced equations are utilized in a factored form which minimizes memory requirements and significantly reduces the required computation. The algorithm utilizes massive parallelism by assigning each spatial node to a processor. The algorithm is accelerated effectively by a synthetic method in which the low-order diffusion equations are also solved by massively parallel red/black iterations. The method has been implemented on a 16k Connection Machine-2, and S8 and S16 solutions have been obtained for fixed-source benchmark problems in X-Y geometry.

  11. The Modeling and Harmonic Coupling Analysis of Multiple-Parallel Connected Inverter Using Harmonic State Space (HSS)

    DEFF Research Database (Denmark)

    Kwon, Jun Bum; Wang, Xiongfei; Bak, Claus Leth

    2015-01-01

    As the number of power-electronics-based systems increases, studies of overall stability and harmonic problems are on the rise. In order to analyze harmonics and stability, most research uses an analysis method based on the Linear Time Invariant (LTI) approach. However, this can be difficult for complex multi-parallel connected systems, especially in the case of renewable energy, where intermittent operation due to weather conditions is possible. Hence, the power converter can encounter many different operating points, and the impedance characteristics can demonstrate phenomena which cannot be found with the conventional LTI approach. The theoretical modeling and analysis are verified by means of simulations and experiments.

  12. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  13. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  14. Optimization approaches to mpi and area merging-based parallel buffer algorithm

    Directory of Open Access Journals (Sweden)

    Junfu Fan

    On buffer zone construction, the rasterization-based dilation method inevitably introduces errors, and the double-sided parallel line method involves a series of complex operations. In this paper, we propose a parallel buffer algorithm based on area merging and MPI (Message Passing Interface) to improve the performance of buffer analyses on large datasets. Experimental results reveal three major performance bottlenecks which significantly impact the serial and parallel buffer construction efficiencies: the area merging strategy, the task load balancing method and the MPI inter-process results merging strategy. Corresponding optimization approaches, involving a tree-like area merging strategy, a vertex-number-oriented parallel task partition method and an improved inter-process results merging strategy, are suggested to overcome these bottlenecks. Experiments were carried out to examine the performance of the optimized parallel algorithm. The results suggest that the optimization approaches provide high performance and processing ability for buffer construction in a cluster parallel environment. Our method could provide insights into the parallelization of spatial analysis algorithms.
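
    The tree-like merging strategy can be illustrated generically: merging partial results pairwise in rounds, instead of folding them one by one into an ever-larger result, keeps intermediate results small longer. The `merge` below is a placeholder for the polygon union used in buffer construction.

```python
# Sketch: tree-like (pairwise) merging vs. folding results in one by one.
# When merge cost grows with result size, the balanced tree stays cheaper.
from functools import reduce

def merge(a, b):          # placeholder for a polygon union
    return a | b

def tree_merge(items):
    items = list(items)
    while len(items) > 1:                 # merge pairwise, round by round
        nxt = [merge(items[i], items[i + 1])
               for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            nxt.append(items[-1])
        items = nxt
    return items[0]

parts = [{i} for i in range(8)]           # stand-ins for per-task buffer pieces
assert tree_merge(parts) == reduce(merge, parts)
print(tree_merge(parts))
```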

  15. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
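
    A sketch of the scattered/segmented decomposition idea on a shared-memory machine, with one segment of the integer range per worker; the hypercube communication of the paper is not modeled.

```python
# Sketch: segmented Sieve of Eratosthenes, one disjoint segment per worker.
import math
from multiprocessing import Pool

N = 1_000_000
SEG = 125_000

def sieve_small(limit):
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = b"\x00" * len(flags[p * p::p])
    return [i for i, f in enumerate(flags) if f]

BASE = sieve_small(math.isqrt(N))   # primes up to sqrt(N), shared by workers

def count_segment(bounds):
    lo, hi = bounds                 # half-open range [lo, hi)
    flags = bytearray([1]) * (hi - lo)
    for p in BASE:
        start = max(p * p, (lo + p - 1) // p * p)
        flags[start - lo::p] = b"\x00" * len(flags[start - lo::p])
    return sum(flags)

if __name__ == "__main__":
    segments = [(lo, min(lo + SEG, N + 1)) for lo in range(2, N + 1, SEG)]
    with Pool() as pool:
        print("primes up to 1e6:", sum(pool.map(count_segment, segments)))  # 78498
```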

  16. Hypercube Expert System Shell - Applying Production Parallelism.

    Science.gov (United States)

    1989-12-01

    … possible processor organizations, or interconnection methods, for parallel architectures. The following are examples of commonly used interconnection… this timing analysis because match speed-up available from production parallelism is proportional to the average number of affected productions…

  17. Comparative Analysis of Torque and Acceleration of Pre- and Post-Transmission Parallel Hybrid Drivetrains

    Directory of Open Access Journals (Sweden)

    Zulkifli Saiful A.

    2016-01-01

    Parallel hybrid electric vehicles (HEVs) can be classified according to the location of the electric motor with respect to the transmission unit of the internal combustion engine (ICE): they can be pre-transmission or post-transmission parallel hybrids. A split-axle parallel HEV, in which the ICE and electric motor provide propulsion power to different axles, is a sub-type of the post-transmission hybrid, since the addition of torque and power from the two power sources occurs after the vehicle's transmission. The term 'through-the-road' (TTR) hybrid is also used for the split-parallel HEV, since power coupling between the ICE and electric motor is not through a mechanical device but through the vehicle itself, its wheels and the road on which it moves. The present work presents the torque-speed relationship of the split-parallel hybrid and analyses simulation results for torque profiles and acceleration performance of pre-transmission and post-transmission hybrid configurations, using three different sizes of electric motor. Different operating regions of the pre-trans and post-trans motors are observed, leading to different speed and torque profiles. Although the ICE average efficiency in the post-trans hybrid is slightly lower than in the pre-trans hybrid, the post-trans hybrid vehicle has better fuel economy and acceleration performance than the pre-trans hybrid vehicle.

  18. Module Six: Parallel Circuits; Basic Electricity and Electronics Individualized Learning System.

    Science.gov (United States)

    Bureau of Naval Personnel, Washington, DC.

    In this module the student will learn the rules that govern the characteristics of parallel circuits; the relationships between voltage, current, resistance and power; and the results of common troubles in parallel circuits. The module is divided into four lessons: rules of voltage and current, rules for resistance and power, variational analysis,…
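
    As a worked illustration of the rules the module covers (equal voltage across branches, branch currents adding, reciprocal resistances adding), here is a tiny computation with made-up values:

        # Worked example of the parallel-circuit rules (illustrative numbers).
        V = 12.0                                     # volts across every branch
        branches = [4.0, 6.0, 12.0]                  # branch resistances in ohms

        currents = [V / R for R in branches]         # I = V/R: 3 A, 2 A, 1 A
        I_total = sum(currents)                      # branch currents add: 6 A
        R_eq = 1.0 / sum(1.0 / R for R in branches)  # 1/Req = sum(1/Ri) -> 2 ohms
        P_total = V * I_total                        # 72 W, also equal to V**2 / R_eq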

  19. Stiffness Analysis and Comparison of 3-PPR Planar Parallel Manipulators with Actuation Compliance

    DEFF Research Database (Denmark)

    Wu, Guanglei; Bai, Shaoping; Kepler, Jørgen Asbøl

    2012-01-01

    In this paper, the stiffness of the 3-PPR planar parallel manipulator (PPM) is analyzed with consideration of nonlinear actuation compliance. The characteristics of the stiffness matrix pertaining to planar parallel manipulators are analyzed and discussed. A graphic representation of the stiffness characteristics by means of translational and rotational stiffness mapping is developed. The developed method is illustrated with an unsymmetrical 3-PPR PPM, being compared with its structure-symmetrical counterpart.

  20. P3T+: A Performance Estimator for Distributed and Parallel Programs

    Directory of Open Access Journals (Sweden)

    T. Fahringer

    2000-01-01

    Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+, a performance estimator for mostly regular HPF (High Performance Fortran) programs that also partially covers message passing (MPI) programs. P3T+ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by highly effective symbolic analysis. Communication is estimated by simulating the behavior of the communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model the most critical architecture-specific factors such as cache line sizes, number of cache lines available, start-up times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to help programmers develop parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both the accuracy and the usefulness of P3T+.

  1. PEM-PCA: A Parallel Expectation-Maximization PCA Face Recognition Architecture

    Directory of Open Access Journals (Sweden)

    Kanokmon Rujirakul

    2014-01-01

    Full Text Available Principal component analysis (PCA) has traditionally been used as one of the feature extraction techniques in face recognition systems, yielding high accuracy while requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of those stages' complexity. To improve the computational time, a novel parallel architecture was employed to exploit the parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, the so-called Parallel Expectation-Maximization PCA architecture. Compared to traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of over nine and three times relative to PCA and parallel PCA, respectively.
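
    The covariance-free step at the heart of such an approach can be sketched with the classical EM algorithm for PCA (after Roweis); this is an illustration of the idea, not the PEM-PCA code, and the function name is made up.

        # Sketch: EM for PCA, avoiding the covariance eigendecomposition.
        import numpy as np

        def em_pca(Y, k, iters=100):
            """Y: d x n matrix of samples in columns; returns a d x k basis."""
            Y = Y - Y.mean(axis=1, keepdims=True)          # center the data
            W = np.random.default_rng(0).standard_normal((Y.shape[0], k))
            for _ in range(iters):
                X = np.linalg.solve(W.T @ W, W.T @ Y)      # E-step: latent coords
                W = Y @ X.T @ np.linalg.inv(X @ X.T)       # M-step: new basis
            Q, _ = np.linalg.qr(W)                         # orthonormalize subspace
            return Q

    Each iteration costs O(dnk) rather than the O(d^2 n) needed to form the covariance matrix plus O(d^3) to decompose it, which is where the complexity reduction comes from; the matrix products in both steps are also natural targets for the parallelization the abstract describes.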

  2. PEM-PCA: a parallel expectation-maximization PCA face recognition architecture.

    Science.gov (United States)

    Rujirakul, Kanokmon; So-In, Chakchai; Arnonkijpanich, Banchar

    2014-01-01

    Principal component analysis (PCA) has traditionally been used as one of the feature extraction techniques in face recognition systems, yielding high accuracy while requiring a small number of features. However, the covariance matrix and eigenvalue decomposition stages cause high computational complexity, especially for a large database. Thus, this research presents an alternative approach utilizing an Expectation-Maximization algorithm to reduce the determinant matrix manipulation, resulting in a reduction of those stages' complexity. To improve the computational time, a novel parallel architecture was employed to exploit the parallelization of matrix computation during the feature extraction and classification stages, including parallel preprocessing and their combinations, the so-called Parallel Expectation-Maximization PCA architecture. Compared to traditional PCA and its derivatives, the results indicate lower complexity with an insignificant difference in recognition precision, leading to high-speed face recognition systems with speed-ups of over nine and three times relative to PCA and parallel PCA, respectively.

  3. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying, in dependence upon the call site statistics, a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  4. High performance statistical computing with parallel R: applications to biology and climate modelling

    International Nuclear Information System (INIS)

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable, high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines.

  5. Parallel algorithms for network routing problems and recurrences

    International Nuclear Information System (INIS)

    Wisniewski, J.A.; Sameh, A.H.

    1982-01-01

    In this paper, we consider the parallel solution of recurrences and linear systems in the regular algebra of Carré. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed.

  6. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and the interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, is still the only approach that leads to acceptable performance for the more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
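
    A small illustration of how a stencil determines what must be exchanged (the names here are made up, not the paper's notation): the number of ghost layers a process needs from a neighbour is simply the largest stencil offset pointing toward that neighbour.

        # Sketch: deriving ghost-layer widths from a stencil.
        stencil = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]   # 5-point stencil

        def halo_widths(stencil):
            """Ghost layers needed on each side of a 2-D block partition."""
            lo_i = max(-di for di, _ in stencil)   # layers from the lower-i neighbour
            hi_i = max(di for di, _ in stencil)    # layers from the upper-i neighbour
            lo_j = max(-dj for _, dj in stencil)
            hi_j = max(dj for _, dj in stencil)
            return lo_i, hi_i, lo_j, hi_j

        print(halo_widths(stencil))                # -> (1, 1, 1, 1)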

  7. A parallel finite-difference method for computational aerodynamics

    International Nuclear Information System (INIS)

    Swisshelm, J.M.

    1989-01-01

    A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed. 14 refs

  8. General-purpose parallel simulator for quantum computing

    International Nuclear Information System (INIS)

    Niwa, Jumpei; Matsumoto, Keiji; Imai, Hiroshi

    2002-01-01

    With current technologies, it seems to be very difficult to implement quantum computers with many qubits. It is therefore important to simulate quantum algorithms and circuits on existing computers. However, for a large-size problem, the simulation often requires more computational power than is available from sequential processing. Therefore, simulation methods for parallel processors are required. We have developed a general-purpose simulator for quantum algorithms/circuits on a parallel computer (Sun Enterprise 4500). It can simulate algorithms/circuits with up to 30 qubits. To test the efficiency of our proposed methods, we have simulated Shor's factorization algorithm and Grover's database search, and we have analyzed the robustness of the corresponding quantum circuits in the presence of both decoherence and operational errors. The corresponding results, statistics, and analyses are presented in this paper.
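
    A minimal state-vector update shows what such a simulator does for one gate (a generic sketch, not the authors' code; the qubit-ordering convention is an assumption):

        # Sketch: applying a one-qubit gate to an n-qubit state vector.
        import numpy as np

        def apply_gate(psi, gate, k, n):
            """psi: length-2**n state; gate: 2x2 unitary; k: target qubit."""
            psi = psi.reshape((2,) * n)
            psi = np.tensordot(gate, psi, axes=([1], [k]))   # contract on axis k
            return np.moveaxis(psi, 0, k).reshape(-1)        # restore axis order

        H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)         # Hadamard gate
        n = 3
        psi = np.zeros(2**n, dtype=complex); psi[0] = 1.0    # |000>
        psi = apply_gate(psi, H, 0, n)                       # (|000>+|100>)/sqrt(2)

    The memory wall is visible here: a 30-qubit state holds 2^30 complex amplitudes (16 GiB in double precision), which is why the state vector must be spread across the memory of several processors.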

  9. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of ... resulting in the optimal O(sort_P(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and sort_P(N) is the parallel I/O complexity of sorting N elements using P processors.

  10. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) is a subroutine-based message-passing software library intended to provide a consistent interface to the variety of multiprocessor computers on the market today. It minimizes the effort needed to move an application program from one computer to another: the user develops an application program once and then easily moves it from the parallel computer on which it was created to another parallel computer. ("Parallel computer" here also includes a heterogeneous collection of networked computers.) APPL is written in C, with one FORTRAN 77 subroutine for UNIX-based computers, and is callable from application programs written in C or FORTRAN 77.

  11. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation; the topic really deserves its own detailed discussion, which Gabe Rockefeller would like to develop.
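
    To illustrate the reduction pattern named above, here is a pairwise (tree-shaped) reduction in miniature; the helper is illustrative, not from the presentation:

        # Sketch: tree-shaped reduction, the pattern behind parallel sums.
        from operator import add

        def tree_reduce(values, op=add):
            """Combine disjoint pairs level by level: log2(n) rounds of
            independent operations, each round parallelizable."""
            while len(values) > 1:
                paired = [op(a, b) for a, b in zip(values[0::2], values[1::2])]
                values = paired + values[len(values) & ~1:]  # carry odd leftover
            return values[0]

        assert tree_reduce([1, 2, 3, 4, 5]) == 15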

  12. The level 1 and 2 specification for parallel benchmark and a benchmark test of scalar-parallel computer SP2 based on the specifications

    International Nuclear Information System (INIS)

    Orii, Shigeo

    1998-06-01

    A benchmark specification for the performance evaluation of parallel computers for numerical analysis is proposed. The Level 1 benchmark, a conventional benchmark based on processing time, measures the performance of a computer running a code. The Level 2 benchmark proposed in this report explains the reasons behind that performance. As an example, the scalar-parallel computer SP2 is evaluated with this benchmark specification on a molecular dynamics code. As a result, the main factors suppressing parallel performance are the maximum bandwidth and the start-up time of communication between nodes. In particular, the start-up time is proportional not only to the number of processors but also to the number of particles. (author)
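
    The start-up-time observation matches the standard latency-bandwidth (alpha-beta) model of a message transfer; a toy illustration with made-up numbers:

        # Toy latency-bandwidth model of message transfer (illustrative numbers).
        t_startup = 50e-6                     # alpha: per-message start-up, seconds
        bandwidth = 100e6                     # bytes per second

        def transfer_time(nbytes):
            return t_startup + nbytes / bandwidth

        # Start-up dominates small messages, so a communication pattern whose
        # message count grows with both processors and particles scales poorly.
        print(transfer_time(1_000))           # 6.0e-05 s, mostly start-up
        print(transfer_time(1_000_000))       # 1.005e-02 s, mostly bandwidth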

  13. MMC with parallel-connected MOSFETs as an alternative to wide bandgap converters for LVDC distribution networks

    Directory of Open Access Journals (Sweden)

    Yanni Zhong

    2017-03-01

    Full Text Available Low-voltage direct-current (LVDC) networks offer improved conductor utilisation on existing infrastructure and reduced conversion stages, which can lead to a simpler and more efficient distribution network. However, LVDC networks must continue to support AC loads, requiring efficient, low-distortion DC–AC converters. Additionally, increasing numbers of DC loads on the LVAC network require controlled, low-distortion, unity-power-factor AC–DC converters with large capacity and bi-directional capability. An AC–DC/DC–AC converter design is therefore proposed in this study to minimise conversion loss and maximise power quality. Comparative analysis is performed for a conventional IGBT two-level converter, a SiC MOSFET two-level converter, a Si MOSFET modular multi-level converter (MMC) and a GaN HEMT MMC, in terms of power loss, reliability, fault tolerance, converter cost and heatsink size. The analysis indicates that the five-level MMC with parallel-connected Si MOSFETs is an efficient, cost-effective converter for low-voltage applications. MMC converters suffer negligible switching loss, which enables reduced device switching without a loss penalty from increased harmonics and filtering. The optimal extent of parallel connection for MOSFETs in an MMC is investigated. Experimental results are presented to show the reduction in device stress and in electromagnetic-interference-generating transients through the use of reduced switching and device parallel connection.

  14. Parallel Aircraft Trajectory Optimization with Analytic Derivatives

    Science.gov (United States)

    Falck, Robert D.; Gray, Justin S.; Naylor, Bret

    2016-01-01

    Trajectory optimization is an integral component of aerospace vehicle design, but emerging aircraft technologies have introduced new demands on trajectory analysis that current tools are not well suited to address. Designing aircraft with technologies such as hybrid-electric propulsion and morphing wings requires consideration of the operational behavior as well as the physical design characteristics of the aircraft. The addition of operational variables can dramatically increase the number of design variables, which motivates the use of gradient-based optimization with analytic derivatives to solve the larger optimization problems. In this work we develop an aircraft trajectory analysis tool using a Legendre-Gauss-Lobatto based collocation scheme, providing analytic derivatives via the OpenMDAO multidisciplinary optimization framework. This collocation method uses an implicit time integration scheme that provides a high degree of sparsity and thus several potential options for parallelization. The performance of the new implementation was investigated via a series of single- and multi-trajectory optimizations using a combination of parallel computing and constraint aggregation. The computational performance results show that, in order to take full advantage of the sparsity in the problem, it is vital to parallelize both the nonlinear analysis evaluations and the derivative computations themselves. The constraint aggregation results revealed a significant numerical challenge due to the difficulty of achieving tight convergence tolerances. Overall, the results demonstrate the value of applying analytic derivatives to trajectory optimization problems and lay the foundation for future application of this collocation-based method to the design of aircraft where operational scheduling of technologies is key to achieving good performance.

  15. Modeling, analysis, and design of stationary reference frame droop controlled parallel three-phase voltage source inverters

    DEFF Research Database (Denmark)

    Vasquez, Juan Carlos; Guerrero, Josep M.; Savaghebi, Mehdi

    2011-01-01

    Power electronics based microgrids consist of a number of voltage source inverters (VSIs) operating in parallel. In this paper, the modeling, control design, and stability analysis of three-phase VSIs are derived. The proposed voltage and current inner control loops and the mathematical models ... are presented and discussed. Experimental results are provided to validate the performance and robustness of the VSIs' functionality during islanded and grid-connected operations, allowing a seamless transition between these modes through control hierarchies by regulating frequency and voltage, main-grid interactivity ...

  16. Exploratory Bi-Factor Analysis: The Oblique Case

    Science.gov (United States)

    Jennrich, Robert I.; Bentler, Peter M.

    2012-01-01

    Bi-factor analysis is a form of confirmatory factor analysis originally introduced by Holzinger and Swineford ("Psychometrika" 2:41-54, 1937). The bi-factor model has a general factor, a number of group factors, and an explicit bi-factor structure. Jennrich and Bentler ("Psychometrika" 76:537-549, 2011) introduced an exploratory form of bi-factor…

  17. Sequential combination of k-t principle component analysis (PCA) and partial parallel imaging: k-t PCA GROWL.

    Science.gov (United States)

    Qi, Haikun; Huang, Feng; Zhou, Hongmei; Chen, Huijun

    2017-03-01

    k-t principal component analysis (k-t PCA) is a distinguished method for high spatiotemporal resolution dynamic MRI. To further improve the accuracy of k-t PCA, a combination with partial parallel imaging (PPI), k-t PCA/SENSE, has been tested. However, k-t PCA/SENSE suffers from long reconstruction time and limited improvement. This study aims to improve the combination of k-t PCA and PPI in both reconstruction speed and accuracy. A sequential combination scheme called k-t PCA GROWL (GRAPPA operator for wider readout line) was proposed. The GRAPPA operator was performed before k-t PCA to extend each readout line into a wider band, which improved the condition of the encoding matrix in the following k-t PCA reconstruction. k-t PCA GROWL was tested and compared with k-t PCA and k-t PCA/SENSE on cardiac imaging. k-t PCA GROWL consistently resulted in better image quality compared with k-t PCA/SENSE at high acceleration factors for both retrospectively and prospectively undersampled cardiac imaging, with a much lower computation cost. The improvement in image quality became greater with increasing acceleration factor. By sequentially combining the GRAPPA operator and k-t PCA, the proposed k-t PCA GROWL method outperformed k-t PCA/SENSE in both reconstruction speed and accuracy, suggesting that k-t PCA GROWL is a better combination scheme than k-t PCA/SENSE. Magn Reson Med 77:1058-1067, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  18. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  19. Optimal task mapping in safety-critical real-time parallel systems; Placement optimal de taches pour les systemes paralleles temps-reel critiques

    Energy Technology Data Exchange (ETDEWEB)

    Aussagues, Ch

    1998-12-11

    This PhD thesis deals with the correct design of safety-critical real-time parallel systems. Such systems constitute a fundamental part of high-performance command and control systems found in the nuclear domain and, more generally, in parallel embedded systems. The verification of their temporal correctness is the core of this thesis. Our contribution consists mainly of the following three points: the analysis and extension of a programming model for such real-time parallel systems; the proposal of an original method based on a new operator, the synchronized product of state-machine task graphs; and the validation of the approach through its implementation and evaluation. The work particularly addresses the problem of optimal task mapping on a parallel architecture such that the temporal constraints are globally guaranteed, i.e. the timeliness property holds. The results also incorporate optimality criteria for the sizing and correct dimensioning of a parallel system, for instance in the number of processing elements; these criteria are connected with operational constraints of the application domain. Our approach is based on the off-line analysis of the feasibility of the deadline-driven dynamic scheduling used to schedule tasks within one processor. From the synchronized product, a system of linear constraints is automatically generated, which makes it possible to calculate the maximum load of a group of tasks and then to verify their timeliness constraints. The timeliness verification of communications and their incorporation into the mapping problem is the second main contribution of this thesis. Finally, the global solving technique dealing with both task and communication aspects has been implemented and evaluated in the framework of the OASIS project at the LETI research center at CEA/Saclay. (author) 96 refs.

  20. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static...

  1. Emotional stimuli exert parallel effects on attention and memory.

    Science.gov (United States)

    Talmi, Deborah; Ziegler, Marilyne; Hawksworth, Jade; Lalani, Safina; Herman, C Peter; Moscovitch, Morris

    2013-01-01

    Because emotional and neutral stimuli typically differ on non-emotional dimensions, it has been difficult to determine conclusively which factors underlie the ability of emotional stimuli to enhance immediate long-term memory. Here we induced arousal by varying participants' goals, a method that removes many potential confounds between emotional and non-emotional items. Hungry and sated participants encoded food and clothing images under divided attention conditions. Sated participants attended to and recalled food and clothing images equivalently. Hungry participants performed worse on the concurrent tone-discrimination task when they viewed food relative to clothing images, suggesting enhanced attention to food images, and they recalled more food than clothing images. A follow-up regression analysis of the factors predicting memory for individual pictures revealed that food images had parallel effects on attention and memory in hungry participants, so that enhanced attention to food images did not predict their enhanced memory. We suggest that immediate long-term memory for food is enhanced in the hungry state because hunger leads to more distinctive processing of food images rendering them more accessible during retrieval.

  2. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence of the processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing the large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for the parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.

  3. Oscillatory flow at the end of parallel-plate stacks: phenomenological and similarity analysis

    International Nuclear Information System (INIS)

    Mao Xiaoan; Jaworski, Artur J

    2010-01-01

    This paper addresses the physics of the oscillatory flow in the vicinity of a series of parallel plates forming geometrically identical channels. This type of flow is particularly relevant to thermoacoustic engines and refrigerators, where a reciprocating flow is responsible for the desirable energy transfer, but it is also of interest to general fluid mechanics of oscillatory flows past bluff bodies. In this paper, the physics of an acoustically induced flow past a series of plates in an isothermal condition is studied in detail using the data provided by PIV imaging. Particular attention is given to the analysis of the wake flow during the ejection part of the flow cycle, where either closed recirculating vortices or alternating vortex shedding can be observed. This is followed by a similarity analysis of the governing Navier-Stokes equations in order to derive the similarity criteria governing the wake flow behaviour. To this end, similarity numbers including two types of Reynolds number, the Keulegan-Carpenter number and a non-dimensional stack configuration parameter, d/h, are considered and their influence on the phenomena are discussed.

  4. Review of data and methods recommended in the international code of practice for dosimetry IAEA Technical Reports Series No. 381, The Use of Plane Parallel Ionization Chambers in High Energy Electron and Photon beams. Final report of the co-ordinated research project on dose determination with plane parallel ionization chambers in therapeutic electron and photon beams

    International Nuclear Information System (INIS)

    Dusautoy, A.; Roos, M.; Svensson, H.; Andreo, P.

    2000-01-01

    An IAEA Co-ordinated Research Project was designed to validate the data and procedures included in the International Code of Practice Technical Reports Series (TRS) No. 381, "The Use of Plane Parallel Ionization Chambers in High Energy Electron and Photon Beams". This work reviews and analyses the procedures used and the data obtained by the participants of the project. The analysis shows that applying TRS-381 generally produces reliable results. The determination of absorbed dose to water using the electron method in reference conditions is within the stated uncertainties (2.9%). Comparisons have shown that TRS-381 is consistent with the AAPM TG-39 protocol within 1% for measurements made in water. Based on the analysis, recommendations are given with respect to: (i) the use of plane parallel ionization chambers of the Markus type, (ii) the values of the fluence correction factor for cylindrical chambers, (iii) the value of the wall correction factor for the Roos chamber in 60Co beams, and (iv) the use of plastic phantoms and the values of the fluence correction factors. (author)

  5. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
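
    The peak detection through a parallel compact operation can be pictured as a data-parallel predicate followed by index compaction; a NumPy stand-in for the GPU kernels (illustrative, not the toolbox's API):

        # Sketch: peak detection as predicate + stream compaction.
        import numpy as np

        def detect_peaks(x, threshold):
            """Flag samples above both neighbours and threshold, then compact."""
            mid = x[1:-1]
            is_peak = (mid > x[:-2]) & (mid >= x[2:]) & (mid > threshold)
            return np.flatnonzero(is_peak) + 1     # compaction of flagged indices

        x = np.array([0.0, 2.0, 1.0, 0.5, 3.0, 0.2])
        print(detect_peaks(x, 0.8))                # -> [1 4]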

  6. A possibility of parallel and anti-parallel diffraction measurements on ...

    Indian Academy of Sciences (India)

    However, a bent perfect crystal (BPC) monochromator at monochromatic focusing condition can provide a quite flat and equal resolution property at both parallel and anti-parallel positions and thus one can have a chance to use both sides for the diffraction experiment. From the data of the FWHM and the Δd/d measured ...

  7. Design and test of a parallel kinematic solar tracker

    Directory of Open Access Journals (Sweden)

    Stefano Mauro

    2015-12-01

    Full Text Available This article proposes a parallel kinematic solar tracker designed for driving high-concentration photovoltaic modules. This kind of module produces energy only if it is oriented with a misalignment error lower than 0.4°. Generally, a parallel kinematic structure provides high stiffness and precision in positioning, features that make this mechanism fit for the purpose. This article describes the work carried out to design a suitable parallel machine: an existing architecture was chosen, and the geometrical parameters of the system were defined in order to obtain a workspace consistent with the requirements for sun tracking. In addition, an analysis of the singularities of the system was carried out. The method used for the singularity analysis revealed the existence of singularities which had not been previously identified for this kind of mechanism. The analysis of the mechanism showed very low nominal energy consumption and elevated stiffness. A small-scale prototype of the system was constructed for the first time. A control algorithm was also developed, implemented, and tested. Finally, experimental tests were carried out to verify the capability of the system to ensure precise pointing. The tests were considered passed, as the system showed an orientation error lower than 0.4° during sun tracking.

  8. Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.

    Directory of Open Access Journals (Sweden)

    Francesca Cordero

    Full Text Available BACKGROUND: Massive Parallel Sequencing (MPS) methods can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipeline have not yet been well investigated. PRIMARY FINDINGS: A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked into biological replication experiments, was assembled by merging a publicly available MPS spike-in miRNA data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short read counts are strongly underestimated for duplicated miRNAs if the whole genome is used as the reference. Furthermore, the sensitivity of miRNA detection strongly depends on the primary tool used in the analysis. Of the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Of the five tools investigated, two (DESeq, baySeq) show very good specificity and sensitivity in the detection of differential expression. CONCLUSIONS: The results of our analysis allow the definition of a clear and simple optimized analytical workflow for digital quantitative miRNA analysis.

  9. Optimizing a massive parallel sequencing workflow for quantitative miRNA expression analysis.

    Science.gov (United States)

    Cordero, Francesca; Beccuti, Marco; Arigoni, Maddalena; Donatelli, Susanna; Calogero, Raffaele A

    2012-01-01

    Massive Parallel Sequencing (MPS) methods can extend and improve the knowledge obtained by conventional microarray technology, both for mRNAs and short non-coding RNAs, e.g. miRNAs. The processing methods used to extract and interpret the information are an important aspect of dealing with the vast amounts of data generated from short read sequencing. Although the number of computational tools for MPS data analysis is constantly growing, their strengths and weaknesses as part of a complex analytical pipeline have not yet been well investigated. A benchmark MPS miRNA dataset, resembling a situation in which miRNAs are spiked into biological replication experiments, was assembled by merging a publicly available MPS spike-in miRNA data set with MPS data derived from healthy donor peripheral blood mononuclear cells. Using this data set we observed that short read counts are strongly underestimated for duplicated miRNAs if the whole genome is used as the reference. Furthermore, the sensitivity of miRNA detection strongly depends on the primary tool used in the analysis. Of the six aligners tested, specifically devoted to miRNA detection, SHRiMP and MicroRazerS show the highest sensitivity. Differential expression estimation is quite efficient. Of the five tools investigated, two (DESeq, baySeq) show very good specificity and sensitivity in the detection of differential expression. The results of our analysis allow the definition of a clear and simple optimized analytical workflow for digital quantitative miRNA analysis.

  10. Parallel-Sequential Texture Analysis

    NARCIS (Netherlands)

    van den Broek, Egon; Singh, Sameer; Singh, Maneesha; van Rikxoort, Eva M.; Apte, Chid; Perner, Petra

    2005-01-01

    Color induced texture analysis is explored, using two texture analysis techniques: the co-occurrence matrix and the color correlogram as well as color histograms. Several quantization schemes for six color spaces and the human-based 11 color quantization scheme have been applied. The VisTex texture

  11. Parallel Careers and their Consequences for Companies in Brazil

    Directory of Open Access Journals (Sweden)

    Maria Candida Baumer Azevedo

    2014-04-01

    Full Text Available Given the relevance of managing parallel careers to attract and retain people in organizations, this paper provides insight into this phenomenon from an organizational perspective. The parallel career concept, introduced by Alboher (2007) and recently addressed by Schuiling (2012), has previously been examined only from the perspective of the parallel career holder (PC holder). The paper provides insight from both individual and organizational perspectives on the phenomenon of parallel careers and considers how it can function as an important tool for attracting and retaining people by contributing to human development. This paper employs a qualitative approach that includes 30 semi-structured one-on-one interviews. The organizational perspective arises from the 15 interviews with human resources (HR) executives from different companies. The individual viewpoint originates from the interviews with 15 executives who are also PC holders. An inductive content analysis approach was used to examine Brazilian companies and the Brazilian offices of multinationals. Companies that are concerned about having the best talent on their teams can benefit from a deeper understanding of parallel careers, which can be used to attract, develop, and retain talent. Limitations and directions for future research are discussed.

  12. Comparative analysis of the serial/parallel numerical calculation of boiling channels thermohydraulics; Analisis comparativo del calculo numerico serie/paralelo de la termohidraulica de canales con ebullicion

    Energy Technology Data Exchange (ETDEWEB)

    Cecenas F, M., E-mail: mcf@iie.org.mx [Instituto Nacional de Electricidad y Energias Limpias, Reforma 113, Col. Palmira, 62490 Cuernavaca, Morelos (Mexico)

    2017-09-15

    A parallel channel model with boiling and point neutron kinetics is used to compare its implementation in C under a conventional scheme and under a parallel programming scheme. In both cases the subroutines written in C are practically the same, but they differ in how the execution of the tasks that calculate the different channels is controlled. Parallel Virtual Machine is used for the parallel solution; it allows message passing between tasks to control convergence and to transfer the variables of interest between the tasks that run simultaneously on a platform equipped with a multi-core microprocessor. For some problems defined as study cases, such as the one presented in this paper, a computer with two cores can reduce the computation time to 54-56% of the time required by the same program in its conventional sequential version. Similarly, a processor with four cores can reduce the time to 22-33% of the execution time of the conventional serial version. These results of substantially reduced computation time are a strong motivation for all those applications that can be parallelized and whose execution time is an important factor. (Author)

  13. A Parallel Solver for Large-Scale Markov Chains

    Czech Academy of Sciences Publication Activity Database

    Benzi, M.; Tůma, Miroslav

    2002-01-01

    Roč. 41, - (2002), s. 135-153 ISSN 0168-9274 R&D Projects: GA AV ČR IAA2030801; GA ČR GA101/00/1035 Keywords : parallel preconditioning * iterative methods * discrete Markov chains * generalized inverses * singular matrices * graph partitioning * AINV * Bi-CGSTAB Subject RIV: BA - General Mathematics Impact factor: 0.504, year: 2002

  14. Economical parallel oligonucleotide and peptide synthesizer - PET OLIGATOR

    Czech Academy of Sciences Publication Activity Database

    Lebl, M.; Pistek, Ch.; Hachmann, J.; Mudra, Petr; Pešek, Václav; Pokorný, Vít; Poncar, Pavel; Ženíšek, Karel

    2007-01-01

    Roč. 13, 1/2 (2007), s. 367-375 ISSN 1573-3149 Grant - others:NIH SBIR(US) R43 GM61511-01; NIH SBIR(US) R43 GM58981-01 Institutional research plan: CEZ:AV0Z40550506 Keywords : automated synthesizer * centrifugation * parallel synthesis Subject RIV: CC - Organic Chemistry Impact factor: 0.971, year: 2007

  15. Advanced exergy analysis of a R744 booster refrigeration system with parallel compression

    DEFF Research Database (Denmark)

    Gullo, Paride; Elmegaard, Brian; Cortella, Giovanni

    2016-01-01

    In this paper, advanced exergy analysis was applied to a R744 booster refrigeration system with parallel compression, taking into account design external temperatures of 25 degrees C and 35 degrees C, as well as the operating conditions of a conventional European supermarket. The global efficiencies of all the chosen compressors were extrapolated from manufacturers' data, and appropriate optimization procedures for the performance of the investigated solution were implemented. According to the results of the conventional exergy evaluation, the gas cooler/condenser, the HS (high stage) compressor and the MT (medium temperature) display cabinet exhibited the highest enhancement potential. The further splitting of their corresponding exergy destruction rates into their different parts, and the following assessment of the interactions among the components, allowed figuring out ...

  16. [Resolving characteristic of CDOM by excitation-emission matrix spectroscopy combined with parallel factor analysis in the seawater of outer Yangtze Estuary in Autumn in 2010].

    Science.gov (United States)

    Yan, Li-Hong; Chen, Xue-Jun; Su, Rong-Guo; Han, Xiu-Rong; Zhang, Chuan-Song; Shi, Xiao-Yong

    2013-01-01

    The distribution and estuarine behavior of fluorescent components of chromophoric dissolved organic matter in the seawater of the outer Yangtze Estuary were determined by fluorescence excitation-emission matrix spectra combined with parallel factor analysis. Six individual fluorescent components were identified by the PARAFAC model, including three terrestrial humic-like components C1 [330 nm/390(430) nm], C2 (390 nm/480 nm) and C3 (360 nm/440 nm), the marine biological production component C5 (300 nm/400 nm), and the protein-like components C4 (290 nm/350 nm) and C6 (275 nm/300 nm). The results indicated that C1, C2, and C3 showed conservative mixing behavior in the whole estuarine region, especially in the high-salinity region. The fluorescence intensity proportions of C1 and C3 decreased with increasing salinity, while that of C2 remained constant with increasing salinity throughout the estuarine region. C4 showed conservative mixing behavior in the low-salinity region and non-conservative mixing behavior in the high-salinity region, and its fluorescence intensity proportion increased with increasing salinity. C5 and C6 showed non-conservative mixing behavior, and their fluorescence intensity proportions increased with increasing salinity in the high-salinity region. A significant spatial difference was recorded for the CDOM absorption coefficient, with the highest values in the coastal region and the lowest in the open waters. The range of the absorption coefficient and absorption slope was larger in the coastal region than in the open waters. Significantly positive correlations were found between the CDOM absorption coefficient and the fluorescence intensities of C1, C2, C3, and C4, but not for C5 and C6, suggesting that river inputs contributed to the coastal areas, while CDOM in the open waters was affected by terrestrial inputs and phytoplankton degradation.
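
    The decomposition step itself is compact to express; a sketch using the tensorly library as a stand-in for the authors' PARAFAC modeling (array shapes and rank are illustrative):

        # Sketch: PARAFAC on an EEM tensor (samples x excitation x emission).
        import numpy as np
        import tensorly as tl
        from tensorly.decomposition import non_negative_parafac

        eems = np.random.rand(40, 60, 120)         # placeholder for measured EEMs
        weights, factors = non_negative_parafac(tl.tensor(eems), rank=6)
        scores, excitation, emission = factors     # one loading matrix per mode

        # scores[:, c] tracks component c across samples, while
        # excitation[:, c] and emission[:, c] are its spectral loadings.

    In practice, non-negativity constraints and validation of the chosen number of components (e.g., split-half analysis) are standard for EEM data, which is why the non-negative variant is used in this sketch.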

  17. [Characterizing chromophoric dissolved organic matter (CDOM) in Lake Honghu, Lake Donghu and Lake Liangzihu using excitation-emission matrices (EEMs) fluorescence and parallel factor analysis (PARAFAC)].

    Science.gov (United States)

    Zhou, Yong-Qiang; Zhang, Yun-Lin; Niu, Cheng; Wang, Ming-Zhu

    2013-12-01

    Little is known about DOM characteristics in medium to large sized lakes located in the middle and lower reaches of the Yangtze River, like Lake Honghu, Lake Donghu and Lake Liangzihu. Absorption, fluorescence and composition characteristics of chromophoric dissolved organic matter (CDOM) are presented using absorption spectroscopy, excitation-emission matrices (EEMs) fluorescence and the parallel factor analysis (PARAFAC) model, based on data collected in Sep.-Oct. 2007, including 15, 9 and 10 samplings in Lake Honghu, Lake Donghu and Lake Liangzihu, respectively. The CDOM absorption coefficient at 350 nm, a(350), in Lake Honghu was significantly higher than those in Lake Donghu and Lake Liangzihu (t-test, p ...). A strong relationship existed between the CDOM spectral slope in the wavelength range of 280-500 nm (S280-500) and a(350) (R² = 0.781, p < 0.001). The mean value of S280-500 in Lake Honghu was significantly lower than those in Lake Donghu (t-test, p ...

  18. Sharing of nonlinear load in parallel-connected three-phase converters

    DEFF Research Database (Denmark)

    Borup, Uffe; Blaabjerg, Frede; Enjeti, Prasad N.

    2001-01-01

    In this paper, a new control method is presented which enables equal sharing of linear and nonlinear loads in three-phase power converters connected in parallel, without communication between the converters. The paper focuses on solving the problem that arises when two converters with harmonic compensation are connected in parallel. Without the new solution, they are normally not able to distinguish the harmonic currents that flow to the load from the harmonic currents that circulate between the converters. Analysis and experimental results on two 90-kVA 400-Hz converters in parallel are presented. The results show that both linear and nonlinear loads can be shared equally by the proposed concept.

  19. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society

  20. A parallel implementation of 3D Zernike moment analysis

    OpenAIRE

    Berjón Díez, Daniel; Arnaldo Duart, Sergio; Morán Burgos, Francisco

    2011-01-01

    Zernike polynomials are a well known set of functions that find many applications in image or pattern characterization because they allow to construct shape descriptors that are invariant against translations, rotations or scale changes. The concepts behind them can be extended to higher dimension spaces, making them also fit to describe volumetric data. They have been less used than their properties might suggest due to their high computational cost. We present a parallel implementation of 3...

  1. Parallel implementation of DNA sequences matching algorithms using PWM on GPU architecture.

    Science.gov (United States)

    Sharma, Rahul; Gupta, Nitin; Narang, Vipin; Mittal, Ankush

    2011-01-01

    Positional Weight Matrices (PWMs) are widely used in the representation and detection of Transcription Factor Binding Sites (TFBSs) on DNA. We implement an online PWM search algorithm over a parallel architecture. Large PWM data can be processed on Graphics Processing Unit (GPU) systems in parallel, which can help in matching sequences at a faster rate. Our method makes extensive use of the highly multithreaded architecture and shared memory of multi-core GPUs. An efficient use of shared memory is required to optimize parallel reduction in CUDA. Our optimized method achieves a speedup of 230-280x over a linear implementation, on a GeForce GTX 280 GPU.
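
    A CPU reference for the scan that the GPU version parallelizes helps fix ideas: every window's score is a sum of position-specific weights, and windows are mutually independent (a sketch, not the authors' CUDA code):

        # Sketch: naive PWM scan; the GPU assigns one thread per window.
        import numpy as np

        BASE = {"A": 0, "C": 1, "G": 2, "T": 3}

        def pwm_scan(seq, pwm, threshold):
            """pwm: L x 4 weight matrix; report windows scoring above threshold."""
            idx = np.array([BASE[b] for b in seq])
            L = pwm.shape[0]
            hits = []
            for start in range(len(idx) - L + 1):   # independent windows
                score = pwm[np.arange(L), idx[start:start + L]].sum()
                if score > threshold:
                    hits.append((start, score))
            return hits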

  2. Performance and scalability analysis of teraflop-scale parallel architectures using multidimensional wavefront applications

    International Nuclear Information System (INIS)

    Hoisie, A.; Lubeck, O.; Wasserman, H.

    1998-01-01

    The authors develop a model for the parallel performance of algorithms that consist of concurrent, two-dimensional wavefronts implemented in a message passing environment. The model, based on a LogGP machine parameterization, combines the separate contributions of computation and communication wavefronts. They validate the model on three important supercomputer systems, on up to 500 processors. They use data from a deterministic particle transport application taken from the ASCI workload, although the model is general to any wavefront algorithm implemented on a 2-D processor domain. They also use the validated model to make estimates of performance and scalability of wavefront algorithms on 100-TFLOPS computer systems expected to be in existence within the next decade as part of the ASCI program and elsewhere. In this context, they analyze two problem sizes. The model shows that on the largest such problem (1 billion cells), inter-processor communication performance is not the bottleneck. Single-node efficiency is the dominant factor

  3. Gravitational Waves: Search Results, Data Analysis and Parameter Estimation. Amaldi 10 Parallel Session C2

    Science.gov (United States)

    Astone, Pia; Weinstein, Alan; Agathos, Michalis; Bejger, Michal; Christensen, Nelson; Dent, Thomas; Graff, Philip; Klimenko, Sergey; Mazzolo, Giulio; Nishizawa, Atsushi

    2015-01-01

    The Amaldi 10 Parallel Session C2 on gravitational wave (GW) search results, data analysis and parameter estimation included three lively sessions of lectures by 13 presenters, and 34 posters. The talks and posters covered a huge range of material, including results and analysis techniques for ground-based GW detectors, targeting anticipated signals from different astrophysical sources: compact binary inspiral, merger and ringdown; GW bursts from intermediate mass binary black hole mergers, cosmic string cusps, core-collapse supernovae, and other unmodeled sources; continuous waves from spinning neutron stars; and a stochastic GW background. There was considerable emphasis on Bayesian techniques for estimating the parameters of coalescing compact binary systems from the gravitational waveforms extracted from the data from the advanced detector network. This included methods to distinguish deviations of the signals from what is expected in the context of General Relativity.

  4. Gravitational waves: search results, data analysis and parameter estimation: Amaldi 10 Parallel session C2.

    Science.gov (United States)

    Astone, Pia; Weinstein, Alan; Agathos, Michalis; Bejger, Michał; Christensen, Nelson; Dent, Thomas; Graff, Philip; Klimenko, Sergey; Mazzolo, Giulio; Nishizawa, Atsushi; Robinet, Florent; Schmidt, Patricia; Smith, Rory; Veitch, John; Wade, Madeline; Aoudia, Sofiane; Bose, Sukanta; Calderon Bustillo, Juan; Canizares, Priscilla; Capano, Colin; Clark, James; Colla, Alberto; Cuoco, Elena; Da Silva Costa, Carlos; Dal Canton, Tito; Evangelista, Edgar; Goetz, Evan; Gupta, Anuradha; Hannam, Mark; Keitel, David; Lackey, Benjamin; Logue, Joshua; Mohapatra, Satyanarayan; Piergiovanni, Francesco; Privitera, Stephen; Prix, Reinhard; Pürrer, Michael; Re, Virginia; Serafinelli, Roberto; Wade, Leslie; Wen, Linqing; Wette, Karl; Whelan, John; Palomba, C; Prodi, G

    The Amaldi 10 Parallel Session C2 on gravitational wave (GW) search results, data analysis and parameter estimation included three lively sessions of lectures by 13 presenters, and 34 posters. The talks and posters covered a huge range of material, including results and analysis techniques for ground-based GW detectors, targeting anticipated signals from different astrophysical sources: compact binary inspiral, merger and ringdown; GW bursts from intermediate mass binary black hole mergers, cosmic string cusps, core-collapse supernovae, and other unmodeled sources; continuous waves from spinning neutron stars; and a stochastic GW background. There was considerable emphasis on Bayesian techniques for estimating the parameters of coalescing compact binary systems from the gravitational waveforms extracted from the data from the advanced detector network. This included methods to distinguish deviations of the signals from what is expected in the context of General Relativity.

  5. A proposal simulated annealing algorithm for proportional parallel flow shops with separated setup times

    Directory of Open Access Journals (Sweden)

    Helio Yochihiro Fuchigami

    2014-08-01

    Full Text Available This article addresses the problem of minimizing makespan on two parallel flow shops with proportional processing and setup times. The setup times are separated and sequence-independent. The parallel flow shop scheduling problem is a specific case of the well-known hybrid flow shop, characterized by a multistage production system with more than one machine working in parallel at each stage. This situation is very common in many kinds of companies, such as the chemical, electronics, automotive, pharmaceutical and food industries. This work proposes six simulated annealing algorithms, their perturbation schemes, and an algorithm for generating the initial sequence. The study can be classified as “applied research” in nature, “exploratory” in its objectives, “experimental” in its procedures, and “quantitative” in its approach. The proposed algorithms were effective with respect to solution quality and computationally efficient. An analysis of variance (ANOVA) revealed no significant difference between the perturbation schemes in terms of makespan. The PS4 scheme, which moves a subsequence of jobs, is recommended because it provides the best success rate. A significant difference was also found between the results of the algorithms for each value of the proportionality factor relating the processing and setup times of the flow shops.
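
    As a concrete illustration of the search procedure, below is a minimal simulated-annealing sketch with a subsequence-move perturbation in the spirit of the PS4 scheme described above. The makespan evaluation is deliberately simplified (each shop is treated as a single machine with sequence-independent setups), so the instance data, cooling schedule and cost model are illustrative assumptions rather than the article's actual formulation.

```python
# A minimal simulated-annealing sketch for a two-parallel-shop makespan
# problem with separated, sequence-independent setup times (simplified).
import math, random

def makespan(seq, proc, setup, n_shops=2):
    # Greedily assign each job in sequence order to the earliest-free shop.
    ready = [0.0] * n_shops
    for j in seq:
        k = ready.index(min(ready))
        ready[k] += setup[j] + proc[j]
    return max(ready)

def perturb(seq):
    # Subsequence move: cut a short block and reinsert it elsewhere.
    s = seq[:]
    i = random.randrange(len(s)); n = random.randint(1, 3)
    block = s[i:i + n]; del s[i:i + n]
    j = random.randrange(len(s) + 1)
    return s[:j] + block + s[j:]

def anneal(proc, setup, T0=10.0, alpha=0.95, iters=2000):
    cur = list(range(len(proc))); random.shuffle(cur)
    best, T = cur[:], T0
    for _ in range(iters):
        cand = perturb(cur)
        d = makespan(cand, proc, setup) - makespan(cur, proc, setup)
        if d < 0 or random.random() < math.exp(-d / T):
            cur = cand
            if makespan(cur, proc, setup) < makespan(best, proc, setup):
                best = cur[:]
        T *= alpha  # geometric cooling
    return best, makespan(best, proc, setup)

random.seed(1)
proc = [random.uniform(1, 10) for _ in range(12)]   # assumed job times
setup = [random.uniform(0.5, 2) for _ in range(12)] # assumed setups
print(anneal(proc, setup))
```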

  6. An Approach to Evaluate Stability for Cable-Based Parallel Camera Robots with Hybrid Tension-Stiffness Properties

    Directory of Open Access Journals (Sweden)

    Huiling Wei

    2015-12-01

    Full Text Available This paper focuses on studying the effect of cable tensions and stiffness on the stability of cable-based parallel camera robots. For this purpose, the tension factor and the stiffness factor are defined, and the expression of stability is deduced. A new approach is proposed to calculate the hybrid-stability index with the minimum cable tension and the minimum singular value. Firstly, the kinematic model of a cable-based parallel camera robot is established. Based on the model, the tensions are solved and a tension factor is defined. In order to obtain the tension factor, an optimization of the cable tensions is carried out. Then, an expression of the system's stiffness is deduced and a stiffness factor is defined. Furthermore, an approach to evaluate the stability of the cable-based camera robots with hybrid tension-stiffness properties is presented. Finally, a typical three-degree-of-freedom cable-based parallel camera robot with four cables is studied as a numerical example. The simulation results show that the approach is both reasonable and effective.

  7. Frontiers of massively parallel scientific computation

    International Nuclear Information System (INIS)

    Fischer, J.R.

    1987-07-01

    Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented

  8. A parallel sweeping preconditioner for frequency-domain seismic wave propagation

    KAUST Repository

    Poulson, Jack

    2012-09-01

    We present a parallel implementation of Engquist and Ying's sweeping preconditioner, which exploits radiation boundary conditions in order to form an approximate block LDL^T factorization of the Helmholtz operator with only O(N^{4/3}) work and an application (and memory) cost of only O(N log N). The approximate factorization is then used as a preconditioner for GMRES, and we show that essentially O(1) iterations are required for convergence, even for the full SEG/EAGE over-thrust model at 30 Hz. In particular, we demonstrate the solution of said problem in a mere 15 minutes on 8192 cores of TACC's Lonestar, which may be the largest-scale 3D heterogeneous Helmholtz calculation to date. Generalizations of our parallel strategy are also briefly discussed for time-harmonic linear elasticity and Maxwell's equations.

  9. MapReduce Based Parallel Bayesian Network for Manufacturing Quality Control

    Science.gov (United States)

    Zheng, Mao-Kuan; Ming, Xin-Guo; Zhang, Xian-Yu; Li, Guo-Ming

    2017-09-01

    The increasing complexity of industrial products and manufacturing processes has challenged conventional statistics-based quality management approaches under dynamic production conditions. A Bayesian network and big data analytics integrated approach for manufacturing process quality analysis and control is proposed. Based on the Hadoop distributed architecture and the MapReduce parallel computing model, the large volume and variety of quality-related data generated during the manufacturing process can be handled. Artificial intelligence algorithms, including Bayesian network learning, classification and reasoning, are embedded into the Reduce step. Relying on the ability of the Bayesian network to deal with dynamic and uncertain problems, and on the parallel computing power of MapReduce, a Bayesian network of the factors affecting quality is built from a prior probability distribution and modified with the posterior probability distribution. A case study on hull segment manufacturing precision management for ship and offshore platform building shows that computing speed accelerates almost in direct proportion to the number of computing nodes. The proposed model is also shown to be feasible for locating and reasoning about root causes, forecasting manufacturing outcomes, and supporting intelligent decisions for precision problem solving. The integration of big data analytics and the BN method offers a whole new perspective on manufacturing quality control.
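
    To illustrate how Bayesian-network parameter learning maps onto the map/reduce pattern, here is a minimal plain-Python sketch (not Hadoop, and not the paper's implementation) of the counting step behind conditional probability table estimation: mappers emit (parent configuration, node value) pairs and a reducer aggregates them. The record field names are invented for the example.

```python
# A minimal map/reduce-style sketch of conditional frequency counting
# for one Bayesian-network node; field names are assumptions.
from collections import Counter
from functools import reduce

records = [
    {"machine": "A", "operator": "x", "quality": "ok"},
    {"machine": "A", "operator": "y", "quality": "defect"},
    {"machine": "B", "operator": "x", "quality": "ok"},
    {"machine": "A", "operator": "y", "quality": "defect"},
]

def mapper(rec):
    # Emit one key-value pair: parents of 'quality' -> observed value.
    parents = (rec["machine"], rec["operator"])
    return Counter({(parents, rec["quality"]): 1})

def reducer(c1, c2):
    # Counters add elementwise, so partial counts merge associatively.
    return c1 + c2

counts = reduce(reducer, map(mapper, records))

# Normalize into conditional probabilities P(quality | machine, operator).
totals = Counter()
for (parents, _), n in counts.items():
    totals[parents] += n
cpt = {k: n / totals[k[0]] for k, n in counts.items()}
print(cpt)
```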

  10. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration/preconditioning, is competitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution-time trend generally rises more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.

  11. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms cluster multidimensional data by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than was possible before.
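
    For reference, below is a minimal sequential sketch of the k-means++ seeding step that the record above parallelizes. The per-point D^2 distance update, which dominates the cost and is the natural target for GPU/multicore parallelism since each point is independent, is merely vectorized with NumPy here; the data and seed count are illustrative.

```python
# A minimal sketch of k-means++ seed selection (Arthur & Vassilvitskii).
import numpy as np

def kmeanspp_seeds(X, k, rng=np.random.default_rng(0)):
    n = len(X)
    seeds = [X[rng.integers(n)]]            # first seed uniformly at random
    d2 = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance of every point to its nearest chosen seed so far
        # (this embarrassingly parallel update is what gets distributed).
        d2 = np.minimum(d2, ((X - seeds[-1]) ** 2).sum(axis=1))
        # Sample the next seed with probability proportional to D^2.
        probs = d2 / d2.sum()
        seeds.append(X[rng.choice(n, p=probs)])
    return np.array(seeds)

X = np.random.default_rng(1).normal(size=(500, 2))
print(kmeanspp_seeds(X, k=4))
```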

  12. Exploiting parallel R in the cloud with SPRINT.

    Science.gov (United States)

    Piotrowski, M; McGilvary, G A; Sloan, T M; Mewissen, M; Lloyd, A D; Forster, T; Mitchell, L; Ghazal, P; Hill, J

    2013-01-01

    Advances in DNA Microarray devices and next-generation massively parallel DNA sequencing platforms have led to an exponential growth in data availability, but the arising opportunities require adequate computing resources. High Performance Computing (HPC) in the Cloud offers an affordable way of meeting this need. Bioconductor, a popular tool for high-throughput genomic data analysis, is distributed as add-on modules for the R statistical programming language, but R has no native capabilities for exploiting multi-processor architectures. SPRINT is an R package that enables easy access to HPC for genomics researchers. This paper investigates: setting up and running SPRINT-enabled genomic analyses on Amazon's Elastic Compute Cloud (EC2), the advantages of submitting applications to EC2 from different parts of the world, and whether resource underutilization can improve application performance. The SPRINT parallel implementations of correlation, permutation testing, partitioning around medoids and the multi-purpose papply have been benchmarked on data sets of various sizes on Amazon EC2. Jobs have been submitted from both the UK and Thailand to investigate monetary differences. It is possible to obtain good, scalable performance, but the level of improvement is dependent upon the nature of the algorithm. Resource underutilization can further improve the time to result. The end user's location affects costs due to factors such as local taxation. Although not designed to satisfy HPC requirements, Amazon EC2 and cloud computing in general provide an interesting alternative and new possibilities for smaller organisations with limited funds.

  13. Parallel experimental design and multivariate analysis provides efficient screening of cell culture media supplements to improve biosimilar product quality.

    Science.gov (United States)

    Brühlmann, David; Sokolov, Michael; Butté, Alessandro; Sauer, Markus; Hemberger, Jürgen; Souquet, Jonathan; Broly, Hervé; Jordan, Martin

    2017-07-01

    Rational and high-throughput optimization of mammalian cell culture media has a great potential to modulate recombinant protein product quality. We present a process design method based on parallel design-of-experiment (DoE) of CHO fed-batch cultures in 96-deepwell plates to modulate monoclonal antibody (mAb) glycosylation using medium supplements. To reduce the risk of losing valuable information in an intricate joint screening, 17 compounds were separated into five different groups, considering their mode of biological action. The concentration ranges of the medium supplements were defined according to information encountered in the literature and in-house experience. The screening experiments produced wide glycosylation pattern ranges. Multivariate analysis including principal component analysis and decision trees was used to select the best performing glycosylation modulators. Subsequent D-optimal quadratic design with four factors (three promising compounds and temperature shift) in shake tubes confirmed the outcome of the selection process and provided a solid basis for sequential process development at a larger scale. The glycosylation profile with respect to the specifications for biosimilarity was greatly improved in shake tube experiments: 75% of the conditions were equally close or closer to the specifications for biosimilarity than the best 25% in 96-deepwell plates. Biotechnol. Bioeng. 2017;114: 1448-1458. © 2017 Wiley Periodicals, Inc.

  14. Execution Model of Three Parallel Languages: OpenMP, UPC and CAF

    Directory of Open Access Journals (Sweden)

    Ami Marowka

    2005-01-01

    Full Text Available The aim of this paper is to present a qualitative evaluation of three state-of-the-art parallel languages: OpenMP, Unified Parallel C (UPC) and Co-Array Fortran (CAF). OpenMP and UPC are explicit parallel programming languages based on the ANSI standard. CAF is an implicit programming language. OpenMP is designed for shared-memory architectures and extends the base language with compiler directives that annotate the original source code. UPC and CAF, by contrast, are designed for distributed shared-memory architectures and extend the base language with new parallel constructs. We deconstruct each language into its basic components, show examples, make a detailed analysis, compare them, and finally draw some conclusions.

  15. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  16. A global database with parallel measurements to study non-climatic changes

    Science.gov (United States)

    Venema, Victor; Auchmann, Renate; Aguilar, Enric; Auer, Ingeborg; Azorin-Molina, Cesar; Brandsma, Theo; Brunetti, Michele; Dienst, Manuel; Domonkos, Peter; Gilabert, Alba; Lindén, Jenny; Milewska, Ewa; Nordli, Øyvind; Prohom, Marc; Rennie, Jared; Stepanek, Petr; Trewin, Blair; Vincent, Lucie; Willett, Kate; Wolff, Mareile

    2016-04-01

    In this work we introduce the rationale behind the ongoing compilation of a parallel measurements database, in the framework of the International Surface Temperatures Initiative (ISTI) and with the support of the World Meteorological Organization. We intend this database to become instrumental for a better understanding of inhomogeneities affecting the evaluation of long-term changes in daily climate data. Long instrumental climate records are usually affected by non-climatic changes, due to, e.g., (i) station relocations, (ii) instrument height changes, (iii) instrumentation changes, (iv) observing environment changes, (v) different sampling intervals or data collection procedures, among others. These so-called inhomogeneities distort the climate signal and can hamper the assessment of long-term trends and variability of climate. Thus to study climatic changes we need to accurately distinguish non-climatic and climatic signals. The most direct way to study the influence of non-climatic changes on the distribution and to understand the reasons for these biases is the analysis of parallel measurements representing the old and new situation (in terms of e.g. instruments, location, different radiation shields, etc.). According to the limited number of available studies and our understanding of the causes of inhomogeneity, we expect that they will have a strong impact on the tails of the distribution of air temperatures and most likely of other climate elements. Our abilities to statistically homogenize daily data will be increased by systematically studying different causes of inhomogeneity replicated through parallel measurements. Current studies of non-climatic changes using parallel data are limited to local and regional case studies. However, the effect of specific transitions depends on the local climate and the most interesting climatic questions are about the systematic large-scale biases produced by transitions that occurred in many regions. Important

  17. Non-Cartesian parallel imaging reconstruction.

    Science.gov (United States)

    Wright, Katherine L; Hamilton, Jesse I; Griswold, Mark A; Gulani, Vikas; Seiberlich, Nicole

    2014-11-01

    Non-Cartesian parallel imaging has played an important role in reducing data acquisition time in MRI. The use of non-Cartesian trajectories can enable more efficient coverage of k-space, which can be leveraged to reduce scan times. These trajectories can be undersampled to achieve even faster scan times, but the resulting images may contain aliasing artifacts. Just as Cartesian parallel imaging can be used to reconstruct images from undersampled Cartesian data, non-Cartesian parallel imaging methods can mitigate aliasing artifacts by using additional spatial encoding information in the form of the nonhomogeneous sensitivities of multi-coil phased arrays. This review will begin with an overview of non-Cartesian k-space trajectories and their sampling properties, followed by an in-depth discussion of several selected non-Cartesian parallel imaging algorithms. Three representative non-Cartesian parallel imaging methods will be described, including Conjugate Gradient SENSE (CG SENSE), non-Cartesian generalized autocalibrating partially parallel acquisition (GRAPPA), and Iterative Self-Consistent Parallel Imaging Reconstruction (SPIRiT). After a discussion of these three techniques, several potential promising clinical applications of non-Cartesian parallel imaging will be covered. © 2014 Wiley Periodicals, Inc.

  18. Influence of Paralleling Dies and Paralleling Half-Bridges on Transient Current Distribution in Multichip Power Modules

    DEFF Research Database (Denmark)

    Li, Helong; Zhou, Wei; Wang, Xiongfei

    2018-01-01

    This paper addresses the transient current distribution in the multichip half-bridge power modules, where two types of paralleling connections with different current commutation mechanisms are considered: paralleling dies and paralleling half-bridges. It reveals that with paralleling dies, both t...

  19. Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

    Science.gov (United States)

    Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan

    2017-09-01

    Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems depends mainly on the selection of the most relevant features, which becomes harder when the dataset contains missing values for some features. Probabilistic Principal Component Analysis (PPCA) has a reputation for dealing with missing attribute values. This research presents a methodology which takes the results of medical tests as input, extracts a reduced-dimensional feature subset, and provides a diagnosis of heart disease. The proposed methodology extracts high-impact features in a new projection using PPCA, which yields the projection vectors that capture the highest covariance; these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF-based SVM performs classification into two categories: Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over three UCI datasets: Cleveland, Switzerland and Hungarian. The statistical results achieved with the proposed technique are presented in comparison with existing research, showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets, respectively.
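
    Since the number of retained projection vectors is chosen by Parallel Analysis, a minimal sketch of that selection rule may help: in Horn's procedure, one keeps the components whose observed eigenvalues exceed a percentile of the eigenvalues obtained from random data of the same shape. The simulation count, percentile and synthetic data below are illustrative assumptions, not the paper's settings.

```python
# A minimal sketch of (Horn's) parallel analysis for choosing the
# number of principal components to retain.
import numpy as np

def parallel_analysis(X, n_sims=100, quantile=95, rng=np.random.default_rng(0)):
    n, p = X.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sim_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        R = rng.normal(size=(n, p))        # random data of the same shape
        sim_eigs[s] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
    threshold = np.percentile(sim_eigs, quantile, axis=0)
    # Keep components whose observed eigenvalue beats the random threshold.
    return int(np.sum(obs_eig > threshold))

X = np.random.default_rng(1).normal(size=(300, 13))
X[:, 1] = X[:, 0] + 0.3 * X[:, 1]          # inject one correlated pair
print(parallel_analysis(X))
```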

  20. Advances in randomized parallel computing

    CERN Document Server

    Rajasekaran, Sanguthevar

    1999-01-01

    The technique of randomization has been employed to solve numerous problems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms: in the analysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. For instance, the worst case run time of Hoare's quicksort is O(n^2), whereas its average case run time is only O(n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...

  1. Quantitative and Selective Analysis of Feline Growth Related Proteins Using Parallel Reaction Monitoring High Resolution Mass Spectrometry.

    Directory of Open Access Journals (Sweden)

    Mårten Sundberg

    Full Text Available Today immunoassays are widely used in veterinary medicine, but the lack of species-specific assays often necessitates the use of assays developed for human applications. Mass spectrometry (MS) is an attractive alternative due to its high specificity and versatility, allowing for species-independent analysis. Targeted MS-based quantification methods are valuable complements to large-scale shotgun analysis. A method referred to as parallel reaction monitoring (PRM), implemented on Orbitrap MS, has lately been presented as an excellent alternative to more traditional selected reaction monitoring/multiple reaction monitoring (SRM/MRM) methods. The insulin-like growth factor (IGF) system is not well described in the cat, but there are indications of important differences between cats and humans. In feline medicine IGF-I is mainly analyzed for diagnosis of growth hormone disorders but also for research, while the other proteins in the IGF system are not routinely analyzed within clinical practice. Here, a PRM method for quantification of IGF-I, IGF-II, IGF binding protein (IGFBP)-3 and IGFBP-5 in feline serum is presented. Selective quantification was supported by the use of a newly launched internal standard named QPrEST™. Homology searches demonstrated the possibility of using this standard of human origin for quantification of the targeted feline proteins. Excellent quantitative sensitivity at the attomol/μL (pM) level and selectivity were obtained. As the presented approach is very generic, we show that high resolution mass spectrometry in combination with PRM and QPrEST™ internal standards is a versatile tool for protein quantitation across multiple species.

  2. A hybrid parallel framework for the cellular Potts model simulations

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving, and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).

  3. Pelvic Insufficiency Fracture After Pelvic Radiotherapy for Cervical Cancer: Analysis of Risk Factors

    International Nuclear Information System (INIS)

    Oh, Dongryul; Huh, Seung Jae; Nam, Heerim; Park, Won; Han, Youngyih; Lim, Do Hoon; Ahn, Yong Chan; Lee, Jeong Won; Kim, Byoung Gie; Bae, Duk Soo; Lee, Je Ho

    2008-01-01

    Purpose: To investigate the incidence, clinical characteristics, and risk factors of pelvic insufficiency fracture (PIF) after pelvic radiotherapy (RT) in cervical cancer. Methods and Materials: Medical records and imaging studies, including bone scintigraphy, CT, and MRI of 557 patients with cervical cancer who received whole-pelvic RT between January 1998 and August 2005 were reviewed. Results: Eighty-three patients were diagnosed as having PIF after pelvic RT. The 5-year cumulative incidence of PIF was 19.7%. The most commonly involved site was the sacroiliac joint. Pelvic pain developed in 48 patients (57.8%) at diagnosis. Eleven patients (13.3%) needed admission or narcotics because of severe pain, and others had good relief of symptoms with conservative management. In univariate analysis, age ≥55 years (p < 0.001), anteroposterior/posteroanterior parallel opposing technique (p = 0.001), curative treatment (p < 0.001), and radiation dose ≥50.4 Gy (p = 0.005) were the predisposing factors for development of PIF. Concurrent chemotherapy (p = 0.78) was not significant. Multivariate analysis showed that age ≥55 years (p < 0.001), body weight <55 kg (p = 0.02), curative treatment (p = 0.03), and radiation dose ≥50.4 Gy (p = 0.04) were significant predisposing factors for development of PIF. Conclusion: The development of PIF is not rare after pelvic RT. The use of multibeam arrangements to reduce the volume and dose of irradiated pelvic bone can be helpful to minimize the risk of fracture, especially in elderly women with low body weight

  4. Analysis of clinical complication data for radiation hepatitis using a parallel architecture model

    International Nuclear Information System (INIS)

    Jackson, A.; Haken, R.K. ten; Robertson, J.M.; Kessler, M.L.; Kutcher, G.J.; Lawrence, T.S.

    1995-01-01

    Purpose: The detailed knowledge of dose volume distributions available from the three-dimensional (3D) conformal radiation treatment of tumors in the liver (reported elsewhere) offers new opportunities to quantify the effect of volume on the probability of producing radiation hepatitis. We aim to test a new parallel architecture model of normal tissue complication probability (NTCP) with these data. Methods and Materials: Complication data and dose volume histograms from a total of 93 patients with normal liver function, treated on a prospective protocol with 3D conformal radiation therapy and intraarterial hepatic fluorodeoxyuridine, were analyzed with a new parallel architecture model. Patient treatment fell into six categories differing in doses delivered and volumes irradiated. By modeling the radiosensitivity of liver subunits, we are able to use dose volume histograms to calculate the fraction of the liver damaged in each patient. A complication results if this fraction exceeds the patient's functional reserve. To determine the patient distribution of functional reserves and the subunit radiosensitivity, the maximum likelihood method was used to fit the observed complication data. Results: The parallel model fit the complication data well, although uncertainties on the functional reserve distribution and subunit radiosensitivity are highly correlated. Conclusion: The observed radiation hepatitis complications show a threshold effect that can be described well with a parallel architecture model. However, additional independent studies are required to better determine the parameters defining the functional reserve distribution and subunit radiosensitivity.
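
    The logic of the parallel architecture model can be made concrete with a short sketch: integrate subunit damage over a dose-volume histogram and compare the damaged fraction with a distribution of functional reserves. The dose-response form and all parameter values (d50, k, reserve mean and spread) below are illustrative assumptions, not the fitted values from this study.

```python
# A minimal sketch of a parallel-architecture NTCP calculation.
import math

def subunit_damage(dose, d50=35.0, k=3.0):
    """Assumed logistic probability that a liver subunit is destroyed
    at the given dose (Gy)."""
    return 1.0 / (1.0 + (d50 / dose) ** k) if dose > 0 else 0.0

def damaged_fraction(dvh):
    """dvh: list of (volume_fraction, dose_Gy) bins; volumes sum to 1."""
    return sum(v * subunit_damage(d) for v, d in dvh)

def ntcp(dvh, reserve_mean=0.5, reserve_sd=0.1):
    """Complication occurs iff the damaged fraction exceeds the patient's
    functional reserve; for a normal reserve distribution the NTCP is a
    Gaussian CDF evaluated at the damaged fraction."""
    f = damaged_fraction(dvh)
    z = (f - reserve_mean) / reserve_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

dvh = [(0.4, 0.0), (0.3, 30.0), (0.3, 55.0)]   # 60% of liver irradiated
print(damaged_fraction(dvh), ntcp(dvh))
```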

  5. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages from existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques are presented and discussed.
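
    As a hedged illustration of the general pattern described above (not GAUDI's actual API), the sketch below shows how a driver script can use a generic Python module to fan event ranges out to worker processes while the per-chunk processing code stays unchanged; `process_events` and the merge step are hypothetical stand-ins, and the final check mirrors the requirement that the parallel run reproduce the serial result.

```python
# A minimal sketch of script-level parallelization with a generic
# Python module (multiprocessing); names are hypothetical stand-ins.
from multiprocessing import Pool

def process_events(chunk):
    # Stand-in for configuring and running the serial application on a
    # sub-range of events; returns a partial result to be merged.
    first, last = chunk
    return sum(e * e for e in range(first, last))   # dummy "histogram"

def split(n_events, n_workers):
    step = n_events // n_workers
    return [(i * step, (i + 1) * step) for i in range(n_workers)]

if __name__ == "__main__":
    with Pool(4) as pool:
        partials = pool.map(process_events, split(1_000_000, 4))
    # Merging partial results must reproduce the serial answer.
    print(sum(partials) == sum(e * e for e in range(1_000_000)))
```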

  6. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages from existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques are presented and discussed.

  7. Parallel eigenanalysis of finite element models in a completely connected architecture

    Science.gov (United States)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, KΦ = MΦΩ, where K and M are of order N, and Ω is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in parallel environment is analyzed.
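
    To make the subspace idea concrete, here is a minimal sequential sketch of subspace iteration for KΦ = MΦΩ using NumPy/SciPy. The multifrontal factorization, domain decomposition and macrotasking described above are omitted, and the matrices are synthetic stand-ins.

```python
# A minimal sketch of subspace iteration for the generalized
# eigenproblem K*Phi = M*Phi*Omega (sequential; matrices are synthetic).
import numpy as np
from scipy.linalg import lu_factor, lu_solve, eigh

def subspace_iteration(K, M, q, iters=50, rng=np.random.default_rng(0)):
    n = K.shape[0]
    lu = lu_factor(K)                     # one factorization, reused each pass
    Phi = rng.normal(size=(n, q))
    for _ in range(iters):
        Phi = lu_solve(lu, M @ Phi)       # inverse-iteration step
        # Rayleigh-Ritz on the reduced q-by-q problem.
        Kr, Mr = Phi.T @ K @ Phi, Phi.T @ M @ Phi
        w, Z = eigh(Kr, Mr)
        Phi = Phi @ Z                     # M-orthonormalized subspace basis
    return w, Phi                         # eigenvalues (Omega) and modes

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 50))
K = A @ A.T + 50 * np.eye(50)             # SPD stiffness-like matrix
M = np.diag(rng.uniform(1, 2, size=50))   # SPD mass-like matrix
print(subspace_iteration(K, M, q=3)[0])   # smallest three eigenvalues
```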

  8. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  9. Pattern-Driven Automatic Parallelization

    Directory of Open Access Journals (Sweden)

    Christoph W. Kessler

    1996-01-01

    Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

  10. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    Full Text Available In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, and especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor, achieved using parallel programming techniques. Reducing the execution time is the main objective of parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented with parallel processing on multi-core computers using OpenMP, and on a graphics processor using CUDA, to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that by using parallel processing, execution time is substantially improved. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.

  11. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-29

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer, the parallel computer including a plurality of compute nodes that execute a parallel application, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources, including receiving in an origin endpoint of the PAMI a data communications instruction, the instruction characterized by an instruction type, the instruction specifying a transmission of transfer data from the origin endpoint to a target endpoint and transmitting, in accordance with the instruction type, the transfer data from the origin endpoint to the target endpoint.

  12. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  13. Seasonal characterization of CDOM for lakes in semi-arid regions of Northeast China using excitation-emission matrices fluorescence and parallel factor analysis (EEM-PARAFAC)

    Science.gov (United States)

    Zhao, Y.; Song, K.; Wen, Z.; Li, L.; Zang, S.; Shao, T.; Li, S.; Du, J.

    2015-04-01

    The seasonal characteristics of fluorescence components in CDOM for lakes in the semi-arid region of Northeast China were examined by excitation-emission matrix fluorescence and parallel factor analysis (EEM-PARAFAC). Two humic-like peaks, C1 (Ex/Em = 230, 300/425 nm) and C2 (Ex/Em = 255, 350/460 nm), and two protein-like peaks, B (Ex/Em = 220, 275/320 nm) and T (Ex/Em = 225, 290/360 nm), were identified using PARAFAC. The average fluorescence intensity of the four components varied seasonally between June and August 2013 and February and April 2014. The total fluorescence intensity decreased significantly from 2.54 ± 0.68 nm-1 in June to a mean of 1.93 ± 0.70 nm-1 in August 2013, then increased to 2.34 ± 0.92 nm-1 in February and fell to the lowest value of 1.57 ± 0.55 nm-1 in April 2014. In general, the fluorescence intensity was dominated by peak C1, indicating that most of the CDOM in the inland waters investigated in this study originated from phytoplankton degradation. The low C2 intensity indicates that only a small portion of the CDOM was terrestrially derived organic matter imported to the water bodies through rainwash and soil leaching. The two protein-like components (B and T), formed in situ through microbial activity, had almost the same intensity. In August 2013 and February 2014 in particular, the two protein-like peaks differed noticeably from the other seasons, and the highest C1 (1.02 nm-1) occurred in February 2014. Components 1 and 2 exhibited a strong linear correlation (R2 = 0.633). Significantly positive linear relationships were found between the CDOM absorption coefficient a(254) (R2 = 0.72, 0.46, p …) and DOC. However, almost no obvious correlation was found between salinity and the EEM-PARAFAC extracted components except for C3 (R2 = 0.469). Results from this investigation demonstrate that the EEM-PARAFAC technique can be used to evaluate the seasonal dynamics of CDOM fluorescence components for inland waters in semi-arid regions of Northeast China.
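
    For readers unfamiliar with the decomposition itself, the sketch below runs a PARAFAC model on a synthetic sample-by-excitation-by-emission tensor using the open-source tensorly library. It is a toy stand-in for real EEM work, which also requires scatter removal, non-negativity constraints (tensorly's non_negative_parafac) and core-consistency diagnostics; the tensor dimensions and component shapes are illustrative assumptions.

```python
# A minimal sketch of a PARAFAC decomposition of a synthetic EEM-like
# tensor with tensorly; real EEM preprocessing is omitted.
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
n_samples, n_ex, n_em, n_comp = 30, 40, 60, 2

# Build a trilinear tensor: sample scores x excitation x emission loadings.
scores = rng.uniform(0, 1, size=(n_samples, n_comp))
ex = np.exp(-0.5 * ((np.arange(n_ex)[:, None] - [10, 25]) / 4.0) ** 2)
em = np.exp(-0.5 * ((np.arange(n_em)[:, None] - [20, 45]) / 6.0) ** 2)
X = (np.einsum("ir,jr,kr->ijk", scores, ex, em)
     + 0.01 * rng.normal(size=(n_samples, n_ex, n_em)))

# Fit a rank-2 CP (PARAFAC) model and recover the three loading matrices.
weights, factors = parafac(tl.tensor(X), rank=n_comp)
print([f.shape for f in factors])   # (30, 2), (40, 2), (60, 2)
```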

  14. The effect of plasma fluctuations on parallel transport parameters in the SOL

    DEFF Research Database (Denmark)

    Havlíčková, E.; Fundamenski, W.; Naulin, Volker

    2011-01-01

    The effect of plasma fluctuations due to turbulence at the outboard midplane on parallel transport properties is investigated. Time-dependent fluctuating signals at different radial locations are used to study the effect of signal statistics. Further, a computational analysis of parallel transport...... to a comparison of steady-state and time-dependent modelling....

  15. LOOP-3, Hydraulic Stability in Heated Parallel Channels

    Energy Technology Data Exchange (ETDEWEB)

    Davies, A L [AEEW, Dorset (United Kingdom)

    1968-02-01

    1 - Nature of physical problem solved: Hydraulic stability in parallel channels. 2 - Method of solution: Calculation of transfer functions developed in reference (10 below). 3 - Restrictions on the complexity of the problem: Only due to assumptions in analysis (see ref.)

  16. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias; Bruckner, Stefan; Grö ller, M. Eduard; Hadwiger, Markus; Rautek, Peter

    2016-01-01

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  17. A parallel neural network training algorithm for control of discrete dynamical systems.

    Energy Technology Data Exchange (ETDEWEB)

    Gordillo, J. L.; Hanebutte, U. R.; Vitela, J. E.

    1998-01-20

    In this work we present a parallel neural network controller training code that uses MPI, a portable message passing environment. A comprehensive performance analysis is reported which compares the results of a performance model with actual measurements. The analysis is made for three different load assignment schemes: block distribution, strip mining, and a sliding average bin packing (best-fit) algorithm. Such analysis is crucial since optimal load balance cannot be achieved, because the workload information is not available a priori. The speedup results obtained with the above schemes are compared with those corresponding to the bin packing load balance scheme with perfect load prediction based on a priori knowledge of the computing effort. Two multiprocessor platforms, a SGI/Cray Origin 2000 and an IBM SP, have been utilized for this study. It is shown that for the best load balance scheme a parallel efficiency of over 50% for the entire computation is achieved by 17 processors of either parallel computer.
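
    To illustrate the best-fit idea behind the bin-packing scheme named above, here is a minimal greedy sketch that always places the next (largest remaining) work item on the currently least-loaded processor. It is a generic longest-processing-time heuristic, not the paper's sliding-average variant, and the workload costs are illustrative assumptions.

```python
# A minimal best-fit (greedy bin-packing) load assignment sketch.
import heapq

def best_fit_assign(costs, n_procs):
    """Sort items by decreasing cost; repeatedly pop the least-loaded
    processor from a min-heap and give it the next item."""
    heap = [(0.0, p, []) for p in range(n_procs)]
    heapq.heapify(heap)
    for i, c in sorted(enumerate(costs), key=lambda t: -t[1]):
        load, p, items = heapq.heappop(heap)
        items.append(i)
        heapq.heappush(heap, (load + c, p, items))
    return sorted(heap, key=lambda t: t[1])   # order by processor id

costs = [3.0, 1.5, 2.2, 5.1, 0.7, 4.4, 2.9, 1.1]   # assumed task costs
for load, p, items in best_fit_assign(costs, n_procs=3):
    print(f"proc {p}: load {load:.1f}, items {items}")
```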

  18. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias

    2016-04-19

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  19. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and from that of applications that can exploit such systems. The lectures discuss parallelism in a historical context and identify the main aspects of concurrency in computation right up to the present time. They also consider the important question of what use parallelism might be in the field of data processing. (orig.)

  20. Operability probabilistic analysis: methodology for economic improvement through the parallelization of process plants; Analisis probabilistico de operatividad: metodologia para mejora economica a traves de la paralelizacion de plantas de proceso

    Energy Technology Data Exchange (ETDEWEB)

    Mendoza, A.; Francois, J. L.; Martin del Campo, C.; Nelson, P. F., E-mail: iqalexmdz@yahoo.com.mx [UNAM, Facultad de Ingenieria, Departamento de Sistemas Energeticos, Paseo Cuauhnahuac No. 8532, Col. Progreso, 62550 Jiutepec, Morelos (Mexico)

    2012-10-15

    One of the major challenges that emergent technologies must overcome is economic competitiveness with currently established technologies: they must not only use energy resources and raw materials efficiently in their production processes, but also maximize the return on the plant's initial investment. In particular cases, such as electric power or fuel production, the fixed cost represents a high percentage of the total cost and depends strongly on the plant factor, a parameter subject to unplanned but predictable variations. These variations can be anticipated with analytic tools, such as the operability probabilistic analysis, that relate the failure rates of the plant's components to the probability of downtime. This study evaluates the implications of changes in plant configuration in order to quantify the economic advantages of dividing equipment into more or fewer parallel trains (parallelization); a general objective function is established to evaluate the parallelization alternatives, and the basic concepts needed to apply the methodology are presented. Finally, a case study is developed for the sulfuric acid decomposition section of a hydrogen production plant. (Author)